{"id":7083,"date":"2025-09-10T10:09:34","date_gmt":"2025-09-10T15:09:34","guid":{"rendered":"https:\/\/frontendmasters.com\/blog\/?p=7083"},"modified":"2025-09-10T10:09:35","modified_gmt":"2025-09-10T15:09:35","slug":"choosing-the-right-model-in-cursor","status":"publish","type":"post","link":"https:\/\/frontendmasters.com\/blog\/choosing-the-right-model-in-cursor\/","title":{"rendered":"Choosing the Right Model in Cursor"},"content":{"rendered":"\n<p>A number of the big players are coming out with their own AI coding assistants (e.g., <a href=\"https:\/\/openai.com\/codex\/\">OpenAI\u2019s Codex<\/a>, <a href=\"https:\/\/www.anthropic.com\/claude-code\">Anthropic\u2019s Claude Code<\/a>, and <a href=\"https:\/\/cloud.google.com\/gemini\/docs\/codeassist\/gemini-cli\">Google Gemini CLI<\/a>). However, one of the advantages of using a third-party tool like Cursor is that you have the option to choose from a wide selection of models. The downside\u2014of course\u2014is that, like Uncle Ben would always say, \u201cWith great power comes great responsibility.\u201d<\/p>\n\n\n\n<p>Cursor doesn\u2019t just give you a single AI model and call it a day\u2014it hands you a buffet. You\u2019ve got heavy hitters like OpenAI\u2019s GPT series (now including the newly-released GPT-5), Anthropic\u2019s Claude models (including the shiny new Opus 4.1), Google\u2019s Gemini, along with Cursor\u2019s own hosted options and even local models you can run on your machine.<\/p>\n\n\n\n<p>Different models excel in different areas, and selecting wisely has a significant impact on quality, latency, and cost. 
Think of it like picking the right guitar for the gig\u2014you <em>could<\/em> play metal riffs on a nylon-string classical, but wouldn\u2019t you rather have the right tool for the job?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A Word on &#8220;Auto&#8221; Mode<\/h2>\n\n\n\n<p>Cursor also offers Auto mode, which will pick a model for you based on the complexity of your query and current server reliability. It\u2019s like autopilot\u2014but if you care about cost or predictability, it\u2019s worth picking models manually. <a href=\"https:\/\/docs.cursor.com\/en\/models#auto\">Cursor\u2019s documentation<\/a> describes it as selecting \u201cthe premium model best fit for the immediate task\u201d and \u201cautomatically switch[ing] models\u201d when output quality or availability dips. In practice, it\u2019s a reliability\u2011first, hands\u2011off default so you can keep coding without thinking about providers.<\/p>\n\n\n\n<p>Use Auto when you want to stay in flow and avoid babysitting model choice. It\u2019s especially handy for day\u2011to\u2011day edits, smaller refactors, explanation\/QA over the codebase, and any situation where provider hiccups would otherwise force you to switch models manually. Because Auto can detect degraded performance and hop to a healthier model, it reduces stalls during outages or rate\u2011limit blips.&nbsp;<\/p>\n\n\n\n<p>Auto is also a good \u201cfirst try\u201d when you\u2019re unsure which model style fits\u2014Cursor\u2019s guidance explicitly calls it a safe default. If you later notice the conversation needs a different behavior (more initiative vs. tighter instruction\u2011following), you can switch and continue. But, with that said, let\u2019s dive into the differences between the models themselves for those situations where you want to take control of the wheel.<\/p>\n\n\n\n<p class=\"learn-more\"><strong>Nota bene<\/strong>: A lot of evaluating how \u201cgood\u201d a model is for a given task is a subjective art. 
So, for this post, we\u2019re going to be striking a careful balance between my own experience and a requisite amount of reading other people\u2019s hot takes on Reddit so that you don\u2019t have to subject yourself to that.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Claude Models (Sonnet, Opus, Opus 4.1)<\/h2>\n\n\n\n<p>Claude has become a fan favorite in Cursor, especially for frontend work, UI\/UX refactoring, and code simplification. I will say, I like to think that I am pretty good at this whole front-end engineering schtick, but even I am sometimes impressed.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Claude 3.5 Sonnet<\/strong> \u2013 Often the \u201cdefault choice\u201d for coding tasks. It\u2019s fast, reliable, and has a knack for simplifying messy code without losing nuance.<\/li>\n\n\n\n<li><strong>Claude 4 Opus<\/strong> \u2013 Anthropic\u2019s flagship for deep reasoning. Excellent for architectural planning and critical refactors, though slower and pricier.<\/li>\n\n\n\n<li><strong>Claude 4.1 Opus<\/strong> \u2013 The newest version, with sharper reasoning and longer context windows. This is the model you pull out when you\u2019re dealing with a sprawling repo or thorny system design and you want answers that feel almost like a senior architect wrote them.<\/li>\n<\/ul>\n\n\n\n<p><strong>Trade-off<\/strong>: Claude models are sometimes cautious\u2014they\u2019ll decline tasks that a GPT model might at least attempt. But the output is usually more focused and aligned with best practices. I\u2019ve also noticed that Claude has a tendency to get side-tracked and work on other tangentially related tasks that I didn\u2019t explicitly ask for. 
That said, I\u2019m guilty of this too.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">GPT Models (GPT-3.5, GPT-4, GPT-4o, o3, GPT-5)<\/h2>\n\n\n\n<p>OpenAI\u2019s GPT line has been the workhorse of AI coding.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>GPT-3.5<\/strong> \u2013 Blazing fast and cheap, perfect for boilerplate generation and small tasks.<\/li>\n\n\n\n<li><strong>GPT-4 \/ GPT-4o<\/strong> \u2013 Solid all-rounders. Great for logic-heavy work, nuanced refactors, and design patterns. GPT-4o is especially nice as a \u201cdaily driver\u201d because it balances cost, speed, and capability.<\/li>\n\n\n\n<li><strong>o3<\/strong> \u2013 A variant tuned for better reasoning and structured answers. Handy for debugging or step-by-step problem solving.<\/li>\n\n\n\n<li><strong>GPT-5<\/strong> \u2013 The new heavyweight. Think GPT-4 but with significantly deeper reasoning, longer context, and a much better grasp of codebases at scale. It\u2019s particularly strong at handling multi-file architectural changes and design discussions. If GPT-4 was like working with a diligent senior dev, GPT-5 feels closer to having a staff engineer who can keep the whole system in their head.<\/li>\n<\/ul>\n\n\n\n<p><strong>Trade-off<\/strong>: GPT models sometimes get \u201clazy\u201d\u2014they\u2019ll sketch a partial solution instead of finishing the job. But when you want factual grounding or logic-intensive brainstorming, they\u2019re hard to beat. GPT-5 in particular tends to go slower and check in more often. So, it\u2019s a bit more of a hands-on experience than the Claude models. That said, given Claude\u2019s tendency to go on side quests, I am not sure this is a bad thing. 
GPT-5 will often do the bare minimum but then come to you with suggestions for what it ought to do next, and I find myself either agreeing or choosing a subset of its suggestions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Gemini Models (Gemini 2.5 Pro)<\/h2>\n\n\n\n<p>Google\u2019s Gemini slots in nicely for certain tasks: complex design, deep bug-hunting, and rapid completions. It\u2019s more of a specialist tool\u2014less universal than Claude or GPT, but very effective when you hit the right workload. Historically, one of Gemini\u2019s perks was its massive context window (around 2 million tokens). In the months since it was released, however, other models have caught up\u2014namely Opus and GPT-5. Even Sonnet 4 now rocks a 1 million token context window.<\/p>\n\n\n\n<p>I typically find myself using Gemini for research tasks. \u201cHey Gemini, look over my codebase and come up with some suggestions for how I can make my tests less flaky and go write them to this file.\u201d Its large context window makes it great for these kinds of tasks. It\u2019s no slouch at your day-to-day coding tasks either. I just typically find myself reaching for something lighter\u2014and cheaper.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">DeepSeek Coder<\/h2>\n\n\n\n<p>Cursor also offers DeepSeek Coder, a leaner, cost-effective option hosted directly by Cursor. It\u2019s good for troubleshooting and analysis, and useful if you want more privacy and predictable costs. That said, it doesn\u2019t quite match the top-tier frontier models for heavy generative work.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Local Models (LLaMA 2 Derivatives, etc.)<\/h2>\n\n\n\n<p>Sometimes you just need to keep everything on your own machine. Cursor supports local models, which are slower and less powerful but guarantee maximum privacy. These shine if you\u2019re working with highly sensitive code or under strict compliance requirements. This is not my area of expertise. 
Mainly because my four-year-old MacBook can\u2019t run these models at the same speed as one of OpenAI\u2019s datacenters can.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Model Selection Strategy<\/h2>\n\n\n\n<p>Here are some general heuristics I\u2019ve found useful:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>For small stuff<\/strong> (boilerplate, stubs, quick utilities): GPT-4o or a local model keeps things fast and cheap.<\/li>\n\n\n\n<li><strong>For day-to-day coding<\/strong>: Claude Sonnet 4 and GPT-4.1 are solid defaults. They balance reliability with performance. Gemini 2.5 Flash is also a strong contender in this department.<\/li>\n\n\n\n<li><strong>For heavy lifting<\/strong> (large refactors, architecture, critical business logic): GPT-5 or Claude Opus 4.1 are the power tools. They\u2019re not cheap, but often it costs less to get it right the first time. What I\u2019ll typically do is have them write their plan to a Markdown file, review it, and then let a lighter-weight model take over from there.<\/li>\n\n\n\n<li><strong>When stuck<\/strong>: Swap models. If Claude hesitates, try GPT. If GPT spins in circles, Claude often cuts to the chase. This is not a super-scientific approach, but it\u2019s wildly effective\u2014or at least it <em>feels<\/em> that way.<\/li>\n\n\n\n<li><strong>Privacy first<\/strong>: Use local models or Cursor-hosted DeepSeek when your code should never leave your machine. I\u2019ve traditionally worked on open-source stuff. 
So, this hasn\u2019t been a huge concern of mine, personally.<\/li>\n<\/ul>\n\n\n\n<p class=\"learn-more\"><strong>Editor&#8217;s note:<\/strong> If you <em>really<\/em> want to level up with your AI coding skills, you should go from here right to Steve&#8217;s course: <a href=\"https:\/\/frontendmasters.com\/courses\/pro-ai\/?utm_source=boost&amp;utm_medium=blog&amp;utm_campaign=boost\">Cursor &amp; Claude Code: Professional AI Setup<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Evaluating New Models<\/h2>\n\n\n\n<p>New models drop all of the time, which raises the question: How should you think about evaluating a new model release to see if it\u2019s a good fit for your workflow?<\/p>\n\n\n\n<p><strong>Capability<\/strong>\u2014Can it actually ship fixes in your codebase, not just talk about them? Reasoning\u2011forward models like OpenAI\u2019s o3 and hybrid \u201cthinking\u201d models like Claude 3.7 Sonnet are pitched for deeper analysis; use them when you expect layered reasoning or ambiguous requirements.&nbsp;<\/p>\n\n\n\n<p><strong>Behavior<\/strong>\u2014Does it take initiative or wait for explicit instructions? Cursor\u2019s model guide groups \u201cthinking models\u201d (e.g., o3, Gemini 2.5 Pro) versus \u201cnon\u2011thinking models\u201d (e.g., Claude\u20114\u2011Sonnet, GPT\u20114.1) and spells out when each style helps. Assertive models are great for exploration and refactors; obedient models shine on surgical edits.&nbsp;<\/p>\n\n\n\n<p><strong>Context<\/strong>\u2014Do you need a lot of context right now? If you\u2019re touching broad cross\u2011cutting concerns, enable Max Mode on models that support 1M\u2011token windows and observe whether plan quality improves enough to justify the slower, pricier runs. Having a bigger context window isn&#8217;t always a good thing. Regardless of what the model&#8217;s maximum context window size is, the more you load into that window, the longer it&#8217;s going to take to process all of those tokens. 
Generally speaking, having the <em>right<\/em> context is way better than having <em>more<\/em> context.<\/p>\n\n\n\n<p><strong>Cost and reliability<\/strong>\u2014Cursor bills at provider API rates; Auto exists to keep you moving when a provider hiccups. New models often carry different throughput\/price curves\u2014compare under your real workload, not just benchmarks. Cost is a tricky thing to evaluate because if a model costs more per token but can accomplish the task in fewer tokens, it might end up being a bit cheaper when all is said and done.<\/p>\n\n\n\n<p>Here is my pseudo-scientific guide for kicking the tires on a new model.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Freeze variables. Use the same branch, same repo state, and the same prompt for each run. Turn Auto off when you\u2019re pinning a candidate so you\u2019re not measuring routing noise. Cursor\u2019s guide confirms Auto isn\u2019t task\u2011aware and excludes o3\u2014so when you test o3 or any very new model, pin it.<\/li>\n\n\n\n<li>Pick three task archetypes. Choose one surgical edit, one bug\u2011hunt, and one broader refactor. That trio exposes obedience, reasoning, and context behavior in a single pass. Cursor\u2019s \u201cmodes\u201d page clarifies that Agent can run commands and do multi\u2011file edits\u2014ideal for these trials.<\/li>\n\n\n\n<li>As Peter Drucker (or John Doerr, but I digress) used to say: Measure what matters. For each task and model, record: did tests pass; how much did it modify; did it follow constraints; how many agent tool calls and shell runs; and wall\u2011clock duration. Cursor\u2019s headless CLI can stream structured events that include the chosen model and per\u2011request timing\u2014perfect for quick logging.<\/li>\n<\/ol>\n\n\n\n<p>Repeat this process with Max Mode if the model you&#8217;re evaluating advertises giant context. 
You\u2019re testing whether the larger window yields better plans or just slower ones.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Wrapping Up<\/h2>\n\n\n\n<p>Model choice in Cursor isn\u2019t just about \u201cwhich AI is best\u201d\u2014it\u2019s about matching the right tool to the task. Claude excels at simplifying and clarifying, GPT shines at reasoning and factual grounding, Gemini offers design chops, and local models guard your privacy.<\/p>\n\n\n\n<p>And with GPT-5 and Opus 4.1 now in the mix, we\u2019re entering a phase where models can reason about your codebase almost like a human teammate. The trick is knowing when to bring in the heavy artillery and when a lighter model will do the job faster and cheaper.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Cursor has an &#8220;auto&#8221; mode, &#8220;but if you care about cost or predictability, it\u2019s worth picking models manually.&#8221; says Steve Kinney.<\/p>\n","protected":false},"author":30,"featured_media":7079,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"sig_custom_text":"","sig_image_type":"featured-image","sig_custom_image":0,"sig_is_disabled":false,"inline_featured_image":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[104,97,391,392],"class_list":["post-7083","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog-post","tag-ai","tag-code-editors","tag-cursor","tag-llms"],"acf":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/frontendmasters.com\/blog\/wp-content\/uploads\/2025\/09\/Getting-Started-with-Cursor.jpg?fit=1140%2C676&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/frontendmasters.com\/blog\/wp-json\/wp\/v2\/posts\/7083","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/frontendmasters.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/frontendma
sters.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/frontendmasters.com\/blog\/wp-json\/wp\/v2\/users\/30"}],"replies":[{"embeddable":true,"href":"https:\/\/frontendmasters.com\/blog\/wp-json\/wp\/v2\/comments?post=7083"}],"version-history":[{"count":4,"href":"https:\/\/frontendmasters.com\/blog\/wp-json\/wp\/v2\/posts\/7083\/revisions"}],"predecessor-version":[{"id":7111,"href":"https:\/\/frontendmasters.com\/blog\/wp-json\/wp\/v2\/posts\/7083\/revisions\/7111"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/frontendmasters.com\/blog\/wp-json\/wp\/v2\/media\/7079"}],"wp:attachment":[{"href":"https:\/\/frontendmasters.com\/blog\/wp-json\/wp\/v2\/media?parent=7083"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/frontendmasters.com\/blog\/wp-json\/wp\/v2\/categories?post=7083"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/frontendmasters.com\/blog\/wp-json\/wp\/v2\/tags?post=7083"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}