A GTM Guide to AI Models


Most AI content in the GTM space is either “AI will replace your entire team” or “leverage AI for efficiency.” Both are useless.

This issue goes deep on the models themselves. What they are, how they work, the settings you can tune, and where each one creates real value across marketing, sales, CS, and RevOps. Consider it the technical foundation every revenue leader needs before making decisions about where AI fits in their GTM motion.

The Models: What’s Out There

Five families of models matter for GTM right now, and they’re not interchangeable. Each has distinct strengths. The practitioners getting the most value are using multiple models for different tasks.

Claude (Anthropic) is the strongest writer and analyst in the group. It excels at nuanced writing that doesn’t sound machine-generated, long-form analysis, complex multi-step reasoning, and processing large volumes of text. If you’re feeding it 50 pages of call transcripts or a messy CRM export and asking it to find patterns, Claude is your best bet. Its context window (the amount of text it can hold in memory during a conversation) is massive, which matters when you’re working with real business data.
ChatGPT / GPT (OpenAI) is the generalist. Good at most things, great at a few. Its ecosystem is the largest: more integrations, more plugins, more third-party tools built on top of it. Where it separates itself: image generation, voice interaction, and memory across conversations. If you need an AI that remembers you told it your ICP two weeks ago and applies that context automatically, ChatGPT does that better than anyone.
Gemini (Google) is the data engine. Its differentiator is integration with Google’s ecosystem and its ability to process enormous amounts of data. If your company runs on Google Workspace, Gemini can natively search your Drive, analyze your Sheets, and pull context from your Docs. Its context window is the largest in the industry, up to 1 million tokens, which means you can load an entire quarter’s worth of sales data into a single conversation. For deep research tasks, it’s thorough to the point of being verbose.
DeepSeek is the budget option that actually performs. It offers comparable reasoning capabilities at roughly 1/27th of OpenAI’s price for similar tasks. For high-volume, repeatable tasks like data enrichment, content moderation, or lead scoring where you’re processing thousands of records, the cost difference is material. This is where your unit economics conversation starts.
Open-Source Models (Llama, Mistral) are the self-hosted option. If data privacy is a non-negotiable (healthcare, financial services, government contracts), open-source models let you run everything on your own infrastructure. The tradeoff is setup complexity and typically lower performance on nuanced tasks. But for companies where sending customer data to a third-party API is a dealbreaker, this is the path.

The “one model for everything” era never really existed. The smartest teams are routing different tasks to different models based on what each does best.

Tokens, Context Windows, and What They Mean for Your Data

Before we get into settings, you need to understand the currency these models trade in: tokens.

A token is the smallest unit of text that an AI model processes. Not exactly a word. More like a chunk. Sometimes a whole word, sometimes part of one. “Pipeline” is one token. “Unhappiness” gets split into two: “un” and “happiness.” Punctuation, spaces, and numbers all consume tokens too.

The rule of thumb:

  • 1,000 tokens ≈ 750 words.
  • Or flip it: 1 token ≈ 0.75 words ≈ 4 characters.

A standard single-spaced page of text (about 500 words) runs roughly 650-700 tokens.
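If you want to sanity-check those numbers yourself, here’s a minimal sketch using tiktoken, OpenAI’s open-source tokenizer. Other providers tokenize slightly differently, so treat the output as an estimate, and the filename here is just a stand-in for whatever you’re loading.

```python
import tiktoken  # OpenAI's open-source tokenizer: pip install tiktoken

def estimate_tokens(text: str) -> int:
    # cl100k_base reflects OpenAI's tokenization; other providers
    # differ slightly, so treat the result as an estimate, not billing truth.
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

doc = open("q3_call_transcripts.txt").read()  # hypothetical file
tokens = estimate_tokens(doc)
print(f"~{tokens:,} tokens (~{int(tokens * 0.75):,} words)")
print(f"char-count heuristic: ~{len(doc) // 4:,} tokens")
```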

So what?

Every model has a context window, the total number of tokens it can hold in a single conversation. This includes everything. Your prompt, any documents you upload, the conversation history, AND the model’s response. When you exceed the context window, the model starts dropping earlier content. It doesn’t tell you. It just quietly forgets.

How the major models compare:

GPT-5.2 has the largest output window at 128K tokens, roughly 96,000 words in a single response. That matters for tasks like producing full quarterly business reviews or comprehensive pipeline analyses. Gemini and Claude Sonnet can ingest the most data (1M tokens each), but both cap responses at 64K tokens. And DeepSeek, at roughly 1/27th the price of the premium models, is where the savings add up when you’re running high-volume enrichment or classification across thousands of records.

A typical CRM export of 1,000 opportunity records with 15 fields each might run 30,000-50,000 tokens. A quarter’s worth of call transcripts for a 10-person sales team could hit 500,000+ tokens. A 30-page sales playbook is roughly 20,000 tokens. Your average Gong call summary is about 1,500-3,000 tokens.

So when someone says “just upload your pipeline data and ask the AI to analyze it,” the context window determines whether that’s actually possible in a single conversation or whether you need to chunk your data across multiple passes.
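Here’s what chunking can look like in practice: a minimal sketch that batches plain-text records so each pass stays inside the window, using the rough 4-characters-per-token heuristic from above. The safety factor anticipates the degradation point covered next; both numbers are assumptions to tune per model.

```python
# Minimal sketch: batch records so each API call stays inside the
# model's context window, with headroom for the prompt and response.
# The 0.65 safety factor and the 8K reserve are assumptions to tune.

def chunk_records(records: list[str], context_window: int,
                  reserve: int = 8_000, safety: float = 0.65) -> list[list[str]]:
    budget = int(context_window * safety) - reserve
    batches, batch, used = [], [], 0
    for rec in records:
        size = len(rec) // 4  # rough estimate: ~4 characters per token
        if batch and used + size > budget:
            batches.append(batch)
            batch, used = [], 0
        batch.append(rec)
        used += size
    if batch:
        batches.append(batch)
    return batches

# e.g. chunk_records(crm_rows, context_window=200_000) -> list of batches
```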

Advertised context windows don’t always mean reliable performance at full capacity. Independent testing shows most models start degrading in accuracy around 65-70% of their stated limit. A model claiming 200K tokens typically becomes unreliable around 130K. Claude is the exception, showing less than 5% accuracy degradation across its full context window. GPT slows down and occasionally misses details near its ceiling. Gemini handles massive inputs but response latency increases significantly with long contexts, and multimodal processing (images, video, audio) adds overhead on top of that.

Input tokens vs. output tokens. When using AI via API (which is how most automation tools and integrations work), you’re billed separately for input tokens (what you send to the model) and output tokens (what the model generates back). Output tokens are always more expensive. Take Claude Opus: $5 per million input tokens but $25 per million output. That’s a 5x multiplier. GPT-5.2 is $1.75 in, $14 out, an 8x multiplier. Generating text is computationally harder than reading it.

This has real implications for how you design AI workflows. A lead enrichment system that sends a brief record (200 input tokens) and gets back a scored summary (500 output tokens) costs meaningfully more per record than a system that sends a longer record (800 input tokens) and gets back a simple classification label (20 output tokens). The ratio of input to output directly drives your unit economics.
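A quick back-of-envelope, using the GPT-5.2 example rates above, shows how much the split matters. Prices move constantly, so plug in whatever your provider currently charges.

```python
# Unit economics for the two enrichment designs described above.
# Prices are per million tokens; the $1.75 / $14 figures are the
# article's GPT-5.2 example rates, not current pricing.

def cost_per_record(in_tokens: int, out_tokens: int,
                    in_price: float, out_price: float) -> float:
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Design A: short record in, scored summary out
a = cost_per_record(200, 500, in_price=1.75, out_price=14.0)
# Design B: longer record in, bare classification label out
b = cost_per_record(800, 20, in_price=1.75, out_price=14.0)
print(f"A: ${a:.5f}/record  B: ${b:.5f}/record")  # B is ~4x cheaper
```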

There’s also a hidden cost most teams miss: context window creep. In multi-turn conversations, whether between a human and an AI or between AI agents in an automated workflow, the entire conversation history gets re-sent as input with every turn. Turn 1 might be 500 tokens. By turn 10, you’re sending 15,000+ tokens of accumulated history as input, paying for all of it again. This is the single biggest source of unexpectedly high AI bills in production systems. To fix it, truncate history, summarize previous turns, or use prompt caching (most providers now offer a 90% discount for repeated prefixes).
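A minimal sketch of one fix, assuming a standard role/content message list: keep the system prompt and the last few turns verbatim, and collapse everything older into one summary turn. The summarize helper here is a crude stand-in for a cheap-model call, and keep_turns is an assumption to tune.

```python
def summarize(messages: list[dict]) -> str:
    # Crude stand-in for a cheap-model summarization call; in practice
    # you'd send older turns to an inexpensive model and keep its summary.
    return " ".join(m["content"] for m in messages)[:500]

def trim_history(messages: list[dict], keep_turns: int = 4) -> list[dict]:
    # Keep the system prompt and the last few turns verbatim;
    # collapse everything older into a single short summary turn.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_turns:
        return system + rest
    older, recent = rest[:-keep_turns], rest[-keep_turns:]
    return system + [
        {"role": "user",
         "content": f"Summary of earlier turns: {summarize(older)}"}
    ] + recent
```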

The context window determines what analysis is possible, the input/output split determines what it costs, and how you manage conversation history determines whether costs stay predictable or spiral.

The Settings That Matter

Everyone talks about prompts. Almost nobody talks about parameters, the settings that control how the model generates its output. These matter as much as what you ask. They’re the difference between an AI tool that works and one that feels unreliable.

Temperature (0.0 - 1.0) is the creativity dial. At 0, the model picks the most statistically probable next word every time. Consistent, predictable, sometimes boring output. At 1.0, it takes more risks. More creative, more varied, but also more likely to hallucinate or go off-script.

In practice, set temperature low (0.0-0.3) for anything that needs to be accurate and repeatable. Lead scoring, data classification, CRM field population, forecast analysis. Set it higher (0.6-0.8) for creative work like email copy, ad variations, or brainstorming campaign angles. The mistake most teams make is using the default (usually 1.0) for analytical tasks. That’s asking your analyst to be “creative” with your pipeline numbers.
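In API terms, temperature is just one field on the request. Here’s a sketch using the OpenAI Python SDK (every major provider exposes an equivalent parameter); the model id is a placeholder, so swap in whatever you actually run.

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder model id; substitute your own

# Analytical task: low temperature for deterministic, repeatable output
classification = client.chat.completions.create(
    model=MODEL,
    temperature=0.1,
    messages=[{"role": "user",
               "content": "Classify this lead as SMB or Enterprise: ..."}],
)

# Creative task: higher temperature, variety welcome
subject_lines = client.chat.completions.create(
    model=MODEL,
    temperature=0.8,
    messages=[{"role": "user",
               "content": "Write five subject lines for our Q4 webinar."}],
)
```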

Top-P / Nucleus Sampling (0.0 - 1.0) controls how many word options the model considers before picking one. At 0.1, it only considers words in the top 10% of probability. At 0.95, it considers nearly everything. It’s the width of the model’s vocabulary for any given response.

For GTM use, consider pairing a low Top-P (0.1-0.3) with tasks where you want consistent terminology, like generating MEDDICC notes or writing SOW language. Use a higher Top-P (0.7-0.9) when you want variety, like generating multiple versions of outbound sequences.

Rule of thumb: adjust temperature OR Top-P, not both. They do similar things through different mechanisms. Tweaking both simultaneously makes outputs unpredictable.

Max Tokens sets the ceiling on how long the response can be. One token is roughly 0.75 words. This matters for cost control (you pay per token on API calls) and for keeping outputs focused. Building an AI-powered chatbot for customer support? Cap responses at 200-300 tokens. Generating quarterly business reviews? You need thousands.

Frequency and Presence Penalties reduce repetition. Frequency penalty punishes words based on how often they’ve already appeared. Presence penalty discourages any word that’s appeared at all, pushing the model toward new vocabulary. Useful when generating multiple variations of email copy or ad creative where you need genuine variety, not the same template with minor word swaps.

System Prompts / Custom Instructions are the most overlooked and most powerful setting. Every major model lets you set persistent instructions that shape every response. For GTM, this is where you define your ICP, your brand voice, your scoring criteria, or your deal stage definitions. Instead of re-explaining your sales methodology in every prompt, you put it in the system prompt once. The model applies it automatically. Same concept as a new hire’s onboarding docs, except the model actually reads and follows them.
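Here’s roughly what that looks like via API, sketched with the Anthropic SDK (ChatGPT’s Custom Instructions and Gemini’s system instructions do the same job). The model id is a placeholder, and the ICP and stage definitions are stand-ins for your own.

```python
import anthropic

client = anthropic.Anthropic()

SYSTEM = """You are a RevOps analyst for a B2B SaaS company.
ICP: 200-2,000 employee companies in fintech and healthtech.
Deal stages: Discovery > Evaluation > Proposal > Negotiation > Closed.
Voice: direct, no hype, always cite the underlying data."""

resp = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; use your model id
    max_tokens=1000,
    system=SYSTEM,  # applied to every turn, no re-explaining needed
    messages=[{"role": "user",
               "content": "Score this inbound lead against our ICP: ..."}],
)
```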

Reasoning Effort / Thinking Mode is new. The latest generation of models offers adjustable reasoning depth. You can tell the model to think harder or faster. For simple classification tasks (is this lead SMB or Enterprise?), fast mode saves time and money. For complex analysis (why did our win rate drop 8 points in Q3?), extended thinking produces dramatically better output. All three major providers now let you dial this up or down, so you’re not paying for deep reasoning when you just need a quick data transformation.
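Provider APIs expose this as a request parameter, though the name varies by provider and model generation. A sketch using OpenAI’s reasoning_effort on a reasoning-class model; Anthropic uses a thinking token budget instead, and the model id here is a placeholder.

```python
from openai import OpenAI

client = OpenAI()
MODEL = "o3-mini"  # placeholder reasoning-model id

# Cheap and fast: simple classification
quick = client.chat.completions.create(
    model=MODEL,
    reasoning_effort="low",
    messages=[{"role": "user",
               "content": "Is this lead SMB or Enterprise? ..."}],
)

# Slow and deep: root-cause analysis
deep = client.chat.completions.create(
    model=MODEL,
    reasoning_effort="high",
    messages=[{"role": "user",
               "content": "Why did our win rate drop 8 points in Q3? ..."}],
)
```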

Where AI Works in GTM

Now for the part that pays the bills. Where each model capability maps to real GTM workflows, organized by function.

Marketing

Content creation and variation is the most adopted use case, and for good reason. AI generates email copy, blog drafts, social posts, and ad creative at a speed that changes the math on content production. The key is treating AI output as a first draft, not a final product, and using temperature settings strategically: low for product messaging that needs to stay on-brand, higher for creative hooks and subject lines.

Personalization at scale. AI models can take a segment definition and generate personalized messaging for dozens of personas without each version reading like a mail merge. Feed the model your ICP profiles, your value propositions by segment, and your brand voice examples. Output quality scales with the quality of your inputs, which is a data problem, not an AI problem.

Competitive intelligence. Point a model with web search capabilities at competitor websites, press releases, and review sites. Ask it to synthesize positioning changes, pricing shifts, and feature launches into a weekly brief. This used to require a dedicated analyst. Now it takes 15 minutes to set up.

SEO and content optimization. Gemini excels here because of its ability to process competitor sites and search data simultaneously. Use it to analyze content gaps, generate keyword clusters, and draft content briefs at a fraction of the traditional cost.

Sales

Call analysis and coaching. Revenue intelligence platforms like Gong and Chorus already use AI to analyze call recordings, but the shift is toward real-time analysis. Models can now process call transcripts and automatically populate MEDDICC fields, identify competitive mentions, flag pricing objections, and score rep performance against talk tracks. The automation I built using Otter.ai, Zapier, and Claude to auto-populate MEDDICC fields in Salesforce saves reps 30+ minutes per deal.

Outbound personalization. A BDR used to spend 20 minutes researching a prospect, then writing a semi-personalized email. Now: Clay + an LLM enriches the prospect, identifies relevant triggers (job change, funding round, tech stack signal), and drafts a message personalized to the specific trigger, not just “I see you work at {Company}.” Set temperature to 0.5-0.7 for outbound. You want personality, not chaos.

Forecasting and deal intelligence. AI applied to CRM data can identify patterns in won and lost deals that humans miss. Which combination of fields (deal size + industry + champion title + competitor involved) actually predicts close rates? Models surface these insights faster than any BI tool, provided your data is clean. And that’s always the caveat.

Proposal and SOW generation. Feed a model your template library, the deal’s specific parameters, and your pricing matrix. It generates a first draft in minutes. Low temperature here. You want precision, not creativity, when you’re talking contract terms.

Customer Success

Health scoring. Traditional health scores use lagging indicators: login frequency, support tickets, NPS. AI can incorporate leading indicators by analyzing the content of support conversations, the sentiment of executive check-ins, and the patterns in product usage that precede churn. Models with large context windows matter here because you’re feeding them months of interaction history per account.

Automated QBR prep. Before every QBR, someone on your CS team spends hours pulling usage data, compiling ROI metrics, and writing an executive summary. AI can do 80% of that work. Feed it the account’s product usage data, support ticket history, and renewal timeline. It generates the first draft of the QBR deck content. Your CSM reviews, adds human insight, and presents.

Ticket routing and response. Not just chatbots. Modern AI can read an incoming support ticket, classify its urgency and topic, route it to the right specialist, and draft a response for human review. The models that perform best here run lower temperature settings with clear instruction sets. You want consistent, accurate responses, not creative ones.

RevOps

Pipeline hygiene. This is my bread and butter, and it’s where AI pays for itself fastest. Models can scan your entire pipeline and flag deals with missing fields, stalled stage durations, inconsistent close dates, or no recent activity. The audit I did for GrowthX revealed 51% of deals lacked assigned owners and $5.3M in stalled pipeline. An AI-powered hygiene system catches these issues in real-time instead of quarterly.
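A minimal version of that scan, sketched in pandas against an assumed export schema. The column names and staleness thresholds are placeholders; adjust them to your CRM.

```python
import pandas as pd

# Assumed export schema: deal_id, owner, stage_entered_at,
# last_activity_at, close_date. Adjust names to your CRM.
deals = pd.read_csv(
    "pipeline_export.csv",  # hypothetical export file
    parse_dates=["stage_entered_at", "last_activity_at", "close_date"],
)
now = pd.Timestamp.now()

flags = pd.DataFrame({
    "no_owner": deals["owner"].isna(),
    # Staleness thresholds (45 / 21 days) are assumptions to tune
    "stalled_stage": (now - deals["stage_entered_at"]).dt.days > 45,
    "gone_dark": (now - deals["last_activity_at"]).dt.days > 21,
    "close_date_past": deals["close_date"] < now,
})

deals["hygiene_flags"] = flags.sum(axis=1)
print(deals.loc[deals["hygiene_flags"] > 0, ["deal_id", "hygiene_flags"]])
```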

Lead scoring and routing. Build a model that analyzes your historical conversion data and identifies which combination of firmographic, behavioral, and intent signals actually predict conversion. Then use that model to score and route inbound leads automatically. The system I built for Constant Contact uses AI to process enrichment data from Clay and assigns scores in HubSpot that drive automated routing. No human touching a lead until it’s qualified.

Data enrichment. The highest-volume, lowest-glamour AI use case in RevOps, and probably the most valuable. Tools like Clay can orchestrate multiple AI models and data sources to fill in missing fields, standardize company names, verify contact information, and append technographic data. Use DeepSeek or similar cost-efficient models for this. You’re processing thousands of records and the cost-per-record matters.

Compensation modeling. Feed an AI model your comp plan structure, attainment data, and market benchmarks. Have it run scenarios: What happens to total payout if we change the accelerator threshold? What’s the impact on rep behavior if we add a migration commission tier? The model won’t design your comp plan for you, but it’ll stress-test your assumptions faster than any spreadsheet.

Choosing the Right Model for the Job

The decision matrix I use:

  • Need accuracy and consistency? Low temperature, Claude or GPT, structured prompts
  • Need creative variation? Higher temperature, any frontier model, multiple generations
  • Processing large datasets? Gemini (context window) or DeepSeek (cost efficiency)
  • Need ecosystem integration? GPT (largest plugin ecosystem) or Gemini (Google Workspace)
  • Data sensitivity concerns? Open-source models, self-hosted
  • High-volume, repeatable tasks? DeepSeek or Gemini Flash for cost optimization

Hunting for the single best model doesn’t make sense. What does: building a multi-model workflow where each task routes to the model that handles it best, at the right temperature setting, with the right context. That’s an architecture decision. And architecture is what RevOps does.
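In practice, “multi-model architecture” often starts as nothing fancier than a routing table. A sketch; the model names and settings here are illustrative defaults, not recommendations, so swap in whatever your own evals support.

```python
# Illustrative task-to-model routing table; every entry is a placeholder.
ROUTES = {
    "lead_scoring":      {"model": "deepseek-chat", "temperature": 0.1},
    "call_summary":      {"model": "claude-sonnet", "temperature": 0.2},
    "pipeline_analysis": {"model": "claude-sonnet", "temperature": 0.1},
    "email_variants":    {"model": "gpt",           "temperature": 0.7},
    "deep_research":     {"model": "gemini-pro",    "temperature": 0.3},
}

def route(task: str) -> dict:
    # Unknown tasks fall back to a conservative default.
    return ROUTES.get(task, {"model": "gpt", "temperature": 0.3})
```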

Where to Start

If none of this is happening at your org yet, don’t try to boil the ocean. Pick one workflow in one function, the one with the most manual effort and the cleanest data, and automate it. For most teams, that’s either pipeline hygiene (RevOps), call summarization (Sales), or content variation (Marketing).

Get the parameters right for that one workflow. Measure the time saved. Then expand.

The companies getting results with AI in 2026 aren’t the ones that adopted the most tools. They’re the ones that matched the right model and the right settings to the right problem, measured the impact, and scaled what worked.

Go forth and operate.

Paid members get the AI Model Tuner guide.
