Finding Unity

Why Your AI Search Might Not Understand What You're Looking For

Imagine typing a straightforward query into your AI-powered business intelligence platform: "What JDs have unity in their requirements?" You expect to see results like the Mobile Games Developer role, which explicitly calls for "Experience with Unity or other game engines." Instead, the system replies: "None of your job descriptions mention unity as a requirement." Frustrating, right? You (I specifically) have just encountered one of the subtle pitfalls of modern semantic search, a technology that's brilliant at grasping nuance but can falter spectacularly on ambiguous terms.

In this article/moan, we (I) will unpack why this happens, how semantic search really works under the hood, and, most importantly, practical ways to fix it. We'll build on real-world examples like "Unity" to explore solutions, including advanced techniques like query rewriting and domain-specific embeddings, to make your platform more robust against cold, contextless prompts.

The Word That Means Everything (and Nothing)

"Unity" is a classic case of polysemy, a word with multiple, often unrelated meanings. Here's a quick rundown of its many faces:

  • Unity as a Game Engine (Unity3D): The popular software for building video games, used by developers worldwide (Marvin in space, anyone?).
  • Unity as a Linux Desktop Environment (Unity DE): An older interface for even older Ubuntu users.
  • Unity as a Philosophical or Social Concept: Think harmony, togetherness, or national unity in politics.
  • Unity as a Design Principle: In UI/UX, it refers to visual cohesion in layouts.
  • Unity as a Business Term: Organizational alignment, team unity, or even process unification in corporate speak.
  • Other Oddities: It's a bank name in some countries, a band, and a sinister goal by some European member states.

When you query "Unity" in a job search context, a human might infer the game engine based on surrounding clues. But a general-purpose AI system? On a cold start, with no history and no explicit guidance, it has to guess, and it often leans toward the most common or generic interpretation, like "team unity" in HR documents.

This isn't a bug; it's a feature of how AI processes language.

How Semantic Search Actually Works (as far as I can figure)

Unlike old-school keyword searches that simply scan for exact matches, semantic search uses embeddings, mathematical vectors that capture the "meaning" of text. Tools like OpenAI's embeddings or Sentence-BERT convert your query and documents into high-dimensional points in vector space. Similar meanings cluster together, enabling fuzzy matches:

  • "Staff" clusters near "employees."
  • "Compensation" aligns with "salary."
  • "CV" sits close to "resume."

This is revolutionary for natural language queries. But ambiguity strikes when a word like "Unity" pulls in multiple directions. The embedding for "Unity" might average out across its senses, drifting toward dominant usages (e.g., philosophical unity in general text corpora) rather than niche ones (e.g., the game engine in tech docs).
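
To make that drift concrete, here is a deliberately simplified sketch. It assumes a word's vector behaves like a frequency-weighted average of its senses, which is a rough mental model rather than how any particular embedding model works. The two-dimensional sense vectors and the 10% share are made-up numbers for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 2-D sense vectors: axis 0 = software, axis 1 = social harmony.
GAME_ENGINE_SENSE = [1.0, 0.0]
HARMONY_SENSE = [0.0, 1.0]

def blended_embedding(share_game_engine):
    """Frequency-weighted average of the two senses (a toy assumption)."""
    share_harmony = 1.0 - share_game_engine
    return [
        share_game_engine * g + share_harmony * h
        for g, h in zip(GAME_ENGINE_SENSE, HARMONY_SENSE)
    ]

# Assumption: only 10% of training mentions of "unity" mean the game engine.
unity_vector = blended_embedding(0.10)

sim_game = cosine(unity_vector, GAME_ENGINE_SENSE)
sim_harmony = cosine(unity_vector, HARMONY_SENSE)
print(f"similarity to game-engine sense: {sim_game:.2f}")
print(f"similarity to harmony sense:     {sim_harmony:.2f}")
```

With those assumed numbers the blended vector sits almost on top of the harmony sense, so a game-engine document would need other strong signals to outrank an HR one.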

The Cold Start Problem: No Context, No Clarity

Humans rely on context to disambiguate. In the JD example:

  • The phrase "or other game engines" screams software.
  • The job title "Mobile Games Developer" sets a tech/gaming frame.
  • The document type (a JD from a gaming company) adds layers.

But semantic systems often index documents in chunks, splitting long JDs into smaller fragments for efficiency. A chunk with just "Experience with Unity is a plus" loses those clues. Worse, your query is "cold": no conversation history, user profile, or hints. The AI is left interpreting in a vacuum, potentially defaulting to business jargon like "unity in requirements" meaning alignment or process unification.
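
One way to soften the chunking problem, as far as I can tell, is to stamp every chunk with the document's title and type before it gets embedded. The sketch below is a minimal illustration of that idea: the function name `chunk_with_context`, the bracketed header format, and the sample JD text are all my own inventions, not part of any particular platform.

```python
def chunk_with_context(title, doc_type, text, max_words=12):
    """Split text into fixed-size word chunks, each prefixed with document context."""
    header = f"[{doc_type}: {title}]"
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words):
        body = " ".join(words[start:start + max_words])
        chunks.append(f"{header} {body}")
    return chunks

# Made-up JD text; only the "Unity is a plus" sentence echoes the example above.
jd_text = (
    "We are hiring for our studio in Leeds. You will ship features weekly. "
    "Experience with Unity is a plus. Strong communication skills required."
)

chunks = chunk_with_context("Mobile Games Developer", "Job description", jd_text)
for chunk in chunks:
    print(chunk)
```

Every fragment now carries "Mobile Games Developer" with it, so the chunk that mentions Unity is no longer interpreted in a vacuum. Real systems would usually split on sentences or tokens rather than raw word counts.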

Add in model bias (general-purpose embeddings are trained on vast internet data where "unity" as harmony dominates) and you've got a recipe for missed hits.

The Irony of Intelligence: Smarter Systems, Bigger Blind Spots

The more "intelligent" the search, the more it infers intent, which can backfire. A dumb keyword tool would flag "Unity" instantly (case-sensitive or not). But semantic search asks: "What does this really mean?" In a job platform, it might pivot to cultural fit or team dynamics, ignoring the tech angle. This is especially true for proper nouns and domain-specific terms, which general-purpose embeddings often handle poorly without tuning.

Real-World Implications and Examples

This isn't just theoretical. On platforms like LinkedIn or in custom HR AI tools, similar issues can arise with terms like "Python" (programming language vs. snake), "Java" (code vs. island/coffee), or "Swift" (Apple's language vs. fast). Anyone building AI search for resumes, patents, or codebases faces the same problem: a query for "React" might return chemistry docs on reactions, not the JavaScript library.

Even with recent advances in AI, these problems persist, though specialised models are easing them. For instance, embedding models trained on technical text, such as some of those published on Hugging Face, tend to separate senses better in code-heavy datasets.

Solutions: From Quick Fixes to Advanced Overhauls

Fixing this doesn't require scrapping your system. Here's a layered approach, starting simple and scaling up:

  1. Hybrid Search: The Best of Both Worlds
     Blend semantic embeddings with keyword matching. Use BM25 or exact-phrase searches for terms like "Unity" (boosting capitalized versions), then rerank with semantics. Tools like Pinecone or Elasticsearch support this kind of setup. Result: literal hits for "Unity" get prioritized, even if embeddings wander.
  2. Keyword Indexing and Entity Recognition
     During document ingestion, extract proper nouns and technical terms using NLP libraries (e.g., spaCy for Named Entity Recognition). Tag "Unity" as "Unity (game engine)" in metadata if context suggests it. Index these separately for fast lookups. Capitalization is key: treat "Unity" as a signal for a proper noun, unlike lowercase "unity."
  3. Context Accumulation and User Signals
     For repeat users, build profiles: if someone often searches game dev terms (e.g., "Unreal," "C#"), bias toward tech interpretations. For cold queries, use heuristics like co-occurring words ("requirements" + "Unity" → check a dictionary of tech terms). Even subtle signals, like query capitalization, can trigger filters.
  4. Query Expansion and Rewriting
     Before embedding, rewrite ambiguous queries intelligently. A lightweight LLM or rule-based system can detect the domain (e.g., job-related) and expand "Unity" to "Unity game engine OR Unity3D." This internal boost clarifies the query without user effort.
  5. Word Sense Disambiguation (WSD) and Entity Linking
     Link terms to knowledge bases like Wikipedia or custom ontologies (e.g., ESCO for skills). "Unity" in a gaming JD links to the software page, not philosophy. Advanced: use multi-vector embeddings, generating separate vectors per sense (e.g., one for game engine, one for harmony) and searching across them.
  6. Domain-Specific Embeddings and Fine-Tuning
     Ditch general models for ones tuned on your data. Fine-tune a BERT-style model on tech JDs, using contrastive learning to separate "Unity" senses (e.g., push game engine examples away from unity-as-harmony). Models like GTE or custom Hugging Face variants are reasonable starting points. As a rough guide, start with a dataset of 1,000+ labelled examples.
  7. Testing and Iteration
     Build a test suite with edge cases (e.g., "JDs with unity requirements") and measure recall. Log production failures for retraining. If ambiguity persists, add post-search clarification: "Did you mean Unity the game engine?"
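
To show how the first and last ideas in that list fit together, here is a toy hybrid ranker with a one-query recall check. Treat it as a sketch under assumptions: the 0.6 keyword weight, the 1.0/0.5 case-match scores, the document texts, and the semantic scores are all invented stand-ins (a real system would get the semantic side from an embedding model and the keyword side from something like BM25).

```python
import re

def keyword_score(term, text):
    """1.0 for an exact-case match (likely a proper noun), 0.5 for a
    case-insensitive match, 0.0 otherwise."""
    pattern = rf"\b{re.escape(term)}\b"
    if re.search(pattern, text):
        return 1.0
    if re.search(pattern, text, flags=re.IGNORECASE):
        return 0.5
    return 0.0

def hybrid_rank(term, docs, semantic_scores, keyword_weight=0.6):
    """Blend keyword and semantic scores; return (doc_id, score) pairs, best first."""
    ranked = []
    for doc_id, text in docs.items():
        score = (keyword_weight * keyword_score(term, text)
                 + (1 - keyword_weight) * semantic_scores[doc_id])
        ranked.append((doc_id, round(score, 3)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical job descriptions, reduced to one line each.
docs = {
    "mobile_games_dev": "Experience with Unity or other game engines.",
    "hr_partner": "Foster team unity and organizational alignment.",
    "data_analyst": "Strong SQL and dashboarding skills.",
}
# Stand-in semantic scores, as if a general model favoured the "harmony" sense.
semantic_scores = {"mobile_games_dev": 0.35, "hr_partner": 0.80, "data_analyst": 0.10}

ranking = hybrid_rank("Unity", docs, semantic_scores)
for doc_id, score in ranking:
    print(doc_id, score)

# A one-query regression check: is the JD we expect at the top?
relevant = {"mobile_games_dev"}
retrieved_top_1 = {ranking[0][0]}
recall_at_1 = len(relevant & retrieved_top_1) / len(relevant)
print("recall@1:", recall_at_1)
```

On semantic score alone the HR document would win; the exact-case keyword match is what pulls the games role back to the top. The same recall check, run over a larger set of awkward queries, is the kind of test suite item 7 has in mind.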

So what did I do? Raised a bug, and I'll worry about it later :)
