RAG Chunking with Azure SQL Database
Disclaimer: The views and opinions expressed in this article are my own and do not represent the official position of Microsoft Corporation. This is not official Microsoft documentation.
Introduction
Building a RAG (Retrieval Augmented Generation) pipeline doesn’t have to be complicated. Before reaching for Python libraries, LangChain, or external vector databases, consider that Azure SQL Database now ships with native T-SQL capabilities that let you orchestrate the entire chunk-embed-store-search pipeline directly from T-SQL, even though embedding generation still calls an external model endpoint (such as Azure OpenAI) under the hood.
This article focuses on the two simplest and most practical chunking techniques, fixed-size chunking and fixed-size with overlap, both achievable entirely in T-SQL using the built-in AI_GENERATE_CHUNKS function. These are the recommended starting points for any RAG project.
What You Get Out of the Box
Azure SQL Database (and SQL Server 2025) now includes:
AI_GENERATE_CHUNKS, a table-valued function that splits text into chunks in pure T-SQL
AI_GENERATE_EMBEDDINGS, which calls a registered external model to turn text into vectors
The native VECTOR data type for storing embeddings alongside your relational data
VECTOR_DISTANCE for exact (kNN) similarity search
VECTOR_SEARCH with DiskANN indexes for approximate (ANN) search at scale
No separate vector database needed. Your data, chunks, embeddings, and search are all orchestrated from one place.
Prerequisites
Before chunking and embedding, you need to configure your database and register an external Azure OpenAI model.
Step 0: Ensure Compatibility Level and Enable REST Endpoints
-- AI_GENERATE_CHUNKS requires compatibility level 170 or higher.
-- Check your current level:
SELECT compatibility_level FROM sys.databases
WHERE name = DB_NAME();
-- If below 170, update it:
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 170;
GO
-- AI_GENERATE_EMBEDDINGS requires external REST endpoint invocation.
-- Enable it:
EXECUTE sp_configure 'external rest endpoint enabled', 1;
RECONFIGURE WITH OVERRIDE;
GO
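To verify the setting is in effect, you can query sys.configurations (an optional sanity check):
-- Confirm the REST endpoint setting is active
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'external rest endpoint enabled';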
Step 1: Create Credential and External Model
-- Create a master key (if not already present)
IF NOT EXISTS (
SELECT * FROM sys.symmetric_keys
WHERE [name] = '##MS_DatabaseMasterKey##'
)
BEGIN
CREATE MASTER KEY ENCRYPTION BY PASSWORD = N'<YourStrongPassword>';
END
GO
-- Create database scoped credential for Azure OpenAI
CREATE DATABASE SCOPED CREDENTIAL [https://<your-resource>.openai.azure.com/]
WITH IDENTITY = 'HTTPEndpointHeaders',
SECRET = '{"api-key":"<YOUR_AZURE_OPENAI_KEY>"}';
GO
-- Register the embedding model
CREATE EXTERNAL MODEL MyEmbeddingModel
WITH (
LOCATION = 'https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment-name>/embeddings?api-version=2024-06-01',
API_FORMAT = 'Azure OpenAI',
MODEL_TYPE = EMBEDDINGS,
MODEL = 'text-embedding-3-small',
CREDENTIAL = [https://<your-resource>.openai.azure.com/]
-- Optional: PARAMETERS = '{"dimensions":1536}'
);
GO
💡 Tip: The VECTOR(1536) dimension should match your model. text-embedding-3-small outputs 1536 dimensions by default, but you can request fewer (e.g., 768) via the dimensions parameter to reduce storage. Note: the deployment name in the URL may differ from the model name; use whatever you named your deployment in Azure AI Foundry.
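As a concrete illustration of that tip, a reduced-dimension registration might look like this (a sketch; MyEmbeddingModel768 is a hypothetical name, and any table storing its output would need a VECTOR(768) column instead of VECTOR(1536)):
-- Hypothetical 768-dimension variant of the same deployment
CREATE EXTERNAL MODEL MyEmbeddingModel768
WITH (
    LOCATION = 'https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment-name>/embeddings?api-version=2024-06-01',
    API_FORMAT = 'Azure OpenAI',
    MODEL_TYPE = EMBEDDINGS,
    MODEL = 'text-embedding-3-small',
    CREDENTIAL = [https://<your-resource>.openai.azure.com/],
    PARAMETERS = '{"dimensions":768}' -- request fewer output dimensions
);
GO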
Step 2: Create the Tables
-- Source documents
CREATE TABLE Documents (
doc_id INT IDENTITY(1,1) PRIMARY KEY,
title NVARCHAR(500),
full_text NVARCHAR(MAX),
source_url NVARCHAR(1000),
created_at DATETIME2 DEFAULT SYSUTCDATETIME()
);
-- Chunks with embeddings
CREATE TABLE DocumentChunks (
chunk_id INT IDENTITY(1,1) PRIMARY KEY,
doc_id INT FOREIGN KEY REFERENCES Documents(doc_id),
chunk_text NVARCHAR(MAX),
chunk_order BIGINT, -- matches AI_GENERATE_CHUNKS return type
chunk_offset BIGINT, -- matches AI_GENERATE_CHUNKS return type
chunk_length INT,
chunk_method NVARCHAR(50),
embedding VECTOR(1536), -- Native vector column
created_at DATETIME2 DEFAULT SYSUTCDATETIME()
);
Technique 1: Fixed-Size Chunking
The simplest possible approach: split text into uniform chunks of a specified character count using the built-in AI_GENERATE_CHUNKS() table-valued function. One line of T-SQL. No libraries, no scripts.
Understanding AI_GENERATE_CHUNKS
The function signature:
AI_GENERATE_CHUNKS (
SOURCE = <text_expression>,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = <number_of_characters>
)
Example: View Chunks Before Storing
Preview how a document will be chunked:
-- Preview chunks for a specific document
SELECT
d.title,
c.chunk_order,
c.chunk_offset,
c.chunk_length,
LEFT(c.chunk, 80) + '...' AS chunk_preview
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
SOURCE = d.full_text,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = 500
) AS c
WHERE d.doc_id = 1
ORDER BY c.chunk_order;
Full Pipeline: Chunk + Embed + Store
The real power is combining everything in a single INSERT statement:
-- Chunk, embed, and store — all in one T-SQL statement
INSERT INTO DocumentChunks
(doc_id, chunk_text, chunk_order, chunk_offset, chunk_length,
chunk_method, embedding)
SELECT
d.doc_id,
c.chunk,
c.chunk_order,
c.chunk_offset,
c.chunk_length,
'fixed_500',
AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL MyEmbeddingModel)
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
SOURCE = d.full_text,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = 500
) AS c;
🎯 Key Insight: This single statement reads your source documents, splits them into 500-character chunks, calls Azure OpenAI to generate a 1536-dimension embedding for each chunk, and stores everything in your chunks table. The workflow is orchestrated entirely from T-SQL, though the embedding step invokes your external Azure OpenAI endpoint behind the scenes.
Choosing a Chunk Size
The chunk size directly impacts retrieval quality: smaller chunks are more topically focused and match queries more precisely, while larger chunks preserve more context but dilute relevance. The 500 characters used above is a reasonable starting point; a quick way to compare candidate sizes is shown after the pros and cons below.
✅ Pros: Zero external dependencies beyond the embedding model; single T-SQL statement; dead simple to understand, debug, and maintain; predictable chunk counts for cost estimation.
❌ Cons: Splits mid-word and mid-sentence; ignores document structure; chunks may break semantic meaning at boundaries.
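Since chunk count drives embedding cost, it helps to measure what each candidate size would produce before generating any embeddings. A minimal sketch (the 300 and 800 values are illustrative):
-- Compare chunk counts for two candidate sizes (no embeddings generated)
SELECT '300 chars' AS setting, COUNT(*) AS chunk_count
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
    SOURCE = d.full_text, CHUNK_TYPE = FIXED, CHUNK_SIZE = 300
) AS c
UNION ALL
SELECT '800 chars', COUNT(*)
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
    SOURCE = d.full_text, CHUNK_TYPE = FIXED, CHUNK_SIZE = 800
) AS c;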
Technique 2: Fixed-Size with Overlap
The overlap technique solves the biggest problem with basic fixed-size chunking: lost context at chunk boundaries. By repeating a percentage of the previous chunk at the start of the next one, you ensure that concepts split across a boundary still appear intact in at least one chunk.
How Overlap Works
The OVERLAP parameter in AI_GENERATE_CHUNKS specifies a percentage (0–50) of the chunk size to repeat. With CHUNK_SIZE = 500 and OVERLAP = 15, each chunk includes the last 75 characters of the previous chunk.
Visual example with CHUNK_SIZE = 10 and OVERLAP = 20 (20% of 10 = 2 characters repeated per chunk); boundaries shown are illustrative:
Source: "The quick brown fox jumps over the lazy dog"
No overlap:
Chunk 1: "The quick "
Chunk 2: "brown fox "
Chunk 3: "jumps over"
Chunk 4: " the lazy "
Chunk 5: "dog"
With 20% overlap (each chunk starts with the last 2 characters of the previous one):
Chunk 1: "The quick "
Chunk 2: "k brown fo" ← "k " repeated
Chunk 3: "fox jumps " ← "fo" repeated
Chunk 4: "s over the" ← "s " repeated
Chunk 5: "he lazy do" ← "he" repeated
Chunk 6: "dog" ← "do" repeated
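To see the actual boundaries your instance produces, you can run the function over a variable holding the sample string (a quick sketch, following the same call pattern used throughout this article):
-- Preview overlap behavior on a short sample string
DECLARE @sample NVARCHAR(MAX) = N'The quick brown fox jumps over the lazy dog';
SELECT
    c.chunk_order,
    c.chunk_offset,
    c.chunk_length,
    c.chunk
FROM AI_GENERATE_CHUNKS (
    SOURCE = @sample,
    CHUNK_TYPE = FIXED,
    CHUNK_SIZE = 10,
    OVERLAP = 20 -- a percentage: 20% of 10 = 2 characters
) AS c
ORDER BY c.chunk_order;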
Implementation
-- Fixed-size chunking with 15% overlap
INSERT INTO DocumentChunks
(doc_id, chunk_text, chunk_order, chunk_offset, chunk_length,
chunk_method, embedding)
SELECT
d.doc_id,
c.chunk,
c.chunk_order,
c.chunk_offset,
c.chunk_length,
'fixed_500_overlap15',
AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL MyEmbeddingModel)
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
SOURCE = d.full_text,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = 500,
OVERLAP = 15 -- 15% of 500 = 75 chars of overlap
) AS c;
The only difference from Technique 1 is the single OVERLAP = 15 parameter. Everything else stays the same.
Tracking Chunks Across Multiple Documents
When processing multiple documents in one statement, use ENABLE_CHUNK_SET_ID = 1 to get a grouping column that distinguishes which chunks came from which row:
-- Chunk multiple documents with set tracking
SELECT
d.doc_id,
c.chunk,
c.chunk_order,
c.chunk_set_id -- Different ID per source document
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
SOURCE = d.full_text,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = 500,
OVERLAP = 15,
ENABLE_CHUNK_SET_ID = 1
) AS c
ORDER BY c.chunk_set_id, c.chunk_order;
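Note that the DocumentChunks table defined earlier has no column for this value; persisting it would take a small schema addition. A sketch (the chunk_set_id column here is hypothetical, not part of the schema above):
-- Hypothetical extension: persist the set ID with each chunk
ALTER TABLE DocumentChunks ADD chunk_set_id BIGINT NULL;
GO
INSERT INTO DocumentChunks
    (doc_id, chunk_text, chunk_order, chunk_offset, chunk_length,
     chunk_method, chunk_set_id, embedding)
SELECT
    d.doc_id,
    c.chunk,
    c.chunk_order,
    c.chunk_offset,
    c.chunk_length,
    'fixed_500_overlap15',
    c.chunk_set_id,
    AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL MyEmbeddingModel)
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
    SOURCE = d.full_text,
    CHUNK_TYPE = FIXED,
    CHUNK_SIZE = 500,
    OVERLAP = 15,
    ENABLE_CHUNK_SET_ID = 1
) AS c;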
✅ Pros: Preserves context at boundaries; still 100% native T-SQL; minimal added complexity over basic fixed-size; trivial to switch on (one parameter).
❌ Cons: Increases total chunk count by ~10–20%; still character-based (doesn’t respect paragraph/sentence boundaries); duplicated text in embeddings slightly increases storage cost.
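Because every INSERT in this article tags rows with a chunk_method label, quantifying that overhead is a one-liner once both variants are stored:
-- Compare chunk counts and average lengths across chunking methods
SELECT
    chunk_method,
    COUNT(*) AS chunk_count,
    AVG(chunk_length) AS avg_chunk_length
FROM DocumentChunks
GROUP BY chunk_method
ORDER BY chunk_method;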
Searching Your Chunks
Once chunks are embedded and stored, you need to retrieve the most relevant ones for a user query. Azure SQL provides two approaches.
Exact Search (kNN) with VECTOR_DISTANCE
For smaller datasets (the Microsoft documentation recommends exact search for fewer than ~50,000 vectors), kNN is fast and guarantees perfect recall:
-- Generate embedding for the user's question
DECLARE @query NVARCHAR(MAX) = N'What are the key benefits of chunking?';
DECLARE @qv VECTOR(1536) = AI_GENERATE_EMBEDDINGS(
@query USE MODEL MyEmbeddingModel
);
-- Find the 5 most relevant chunks
SELECT TOP 5
dc.chunk_id,
dc.chunk_text,
d.title AS source_doc,
dc.chunk_method,
VECTOR_DISTANCE('cosine', @qv, dc.embedding) AS distance
FROM DocumentChunks dc
JOIN Documents d ON dc.doc_id = d.doc_id
ORDER BY distance ASC;
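Cosine distance is simply 1 minus cosine similarity, so if you prefer to rank or threshold by similarity, a small variant of the same query works (reusing @qv from above; the 0.5 cutoff is illustrative):
-- Same search expressed as a similarity score with a minimum cutoff
SELECT TOP 5
    dc.chunk_id,
    dc.chunk_text,
    1 - VECTOR_DISTANCE('cosine', @qv, dc.embedding) AS similarity
FROM DocumentChunks dc
WHERE 1 - VECTOR_DISTANCE('cosine', @qv, dc.embedding) >= 0.5
ORDER BY similarity DESC;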
Approximate Search (ANN) with DiskANN
For larger datasets, create a vector index and use VECTOR_SEARCH.
⚠️ Important: Vector indexes require at least 100 rows with non-NULL vector values before the index can be created. Attempting to create an index on fewer rows fails with error Msg 42266. For development with small datasets, use VECTOR_DISTANCE (exact kNN search) instead; it works without an index.
-- Create DiskANN index (requires 100+ rows with non-NULL embeddings)
CREATE VECTOR INDEX IX_Embedding_DiskANN
ON DocumentChunks(embedding)
WITH (METRIC = 'cosine', TYPE = 'diskann');
GO
Then query using the latest VECTOR_SEARCH syntax:
-- Approximate nearest neighbor search (latest syntax)
DECLARE @qv VECTOR(1536) = AI_GENERATE_EMBEDDINGS(
N'What are the key benefits of chunking?'
USE MODEL MyEmbeddingModel
);
SELECT TOP (5) WITH APPROXIMATE
t.chunk_id,
t.chunk_text,
t.chunk_method,
s.distance
FROM VECTOR_SEARCH(
TABLE = DocumentChunks AS t,
COLUMN = embedding,
SIMILAR_TO = @qv,
METRIC = 'cosine'
) AS s
ORDER BY s.distance;
⚠️ Syntax Note: The TOP_N parameter in VECTOR_SEARCH is deprecated. Latest vector indexes require SELECT TOP (N) WITH APPROXIMATE syntax. Using TOP_N with latest indexes returns error Msg 42274.
💡 DiskANN: Developed by Microsoft Research, DiskANN is a graph-based approximate nearest neighbor algorithm optimized for SSD storage. It is designed to handle large-scale vector datasets efficiently with minimal memory and high throughput.
Putting It All Together: End-to-End Example
Here’s a complete, copy-paste-ready pipeline from raw document to RAG retrieval. This example uses exact search (VECTOR_DISTANCE), which works with any number of rows:
-- ========================================
-- STEP 1: Insert a source document
-- ========================================
INSERT INTO Documents (title, full_text)
VALUES (
'Chunking Best Practices',
'Chunking is the process of breaking long documents into smaller
text fragments for use in RAG pipelines. The goal is to create
pieces that are small enough to be topically focused but large
enough to preserve meaningful context. Fixed-size chunking is
the simplest approach, splitting text at uniform character
intervals. Adding overlap between chunks helps preserve context
at boundaries. A 15% overlap is a practical default. The VECTOR
data type in Azure SQL Database stores embeddings natively,
eliminating the need for external vector databases. DiskANN
indexes enable fast approximate search at scale.'
);
-- ========================================
-- STEP 2: Chunk with overlap + embed + store
-- ========================================
INSERT INTO DocumentChunks
(doc_id, chunk_text, chunk_order, chunk_offset,
chunk_length, chunk_method, embedding)
SELECT
d.doc_id,
c.chunk,
c.chunk_order,
c.chunk_offset,
c.chunk_length,
'fixed_500_overlap15',
AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL MyEmbeddingModel)
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
SOURCE = d.full_text,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = 500,
OVERLAP = 15
) AS c
WHERE d.title = 'Chunking Best Practices';
-- ========================================
-- STEP 3: Search using exact kNN
-- (works with any number of rows)
-- ========================================
DECLARE @question NVARCHAR(MAX) = N'How do I preserve context
at chunk boundaries?';
DECLARE @qv VECTOR(1536) = AI_GENERATE_EMBEDDINGS(
@question USE MODEL MyEmbeddingModel
);
SELECT TOP 3
dc.chunk_text,
VECTOR_DISTANCE('cosine', @qv, dc.embedding) AS distance
FROM DocumentChunks dc
ORDER BY distance ASC;
💡 Scaling Up: Once your DocumentChunks table contains at least 100 rows with non-NULL embeddings, you can add a DiskANN vector index and switch to approximate search (SELECT TOP (N) WITH APPROXIMATE ... FROM VECTOR_SEARCH(...)) for significantly faster retrieval at scale.
References
Official Microsoft Documentation:
Sample Repositories: