RAG Chunking with Azure SQL Database
Disclaimer: The views and opinions expressed in this article are my own and do not represent the official position of Microsoft Corporation. This is not official Microsoft documentation.
Introduction
Building a RAG (Retrieval Augmented Generation) pipeline doesn’t have to be complicated. Before reaching for Python libraries, LangChain, or external vector databases, consider that Azure SQL Database now ships with native T-SQL capabilities that let you orchestrate the entire chunk-embed-store-search pipeline directly from T-SQL, even though embedding generation still calls an external model endpoint (such as Azure OpenAI) under the hood.
This article focuses on the two simplest and most practical chunking techniques, fixed-size chunking and fixed-size with overlap, both achievable entirely in T-SQL using the built-in AI_GENERATE_CHUNKS function. These are the recommended starting points for any RAG project.
What You Get Out of the Box
Azure SQL Database (and SQL Server 2025) now includes:
AI_GENERATE_CHUNKS, a table-valued function that splits text into chunks in pure T-SQL
AI_GENERATE_EMBEDDINGS, which calls a registered external model to turn text into vectors
The native VECTOR data type for storing embeddings alongside your relational data
VECTOR_DISTANCE for exact (kNN) similarity search
VECTOR_SEARCH with DiskANN indexes for approximate (ANN) search at scale
No separate vector database needed. Your data, chunks, embeddings, and search are all orchestrated from one place.
Prerequisites
Before chunking and embedding, you need to configure your database and register an external Azure OpenAI model.
Step 0: Ensure Compatibility Level and Enable REST Endpoints
-- AI_GENERATE_CHUNKS requires compatibility level 170 or higher.
-- Check your current level:
SELECT compatibility_level FROM sys.databases
WHERE name = DB_NAME();
-- If below 170, update it:
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 170;
GO
-- AI_GENERATE_EMBEDDINGS requires external REST endpoint invocation.
-- Enable it:
EXECUTE sp_configure 'external rest endpoint enabled', 1;
RECONFIGURE WITH OVERRIDE;
GO
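To verify the setting is in effect, you can query sys.configurations (an optional sanity check):
-- Confirm the REST endpoint setting is active
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'external rest endpoint enabled';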
Step 1: Create Credential and External Model
-- Create a master key (if not already present)
IF NOT EXISTS (
SELECT * FROM sys.symmetric_keys
WHERE [name] = '##MS_DatabaseMasterKey##'
)
BEGIN
CREATE MASTER KEY ENCRYPTION BY PASSWORD = N'<YourStrongPassword>';
END
GO
-- Create database scoped credential for Azure OpenAI
CREATE DATABASE SCOPED CREDENTIAL [https://<your-resource>.openai.azure.com/]
WITH IDENTITY = 'HTTPEndpointHeaders',
SECRET = '{"api-key":"<YOUR_AZURE_OPENAI_KEY>"}';
GO
-- Register the embedding model
CREATE EXTERNAL MODEL MyEmbeddingModel
WITH (
LOCATION = 'https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment-name>/embeddings?api-version=2024-06-01',
API_FORMAT = 'Azure OpenAI',
MODEL_TYPE = EMBEDDINGS,
MODEL = 'text-embedding-3-small',
CREDENTIAL = [https://<your-resource>.openai.azure.com/]
-- Optional: PARAMETERS = '{"dimensions":1536}'
);
GO
💡 Tip: The VECTOR(1536) dimension should match your model. text-embedding-3-small outputs 1536 dimensions by default, but you can request fewer (e.g., 768) via the dimensions parameter to reduce storage. Note: the deployment name in the URL may differ from the model name; use whatever you named your deployment in Azure AI Foundry.
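As a concrete illustration of that tip, a reduced-dimension registration might look like this (a sketch; MyEmbeddingModel768 is a hypothetical name, and any table storing its output would need a VECTOR(768) column instead of VECTOR(1536)):
-- Hypothetical 768-dimension variant of the same deployment
CREATE EXTERNAL MODEL MyEmbeddingModel768
WITH (
    LOCATION = 'https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment-name>/embeddings?api-version=2024-06-01',
    API_FORMAT = 'Azure OpenAI',
    MODEL_TYPE = EMBEDDINGS,
    MODEL = 'text-embedding-3-small',
    CREDENTIAL = [https://<your-resource>.openai.azure.com/],
    PARAMETERS = '{"dimensions":768}' -- request fewer output dimensions
);
GO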
Step 2: Create the Tables
-- Source documents
CREATE TABLE Documents (
doc_id INT IDENTITY(1,1) PRIMARY KEY,
title NVARCHAR(500),
full_text NVARCHAR(MAX),
source_url NVARCHAR(1000),
created_at DATETIME2 DEFAULT SYSUTCDATETIME()
);
-- Chunks with embeddings
CREATE TABLE DocumentChunks (
chunk_id INT IDENTITY(1,1) PRIMARY KEY,
doc_id INT FOREIGN KEY REFERENCES Documents(doc_id),
chunk_text NVARCHAR(MAX),
chunk_order BIGINT, -- matches AI_GENERATE_CHUNKS return type
chunk_offset BIGINT, -- matches AI_GENERATE_CHUNKS return type
chunk_length INT,
chunk_method NVARCHAR(50),
embedding VECTOR(1536), -- Native vector column
created_at DATETIME2 DEFAULT SYSUTCDATETIME()
);
Technique 1: Fixed-Size Chunking
The simplest possible approach: split text into uniform chunks of a specified character count using the built-in AI_GENERATE_CHUNKS() table-valued function. One line of T-SQL. No libraries, no scripts.
Understanding AI_GENERATE_CHUNKS
The function signature:
AI_GENERATE_CHUNKS (
SOURCE = <text_expression>,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = <number_of_characters>
)
Example: View Chunks Before Storing
Preview how a document will be chunked:
-- Preview chunks for a specific document
SELECT
d.title,
c.chunk_order,
c.chunk_offset,
c.chunk_length,
LEFT(c.chunk, 80) + '...' AS chunk_preview
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
SOURCE = d.full_text,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = 500
) AS c
WHERE d.doc_id = 1
ORDER BY c.chunk_order;
Full Pipeline: Chunk + Embed + Store
The real power is combining everything in a single INSERT statement:
-- Chunk, embed, and store — all in one T-SQL statement
INSERT INTO DocumentChunks
(doc_id, chunk_text, chunk_order, chunk_offset, chunk_length,
chunk_method, embedding)
SELECT
d.doc_id,
c.chunk,
c.chunk_order,
c.chunk_offset,
c.chunk_length,
'fixed_500',
AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL MyEmbeddingModel)
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
SOURCE = d.full_text,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = 500
) AS c;
🎯 Key Insight: This single statement reads your source documents, splits them into 500-character chunks, calls Azure OpenAI to generate a 1536-dimension embedding for each chunk, and stores everything in your chunks table. The workflow is orchestrated entirely from T-SQL, though the embedding step invokes your external Azure OpenAI endpoint behind the scenes.
Choosing a Chunk Size
The chunk size directly impacts retrieval quality: smaller chunks are more topically focused and match queries more precisely, while larger chunks preserve more context but dilute relevance. The 500 characters used above is a reasonable starting point; a quick way to compare candidate sizes is shown after the pros and cons below.
✅ Pros: Zero external dependencies beyond the embedding model; single T-SQL statement; dead simple to understand, debug, and maintain; predictable chunk counts for cost estimation.
❌ Cons: Splits mid-word and mid-sentence; ignores document structure; chunks may break semantic meaning at boundaries.
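Since chunk count drives embedding cost, it helps to measure what each candidate size would produce before generating any embeddings. A minimal sketch (the 300 and 800 values are illustrative):
-- Compare chunk counts for two candidate sizes (no embeddings generated)
SELECT '300 chars' AS setting, COUNT(*) AS chunk_count
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
    SOURCE = d.full_text, CHUNK_TYPE = FIXED, CHUNK_SIZE = 300
) AS c
UNION ALL
SELECT '800 chars', COUNT(*)
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
    SOURCE = d.full_text, CHUNK_TYPE = FIXED, CHUNK_SIZE = 800
) AS c;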
Technique 2: Fixed-Size with Overlap
The overlap technique solves the biggest problem with basic fixed-size chunking: lost context at chunk boundaries. By repeating a percentage of the previous chunk at the start of the next one, you ensure that concepts split across a boundary still appear intact in at least one chunk.
How Overlap Works
The OVERLAP parameter in AI_GENERATE_CHUNKS specifies a percentage (0–50) of the chunk size to repeat. With CHUNK_SIZE = 500 and OVERLAP = 15, each chunk includes the last 75 characters of the previous chunk.
Visual example with CHUNK_SIZE = 10 and OVERLAP = 20 (20% of 10 = 2 characters repeated per chunk); boundaries shown are illustrative:
Source: "The quick brown fox jumps over the lazy dog"
No overlap:
Chunk 1: "The quick "
Chunk 2: "brown fox "
Chunk 3: "jumps over"
Chunk 4: " the lazy "
Chunk 5: "dog"
With 20% overlap (each chunk starts with the last 2 characters of the previous one):
Chunk 1: "The quick "
Chunk 2: "k brown fo" ← "k " repeated
Chunk 3: "fox jumps " ← "fo" repeated
Chunk 4: "s over the" ← "s " repeated
Chunk 5: "he lazy do" ← "he" repeated
Chunk 6: "dog" ← "do" repeated
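To see the actual boundaries your instance produces, you can run the function over a variable holding the sample string (a quick sketch, following the same call pattern used throughout this article):
-- Preview overlap behavior on a short sample string
DECLARE @sample NVARCHAR(MAX) = N'The quick brown fox jumps over the lazy dog';
SELECT
    c.chunk_order,
    c.chunk_offset,
    c.chunk_length,
    c.chunk
FROM AI_GENERATE_CHUNKS (
    SOURCE = @sample,
    CHUNK_TYPE = FIXED,
    CHUNK_SIZE = 10,
    OVERLAP = 20 -- a percentage: 20% of 10 = 2 characters
) AS c
ORDER BY c.chunk_order;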
Implementation
-- Fixed-size chunking with 15% overlap
INSERT INTO DocumentChunks
(doc_id, chunk_text, chunk_order, chunk_offset, chunk_length,
chunk_method, embedding)
SELECT
d.doc_id,
c.chunk,
c.chunk_order,
c.chunk_offset,
c.chunk_length,
'fixed_500_overlap15',
AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL MyEmbeddingModel)
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
SOURCE = d.full_text,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = 500,
OVERLAP = 15 -- 15% of 500 = 75 chars of overlap
) AS c;
The only difference from Technique 1 is the single OVERLAP = 15 parameter. Everything else stays the same.
Tracking Chunks Across Multiple Documents
When processing multiple documents in one statement, use ENABLE_CHUNK_SET_ID = 1 to get a grouping column that distinguishes which chunks came from which row:
-- Chunk multiple documents with set tracking
SELECT
d.doc_id,
c.chunk,
c.chunk_order,
c.chunk_set_id -- Different ID per source document
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
SOURCE = d.full_text,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = 500,
OVERLAP = 15,
ENABLE_CHUNK_SET_ID = 1
) AS c
ORDER BY c.chunk_set_id, c.chunk_order;
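Note that the DocumentChunks table defined earlier has no column for this value; persisting it would take a small schema addition. A sketch (the chunk_set_id column here is hypothetical, not part of the schema above):
-- Hypothetical extension: persist the set ID with each chunk
ALTER TABLE DocumentChunks ADD chunk_set_id BIGINT NULL;
GO
INSERT INTO DocumentChunks
    (doc_id, chunk_text, chunk_order, chunk_offset, chunk_length,
     chunk_method, chunk_set_id, embedding)
SELECT
    d.doc_id,
    c.chunk,
    c.chunk_order,
    c.chunk_offset,
    c.chunk_length,
    'fixed_500_overlap15',
    c.chunk_set_id,
    AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL MyEmbeddingModel)
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
    SOURCE = d.full_text,
    CHUNK_TYPE = FIXED,
    CHUNK_SIZE = 500,
    OVERLAP = 15,
    ENABLE_CHUNK_SET_ID = 1
) AS c;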
✅ Pros: Preserves context at boundaries; still 100% native T-SQL; minimal added complexity over basic fixed-size; trivial to switch on (one parameter).
❌ Cons: Increases total chunk count by ~10–20%; still character-based (doesn’t respect paragraph/sentence boundaries); duplicated text in embeddings slightly increases storage cost.
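Because every INSERT in this article tags rows with a chunk_method label, quantifying that overhead is a one-liner once both variants are stored:
-- Compare chunk counts and average lengths across chunking methods
SELECT
    chunk_method,
    COUNT(*) AS chunk_count,
    AVG(chunk_length) AS avg_chunk_length
FROM DocumentChunks
GROUP BY chunk_method
ORDER BY chunk_method;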
Searching Your Chunks
Once chunks are embedded and stored, you need to retrieve the most relevant ones for a user query. Azure SQL provides two approaches.
Exact Search (kNN) with VECTOR_DISTANCE
For smaller datasets (the Microsoft documentation recommends exact search for fewer than ~50,000 vectors), kNN is fast and guarantees perfect recall:
-- Generate embedding for the user's question
DECLARE @query NVARCHAR(MAX) = N'What are the key benefits of chunking?';
DECLARE @qv VECTOR(1536) = AI_GENERATE_EMBEDDINGS(
@query USE MODEL MyEmbeddingModel
);
-- Find the 5 most relevant chunks
SELECT TOP 5
dc.chunk_id,
dc.chunk_text,
d.title AS source_doc,
dc.chunk_method,
VECTOR_DISTANCE('cosine', @qv, dc.embedding) AS distance
FROM DocumentChunks dc
JOIN Documents d ON dc.doc_id = d.doc_id
ORDER BY distance ASC;
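Cosine distance is simply 1 minus cosine similarity, so if you prefer to rank or threshold by similarity, a small variant of the same query works (reusing @qv from above; the 0.5 cutoff is illustrative):
-- Same search expressed as a similarity score with a minimum cutoff
SELECT TOP 5
    dc.chunk_id,
    dc.chunk_text,
    1 - VECTOR_DISTANCE('cosine', @qv, dc.embedding) AS similarity
FROM DocumentChunks dc
WHERE 1 - VECTOR_DISTANCE('cosine', @qv, dc.embedding) >= 0.5
ORDER BY similarity DESC;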
Approximate Search (ANN) with DiskANN
For larger datasets, create a vector index and use VECTOR_SEARCH.
⚠️ Important: Vector indexes require at least 100 rows with non-NULL vector values before the index can be created. Attempting to create an index on fewer rows fails with error Msg 42266. For development with small datasets, use VECTOR_DISTANCE (exact kNN search) instead; it works without an index.
-- Create DiskANN index (requires 100+ rows with non-NULL embeddings)
CREATE VECTOR INDEX IX_Embedding_DiskANN
ON DocumentChunks(embedding)
WITH (METRIC = 'cosine', TYPE = 'diskann');
GO
Then query using the latest VECTOR_SEARCH syntax:
-- Approximate nearest neighbor search (latest syntax)
DECLARE @qv VECTOR(1536) = AI_GENERATE_EMBEDDINGS(
N'What are the key benefits of chunking?'
USE MODEL MyEmbeddingModel
);
SELECT TOP (5) WITH APPROXIMATE
t.chunk_id,
t.chunk_text,
t.chunk_method,
s.distance
FROM VECTOR_SEARCH(
TABLE = DocumentChunks AS t,
COLUMN = embedding,
SIMILAR_TO = @qv,
METRIC = 'cosine'
) AS s
ORDER BY s.distance;
⚠️ Syntax Note: The TOP_N parameter in VECTOR_SEARCH is deprecated. Latest vector indexes require SELECT TOP (N) WITH APPROXIMATE syntax. Using TOP_N with latest indexes returns error Msg 42274.
💡 DiskANN: Developed by Microsoft Research, DiskANN is a graph-based approximate nearest neighbor algorithm optimized for SSD storage. It is designed to handle large-scale vector datasets efficiently with minimal memory and high throughput.
Putting It All Together: End-to-End Example
Here’s a complete, copy-paste-ready pipeline from raw document to RAG retrieval. This example uses exact search (VECTOR_DISTANCE), which works with any number of rows:
-- ========================================
-- STEP 1: Insert a source document
-- ========================================
INSERT INTO Documents (title, full_text)
VALUES (
'Chunking Best Practices',
'Chunking is the process of breaking long documents into smaller
text fragments for use in RAG pipelines. The goal is to create
pieces that are small enough to be topically focused but large
enough to preserve meaningful context. Fixed-size chunking is
the simplest approach, splitting text at uniform character
intervals. Adding overlap between chunks helps preserve context
at boundaries. A 15% overlap is a practical default. The VECTOR
data type in Azure SQL Database stores embeddings natively,
eliminating the need for external vector databases. DiskANN
indexes enable fast approximate search at scale.'
);
-- ========================================
-- STEP 2: Chunk with overlap + embed + store
-- ========================================
INSERT INTO DocumentChunks
(doc_id, chunk_text, chunk_order, chunk_offset,
chunk_length, chunk_method, embedding)
SELECT
d.doc_id,
c.chunk,
c.chunk_order,
c.chunk_offset,
c.chunk_length,
'fixed_500_overlap15',
AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL MyEmbeddingModel)
FROM Documents d
CROSS APPLY AI_GENERATE_CHUNKS (
SOURCE = d.full_text,
CHUNK_TYPE = FIXED,
CHUNK_SIZE = 500,
OVERLAP = 15
) AS c
WHERE d.title = 'Chunking Best Practices';
-- ========================================
-- STEP 3: Search using exact kNN
-- (works with any number of rows)
-- ========================================
DECLARE @question NVARCHAR(MAX) = N'How do I preserve context
at chunk boundaries?';
DECLARE @qv VECTOR(1536) = AI_GENERATE_EMBEDDINGS(
@question USE MODEL MyEmbeddingModel
);
SELECT TOP 3
dc.chunk_text,
VECTOR_DISTANCE('cosine', @qv, dc.embedding) AS distance
FROM DocumentChunks dc
ORDER BY distance ASC;
💡 Scaling Up: Once your DocumentChunks table contains at least 100 rows with non-NULL embeddings, you can add a DiskANN vector index and switch to approximate search (SELECT TOP (N) WITH APPROXIMATE ... FROM VECTOR_SEARCH(...)) for significantly faster retrieval at scale.
References
Official Microsoft Documentation:
Sample Repositories: