Resolving MERGE Performance in Azure SQL Database
When merging large data-sets in Azure SQL Database its imperative to optimize our queries. Failure to do so will most likely result in 40552 : The session has been terminated because of excessive transaction log space usage.
Below are the steps to be followed:
- Create an index on the join columns in the source table that is unique and covering.
- Create a unique clustered index on the join columns in the target table.
- Parameterize all literal values in the ON clause and in the the WHEN clauses.
- Merge subsets of data from the source to the target table by using OFFSET and ROWS FETCH NEXT or by defining views on the source or target that return the filtered rows and reference the view as the source or target table. Furthermore the use of the WITH <common table expression> clause of the TOP clause to filter out rows from the source or target tables is not recommended because they can generate incorrect results.
Example script:
DECLARE @offset as bigint
DECLARE @offsetIndex as int = 0
DELCARE @batchSize as int = 10000
SELECT @offset = @offsetIndex * @batchSize
MERGE Customers AS Target
USING (SELECT [Name], [LastName], [Email] from CustomerInsertTable
ORDER BY [Email]
OFFSET @offset ROWS
FETCH NEXT @batchSize ROWS ONLY
) AS Source
ON (Target.[Email] = Source.[Email])
WHEN MATCHED THEN
UPDATE SET Target.[Name] = Source.[Name]
, Target.[LastName] = Source.[LastName]
WHEN NOT MATCHED BY TARGET THEN
INSERT ([Name], [LastName], [Email])
VALUES (
Source.[Name]
, Source.[LastName]
, Source.[Email]
);
**** Based upon the size of the source and target data-set we can chose nested loop or Mege join operator in the query ex: OPTION (LOOP JOIN) or OPTION (MERGE JOIN);
https://alexandrebrisebois.wordpress.com/2013/01/16/optimizing-merge-performance-in-windows-azure-sql-database/