Resolving MERGE Performance in Azure SQL Database

When merging large data-sets in Azure SQL Database its imperative to optimize our queries. Failure to do so will most likely result in 40552 : The session has been terminated because of excessive transaction log space usage.

Below are the steps to be followed:

  • Create an index on the join columns in the source table that is unique and covering.
  • Create a unique clustered index on the join columns in the target table.
  • Parameterize all literal values in the ON clause and in the the WHEN clauses.
  • Merge subsets of data from the source to the target table by using OFFSET and ROWS FETCH NEXT or by defining views on the source or target that return the filtered rows and reference the view as the source or target table. Furthermore the use of the WITH <common table expression> clause of the TOP clause to filter out rows from the source or target tables is not recommended because they can generate incorrect results.

Example script:

DECLARE @offset as bigint

DECLARE @offsetIndex as int = 0

DELCARE @batchSize as int = 10000

 

SELECT @offset = @offsetIndex *  @batchSize

 

MERGE Customers AS Target

USING (SELECT [Name], [LastName], [Email] from CustomerInsertTable

       ORDER BY [Email]

       OFFSET  @offset ROWS

       FETCH NEXT @batchSize ROWS ONLY

) AS Source

ON (Target.[Email] = Source.[Email])

WHEN MATCHED THEN

    UPDATE SET Target.[Name] = Source.[Name]

             , Target.[LastName] = Source.[LastName]

WHEN NOT MATCHED BY TARGET THEN

    INSERT ([Name], [LastName], [Email])

    VALUES (

     Source.[Name]

     , Source.[LastName]

     , Source.[Email]

    );

**** Based upon the size of the source and target data-set we can chose nested loop or Mege join operator in the query ex: OPTION (LOOP JOIN) or OPTION (MERGE JOIN);

To view or add a comment, sign in

More articles by Rahul Kumar

  • Why Multidimensional Scaling Fails?

    Why MDS Fails in giving us Meaningful Embeddings: MDS basically arranges points in 2D or lower dimension based on high…

  • Ensemble Learning

    Ensemble learning is a machine learning paradigm where multiple learners are trained to solve the same problem. In…

  • Radial basis function network

    If the classes or pattern are linearly separable then single layer Perceptron is sufficient otherwise we need to…

    4 Comments
  • Linear Discriminant Analysis

    Lets start with Limitations of Logistic Regression Logistic regression is a simple and powerful linear classification…

  • ORC vs RC file format

    ORC offers a number of features not available in RC files: * Better encoding of data. Integer values are run length…

  • Partitioning clustered columnstore tables in Azure Sql Data-warehouse

    Partitioning can be used to improve performance some scenarios, creating a table with too many partitions can hurt…

  • Best Practices for Azure Sql data warehouse Data Load using polybase or single-client gated load methods

    *Best practices and considerations when using PolyBase Here are a few more things to consider when using PolyBase for…

  • NoSql Database Modelling Challenges

    *Modelling Challenge: Related data to embed or reference related data?? related data: embedding Considerations: 1)data…

Explore content categories