Best Practices for Azure Sql data warehouse Data Load using polybase or single-client gated load methods
*Best practices and considerations when using PolyBase
Here are a few more things to consider when using PolyBase for SQL Data Warehouse loads:
1) A single PolyBase load operation provides best performance.
2) The load performance scales as you increase DWUs.
3) PolyBase automatically parallelizes the data load process, so you don’t need to explicitly break the input data into multiple sources and issue concurrent loads, unlike some traditional loading practices.
4)Multiple readers will not work against compressed text files (e.g. gzip). Only a single reader is used per compressed file since uncompressing the file in the buffer is single threaded. Alternatively, generate multiple compressed files. The number of files should be greater than or equal to the total number of readers.
5)Multiple readers will work against compressed columnar/block format files (e.g. ORC, RC) since individual blocks are compressed independently.
*Best practices and considerations for single-client gated load methods
Consider the following when using SSIS, BCP, or other Control-node and client-gated loading methods:
1) Issue multiple threads into different tables to improve throughput. SQL DW does not support loading multiple threads into the same table since it requires exclusive lock. (This only applies to non-PolyBase load method).
2) Include retry logic—very important for slower methods such as BCP, SSIS, and SQLBulkCopy.
3)For SSIS, consider increasing the client/connection timeout from the default 30 seconds to 300 seconds. For more information about moving data to Azure, see SSIS for Azure and Hybrid Data Movement.
4)Don’t specify the batch size with Control-node gated methods. The goal is to load all or nothing so that the retry logic will restart the load. If you designate a batch size and the load encounters failure (for example, network or database not available), you may need to add more logic to restart from the last successful commit.