Azure Data Factory (ADF): Basic, Intermediate, and Advanced Questions & Answers, with SCD Implementation
Basic Level Questions
1. What is Azure Data Factory (ADF)?
2. What are the key components of Azure Data Factory?
3. What is a Linked Service in Azure Data Factory?
4. What types of Integration Runtimes are available in ADF?
5. How does ADF handle data movement?
Intermediate Level Questions
6. What is the difference between Pipeline and Activity in ADF?
7. What are the types of triggers in ADF?
8. How can you secure data in Azure Data Factory?
9. How do you handle failures in ADF pipelines?
10. How does ADF manage data partitioning for large data sets?
Advanced Level Questions
11. What are the different debugging options in Azure Data Factory?
12. What is Mapping Data Flow in ADF?
13. How do you implement CI/CD (Continuous Integration/Deployment) in ADF?
14. How does ADF integrate with Databricks?
15. What are the limitations of Azure Data Factory?
16. How do you monitor pipeline runs in ADF?
17. What is the difference between Mapping Data Flow and Wrangling Data Flow?
| Mapping Data Flow | Wrangling Data Flow |
|---|---|
| Based on Spark engine | Based on Power Query engine |
| Handles big data transformations | Designed for self-service data prep |
| Low-code | No-code interface |
18. How can you parameterize a pipeline in ADF?
19. How do you handle incremental data loading in ADF?
20. What is a ForEach activity in ADF?
Scenario-Based Questions
21. How would you design a pipeline to load data from an SFTP to Azure SQL DB?
22. How would you handle schema drift in ADF?
23. How do you handle large file processing in ADF?
24. How would you implement error logging in ADF?
25. How would you move data from on-premises to Azure using ADF?
1. Optimize Data Movement
2. Optimize Data Flow Performance
3. Optimize Pipeline Performance
4. Optimize Linked Services and Datasets
5. Monitor and Debug Performance Issues
✅ Best Practices:
✔️ Minimize data movement across regions.
✔️ Keep activity and pipeline structures simple.
✔️ Limit data flow memory usage to avoid auto-scaling delays.
✔️ Test performance regularly using different configurations.
1. Data Movement Activities
Activities used to move data between different sources and destinations.
✅ Copy Data
✅ Data Flow
2. Data Transformation Activities
Activities used to clean, format, and transform data.
✅ Stored Procedure
✅ Lookup
✅ Script
✅ Data Flow
🔄 3. Control Flow Activities
Activities that control the flow of execution.
✅ ForEach
✅ Until
✅ If Condition
✅ Switch
📡 4. External Activities
Activities used to integrate with external systems or services.
✅ HDInsight
✅ Databricks Notebook
✅ Machine Learning
5. Azure-Specific Activities
Activities to integrate with other Azure services.
✅ Azure Function
✅ Web
✅ Azure Batch
📅 6. Scheduling and Monitoring Activities
Activities used to manage execution and monitor processes.
✅ Wait
✅ Set Variable
✅ Get Metadata
✅ Validation
7. Security and Notification Activities
Activities for handling security and notifications.
✅ Send Email
✅ Azure Key Vault
✅ Example Use Case: ETL Pipeline
How to Implement SCD in ADF:
| SCD Type | Description | When to Use | Complexity |
|---|---|---|---|
| Type 1 | Overwrite existing record | No need to track history | Low |
| Type 2 | Insert new record and maintain history | Track full history of changes | Medium |
| Type 3 | Keep previous and current value | Track only recent change | Low |
1. SCD Type 1 – Overwrite Existing Data
Steps:
Example Data Flow:
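Type 1 overwrites the existing record, so no history is kept. The overwrite logic can be sketched in plain Python. This is only an illustration of the semantics, not ADF code: the function name `apply_scd_type1`, the `customer_id` keys, and the sample rows are all hypothetical.

```python
from typing import Dict

# Hypothetical row shape: attribute name -> value.
Row = Dict[str, str]


def apply_scd_type1(dimension: Dict[int, Row], incoming: Dict[int, Row]) -> Dict[int, Row]:
    """Return a new dimension in which incoming rows overwrite matching keys.

    Keys are an assumed business key (customer_id). Existing values are
    replaced, so the previous state is not recoverable afterwards.
    """
    # Copy so the caller's dimension is left untouched.
    result = {key: dict(row) for key, row in dimension.items()}
    for key, row in incoming.items():
        # Upsert: update when the key exists, insert when it does not.
        result[key] = dict(row)
    return result


# Illustrative sample data (made up for this sketch).
dimension = {
    1: {"name": "Asha", "city": "Pune"},
    2: {"name": "Ravi", "city": "Delhi"},
}
incoming = {
    2: {"name": "Ravi", "city": "Mumbai"},    # changed city: overwritten
    3: {"name": "Meera", "city": "Chennai"},  # new key: inserted
}
updated = apply_scd_type1(dimension, incoming)
```

After the call, customer 2 shows only the new city and the old value is gone, which is the trade-off that makes Type 1 suitable only when history does not matter.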
2. SCD Type 2 – Maintain History
🛠️ Steps:
Example Data Flow:
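Type 2 inserts a new record for each change and keeps the earlier versions. A minimal sketch of that logic in plain Python follows. It is an illustration under assumptions, not ADF code: the columns `start_date`, `end_date`, and `is_current` are a common convention chosen for this example, and `DimRow`, `apply_scd_type2`, and the sample data are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: int           # assumed business key
    city: str                  # the tracked attribute
    start_date: date           # when this version became effective
    end_date: Optional[date]   # None means the version is still open
    is_current: bool           # flag for the active version


def apply_scd_type2(
    dimension: List[DimRow], incoming: Dict[int, str], load_date: date
) -> List[DimRow]:
    """Expire changed current rows and append new versions; keep all history."""
    result: List[DimRow] = []
    keys_with_current = set()
    for row in dimension:
        if row.is_current:
            keys_with_current.add(row.customer_id)
        new_city = incoming.get(row.customer_id)
        if row.is_current and new_city is not None and new_city != row.city:
            # Close the old version, then open a new current version.
            result.append(replace(row, end_date=load_date, is_current=False))
            result.append(DimRow(row.customer_id, new_city, load_date, None, True))
        else:
            # Historical rows and unchanged current rows pass through as-is.
            result.append(row)
    for customer_id, city in incoming.items():
        if customer_id not in keys_with_current:
            # Brand-new business key: insert its first version.
            result.append(DimRow(customer_id, city, load_date, None, True))
    return result


# Illustrative sample data (made up for this sketch).
dimension = [DimRow(1, "Pune", date(2023, 1, 1), None, True)]
incoming = {1: "Mumbai", 2: "Delhi"}
history = apply_scd_type2(dimension, incoming, date(2024, 6, 1))
```

Customer 1 ends up with two rows, one expired and one current, so a query filtered on `is_current` sees the latest state while the full change history stays available.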