I am writing this after going through several rounds of interviews myself, and I would suggest that aspiring Data Engineers have sufficient knowledge of the following domains:
- Ranking/Value/Aggregate window functions (For instance, when asked to identify duplicate user IDs, you can use an aggregate window function instead of GROUP BY with a HAVING clause. Window functions are best suited to cases where you also want to retain the original rows in the output.)
- All types of SQL joins (Once, an interviewer asked me to write out the output of a full outer join.)
- SQL cursors and their applications.
- Types of constraints in SQL (all 6 types: NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, and DEFAULT).
- OLTP vs OLAP (I happened to draw the entire architecture for OLTP in one of the interviews.)
- Functionalities of Data Marts, Data Lakes, and Databricks.
- Components of Big Data frameworks such as Hadoop (MapReduce, HDFS, primary node, and data nodes), along with their functionalities.
- Batch processing vs stream processing (Chances are that the interviewer will ask which one is best suited to a given scenario.)
- Different data modeling schemas in a data warehouse (DWH), such as star and snowflake schemas.
- Database architecture approach vs data pipeline architecture approach.
- Identify the different services used from data extraction through to data visualisation (The most apt way to answer this scenario-based question is to draw a flowchart with broad sections for data sources, staging, modeling, and visualisation.)
- Characteristics of Big Data (the "Vs", such as volume, velocity, and variety).
- Differences between structured, semi-structured, and unstructured data, and similarly between relational and non-relational databases.
- AWS S3 bucketing strategies and Lambda architecture (AWS-related questions came up only because of my prior experience.)
- Data cleaning techniques (Filling, removing, or substituting NaN values are a few techniques worth mentioning in an interview.)
- How to spot outliers within a dataset (Scatter plots and box plots are two ways to spot outliers, amongst many other techniques.)
- OOP and its concepts (objects, classes, inheritance, encapsulation, polymorphism, etc.)
- Data structures in Python (built-in vs user-defined).
- Software design patterns (creational, structural, and behavioral).
- Decision-making statements (if/else, switch/case) in any programming language.
- Write pseudocode to find the even/odd numbers in an array.
- Approach to splitting a palindrome string in half.
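For the window-function point, here is a minimal sketch in Python using the standard-library `sqlite3` module (window functions need SQLite 3.25 or later). The `users` table, its columns, and its rows are hypothetical. `COUNT(*) OVER (PARTITION BY user_id)` tags every row with its user's count, so duplicates can be filtered while the original rows are kept.

```python
import sqlite3

# In-memory database with a hypothetical users table (rows are made up for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (row_id INTEGER PRIMARY KEY, user_id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users (user_id, email) VALUES (?, ?)",
    [(1, "a@x.com"), (2, "b@x.com"), (1, "a2@x.com"), (3, "c@x.com"), (2, "b2@x.com")],
)

# The window function attaches the per-user count to every row instead of
# collapsing rows, which is what GROUP BY ... HAVING would do.
query = """
SELECT row_id, user_id, email
FROM (
    SELECT row_id, user_id, email,
           COUNT(*) OVER (PARTITION BY user_id) AS cnt
    FROM users
)
WHERE cnt > 1
ORDER BY user_id, row_id
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)
```

By contrast, `GROUP BY user_id HAVING COUNT(*) > 1` would return only the duplicated IDs (1 and 2), not the individual rows behind them.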
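For the join question, it helps to be able to predict the output of a full outer join by hand. The sketch below uses hypothetical `employees` and `departments` tables. Because native FULL OUTER JOIN only arrived in SQLite 3.39, it uses the common emulation: a LEFT JOIN, plus the right-side rows that found no match on the left.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER, name TEXT, dept_id INTEGER);
CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
INSERT INTO employees VALUES (1, 'Ana', 10), (2, 'Ben', 20), (3, 'Cy', NULL);
INSERT INTO departments VALUES (10, 'Sales'), (30, 'HR');
""")

# Full outer join emulation:
#   part 1: every employee, matched to a department or NULL-padded;
#   part 2: departments with no matching employee, NULL-padded on the left.
query = """
SELECT e.name, d.dept_name
FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id
UNION ALL
SELECT e.name, d.dept_name
FROM departments d LEFT JOIN employees e ON e.dept_id = d.dept_id
WHERE e.emp_id IS NULL
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The result holds four rows: ('Ana', 'Sales') for the match, ('Ben', None) and ('Cy', None) for unmatched employees, and (None, 'HR') for the unmatched department. Without an ORDER BY, the row order is not guaranteed.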
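On constraints, the six usually listed are NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, and DEFAULT. This small sketch, again with Python's `sqlite3` and hypothetical tables, declares all six and records which inserts the database rejects. Note that SQLite only enforces FOREIGN KEY when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only with this pragma
conn.executescript("""
CREATE TABLE departments (
    dept_id   INTEGER PRIMARY KEY,                    -- PRIMARY KEY
    dept_name TEXT NOT NULL UNIQUE                    -- NOT NULL and UNIQUE
);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL CHECK (salary > 0),                  -- CHECK
    status  TEXT DEFAULT 'active',                    -- DEFAULT
    dept_id INTEGER REFERENCES departments(dept_id)   -- FOREIGN KEY
);
INSERT INTO departments VALUES (1, 'Sales');
""")

def try_insert(sql):
    """Run one statement; return None on success, or the IntegrityError message."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

ok = try_insert("INSERT INTO employees (emp_id, name, salary, dept_id) VALUES (1, 'Ana', 500, 1)")
bad_check = try_insert("INSERT INTO employees (emp_id, name, salary, dept_id) VALUES (2, 'Ben', -5, 1)")
bad_fk = try_insert("INSERT INTO employees (emp_id, name, salary, dept_id) VALUES (3, 'Cy', 100, 99)")
bad_null = try_insert("INSERT INTO employees (emp_id, name, salary, dept_id) VALUES (4, NULL, 100, 1)")
bad_unique = try_insert("INSERT INTO departments VALUES (2, 'Sales')")

# The DEFAULT fills in status because the first insert did not provide it.
status = conn.execute("SELECT status FROM employees WHERE emp_id = 1").fetchone()[0]
```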
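To make the MapReduce component concrete, below is a single-process word-count sketch of the map, shuffle, and reduce phases. It only illustrates the programming model. It is not Hadoop's Java API, and in a real cluster the framework runs many map and reduce tasks in parallel over data stored in HDFS.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key (here, by summing the counts)."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data is big", "data is everywhere"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```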
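For the data-cleaning point, this pure-Python sketch applies the three techniques mentioned (remove, fill, substitute) to a made-up list of readings. In practice, pandas offers `DataFrame.dropna` and `DataFrame.fillna` for the same operations; plain Python is used here only to keep the example dependency-free.

```python
import math
from statistics import mean

readings = [12.0, float("nan"), 15.0, 9.0, float("nan")]

def is_missing(x):
    """NaN is the only float that is not equal to itself; math.isnan checks for it."""
    return isinstance(x, float) and math.isnan(x)

# 1. Remove: drop missing entries entirely.
removed = [x for x in readings if not is_missing(x)]

# 2. Fill: replace missing entries with a fixed constant (0.0 is an arbitrary choice).
filled = [0.0 if is_missing(x) else x for x in readings]

# 3. Substitute: impute missing entries with the mean of the observed values.
observed_mean = mean(removed)
imputed = [observed_mean if is_missing(x) else x for x in readings]
```

Which option is right depends on the data: dropping rows loses information, while mean imputation keeps the row count but can hide real gaps.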
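Box plots commonly flag outliers with a rule you can also compute directly: values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. Here is a minimal sketch with `statistics.quantiles` (Python 3.8+). The data is made up, and quartile conventions differ between tools, so the fences may shift slightly elsewhere.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 11, 12, 12, 13, 13, 14, 15, 100]
print(iqr_outliers(data))
```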
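For the OOP concepts, the sketch below shows classes, objects, inheritance, encapsulation (by Python convention, via leading underscores and a read-only property), and polymorphism in one small example. The `DataSource`, `CsvSource`, and `ListSource` classes are hypothetical.

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    """Abstract base class: defines the interface every source must implement."""

    def __init__(self, name):
        self._name = name        # encapsulation: "private by convention" attribute
        self._rows_read = 0

    @property
    def rows_read(self):
        """Read-only view of internal state (no setter is defined)."""
        return self._rows_read

    @abstractmethod
    def read(self):
        """Subclasses must return a list of records."""

class CsvSource(DataSource):     # inheritance
    def __init__(self, name, text):
        super().__init__(name)
        self._text = text

    def read(self):              # polymorphism: same method name, different behavior
        rows = [line.split(",") for line in self._text.strip().splitlines()]
        self._rows_read += len(rows)
        return rows

class ListSource(DataSource):
    def __init__(self, name, items):
        super().__init__(name)
        self._items = list(items)

    def read(self):
        self._rows_read += len(self._items)
        return [[str(item)] for item in self._items]

# Objects of different classes used through the same interface.
sources = [CsvSource("csv", "a,1\nb,2"), ListSource("list", [7, 8, 9])]
results = [src.read() for src in sources]
```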
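As one example of a creational design pattern, here is a simple factory: callers request a formatter by name and never reference the concrete classes. `JsonFormatter`, `CsvFormatter`, and `make_formatter` are hypothetical names chosen for illustration.

```python
import json

class JsonFormatter:
    def format(self, record):
        return json.dumps(record, sort_keys=True)

class CsvFormatter:
    def format(self, record):
        # Values in sorted-key order so the column order is predictable.
        return ",".join(str(record[key]) for key in sorted(record))

_FORMATTERS = {"json": JsonFormatter, "csv": CsvFormatter}

def make_formatter(kind):
    """Factory: map a name to a concrete class and return a new instance."""
    try:
        return _FORMATTERS[kind]()
    except KeyError:
        raise ValueError(f"unknown formatter: {kind!r}") from None

record = {"id": 1, "name": "Ana"}
print(make_formatter("json").format(record))
print(make_formatter("csv").format(record))
```

Adding a new output format then means registering one more class, without touching the code that calls `make_formatter`.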
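The even/odd question usually comes down to a single pass with a modulo check and an if/else, which also touches on decision-making statements. A Python version of that pseudocode might look like this (`split_even_odd` is a hypothetical helper name):

```python
def split_even_odd(numbers):
    """Single pass: n is even when n % 2 == 0 (Python's % also gives 1 for negative odd n)."""
    evens, odds = [], []
    for n in numbers:
        if n % 2 == 0:
            evens.append(n)
        else:
            odds.append(n)
    return evens, odds

evens, odds = split_even_odd([3, 8, -5, 0, 12, 7])
print(evens, odds)  # [8, 0, 12] [3, -5, 7]
```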
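The palindrome question is open to interpretation. This sketch assumes "split in half" means returning the left half, the middle character (empty for even lengths), and the right half, after checking that the input really is a palindrome. `split_palindrome` is a hypothetical helper.

```python
def split_palindrome(s):
    """Split s into (left, middle, right); middle is '' when len(s) is even."""
    if s != s[::-1]:
        raise ValueError(f"{s!r} is not a palindrome")
    half = len(s) // 2
    left = s[:half]
    middle = s[half:len(s) - half]  # one character for odd lengths, else ''
    right = s[len(s) - half:]
    return left, middle, right

print(split_palindrome("racecar"))  # ('rac', 'e', 'car')
print(split_palindrome("abba"))     # ('ab', '', 'ba')
```

For any palindrome, `right == left[::-1]` holds, which is a handy way to sanity-check the split in an interview.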
I hope this thread is useful as you prepare for Data Engineering jobs. Happy learning!