Data Science Interview: Communication is more than just talking
Most data science interviews assess a few key areas. The obvious one is technical depth. Depending on the role, companies also look for business acumen and perhaps some experience in building good software. On top of that, everyone wants to see good communication skills. But what does good communication mean in a data science interview?
On the job, communication means talking with people inside and outside your team. The topics of these conversations vary. You might be seeking help, giving help, explaining your approach, trying to convince external stakeholders, or talking to clients when the situation requires it.
This is not exactly what happens in an interview, where you are typically talking to just one or two people. As a result, candidates often drift into the ramble zone. They talk in a disorganized, unstructured manner about whatever is on their mind. Of course it is OK to think out loud at certain points in the interview, but if that becomes your primary mode of communication, it sets you up for failure. What most companies are looking for is structured communication.
What does that mean?
Structured communication has two key characteristics:
1. Answers are brief and to the point, and go into detail only when asked.
2. There is a clear roadmap to the conversation, and the discussion follows a systematic, logical approach.
Let's see this with a couple of examples.
If you are asked which parameters matter when tuning a random forest model, don't launch into a ramble about irrelevant details. The first thing the interviewer expects to hear from you is the answer - and in this case it is quite clear what that answer should be. "Random forest has quite a few parameters, but only three or four really matter when tuning a model for performance. The first is a parameter often called 'mtry': the number of features sampled from the full feature set each time a tree decides on a split. The second is the number of trees. The third is the maximum depth of each tree. The last is the minimum number of samples required in a leaf node." You can go into all the explanation in the world, like how the number of trees stops mattering beyond a point, or how depth and leaf size essentially control overfitting, after you have given this clear-cut answer.
If you don't know, just say "I don't know". It makes a much cleaner impression than rambling, which interviewers can spot anyway; they will know that you don't really have the answer.
The second point - and this is what candidates get wrong even more often - is maintaining a roadmap of the conversation. For example, let's say you are asked to describe a previous project. People often give such a rambling answer that after six minutes of talking the interviewers are not even sure what the problem statement was, let alone the solution approach. You can make a huge difference in an interview by structuring your answer clearly. Here's a quick sample.
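To make the four parameters in that answer concrete, here is a minimal sketch of how they might be laid out as a tuning grid. It assumes scikit-learn's argument names (max_features for 'mtry', n_estimators, max_depth, min_samples_leaf); the candidate values are purely illustrative and not recommendations from this article.

```python
from itertools import product

# Assumed mapping from the four parameters in the answer to scikit-learn's
# RandomForestClassifier argument names. The values are illustrative only.
search_space = {
    "max_features": ["sqrt", 0.5],   # 'mtry': features sampled at each split
    "n_estimators": [200, 500],      # number of trees
    "max_depth": [None, 10],         # maximum depth of each tree (None = unlimited)
    "min_samples_leaf": [1, 5],      # minimum samples required in a leaf
}


def candidate_settings(space):
    """Return every combination of the values in the search space."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


candidates = candidate_settings(search_space)
print(len(candidates), "candidate settings")
print(candidates[0])
```

If scikit-learn is installed, each dictionary could be passed on as RandomForestClassifier(**params) and scored with cross-validation; that step is left out here to keep the sketch dependency-free.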
"In one of my previous projects I built a classifier to predict which customers would default on a loan, and by flagging the high-risk loans we saved the bank 1.2M annually in write-off costs."
Right off the bat you have made a strong impression. You have said in ONE line what the problem statement was, what you did and what the result was. You have the interviewer's attention now, and they are curious about the details. So we go on:
"For this problem we had 2 years of past data available to us on bank customer IDs, the date they filed a loan application and the amount and duration they asked for. We then labeled whether the customer defaulted on the loan at the end of the stipulated duration. Of course there was some bias in the data, since really high-risk applicants never had their applications accepted in the first place. We looked at various sources of features, such as credit scores, purchase history on debit and credit cards, and some legally allowed demographic information, to construct a classifier. We tried logistic regression as a benchmark, but we soon realized that there were non-linear relationships between the features and the target. We therefore tried gradient boosted trees and a simple feedforward neural net. Ultimately, even though the NN performed slightly better, we chose to go with the GBM since it was easier to productionize and the contributing features were easier to explain. We deployed this solution on the client's servers and it now produces predictions for the more than 1000 applications that come in daily."
This description clearly tells us the business context, the approach, and the result, while giving little glimpses of other things - attention to bias, legal restrictions, an eye for practical production environments. You might then go into a detailed discussion of how GBMs work, or what you might do differently. But this lays the foundation for a solid conversation where everyone has clearly understood the background and impact of your work.
This conversation style applies in many other places. Talking about an analysis? Start with a hypothesis. "Based on your description of the problem, I hypothesize X. Ideally we would test this hypothesis by running a randomized controlled trial. But since that needs time and investment, perhaps we can do a first round on past data to see if there is any signal there, with the caveat that we should not draw causal conclusions from such observational studies. Do we have past data about XYZ? Great. Then I would take the situation before event Y and after it. My plan is to compare pre vs. post, and if we see a jump that is statistically significant then ..." Once again, you make it crystal clear why you are taking this direction, where you are going with it, and how it will eventually get you to a decision.
In the end, data scientists help non-technical people make better decisions by drawing evidence from data. Therefore, the more clearly you demonstrate systematic, data-driven thinking in arriving at a solution, the more you show that you are suitable for this role!
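The pre-vs-post comparison described above can be sketched in a few lines. This is only one possible way to check for a statistically significant jump: it assumes a two-sided permutation test on the difference in means, and the function name and the daily metric values are hypothetical, made up for illustration. As the caveat in the sample answer says, a small p-value here is a signal worth following up, not causal evidence.

```python
import random


def permutation_test(pre, post, n_resamples=10_000, seed=0):
    """Two-sided permutation test for the difference in means (post - pre)."""
    rng = random.Random(seed)
    n_pre = len(pre)

    def mean_diff(values):
        # Mean of the "post" part minus mean of the "pre" part.
        first, second = values[:n_pre], values[n_pre:]
        return sum(second) / len(second) - sum(first) / len(first)

    pooled = list(pre) + list(post)
    observed = mean_diff(pooled)

    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the pre/post labels
        if abs(mean_diff(pooled)) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical daily values of some metric before and after event Y.
pre = [102, 98, 101, 97, 100, 99, 103, 98]
post = [108, 111, 107, 112, 109, 110, 106, 113]

lift, p_value = permutation_test(pre, post)
print(f"observed lift: {lift:.2f}, p-value: {p_value:.4f}")
```

A plain pre-vs-post comparison like this ignores trends and seasonality, which is one more reason to treat it as a first look rather than a conclusion.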