Data Scientist (Coding) Position Overview
The Data Scientist role calls for a unique blend of skills. A data scientist must solve problems by extracting information from the available data, communicate the results, and persuade others to act on that information when making important business decisions.
When hiring a Data Scientist, look for a candidate with an exceptional skill set spanning math/statistics, programming/databases, and business.
Quantitative analysis alone does not suffice for the role, so you must look beyond technical skills and probe the candidate's logical reasoning and ability to solve problems iteratively. Domain experience, experience collaborating with business analysts and decision makers, and communication skills are therefore equally important.
Here is a list of data scientist (coding) interview questions that will help you evaluate candidates' skills.
Computer-Science questions
- How will you ensure that the performance of a model you trained does not degrade over time?
- What steps will you take to test your code?
- What do you understand by version control? Which tools and processes are used for this?
- Explain software design patterns. Which patterns are you most familiar with?
- How would you deploy a model that was trained in an environment such as R? How familiar are you with PMML?
- Describe technical debt. How is it relevant to deploying data-driven models in the real world?
- Explain dynamic programming and recursion.
- Have you ever used a machine learning platform such as PredictionIO or Azure ML?
- Do you contribute to any open source projects?
- Which programming languages and environments are you most comfortable working with?
- How will you train and deploy a logistic regression model?
- How would you sort a large list of numbers?
- Describe hashing and give an example of when you would use it.
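For interviewers who want a concrete reference point for the dynamic programming and recursion question, here is a minimal Python sketch. Fibonacci is an illustrative choice and does not come from this list. It contrasts plain recursion with top-down (memoized) and bottom-up dynamic programming. The function names are hypothetical.

```python
from functools import lru_cache


def fib_recursive(n: int) -> int:
    """Plain recursion: recomputes the same subproblems, so time grows exponentially."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Top-down dynamic programming: the same recursion, plus a cache of solved subproblems."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


def fib_bottom_up(n: int) -> int:
    """Bottom-up dynamic programming: builds from the smallest subproblems in O(n) time, O(1) space."""
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


print(fib_recursive(10), fib_memo(10), fib_bottom_up(10))  # 55 55 55
```

A strong answer usually notes that dynamic programming applies when a problem has overlapping subproblems and optimal substructure. This is why `fib_memo(80)` returns immediately, while `fib_recursive(80)` would not finish in any reasonable time.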
Job-specific questions
- Describe a recent analysis you completed, including the strategies you used and the findings you extracted. How did the business use those findings?
- Which data cleaning techniques have you used in the past?
- What are the benefits of test-driven software development?
- What do you know about technologies in the Hadoop stack, such as Hive and Pig?
- What would your approach be to building a search engine for a wide assortment of documents?
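The search-engine question has no single right answer, but a candidate should at least describe an inverted index and a ranking function. Below is a minimal Python sketch. It assumes plain-text documents held in memory and uses TF-IDF as an illustrative ranking scheme. The class name, the tokenizer, and the sample documents are hypothetical. A production system would add stemming, stop-word handling, and a sharded or disk-based index.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical toy index: maps each term to the documents containing it."""

    def __init__(self, docs: list[str]):
        self.docs = docs
        # term -> {doc_id: number of times the term appears in that document}
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)
        for doc_id, text in enumerate(docs):
            for term, count in Counter(tokenize(text)).items():
                self.postings[term][doc_id] = count

    def idf(self, term: str) -> float:
        """Smoothed inverse document frequency: rarer terms get a higher weight."""
        df = len(self.postings.get(term, {}))
        return math.log((1 + len(self.docs)) / (1 + df)) + 1

    def search(self, query: str, k: int = 3) -> list[int]:
        """Return up to k document ids, ordered by summed TF-IDF score (highest first)."""
        scores: Counter = Counter()
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * self.idf(term)
        return [doc_id for doc_id, _ in scores.most_common(k)]


index = InvertedIndex([
    "Hadoop stores data in HDFS",
    "Hive runs SQL queries on Hadoop",
    "Logistic regression is a classification model",
])
print(index.search("hadoop sql"))  # [1, 0]
```

The second document ranks first because it matches both query terms, and "sql" is rarer than "hadoop" across the collection. Good follow-up questions include how the candidate would update the index incrementally and how they would evaluate ranking quality.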