What exactly do you need to know to become a Data Science Machine Learning specialist?

Today, we’re going to give you nearly 100 of the most common questions asked during job interviews at an IT company.

Do you know the answers to these questions?

Check yourself!

If not, the educational program “Computer Science. Artificial Intelligence and Project Management” will help you find the answers!

Questions on mathematical statistics

  1. What is a normal distribution?
  2. The average project grade in a group of 10 students was 7, and the median was 8. How did this happen? Which one should be trusted more?
  3. What is the probability that a patient is infected if they test positive and the prevalence of the disease in their country is 0.1%?
  4. What is the Central Limit Theorem and what is its practical meaning?
  5. What are some examples of a data set with a non-Gaussian distribution? What is the maximum likelihood method?
  6. You are running for office, and out of a sample of 100 voters, 60 will vote for you.
  7. How can you estimate the statistical significance of the analysis?
  8. How many paths can a mouse take to get to the cheese if it moves only along the lines of the cage?
  9. What is the difference between linear and logistic regression?
  10. Give three examples of long-tailed distributions. Why are they important in classification and regression problems?
  11. What is the essence of the law of large numbers?
  12. What does the p-value (significance probability) show?
  13. What is the binomial probability formula?
  14. A Geiger counter records 100 radioactive decays in 5 minutes. Find the approximate 95% interval for the number of decays per hour.
  15. How do you calculate the appropriate sample size?
  16. When would you use MSE and MAE?
  17. When does the median better describe the data than the arithmetic mean?
  18. What is the difference between the mode, the median, and the expected value?
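
As a worked illustration of question 14, here is a minimal sketch. It assumes the decays follow a Poisson process, so the standard deviation of a count is roughly its square root, and it uses a normal approximation with z = 1.96. The function name `poisson_rate_interval` is hypothetical, and this is one reasonable approach rather than the only accepted answer.

```python
import math

def poisson_rate_interval(count, minutes_observed, minutes_target=60, z=1.96):
    """Approximate interval for a Poisson count, rescaled to a longer period."""
    # Assumption: for a Poisson count the standard deviation is sqrt(count).
    half_width = z * math.sqrt(count)
    # Rescale the interval from the observed window to the target window.
    scale = minutes_target / minutes_observed
    return (count - half_width) * scale, (count + half_width) * scale

low, high = poisson_rate_interval(100, 5)
print(f"{low:.0f} to {high:.0f} decays per hour")  # about 965 to 1435
```

The interval is built for the 5-minute count first and then multiplied by 12, because the uncertainty comes from the 100 decays that were actually observed.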

Questions on SQL

  1. What is the difference between MySQL and SQL Server?
  2. What does UNION do?
  3. What is the difference between UNION and UNION ALL?
  4. How do you optimize SQL queries?
  5. Display a list of employees with a salary higher than the manager’s.
  6. What window functions are there?
  7. Find the list of department IDs with the maximum total salary of employees.
  8. What is the difference between CHAR and VARCHAR?
  9. Select the highest salary not equal to the maximum salary from the table.
  10. What is the difference between SQL and NoSQL?
  11. What is the difference between DELETE and TRUNCATE?
  12. Number the rows in the employee table.
  13. Number the rows in the table within each department, ordered by salary.
  14. What are the levels of transaction isolation?
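
To make questions 5 and 9 concrete, here is a minimal sketch that runs the queries against an in-memory SQLite database from Python. The `employee` table, its columns (`id`, `name`, `salary`, `manager_id`), and the sample rows are all assumptions made for illustration. An interviewer's schema will likely differ.

```python
import sqlite3

# Hypothetical schema and data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT,
    salary INTEGER,
    manager_id INTEGER
);
INSERT INTO employee VALUES
    (1, 'Ann', 100, NULL),
    (2, 'Bob', 120, 1),
    (3, 'Cid', 90, 1),
    (4, 'Dee', 120, 2);
""")

# Question 5: employees who earn more than their manager (self-join).
higher_than_manager = conn.execute("""
SELECT e.name
FROM employee AS e
JOIN employee AS m ON e.manager_id = m.id
WHERE e.salary > m.salary
ORDER BY e.name
""").fetchall()

# Question 9: the highest salary that is not equal to the maximum.
second_highest = conn.execute("""
SELECT MAX(salary)
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee)
""").fetchone()[0]

print(higher_than_manager)  # [('Bob',)]
print(second_highest)       # 100
```

The subquery form for question 9 handles ties at the top: two employees share the maximum of 120 here, and the result is still 100.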

Questions on Python

  1. What are the differences between Series and DataFrame in Pandas?
  2. Write a function that determines the number of steps to convert one word to another.
  3. What are the advantages of NumPy arrays compared to (nested) Python lists?
  4. What is the difference between map, apply, and applymap in Pandas?
  5. What is the easiest way to implement a moving average with NumPy?
  6. Does Python support regular expressions?
  7. Continue: “try, except, …”.
  8. How do you build a simple logistic regression model in Python?
  9. How do you select rows from a DataFrame based on column values?
  10. How do you find out the data type of the elements of a NumPy array?
  11. What is the difference between loc and iloc in Pandas?
  12. Write code that builds all N-grams from a sentence.
  13. What are the possible ways to load an array from a text data file in Python?
  14. What is the difference between a multithreaded and a multiprocess application?
  15. How can you use groupby+transform?
  16. Write the final values of A0, …, A7
  17. What is the difference between mean() and average() in NumPy?
  18. Give an example of using filter and reduce on an iterable object.
  19. How do you combine two NumPy arrays?
  20. Write a one-line program that counts the number of capital letters in a file.
  21. How would you clean up a dataset using Pandas?
  22. What are the differences between array and ndarray?
  23. Calculate the minimum element in each row of a 2D array.
  24. How do you check whether a data set or time series is random?
  25. What is the difference between pivot and pivot_table?
  26. Implement the k-means method using SciPy.
  27. What are the options for iterating over the rows of a DataFrame object?
  28. What is a decorator? How do you write your own?
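
Questions 5 and 12 lend themselves to short answers. The sketch below shows one possible approach to each. The function names `moving_average` and `ngrams` are hypothetical, the moving average assumes a simple unweighted window, and the N-grams assume word-level tokens split on whitespace.

```python
import numpy as np

def moving_average(values, window):
    """Unweighted moving average over full windows only."""
    # Convolving with a flat kernel of weight 1/window averages each window.
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

def ngrams(sentence, n):
    """All word-level N-grams of a sentence, as tuples."""
    words = sentence.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(moving_average([1, 2, 3, 4, 5], 3))  # [2. 3. 4.]
print(ngrams("to be or not", 2))
# [('to', 'be'), ('be', 'or'), ('or', 'not')]
```

With `mode="valid"` the result has `len(values) - window + 1` elements, so no padding values leak into the averages.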

Questions on Data Science

  1. What is sampling? How many sampling methods do you know?
  2. How does correlation differ from covariance?
  3. What is cross-validation? What problems does it solve?
  4. What is a confusion matrix? Why is it needed?
  5. How does the Box-Cox transformation improve the quality of the model?
  6. What methods can be used to fill in missing data and what are the consequences of not filling in the data?
  7. What is a ROC curve? What is an AUC?
  8. What are recall and precision?
  9. How would you deal with different forms of seasonality when modeling time series?
  10. What mistakes can you make when sampling?
  11. What is RCA (root cause analysis)? How do you distinguish between causation and correlation?
  12. What are outliers and inliers? Explain how to detect them and what you would do if you found them in a data set.
  13. What is A/B testing?
  14. In what situations does the general linear model fail?
  15. Is substituting averages for outliers acceptable? Why?
  16. You have data on the duration of calls. Develop a plan to analyze this data. What might the distribution of this data look like? How could you check whether your expectations are met?
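
Questions 4 and 8 are closely linked: precision and recall are both read off the confusion matrix. The following is a minimal sketch for a binary classifier, assuming labels are encoded as 1 (positive) and 0 (negative). The helper name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

# Invented labels, purely for demonstration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found

print(tp, fp, fn, tn)     # 3 1 1 3
print(precision, recall)  # 0.75 0.75
```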

Questions on Machine Learning

  1. What is TF/IDF vectorization?
  2. What is overfitting and how can it be avoided?
  3. You have been given a dataset of tweets and the task is to predict their sentiment (positive or negative). How would you perform the preprocessing?
  4. Tell us about SVM.
  5. When would you rather use SVMs than Random Forests (and vice versa)?
  6. What are the consequences of setting the wrong learning rate?
  7. Explain the difference between epoch, batch, and iteration.
  8. Why is the nonlinear Softmax function often the last operation in a complex neural network?
  9. Explain and give examples of collaborative filtering, content filtering, and hybrid filtering.
  10. What is the difference between bagging and boosting for ensembles?
  11. How do you choose the number k for the k-Means Clustering algorithm without looking at the clusters?
  12. How could you most effectively present data with five dimensions?
  13. What are ensembles and how are they useful?
  14. Your computer has 5 GB of RAM, and you need to train a model on a 10 GB dataset. How would you approach this?
  15. Do gradient descent methods always converge to the same point?
  16. What are recommender systems?
  17. Explain the bias-variance tradeoff and give examples of high and low bias algorithms.
  18. What is PCA and how can it help?
  19. Explain the difference between L1 and L2 regularization methods.
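
For question 7, the relationship between epoch, batch, and iteration comes down to simple arithmetic. This is a minimal sketch, assuming one iteration means one parameter update on one batch and that a final partial batch is still processed. The function name `iterations_per_epoch` and the sample numbers are hypothetical.

```python
import math

def iterations_per_epoch(n_samples, batch_size):
    """Number of batches needed to pass over the whole dataset once."""
    # Assumption: the last, smaller batch is kept rather than dropped.
    return math.ceil(n_samples / batch_size)

n_samples, batch_size, epochs = 1000, 32, 5
per_epoch = iterations_per_epoch(n_samples, batch_size)
total = per_epoch * epochs

print(per_epoch)  # 32 iterations make up one epoch
print(total)      # 160 iterations over 5 epochs
```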