Decision trees are the building blocks of many machine learning models, including ensembles such as random forests and gradient boosting, so questions about them come up frequently in data science interviews. In this post, I'll cover some questions that are asked in interviews but often catch people by surprise.
Are decision trees parametric or non-parametric models?
Decision trees are non-parametric models: their structure (the number of splits and leaves) is not fixed in advance and can grow with the training data. Linear regression and logistic regression are examples of parametric models, since they fit a fixed number of coefficients regardless of how much data they see.
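A minimal sketch of what "non-parametric" means in practice (the splitting rule and function names here are illustrative, not from any library): a tree grown to purity on alternating labels needs one leaf per training point, so its size grows with the data, while a logistic regression on the same one-dimensional feature always has exactly two parameters (one weight plus an intercept).

```python
def grow_to_purity(points):
    """Count the leaves of a tree that splits 1-D data until each leaf is pure."""
    labels = {y for _, y in points}
    if len(labels) <= 1:
        return 1  # pure leaf, stop splitting
    xs = sorted(x for x, _ in points)
    # split at the midpoint between the two middle x values (toy rule)
    mid = (xs[len(xs) // 2 - 1] + xs[len(xs) // 2]) / 2
    left = [(x, y) for x, y in points if x <= mid]
    right = [(x, y) for x, y in points if x > mid]
    return grow_to_purity(left) + grow_to_purity(right)

for n in (8, 64):
    data = [(i, i % 2) for i in range(n)]  # alternating 0/1 labels
    print(n, grow_to_purity(data))  # leaf count equals n: model size tracks data size
```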
Why do machine learning libraries prefer the Gini index over entropy for growing decision trees?
The Gini index is computationally more efficient than entropy: it needs only squares and sums, while entropy requires a logarithm for every class. Because impurity is evaluated for every candidate split at every node, this cost adds up, which is why libraries such as scikit-learn use Gini as the default criterion.
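To make the difference concrete, here is a minimal sketch of both impurity measures over a vector of class probabilities (function names are mine, not from any library):

```python
import math

def gini(p):
    """Gini impurity: 1 - sum(p_k^2). Only multiplications and sums -- cheap."""
    return 1.0 - sum(pk * pk for pk in p)

def entropy(p):
    """Shannon entropy: -sum(p_k * log2(p_k)). Needs a log call per class."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

print(gini([0.5, 0.5]))     # 0.5 -- maximally impure binary node
print(entropy([0.5, 0.5]))  # 1.0
print(gini([1.0]))          # 0.0 -- a pure node, same verdict as entropy
```

Both measures peak at a uniform class mix and hit zero on a pure node, so they usually pick very similar splits; Gini just gets there without the logarithm.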
How are continuous variables handled as predictor variables in decision trees?
Continuous or numerical predictors are handled with threshold splits: the values are sorted, candidate thresholds (typically midpoints between adjacent distinct values) are evaluated, and the node is split on the threshold that gives the best impurity reduction, sending samples with x <= t left and x > t right. Some implementations, such as histogram-based gradient boosting (e.g. LightGBM), first bin the values to speed up this search.
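The threshold search can be sketched in a few lines of pure Python (an exhaustive version for illustration; real libraries do this incrementally over sorted values):

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(xs, ys):
    """Try the midpoint between each pair of adjacent sorted values and
    keep the threshold with the lowest size-weighted Gini impurity."""
    pairs = sorted(zip(xs, ys))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values: no threshold fits between them
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        best = min(best, (score, t))
    return best[1]

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_threshold(xs, ys))  # 6.5 -- separates the two clusters perfectly
```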
What is optimised when the target is a continuous variable, i.e. when the task is regression?
Variance reduction is used to choose the best split when the target is continuous: the tree picks the split that most reduces the size-weighted variance of the target in the child nodes, which is equivalent to minimising the mean squared error of the leaf predictions.
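A worked example of the variance-reduction calculation for one candidate split (the helper names are mine, for illustration):

```python
def variance(ys):
    """Population variance of a list of target values."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variances of the children."""
    n = len(parent)
    return (variance(parent)
            - (len(left) / n) * variance(left)
            - (len(right) / n) * variance(right))

parent = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
left, right = parent[:3], parent[3:]  # split between the two clusters
print(variance_reduction(parent, left, right))  # 20.25 -- a large reduction
```

A split that separates the low and high target values shrinks the within-child variance dramatically, which is exactly what the tree is rewarded for.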
How do decision trees handle multiple classes, i.e. multi-class classification?
Splits are chosen by impurity reduction using Gini or entropy, exactly as in the binary case; both measures extend naturally to any number of classes. At a leaf where no further split is possible, the class with the highest probability (the majority class among the training samples that reached that leaf) is the predicted class. You can also return the full class-probability distribution instead of a single label.
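The leaf-level prediction described above can be sketched like this (a toy helper, not any library's API; scikit-learn exposes the same idea via predict and predict_proba):

```python
from collections import Counter

def leaf_prediction(leaf_labels):
    """Predicted class = majority label in the leaf; probabilities = class
    frequencies among the training samples that landed in the leaf."""
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    probs = {c: k / n for c, k in counts.items()}
    return counts.most_common(1)[0][0], probs

pred, probs = leaf_prediction(["cat", "cat", "dog", "bird"])
print(pred)   # 'cat'
print(probs)  # {'cat': 0.5, 'dog': 0.25, 'bird': 0.25}
```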