Category: ML

Macro vs micro averages, how are they calculated?
In this post, I’ll go over macro and micro averages of classification metrics, namely precision and recall.

What are macro and micro averages?

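A macro average computes the metric independently for each class and then takes the unweighted mean, giving equal weight to each class. A micro average instead pools the counts (true positives, false positives, false negatives) of all classes before computing the metric, so every sample counts equally and larger classes carry more weight; this is how it takes class imbalance into account.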

When to use macro vs micro averages?

If you suspect class imbalance, the micro average should be preferred to the macro average, since it weights every sample equally rather than every class.

How are they different?

Let’s take an example scenario from the scikit-learn documentation for precision_score.
```
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

print(precision_score(y_true, y_pred, average='macro'))  # 0.22...
print(precision_score(y_true, y_pred, average='micro'))  # 0.33...
```
The precision score differs between the macro and the micro calculation, even though the predictions are the same.

Let’s break down the calculation using the confusion matrix.
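A minimal pure-Python sketch of that breakdown (the helper name is illustrative, not a library API): it builds the confusion matrix for the example with rows as true classes and columns as predicted classes, the same orientation scikit-learn’s confusion_matrix uses, then reads off each class’s true positives (the diagonal) and false positives (the rest of that class’s column).

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
labels = [0, 1, 2]
cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)
# 0 [2, 0, 0]
# 1 [1, 0, 1]
# 2 [0, 2, 0]

k = len(labels)
tp = [cm[i][i] for i in range(k)]                                     # diagonal
fp = [sum(cm[r][i] for r in range(k)) - cm[i][i] for i in range(k)]  # column sum minus diagonal
print(tp, fp)  # [2, 0, 0] [1, 2, 1]
```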

As a quick recap, the precision formula for a binary classification problem is:

$Precision = \frac{TP}{TP+FP}$

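For multi-class problems with n classes, where $TP_{i}$ and $FP_{i}$ are the true and false positives of class i and $P_{i}$ is the precision of class i, the micro and macro formulas can be written as:

$Precision_{micro} = \frac{\sum_{i=1}^{n} TP_{i}}{\sum_{i=1}^{n} TP_{i} + \sum_{i=1}^{n} FP_{i}}$

$Precision_{macro} = \frac{\sum_{i=1}^{n} P_{i}}{n}$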
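Both formulas can be sketched in plain Python (the function names are hypothetical). Swapping false positives for false negatives gives micro and macro recall, so the sketch returns those too. A class with no predictions (or no true samples) is scored 0 here, which, as far as the returned value goes, matches scikit-learn’s default zero_division behaviour. For single-label data every error is a false positive for one class and a false negative for another, so micro precision and micro recall coincide (both equal accuracy).

```python
def per_class_counts(y_true, y_pred, labels):
    """Count TP, FP and FN for every class (one-vs-rest)."""
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # true class t was missed
    return tp, fp, fn


def micro_macro(y_true, y_pred, labels):
    tp, fp, fn = per_class_counts(y_true, y_pred, labels)
    # Micro: pool the counts of all classes first, then compute the metric once
    micro_p = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
    micro_r = sum(tp.values()) / (sum(tp.values()) + sum(fn.values()))
    # Macro: compute the metric per class, then take the unweighted mean
    per_p = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in labels]
    per_r = [tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in labels]
    return {
        "micro_precision": micro_p,
        "macro_precision": sum(per_p) / len(labels),
        "micro_recall": micro_r,
        "macro_recall": sum(per_r) / len(labels),
    }


scores = micro_macro([0, 1, 2, 0, 1, 2], [0, 2, 1, 0, 0, 1], labels=[0, 1, 2])
print(scores)
```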

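So in the above example, summing the true and false positives over the three classes, the micro precision is

$Precision_{micro} = \frac{2 + 0 + 0}{(2 + 0 + 0) + (1 + 2 + 1)} = \frac{2}{6} \approx 0.33$

Similarly, the precision for each class individually is

P(0) = 2/3 ≈ 0.67, P(1) = 0, P(2) = 0

So the macro precision is

$Precision_{macro} = \frac{2/3 + 0 + 0}{3} = \frac{2}{9} \approx 0.22$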
This is how the micro and macro averages differ. I hope this article cleared up any confusion about macro vs micro averages in ML metrics.
January 11, 2023
When to use the F2 or F0.5 score? (F-beta score)
Whenever we come across an imbalanced class problem, the metric to measure is often the F1 score rather than accuracy. As a quick reminder, the F1 score is the harmonic mean of precision and recall.
Precision is the fraction of the model’s positive predictions that are actually positive, TP / (TP + FP).
Recall is the fraction of actual positives that the model correctly identifies, TP / (TP + FN).
So the F1 score gives equal weight to precision and recall. But what if you want to prioritize reducing false positives or reducing false negatives? That is where the F-beta score comes in. It is a generalized F-score with a parameter beta that controls how much weight recall gets relative to precision.
This lets you choose an appropriate beta value for the task at hand. If you want to minimize false positives, you want to increase the weight of precision, so you should choose a beta value less than 1. Typically 0.5 is chosen, which gives the F0.5 score.

Similarly, if you want to increase the importance of recall and reduce false negatives, you should choose a beta value greater than 1. Typically 2 is selected, which gives the F2 score.

In a nutshell, optimize the F2 score to reduce false negatives and the F0.5 score to reduce false positives.
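The F-beta score combines precision P and recall R as $F_{\beta} = (1 + \beta^{2}) \frac{P \cdot R}{\beta^{2} P + R}$, so beta = 1 gives back F1. Below is a minimal pure-Python sketch with hypothetical binary labels, chosen so that precision (2/3) is higher than recall (1/2). As a result, F0.5 comes out highest and F2 lowest. scikit-learn’s fbeta_score computes the same quantity directly from labels.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for a binary problem, treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision, recall, beta):
    """(1 + beta^2) * P * R / (beta^2 * P + R); beta < 1 favours precision, beta > 1 recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Hypothetical labels: 2 TP, 1 FP, 2 FN -> precision 2/3, recall 1/2
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
p, r = precision_recall(y_true, y_pred)
for beta in (0.5, 1, 2):
    print(f"F{beta}: {f_beta(p, r, beta):.3f}")
# F0.5: 0.625
# F1: 0.571
# F2: 0.526
```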
January 9, 2023
Oddball Data Science Interview Questions on Decision Trees

Decision trees are the building blocks of many popular ML models, such as random forests and gradient-boosted trees, so questions about them come up often in data science interviews. In this post, I’ll cover some of those questions that often catch people by surprise.

Are decision trees parametric or non-parametric models?

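Decision trees are non-parametric models: the number of splits, and hence the model’s complexity, is not fixed in advance but grows with the training data. Linear regression and logistic regression are examples of parametric models, which have a fixed number of coefficients.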

Why is the Gini index preferred over entropy for growing decision trees in machine learning libraries?

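The Gini index is computationally more efficient to calculate than entropy, because it only needs the squared class proportions, whereas entropy requires a logarithm for every class:

$Gini = 1 - \sum_{i=1}^{C}p_{i}^{2}$

$Entropy = -\sum_{i=1}^{C}p_{i}\log_{2}p_{i}$

This is why Gini is the default criterion in libraries such as scikit-learn’s DecisionTreeClassifier.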
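To show the difference in work, here is a small sketch of both impurity measures for a node, given its class proportions p_i (plain Python; the function names are illustrative). Gini needs only multiplications and additions, while entropy calls a logarithm for every non-empty class.

```python
import math


def gini(proportions):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in proportions)


def entropy(proportions):
    """Entropy: -sum(p_i * log2(p_i)), treating 0 * log(0) as 0."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


for probs in ([0.5, 0.5], [0.9, 0.1], [1.0]):
    print(probs, round(gini(probs), 3), round(entropy(probs), 3))
```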

How are continuous variables handled as predictor variables in decision trees?

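Continuous (numerical) variables are split with a threshold: the algorithm sorts the values, evaluates candidate thresholds (for example, midpoints between consecutive distinct values) and picks the one that gives the best split. Some implementations, such as histogram-based gradient boosting, first bin the values to reduce the number of candidate thresholds.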
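A minimal sketch of that threshold search for one numerical feature, in the style of CART: sort the values, try the midpoint between each pair of consecutive distinct values, and keep the threshold with the lowest weighted Gini impurity of the two children. It is an exhaustive, illustrative version with hypothetical helper names and data. Real libraries typically update the impurity incrementally while scanning the sorted values rather than recomputing it for each candidate.

```python
def gini_of(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(x, y):
    """Return (threshold, weighted Gini of the children) for the best split of one feature."""
    pairs = sorted(zip(x, y))
    xs = [v for v, _ in pairs]
    ys = [label for _, label in pairs]
    n = len(xs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate equal values
        t = (xs[i - 1] + xs[i]) / 2
        left, right = ys[:i], ys[i:]
        score = len(left) / n * gini_of(left) + len(right) / n * gini_of(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical feature values and binary labels
print(best_threshold([3, 1, 12, 2, 11, 10], [0, 0, 1, 0, 1, 1]))  # (6.5, 0.0)
```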

What is optimized when the target is a continuous variable, i.e. when the task is regression?

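Variance reduction is used to choose the best split when the target is continuous: the split that most reduces the size-weighted variance of the target in the child nodes (equivalently, their mean squared error) is chosen.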
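A small plain-Python sketch of variance reduction for a candidate split, with illustrative function names and hypothetical targets. The score is the parent node’s variance minus the size-weighted variance of the two children, and the split with the largest reduction wins.

```python
def variance(values):
    """Population variance (mean squared deviation from the mean)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two children."""
    n = len(parent)
    weighted = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(parent) - weighted


# Hypothetical regression targets in a node, and two candidate splits
parent = [1, 1, 1, 10, 10, 10]
print(variance_reduction(parent, [1, 1, 1], [10, 10, 10]))  # 20.25: a perfect split
print(variance_reduction(parent, [1, 1, 10], [1, 10, 10]))  # 2.25
```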

How do decision trees handle multiple classes, i.e. how do they perform multi-class classification?

Splits are chosen the same way as for a binary classifier, by maximizing the information gain computed with Gini or entropy, which are defined over all C classes. At a leaf where no further splits are made, the predicted class is the one with the highest proportion of training samples in that leaf. These proportions can also be returned as class probabilities.
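A minimal sketch of how a leaf turns its training samples into a prediction. The helper is hypothetical; for unweighted samples, scikit-learn’s DecisionTreeClassifier.predict_proba likewise reports the fraction of training samples of each class in the leaf.

```python
from collections import Counter


def leaf_prediction(leaf_labels, classes):
    """Class proportions in a leaf and the majority class (ties go to the class listed first)."""
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    proba = {c: counts.get(c, 0) / n for c in classes}
    predicted = max(classes, key=proba.get)
    return predicted, proba


# Hypothetical leaf containing five training samples from three classes
predicted, proba = leaf_prediction([2, 2, 1, 0, 2], classes=[0, 1, 2])
print(predicted, proba)  # 2 {0: 0.2, 1: 0.2, 2: 0.6}
```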

January 7, 2023
What is Heteroscedasticity and How do we Test for it ?
Once your linear regression model is trained, you should always plot your residuals (y − ŷ) to check whether the errors are homoscedastic or heteroscedastic. What do we mean by these terms? Homoscedastic errors have a constant variance across all values of the predictors, so the residuals show no pattern and their spread stays roughly the same everywhere. Heteroscedastic errors have a variance that changes, for example growing with the fitted values. Homoscedasticity is one of the assumptions of linear regression, so it is important to check for it.

Figures: homoscedastic and heteroscedastic residual plots (source: Wikipedia)

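In the heteroscedastic figure, the residuals show a clear pattern: their spread changes across the range of the data. In that scenario, you cannot rely on the regression analysis as is, because the standard errors of the coefficients, and hence their p-values and confidence intervals, are no longer reliable.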

How to test for heteroscedasticity?

There are many ways to test for heteroscedasticity. Here are a few:
1. Visual test: look at the residual plot and check whether the spread of the residuals changes. Not very rigorous, but it often works.
2. Bartlett test
3. Breusch-Pagan test
4. Goldfeld-Quandt test
5. Glejser test
6. Test based on Spearman’s rank correlation coefficient
7. White test
8. Ramsey test
9. Harvey Phillips test
10. Szroeter test
11. Peak test (nonparametric)
All these tests, in one way or another, test the null hypothesis H₀: the error variance is constant, against the alternative hypothesis Hₐ: the error variance is not constant. You can read about the tests in more detail here.
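As one concrete example from the list, here is a NumPy sketch of the studentized (Koenker) form of the Breusch-Pagan test. It regresses the squared OLS residuals on the predictors and computes LM = n·R², which under H₀ approximately follows a chi-square distribution with as many degrees of freedom as there are predictors. The data are synthetic and the function names are illustrative. statsmodels provides a ready-made version, het_breuschpagan in statsmodels.stats.diagnostic, which also returns p-values.

```python
import numpy as np


def ols_residuals(X, y):
    """Least-squares fit of y on X with an intercept column; returns the residuals."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef


def breusch_pagan_lm(X, y):
    """n * R^2 from regressing squared residuals on the predictors (Koenker's variant)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    e2 = ols_residuals(X, y) ** 2  # squared residuals of the main model
    u = ols_residuals(X, e2)       # residuals of the auxiliary regression
    centered = e2 - e2.mean()
    r2 = 1.0 - (u @ u) / (centered @ centered)
    return len(y) * r2


# Synthetic data: same line, noise either constant or growing with x
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=500)
y_homo = 2 + 3 * x + rng.normal(0, 1, size=500)
y_het = 2 + 3 * x + rng.normal(0, 1, size=500) * x
lm_homo = breusch_pagan_lm(x, y_homo)
lm_het = breusch_pagan_lm(x, y_het)
# Compare each with about 3.84, the 5% critical value of a chi-square with 1 degree of freedom
print(round(lm_homo, 2), round(lm_het, 2))
```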
January 6, 2023
Importance of VIF in Linear Regression
What is VIF?

The Variance Inflation Factor (VIF) measures multicollinearity among the independent variables (predictors). Multicollinearity occurs when predictor variables are highly correlated with each other, usually with a correlation of 0.8 or higher. This can adversely affect your regression analysis: it inflates the variance of the coefficient estimates, making them unstable and their p-values unreliable.

How is it calculated?

The VIF of a predictor variable is calculated by regressing it against all the other predictor variables. The resulting R² value is plugged into the formula

$VIF_{i} = \frac{1}{1 - R_{i}^{2}}$

which gives the VIF of predictor i.
- VIF = 1, not correlated
- 1 < VIF < 5, slightly correlated
- VIF > 5, highly correlated
These values are just guidelines; how high a VIF is acceptable depends on the problem.
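A minimal NumPy sketch of that calculation, assuming the predictors are the columns of a 2-D array. The vif helper and the synthetic data are illustrative. Each column is regressed on the others plus an intercept, and its R² is turned into 1 / (1 − R²). statsmodels also offers variance_inflation_factor in statsmodels.stats.outliers_influence, which expects the intercept column to be part of the design matrix you pass in.

```python
import numpy as np


def vif(X):
    """VIF of each column of X: regress it on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        vifs.append(1.0 / (1.0 - r2))  # blows up if a column is an exact linear combination of the others
    return vifs


# Synthetic data: x2 is almost a copy of x1, x3 is independent of both
rng = np.random.default_rng(42)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)
x3 = rng.normal(size=500)
vifs = vif(np.column_stack([x1, x2, x3]))
print([round(v, 1) for v in vifs])
```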

If you don’t want to use VIF and have only a few predictor variables, you can plot a correlation matrix and remove highly correlated variables.

You might also wonder why we calculate p-values for the predictor variables in linear regression. Find out why here.
January 5, 2023
Null Hypothesis of Linear Regression Explained

Ever wondered why we look for a p-value less than 0.05 for the coefficients in linear regression results?

Let’s quickly recap the basics of linear regression. In linear regression, we try to estimate the best-fit line for the given data points. With only one predictor variable x and a target Y, the linear equation looks like:

Y = A + Bx

Here A is the intercept and B is the slope, or coefficient, of x.

The null hypothesis for this coefficient is H₀: B = 0, and the alternative hypothesis is Hₐ: B ≠ 0.

This is why we look for a p-value below 0.05: it lets us reject the null hypothesis at the 5% significance level and conclude that there is a statistically significant relationship between the target and the predictor variable.
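A plain-Python sketch of where that p-value comes from, using hypothetical data. It fits Y = A + Bx by ordinary least squares and computes the t-statistic t = B / SE(B) for H₀: B = 0. To turn t into a two-sided p-value, compare it with a t distribution with n − 2 degrees of freedom, for example 2 * scipy.stats.t.sf(abs(t), n - 2). scipy.stats.linregress reports this p-value for the slope directly.

```python
import math


def slope_t_statistic(x, y):
    """OLS fit of Y = A + B*x; returns (A, B, t) with t = B / SE(B) for H0: B = 0."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope, n - 2 degrees of freedom
    return a, b, b / se_b


# Hypothetical data: Y = 2 + 3x plus a small alternating disturbance
x = list(range(1, 11))
y = [2 + 3 * xi + (0.5 if xi % 2 else -0.5) for xi in x]
a, b, t = slope_t_statistic(x, y)
print(round(a, 3), round(b, 3), round(t, 1))
```

A large |t| (and therefore a p-value well below 0.05) means the data are very unlikely under B = 0.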

January 5, 2023