Tag: why r-squared is rss/tss

  • Understanding R-squared (R2) in Regression: A Comprehensive Explanation of Model Fit

    In regression analysis, one of the key metrics used to evaluate the goodness of fit of a model is the R-squared (R2) statistic. R-squared quantifies how well a regression model captures the variation in the dependent variable using the independent variables. In this blog, we will look at what R-squared means, how it is calculated, and its strengths and limitations in assessing the performance of regression models.

    R^{2}=1- \frac{RSS}{TSS}

    But what do RSS and TSS mean?

    RSS stands for the residual sum of squares. It is given by the formula –

    RSS = \sum(y - \hat{y})^{2}

    So it is the sum of the squared differences between the actual values and the predicted values.
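    As a quick sketch, RSS can be computed directly from this formula. The y and \hat{y} values below are hypothetical, chosen only for illustration:

    ```python
    import numpy as np

    # Hypothetical actual and predicted values (for illustration only)
    y = np.array([3.0, 5.0, 7.0, 9.0])
    y_hat = np.array([2.5, 5.5, 6.5, 9.5])

    # Residual sum of squares: sum of squared (actual - predicted)
    rss = np.sum((y - y_hat) ** 2)
    ```

    Each residual here is 0.5 in magnitude, so the four squared residuals add up to the RSS.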

    Plotted on a graph, it looks like this.

    Here the vertical lines from each point to the regression line are the residuals; squaring and adding up these values gives us the RSS.

    Similarly, the TSS is given by the formula –

    TSS = \sum(y - \bar{y})^{2}

    Here the deviations are measured with respect to the mean \bar{y}, so TSS captures the total variation in the data before any model is fitted.
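    Putting the two pieces together, R-squared can be computed by hand from RSS and TSS. Using the same hypothetical values as before:

    ```python
    import numpy as np

    # Same hypothetical actual and predicted values as above
    y = np.array([3.0, 5.0, 7.0, 9.0])
    y_hat = np.array([2.5, 5.5, 6.5, 9.5])

    rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
    tss = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    r_squared = 1 - rss / tss
    ```

    With these numbers, RSS = 1 and TSS = 20, so R-squared comes out to 0.95: the model explains 95% of the variation around the mean.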

    But why is R^{2} = 1 - \frac{RSS}{TSS}?

    The answer is quite logical if you think about it. The simplest possible prediction is the mean of the observed values. So if \hat{y} = \bar{y}, then RSS = TSS and the R-squared value becomes 0. On the other hand, if your regression line fits perfectly, i.e. \hat{y} = y, then RSS = 0 and R-squared becomes 1.

    So that’s why R-squared is a goodness-of-fit measure. For any model that predicts at least as well as the mean, its value lies between 0 and 1; note, however, that a model fitting worse than the mean baseline can produce a negative R-squared.