Tag: ML Metrics

  • Cohen’s Kappa and its use in ML

    Suppose you’re building a classification model on an imbalanced dataset and you want measures beyond accuracy, F1-score, and the ROC-AUC curve. What else can you measure to be confident in your results? The answer is Cohen’s Kappa.

    Cohen’s Kappa is a statistical measure that quantifies the level of agreement between two annotators or, in the context of ML, the agreement between the model’s predictions and the true labels. It accounts for the possibility of agreement occurring by chance, providing a more nuanced evaluation than traditional accuracy metrics.

    The formula for Cohen’s Kappa is:

    \kappa = \frac{p_{0} - p_{e}}{1-p_{e}}

    Where p_{0} is the observed agreement between the model’s predictions and the true labels, and p_{e} is the agreement expected by chance, computed from the class frequencies of the predictions and of the true labels.
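
    As a quick worked illustration (the counts here are made up): suppose that out of 100 samples, the true labels contain 60 of class A and 40 of class B, the model predicts class A 70 times and class B 30 times, and it agrees with the true label on 80 samples (55 of class A and 25 of class B). Then

    p_{0} = \frac{80}{100} = 0.80, \qquad p_{e} = \frac{60}{100}\cdot\frac{70}{100} + \frac{40}{100}\cdot\frac{30}{100} = 0.42 + 0.12 = 0.54

    \kappa = \frac{0.80 - 0.54}{1 - 0.54} \approx 0.57

    So although the raw agreement is 80%, the model only achieves about 57% of the agreement that is possible beyond chance.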

    Let’s take an example to understand this better: a binary classification scenario where you’re building a spam email classifier. The task is to distinguish between spam and non-spam (ham) emails. We’ll use a simple logistic regression model for this example.

    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, confusion_matrix, cohen_kappa_score

    # Sample data for spam and non-spam emails
    data = [
        ("Get rich quick! Claim your prize now!", "spam"),
        ("Meeting at 3 pm in the conference room.", "ham"),
        ("Exclusive offer for you!", "spam"),
        ("Reminder: Project deadline tomorrow.", "ham"),
        # ... more data ...
    ]

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(
        [text for text, label in data],
        [label for text, label in data],
        test_size=0.2,
        random_state=42
    )

    # Vectorize the text data
    vectorizer = CountVectorizer()
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)

    # Train a logistic regression classifier
    classifier = LogisticRegression()
    classifier.fit(X_train_vec, y_train)

    # Make predictions on the test set
    y_pred = classifier.predict(X_test_vec)

    # Evaluate the model
    accuracy = accuracy_score(y_test, y_pred)
    conf_matrix = confusion_matrix(y_test, y_pred)
    kappa_score = cohen_kappa_score(y_test, y_pred)

    # Print the results
    print(f"Accuracy: {accuracy}")
    print(f"Confusion Matrix:\n{conf_matrix}")
    print(f"Cohen's Kappa: {kappa_score}")

    After running this, suppose you get a kappa of 1. That would mean the model’s predictions agree perfectly with the true labels, with none of the agreement attributable to chance. Be aware that this is an ideal scenario.

    Another scenario is that you get a score of 0, meaning the model’s performance is no better than random chance; in other words, your features don’t capture any meaningful patterns in the data.
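
    This is exactly the kind of situation the imbalanced-dataset motivation at the top is about. Here is a minimal sketch (with synthetic, made-up labels) of a degenerate classifier that always predicts the majority class: accuracy looks great, but Cohen’s Kappa shows that nothing beyond chance is going on.

    import numpy as np
    from sklearn.metrics import accuracy_score, cohen_kappa_score

    # Synthetic, heavily imbalanced labels: 95% ham, 5% spam (made-up numbers)
    y_true = np.array(["ham"] * 95 + ["spam"] * 5)

    # A degenerate "model" that always predicts the majority class
    y_pred = np.array(["ham"] * 100)

    print(f"Accuracy: {accuracy_score(y_true, y_pred)}")          # 0.95
    print(f"Cohen's Kappa: {cohen_kappa_score(y_true, y_pred)}")  # 0.0

    Because the model always predicts the majority class, p_{e} works out to the same value as p_{0} (0.95), so the numerator of the formula is zero and kappa is 0 despite the 95% accuracy.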

    In the context of model evaluation:

    Kappa scores closer to 1 indicate a high level of agreement and are generally considered desirable.

    Kappa scores around 0 or below suggest poor agreement, and the model’s predictions might not be reliable.

    It’s essential to interpret Cohen’s Kappa alongside other evaluation metrics, such as accuracy, precision, recall, and the confusion matrix, to comprehensively understand the model’s performance. Additionally, the interpretation of Kappa may vary depending on the specific problem and the level of difficulty in the classification task.

  • ML Metrics | Top N Accuracy Explained

    This metric is usually used in multiclass classification problems.
    A multiclass model gives a probability score for every class it was trained on, but you usually take only the highest one (for example with np.argmax). What if, instead, you took the top n classes and gave the model credit if the correct class is among those n predictions?
    That is what top n accuracy is: it gives the model more chances to be right.

    Let’s take an example.

    Suppose you built a model that predicts 3 classes and you want to find the top 2 accuracy of your model.
    You pass the predicted probability array and the true values to the metric, and if the correct class is among the top 2 predictions for a sample, the model gets credit for being right on that sample.

    import numpy as np
    from sklearn.metrics import top_k_accuracy_score

    y_true = [0, 1, 1, 2, 2]
    y_pred = [[0.25, 0.2, 0.3],   # Here 0 is in the top 2
              [0.3, 0.35, 0.5],   # Here 1 is in the top 2
              [0.2, 0.4, 0.45],   # Here 1 is in the top 2
              [0.5, 0.1, 0.2],    # Here 2 is in the top 2
              [0.1, 0.4, 0.2]]    # Here 2 is in the top 2

    print(top_k_accuracy_score(y_true, y_pred, k=2))
    

    The result is 1.0, because the correct class was always in our top 2 predictions. In fact, if you look closely, the correct class was always the model’s second-highest prediction, so if we take regular accuracy, or set k=1 in top_k_accuracy_score(y_true, y_pred, k=1), the answer is 0.
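
    If you want to see what the metric is doing under the hood, here is a minimal hand-rolled sketch (the helper top_n_accuracy is just for illustration, not part of scikit-learn). It sorts each row of scores with np.argsort and gives credit whenever the true class is among the n highest-scoring classes; on the same y_true and y_pred as above it returns 1.0 for n = 2 and 0.0 for n = 1.

    import numpy as np

    def top_n_accuracy(y_true, y_score, n=2):
        # Hand-rolled version of top-n accuracy (same idea as top_k_accuracy_score)
        y_true = np.asarray(y_true)
        y_score = np.asarray(y_score)
        # Indices of the n highest-scoring classes for each sample
        top_n = np.argsort(y_score, axis=1)[:, -n:]
        # Credit a sample when its true class appears among those n classes
        hits = [y_true[i] in top_n[i] for i in range(len(y_true))]
        return np.mean(hits)

    print(top_n_accuracy(y_true, y_pred, n=2))  # 1.0, matches top_k_accuracy_score with k=2
    print(top_n_accuracy(y_true, y_pred, n=1))  # 0.0, same as regular accuracy here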

    Hopefully, this explains what top N accuracy is, and if you want me to cover any ML topic, write in the comments below. Thanks for reading.