Tag: LogLoss

  • Using Custom Eval Metric with Catboost

    CatBoost offers a multitude of built-in evaluation metrics. You can read all about them in the official documentation, but often you want to use a custom evaluation metric.

    For example, in this ongoing Kaggle competition the evaluation metric is Balanced Log Loss, which CatBoost does not support out of the box. By this I mean that you can't simply write the following and expect it to work.

    from catboost import CatBoostClassifier

    model = CatBoostClassifier(eval_metric="BalancedLogLoss")
    model.fit(X, y)

    This will raise an error. Instead, you need to define a custom eval metric class, whose template is pretty simple.

    class UserDefinedMetric(object):
        def is_max_optimal(self):
            # Returns whether great values of metric are better
            pass
    
        def evaluate(self, approxes, target, weight):
            # approxes is a list of indexed containers
            # (containers with only __len__ and __getitem__ defined),
            # one container per approx dimension.
            # Each container contains floats.
            # weight is a one dimensional indexed container.
            # target is a one dimensional indexed container.
    
            # weight parameter can be None.
            # Returns pair (error, weights sum)
            pass
    
        def get_final_error(self, error, weight):
            # Returns final value of metric based on error and weight
            pass
    
    

    The class has three parts.

    1. get_final_error – Here you can simply return the error, or transform it first (take the log, the square root, etc.) if you want.
    2. is_max_optimal – Return True if greater is better (accuracy, AUC, etc.), otherwise return False.
    3. evaluate – Here lies the meat of your code, where you actually compute the metric you want. Remember that approxes holds the model's predictions and you need to take approxes[0] as the output. Note that for a classification objective these are raw scores (log-odds), not probabilities.
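    Before jumping to Balanced Log Loss, here is the template filled in with plain accuracy, a toy metric I picked purely for illustration (it is not the competition metric), to show how the three parts fit together:

```python
import numpy as np


class AccuracyMetric:
    def is_max_optimal(self):
        # Higher accuracy is better
        return True

    def evaluate(self, approxes, target, weight):
        # approxes[0] holds raw scores (log-odds for a binary Logloss
        # objective), so threshold at 0 rather than 0.5
        raw = np.array(approxes[0], dtype=float)
        y_true = np.array(target).astype(int)
        y_pred = (raw > 0).astype(int)
        return float(np.mean(y_pred == y_true)), 1.0

    def get_final_error(self, error, weight):
        # The value returned by evaluate is already the final metric
        return error
```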

    Below you will find the code for Balanced Log Loss as an eval metric.

    
    import numpy as np


    class BalancedLogLoss:
        def get_final_error(self, error, weight):
            # The error returned by evaluate is already the final value
            return error

        def is_max_optimal(self):
            # Lower log loss is better
            return False

        def evaluate(self, approxes, target, weight):
            y_true = np.array(target).astype(int)
            # approxes[0] contains raw scores (log-odds), so convert
            # them to probabilities with a sigmoid first
            raw = np.array(approxes[0], dtype=float)
            y_pred = 1.0 / (1.0 + np.exp(-raw))

            y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
            individual_loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

            # Weight each sample by the size of the opposite class
            # relative to its own, so the rarer class counts more
            n0, n1 = np.sum(y_true == 0), np.sum(y_true == 1)
            class_weights = np.where(y_true == 1, n0 / n1, n1 / n0)
            weighted_loss = individual_loss * class_weights

            balanced_logloss = np.mean(weighted_loss)

            # Second element is the weight sum; get_final_error ignores it here
            return balanced_logloss, 1.0

    Then you can simply pass an instance of this class to your model, whether on its own or inside a grid search or randomised search, like this:

    model = CatBoostClassifier(verbose=False, eval_metric=BalancedLogLoss())

    Write in the comments below if you have any questions about custom eval metrics in CatBoost or any other ML framework.