CatBoost offers a multitude of evaluation metrics. You can read all about them here, but often you want to use a custom evaluation metric. For example, in this ongoing Kaggle competition the evaluation metric is Balanced Log Loss, which CatBoost does not support out of the box. By that I mean you can't simply write this and expect it to work:
from catboost import CatBoostClassifier
model = CatBoostClassifier(eval_metric="BalancedLogLoss")
model.fit(X, y)
This will give you an error. What you need instead is a custom eval metric class. The template for it is pretty simple:
class UserDefinedMetric(object):
    def is_max_optimal(self):
        # Returns whether greater values of the metric are better
        pass

    def evaluate(self, approxes, target, weight):
        # approxes is a list of indexed containers
        # (containers with only __len__ and __getitem__ defined),
        # one container per approx dimension.
        # Each container contains floats.
        # weight is a one-dimensional indexed container.
        # target is a one-dimensional indexed container.
        # The weight parameter can be None.
        # Returns the pair (error, weights sum)
        pass

    def get_final_error(self, error, weight):
        # Returns the final value of the metric based on error and weight
        pass
There are three parts to this class.
- get_final_error – here you can simply return the error, or modify it first, e.g. take the log or the square root.
- is_max_optimal – return True if greater is better (as with accuracy and the like), otherwise return False.
- evaluate – here lies the meat of your code, where you actually compute your metric. Remember that approxes holds the predictions and you need to take approxes[0] as the output.
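One subtlety worth flagging: for binary classification, approxes[0] contains CatBoost's raw formula values (log-odds), not probabilities, so a probability-based metric needs a sigmoid first. A minimal sketch (the helper name raw_to_proba is my own):

```python
import numpy as np

def raw_to_proba(approx):
    # Convert CatBoost raw scores (log-odds) into probabilities
    # via the sigmoid function.
    approx = np.asarray(approx, dtype=float)
    return 1.0 / (1.0 + np.exp(-approx))
```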
Below you will find the code for Balanced Log Loss as an eval metric.
import numpy as np

class BalancedLogLoss:
    def get_final_error(self, error, weight):
        return error

    def is_max_optimal(self):
        return False

    def evaluate(self, approxes, target, weight):
        y_true = np.array(target).astype(int)
        # approxes[0] holds raw scores (log-odds), so convert to probabilities
        y_pred = 1.0 / (1.0 + np.exp(-np.array(approxes[0], dtype=float)))
        y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
        individual_loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
        # Weight each sample by the inverse class frequency ratio
        class_weights = np.where(y_true == 1,
                                 np.sum(y_true == 0) / np.sum(y_true == 1),
                                 np.sum(y_true == 1) / np.sum(y_true == 0))
        weighted_loss = individual_loss * class_weights
        balanced_logloss = np.mean(weighted_loss)
        return balanced_logloss, 1.0
Then you can simply pass this to your grid search or randomised search like so:

model = CatBoostClassifier(verbose=False, eval_metric=BalancedLogLoss())
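As a sanity check, the same formula can be computed outside CatBoost on plain arrays of probabilities. A small standalone sketch (the function name balanced_log_loss is my own): on a perfectly balanced dataset where every prediction is 0.5, the class weights are all 1 and the metric reduces to the plain log loss, ln 2 ≈ 0.693.

```python
import numpy as np

def balanced_log_loss(y_true, y_prob):
    # Standalone version of the metric above, taking probabilities directly.
    y_true = np.asarray(y_true, dtype=int)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), 1e-15, 1 - 1e-15)
    loss = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    # Weight each sample by the inverse class frequency ratio.
    weights = np.where(y_true == 1,
                       np.sum(y_true == 0) / np.sum(y_true == 1),
                       np.sum(y_true == 1) / np.sum(y_true == 0))
    return np.mean(loss * weights)

# Balanced labels, uninformative 0.5 predictions -> ln(2) ≈ 0.6931
print(balanced_log_loss([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))
```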
Write in the comments below if you have any questions about custom eval metrics in CatBoost or any other ML framework.