Weight of Evidence Encoding

So today I was participating in a Kaggle competition and the data had a lot of categorical variables. One way to handle that is to use a model like CatBoost and let it deal with encoding the categorical variables itself. But I wanted to ensemble my results with an XGBoost model, so I had to encode them myself. Using weight of evidence encoding, I got a solution that ranked in the top 10 when submitted. I have made the notebook public, you can go here and see it.

So what is weight of evidence?

To put it simply:

\text{woe} = \ln\left(\frac{\text{perc\_negatives}}{\text{perc\_positives}}\right) = \ln\left(\frac{\text{neg\_group} / \text{total\_neg}}{\text{pos\_group} / \text{total\_pos}}\right)
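To make this concrete, here is a minimal pandas sketch of the encoding (not the exact code from my notebook); the `eps` smoothing term and the column names are my own choices for illustration, added so categories with zero positives or negatives don't blow up the log.

```python
import numpy as np
import pandas as pd

def woe_encode(df, col, target, eps=0.5):
    """Map each category of `col` to its weight of evidence w.r.t. a binary `target`.

    `eps` is a small smoothing count (an assumption here, not part of the basic
    formula) to avoid log(0) or division by zero for rare categories.
    """
    # Positives (target == 1) and negatives (target == 0) per category.
    grouped = df.groupby(col)[target].agg(['sum', 'count'])
    pos_group = grouped['sum'] + eps
    neg_group = grouped['count'] - grouped['sum'] + eps

    total_pos = pos_group.sum()
    total_neg = neg_group.sum()

    # woe = ln( (neg_group / total_neg) / (pos_group / total_pos) )
    woe = np.log((neg_group / total_neg) / (pos_group / total_pos))
    return df[col].map(woe)


# Toy usage: replace a categorical column with its WoE values before feeding it to XGBoost.
data = pd.DataFrame({
    'city': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'A'],
    'default': [1, 0, 0, 0, 1, 1, 1, 0, 1, 0],
})
data['city_woe'] = woe_encode(data, 'city', 'default')
print(data)
```

In practice you would compute the mapping on the training fold only and apply it to the validation and test data, otherwise the target leaks into the encoding.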

I’ve gone through an example explaining weight of evidence in the YouTube video below.
