KNN (K-Nearest Neighbours) is a supervised learning algorithm that classifies a new data point based on the classes of its nearest neighbours in the training data.
The tricky part is selecting the optimal k for the model.
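Before getting into tuning, here is a minimal sketch of the basic workflow with scikit-learn's KNeighborsClassifier on the iris dataset (the 30% test split and random_state are arbitrary choices for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# load the iris features and labels, then hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# fit with the default k=5 and score on the held-out data
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))

The full signature of the classifier is: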
sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)
As you can see, weights defaults to 'uniform' and n_neighbors defaults to 5. Large values of k smooth the decision boundary, while a very small value of k is unreliable and easily swayed by outliers.
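To see this effect, here is a rough sketch comparing cross-validated accuracy for a very small and a very large k on the iris data (the exact scores depend on the dataset and the cv splits):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# a very small k reacts to individual (possibly noisy) neighbours,
# while a very large k averages over a wide neighbourhood
for k in [1, 5, 50]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(k, round(scores.mean(), 3))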
You can pick the optimal value of k by tuning this hyperparameter with GridSearchCV.
Then there is the parameter p, which defaults to 2, meaning the model uses Euclidean distance; you can set it to 1 to use Manhattan distance. This is the distance metric used to choose the nearest points for classification.
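To make the distinction concrete, here is a small sketch computing both distances by hand for two arbitrary points:

import numpy as np
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
# Manhattan distance (p=1): sum of absolute differences
print(np.sum(np.abs(a - b)))
>>> 7.0
# Euclidean distance (p=2): square root of the sum of squared differences
print(np.sqrt(np.sum((a - b) ** 2)))
>>> 5.0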
Let's code this in Python:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
import numpy as np

# load the iris features and labels
X, y = load_iris(return_X_y=True)

# defining the search grid over k and the distance metric
param_grid = {'n_neighbors': np.arange(3, 10, 1),
              'p': [1, 2, 3]}

grid_search = GridSearchCV(estimator=KNeighborsClassifier(), param_grid=param_grid, scoring='accuracy', cv=3)
grid_search.fit(X, y)
print(grid_search.best_params_)
>>> {'n_neighbors': 4, 'p': 2}
print(grid_search.best_score_)
>>> 0.9866666666666667
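Since GridSearchCV refits the best model on the full dataset by default, you can use it directly for predictions; a quick sketch (the sample measurements below are made up for illustration):

# the refit best estimator can classify new samples straight away
best_knn = grid_search.best_estimator_
print(best_knn.predict([[5.1, 3.5, 1.4, 0.2]]))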
Hope this post clarified how you can use KNN in your machine learning problems. If you want me to write about any ML topic, just drop a comment below.