Tag: KNN

  • K-Nearest Neighbour Algorithm Explained

    KNN (K-Nearest Neighbours) is a supervised learning algorithm that classifies a new data point based on the classes of its k nearest neighbours in the training data.

    The tricky part is selecting the optimal k for the model.

    sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)

    As you can see, weights defaults to 'uniform' and n_neighbors defaults to 5. A large value of k smooths the decision boundary but can underfit, while a very small value of k is unreliable and easily swayed by outliers.
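    To see this effect, here is a minimal sketch comparing cross-validated accuracy for a very small and a fairly large k on the iris dataset (the same data used in the full example below); the specific k values are just illustrative:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    
    X, y = load_iris(return_X_y=True)
    
    # a very small k reacts to single noisy points; a larger k averages them out
    for k in (1, 15):
        scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
        print(f"k={k}: mean accuracy = {scores.mean():.3f}")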

    You can pick the optimal value of k by tuning it as a hyperparameter with GridSearchCV.

    Then there is the parameter p, which defaults to 2, meaning the Euclidean distance is used; set it to 1 to use the Manhattan distance instead. This is the distance metric KNN uses to choose the nearest points for classification.
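    As a quick illustration of the difference between the two metrics, here is a small sketch using scipy.spatial.distance.minkowski (the two points are made up for the example):

    from scipy.spatial.distance import minkowski
    
    a, b = [0, 0], [3, 4]
    
    # p=1 gives the Manhattan distance: |3 - 0| + |4 - 0| = 7
    print(minkowski(a, b, p=1))
    # p=2 gives the Euclidean distance: sqrt(3**2 + 4**2) = 5
    print(minkowski(a, b, p=2))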

    Let’s code this in Python:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    
    # load the iris features and labels
    X, y = load_iris(return_X_y=True)
    
    # defining the search grid: candidate values for k and for the distance metric
    param_grid = {'n_neighbors': np.arange(3, 10, 1),
                  'p': [1, 2, 3]}
    
    grid_search = GridSearchCV(estimator=KNeighborsClassifier(), param_grid=param_grid, scoring='accuracy', cv=3)
    
    grid_search.fit(X, y)
    
    print(grid_search.best_params_)
    >>> {'n_neighbors': 4, 'p': 2}
    print(grid_search.best_score_)
    >>> 0.9866666666666667
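
    Once the search has finished, grid_search.best_estimator_ holds the classifier refitted on all the data with the winning parameters, so you can predict with it directly (the sample measurements below are made up for illustration):

    # predict the class of a new flower with the tuned model
    best_knn = grid_search.best_estimator_
    print(best_knn.predict([[5.1, 3.5, 1.4, 0.2]]))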
    

    Hope this post clarified how you can use KNN in your machine learning problems. If you want me to write about any ML topic, just drop a comment below.