k-Nearest Neighbors Algorithm Explained



What is k-Nearest Neighbors?
k-Nearest Neighbors (KNN) is one of the most basic yet essential classification algorithms in machine learning. It is a supervised learning method and is widely used in pattern recognition, data mining and intrusion detection.
KNN is a very simple algorithm based on similarity measures such as distance functions.
"k" refers to the number of nearest neighbors considered for the datapoint being classified.
Algorithm:
A case is classified by a majority vote of its neighbors: it is assigned to the class most common among its k nearest neighbors, as measured by a distance function.
If k = 1, the case is simply assigned to the class of its nearest neighbor.
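The majority-vote rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `euclidean` and `knn_classify`, the choice of Euclidean distance, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, points, labels, k, distance=euclidean):
    """Return the majority class among the k training points nearest to query."""
    # Pair each training point's distance to the query with its label,
    # then sort so the closest points come first.
    scored = sorted(
        (distance(query, p), label) for p, label in zip(points, labels)
    )
    # Keep the labels of the k closest points and count the votes.
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters, one per class.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify((2.0, 2.0), points, labels, k=3))  # A
print(knn_classify((6.0, 6.5), points, labels, k=1))  # B
```

Note that there is no training step: the "model" is just the stored points, and all the work happens when a new case is classified.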
Note that the usual distance measures for numeric data, such as Euclidean, Manhattan and Minkowski distance, are only valid for continuous variables. For categorical variables, the Hamming distance should be used instead.
For each categorical attribute, the distance is 0 if the two values are the same and 1 if they differ.
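As a rough sketch of these distance functions, assuming the continuous measures in question are Manhattan, Euclidean and Minkowski distance (Manhattan and Euclidean being the Minkowski distance with order 1 and 2 respectively), and with `minkowski` and `hamming` as illustrative names:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def hamming(a, b):
    """Count the attributes on which two categorical records differ."""
    # Each attribute contributes 0 if the values match and 1 if they do not.
    return sum(0 if x == y else 1 for x, y in zip(a, b))


print(minkowski((0, 0), (3, 4), p=1))  # 7.0
print(minkowski((0, 0), (3, 4), p=2))  # 5.0
print(hamming(("red", "small", "round"), ("red", "large", "round")))  # 1
```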
Example:
Let's consider a plot of datapoints from two classes, Class A (yellow circles) and Class B (purple circles).
Now, let's find out whether a new point, the red star, will be classified as Class A or Class B.
1. When 'k' = 3: the datapoint is assigned to the class with the most votes among its 3 nearest neighbors. Since the 3 nearest neighbors are 2 purple circles and 1 yellow circle, Class B wins with 2 votes and the red star is classified as Class B.
2. When 'k' = 6: the 6 nearest neighbors are 4 yellow circles and only 2 purple circles, so the red star is classified as Class A.
Note: An even value of 'k' can produce ties, where both classes receive the same number of votes. With two classes, it is better to stick to odd values of 'k'.
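The vote counts in this example, and the tie problem with even 'k', can be reproduced with a short sketch. The list of neighbor labels below is hypothetical: it is simply ordered from nearest to farthest so that it matches the counts described above (2 of Class B and 1 of Class A within the 3 nearest, 4 of Class A and 2 of Class B within the 6 nearest).

```python
from collections import Counter

# Hypothetical neighbor labels, ordered from nearest to farthest,
# chosen to match the counts described in the example.
neighbors_by_distance = ["B", "B", "A", "A", "A", "A"]


def vote(neighbor_labels, k):
    """Return (class, count) pairs for the k nearest neighbors, most votes first."""
    return Counter(neighbor_labels[:k]).most_common()


print(vote(neighbors_by_distance, 3))  # [('B', 2), ('A', 1)]
print(vote(neighbors_by_distance, 6))  # [('A', 4), ('B', 2)]
print(vote(neighbors_by_distance, 4))  # [('B', 2), ('A', 2)], a tie
```

With 'k' = 4 both classes get 2 votes, so the result depends entirely on how the implementation breaks ties.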

Advantages:
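1. Robust to noisy data, provided 'k' is large enough to outvote mislabeled neighbors.
2. Can model complex decision boundaries easily, since it makes no assumptions about the shape of the data.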
Disadvantages:
1. The value of 'k' needs to be determined.
2. It is not clear which type of distance measure to use.
3. Computational cost is high, because each prediction requires computing the distance to every training example.
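One common way to deal with the first disadvantage is to try several values of 'k' and keep the one that classifies held-out data best. The sketch below uses leave-one-out evaluation on hypothetical toy data; the function names, the use of Euclidean distance via `math.dist` (Python 3.8+), and the data itself are assumptions for illustration only.

```python
import math
from collections import Counter


def predict(query, points, labels, k):
    """Majority class among the k points nearest to query (Euclidean distance)."""
    nearest = sorted(zip(points, labels), key=lambda pl: math.dist(query, pl[0]))
    return Counter(label for _, label in nearest[:k]).most_common(1)[0][0]


def leave_one_out_accuracy(points, labels, k):
    """Fraction of points classified correctly when each is held out in turn."""
    correct = 0
    for i, (point, label) in enumerate(zip(points, labels)):
        # Classify point i using every other point as the training set.
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if predict(point, rest_points, rest_labels, k) == label:
            correct += 1
    return correct / len(points)


# Hypothetical toy data: two clusters plus one mislabeled "noisy" point
# (the last one) sitting in the middle of the Class A cluster.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6), (7, 7), (1.5, 1.5)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]

for k in (1, 3, 5):
    print(k, leave_one_out_accuracy(points, labels, k))
```

On this toy data 'k' = 1 scores lower than 'k' = 3 or 'k' = 5, because with a single neighbor the mislabeled point decides the class of every Class A point around it. Each call to `predict` also measures the distance to every remaining point, which is where the high computational cost comes from.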
