Naive Bayes Classifier explained

What is the Naive Bayes Classifier?
Naive Bayes is a classification algorithm based on Bayes' theorem, with a 'naive' assumption of independence between the predictors. It is easy to build, requires no complicated iterative parameter estimation, and therefore works particularly well on large datasets.

To understand the naive Bayes classifier we first need to understand Bayes' theorem, so let's start there.

What is Bayes' Theorem?
Bayes' theorem works on conditional probability: it gives the probability that an event A will happen, given that an event B has already occurred. Using conditional probability, we can update the probability of a hypothesis using prior knowledge:

P(A/B) = [P(B/A) * P(A)] / P(B)

where,
P(A) : the prior probability. It describes the probability of our hypothesis A being true before seeing the evidence.
P(B) : the probability of the evidence, regardless of the hypothesis.
P(B/A) : the probability of the evidence given that the hypothesis is true (the likelihood).
P(A/B) : the probability of the hypothesis given that the evidence has occurred (the posterior).

Let's look at an example.
Question:
1% of people have a certain genetic defect.
90% of tests for the gene detect the defect, i.e. true positives.
9.6% of the tests are false positives.
If a person gets a positive result, what are the odds that they actually have the genetic defect?

Solution:
Let 'A' be the event of having the defect and 'B' be a positive test result.

Given that:
P(A)    = 0.01, i.e. the probability of having the faulty gene.
P(B/A)  = 0.9, the probability of a positive result, given that the defect is present.
P(B/~A) = 0.096, i.e. the probability of a positive result, given that the person doesn't have the defect.

Now, P(~A) = 1 - P(A) = 0.99, i.e. the probability of not having the faulty gene.

P(A/B) = [P(B/A) * P(A)] / P(B)
           = [P(B/A) * P(A)] / [(P(B/A) * P(A)) + (P(B/~A) * P(~A))]
           = [0.9 * 0.01] / [(0.9 * 0.01) + (0.096 * 0.99)]
           = 0.0865 
           = 8.65%
i.e. if a person gets a positive result, the probability that they actually have the genetic defect is only about 8.65%.
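This calculation is easy to verify in code. Here is a minimal Python sketch (the variable names are just illustrative):

# Bayes' theorem: P(A/B) = P(B/A) * P(A) / P(B)
p_defect = 0.01              # P(A): prior probability of the defect
p_pos_given_defect = 0.9     # P(B/A): true positive rate
p_pos_given_healthy = 0.096  # P(B/~A): false positive rate

# P(B): total probability of a positive test
p_pos = (p_pos_given_defect * p_defect
         + p_pos_given_healthy * (1 - p_defect))

p_defect_given_pos = p_pos_given_defect * p_defect / p_pos
print(round(p_defect_given_pos, 4))  # 0.0865

The result is so low because the defect is rare: most positive results come from the 99% of healthy people hit by the 9.6% false positive rate.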


Now, let's work through an actual dataset and see how the classifier works. We'll use the classic 14-day 'play tennis' weather dataset, where we need to predict whether Play is Yes or No from the given predictors:

Day   Outlook    Temperature   Humidity   Wind     Play?
1     Sunny      Hot           High       Weak     No
2     Sunny      Hot           High       Strong   No
3     Overcast   Hot           High       Weak     Yes
4     Rain       Mild          High       Weak     Yes
5     Rain       Cool          Normal     Weak     Yes
6     Rain       Cool          Normal     Strong   No
7     Overcast   Cool          Normal     Strong   Yes
8     Sunny      Mild          High       Weak     No
9     Sunny      Cool          Normal     Weak     Yes
10    Rain       Mild          Normal     Weak     Yes
11    Sunny      Mild          Normal     Strong   Yes
12    Overcast   Mild          High       Strong   Yes
13    Overcast   Hot           Normal     Weak     Yes
14    Rain       Mild          High      Strong   No

Step 1:
First, calculate the prior probability of each class in the target column 'Play?':
P(Play = Yes) = 9/14
P(Play = No ) = 5/14

Step 2:
Now, calculate the conditional probability of each value of each predictor, given each class of the target/dependent variable:

Outlook       P(x/Play=Yes)   P(x/Play=No)
Sunny         2/9             3/5
Overcast      4/9             0/5
Rain          3/9             2/5

Temperature   P(x/Play=Yes)   P(x/Play=No)
Hot           2/9             2/5
Mild          4/9             2/5
Cool          3/9             1/5

Humidity      P(x/Play=Yes)   P(x/Play=No)
High          3/9             4/5
Normal        6/9             1/5

Wind          P(x/Play=Yes)   P(x/Play=No)
Strong        3/9             3/5
Weak          6/9             2/5

Step 3:
Now, let's calculate the probabilities of Play = Yes and Play = No for the given conditions X:
[Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong]

P(X/Play=Yes)P(Play=Yes) = (2/9) * (3/9) * (3/9) * (3/9) * (9/14) = 0.0053
P(X/Play=No)P(Play=No)  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14) = 0.0206

Step 4:
Now divide both quantities by the probability of the evidence P(X), which under the same independence assumption is:
P(X) = P(Outlook=Sunny) * P(Temperature=Cool) * P(Humidity=High) * P(Wind=Strong)
       = (5/14) * (4/14) * (7/14) * (6/14)
       = 0.02186

So, P(Play=Yes/X) = 0.0053 / 0.02186 = 0.2424
      P(Play=No/X)  = 0.0206 / 0.02186 = 0.9421

(Note that the two values don't sum exactly to 1: computing P(X) with the same independence assumption makes these posteriors approximations. In practice this division is usually skipped, since it doesn't change which class wins.)

Since P(Play=No/X) > P(Play=Yes/X), the classifier decides that play is not possible under these conditions.
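The whole walkthrough is easy to reproduce in code. Here is a minimal from-scratch Python sketch of the four steps (the dataset literal and the score function are my own, just for illustration):

# The 14-day weather dataset: (Outlook, Temperature, Humidity, Wind, Play)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def score(x, label):
    # P(X/label) * P(label): the class prior times the product of
    # per-feature conditional probabilities (Steps 1-3).
    rows = [r for r in data if r[-1] == label]
    prior = len(rows) / len(data)
    likelihood = 1.0
    for i, value in enumerate(x):
        likelihood *= sum(1 for r in rows if r[i] == value) / len(rows)
    return prior * likelihood

x = ("Sunny", "Cool", "High", "Strong")
scores = {label: score(x, label) for label in ("Yes", "No")}
print(scores)                        # roughly {'Yes': 0.0053, 'No': 0.0206}
print(max(scores, key=scores.get))   # 'No' -- dividing by P(X) wouldn't change the ranking

A real implementation would also add Laplace smoothing, so that a feature value never seen with a class (like Outlook = Overcast with Play = No) doesn't zero out the whole product.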

Advantages and Disadvantages of Naive Bayes classifier

Advantages:
  • Naive Bayes is a fast, highly scalable algorithm.
  • Naive Bayes can be used for binary and multiclass classification. Libraries such as scikit-learn provide different variants like GaussianNB, MultinomialNB, and BernoulliNB (see the sketch after this list).
  • It is a simple algorithm that boils down to doing a bunch of counts.
  • It is a great choice for text classification problems and a popular choice for spam email classification.
  • It can be easily trained on a small dataset.
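For example, here is a minimal sketch using scikit-learn's GaussianNB on the built-in iris dataset (any small numeric dataset would do):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()                 # assumes each feature is Gaussian within each class
clf.fit(X_train, y_train)          # "training" just computes per-class means and variances
print(clf.score(X_test, y_test))   # accuracy on held-out data, typically above 0.9 on iris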

Disadvantages:

  • It cannot handle complicated relationships between predictors: because of the independence assumption, it can learn each feature's individual importance but can't capture interactions among features.

