Artificial Neural Network Project 1 – Churn Modelling

Aim:
The main purpose of this post is to show how an Artificial Neural Network, wrapped in Keras' KerasClassifier(), can be used to predict a target variable, and how we can tune the parameters of the network to obtain a better model.

Dataset: 
For this purpose, I have used the Churn Modelling dataset, in which the model has to predict whether the customer will leave the bank or not.

Initially, I will build 7 models with different network structures and select the best model for further tuning.
Common parameters for each model:
1. Epochs = 150
2. Batch_size = 10
3. Optimizer = adam
4. Activation function = relu
5. Network Weight Initialization = uniform

The dataset was divided into a 75% training set and a 25% validation set.
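As a reference, here is a minimal preprocessing and split sketch. It assumes the standard Kaggle Churn_Modelling.csv file and its usual column names (RowNumber, CustomerId, Surname, ..., Exited); the exact code used in this project may differ:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumed file/column names from the standard Kaggle Churn Modelling dataset
data = pd.read_csv('Churn_Modelling.csv')
y = data['Exited'].values                                   # 1 = customer left the bank
X = data.drop(['RowNumber', 'CustomerId', 'Surname', 'Exited'], axis=1)
X = pd.get_dummies(X, columns=['Geography', 'Gender'], drop_first=True)

# 75% training / 25% validation split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale the features before feeding them to the network
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)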

Models used:
Model 1:  Sequential Model: a Sequential model is a linear stack of layers.
               Network Structure:  [6 --> 1]


              Accuracy    Loss
Training        0.85      0.37
Validation      0.85      0.38
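A minimal sketch of what Model 1 could look like with the common parameters listed above (not necessarily the exact code from the project; the full code is linked at the end of the post):

from keras.models import Sequential
from keras.layers import Dense

# X_train, X_val, y_train, y_val as prepared in the preprocessing sketch above
# [6 --> 1]: one hidden layer of 6 neurons, one sigmoid output neuron
model = Sequential()
model.add(Dense(6, input_dim=X_train.shape[1],
                kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=150, batch_size=10, verbose=0)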


Model 2:  Base KerasClassifier: Network Structure:  [6 --> 1]


              Accuracy    Loss
Training        0.86      0.35
Validation      0.85      0.35
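Model 2 wraps the same [6 --> 1] network in KerasClassifier so it behaves like a scikit-learn estimator, which is what later makes grid-search tuning possible. A rough sketch, assuming the old keras.wrappers.scikit_learn module (newer setups use scikeras.wrappers instead):

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier   # scikeras.wrappers in newer setups

def build_base_model():
    # Same [6 --> 1] structure as Model 1, expressed as a build function
    model = Sequential()
    model.add(Dense(6, input_dim=X_train.shape[1],
                    kernel_initializer='uniform', activation='relu'))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

clf = KerasClassifier(build_fn=build_base_model, epochs=150, batch_size=10, verbose=0)
clf.fit(X_train, y_train)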




Model 3:  KerasClassifier Small Model: Network Structure:  [3 --> 1]


              Accuracy    Loss
Training        0.83      0.40
Validation      0.84      0.41




Model 4:  KerasClassifier Wide Model:  Network Structure:  [12 --> 1]

              Accuracy    Loss
Training        0.86      0.35
Validation      0.85      0.36




Model 5:  KerasClassifier Large Model:  Network Structure:  [6 --> 3 --> 1]

              Accuracy    Loss
Training        0.79      0.51
Validation      0.80      0.50





Model 6:  KerasClassifier Deep Model: Network Structure:  [6 --> 3 --> 2 --> 1]


              Accuracy    Loss
Training        0.79      0.51
Validation      0.80      0.50





Model 7:  KerasClassifier Wide and Deep Model:
               Network Structure:  [12 --> 6 --> 3 --> 1]


              Accuracy    Loss
Training        0.84      0.40
Validation      0.83      0.41
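Models 3 to 7 only change the hidden-layer sizes, so they could all be produced by one build function that takes the structure as an argument. A sketch of that idea (the hidden_layers argument and its values are illustrative, not the project's actual code):

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

def build_model(hidden_layers=(6,)):
    # (3,) small, (12,) wide, (6, 3) large, (6, 3, 2) deep, (12, 6, 3) wide and deep
    model = Sequential()
    model.add(Dense(hidden_layers[0], input_dim=X_train.shape[1],
                    kernel_initializer='uniform', activation='relu'))
    for units in hidden_layers[1:]:
        model.add(Dense(units, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

wide_and_deep = KerasClassifier(build_fn=build_model, hidden_layers=(12, 6, 3),
                                epochs=150, batch_size=10, verbose=0)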





Best Models:
From the mean accuracy scores and model loss, two models have performed better than the others: Model 2 (Base KerasClassifier) and Model 4 (Wide Model).

Let's look at some prediction diagnostics of these two models.







Looking at the recalls, Model 4 performs better at predicting class 1 (customers who leave): about 52%, compared to 38% for Model 2.
The overall precision, recall and f1-score are also better for Model 4.
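These per-class precision/recall/f1 diagnostics on the validation set could be reproduced along the following lines (clf being whichever fitted wrapper is inspected):

from sklearn.metrics import classification_report

# clf: a fitted KerasClassifier (Model 2 or Model 4); X_val, y_val from the split above
y_pred = clf.predict(X_val)
print(classification_report(y_val, y_pred, digits=2))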

Let's tune Model 4 to obtain a better model:


Parameters tuned, values tried, and the best parameter with its score:

1. Epochs and batch size
   Tried: Epochs = [50, 100, 150], Batch_size = [10, 20, 40, 60, 80, 100]
   Best:  Epochs = 50, Batch_size = 60   (score: 85.93)

2. Optimizer
   Tried: ['SGD', 'Adam', 'RMSprop', 'Adagrad', 'Adadelta', 'Adamax', 'Nadam']
   Best:  Nadam   (score: 85.74)

3. Learning rate
   Tried: [0.001, 0.01, 0.1, 0.2, 0.3]
   Best:  0.01   (score: 85.70)

4. Network weight initialization
   Tried: ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
   Best:  glorot_normal   (score: 85.96)

5. Neuron activation function
   Tried: ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
   Best:  relu   (score: 85.84)

6. Dropout and weight constraint
   Tried: dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6], weight_constraint = [1, 2, 3, 4, 5]
   Best:  dropout_rate = 0.6, weight_constraint = 4   (score: 86.05)

7. Number of neurons
   Tried: [1, 5, 10, 15, 20, 25, 30, 40]
   Best:  40   (score: 85.70)
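Each row above is a separate tuning step over the wrapped Model 4, presumably run with scikit-learn's GridSearchCV. The sketch below shows the pattern for the first step (epochs and batch size), with a build function that also exposes dropout and weight constraint; the helper names, defaults and cv/scoring settings are illustrative assumptions, not necessarily what the project used:

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.constraints import maxnorm
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_wide_model(optimizer='adam', init='uniform', activation='relu',
                     neurons=12, dropout_rate=0.0, weight_constraint=5):
    # Wide [12 --> 1] model with each tunable setting exposed as an argument
    model = Sequential()
    model.add(Dense(neurons, input_dim=X_train.shape[1], kernel_initializer=init,
                    activation=activation, kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# First tuning step: epochs and batch size. The other rows follow the same
# pattern, with the matching build_wide_model argument placed in param_grid.
model = KerasClassifier(build_fn=build_wide_model, verbose=0)
param_grid = {'epochs': [50, 100, 150],
              'batch_size': [10, 20, 40, 60, 80, 100]}
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='accuracy', cv=3)
grid_result = grid.fit(X_train, y_train)
print(grid_result.best_score_, grid_result.best_params_)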

Final Model:





Although the overall precision, recall and f1-score of the tuned model remained the same as for the base model, the model accuracy and model loss graphs of the final model show that it performs better on the validation data, even better than on the training data.

We are able to correctly predict 45% of the people who are going to leave and 97% of the people who will stay, which works out to an overall (support-weighted) recall of 86%.
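Those per-class figures can be read off the confusion matrix on the validation set, roughly as follows (final_model stands for the tuned classifier refitted with the best parameters above; the name is illustrative):

from sklearn.metrics import confusion_matrix

y_pred = final_model.predict(X_val)
cm = confusion_matrix(y_val, y_pred)

stay_recall  = cm[0, 0] / cm[0].sum()             # recall for class 0 (stays), ~0.97 here
churn_recall = cm[1, 1] / cm[1].sum()             # recall for class 1 (leaves), ~0.45 here
overall      = (cm[0, 0] + cm[1, 1]) / cm.sum()   # support-weighted recall, ~0.86 here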

To view the full code for this project on GitHub, click here.
