Posts

Showing posts from October, 2017

Cluster Analysis - Project 1

Problem Statement: In this project, we mainly concentrate on clustering customers from the YPedia homepage search data, finding insights and patterns in customer behavior, and answering a few business-related questions.

Section 1: Handling Missing Data
Section 2: Exploratory Data Analysis and Feature Engineering
Section 3:
    3.1 Find the under-performing and out-performing marketing channels.
    3.2 Perform an A/B test on the out-performing channels.
Section 4: K-means Clustering
    4.1 What is the optimal number of clusters for this data?
    4.2 Descriptive analysis of all the clusters in terms of booking rate.
    4.3 What are the important features that best describe 95% of the variance for each cluster?
Section 5: What leads to a higher chance of booking for individuals in each cluster?

About the Dataset: The YPedia home
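For question 4.1 in the outline above, one common heuristic is the elbow method: run k-means for several values of k and look for the point where the within-cluster sum of squares (inertia) stops dropping sharply. The excerpt does not say which method the post actually uses, so the following is only a minimal sketch under that assumption. It runs on made-up 2-D toy points rather than the YPedia data, and the helper names (sq_dist, farthest_first_centers, kmeans_inertia) are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_first_centers(points, k):
    """Deterministic init: start at the first point, then repeatedly add
    the point that is farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans_inertia(points, k, max_iters=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centers = farthest_first_centers(points, k)
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            (sum(x for x, _ in members) / len(members),
             sum(y for _, y in members) / len(members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Toy data (an assumption for illustration): three tight 3x3 grids of points.
offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]

inertia = {k: kmeans_inertia(points, k) for k in range(1, 7)}
for k in sorted(inertia):
    print(k, round(inertia[k], 2))
```

On this toy data the inertia falls steeply up to k = 3 and then flattens, which is the "elbow" the heuristic looks for. On real data the bend is often less clear, so it is usually read alongside other criteria rather than on its own.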

Natural Language Processing - Project 1

Introduction: The objective of this project is to “Predict the Rating” of food products with the help of customers’ reviews.

Dataset: The Amazon Fine Food Reviews dataset consists of 568,454 food reviews that Amazon users left up to October 2012. The features of this dataset are as follows:

Since the provided dataset is very large, only 10% of it, i.e. 56,846 customer reviews, was used for this project in order to reduce training time. NOTE: within this 10% sample, the share of each “Score Category” is similar to its share in the whole dataset.

The features that will be used for this project are:
    Text - the customer’s review
    Score - a rating between 1 and 5

Exploratory Data Analysis: After analyzing the scores (ratings) given by customers, it was discovered that the majority of the ratings, about 80%, belong to Star Categories 4 and 5. Then, WordClouds were constructed to analyze words, i.e. how customers d
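The 10% subset described above keeps each score category at roughly the same share as in the full data. One way to get that property is stratified sampling: draw the same fraction from every score group. The excerpt does not show how the post builds its subset, so this is only a sketch under that assumption. The stratified_sample helper and the toy review counts are hypothetical and are not taken from the Amazon dataset.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Draw about `fraction` of the rows from each group defined by key(row)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sample = []
    for label in sorted(groups):
        members = groups[label]
        # Keep at least one row per group so no category disappears.
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


# Toy (text, score) pairs, skewed toward 4 and 5 stars (made-up counts).
reviews = [("review %d" % i, score)
           for score, count in [(1, 100), (2, 50), (3, 80), (4, 140), (5, 630)]
           for i in range(count)]

subset = stratified_sample(reviews, key=lambda r: r[1], fraction=0.10)

full_counts = Counter(score for _, score in reviews)
subset_counts = Counter(score for _, score in subset)
for score in sorted(full_counts):
    print(score,
          full_counts[score] / len(reviews),
          subset_counts[score] / len(subset))
```

Sampling within each group, rather than from the pooled rows, is what keeps the rare low-score categories from being under-represented by chance in the smaller subset.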