A detailed step-by-step explanation of performing customer segmentation on an online retail dataset using Python, focusing on cohort analysis, understanding purchase patterns using RFM analysis, and clustering.

Photo by Markus Spiske on Unsplash

In this article, I explain how to carry out customer segmentation and related analyses on online retail data using Python.

This is going to get a bit long, so feel free to read a few sections at a time and come back later.

Before going into the definition of customer segmentation, let us take a look at how online retail works and what the associated data looks like. When a person goes into a retail store and purchases a few items, the following basic data points should be generated:

  1. Customer Name/Customer ID

A detailed explanation of assessing the quality of clustering performance and finding the optimal value of the number of clusters using the K-means algorithm.

Photo by Franki Chamaki on Unsplash

Clustering is a commonly used unsupervised machine learning technique that allows us to find patterns within data without having an explicit target variable. In simple terms, clustering is the grouping of unlabelled data. Clustering analysis uses similarity metrics to group data points that are close to each other and to separate those that are farther apart. It is a widely used technique for market segmentation, pattern recognition, and image processing.
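To make the idea of grouping nearby points concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is an illustration only, not the implementation used in the article: the toy points, the `kmeans` helper, and the choice of the first `k` points as starting centroids are all assumptions made to keep the example small and deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Group points into k clusters with Lloyd's algorithm (illustrative sketch)."""
    # Assumption: use the first k points as initial centroids so the run is deterministic.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid,
        # using Euclidean distance as the similarity metric.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical toy data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

Points near (1, 1) end up in one cluster and points near (8, 8) in the other, without any labels being supplied. In practice a library implementation with randomized or smarter initialization would usually be preferred over a hand-rolled loop like this one.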

While there are many metrics, like classification accuracy, which one can use to evaluate a labeled data problem, for a clustering problem we have to understand how well the data is…

The telecommunications industry faces intense competition among service providers, which generally offer very similar services at very similar price points. In such a competitive market, it is essential for a telecom company to analyze the rate at which its customers stop using its services, so that it can better engage with those customers and understand their grievances.

Source: https://mspark.com/direct-mail-works-for-telecom-providers/

Customer churn is a major problem that many companies face today. It affects not only the short-term but also the long-term revenues of companies. …

This one is about Talent versus Hard Work.

Maybe you’ve already read about it or heard it, but if you can still think of even a single person on this planet as being talented, read on:

We have all seen people who say or think, ‘I can never be as good as he is at some task. He is a natural. He was born talented.’

Well, let me tell you that there is no such thing as Talent.

There is only Hard Work.


People who seem to be excellent and natural at doing something have toiled day in…

The amount of time that teens and adults spend on social media is increasing day by day. With the technological revolution giving us faster internet speeds, billions of people can now access data-intensive apps like YouTube and Instagram more easily than ever before.

Therefore, it is essential that we start talking about the time we spend on social media and how it changes our personality over time.

Time spent on Social Media

We live in a world where our ‘real’ lives are overtaken by the virtual world. People are sharing more and more posts on Instagram, Facebook…

Rahul Khandelwal

Data Science enthusiast | Writer
