Auto-Segmentation - how to use machine learning to generate customer segments with deeper insights for increased turn-around and improved ROI. ## What is Auto-Segmentation? Auto-Segmentation is essentially a marketing application of the K-Means clustering machine learning model. It works fairly simply — you feed the ML model all the data you have about your customers, choose how many groups you would like, and the model assigns a cluster label for each customer you feed in. For example, if I have 1000 customers and I want to devise three different marketing strategies, I would use an Auto-Segmentation model to divide the users into three separate groups, so I can design an optimal marketing plan for each group. This enables end users such as marketers, campaign managers, business strategists to achieve some of the following business outcomes: - **Improve targeted marketing and personalization** - having several distinct audiences allows marketers to customize and apply different marketing techniques to each group - **A/B testing** — compare performance across groups and discover which types of audiences perform best for various marketing campaigns and conversion goals. - **Improve ROMS / Marketing KPIs** – adjust marketing spend toward the audiences that perform better over time to achieve higher return on marketing spend and improve other KPIs important to your business. - **Understand your customers better** – analyze the distinct characteristics (attributes and behaviors) of each auto-segmentation cluster to discover the repeatable patterns and the business meaning that each group is defined by. This enables a deeper dive into each important customer group, rather than analyzing your customer base as a whole. ## How Does Auto-Segmentation Work? While there are different methods of performing auto-segmentation, the concept behind all these methods is fairly similar. Data points (customers) are grouped based on similarity and the ML model aims to get the most similar personas into each cluster and at the same time ensure that the groups formed are as different from each other as possible in terms of their combined characteristics. In other words, K-means aims to find the clusters that are most homogeneous within themselves and most different from each other. For example, if the auto-segmentation generates three distinct groups of customers, there is statistical certainty the customers in Group 1 will be most similar to each other, and less similar to customers in Groups 2 and 3. ## How is Treasure Data’s Auto-Segmentation Model Different from the classic K-Means clustering approach? Our goal at Treasure Data is to enable our customers to gain as much intelligence and business value from their data as possible in a way that is customized to their specific use cases and targeted outcomes. Thus, we have modified our approach to Auto-Segmentation to better serve the needs of marketing teams and business users by allowing them to pick a list of user characteristics they want to prioritize for the ML model when discovering audiences. ### “Supervising” an Unsupervised Model At Treasure Data, we set out to make a segmentation model that is a little more controllable and hopefully much more useful for marketers on our platform. Typically, marketers already have some idea of the different kinds of customers to target. The idea here is to simply nudge the Machine Learning model to prioritize a set of characteristics during the grouping process. After months of testing, we found a simple solution — to just scale values of more relevant characteristics or features up. Mathematically, by scaling a feature up, it makes a clustering algorithm more sensitive to it. Take this simple but extreme example where we try to split four data points into two groups based on two characteristics, amount spent, and whether they are based in the US. (Note: The features are Min-Max scaled to range between 0 and 1. Features should always be scaled in some way when fed into any distance-based model, so feature ranges don’t bias the outcome.)  Dividing 4 points into two groups based on distance. Panel on the left shows the original data values. Left: original input data. Right: amount spent is multiplied by a factor of 3. Now, by simply multiplying the value of the amount spent by an arbitrary positive integer, we see how the relative distance between points change, splitting the data by a more “relevant” feature. In the oversimplified example shown previously, only after multiplying the amount spent by 3 (see that values along the y-axis stay the same), the groups end up being split via **higher spend vs lower spend**, opposed being split based on whether the customer is based in or out of the US. Using simulated test data, we ran two versions of our auto-segmentation model: one fully unsupervised, and one with RFM (recency, frequency, monetary) and a customer’s next best channel prediction prioritized. We then measure the information gain of all features (i.e. how much a feature contributes to the separation of groups) and the resulting bar graphs comparing the two models. 