how math can solve so many problems in the real world. When I was in grade school, I certainly didn't see it that way. I never hated math, by the way, nor did I have trouble learning most of the basic concepts.
However, I confess that for most of the classes beyond classic arithmetic, I usually thought, "I will never use that for anything in my life".
Those were other times, though. There was no Internet, no data science, and computers were barely a thing. But time passes. Life happens, and we get to see the day when we can solve important business problems with good old math!
In this post, we will use the famous linear regression for a different problem: predicting customer churn.
Linear Regression vs Churn
Customer churn rarely happens overnight. In many cases, customers gradually reduce their purchasing frequency before stopping completely. Some call that silent churn [1].
Predicting churn can be done with traditional churn models, which (1) require labeled churn data; (2) are sometimes complex to explain; (3) detect churn after it has already happened.
On the other hand, this project shows a different solution, answering a simpler question:
Is this customer slowing down the shopping?
This question is answered with the following logic.
We use monthly purchase trends and linear regression to measure customer momentum over time. If the customer keeps increasing their expenses, the monthly totals will grow over time, leading to an upward trend (or a positive slope in a linear regression, if you will). The opposite is also true. Lower transaction amounts will add up to a downtrend.
Let's break down the logic into small steps and understand what we will do with the data:
- Aggregate customer transactions by month
- Create a continuous time index (e.g. 1, 2, 3…n)
- Fill missing months with zero purchases
- Fit a linear regression line
- Use the slope (converted to degrees) to quantify buying behavior
- Analysis: A negative slope indicates declining engagement. A positive slope indicates increasing engagement.
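To make the idea concrete before the full implementation, here is a minimal sketch of the core step on made-up numbers. The helper name `monthly_slope` and both spending series are hypothetical and only serve as an illustration; they are not part of the project's code.

```python
import numpy as np
from scipy import stats

def monthly_slope(monthly_totals):
    """Slope of monthly totals regressed on a 1..n time index (hypothetical helper)."""
    period_idx = np.arange(1, len(monthly_totals) + 1)
    return stats.linregress(period_idx, monthly_totals).slope

# Made-up monthly spend for two imaginary customers
growing = [100, 120, 90, 150, 170, 160]
fading = [200, 180, 190, 120, 60, 0]

print(monthly_slope(growing) > 0)  # True: spending trends upward
print(monthly_slope(fading) < 0)   # True: spending trends downward
```

Notice that the growing customer has a weak third month and the slope is still positive. The regression looks at the whole period, not at single months.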
Well, let's move on to the implementation next.
Code
The first thing is importing some modules into a Python session.
# Imports
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
Then, we will generate some data that simulates customer transactions. You can look at the complete code in this GitHub repository. The generated dataset has the columns customer_id, transaction_date, and total_amt, and will look like the next picture.
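If you just want something to play with, the sketch below builds a dataset with the same three columns. Keep in mind this is my own assumption of what such a generator could look like: the function `make_transactions`, its parameters, and the value ranges are hypothetical, and the generator in the repository may work differently.

```python
import numpy as np
import pandas as pd

def make_transactions(n_customers=20, n_tx=15, year=2024, seed=42):
    """Build random transactions for one calendar year (hypothetical generator)."""
    rng = np.random.default_rng(seed)
    days = pd.date_range(f"{year}-01-01", f"{year}-12-31", freq="D")
    frames = []
    for i in range(1, n_customers + 1):
        # Pick random days of the year for this customer's transactions
        picked = days[rng.integers(0, len(days), size=n_tx)]
        frames.append(pd.DataFrame({
            "customer_id": f"C_{i:03d}",
            "transaction_date": picked,
            "total_amt": rng.uniform(10, 500, size=n_tx).round(2),
        }))
    out = pd.concat(frames, ignore_index=True)
    return out.sort_values("transaction_date").reset_index(drop=True)

df = make_transactions()
print(df.head())
```

Because the amounts are uniformly random, these customers will not show strong trends. The point here is only the shape of the data.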
Now we will create a new column that extracts the month from the date, so it becomes easier for us to group the data later.
# Create new column month
df['mth'] = df['transaction_date'].dt.month
# Group customers by month
df_group = (
    df
    .groupby(['mth','customer_id'])
    ['total_amt']
    .sum()
    .reset_index()
)
Here is the result.

If we quickly check whether there are customers who haven't made a transaction every month, we will find a few cases.
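One way to run that quick check is to count the distinct months per customer. The sketch below uses a tiny made-up table with the same columns as df_group and only three months, so the values and the `n_months` variable are assumptions for illustration. With the real data you would compare against 12.

```python
import pandas as pd

# Toy stand-in for df_group (made-up values)
df_group = pd.DataFrame({
    "mth": [1, 2, 3, 1, 3],
    "customer_id": ["C_001", "C_001", "C_001", "C_002", "C_002"],
    "total_amt": [50.0, 20.0, 35.0, 80.0, 10.0],
})
n_months = 3  # the toy calendar has 3 months; the article's data has 12

# Count distinct months with purchases for each customer
months_per_customer = df_group.groupby("customer_id")["mth"].nunique()

# Keep only the customers with at least one month missing
incomplete = months_per_customer[months_per_customer < n_months]
print(incomplete)
```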
That leads us to the next point. If a customer doesn't have at least one purchase in a given month, we have to add that month with a $0 expense.
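As a side note, one compact way to do this fill is the `reindex` method from Pandas. This is an alternative sketch on made-up numbers, not the approach used in the function built in this post, which relies on a merge.

```python
import pandas as pd

# Made-up monthly totals for a customer who bought only in months 1, 2 and 5
monthly = pd.Series({1: 120.0, 2: 80.0, 5: 40.0}, name="total_amt")

# Reindex against all 12 months; months without purchases get 0
full_year = (
    monthly
    .reindex(range(1, 13), fill_value=0.0)
    .rename_axis("period_idx")
)
print(full_year)
```

Both routes end in the same place: one row per month, with zeros where the customer bought nothing.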
Let's build a function that can do that and also calculate the slope of the customer's shopping trend.
This function looks enormous, but we will go over it in smaller chunks. Let's do this.
- Filter the data for a given customer using the Pandas query() method.
- Make a quick group and check if the customer has at least one purchase for every month.
- If not, we will add the missing months with a $0 expense. I implemented this by merging a temporary dataframe, holding the 12 months and $0, with the original data. After the merge on months, the missing periods will be rows with NaN in the original data column, which can be filled with $0.
- Then, we normalize the axes. Remember that the X-axis is an index from 1 to 12, but the Y-axis is the expense amount, in thousands of dollars. So, to avoid distortion in our slope, we normalize everything to the same scale, between 0 and 1. For that, we use the custom function min_max_standardize.
- Next, we can plot the regression using another custom function.
- Then we will calculate the slope, which is the first result returned from the function scipy.stats.linregress().
- Finally, to calculate the angle of the slope in degrees, we will appeal to pure mathematics, using the concept of arc tangent to calculate the angle between the X-axis and the linear regression line. In Python, just use the functions np.arctan() and np.degrees() from numpy.
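To see why the normalization step matters, the sketch below compares the angle with and without it on a made-up spending series. The names `angle_degrees` and `min_max`, and the numbers, are hypothetical and exist only for this comparison.

```python
import numpy as np
from scipy import stats

def angle_degrees(x, y):
    """Angle between the X-axis and the fitted regression line, in degrees."""
    slope = stats.linregress(x, y).slope
    return np.degrees(np.arctan(slope))

def min_max(vals):
    """Rescale values to the 0-1 range."""
    vals = np.asarray(vals, dtype=float)
    return (vals - vals.min()) / (vals.max() - vals.min())

months = np.arange(1, 13)
spend = 1000.0 + 50.0 * months  # made-up: spend grows by $50 every month

raw_angle = angle_degrees(months, spend)
scaled_angle = angle_degrees(min_max(months), min_max(spend))

print(f"{raw_angle:.1f}")     # 88.9, almost vertical because of the dollar scale
print(f"{scaled_angle:.1f}")  # 45.0
```

On raw dollars, nearly any growing customer lands close to 90 degrees, which tells us very little. One consequence of min-max scaling is worth keeping in mind, though: a perfectly steady increase always maps to 45 degrees, no matter the dollar amounts. So the angle describes the direction and consistency of the trend, not how much money is involved.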

# Standardize the data
def min_max_standardize(vals):
    return (vals - np.min(vals)) / (np.max(vals) - np.min(vals))

#------------

# Quick function to plot the regression
def plot_regression(x, y, cust):
    # Fit once and reuse the result for the line and the title
    reg = stats.linregress(x, y)
    plt.scatter(x, y, color='grey')
    plt.plot(x,
             reg.slope * np.array(x) + reg.intercept,
             color='pink',
             linestyle='--')
    plt.suptitle("Slope of the Linear Regression [Expenses x Time]")
    plt.title(f"Customer {cust} | Slope: {np.degrees(np.arctan(reg.slope)):.0f} degrees. Positive = Buying more | Negative = Buying less", size=9, color='grey')
    plt.show()

#-----

def get_trend_degrees(customer, plot=False):
    # Filter the data
    one_customer = df.query('customer_id == @customer')
    one_customer = one_customer.groupby('mth').total_amt.sum().reset_index().rename(columns={'mth':'period_idx'})

    # Check if all months are in the data
    cnt = one_customer.groupby('period_idx').period_idx.nunique().sum()

    # If not, add 0 to the months without transactions
    if cnt < 12:
        # Create a DataFrame with all 12 months
        all_months = pd.DataFrame({'period_idx': range(1, 13), 'total_amt': 0})
        # Merge with the existing one_customer data.
        # Use a 'left' merge to keep all 12 months from 'all_months' and fill missing total_amt.
        one_customer = pd.merge(all_months, one_customer, on='period_idx', how='left', suffixes=('_all', ''))
        # Combine the total_amt columns, preferring the actual data over the 0 from all_months
        one_customer['total_amt'] = one_customer['total_amt'].fillna(one_customer['total_amt_all'])
        # Drop the temporary _all column
        one_customer = one_customer.drop(columns=['total_amt_all'])
        # Sort by period_idx to ensure correct order
        one_customer = one_customer.sort_values(by='period_idx').reset_index(drop=True)

    # Min Max Standardization
    X = min_max_standardize(one_customer['period_idx'])
    y = min_max_standardize(one_customer['total_amt'])

    # Plot
    if plot:
        plot_regression(X, y, customer)

    # Calculate slope
    slope = stats.linregress(X, y)[0]

    # Calculate angle in degrees
    angle = np.arctan(slope)
    angle = np.degrees(angle)

    return angle
Great. It's time to put this function to the test. Let's get two customers:
- C_014.
- This is an uptrend customer who is buying more over time.
# Example of a strong customer
get_trend_degrees('C_014', plot=True)
The plot it yields shows the trend. We notice that, even though there are some weaker months in between, overall the amounts tend to increase as time passes.

The trend is 32 degrees, thus pointing well up, indicating a strong relationship with this customer.
- C_003.
- This is a downtrend customer who is buying less over time.
# Example of a customer who is buying less
get_trend_degrees('C_003', plot=True)

Here, the expenses over the months are clearly decreasing, making the regression line point down. The line is at negative 29 degrees, indicating that this customer is moving away from the brand and needs to be stimulated to come back.
Before You Go
Well, that is a wrap. This project demonstrates a simple, interpretable approach to detecting declining customer purchase behavior using linear regression.
Instead of relying on complex churn models, we analyze purchase trends over time to identify when customers are slowly disengaging.
This simple model can give us a good notion of where the customer is heading, whether toward a better relationship with the brand or away from it.
Certainly, with other data from the business, it is possible to improve this logic, apply a tuned threshold, and quickly identify potential churners every month, based on past data.
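As a rough idea of what that thresholding could look like, here is a minimal sketch under stated assumptions. The `trend_angle` helper is a compact stand-in for the function built earlier, the spending table is made up, and the cut-off of -15 degrees is purely hypothetical. A real threshold would have to be tuned against past business data.

```python
import numpy as np
import pandas as pd
from scipy import stats

def trend_angle(monthly_totals):
    """Trend angle in degrees after min-max scaling both axes (hypothetical helper)."""
    y = np.asarray(monthly_totals, dtype=float)
    x = np.arange(1, len(y) + 1, dtype=float)
    if y.max() == y.min():
        return 0.0  # flat spending: scaling would divide by zero, so treat as no trend
    x_n = (x - x.min()) / (x.max() - x.min())
    y_n = (y - y.min()) / (y.max() - y.min())
    return float(np.degrees(np.arctan(stats.linregress(x_n, y_n).slope)))

# Made-up monthly totals: one column per customer, one row per month
spend = pd.DataFrame({
    "C_A": [100, 120, 130, 150, 170, 200],
    "C_B": [300, 280, 200, 150, 60, 0],
    "C_C": [90, 90, 90, 90, 90, 90],
}, index=range(1, 7))

THRESHOLD_DEGREES = -15  # hypothetical cut-off, to be tuned with real data

angles = spend.apply(trend_angle)  # one angle per customer
at_risk = angles[angles < THRESHOLD_DEGREES].index.tolist()
print(at_risk)  # ['C_B']
```

The guard for flat spending is there because min-max scaling divides by the range of the values, which is zero when a customer spends the same amount every month.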
Before wrapping up, I want to give proper credit to the original post that inspired me to learn more about this implementation. It is a post from Matheus da Rocha that you can find here, in this link.
Finally, find more about me on my website.
GitHub Repository
Here you will find the full code and documentation.
https://github.com/gurezende/Linear-Regression-Churn/tree/main
References
[2] NumPy arctan: https://numpy.org/doc/2.1/reference/generated/numpy.arctan.html
[3] Arctan explanation: https://www.cuemath.com/trigonometry/arctan/
[4] NumPy degrees: https://numpy.org/doc/2.1/reference/generated/numpy.degrees.html
[5] SciPy linregress: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
