
    EDA in Public (Part 3): RFM Analysis for Customer Segmentation in Pandas

    By ProfitlyAI, January 1, 2026 (14 min read)


    If you’ve been following along, we’ve come a long way. In Part 1, we did the “dirty work” of cleaning and prepping.

    In Part 2, we zoomed out to a high-altitude view of NovaShop’s world, spotting the big storms (high-revenue countries) and the seasonal patterns (the huge Q4 rush).

    But here’s the thing: a business doesn’t actually sell to “months” or “countries.” It sells to human beings.

    If you treat every customer exactly the same, you’re making two very expensive mistakes:

    • Over-discounting: Giving a “20% off” coupon to someone who was already reaching for their wallet.
    • Ignoring the “Quiet” Ones: Failing to notice when a formerly loyal customer stops visiting, until they’ve been gone for six months and it’s too late to win them back.

    The Solution? Behavioural Segmentation.

    Instead of guessing, we’re going to use the data to let the customers tell us who they are. We do this using the gold standard of retail analytics: RFM Analysis.

    • Recency (R): How recently did they buy? (Are they still engaged with us?)
    • Frequency (F): How often do they buy? (Are they loyal, or was it a one-off?)
    • Monetary (M): How much do they spend? (What’s their total business impact?)

    By the end of this part, we’ll move beyond “Top 10 Products” and actually assign a specific, actionable label to every single customer in NovaShop’s database.

    Data Preparation: The “Missing ID” Pivot

    Before we can start scoring, we have to address a decision we made back in Part 1.

    If you remember our Initial Inspection, we noticed that about 25% of our rows were missing a CustomerID. At the time, we made a strategic business decision to keep those rows. We needed them to calculate the accurate total revenue and see which products were popular overall.

    For RFM analysis, the rules change. You cannot track behaviour without a consistent identity. We can’t know how “frequent” a customer is if we don’t know who they are!

    So, our first step in Part 3 is to isolate our “Trackable Universe” by filtering for rows where a CustomerID exists.
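The filtering step is not shown in the text, so here is a minimal sketch of one way to do it. The toy dataframe, the name df_rfm, and the dropped_share check are assumptions made for illustration; the later snippets in this article keep using the name df.

```python
import pandas as pd

# Toy stand-in for the cleaned NovaShop data; values are illustrative only.
df = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "536367", "536368"],
    "CustomerID": [17850.0, None, 13047.0, None],
    "Revenue": [15.30, 22.00, 54.08, 7.50],
})

# Keep only rows that can be tied to a known customer.
# df_rfm is a hypothetical name, not one used in the original article.
df_rfm = df.dropna(subset=["CustomerID"]).copy()

# Share of rows set aside for this analysis (a quick sanity check).
dropped_share = 1 - len(df_rfm) / len(df)
print(f"Trackable rows: {len(df_rfm)} of {len(df)} ({dropped_share:.0%} dropped)")
```

Working on a copy leaves the full dataset intact for revenue-level questions, which is the reason those rows were kept in Part 1.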

    Engineering the RFM Metrics

    Now that we have a dataset where every row is linked to a specific customer, we need to aggregate all their individual transactions into three summary numbers: Recency, Frequency, and Monetary.

    Defining the Snapshot Date

    Before calculating RFM, we need a reference point in time, commonly known as the snapshot date.

    Here, we take the most recent transaction date in the dataset and add one day. This snapshot date represents the moment at which we’re evaluating customer behaviour.

    import datetime as dt  # needed for dt.timedelta

    snapshot_date = df['InvoiceDate'].max() + dt.timedelta(days=1)

    We added one day, so customers who bought on the most recent date still have a Recency value of 1 day, not 0. This keeps the metric intuitive and avoids edge-case issues.
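To make the effect of the one-day offset concrete, here is a small sketch comparing the two definitions. The dates are made up for illustration and are not taken from the NovaShop data.

```python
import datetime as dt
import pandas as pd

# Illustrative invoice dates, not taken from the NovaShop data.
dates = pd.to_datetime(pd.Series(["2011-12-01", "2011-12-09", "2011-11-15"]))

snapshot_with_offset = dates.max() + dt.timedelta(days=1)
snapshot_without_offset = dates.max()

# Recency of the most recent purchase under each definition.
recency_with = (snapshot_with_offset - dates.max()).days
recency_without = (snapshot_without_offset - dates.max()).days
print(recency_with, recency_without)  # 1 0
```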

    Aggregating Transactions at the Customer Level

    rfm = df.groupby('CustomerID').agg({
        # Recency: days between the snapshot date and the last purchase
        'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
        # Frequency: number of distinct invoices
        'InvoiceNo': 'nunique',
        # Monetary: total revenue across all transactions
        'Revenue': 'sum'
    })

    Each row in our dataset represents a single transaction. To calculate RFM, we need to collapse these transactions into one row per customer.

    We do this by grouping the data by CustomerID and applying different aggregation functions:

    • Recency: For each customer, we find their most recent purchase date and calculate how many days have passed since then.
    • Frequency: We count the number of unique invoices associated with each customer. This tells us how often they’ve made purchases.
    • Monetary: We sum the total revenue generated by each customer across all transactions.

    Renaming Columns for Clarity

    rfm.rename(columns={
        'InvoiceDate': 'Recency',
        'InvoiceNo': 'Frequency',
        'Revenue': 'Monetary'
    }, inplace=True)

    The aggregation step keeps the original column names, which can be confusing. Renaming them makes the dataframe immediately readable and aligns it with standard RFM terminology.

    Now each column clearly answers a business question:

    • Recency → How recently did the customer purchase?
    • Frequency → How often do they purchase?
    • Monetary → How much revenue do they generate?
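As an alternative sketch, the aggregation and renaming can be done in one pass with pandas named aggregation. This is not the approach the article takes, and the toy transactions below are invented for illustration. Note that invoice A1 spans two rows, which is why Frequency counts unique invoices rather than rows.

```python
import datetime as dt
import pandas as pd

# Toy transactions; customer IDs, dates and amounts are made up for illustration.
df = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-01", "2011-12-08", "2011-11-09"]
    ),
    "Revenue": [10.0, 5.0, 20.0, 100.0],
})

snapshot_date = df["InvoiceDate"].max() + dt.timedelta(days=1)

# Named aggregation yields the final column names in one step.
rfm = df.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda x: (snapshot_date - x.max()).days),
    Frequency=("InvoiceNo", "nunique"),
    Monetary=("Revenue", "sum"),
)
print(rfm)
```

Customer 1 ends up with Recency 1, Frequency 2 and Monetary 35.0, while customer 2 has Recency 30, Frequency 1 and Monetary 100.0.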

    Inspecting the Result

    print(rfm.head())

    The final rfm dataframe contains one row per customer, with three intuitive metrics summarizing their behaviour.

    Output:

    Let’s walk through this the way we would with NovaShop in a real conversation.

    “When was the last time this customer bought from us?”

    That’s exactly what Recency answers.

    Take Customer 12347:

    • Recency = 2
    • Translation: “This customer bought something just two days ago.”

    They’re fresh. They remember the brand. They’re still engaged.

    Now compare that to Customer 12346:

    • Recency = 326
    • Translation: “They haven’t bought anything in almost a year.”

    Even though this customer spent a lot in the past, they’re currently silent.

    From NovaShop’s perspective: Recency tells us who’s still listening and who might need a nudge (or a wake-up call).

    “Is this a one-time buyer or someone who keeps coming back?”

    That’s where Frequency comes in.

    Look again at Customer 12347:

    • Frequency = 7
    • They didn’t just buy once; they came back again and again.

    Now look at a few others:

    • Frequency = 1
    • One purchase, then gone.

    From a business perspective, frequency separates casual buyers from loyal customers.

    “Who actually brings in the money?”

    That’s the Monetary column.
    And this is where things get interesting.

    Customer 12346:

    • Monetary = £77,183.60
    • Frequency = 1
    • Recency = 326

    This tells a very specific story:

    A single, very large order… a long time ago… and nothing since.

    Now compare that to Customer 12347:

    • Lower total spend
    • Multiple purchases
    • Very recent activity

    Important insight for NovaShop: A “high-value” customer in the past isn’t necessarily a valuable customer today.

    Why This View Changes the Conversation

    If NovaShop only looked at total revenue, they might focus all their attention on customers like 12346.

    But RFM shows us that:

    • Some customers spent a lot once and disappeared
    • Some spend less but stay loyal
    • Some are active right now and ready to be engaged

    This output helps NovaShop stop guessing and start prioritizing:

    • Who should get retention emails?
    • Who needs reactivation campaigns?
    • Who’s already loyal and should be rewarded?

    Right now, these are still raw numbers.

    In the next step, we’ll rank and score these customers, so NovaShop doesn’t have to interpret rows manually. Instead, they’ll see clear segments like:

    • Champions
    • Loyal Customers
    • At-Risk
    • Lost

    That’s where this becomes a real decision-making tool, not just a dataframe.

    Turning RFM Numbers Into Meaningful Customer Segments

    At this stage, NovaShop has a table full of numbers. Useful, but not exactly decision-friendly.

    A marketing team can’t realistically scan hundreds or thousands of rows asking:

    • Is a Recency of 19 good or bad?
    • Is Frequency = 2 impressive?
    • How much Monetary value is “high”?

    Our goal is to rank customers relative to one another and turn raw values into scores.

    Step 1: Scoring Customers by Each RFM Metric

    Instead of treating Recency, Frequency, and Monetary as absolute values, we look at where each customer stands compared to everyone else.

    • Customers with more recent purchases should score higher
    • Customers who buy more often should score higher
    • Customers who spend more should score higher

    In practice, we do this by splitting each metric into quantiles (usually 4 or 5 buckets).

    However, there’s a small real-world wrinkle. This is something I came across while working on this project.

    In transactional datasets, it’s common to see:

    • Many customers with the same Frequency (e.g. one-time buyers)
    • Highly skewed Monetary values
    • Small samples where quantile binning can fail

    To keep things robust and readable, we’ll wrap the scoring logic in a small helper function.

    def rfm_score(series, ascending=True, n_bins=5):
        # Rank the values so that every customer gets a unique position
        ranked = series.rank(method='first', ascending=ascending)

        # Use pd.qcut on the ranks to assign equally sized bins
        return pd.qcut(
            ranked,
            q=n_bins,
            labels=range(1, n_bins + 1)
        ).astype(int)

    To explain what’s going on here:

    • We’re creating a helper function that turns a raw numeric column into a clean RFM score using quantile-based binning.
    • First, the values are ranked. Instead of binning the raw values directly, we bin their ranks. With method='first', ties are broken by order of appearance, so the ordering is unique even when many customers share the same value (a common situation in RFM data). The trade-off is that customers with identical values can land in different bins.
    • The ascending flag lets us flip the logic depending on the metric: lower Recency is better, while higher Frequency and Monetary values are better.
    • Next, we apply quantile-based binning. qcut splits the ranked values into n_bins equally sized groups. Each customer is assigned a score from 1 to 5 (by default), where the score represents their relative position within the distribution.
    • Finally, the results are converted to integers for easy use in analysis and segmentation.

    In short, this function provides a robust and reusable way to score RFM metrics without running into duplicate bin edge errors, and without overcomplicating the logic.
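To see why ranking first matters, here is a small sketch with an invented Frequency column dominated by one-time buyers. Binning the raw values fails because several quantile edges coincide, while binning the ranks succeeds.

```python
import pandas as pd

# Illustrative Frequency column dominated by one-time buyers.
frequency = pd.Series([1, 1, 1, 1, 1, 1, 1, 2, 3, 7])

# Binning the raw values: several quantile edges coincide at 1.
try:
    pd.qcut(frequency, q=5, labels=range(1, 6))
    raw_binning_failed = False
except ValueError:
    raw_binning_failed = True

# Binning the ranks: rank(method="first") breaks ties by order of appearance.
ranked = frequency.rank(method="first")
scores = pd.qcut(ranked, q=5, labels=range(1, 6)).astype(int)

print(raw_binning_failed)  # True
print(scores.tolist())     # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

Notice that the seven customers with Frequency = 1 receive scores from 1 to 4 purely because of their row order. That is the price of guaranteed equal-sized bins, and it is worth keeping in mind when reading F scores on data with many ties.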

    Step 2: Applying the Scores

    Now we can score each metric cleanly and consistently:

    # Assign R, F, M scores
    rfm['R_Score'] = rfm_score(rfm['Recency'], ascending=False)  # Recent purchases = high score
    rfm['F_Score'] = rfm_score(rfm['Frequency'])  # More frequent = high score
    rfm['M_Score'] = rfm_score(rfm['Monetary'])  # Higher spend = high score

    The one special case here is Recency:

    • Lower values mean more recent activity
    • So we reverse the ranking with ascending=False
    • Everything else follows the natural “higher is better” rule.
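A quick sketch of the reversed ranking on five invented customers. The helper is repeated here only so the snippet runs on its own; the Recency values 2 and 326 echo the two customers discussed earlier, and the rest are made up.

```python
import pandas as pd

def rfm_score(series, ascending=True, n_bins=5):
    # Same helper as above, repeated so this snippet is self-contained.
    ranked = series.rank(method="first", ascending=ascending)
    return pd.qcut(ranked, q=n_bins, labels=range(1, n_bins + 1)).astype(int)

# Five made-up customers, labelled a to e.
recency = pd.Series([2, 326, 19, 40, 100], index=["a", "b", "c", "d", "e"])

# ascending=False gives the largest Recency the lowest rank, hence the lowest score.
r_scores = rfm_score(recency, ascending=False)
print(r_scores.to_dict())  # {'a': 5, 'b': 1, 'c': 4, 'd': 3, 'e': 2}
```

The customer who bought 2 days ago gets the top score of 5, and the one who has been silent for 326 days gets 1.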

    What This Means for NovaShop

    Instead of seeing this:

    Recency = 326
    Frequency = 1
    Monetary = 77,183.60

    NovaShop now sees something like:

    R = 1, F = 1, M = 5

    That’s immediately more interpretable:

    • Not recent
    • Not frequent
    • High spender (historically)

    Step 3: Creating a Combined RFM Score

    Now we combine these three scores into a single RFM code:

    rfm['RFM_Score'] = (
        rfm['R_Score'].astype(str) +
        rfm['F_Score'].astype(str) +
        rfm['M_Score'].astype(str)
    )

    This produces values like:

    • 555 → Best customers
    • 155 → High spenders who haven’t returned
    • 111 → Customers who are likely gone

    Each customer now carries a compact behavioural fingerprint. And we’re not done yet.

    Translating RFM Scores Into Customer Segments

    Raw scores are nice, but let’s be honest: no marketing manager wants to look at 555, 154, or 311 all day.

    NovaShop needs labels that make sense at a glance. That’s where RFM segments come in.

    Step 1: Defining Segments

    Using RFM scores, we can classify customers into meaningful categories. Here’s a common approach:

    • Champions: Top Recency, top Frequency, top Monetary (555): your best customers
    • Loyal Customers: Regular buyers, may not be spending the most, but keep coming back
    • Big Spenders: High Monetary, but not necessarily recent or frequent
    • At-Risk: Used to buy, but haven’t returned recently
    • Lost: Low scores in all three metrics, likely disengaged
    • Promising / New: Recent customers with lower frequency or monetary spend

    This transforms abstract numbers into a narrative that marketing and management can act on.

    Step 2: Mapping Scores to Segments

    Here’s an example using simple conditional logic:

    def rfm_segment(row):
        # Rules are checked in order; the first match wins
        if row['R_Score'] >= 4 and row['F_Score'] >= 4 and row['M_Score'] >= 4:
            return 'Champions'
        elif row['F_Score'] >= 4:
            return 'Loyal Customers'
        elif row['M_Score'] >= 4:
            return 'Big Spenders'
        elif row['R_Score'] <= 2:
            return 'At-Risk'
        else:
            return 'Others'

    rfm['Segment'] = rfm.apply(rfm_segment, axis=1)

    Now each customer has a human-readable label, making it immediately actionable.
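The function above does not assign the “Lost” or “Promising / New” labels described in Step 1. Here is one possible sketch that adds them, using np.select instead of a row-wise apply. The thresholds for those two segments are my own assumptions, not rules from the original analysis, and the scores are invented.

```python
import numpy as np
import pandas as pd

# Made-up scores for seven customers, for illustration only.
rfm = pd.DataFrame({
    "R_Score": [5, 2, 1, 1, 5, 3, 2],
    "F_Score": [5, 4, 1, 2, 1, 3, 3],
    "M_Score": [5, 3, 5, 1, 2, 3, 3],
})

r, f, m = rfm["R_Score"], rfm["F_Score"], rfm["M_Score"]

# Conditions are checked top to bottom; the first match wins.
conditions = [
    (r >= 4) & (f >= 4) & (m >= 4),   # Champions
    f >= 4,                           # Loyal Customers
    m >= 4,                           # Big Spenders
    (r <= 2) & (f <= 2) & (m <= 2),   # Lost (hypothetical threshold)
    r <= 2,                           # At-Risk
    (r >= 4) & (f <= 2),              # Promising / New (hypothetical threshold)
]
labels = ["Champions", "Loyal Customers", "Big Spenders",
          "Lost", "At-Risk", "Promising / New"]

rfm["Segment"] = np.select(conditions, labels, default="Others")
print(rfm)
```

Because the first matching rule wins, “Lost” has to be checked before “At-Risk”; otherwise every low-Recency customer would be labelled At-Risk and the Lost rule would never fire.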

    Let’s review our results using rfm.head().

    Step 3: Turning Segments into Strategy

    With labeled segments, NovaShop can:

    • Reward Champions → Exclusive deals, loyalty points
    • Re-engage Big Spenders & At-Risk customers → Personalised emails or discounts
    • Focus marketing wisely → Don’t waste effort on customers who are truly lost

    This is the moment where data becomes strategy.

    What NovaShop Should Do Next (Key Takeaways & Recommendations)

    At the start of this analysis, NovaShop had a familiar problem:
    Lots of transactional data, but limited clarity on customer behaviour.

    By applying the RFM framework, we’ve turned raw purchase history into a clear, structured view of who NovaShop’s customers are and how they behave.

    Now let’s talk about what to actually do with it.

    1. Protect and Reward Your Best Customers

    Champions and Loyal Customers are already doing what every business wants:

    • They buy recently
    • They buy often
    • They generate consistent revenue

    These customers don’t need heavy discounts; they need recognition.

    Recommended actions:

    • Early access to sales
    • Loyalty points or VIP tiers
    • Personalised thank-you emails

    The goal here isn’t acquisition, it’s retention.

    2. Re-Engage High-Value Customers Before They’re Lost

    The most dangerous segment for NovaShop isn’t “Lost” customers.
    It’s At-Risk and Big Spenders.

    These customers:

    • Have shown clear value in the past
    • But haven’t purchased recently
    • Are one step away from churning completely

    Recommended actions:

    • Targeted win-back campaigns
    • Personalised offers (not blanket discounts)
    • Reminder emails tied to past purchase behaviour

    Winning back an existing customer is almost always cheaper than acquiring a new one.

    3. Don’t Over-Invest in Truly Lost Customers

    Some customers will inevitably churn. RFM helps NovaShop identify these customers early and avoid spending ad budget, discounts and marketing effort on customers who are unlikely to return. This isn’t about being cold; it’s about being efficient.

    4. Use RFM as a Living Framework, Not a One-Off Analysis

    The real power of RFM comes when it’s:

    • Recomputed monthly or quarterly
    • Integrated into dashboards
    • Used to track movement between segments over time

    For NovaShop, this means asking questions like:

    • How many At-Risk customers became Loyal this month?
    • Are Champions growing or shrinking?
    • Which campaigns actually move customers up the ladder?
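Questions like these can be answered by comparing segment labels from two RFM runs. Here is a minimal sketch using pd.crosstab; the customer IDs and labels are hypothetical, and it assumes both runs are indexed by the same customers.

```python
import pandas as pd

# Hypothetical segment labels for the same five customers in two periods.
customer_ids = [101, 102, 103, 104, 105]
previous = pd.Series(
    ["At-Risk", "At-Risk", "Champions", "Loyal Customers", "Others"],
    index=customer_ids, name="Previous",
)
current = pd.Series(
    ["Loyal Customers", "At-Risk", "Champions", "Champions", "Others"],
    index=customer_ids, name="Current",
)

# Rows: last period's segment; columns: this period's segment.
transitions = pd.crosstab(previous, current)
print(transitions)

# "How many At-Risk customers became Loyal this month?"
at_risk_to_loyal = transitions.loc["At-Risk", "Loyal Customers"]

# "Are Champions growing or shrinking?"
champions_change = (current == "Champions").sum() - (previous == "Champions").sum()
print(at_risk_to_loyal, champions_change)  # 1 1
```

Each cell of the table counts customers who moved from the row segment to the column segment, so the diagonal shows who stayed put.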

    RFM turns customer behaviour into something measurable and trackable.

    Final Thoughts: Closing the EDA in Public Series

    When I started this EDA in Public series, I wasn’t trying to build the perfect analysis or demonstrate advanced techniques. I wanted to slow down and share how I actually think when working with real data. Not the polished version, but the messy, iterative process that usually stays hidden.

    This project began with a noisy CSV and a lot of open questions. Along the way, there were small issues that only surfaced once I paid closer attention: dates stored as strings, assumptions that didn’t quite hold up, metrics that needed context before they made sense. Working through those moments in public was uncomfortable at times, but also genuinely valuable. Each correction made the analysis stronger and more honest.

    One thing this process reinforced for me is that most meaningful insights don’t come from complexity. They come from slowing down, structuring the data properly, and asking better questions. By the time I reached the RFM analysis, the value wasn’t in the formulas themselves; it was in what they forced me to confront. A customer who spent a lot once isn’t necessarily valuable today. Recency matters. Frequency matters. And none of these metrics mean much in isolation.

    Ending the series with RFM felt deliberate. It sits at the point where technical work meets business thinking, where tables turn into conversations and numbers turn into decisions. It’s also where exploratory analysis stops being purely descriptive and starts becoming practical. At that stage, the goal is no longer just to understand the data, but to decide what to do next.

    Doing this work in public changed how I approach analysis. Writing things out forced me to explain my reasoning, question my assumptions, and be comfortable showing imperfect work. It reminded me that EDA isn’t a checklist you rush through; it’s a dialogue with the data. Sharing that dialogue makes you more thoughtful and more accountable.

    This may be the final part of the EDA in Public series, but it doesn’t feel like an endpoint. Everything here could evolve into dashboards, automated pipelines, or deeper customer analysis.

    And if you’re a founder, analyst, or team working with customer or sales data and trying to make sense of it, this kind of exploratory work is often where the biggest clarity comes from. These are exactly the kinds of problems I enjoy working through: slowly, thoughtfully, and with the business context in mind.

    If you’re documenting your own analyses, I’d love to see how you approach it. And if you’re wrestling with similar questions in your data and want to talk through them, feel free to reach out on any of the platforms below. Good data conversations usually start there.

    Thanks for following along!

    Medium

    LinkedIn

    Twitter

    YouTube


