    Choose the Right One: Evaluating Topic Models for Business Intelligence

By ProfitlyAI | April 24, 2025


Topic models are used in companies to classify brand-related text datasets (such as product and website reviews, surveys, and social media comments) and to track how customer satisfaction metrics change over time.

There is a myriad of topic models to choose from: the widely used BERTopic by Maarten Grootendorst (2022), the recent FASTopic presented at last year's NeurIPS (Xiaobao Wu et al., 2024), the Dynamic Topic Model by Blei and Lafferty (2006), or the recent semi-supervised Seeded Poisson Factorization model (Prostmaier et al., 2025).

For a business use case, training topic models on customer texts, we often get results that are not identical and sometimes even conflicting. In business, imperfections cost money, so engineers should put into production the model that provides the best solution and solves the problem most effectively. At the same pace that new topic models appear on the market, methods for evaluating their quality with new metrics also evolve.

This practical tutorial will focus on bigram topic models, which provide more relevant information and identify key qualities and problems for business decisions better than single-word models ("delivery" vs. "poor delivery", "stomach" vs. "sensitive stomach", etc.). On one hand, bigram models are more detailed; on the other, many evaluation metrics were not originally designed to evaluate them. To provide more background in this area, we will explore in detail:

    • How to evaluate the quality of bigram topic models
    • How to prepare an email classification pipeline in Python.

Our example use case will show how bigram topic models (BERTopic and FASTopic) help prioritize email communication with customers on certain topics and reduce response times.

1. What are topic model quality indicators?

The evaluation task should target the ideal state:

The ideal topic model should produce topics where the words or bigrams (two consecutive words) in each topic are highly semantically related, and the topics are distinct from one another.

In practice, this means that the words predicted for each topic are semantically similar by human judgment, and there is low duplication of words between topics.

It is standard to calculate a set of metrics for each trained model and compare the models' performance, so that we can make a qualified decision about which model to put into production or use for a business decision.

    • Coherence metrics evaluate how well the words discovered by a topic model make sense to humans (i.e., have similar semantics within each topic).
    • Topic diversity measures how different the discovered topics are from one another.

Bigram topic models work well with these metrics:

    • NPMI (Normalized Point-wise Mutual Information) uses probabilities estimated on a reference corpus to calculate a [-1, 1] score for each word (or bigram) predicted by the model. Read [1] for more details.

The reference corpus can be either internal (the training set) or external (e.g., an external email dataset). A large, external, and comparable corpus is a better choice because it can help reduce bias in the training set. Because this metric works with word frequencies, the training set and the reference corpus must be preprocessed the same way (i.e., if we remove numbers and stopwords in the training set, we should also do it in the reference corpus). The aggregate model score is the average of the word scores across topics.

    • SC (Semantic Coherence) does not need a reference corpus. It uses the same dataset that was used to train the topic model. Read more in [2].

Let's say we have the top four words for one topic predicted by a topic model: "apple", "banana", "juice", "smoothie". SC then looks at all combinations of words, going from left to right, starting with the first word: {apple, banana}, {apple, juice}, {apple, smoothie}, then the second word: {banana, juice}, {banana, smoothie}, then the last word: {juice, smoothie}. For each pair, it counts the number of documents in the training set that contain both words, divided by the number of documents that contain the first word. The overall SC score for a model is the mean of all topic-level scores.

Image 1. Semantic coherence by Mimno et al. (2011), illustration. Image by author.

PUV (Percentage of Unique Words) calculates the share of unique words across topics in the model. PUV = 1 means that every topic in the model contains unique bigrams. Values close to 1 indicate a well-shaped, high-quality model with little word overlap between topics [3].

The closer to 0 the SC and NPMI scores are, the more coherent the model is (the bigrams predicted by the topic model for each topic are semantically similar). The closer to 1 PUV is, the easier the model is to interpret and use, because bigrams do not overlap between topics.

2. How can we prioritize email communication with topic models?

A large share of customer communication, not only in e-commerce businesses, is now handled by chatbots and personal account sections. Yet it is still common to communicate with customers by email. Many email providers offer developers broad API flexibility to customize their email platform (e.g., MailChimp, SendGrid, Brevo). Here, topic models make mailing more flexible and effective.

In this use case, the pipeline takes incoming emails as input and uses the trained topic classifier to categorize the incoming email content. The result is the classified topic that the Customer Care (CC) Department sees next to each email. The main objective is to allow the CC staff to prioritize categories of emails and reduce the response time to the most sensitive requests (those that directly affect margin-related KPIs or OKRs).

Image 2. Topic model pipeline, illustration. Image by author.

3. Data and model set-ups

We will train FASTopic and BERTopic to classify emails into 8 and 10 topics and evaluate the quality of all model specifications. Read my previous TDS tutorials on topic modeling with these cutting-edge topic models.

As a training set, we use a synthetically generated Customer Care Email dataset available on Kaggle under a GPL-3 license. The prefiltered data covers 692 incoming emails and looks like this:

Image 3. Customer Care Email dataset. Image by author.

3.1. Data preprocessing

Cleaning text in the right order is essential for topic models to work in practice because it minimizes the bias introduced by each cleaning operation.

Numbers are typically removed first, followed by emojis, unless we need them for specific situations, such as extracting sentiment. Stopwords for one or more languages are removed afterwards, followed by punctuation, so that stopwords do not split into two tokens ("we've" -> "we" + "ve"). Additional tokens (company and people's names, etc.) are removed in the next step on the cleaned data, before lemmatization, which unifies tokens with the same semantics.

Image 4. General preprocessing steps for topic modeling. Image by author.

"Delivery" and "deliveries", "box" and "boxes", or "price" and "prices" share the same word root, but without lemmatization, topic models would model them as separate tokens. That is why customer emails should be lemmatized in the last step of preprocessing.
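The following is a minimal sketch of this cleaning order, assuming NLTK for stopwords and lemmatization; the emoji regex, the extra-token list, and the example sentence are simplified illustrations rather than the article's production pipeline.

```python
import re
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
EXTRA_TOKENS = {"acme", "john"}  # illustrative company/people names to drop
lemmatizer = WordNetLemmatizer()

def clean_email(text: str) -> str:
    text = text.lower()
    text = re.sub(r"\d+", " ", text)                                    # 1. numbers
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", " ", text)   # 2. emojis (rough range)
    tokens = [t for t in text.split() if t not in STOPWORDS]            # 3. stopwords
    tokens = [t.strip(string.punctuation) for t in tokens]              # 4. punctuation
    tokens = [t for t in tokens if t and t not in EXTRA_TOKENS]         # 5. extra tokens
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)            # 6. lemmatization

print(clean_email("We had 2 delayed deliveries and the boxes arrived damaged."))
# -> "delayed delivery box arrived damaged"
```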

Text preprocessing is model-specific:

    • FASTopic works with clean data on input; some cleaning (stopwords) can be done during training. The simplest and most effective way is to use the Washer, a no-code app for text data cleaning in text mining projects.
    • BERTopic: the documentation recommends that "removing stop words as a preprocessing step is not advised as the transformer-based embedding models that we use need the full context to create accurate embeddings". For that reason, cleaning operations should be included in the model training (see the sketch below).
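One common way to follow this recommendation, sketched here under the assumption that BERTopic's vectorizer_model hook is used (the repository's exact configuration may differ), is to keep the raw text for the embeddings and apply stopword removal and bigram extraction only when the topic representations are built:

```python
from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer

# Cleaning happens inside training: the raw emails reach the embedding model untouched,
# while stopword removal and bigram extraction apply only to the topic representations.
vectorizer_model = CountVectorizer(stop_words="english", ngram_range=(2, 2))
topic_model = BERTopic(vectorizer_model=vectorizer_model)
```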

3.2. Model compilation and training

You can check the full code for FASTopic and BERTopic training with bigram preprocessing and cleaning in this repo. My previous TDS tutorials [4] and [5] explain all the steps in detail.

We train both models to classify 8 topics in the customer email data. A simple inspection of the topic distribution shows that FASTopic distributes incoming emails fairly evenly across topics. BERTopic classifies emails unevenly, keeping outliers (uncategorized tokens) in T-1 and a large share of incoming emails in T0.
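Below is a minimal training-and-inspection sketch for the BERTopic side (the FASTopic setup is analogous; see the linked repository). The `emails` placeholder and the parameter values are illustrative assumptions; the exact way both models are constrained to 8 topics is in the repository.

```python
from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer

emails = ["..."]  # placeholder: the 692 preprocessed email bodies from the training set

topic_model = BERTopic(
    vectorizer_model=CountVectorizer(stop_words="english", ngram_range=(2, 2)),
    min_topic_size=20,  # assumption mirroring the "min. 20 tokens per topic" setting in Section 4.4
)
topics, probs = topic_model.fit_transform(emails)

# Inspect how evenly the emails are distributed across topics (T-1 collects the outliers)
print(topic_model.get_topic_info()[["Topic", "Count", "Name"]])

# Top bigrams for the largest topic, T0
print(topic_model.get_topic(0))
```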

Image 5: Topic distribution, email classification. Image by author.

Here are the predicted bigrams for both models, with topic labels:

Image 6: Models' predictions. Image by author.

Because the email corpus is a synthetic LLM-generated dataset, naive labelling of the topics for both models shows topics that are:

    • Similar: Time Delays, Latency Issues, User Permissions, Deployment Issues, Compilation Errors,
    • Differing: Unclassified (BERTopic classifies outliers into T-1), Improvement Suggestions, Authorization Errors, Performance Complaints (FASTopic); Cloud Management, Asynchronous Requests, General Requests (BERTopic)

For business applications, topics should be labelled by the company's insiders who know the customer base and the business priorities.

4. Model evaluation

If three out of eight classified topics are labelled differently, which model should be deployed? Let's now evaluate the coherence and diversity of the trained BERTopic and FASTopic T-8 models.

    4.1. NPMI

We need a reference corpus to calculate NPMI for each model. The Customer IT Support Ticket Dataset from Kaggle, distributed under an Attribution 4.0 International license, provides data comparable to our training set. The data is filtered to 11,923 English email bodies.

    1. Calculate an NPMI for each bigram in the reference corpus with this code.
    2. Merge the bigrams predicted by FASTopic and BERTopic with their NPMI scores from the reference corpus. The fewer NaNs in the table, the more accurate the metric.

    Image 7: NPMI coherence evaluation. Image by author.

    3. Average the NPMIs within and across topics to get a single score for each model (a minimal sketch follows the list).
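Here is a minimal sketch of steps 1-3 in plain Python, assuming token-position probability estimates and toy placeholder data (the linked code computes this over the 11,923 reference emails):

```python
import math
from collections import Counter

# Reference corpus, already preprocessed the same way as the training set (placeholder)
reference_docs = [
    ["late", "delivery", "refund", "request"],
    ["late", "delivery", "slow", "response"],
]

unigrams = Counter(tok for doc in reference_docs for tok in doc)
bigrams = Counter(pair for doc in reference_docs for pair in zip(doc, doc[1:]))
n_tokens = sum(unigrams.values())  # probabilities estimated per token position (a simplification)

def npmi(w1: str, w2: str):
    """Step 1: NPMI of a bigram in the reference corpus; None if it never occurs (a NaN after the merge)."""
    if bigrams[(w1, w2)] == 0:
        return None
    p_xy = bigrams[(w1, w2)] / n_tokens
    p_x, p_y = unigrams[w1] / n_tokens, unigrams[w2] / n_tokens
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)

# Step 2: bigrams predicted per topic by one model (placeholder table)
predicted = {"T0": [("late", "delivery"), ("refund", "request")], "T1": [("slow", "response")]}

# Step 3: average within topics, then across topics, skipping bigrams missing from the corpus
topic_scores = []
for bigs in predicted.values():
    scores = [s for s in (npmi(a, b) for a, b in bigs) if s is not None]
    if scores:
        topic_scores.append(sum(scores) / len(scores))
model_npmi = sum(topic_scores) / len(topic_scores)
print(round(model_npmi, 3))
```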

    4.2. SC

With SC, we examine the context and semantic similarity of the bigrams predicted by a topic model by calculating their position in the corpus relative to other tokens. To do so, we:

    1. Create a document-term matrix (DTM) that counts how many times each bigram appears in each document.
    2. Calculate topic-level SC scores by looking up co-occurrences of the model-predicted bigrams in the DTM.
    3. Average the topic-level SC scores into a model SC score (a minimal sketch follows the list).
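A minimal sketch of these steps in plain Python; a binary-presence version of the DTM is represented as a set of bigrams per document, and the data are toy placeholders:

```python
import math
from itertools import combinations

# Step 1: document-term structure; here, the set of bigrams each training document contains
doc_bigrams = [
    {"poor delivery", "late delivery"},
    {"poor delivery"},
    {"late delivery", "refund request"},
    {"refund request", "poor delivery"},
]  # placeholder; built from the 692 training emails in practice

# Bigrams predicted per topic by one model (placeholder)
predicted = {"T0": ["poor delivery", "late delivery", "refund request"]}

def doc_freq(bigram):
    return sum(bigram in d for d in doc_bigrams)

def co_doc_freq(b1, b2):
    return sum(b1 in d and b2 in d for d in doc_bigrams)

# Step 2: topic-level SC, pairwise left to right as in Image 1 (+1 smoothing per Mimno et al.)
topic_sc = {
    topic: sum(
        math.log((co_doc_freq(b1, b2) + 1) / doc_freq(b1))
        for b1, b2 in combinations(bigs, 2)
    )
    for topic, bigs in predicted.items()
}

# Step 3: model SC is the mean of the topic-level scores
model_sc = sum(topic_sc.values()) / len(topic_sc)
print(topic_sc, round(model_sc, 3))
```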

    4.3. PUV

The topic diversity metric (PUV) checks for duplicate bigrams between topics in a model.

    1. Join the bigrams into single tokens by replacing spaces with underscores in the FASTopic and BERTopic tables of predicted bigrams.

    Image 8: Topic diversity, illustration. Image by author.

    2. Calculate topic diversity as the count of distinct tokens divided by the total count of tokens in the tables, for both models (a minimal sketch follows the list).
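A minimal sketch of the PUV computation on a toy placeholder table:

```python
# Step 1: predicted bigrams per topic, spaces already replaced by underscores (placeholder)
topic_bigrams = {
    "T0": ["poor_delivery", "late_delivery"],
    "T1": ["refund_request", "late_delivery"],  # "late_delivery" is duplicated across topics
}

# Step 2: distinct tokens divided by total tokens across all topics
all_tokens = [tok for bigs in topic_bigrams.values() for tok in bigs]
puv = len(set(all_tokens)) / len(all_tokens)
print(puv)  # 0.75 here; 1.0 would mean no bigram is shared between topics
```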

4.4. Model comparison

Let's now summarize the coherence and diversity evaluation in Image 9. The BERTopic models are more coherent but less diverse than FASTopic. The differences are not very large, but BERTopic suffers from an uneven distribution of incoming emails in the pipeline (see the charts in Image 5). Around 32% of classified emails fall into T0, and 15% into T-1, which covers the unclassified outliers. The models are trained with a minimum of 20 tokens per topic. Increasing this parameter makes the model unable to train, probably because of the small dataset size.

For that reason, FASTopic is the better choice for topic modelling in email classification with small training datasets.

Image 9: Topic model evaluation metrics. Image by author.

The last step is to deploy the model with topic labels in the email platform to classify incoming emails:

Image 10. Topic model classification pipeline, output. Image by author.
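For context, here is a minimal inference sketch of what that pipeline step can look like with a trained BERTopic model; the label mapping and the helper function are illustrative assumptions, and the integration with the email platform depends on the provider's API.

```python
from bertopic import BERTopic

# Illustrative topic-label mapping maintained by the business insiders (assumption)
TOPIC_LABELS = {-1: "Unclassified", 0: "Time Delays", 1: "Latency Issues", 2: "Compilation Errors"}

def classify_incoming(topic_model: BERTopic, email_body: str) -> str:
    """Return the topic label that Customer Care sees next to the incoming email."""
    topic_ids, _ = topic_model.transform([email_body])
    return TOPIC_LABELS.get(topic_ids[0], "Unclassified")

# Usage (topic_model is the trained instance from Section 3.2):
# print(classify_incoming(topic_model, "My deployment keeps failing with a compilation error."))
```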

Summary

Coherence and diversity metrics compare models that share a similar training setup: the same dataset and cleaning strategy. We cannot compare their absolute values with results from different training sessions, but they help us decide on the best model for our specific use case. They offer a relative comparison of different model specifications and help us decide which model should be deployed in the pipeline. Topic model evaluation should therefore always be the last step before model deployment in business practice.

How does customer care benefit from the topic modelling exercise? After the topic model is put into production, the pipeline sends a classified topic for each email to the email platform that Customer Care uses for communicating with customers. With a limited staff, it is now possible to prioritize and respond faster to the most sensitive business requests (such as "time delays" and "latency issues") and to change priorities dynamically.

The data and complete code for this tutorial are here.


Petr Korab is a Python Engineer and Founder of Text Mining Stories with over eight years of experience in Business Intelligence and NLP.

Acknowledgments: I thank Tomáš Horský (Lentiamo, Prague), Martin Feldkircher, and Viktoriya Teliha (Vienna School of International Studies) for useful comments and suggestions.

    References

    [1] Blei, D. M., Lafferty, J. D. 2006. Dynamic Topic Models. In Proceedings of the 23rd International Conference on Machine Learning (pp. 113–120).

    [2] Dieng, A. B., Ruiz, F. J. R., and Blei, D. M. 2020. Topic Modeling in Embedding Spaces. Transactions of the Association for Computational Linguistics, 8:439–453.

    [3] Grootendorst, M. 2022. BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure. arXiv preprint.

    [4] Korab, P. Topic Modelling in Business Intelligence: FASTopic and BERTopic in Code. Towards Data Science. 22.1.2025. Available from: link.

    [5] Korab, P. Topic Modelling with BERTopic in Python. Towards Data Science. 4.1.2024. Available from: link.

    [6] Wu, X., Nguyen, T., Ce Zhang, D., Yang Wang, W., Luu, A. T. 2024. FASTopic: A Fast, Adaptive, Stable, and Transferable Topic Modeling Paradigm. arXiv preprint: 2405.17978.

    [7] Mimno, D., Wallach, H. M., Talley, E., Leenders, M., McCallum, A. 2011. Optimizing Semantic Coherence in Topic Models. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing.

    [8] Prostmaier, B., Vávra, J., Grün, B., Hofmarcher, P. 2025. Seeded Poisson Factorization: Leveraging Domain Knowledge to Fit Topic Models. arXiv preprint.


