
    From Reactive to Predictive: Forecasting Network Congestion with Machine Learning and INT

    By ProfitlyAI | July 18, 2025 | 8 min read


    Context

    In data centers, network slowdowns can appear out of nowhere. A sudden burst of traffic from distributed systems, microservices, or AI training jobs can overwhelm switch buffers in seconds. The problem is not just knowing when something goes wrong. It's being able to see it coming before it happens.
    Telemetry systems are widely used to monitor network health, but most operate in a reactive mode. They flag congestion only after performance has degraded. Once a link is saturated or a queue is full, you are already past the point of early diagnosis, and tracing the original cause becomes significantly harder.

    In-band Network Telemetry, or INT, tries to close that gap by tagging live packets with metadata as they travel through the network. It gives you a real-time view of how traffic flows, where queues are building up, where latency is creeping in, and how each switch is handling forwarding. It's a powerful tool when used carefully, but it comes with a cost. Enabling INT on every packet can introduce serious overhead and push a flood of telemetry data to the control plane, much of which you might not even need.

    What if we could be more selective? Instead of monitoring everything, we forecast where trouble is likely to form and enable INT only for those areas and only for a short time. This way, we get detailed visibility when it matters most without paying the full cost of always-on monitoring.

    The Problem with Always-On Telemetry

    INT gives you a powerful, detailed view of what's happening inside the network. You can track queue lengths, hop-by-hop latency, and timestamps directly from the packet path. But there's a cost: this telemetry data adds weight to every packet, and if you apply it to all traffic, it can eat up significant bandwidth and processing capacity.
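    For a rough sense of that cost, the sketch below computes what fraction of a packet's bytes would be INT metadata. It assumes a fixed-size INT header plus a fixed number of 4-byte fields appended at every hop. The default sizes are illustrative assumptions, not values taken from the INT specification or measured in our setup.

```python
def int_overhead(packet_bytes: int, hops: int, header_bytes: int = 12,
                 fields_per_hop: int = 3, bytes_per_field: int = 4) -> float:
    """Fraction of the resulting packet size taken up by INT metadata.

    All size defaults are illustrative assumptions, not measured values.
    """
    if packet_bytes <= 0 or hops < 0:
        raise ValueError("packet_bytes must be positive and hops non-negative")
    # Every hop appends the same fields (e.g. switch ID, queue depth,
    # hop latency), so metadata grows linearly with path length.
    metadata_bytes = header_bytes + hops * fields_per_hop * bytes_per_field
    return metadata_bytes / (packet_bytes + metadata_bytes)


# A 3-hop leaf-spine-leaf path: small packets pay far more than large ones.
for size in (64, 512, 1500):
    print(f"{size:>5} B packet: {int_overhead(size, hops=3):.1%} INT metadata")
```

    The metadata grows with every hop while the payload does not. As a result, small packets on long paths pay proportionally the most.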
    To get around that, many systems take shortcuts:

    • Sampling: Tag only a fraction (e.g., 1%) of packets with telemetry data.
    • Event-triggered telemetry: Turn on INT only when something bad is already happening, like a queue crossing a threshold.

    These techniques help control overhead, but they miss the critical early moments of a traffic surge. That is the part you most want to understand if you're trying to prevent slowdowns.
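    The gap is easy to see on a toy trace. In this sketch the queue-depth series is hypothetical. Sampling is simplified to deterministic 1-in-N selection, whereas real sFlow-style samplers pick packets randomly at an average rate of 1-in-N. A threshold trigger fires only once the queue is already deep, so every sample of the ramp-up is missed.

```python
def first_crossing(queue_depths, threshold):
    """Index of the first sample at or above threshold, or None if never crossed."""
    for i, depth in enumerate(queue_depths):
        if depth >= threshold:
            return i
    return None


def sampled_indices(n_packets, one_in_n):
    """Deterministic 1-in-N sampling: keep every N-th packet."""
    return list(range(0, n_packets, one_in_n))


trace = [2, 3, 5, 9, 16, 28, 45, 70, 90, 100]  # hypothetical queue depths
trigger_at = first_crossing(trace, threshold=80)
# Event-triggered INT starts only here; the ramp-up before it goes unobserved.
print("INT enabled at sample", trigger_at, "missed ramp-up:", trace[:trigger_at])
# 1% (1-in-100) sampling tags only 10 packets of a 1,000-packet burst.
print("sampled packets:", len(sampled_indices(1000, 100)))
```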

    Introducing a Predictive Approach

    Instead of reacting to symptoms, we designed a system that can forecast congestion before it happens and activate detailed telemetry proactively. The idea is simple: if we can anticipate when and where traffic is going to spike, we can selectively enable INT just for that hotspot and only for the right window of time.

    This keeps overhead low while still giving you deep visibility when it actually matters.

    System Design

    We came up with a simple approach that makes network monitoring more intelligent: it predicts when and where monitoring is actually needed. The idea is neither to tag every packet with telemetry nor to wait for congestion to happen. Instead, we want a system that catches signs of trouble early and selectively enables high-fidelity monitoring only when it's needed.

    So, how did we get this done? We built the following four key components, each with a distinct job.

    Image source: Author

    Data Collector

    We start by collecting network data to track how much traffic is moving through different network ports at any given moment. We use sFlow for data collection because it gathers the key metrics without noticeably affecting network performance. These metrics are captured at regular intervals, giving a near-real-time view of the network at any time.
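    sFlow agents export periodic interface counter samples, so per-port load is typically derived from the difference between two successive octet-counter readings. The helper below is a minimal sketch of that conversion. It assumes unsigned 64-bit octet counters and a known link speed. The function name and parameters are illustrative and are not part of the sFlow-RT API.

```python
def link_utilization(prev_octets: int, curr_octets: int, interval_s: float,
                     link_bps: float, counter_bits: int = 64) -> float:
    """Fraction of link capacity used between two octet-counter readings.

    Assumes the counters are unsigned and wrap at 2**counter_bits.
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    # Modulo arithmetic gives the right delta even if the counter wrapped.
    delta_octets = (curr_octets - prev_octets) % (2 ** counter_bits)
    return delta_octets * 8 / (interval_s * link_bps)


# Two readings 10 s apart on a hypothetical 1 Gbit/s port.
print(f"{link_utilization(1_000_000, 626_000_000, 10, 1e9):.1%}")  # 50.0%
```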

    Forecasting Engine

    The forecasting engine is the most important component of our system. It's built on a Long Short-Term Memory (LSTM) model. We chose an LSTM because it learns how patterns evolve over time, which makes it well suited to network traffic. We're not looking for perfection here. What matters is spotting the unusual traffic spikes that typically show up before congestion begins.

    Telemetry Controller

    The controller listens to these forecasts and makes decisions. When a predicted spike crosses the alert threshold, the system responds. It sends a command to the switches to move into a detailed monitoring mode, but only for the flows or ports that matter. It also knows when to back off, turning off the extra telemetry once conditions return to normal.
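    One simple way to implement "knowing when to back off" is hysteresis. INT is switched on above one threshold and switched off only below a lower one, so a forecast hovering near a single cutoff does not make telemetry flap on and off. The class below is a hypothetical sketch of that decision logic. It assumes forecasts are expressed as predicted link utilization between 0 and 1. The thresholds and the port name "s1-eth1" are placeholders. The returned strings stand in for the rule installs and removals the real controller would push to the switches.

```python
from dataclasses import dataclass, field


@dataclass
class TelemetryController:
    """Per-port INT on/off decisions with hysteresis (hypothetical sketch)."""
    on_threshold: float   # enable INT when the forecast rises above this
    off_threshold: float  # disable INT only once it falls below this
    active: set = field(default_factory=set)  # ports currently under INT

    def __post_init__(self):
        if self.off_threshold >= self.on_threshold:
            raise ValueError("off_threshold must be below on_threshold")

    def update(self, port: str, forecast: float) -> str:
        """Return 'enable', 'disable' or 'noop' for this port's new forecast."""
        if port not in self.active and forecast > self.on_threshold:
            self.active.add(port)
            return "enable"   # real controller: install INT rules here
        if port in self.active and forecast < self.off_threshold:
            self.active.discard(port)
            return "disable"  # real controller: remove INT rules here
        return "noop"


ctl = TelemetryController(on_threshold=0.8, off_threshold=0.6)
for forecast in (0.5, 0.85, 0.7, 0.55):
    print(forecast, ctl.update("s1-eth1", forecast))
```

    Note that the 0.7 forecast does not switch INT off. It is below the "on" threshold but still above the "off" threshold, which is exactly the flapping that hysteresis is meant to suppress.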

    Programmable Data Plane

    The final piece is the switch itself. In our setup, we use P4-programmable BMv2 switches that let us adjust packet behavior on the fly. Most of the time, the switch simply forwards traffic without making any changes. But when the controller turns on INT, the switch begins embedding telemetry metadata into packets that match specific rules. The controller pushes these rules, which lets us target just the traffic we care about.
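    The data-plane behavior can be modeled in a few lines of Python. This is neither P4 nor the P4Runtime API. It is a hypothetical simulation of the match-action idea: the controller installs rules, packets that match none of them are forwarded untouched, and matching packets get one metadata record appended per hop. The rule fields and metadata keys are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntRule:
    """Controller-installed watch rule; a None field matches any value."""
    egress_port: Optional[int] = None
    dst_ip: Optional[str] = None

    def matches(self, pkt: dict) -> bool:
        wanted = {"egress_port": self.egress_port, "dst_ip": self.dst_ip}
        return all(v is None or pkt.get(k) == v for k, v in wanted.items())


def process(pkt: dict, rules: list, switch_id: int, queue_depth: int) -> dict:
    """Forward unchanged unless some rule matches; then append hop metadata."""
    if any(rule.matches(pkt) for rule in rules):
        pkt.setdefault("int_stack", []).append(
            {"switch_id": switch_id, "queue_depth": queue_depth})
    return pkt


rules = [IntRule(egress_port=2)]  # watch only traffic leaving port 2
hot = process({"dst_ip": "10.0.0.2", "egress_port": 2}, rules, 1, 17)
cold = process({"dst_ip": "10.0.0.3", "egress_port": 3}, rules, 1, 5)
print(hot.get("int_stack"), cold.get("int_stack"))
```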

    This avoids the tradeoff between constant monitoring and blind sampling. Instead, we get detailed visibility exactly when it's needed, without flooding the system with unnecessary data the rest of the time.

    Experimental Setup

    We built a full simulation of this system using:

    • Mininet for emulating a leaf-spine network
    • BMv2 (P4 software switch) for programmable data-plane behavior
    • sFlow-RT for real-time traffic statistics
    • TensorFlow + Keras for the LSTM forecasting model
    • Python + gRPC + P4Runtime for the controller logic

    The LSTM was trained on synthetic traffic traces generated in Mininet with iperf. Once trained, the model runs in a loop. It makes a prediction every 30 seconds and stores the forecasts for the controller to act on.
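    Before the traffic traces can train an LSTM, they have to be cut into supervised examples. Each example pairs a window of recent samples (the input) with a future sample (the target). The helper below is a minimal sketch of that step. The window length and horizon are illustrative assumptions, not the values used in the experiment. Keras LSTM layers expect 3-D input of shape (samples, timesteps, features), so the resulting windows would be reshaped to (len(X), window, 1) before training.

```python
def make_windows(series, window, horizon=1):
    """Turn a 1-D traffic trace into (input window, future value) pairs.

    Each input is `window` consecutive samples; the target is the value
    `horizon` steps after the window ends.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be >= 1")
    xs, ys = [], []
    for start in range(len(series) - window - horizon + 1):
        xs.append(series[start:start + window])
        ys.append(series[start + window + horizon - 1])
    return xs, ys


trace = [10, 12, 11, 40, 85, 90]  # hypothetical per-interval throughput
X, y = make_windows(trace, window=3)
print(X, y)
```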

    Here's a simplified version of the prediction loop:

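```python
import time
from collections import deque

# data_collector, forecast_engine and telem_controller are the system's
# components; WINDOW_SIZE and ALERT_THRESHOLD are tuning parameters.
sliding_window = deque(maxlen=WINDOW_SIZE)  # keeps only the newest samples
while True:
    latest_sample = data_collector.current_traffic()
    sliding_window.append(latest_sample)
    if len(sliding_window) == WINDOW_SIZE:  # wait for a full window
        forecast = forecast_engine.predict_upcoming_traffic(list(sliding_window))
        if forecast > ALERT_THRESHOLD:
            telem_controller.trigger_INT()
    time.sleep(30)  # one prediction cycle every 30 seconds
```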

    The switches respond immediately by changing telemetry modes for the targeted flows.

    Why LSTM?

    We went with an LSTM model because network traffic tends to have structure. It's not entirely random. There are patterns tied to time of day, background load, or batch processing jobs, and LSTMs are particularly good at picking up these temporal relationships. Simpler models treat each data point independently. An LSTM, by contrast, can remember what came before and use that memory to make better short-term predictions. For our use case, that means spotting early signs of an upcoming surge just by looking at how the past few minutes behaved. We didn't need it to forecast exact numbers, only to flag when something abnormal might be coming. The LSTM gave us just enough accuracy to trigger proactive telemetry without overfitting to noise.
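    To show where that memory lives, here is one step of a standard LSTM cell (without peephole connections), reduced to a single scalar unit in pure Python. The weights are hand-picked, hypothetical values chosen only for illustration; in the actual system, Keras learns them. The forget gate decides how much of the cell state c survives each step. That is what lets a spike seen a few intervals ago still influence the current output.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell with scalar input.

    params maps each gate name ("i", "f", "o", "g") to (w_x, w_h, bias).
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("i"))    # input gate: how much new information to write
    f = sigmoid(pre("f"))    # forget gate: how much old cell state to keep
    o = sigmoid(pre("o"))    # output gate: how much of the state to expose
    g = math.tanh(pre("g"))  # candidate value to write
    c = f * c_prev + i * g   # cell state carries memory across steps
    h = o * math.tanh(c)
    return h, c


def run_lstm(sequence, params):
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h, c


params = {
    "i": (1.0, 0.0, 0.0),
    "f": (0.0, 0.0, 3.0),  # large bias: forget gate near 1, state is kept
    "o": (0.0, 0.0, 0.0),
    "g": (1.0, 0.0, 0.0),
}
print(run_lstm([0, 0, 5, 0, 0], params))  # spike two steps ago is still "remembered"
print(run_lstm([0, 0, 0, 0, 0], params))
```

    With a strong forget-gate bias, the cell state after the spike is still well above zero two quiet steps later. An all-zero trace leaves it at exactly zero.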

    Analysis

    We didn't run large-scale performance benchmarks. Based on our prototype and its behavior in test scenarios, however, we can outline the practical advantages of this design.

    Lead Time Benefit

    One of the main benefits of a predictive system like this is its ability to catch trouble early. Reactive telemetry solutions typically wait until a queue threshold is crossed or performance degrades, which means you're already behind the curve. In contrast, our design anticipates congestion based on traffic trends and activates detailed monitoring in advance. This gives operators a clearer picture of what led to the issue, not just the symptoms once they appear.
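    We did not measure lead time in our prototype, but one way to quantify it is sketched below. For each congestion event (for example, the moment a queue first crosses a threshold), take the earliest forecast alert within a look-back window and report the gap. Events with no alert in the window count as misses. The look-back window is an assumed parameter; it keeps a much older, unrelated alert from being credited. All timestamps in the example are hypothetical.

```python
from bisect import bisect_left, bisect_right


def lead_times(alert_times, congestion_times, max_lead):
    """Seconds between each congestion event and the earliest alert in the
    preceding `max_lead`-second window; None marks a missed event."""
    alerts = sorted(alert_times)
    out = []
    for event in sorted(congestion_times):
        lo = bisect_left(alerts, event - max_lead)  # first alert in window
        hi = bisect_right(alerts, event)            # alerts up to the event
        out.append(event - alerts[lo] if lo < hi else None)
    return out


# Hypothetical alerts at t=90, 120, 400 s; congestion at t=150 and 300 s.
print(lead_times([90, 120, 400], [150, 300], max_lead=120))
```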

    Monitoring Efficiency

    A key goal of this project was to keep overhead low without compromising visibility. Our system does not apply full INT to all traffic or rely on coarse-grained sampling. Instead, it selectively enables high-fidelity telemetry for short bursts, and only where forecasts indicate potential problems. We haven't quantified the exact cost savings. Still, the design naturally limits overhead by keeping INT focused and short-lived, something static sampling or reactive triggering can't match.
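    A simple proxy for that overhead is the INT duty cycle: the fraction of time telemetry is switched on. The sketch below computes it from the controller's activation windows, merging overlaps so time is not counted twice. If traffic on the targeted ports is roughly uniform over time (an assumption), this also approximates the fraction of their packets that carry INT metadata, which is 1.0 for always-on telemetry. The windows in the example are hypothetical.

```python
def int_duty_cycle(windows, horizon_s):
    """Fraction of [0, horizon_s] during which INT is enabled.

    windows: (start_s, end_s) activation intervals; overlaps are merged.
    """
    if horizon_s <= 0:
        raise ValueError("horizon_s must be positive")
    on_time = 0.0
    cur_start = cur_end = None
    for start, end in sorted(windows):
        start, end = max(start, 0.0), min(end, horizon_s)  # clip to horizon
        if end <= start:
            continue  # window lies entirely outside the horizon
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlaps current run: extend it
        else:
            if cur_end is not None:
                on_time += cur_end - cur_start  # close the previous run
            cur_start, cur_end = start, end
    if cur_end is not None:
        on_time += cur_end - cur_start
    return on_time / horizon_s


windows = [(0, 60), (30, 90), (300, 360)]  # hypothetical activations (s)
print(f"INT on for {int_duty_cycle(windows, 3600):.2%} of the hour")
```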

    Conceptual Comparison of Telemetry Strategies

    While we didn't record overhead metrics, the design aims for a middle ground. It should deliver deeper visibility than sampling or reactive systems at a fraction of the cost of always-on telemetry. Here's how the approach compares at a high level:

    Image source: Author

    Conclusion

    We wanted to find a better way to monitor network traffic. By combining machine learning and programmable switches, we built a system that predicts congestion before it happens and activates detailed telemetry in just the right place and at the right time.

    Predicting instead of reacting may seem like a small change, but it opens up a new level of observability. As telemetry becomes increasingly important in AI-scale data centers and low-latency services, this kind of intelligent monitoring will become a baseline expectation, not just a nice-to-have.



