
    Scaling ML Inference on Databricks: Liquid or Partitioned? Salted or Not?

    By ProfitlyAI · February 28, 2026 · 12 min read


    Introduction

    a continuous variable for four different products. The machine learning pipeline was built in Databricks and has two major components:

    1. Feature preparation in SQL with serverless compute.
    2. Inference on an ensemble of several hundred models, using job clusters to keep control over compute power.

    In our first try, a 420-core cluster spent almost 10 hours processing simply 18 partitions.

    The objective is to tune the data flow to maximize cluster utilization and ensure scalability. Inference is done on four sets of ML models, one set per product. However, we will focus on how the data is stored, since that determines how much parallelism we can leverage for inference. We will not cover the inner workings of the inference itself.

    If there are too few file partitions, the cluster will spend a long time scanning large files, and at that point, unless the data is repartitioned (which means added network latency and data shuffling), you may also be running inference on a very large set of rows in each partition, likewise resulting in long run times.
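    To make this concrete, here is a toy back-of-the-envelope model of how the number of file partitions drives total stage time on a fixed-size cluster. The throughput figure (5M rows per core-hour) is made up purely for illustration; only the row and core counts come from this article.

```python
import math

def makespan_hours(total_rows, num_partitions, cores, rows_per_core_hour):
    """Toy model: one task per partition, tasks run in waves of `cores`."""
    rows_per_partition = total_rows / num_partitions
    task_hours = rows_per_partition / rows_per_core_hour
    waves = math.ceil(num_partitions / cores)
    return waves * task_hours

# 354M rows (Product D) on 420 cores, at a hypothetical 5M rows/core-hour:
few_big = makespan_hours(354_000_000, 18, 420, 5_000_000)     # 18 huge partitions
many_small = makespan_hours(354_000_000, 840, 420, 5_000_000)  # ~420K-row partitions
```

    Even in this crude model, the 18-partition layout is over twenty times slower than the 840-partition one, because most of the cluster never receives a task.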

    Fig 1. Don't be afraid to add a little salt to your data if you need to. Photo by Faran Raufi on Unsplash.

    However, the business has limited patience when shipping ML pipelines with a direct impact on the organization, so tests are limited.

    In this article, we will review our feature data landscape, then give an overview of the ML inference, and present the results and discussion of inference performance under four dataset treatment scenarios:

    1. Partitioned table, no salt, no row limit in partitions (non-salted and partitioned)
    2. Partitioned table, salted, with a 1M row limit (salted and partitioned)
    3. Liquid-clustered table, no salt, no row limit in partitions (non-salted and liquid)
    4. Liquid-clustered table, salted, with a 1M row limit (salted and liquid)
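    For reference, the "liquid" scenarios assume the feature table is liquid-clustered rather than Hive-style partitioned. On Databricks this is declared with CLUSTER BY; the sketch below is an assumption, not the project's actual DDL — the table name extends the FeatureStore name from the read query, and the clustering columns are chosen to match the query's filters.

```python
# Sketch only: assumes an ambient Databricks SparkSession `spark` and a
# Delta table; the _liquid table name and clustering columns are illustrative.
spark.sql("""
    CREATE TABLE IF NOT EXISTS catalog.schema.FeatureStore_liquid
    CLUSTER BY (RunDate, ProductLine, AttrB)
    AS SELECT * FROM catalog.schema.FeatureStore
""")
```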

    Data Landscape

    The dataset contains the features that the set of ML models uses for inference. It has ~550M rows and covers four products, identified by the attribute ProductLine:

    • Product A: ~10.45M (1.9%)
    • Product B: ~4.4M (0.8%)
    • Product C: ~100M (17.6%)
    • Product D: ~354M (79.7%)

    There is also another low-cardinality attribute, AttrB, which contains only two distinct values and is used as a filter to extract subsets of the dataset for each part of the ML system.

    Moreover, RunDate logs the date on which the features were generated; records are append-only. Finally, the dataset is read using the following query:

    SELECT
      Id,
      ProductLine,
      AttrB,
      AttrC,
      RunDate,
      {model_features}
    FROM
      catalog.schema.FeatureStore
    WHERE
      ProductLine = :product AND
      AttrB = :attributeB AND
      RunDate = :RunDate

    Salt Implementation

    The salting here is generated dynamically. Its purpose is to distribute the data according to product volumes: large products receive more buckets and smaller products receive fewer. For instance, Product D should receive around 80% of the buckets, given the proportions in the data landscape.

    We do this so we get predictable inference run times and maximize cluster utilization.
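    The allocation rule can be sketched in plain Python before the full PySpark version below: buckets grow linearly with a product's share of rows, with one extra bucket when the row count does not divide evenly. The min/max bucket values mirror those used later in the code; the shares and counts are taken from the data landscape above.

```python
def buckets_for(percent, count, min_buckets=10, max_buckets=1160):
    """Allocate salt buckets proportionally to a product's share of rows."""
    base = int(min_buckets + percent * (max_buckets - min_buckets))
    # add one bucket when the row count does not divide evenly
    return base + 1 if count % base != 0 else base

# Shares and row counts from the data landscape section
product_d = buckets_for(0.797, 354_000_000)
product_b = buckets_for(0.008, 4_400_000)
```

    Product D ends up with roughly 80% of the maximum bucket count, while Product B gets close to the floor, which is exactly the proportional behavior we want.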

    from pyspark.sql import functions as F

    # Calculate the proportion of each (ProductLine, AttrB) based on row counts
    brand_cat_counts = df_demand_price_grid_load.groupBy(
       "ProductLine", "AttrB"
    ).count()
    total_count = df_demand_price_grid_load.count()
    brand_cat_percents = brand_cat_counts.withColumn(
       "percent", F.col("count") / F.lit(total_count)
    )

    # Collect percentages as a dict with string keys (this will later determine
    # the number of salt buckets each product receives)
    brand_cat_percent_dict = {
       f"{row['ProductLine']}|{row['AttrB']}": row['percent']
       for row in brand_cat_percents.collect()
    }

    # Collect counts as a dict with string keys (this will help
    # add an extra bucket if the count is not divisible by the number of
    # buckets for the product)
    brand_cat_count_dict = {
       f"{row['ProductLine']}|{row['AttrB']}": row['count']
       for row in brand_cat_percents.collect()
    }

    # Helper to flatten key-value pairs for create_map
    def dict_to_map_expr(d):
       expr = []
       for k, v in d.items():
           expr.append(F.lit(k))
           expr.append(F.lit(v))
       return expr

    percent_case = F.create_map(*dict_to_map_expr(brand_cat_percent_dict))
    count_case = F.create_map(*dict_to_map_expr(brand_cat_count_dict))

    # Add the string key column in PySpark
    df_demand_price_grid_load = df_demand_price_grid_load.withColumn(
       "product_cat_key",
       F.concat_ws("|", F.col("ProductLine"), F.col("AttrB"))
    )

    df_demand_price_grid_load = df_demand_price_grid_load.withColumn(
       "percent", percent_case.getItem(F.col("product_cat_key"))
    ).withColumn(
       "product_count", count_case.getItem(F.col("product_cat_key"))
    )

    # Set min/max buckets
    min_buckets = 10
    max_buckets = 1160

    # Calculate buckets per row based on the (ProductLine, AttrB) proportion
    df_demand_price_grid_load = df_demand_price_grid_load.withColumn(
       "buckets_base",
       (F.lit(min_buckets) + (F.col("percent") * (max_buckets - min_buckets))).cast("int")
    )

    # Add an extra bucket if product_count is not divisible by buckets_base
    df_demand_price_grid_load = df_demand_price_grid_load.withColumn(
       "buckets",
       F.when(
           (F.col("product_count") % F.col("buckets_base")) != 0,
           F.col("buckets_base") + 1
       ).otherwise(F.col("buckets_base"))
    )

    # Generate a salt per row based on the (ProductLine, AttrB) bucket count
    df_demand_price_grid_load = df_demand_price_grid_load.withColumn(
       "salt",
       (F.rand(seed=42) * F.col("buckets")).cast("int")
    )

    # Perform the repartition using the core attributes and the salt column
    df_demand_price_grid_load = df_demand_price_grid_load.repartition(
       1200, "AttrB", "ProductLine", "salt"
    ).drop("product_cat_key", "percent", "product_count", "buckets_base", "buckets", "salt")

    Finally, we save our dataset to the feature table and enforce a maximum number of rows per file. This prevents Spark from producing partitions with too many rows, which it can do even when we have already computed the salt.

    Why do we enforce 1M rows? The primary focus is on model inference time, not so much on file size. After a few tests with 1M, 1.5M, and 2M, the first yielded the best performance in our case. Again, this project is very budget- and time-constrained, so we have to make the most of our resources.

    (df_demand_price_grid_load.write
        .mode("overwrite")
        .option("replaceWhere", f"RunDate = '{params['RunDate']}'")
        .option("maxRecordsPerFile", 1_000_000)
        .partitionBy("RunDate", "price_delta_cat", "BrandName")
        .saveAsTable(f"{params['catalog_revauto']}.{params['schema_revenueautomation']}.demand_features_price_grid"))
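    As a sanity check, maxRecordsPerFile implies a lower bound on how many files (and hence scannable file partitions) each product can produce. A quick sketch with the landscape volumes from above and the 1M cap:

```python
import math

def min_files(rows, max_records_per_file=1_000_000):
    """Lower bound on output files when each holds at most `max_records_per_file` rows."""
    return math.ceil(rows / max_records_per_file)

files_d = min_files(354_000_000)  # Product D: at least 354 files
files_b = min_files(4_400_000)    # Product B: at least 5 files
```

    This bound is why the salted scenarios later show hundreds of file partitions for Product D rather than the 18 we started with.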

    Why not just rely on Spark's Adaptive Query Execution (AQE)?

    Recall that the primary focus is on inference times, not on metrics tuned for general Spark SQL queries, such as file size. Using only AQE was actually our initial attempt. As you will see in the results, the run times were very undesirable and did not maximize cluster utilization given our data proportions.

    Machine Learning Inference

    There is a pipeline with four tasks, one per product. Each task performs the following general steps:

    • Loads the features for the corresponding product
    • Loads the subset of ML models for the corresponding product
    • Performs inference on one half of the subset, sliced by AttrB
    • Performs inference on the other half, sliced by AttrB
    • Saves the data to the results table

    We will focus on one of the inference stages so as not to overwhelm this article with numbers; the other stage is very similar in structure and results. You can see the DAG for the inference stage in Fig. 2.

    Fig 2. DAG for the ML inference Spark stage. Own authorship.

    It looks very straightforward, but the run times can vary greatly depending on how your data is stored and the size of your cluster.

    Cluster configuration

    For the inference stage we are analyzing, there is one cluster per product, tuned for the infrastructure limitations of the project as well as the distribution of the data:

    • Product A: 35 workers (Standard_DS14v2, 420 cores)
    • Product B: 5 workers (Standard_DS14v2, 70 cores)
    • Product C: 1 worker (Standard_DS14v2, 14 cores)
    • Product D: 1 worker (Standard_DS14v2, 14 cores)

    In addition, Adaptive Query Execution is enabled by default, which lets Spark decide how best to handle the data given the context you provide.
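    For completeness, these are the stock AQE settings in play. A sketch of setting them explicitly on the session (assuming the ambient Databricks SparkSession `spark`; the keys are standard Spark 3.x configuration, and the values shown are the common defaults rather than project-specific tuning):

```python
# Standard Spark AQE knobs; enabled by default on recent Databricks runtimes.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
```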

    Results and Discussion

    For each scenario, you will see an overview of the number of file partitions per product and the average number of rows per partition, to give an indication of how many rows the ML system runs inference on per Spark task. Additionally, we present Spark UI metrics to monitor run-time performance and inspect the distribution of data at inference time. We will show the Spark UI portion only for Product D, which is the largest, to avoid an excess of information. In addition, depending on the scenario, inference on Product D becomes a run-time bottleneck, another reason it was the primary focus of the results.

    Non-Salted and Partitioned

    You can see in Fig. 3 that the average file partition has tens of millions of rows, which means considerable run time for a single executor. The largest on average is Product C, with more than 45M rows in a single partition. The smallest is Product B, with roughly 12M rows on average.

    Fig 3. Average row count in a partition vs. the products.

    Fig 4 depicts the number of partitions per product, 26 in total. Checking Product D, 18 partitions fall very short of the 420 cores we have available, and on average each partition will perform inference on ~40M rows.

    Fig 4. Total number of file partitions per product.
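    The shortfall is easy to quantify: with one task per file partition, a single wave of 18 tasks leaves the rest of the cluster idle. The numbers below come straight from this scenario.

```python
partitions = 18   # file partitions for Product D in this scenario
cores = 420       # cores in the first-attempt cluster

busy_fraction = partitions / cores   # share of cores that ever receive a task
idle_cores = cores - partitions      # cores with nothing to do
```

    Barely 4% of the cluster does any work while the remaining 402 cores sit idle, which is the utilization problem the salting is designed to fix.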

    Take a look at Fig 5. In total, the cluster spent 9.9 hours and still wasn't finished; we had to kill the job because it was becoming expensive and blocking other people's tests.

    Fig 5. Summary of the inference stage for the partitioned, non-salted dataset for Product D.

    From the summary statistics in Fig. 6 for the tasks that did finish, we can see there was heavy skew in the partitions for Product D. The maximum input size was ~56M rows and the runtime was 7.8h.

    Fig 6. Summary statistics for the executors' inference on the partitioned and non-salted dataset.

    Non-salted and Liquid

    In this scenario, we observe very similar results in terms of the average number of rows per file partition and the number of partitions per product, as seen in Fig. 7 and Fig. 8, respectively.

    Fig 7. Average row count in a partition vs. the products.

    Product D has 19 file partitions, still very short of 420 cores.

    Fig 8. Total number of file partitions per product.

    We could already anticipate that this experiment was going to be very expensive, so I decided to skip the inference test for this scenario. Again, in an ideal world we would carry it out, but there is a backlog of tickets on my board.

    Salted and Partitioned

    After applying the salting and repartitioning process, we end up with ~2.5M average records per partition for Products A and B, and ~1M for Products C and D, as depicted in Fig 9.

    Fig 9. Average row count in a partition vs. the products.

    Moreover, we can see in Fig. 10 that the number of file partitions increased to roughly 860 for Product D, which gives 430 for each inference stage.

    Fig 10. Total number of file partitions per product.

    This results in a run time of 3h for inference on Product D, with 360 tasks, as seen in Fig 11.

    Fig 11. Summary of the inference stage for the partitioned and salted dataset.

    Checking the summary statistics in Fig. 12, the distribution looks balanced, with run times around 1.7 hours, but a maximum task taking 3h, which is worth investigating further in the future.

    Fig 12. Summary statistics for the executors' inference on the partitioned and salted dataset.

    One great benefit is that the salt distributes the data according to the proportions of the products. If we had more resources available, we could increase the number of shuffle partitions in repartition() and add workers according to the proportions of the data. This ensures that our process scales predictably.
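    A minimal sketch of that scaling idea, using the product shares from the data landscape. The worker budget and the rounding policy of the helper are illustrative assumptions, not the project's actual sizing logic.

```python
def workers_per_product(shares, total_workers, min_workers=1):
    """Split a worker budget proportionally to each product's data share."""
    return {
        product: max(min_workers, round(share * total_workers))
        for product, share in shares.items()
    }

# Shares from the data landscape; a hypothetical budget of 42 workers
shares = {"A": 0.019, "B": 0.008, "C": 0.176, "D": 0.797}
alloc = workers_per_product(shares, total_workers=42)
```

    Because the salt buckets already follow the same proportions, adding workers this way grows per-product parallelism in lockstep with per-product data volume.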

    Salted and Liquid

    This scenario combines the two strongest levers we have explored so far: salting to control file size and parallelism, and liquid clustering to keep related data colocated without rigid partition boundaries.

    After applying the same salting strategy and a 1M row limit per partition, the liquid-clustered table shows a very similar average partition size to the salted and partitioned case, as shown in Fig 13. Products C and D remain close to the 1M-row target, while Products A and B settle slightly above that threshold.

    Fig 13. Average row count in a partition vs. the products.

    However, the main difference appears in how these partitions are distributed and consumed by Spark. As shown in Fig. 14, Product D again reaches a high number of file partitions, providing enough parallelism to saturate the available cores during inference.

    Fig 14. Total number of file partitions per product.

    Unlike its partitioned counterpart, liquid clustering allows Spark to adapt the file layout over time while still benefiting from the salt. This results in a more even distribution of work across executors, with fewer extreme outliers in both input size and task duration.

    From the summary statistics in Fig. 15, we observe that the majority of tasks complete within a tight runtime window, and the maximum task duration is lower than in the salted and partitioned scenario. This indicates reduced skew and better load balancing across the cluster.

    Fig 15. Summary of the inference stage for the liquid-clustered and salted dataset.
    Fig 16. Summary statistics for the executors' inference on the liquid-clustered and salted dataset.

    An important side effect is that liquid clustering preserves data locality for the filtered columns without imposing strict partition boundaries. This allows Spark to still benefit from data skipping, while the salt ensures that no single executor is overwhelmed with tens of millions of rows.

    Overall, salted and liquid emerges as the most robust setup: it maximizes parallelism, minimizes skew, and reduces operational risk when inference workloads grow or cluster configurations change.

    Key Takeaways

    • Inference scalability is often limited by data layout, not model complexity. Poorly sized file partitions can leave hundreds of cores idle while a few executors process tens of millions of rows.
    • Partitioning alone is not enough for large-scale inference. Without controlling file size, partitioned tables can still produce huge partitions that lead to long-running, skewed tasks.
    • Salting is an effective tool to unlock parallelism. Introducing a salt key and enforcing a row limit per partition dramatically increases the number of runnable tasks and stabilizes runtimes.
    • Liquid clustering complements salting by reducing skew without rigid boundaries. It allows Spark to adapt the file layout over time, making the system more resilient as data grows.


