
    Personalized Restaurant Ranking with a Two-Tower Embedding Variant

By ProfitlyAI · March 13, 2026


In this article, I'd like to share a pragmatic variant of Uber's Two-Tower Embedding (TTE) approach for cases where both user-related data and computing resources are limited. The problem came from a high-traffic discovery widget on the home screen of a food delivery app. This widget shows curated selections such as Italian, Burgers, Sushi, or Healthy. The selections are built from tags: each restaurant can carry multiple tags, and each tile is essentially a tag-defined slice of the catalog (with some manual curation on top). In other words, the candidate set is already known, so the real problem is not retrieval but ranking.

At the time, this widget was significantly underperforming compared with other widgets on the discovery (main) screen. The final list was ranked by general popularity without taking any personalized signals into account. What we found is that users are reluctant to scroll: if they don't see something interesting within the first 10 to 12 positions, they usually don't convert. Yet the selections can be large, in some cases up to 1,500 restaurants. On top of that, a single restaurant can appear in several selections. McDonald's, for example, can be included in both Burgers and Ice Cream, but its popularity is really only earned in the first; a general popularity sort would still put it at the top of both.

The product setup makes the problem even less friendly to static solutions such as popularity sorting. These collections are dynamic and change frequently due to seasonal campaigns, operational needs, or new business initiatives. Because of that, training a dedicated model for each individual selection is not realistic. A useful recommender has to generalize to new tag-based collections from day one.

Before moving to a two-tower-style solution, we tried simpler approaches such as localized popularity ranking at the city-district level and multi-armed bandits. In our case, neither delivered a measurable uplift over a plain popularity sort. As part of our research initiative, we then tried to adapt Uber's TTE to our setting.

    Two-Tower Embeddings Recap

A two-tower model learns two encoders in parallel: one for the user side and one for the restaurant side. Each tower produces a vector in a shared latent space, and relevance is estimated from a similarity score, usually a dot product. The operational advantage is decoupling: restaurant embeddings can be precomputed offline, while the user embedding is generated online at request time. This makes the approach attractive for systems that need fast scoring and reusable representations.
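The offline/online decoupling can be sketched in a few lines. Everything here (restaurant names, vectors, dimensions) is made up for illustration; in production the vectors would come from the trained towers:

```python
# Minimal sketch of two-tower scoring: restaurant vectors are precomputed
# offline, the user vector is built online, relevance = dot product.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Offline: one embedding per restaurant (output of the restaurant tower).
restaurant_vecs = {
    "sushi_bar":   [0.9, 0.1, 0.0],
    "burger_hut":  [0.1, 0.9, 0.2],
    "pizza_place": [0.2, 0.3, 0.9],
}

# Online: user vector produced by the user tower at request time.
user_vec = [0.8, 0.2, 0.1]

# Rank the (already known) candidate set by similarity.
ranked = sorted(restaurant_vecs,
                key=lambda r: dot(user_vec, restaurant_vecs[r]),
                reverse=True)
print(ranked[0])  # → "sushi_bar" for this user vector
```

Because only the user side runs at request time, the per-request cost is one encoder pass plus a handful of dot products over the candidate slice.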

Uber's write-up focused primarily on retrieval, but it also noted that the same architecture can serve as a final ranking layer when candidate generation is already handled elsewhere and latency must stay low. That second formulation was much closer to our use case.

Our Approach

Image by the author

We kept the two-tower structure but simplified the most resource-heavy parts. On the restaurant side, we didn't fine-tune a language model inside the recommender. Instead, we reused a TinyBERT model that had already been fine-tuned for search in the app and treated it as a frozen semantic encoder. Its text embedding was combined with explicit restaurant features such as price, ratings, and recent performance indicators, plus a small trainable restaurant-ID embedding, and then projected into the final restaurant vector. This gave us semantic coverage without paying the full cost of end-to-end language-model training. For a POC or MVP, a small frozen sentence-transformer would be a reasonable starting point as well.
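The restaurant tower then reduces to a concatenation plus a learned projection. The sketch below uses stub vectors and random weights purely to show the shape of the computation; the real text embedding would come from the frozen encoder, and the projection matrix would be trained:

```python
import random
random.seed(0)

DIM = 4  # final embedding size (illustrative)

def project(x, W):
    # Simple linear projection: W is a DIM x len(x) matrix.
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]

def restaurant_tower(text_emb, features, id_emb, W):
    # A frozen text embedding (e.g., from a pre-fine-tuned TinyBERT) is
    # concatenated with explicit features (price, rating, ...) and a small
    # trainable ID embedding, then projected into the shared space.
    x = text_emb + features + id_emb
    return project(x, W)

text_emb = [0.3, 0.7]   # frozen semantic encoder output (stub)
features = [0.5, 0.9]   # e.g., normalized price and rating
id_emb   = [0.1]        # trainable per-restaurant ID embedding (stub)
W = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(DIM)]

vec = restaurant_tower(text_emb, features, id_emb, W)
print(len(vec))  # DIM-dimensional restaurant vector
```

Since none of the inputs depend on the request, this whole computation can run in a batch job and the resulting vectors can be cached.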

We avoided learning a dedicated user-ID embedding and instead represented each user on the fly through their previous interactions. The user vector was built from averaged embeddings of restaurants the customer had ordered from (Uber's post mentioned this source as well, but the authors don't specify how it was used), together with user and session features. We also used views without orders as a weak negative signal. That mattered when order history was sparse or irrelevant to the current selection. If the model couldn't clearly infer what the user liked, it still helped to know which restaurants had already been explored and rejected.
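One simple way to fold the weak negative signal into the user vector is a weighted average, where orders contribute positively and views without orders get a small negative weight. The specific weights and vectors below are assumptions for illustration, not the production values:

```python
# User vector built on the fly from interactions: ordered restaurants pull
# the vector toward them, viewed-but-not-ordered ones push it slightly away
# ("explored and rejected").

restaurant_vecs = {
    "sushi_bar":  [0.9, 0.1],
    "gelato_lab": [0.0, 1.0],
}

interactions = [("sushi_bar", "order"), ("gelato_lab", "view")]
WEIGHTS = {"order": 1.0, "view": -0.2}  # illustrative weighting scheme

dim = 2
user_vec = [0.0] * dim
total = 0.0
for rest, kind in interactions:
    w = WEIGHTS[kind]
    total += abs(w)
    for i in range(dim):
        user_vec[i] += w * restaurant_vecs[rest][i]
user_vec = [x / total for x in user_vec]

print(user_vec)  # pulled toward sushi_bar, pushed away from gelato_lab
```

In a real system these weights would be tuned (or learned) rather than hard-coded, but the structure stays the same.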

The most important modeling choice was filtering that history by the tag of the current selection. Averaging the whole order history created too much noise. If a customer mostly ordered burgers and then opened an Ice Cream selection, a global average could pull the model toward burger places that happened to sell desserts rather than toward the strongest ice cream candidates. By filtering past interactions to matching tags before averaging, we made the user representation contextual instead of global. In practice, this was the difference between modeling long-term taste and modeling current intent.
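The tag filter is a one-line change to the averaging step. The restaurants, tags, and vectors below are invented to show the effect, including the fallback to the full history when nothing matches:

```python
# Contextual user vector: average embeddings of past orders, but only those
# whose restaurant shares a tag with the currently opened selection.

def mean_vec(vecs):
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

restaurant_tags = {
    "burger_hut": {"burgers"},
    "patty_king": {"burgers"},
    "mcdonalds":  {"burgers", "ice cream"},
}
restaurant_vecs = {
    "burger_hut": [1.0, 0.0],
    "patty_king": [0.8, 0.2],
    "mcdonalds":  [0.6, 0.4],
}

order_history = ["burger_hut", "patty_king", "mcdonalds"]

def user_vector(history, selection_tag):
    relevant = [r for r in history if selection_tag in restaurant_tags[r]]
    if not relevant:          # fall back to the full history if nothing matches
        relevant = history
    return mean_vec([restaurant_vecs[r] for r in relevant])

# Opening "ice cream": only mcdonalds survives the filter, so the vector
# reflects current intent rather than a global burger bias.
print(user_vector(order_history, "ice cream"))  # → [0.6, 0.4]
print(user_vector(order_history, "burgers"))    # average of all three
```

With the global average, the burger-heavy history would dominate even inside the Ice Cream selection; the filter is what turns "long-term taste" into "current intent".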

Finally, we trained the model at the session level and used multi-task learning. The same restaurant could be a positive in one session and a negative in another, depending on the user's current intent. The ranking head predicted click, add-to-basket, and order jointly, with a simple funnel constraint: P(order) ≤ P(add-to-basket) ≤ P(click). This made the model less static and improved ranking quality compared with optimizing a single objective in isolation.
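The article doesn't say how the funnel constraint was enforced; one common way (an assumption here, not necessarily what the authors did) is to predict each funnel step conditionally and multiply down the chain, which satisfies the inequalities by construction:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def funnel_probs(logit_click, logit_atb, logit_order):
    # Each head predicts the probability of the NEXT step given the previous
    # one; multiplying down the funnel guarantees
    # P(order) <= P(add-to-basket) <= P(click).
    p_click = sigmoid(logit_click)
    p_atb   = p_click * sigmoid(logit_atb)   # P(atb) = P(atb | click) * P(click)
    p_order = p_atb * sigmoid(logit_order)   # P(order) = P(order | atb) * P(atb)
    return p_click, p_atb, p_order

p_click, p_atb, p_order = funnel_probs(0.4, -0.2, 1.1)
assert p_order <= p_atb <= p_click
print(round(p_click, 3), round(p_atb, 3), round(p_order, 3))
```

Each head can then get its own loss (click, add-to-basket, order labels), and the shared towers are trained on the sum.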

Offline validation was also stricter than a random split: evaluation used out-of-time data and users unseen during training, which made the setup much closer to production behavior.
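Concretely, such a split holds out both a later time window and the users who appear only in it. The session records and cutoff below are illustrative:

```python
# Stricter offline split: evaluate on sessions after a cutoff date AND
# restricted to users never seen in training.

sessions = [
    {"user": "u1", "day": 1}, {"user": "u2", "day": 2},
    {"user": "u1", "day": 5}, {"user": "u3", "day": 6},
    {"user": "u2", "day": 7}, {"user": "u4", "day": 8},
]
CUTOFF = 4  # train on days <= 4, evaluate afterwards

train = [s for s in sessions if s["day"] <= CUTOFF]
train_users = {s["user"] for s in train}

# Out-of-time AND out-of-user: later sessions from users absent in training.
test = [s for s in sessions
        if s["day"] > CUTOFF and s["user"] not in train_users]

print([s["user"] for s in test])  # → ["u3", "u4"]
```

A random split would let the model memorize per-user patterns and leak future information into training; this split removes both shortcuts.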

Results

According to A/B tests, the final system showed a statistically significant uplift in conversion rate. Just as importantly, it was not tied to one widget. Because the model scores a user–restaurant pair rather than a fixed list, it generalized naturally to new selections without architectural changes, since tags are part of a restaurant's metadata and can be retrieved without any particular selection in mind.

That transferability made the model useful beyond the original ranking surface. We later reused it in Ads, where its CTR-oriented output was applied to individual promoted restaurants with positive results. The same representation-learning setup therefore worked both for selection ranking and for other recommendation-like placement problems inside the app.

Further Research

The most obvious next step is multimodality. Restaurant photos, icons, and potentially menu visuals can be added as extra branches to the restaurant tower. That matters because click behavior is strongly influenced by presentation. A pizza place inside a pizza selection may underperform if its main image doesn't show pizza, while a budget restaurant can look premium purely because of its hero image. Text and tabular features don't capture that gap well.

    Key Takeaways:

• Two-tower models can work even with limited data. You don't need Uber-scale infrastructure if candidate retrieval is already solved and the model focuses only on the ranking stage.
• Reuse pretrained embeddings instead of training from scratch. A frozen lightweight language model (e.g., TinyBERT or a small sentence-transformer) can provide strong semantic signals without expensive fine-tuning.
• Averaging embeddings of previously ordered restaurants works surprisingly well when user history is sparse.
• Contextual filtering reduces noise and helps the model capture the user's current intent, not just long-term taste.
• Negative signals help in sparse environments. Restaurants that users viewed but didn't order from provide useful information when positive signals are limited.
• Multi-task learning stabilizes ranking. Predicting click, add-to-basket, and order jointly with funnel constraints produces more consistent scores.
• Design for reuse. A model that scores user–restaurant pairs rather than specific lists can be reused across product surfaces such as selections, search ranking, or ads.


