    Behind the Magic: How Tensors Drive Transformers

    By ProfitlyAI | April 25, 2025 | 5 min read


    Transformers have changed the way artificial intelligence works, especially in understanding language and learning from data. At the core of these models are tensors (a generalized form of mathematical matrices that help process information). As data moves through the different parts of a Transformer, these tensors undergo a series of transformations that help the model make sense of things like sentences or images. Learning how tensors work inside Transformers can help you understand how today's smartest AI systems actually work and think.

    What This Article Covers and What It Doesn’t

    ✅ This Article IS About:

    • The flow of tensors from input to output inside a Transformer model.
    • Ensuring dimensional coherence throughout the computational process.
    • The step-by-step transformations that tensors undergo in the different Transformer layers.

    ❌ This Article IS NOT About:

    • A general introduction to Transformers or deep learning.
    • The detailed architecture of Transformer models.
    • The training process or hyper-parameter tuning of Transformers.

    How Tensors Act Inside Transformers

    A Transformer consists of two main components:

    • Encoder: Processes input data, capturing contextual relationships to create meaningful representations.
    • Decoder: Uses these representations to generate coherent output, predicting each element sequentially.

    Tensors are the fundamental data structures that pass through these components, undergoing a series of transformations that ensure dimensional coherence and correct information flow.

    Image from the research paper: Transformer general architecture

    Input Embedding Layer

    Before entering the Transformer, raw input tokens (words, subwords, or characters) are converted into dense vector representations through the embedding layer. This layer functions as a lookup table that maps each token to its vector, capturing semantic relationships with other words.

    Image by author: Tensors passing through the embedding layer

    For a batch of 5 sentences, each with a sequence length of 12 tokens and an embedding dimension of 768, the tensor shape is:

    • Tensor shape: [batch_size, seq_len, embedding_dim] → [5, 12, 768]

    After embedding, positional encoding is added, ensuring that order information is preserved without altering the tensor shape.

    Modified image from the research paper: where we are in the workflow
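
    To make these shapes concrete, here is a minimal PyTorch sketch of the embedding step. The vocabulary size and the zero-valued positional encoding are illustrative assumptions, not details from the article.

    ```python
    import torch
    import torch.nn as nn

    batch_size, seq_len, embedding_dim = 5, 12, 768
    vocab_size = 10_000  # assumed vocabulary size, for illustration only

    # Token IDs for a batch of 5 sentences, 12 tokens each
    token_ids = torch.randint(0, vocab_size, (batch_size, seq_len))

    # The embedding layer acts as a lookup table: token ID -> dense vector
    embedding = nn.Embedding(vocab_size, embedding_dim)
    x = embedding(token_ids)              # [5, 12, 768]

    # Positional encoding has the same shape and is simply added,
    # so the tensor shape does not change (zero-valued stand-in here)
    pos_encoding = torch.zeros(1, seq_len, embedding_dim)
    x = x + pos_encoding                  # still [5, 12, 768]

    print(x.shape)   # torch.Size([5, 12, 768])
    ```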

    Multi-Head Attention Mechanism

    One of the most crucial components of the Transformer is the Multi-Head Attention (MHA) mechanism. It operates on three matrices derived from the input embeddings:

    • Query (Q)
    • Key (K)
    • Value (V)

    These matrices are generated using learnable weight matrices:

    • Wq, Wk, Wv of shape [embedding_dim, d_model] (e.g., [768, 512]).
    • The resulting Q, K, V matrices have dimensions [batch_size, seq_len, d_model] (see the sketch below).
    Image by author: Table showing the shapes/dimensions of the embedding, Q, K, V tensors
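
    A rough sketch of these projections, assuming bias-free nn.Linear layers stand in for Wq, Wk, Wv:

    ```python
    import torch
    import torch.nn as nn

    batch_size, seq_len, embedding_dim, d_model = 5, 12, 768, 512

    x = torch.randn(batch_size, seq_len, embedding_dim)   # embedded input, [5, 12, 768]

    # Learnable weight matrices Wq, Wk, Wv of shape [embedding_dim, d_model]
    W_q = nn.Linear(embedding_dim, d_model, bias=False)
    W_k = nn.Linear(embedding_dim, d_model, bias=False)
    W_v = nn.Linear(embedding_dim, d_model, bias=False)

    Q, K, V = W_q(x), W_k(x), W_v(x)
    print(Q.shape, K.shape, V.shape)   # each torch.Size([5, 12, 512])
    ```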

    Splitting Q, K, V into Multiple Heads

    For effective parallelization and improved learning, MHA splits Q, K, and V into multiple heads. Suppose we have 8 attention heads:

    • Each head operates on a subspace of size d_model / head_count.
    Image by author: Multi-head attention
    • The reshaped tensor dimensions are [batch_size, seq_len, head_count, d_model / head_count].
    • Example: [5, 12, 8, 64] → rearranged to [5, 8, 12, 64] so that each head receives its own slice of the sequence.
    Image by author: Reshaping the tensors
    • Each head then gets its own share Qi, Ki, Vi (see the sketch below).
    Image by author: Each Qi, Ki, Vi sent to a different head
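
    The split is just a view followed by a transpose. The sketch below shows Q with the same illustrative sizes; K and V are reshaped identically.

    ```python
    import torch

    batch_size, seq_len, d_model, n_heads = 5, 12, 512, 8
    head_dim = d_model // n_heads                    # 64

    Q = torch.randn(batch_size, seq_len, d_model)    # [5, 12, 512]

    # Split the last dimension into (heads, head_dim), then move the head axis forward
    Q_heads = Q.view(batch_size, seq_len, n_heads, head_dim)   # [5, 12, 8, 64]
    Q_heads = Q_heads.transpose(1, 2)                          # [5, 8, 12, 64]

    print(Q_heads.shape)   # torch.Size([5, 8, 12, 64])
    ```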

    Attention Calculation

    Each head computes attention using the scaled dot-product formula:

    Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V

    Once attention is computed for all heads, the outputs are concatenated and passed through a linear transformation, restoring the initial tensor shape.

    Image by author: Concatenating the output of all heads
    Modified image from the research paper: where we are in the workflow
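
    Below is a sketch of the per-head attention and the concatenation step, using the same illustrative sizes. The output projection mapping d_model (512) back to embedding_dim (768) is an assumption made so that the residual connection in the next section lines up.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    batch_size, n_heads, seq_len, head_dim = 5, 8, 12, 64
    d_model, embedding_dim = n_heads * head_dim, 768        # 512 and 768

    Q = torch.randn(batch_size, n_heads, seq_len, head_dim)
    K = torch.randn(batch_size, n_heads, seq_len, head_dim)
    V = torch.randn(batch_size, n_heads, seq_len, head_dim)

    # Scaled dot-product attention, computed for all heads in parallel
    scores = Q @ K.transpose(-2, -1) / head_dim ** 0.5      # [5, 8, 12, 12]
    weights = F.softmax(scores, dim=-1)
    out = weights @ V                                       # [5, 8, 12, 64]

    # Concatenate the heads, then apply the final linear projection
    out = out.transpose(1, 2).reshape(batch_size, seq_len, d_model)   # [5, 12, 512]
    W_o = nn.Linear(d_model, embedding_dim, bias=False)     # assumed to map back to 768
    out = W_o(out)                                          # [5, 12, 768]
    print(out.shape)
    ```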

    Residual Connection and Normalization

    After the multi-head attention mechanism, a residual connection is added, followed by layer normalization:

    • Residual connection: Output = Embedding Tensor + Multi-Head Attention Output
    • Normalization: (Output − μ) / σ to stabilize training
    • The tensor shape remains [batch_size, seq_len, embedding_dim] (see the sketch below).
    Image by author: Residual connection
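
    A minimal sketch of this add-and-norm step; note that nn.LayerNorm also applies a learnable scale and shift on top of the (Output − μ) / σ normalization described above.

    ```python
    import torch
    import torch.nn as nn

    batch_size, seq_len, embedding_dim = 5, 12, 768

    x = torch.randn(batch_size, seq_len, embedding_dim)          # embedding tensor
    attn_out = torch.randn(batch_size, seq_len, embedding_dim)   # multi-head attention output

    # Residual connection followed by layer normalization over the last dimension
    layer_norm = nn.LayerNorm(embedding_dim)
    out = layer_norm(x + attn_out)

    print(out.shape)   # torch.Size([5, 12, 768]), shape unchanged
    ```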

    Feed-Forward Network (FFN)

    After attention and normalization, each position is passed independently through a position-wise feed-forward network: two linear layers with a non-linear activation in between. The inner layer expands the dimension and the outer layer projects it back, so the tensor shape remains [batch_size, seq_len, embedding_dim].

    Masked Multi-Head Attention in the Decoder

    In the decoder, Masked Multi-Head Attention ensures that each token attends only to earlier tokens, preventing leakage of future information.

    Modified image from the research paper: Masked multi-head attention

    This is achieved using a lower triangular mask of shape [seq_len, seq_len], with -inf values in the upper triangle. Applying this mask ensures that the softmax function assigns zero weight to future positions.

    Image by author: Mask matrix
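
    A small sketch of how such a mask can be built and applied before the softmax (a single head, with the batch dimension omitted for brevity):

    ```python
    import torch
    import torch.nn.functional as F

    seq_len = 12

    # -inf above the diagonal, 0 on and below it
    mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

    scores = torch.randn(seq_len, seq_len)        # raw attention scores for one head
    weights = F.softmax(scores + mask, dim=-1)    # -inf positions get zero weight

    print(weights[0])   # the first token can only attend to itself
    ```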

    Cross-Attention in Decoding

    Since the decoder cannot fully interpret the input sentence on its own, it uses cross-attention to refine its predictions. Here:

    • The decoder generates queries (Qd) from its input ([batch_size, target_seq_len, embedding_dim]).
    • The encoder output serves as keys (Ke) and values (Ve).
    • The decoder computes attention between Qd and Ke, extracting relevant context from the encoder's output (see the sketch below).
    Modified image from the research paper: Cross-attention
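
    A single-head sketch of cross-attention; the target length of 10 and the use of full d_model vectors instead of per-head slices are illustrative simplifications.

    ```python
    import torch
    import torch.nn.functional as F

    batch_size, src_len, tgt_len, d_model = 5, 12, 10, 512

    Q_d = torch.randn(batch_size, tgt_len, d_model)   # decoder queries
    K_e = torch.randn(batch_size, src_len, d_model)   # encoder output as keys
    V_e = torch.randn(batch_size, src_len, d_model)   # encoder output as values

    # Each target position attends over all encoder positions
    scores = Q_d @ K_e.transpose(-2, -1) / d_model ** 0.5   # [5, 10, 12]
    weights = F.softmax(scores, dim=-1)
    context = weights @ V_e                                 # [5, 10, 512]

    print(context.shape)   # one context vector per target position
    ```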

    Conclusion

    Transformers rely on tensors to learn and make good decisions. As data moves through the network, these tensors go through different steps: being turned into numbers the model can work with (embedding), focusing on the important parts (attention), staying balanced (normalization), and passing through layers that learn patterns (feed-forward). These transformations keep the data in the right shape the whole time. By understanding how tensors move and change, we can get a better idea of how AI models work and how they can understand and produce human-like language.


