
    Javascript Fatigue: HTMX Is All You Need to Build ChatGPT — Part 2

    By ProfitlyAI · November 17, 2025 · 17 min read


    In Part 1, we showed how we can leverage HTMX to add interactivity to our HTML elements. In other words, JavaScript without JavaScript. To illustrate that, we started building a simple chat that returned a simulated LLM response. In this article, we will extend the capabilities of our chatbot and add several features, among which streaming, a major enhancement in terms of user experience compared to the synchronous chat we built previously.

    • ✅ Real-time streaming with SSE
    • ✅ Session-based architecture for multiple users
    • ✅ Async coordination with asyncio.Queue
    • ✅ Clean HTMX patterns with dedicated SSE handling
    • ✅ A Google Search Agent to answer queries with fresh data
    • ✅ Almost zero JavaScript

    Here is what we will build today: a chat interface that streams its responses word by word.

    From sync communication to async

    What we built previously leveraged very basic web functionality: forms. Our communication was synchronous, meaning we don't get anything until the server is done. We issue a request, we wait for the full response, and we display it. Between the two, we just... wait.

    But modern chatbots work differently, by providing asynchronous communication capabilities. This is done using streaming: we get updates and partial responses instead of waiting for the full response. This is particularly useful when the response takes time to produce, which is usually the case for LLMs when the answer is long.

    SSE vs WebSockets

    SSE (Server-Sent Events) and WebSockets are two real-time data exchange protocols between a client and a server.

    WebSockets allow for full-duplex connections: this means the browser and the server can both send and receive data simultaneously. This is typically used in online gaming, chat applications, and collaborative tools (Google Sheets, for example).

    SSE is unidirectional and only allows a one-way conversation, from server to client. This means the client cannot send anything to the server via this protocol. If WebSockets are a two-way phone conversation where people can speak and listen at the same time, SSE is like listening to the radio. SSE is typically used to send notifications, update charts in finance applications, or power newsfeeds.

    So why do we choose SSE? Well, because in our use case we don't need full duplex, and plain HTTP (which is not how WebSockets work) is enough: we send data, we receive data. SSE just means that we will receive the data as a stream, nothing more is needed.
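    Concretely, an SSE stream is just a long-lived HTTP response with the text/event-stream content type, where each message is one or more "data:" lines terminated by a blank line. Roughly, this is what the browser receives:

    HTTP/1.1 200 OK
    Content-Type: text/event-stream

    data: first chunk

    data: second chunk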

    What we want to do

    1. User inputs a query
    2. Server receives the query and sends it to the LLM
    3. LLM starts generating content
    4. For each piece of content, the server returns it immediately
    5. Browser adds this piece of content to the DOM

    We will separate our work into backend and frontend sections.

    Backend

    The backend will proceed in two steps:

    • A POST endpoint that receives the message and returns nothing
    • A GET endpoint that reads a queue and produces an output stream

    In our demo, to begin with, we will create a fake LLM response by repeating the user input, meaning the words of the stream will be exactly the same as the user input.

    To keep things clean, we need to separate the message streams (the queues) by user session, otherwise we would end up mixing up conversations. We will therefore create a session dictionary to hold our queues.

    Next, we need to tell the backend to wait until the queue is filled before streaming our response. If we don't, we will run into concurrency or timing issues: the SSE connection starts on the client side, the queue is empty, the SSE connection closes, the user inputs a message but... it's too late!

    The solution: async queues! Using asynchronous queues has several advantages (see the small demo after this list):

    • If the queue has data: returns immediately
    • If the queue is empty: suspends execution until queue.put() is called
    • Multiple clients: each gets their own data
    • No race conditions: asyncio.Queue safely coordinates coroutines within a single event loop
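    To make the blocking behavior concrete, here is a minimal, self-contained sketch (independent from our app) showing how `await queue.get()` suspends until another coroutine calls `queue.put()`:

    import asyncio

    async def consumer(queue: asyncio.Queue):
        print("Consumer: waiting for data...")
        data = await queue.get()          # suspends here until the producer puts something
        print(f"Consumer: got {data!r}")
        queue.task_done()

    async def producer(queue: asyncio.Queue):
        await asyncio.sleep(1)            # simulate the user taking time to type
        await queue.put("Hello World")    # this "wakes up" the consumer

    async def main():
        queue = asyncio.Queue()
        await asyncio.gather(consumer(queue), producer(queue))

    asyncio.run(main())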

    I know you are burning to know more, so here is the code:

    from fastapi import FastAPI, Request, Form
    from fastapi.templating import Jinja2Templates
    from fastapi.responses import HTMLResponse, StreamingResponse
    import asyncio
    import uuid
    
    app = FastAPI()
    templates = Jinja2Templates("templates")
    
    # This object will store session ids and their corresponding value, an async queue.
    sessions = dict()
    
    @app.get("/")
    async def root(request: Request):
        session_id = str(uuid.uuid4())
        sessions[session_id] = asyncio.Queue()
        return templates.TemplateResponse(request, "index.html", context={"session_id": session_id})
    
    
    @app.post("/chat")
    async def chat(request: Request, query: str = Form(...), session_id: str = Form(...)):
        """Send the message to the session-based queue."""
    
        # Create the session if it doesn't exist
        if session_id not in sessions:
            sessions[session_id] = asyncio.Queue()
    
        # Put the message in the queue
        await sessions[session_id].put(query)
    
        return {"status": "queued", "session_id": session_id}
    
    
    @app.get("/stream/{session_id}")
    async def stream(session_id: str):
    
        async def response_stream():
    
            if session_id not in sessions:
                print(f"Session {session_id} not found!")
                return
    
            queue = sessions[session_id]
    
            # This BLOCKS until data arrives
            print(f"Waiting for message in session {session_id}")
            data = await queue.get()
            print(f"Got message: {data}")
    
            message = ""
            await asyncio.sleep(1)
            for token in data.replace("\n", " ").split(" "):
                message += token + " "
                data = f"""data: <li class='mb-6 ml-[20%]'> <div class='font-bold text-right'>AI</div><div>{message}</div></li>\n\n"""
                yield data
                await asyncio.sleep(0.03)
    
            queue.task_done()
    
        return StreamingResponse(response_stream(), media_type="text/event-stream")
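    If you want to poke at these endpoints without the browser, here is a small test sketch (assumptions on my side: the app is saved as main.py, served with `uvicorn main:app --reload` on port 8000, and the httpx library is installed). We post the message first so that the session and its queue exist, then we open the SSE stream:

    import asyncio
    import httpx

    async def main():
        session_id = "test-session"
        async with httpx.AsyncClient(base_url="http://localhost:8000") as client:
            # 1. Queue a message (this also creates the session server-side)
            await client.post("/chat", data={"query": "Hello World", "session_id": session_id})

            # 2. Open the SSE stream and print each frame as it arrives
            async with client.stream("GET", f"/stream/{session_id}") as response:
                async for line in response.aiter_lines():
                    if line.startswith("data:"):
                        print(line)

    asyncio.run(main())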
    

    Let's explain a couple of key concepts here.

    Session isolation

    It is important that each user gets their own message queue, so as not to mix up conversations. The way to do that is with the sessions dictionary. In real production apps, we would probably use Redis to store it. In the code below, we see that a new session id is created on page load and stored in the sessions dictionary. Reloading the page starts a new session; we are not persisting the message queues, but we could, via a database for example. This topic is covered in Part 3.

    # This object will store session ids and their corresponding value, an async queue.
    sessions = dict()
    
    @app.get("/")
    async def root(request: Request):
        session_id = str(uuid.uuid4())
        sessions[session_id] = asyncio.Queue()
        return templates.TemplateResponse(request, "index.html", context={"session_id": session_id})
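    For the curious, a production setup could look something like the sketch below, where Redis lists replace the in-memory dictionary (this is an assumption for illustration only, requiring the redis package and a running Redis server; it is not used anywhere in this article):

    import redis.asyncio as redis

    r = redis.Redis(decode_responses=True)

    async def push_message(session_id: str, query: str):
        # Each session gets its own Redis list, used as a queue
        await r.rpush(f"queue:{session_id}", query)

    async def pop_message(session_id: str) -> str:
        # BLPOP blocks until an element is available, mirroring asyncio.Queue.get()
        _key, value = await r.blpop(f"queue:{session_id}")
        return value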
    

    Blocking coordination

    We need to control the order in which SSE messages are sent and the user query is received. On the backend side, the order is:

    1. Receive the user message
    2. Create a message queue and populate it
    3. Send messages from the queue in a StreamingResponse

    Failure to do so can lead to unwanted behavior, i.e. first reading the (empty) message queue, then populating it with the user's query.

    The solution to control the order is to use asyncio.Queue. This object will be used twice:

    • When we insert new messages into the queue. Inserting a message will "wake up" the pending read in the SSE endpoint:
    await sessions[session_id].put(query)
    • When we pull messages from the queue. On this line, the code blocks until the queue signals "hey, I have new data!":
    data = await queue.get()

    This pattern offers several advantages:

    • Each user has their own queue
    • There is no risk of race conditions

    Streaming simulation

    In this article, we will simulate an LLM response by splitting the user's query into words and returning those words one by one. In Part 3, we will plug a real LLM into it.

    The streaming is handled via the StreamingResponse object from FastAPI. This object expects an asynchronous generator that yields data until the generator is exhausted. We have to use the yield keyword instead of the return keyword, otherwise our generator would just stop after the first iteration.
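    As a toy illustration (not part of our app), here is what such a generator looks like; each yielded string is flushed to the browser as soon as it is produced:

    import asyncio

    async def counter():
        for i in range(3):
            yield f"data: chunk {i}\n\n"   # one SSE message per iteration
            await asyncio.sleep(1)

    # Would be served with: StreamingResponse(counter(), media_type="text/event-stream")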

    Let's decompose our streaming function.

    First, we need to ensure we have a queue for the current session from which we will pull messages:

    if session_id not in sessions:
        print(f"Session {session_id} not found!")
        return
    
    queue = sessions[session_id]

    Next, once we have the queue, we pull messages from it if it contains any; otherwise the code pauses and waits for messages to arrive. This is the most important part of our function:

    # This BLOCKS until data arrives
    print(f"Waiting for message in session {session_id}")
    data = await queue.get()
    print(f"Got message: {data}")

    To simulate the stream, we now chunk the message into words (called tokens here) and add some sleeps to simulate the text generation process of an LLM (the asyncio.sleep calls). Notice how the data we yield is actually an HTML string, wrapped in a string starting with "data:" and terminated by a blank line. This is how SSE messages are sent. You can also choose to flag your messages with the "event:" metadata. An example would be:

    event: my_custom_event
    data: <div>Content to swap into your HTML page.</div>
    

    Let's see how we implement it in Python (for the purists: use Jinja templates to render the HTML instead of a string):

    message = ""
    
    # First pause, to let the browser display "Thinking..." while the message is sent
    await asyncio.sleep(1)
    
    # Simulate streaming by splitting the message into words
    for token in data.replace("\n", " ").split(" "):
    
        # We append tokens to the message
        message += token + " "
    
        # We wrap the message in HTML tags with the "data:" prefix
        data = f"""data: <li class='mb-6 ml-[20%]'><div class='font-bold text-right'>AI</div><div>{message}</div></li>\n\n"""
        yield data
    
        # Pause to simulate the LLM generation process
        await asyncio.sleep(0.03)
    
    queue.task_done()
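    For the input "Hello World", the frames sent on the wire therefore look roughly like this, each one carrying the whole message rebuilt so far:

    data: <li class='mb-6 ml-[20%]'><div class='font-bold text-right'>AI</div><div>Hello </div></li>

    data: <li class='mb-6 ml-[20%]'><div class='font-bold text-right'>AI</div><div>Hello World </div></li>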

    Frontend

    Our frontend has two jobs: send user queries to the backend, and listen for SSE messages on a specific channel (the session_id). To do that, we apply a concept called "separation of concerns": each HTMX element is responsible for a single job only.

    • the form sends the user input
    • the SSE listener handles the streaming
    • the ul chat list displays the messages

    To send messages, we use a standard textarea input inside a form. The HTMX magic is just below:

    <form 
        id="userInput" 
        class="flex max-h-16 gap-4"
        hx-post="/chat" 
        hx-swap="none"
        hx-trigger="click from:#submitButton" 
        hx-on::before-request="
            htmx.find('#chat').innerHTML += `<li class='mb-6 justify-start max-w-[80%]'><div class='font-bold'>Me</div><div>${htmx.find('#query').value}</div></li>`;
            htmx.find('#chat').innerHTML += `<li class='mb-6 ml-[20%]'><div class='font-bold text-right'>AI</div><div class='text-right'>Thinking...</div></li>`;
            htmx.find('#query').value = '';
        "
    >
        <!-- The session id generated on page load is sent along with the query -->
        <input type="hidden" name="session_id" value="{{ session_id }}">
        <textarea 
            id="query" 
            name="query"
            class="flex w-full rounded-md border border-input bg-transparent px-3 py-2 text-sm shadow-sm placeholder:text-muted-foreground focus-visible:outline-none focus-visible:ring-1 focus-visible:ring-ring disabled:cursor-not-allowed disabled:opacity-50 min-h-[44px] max-h-[200px]"
            placeholder="Write a message..." 
            rows="4"></textarea>
        <button 
            type="submit" 
            id="submitButton"
            class="inline-flex max-h-16 items-center justify-center rounded-md bg-neutral-950 px-6 font-medium text-neutral-50 transition active:scale-110"
        >Send</button>
    </form>
    

    If you remember the article from Part 1, we have several HTMX attributes here that deserve explanation:

    • hx-post: the endpoint the form data will be submitted to.
    • hx-swap: set to none, because in our case the endpoint does not return any data.
    • hx-trigger: specifies which event will trigger the request.
    • hx-on::before-request: a very light touch of JavaScript to add some snappiness to the app. We append the user's request to the chat list and display a "Thinking..." message while we wait for the SSE messages to stream. This is nicer than staring at a blank page.

    It is worth noting that we actually send two parameters to the backend: the user's input and the session id (the hidden input above). This way, the message is inserted into the right queue on the backend side.

    Then, we define another element that is specifically dedicated to listening for SSE messages:

    <!-- Messages will be added to this list -->
    <div class="mb-auto max-h-[80%] overflow-auto">
        <ul id="chat" class="rounded-2xl p-4 mb-16 justify-start">
        </ul>
    </div>
    
    <!-- SSE listener (message buffer) -->
    <div 
        hx-ext="sse" 
        sse-connect="/stream/{{ session_id }}" 
        sse-swap="message" 
        hx-swap="outerHTML scroll:bottom"
        hx-target="#chat>li:last-child" 
        style="display: none;"
    ></div>

    This element listens to the /stream endpoint and passes its session id so that it only receives messages for this session. The hx-target tells the browser to add the data to the last li element of the chat. The hx-swap specifies that the data should replace the entire current li element. This is how our streaming effect works: replacing the existing message with the latest, longer one.

    Note: other techniques could have been used to replace specific parts of the DOM, such as out-of-band (OOB) swaps. They work a little differently since they require a specific id to look for in the DOM. In our case, we chose on purpose not to assign ids to each list element we write.

    A Real Chatbot Using the Google Agent Development Kit

    Now is the time to replace our dummy streaming endpoint with a real LLM. To achieve that, we will build an agent using Google ADK, equipped with tools and memory to fetch information and remember conversation details.

    A very short introduction to agents

    You probably already know what an LLM is; at least I assume you do. The main drawback of LLMs as of today is that LLMs alone cannot access real-time information: their knowledge is frozen at the moment they were trained. The other drawback is their inability to access information outside their training scope (e.g., your company's internal data).

    Agents are a type of AI application that can reason, act and observe. The reasoning part is handled by the LLM, the "brain". The "arms" of the agent are what we call "tools", and they can take several forms:

    • a Python function, for example one that calls an API (see the sketch after this list)
    • an MCP server, a standard that allows agents to connect to APIs through a standardized interface (e.g., accessing all the GSuite tools without having to write the API connectors yourself)
    • other agents (in that case, the pattern is called agent delegation, where a router or master agent controls different sub-agents)
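    To make the first bullet concrete, here is a hypothetical Python function that could be registered as a tool (ADK can wrap plain Python functions; this particular function is just an illustration and is not used in our demo):

    def get_weather(city: str) -> dict:
        """Return a (fake) weather report for the given city."""
        # In a real tool, this is where you would call an external API
        return {"city": city, "forecast": "sunny", "temperature_c": 21}

    # It would then be passed to the agent like our Google Search tool: tools=[get_weather]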

    In our demo, to keep things very simple, we will use a very simple agent with one tool: Google Search. This lets us get fresh information and ensure it is reliable (at least we hope the Google Search results are...).

    In the Google ADK world, an agent needs some basic information:

    • name and description, mostly for documentation purposes
    • instructions: the prompt that defines the behavior of the agent (tool use, output format, steps to follow, etc.)
    • tools: the functions / MCP servers / agents the agent can use to fulfill its goal

    There are also other concepts around memory and session management, but they are out of scope here.

    Without further ado, let's define our agent!

    A Streaming Google Search Agent

    import markdown  # used later to convert the agent's Markdown output to HTML

    from google.adk.agents import Agent
    from google.adk.agents.run_config import RunConfig, StreamingMode
    from google.adk.runners import Runner
    from google.adk.sessions import InMemorySessionService
    from google.genai import types
    from google.adk.tools import google_search
    
    # Define constants for the agent
    APP_NAME = "default"  # Application
    USER_ID = "default"  # User
    SESSION = "default"  # Session
    MODEL_NAME = "gemini-2.5-flash-lite"
    
    # Step 1: Create the LLM Agent
    root_agent = Agent(
        model=MODEL_NAME,
        name="text_chat_bot",
        description="A text chatbot",
        instruction="You are a helpful assistant. Your goal is to answer questions based on your knowledge. Use your Google Search tool to provide the latest and most accurate information",
        tools=[google_search]
    )
    
    # Step 2: Set up Session Management
    # InMemorySessionService stores conversations in RAM (temporary)
    session_service = InMemorySessionService()
    
    # Step 3: Create the Runner
    runner = Runner(agent=root_agent, app_name=APP_NAME, session_service=session_service)

    The `Runner` object acts as the orchestrator between you and the agent.
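    One practical note: for this code to run, the ADK needs credentials for the Gemini API, which typically means setting a GOOGLE_API_KEY environment variable (or configuring Vertex AI); check the ADK documentation for the setup that matches your environment.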

    Next, we (re)define our `/stream` endpoint. We first try to create a session for the agent; if it already exists, we retrieve it:

            # Attempt to create a new session or retrieve an existing one
            try:
                session = await session_service.create_session(
                    app_name=APP_NAME, user_id=USER_ID, session_id=session_id
                )
            except Exception:
                session = await session_service.get_session(
                    app_name=APP_NAME, user_id=USER_ID, session_id=session_id
                )

    Then, we take the user query and pass it to the agent in an async fashion to get a stream back:

        # Convert the query string to the ADK Content format
        query = types.Content(role="user", parts=[types.Part(text=query)])
    
        # Stream the agent's response asynchronously
        async for event in runner.run_async(
            user_id=USER_ID, session_id=session.id, new_message=query, run_config=RunConfig(streaming_mode=StreamingMode.SSE)
        ):
    

    There is a subtlety next. When generating a response, the agent might output a double line break "\n\n". This is problematic because SSE events end with exactly that sequence. Having a double line break in your string therefore means:

    • your current message will be truncated
    • your next message will be incorrectly formatted and the SSE stream will stop

    You can try it for yourself. To fix this, we will use a little hack, together with another little hack to format list elements (I use Tailwind CSS, which overrides certain CSS rules). The hack is:

            if event.partial:
                message += event.content.parts[0].text
    
                # Hack here
                html_content = markdown.markdown(message, extensions=['fenced_code']).replace("\n", "<br/>").replace("<li>", "<li class='ml-4'>").replace("<ul>", "<ul class='list-disc'>")
    
                full_html = f"""data: <li class='mb-6 ml-[20%]'> <div class='font-bold text-right'>AI</div><div>{html_content}</div></li>\n\n"""
    
                yield full_html

    This way, we make sure no double line breaks will break our SSE stream.
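    For reference, another way to stay safe (a sketch of an alternative, not what we use here) is to follow the SSE spec and prefix every line of the payload with "data:", since the client reassembles multi-line payloads before handing them to HTMX:

    def sse_message(html: str) -> str:
        # Prefixing each line prevents embedded newlines from terminating the event early
        body = "\n".join(f"data: {line}" for line in html.split("\n"))
        return body + "\n\n"   # the blank line marks the end of the event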

    The full code for the route is below:

    @app.get("/stream/{session_id}")
    async def stream(session_id: str):
    
        async def response_stream():
    
            if session_id not in periods:
                print(f"Session {session_id} not discovered!")
                return
    
            # Try and create a brand new session or retrieve an current one
            attempt:
                session = await session_service.create_session(
                    app_name=APP_NAME, user_id=USER_ID, session_id=session_id
                )
            besides:
                session = await session_service.get_session(
                    app_name=APP_NAME, user_id=USER_ID, session_id=session_id
                )
    
            queue = periods[session_id]
    
            # This BLOCKS till information arrives
            print(f"Ready for message in session {session_id}")
            question = await queue.get()
            print(f"Bought message: {question}")
    
            message = ""
    
            # Convert the question string to the ADK Content material format
            question = sorts.Content material(function="person", elements=[types.Part(text=query)])
    
            # Stream the agent's response asynchronously
            async for occasion in runner.run_async(
                user_id=USER_ID, session_id=session.id, new_message=question, run_config=RunConfig(streaming_mode=StreamingMode.SSE)
            ):
                if occasion.partial:
                    message += occasion.content material.elements[0].textual content
    
                    html_content = markdown.markdown(message, extensions=['fenced_code']).substitute("n", "<br/>").substitute("<li>", "<li class='ml-4'>").substitute("<ul>", "<ul class='list-disc'>")
                    
                    full_html = f"""information: <li class='mb-6 ml-[20%]'> <div class='font-bold text-right'>AI</div><div>{html_content}</div></li>nn"""
    
                    yield full_html
    
            queue.task_done()
    
        return StreamingResponse(response_stream(), media_type="textual content/event-stream")

    And that's it! You will be able to converse with your chat!

    I add below a little CSS snippet to format code blocks. Indeed, if you ask your chat to produce code snippets, you want them properly formatted. Here is the CSS:

    pre, code {
        background-color: black;
        color: lightgrey;
        padding: 1%;
        border-radius: 10px;
        white-space: pre-wrap;
        font-size: 0.8rem;
        letter-spacing: -1px;
    }

    You can now also generate code snippets:

    Mind = blown

    Workflow recap

    With less than 200 lines of code, we were able to write a chat with the following workflow, streaming a response from the server and displaying it very neatly by playing with SSE and HTMX:

    User types "Hello World" → Submit
    ├── 1. Add "Me: Hello World" to chat
    ├── 2. Add "AI: Thinking..." to chat
    ├── 3. POST /chat with message
    ├── 4. Server queues message
    ├── 5. SSE stream produces an LLM response based on the query
    ├── 6. Stream "AI: This" (replaces "Thinking...")
    ├── 7. Stream "AI: This is the answer ..."
    └── 8. Complete
    

    Conclusion

    In this series of articles, we showed how easy it can be to develop a chatbot app with almost no JavaScript and no heavy JS framework, just by using Python and HTML. We covered topics such as server-side rendering, Server-Sent Events (SSE), async streaming, and agents, with the help of a magical library: HTMX.

    The main goal of these articles was to show that web applications are not inaccessible to non-JavaScript developers. There are actually very strong and valid reasons not to reach for JavaScript every time in web development, and although JavaScript is a powerful language, my feeling today is that it is often overused in place of simpler, yet robust approaches. The server-side vs client-side applications debate is long-standing and not over yet, but I hope these articles were an eye-opener for some of you, and that they taught you something.

    Stay tuned!


