across industries. Traditional engineering domains are no exception.
Over the past two years, I've been building LLM-powered tools with engineering domain experts. These are process engineers, reliability engineers, cybersecurity analysts, etc., who spend most of their day in logs, specs, schematics, and reports, doing tasks such as troubleshooting, failure mode analysis, test planning, compliance checks, and so on.
The promise is compelling: thanks to their extensive pre-trained knowledge, LLMs can, in theory, reason like domain experts, accelerate the tedious, pattern-matching parts of engineering work, and free up experts for higher-order decisions.
The practice, however, is messier. "Just add a chatbox" rarely translates into useful engineering tools. There is still quite a large gap between an impressive demo and a system that engineers actually trust and use.
Bridging that gap has everything to do with how you frame the problem, how you structure the workflow, and how you integrate it into the engineer's real environment.
In this post, I'd like to share 10 lessons I learned from my past projects. They are just my collection of "field notes" rather than a comprehensive checklist. But if you also plan to build, or are currently building, LLM applications for domain experts, I hope these lessons can help you avoid a few painful dead ends.
I organize the lessons into three phases, which align with the stages of a typical LLM project:
- Before you start: frame the right problem and set the right expectations.
- During the project: design clear workflows and enforce structure everywhere.
- After you have built: integrate where engineers work and evaluate with real cases.
With that in mind, let's get started.
Phase 1: Before You Start
What you do before writing even a single line of code largely shapes whether an LLM project will succeed or fail.
That means if you are chasing the wrong problem or failing to set the right expectations upfront, your application will struggle to gain traction later, no matter how technically sound you make it.
In the following, I'd like to share some lessons on laying the right foundation.
Lesson 1: Not every problem can or should be addressed by LLMs
When I look at a new use case from engineers, I always try very hard to challenge my "LLM-first" reflex and honestly ask myself: can I solve this problem without using LLMs?
For the core reasoning logic, that is, the decision-making bottleneck you want to automate, there are usually at least three classes of methods you can consider:
- Rule-based and analytical methods
- Data-driven ML models
- LLMs
Rule-based and analytical methods are cheap, transparent, and easy to test. However, they can be rigid and have only limited power in messy reality.
Classic ML models, even a simple regression or classification, can often give you fast, reliable, and easily scalable decisions. However, they require historical data (and very often, also labels) to learn the patterns.
LLMs, on the other hand, shine if the core challenge is about understanding, synthesizing, or generating language across messy artifacts. Think skimming through 50 incident reports to surface likely relevant ones, or turning free-text logs into labeled, structured events. But LLMs are expensive, slow, and usually don't behave as deterministically as you might want.
Before deciding to use an LLM for a given problem, ask yourself:
- Could 80% of the problem be solved with a rule engine, an analytical model, or a classic ML model? If yes, simply start there. You can always layer an LLM on top later if needed.
- Does this task require precise, reproducible numerical results? If so, keep the computation in analytical code or ML models, and use LLMs only for explanation or contextualization.
- Will there be no human in the loop to review and approve the output? If so, an LLM might not be a good choice, since it rarely provides strong guarantees.
- At your expected speed and volume, would LLM calls be too expensive or too slow? If you need to process thousands of log lines or alerts per minute, relying on LLMs alone will quickly hit a wall on both cost and latency.
If your answers are mostly "no", you have probably found a good candidate to explore with LLMs.
Lesson 2: Set the right mindset from day one
Once I'm convinced that an LLM-based solution is appropriate for a particular use case, the next thing I do is align on the right mindset with the domain experts.
One thing I find extremely important is the positioning of the tool. A framing I usually adopt that works very well in practice is this: the goal of our LLM tool is augmentation, not automation. The LLM only helps you (i.e., the domain experts) analyze faster, triage faster, and explore more, but you remain the decision-maker.
That distinction matters a lot.
When you position the LLM tool as augmentation, engineers tend to engage with it enthusiastically, as they see it as something that could make their work faster and less tedious.
On the other hand, if they sense that the new tool might threaten their role or autonomy, they will distance themselves from the project and give you very limited support.
From a developer's point of view (that is, you and me), setting this "amplify instead of replace" mindset also reduces anxiety. Why? Because it makes it much easier to talk about errors! When the LLM gets something wrong (and it will), the conversation won't simply be "your AI failed." Instead, it's more like "the suggestion wasn't quite right, but it's still insightful and gives me some ideas." That's a very different dynamic.
Next time you are building LLM apps for domain experts, try to emphasize:
- LLMs are, at best, junior assistants. They are fast and work around the clock, but they are not always right.
- Experts are the reviewers and ultimate decision-makers. You are experienced, careful, and accountable.
Once this mindset is in place, you'll see engineers start to evaluate your solution through the lens of "Does this help me?" rather than "Can this replace me?" That matters a lot for building trust and improving adoption.
Lesson 3: Co-design with experts and define what "better" means
Once we've agreed that LLMs are appropriate for the task at hand and that the goal is augmentation, not automation, the next critical point I try to figure out is:
"What does better actually mean for this task?"
To get a good understanding of that, you need to bring the domain experts into the design loop as early as possible.
Concretely, you should spend time sitting down with the domain experts, walking through how they solve the problem today, and taking notes on which tools they use and which docs/specs they refer to. Remember to ask them to point out where the pain point really is, and to understand what is OK to be "approximate" and what kinds of errors are annoying or unacceptable.
A concrete outcome of these conversations with domain experts is a shared definition of "better" in their own language. These are the metrics you are optimizing for, which could be the amount of triage time saved, the number of false leads reduced, or the number of manual steps skipped.
Once the metric(s) are defined, you automatically have a realistic baseline (i.e., whatever it takes with the current manual process) to benchmark your solution against later.
Besides the technical effects, I'd say the psychological effects are just as important: by involving experts early, you're showing them that you're genuinely trying to learn how their world works. That alone goes a long way in earning trust.
Phase 2: During The Project
After setting the stage, you're now ready to build. Exciting stuff!
In my experience, there are a few important decisions you need to make to ensure your hard work actually earns trust and gets adopted. Let's talk about these decision points.
Lesson 4: It's Co-pilot, not Auto-pilot
A temptation I see a lot (also in myself) is the desire to build something "fully autonomous". As a data scientist, who can really resist building an AI system that gives the user the final answer with just one button push?
Well, the reality is less flashy but far more practical. In practice, this "autopilot" mindset rarely works well with domain experts, as it fundamentally goes against the fact that engineers are used to systems where they understand the logic and the failure modes.
If your LLM app simply does everything in the background and only presents a final result, two things usually happen:
- Engineers don't trust the results because they can't see how it got there.
- They cannot correct it, even when they see something clearly off.
Therefore, instead of defaulting to an "autopilot" mode, I prefer to deliberately design the system with multiple control points where experts can influence the LLM's behavior. For example, instead of having the LLM auto-classify all 500 alarms and create tickets, we can design the system to first group alarms into, say, 5 candidate incident threads, pause, and show the expert the grouping rationale and key log lines for each thread. Experts can then merge or split groups. Only after they approve the grouping does the LLM proceed to generate draft tickets.
Yes, from a UI perspective, this adds a bit of work, as you have to implement human-input mechanisms, expose intermediate reasoning traces and results clearly, and so on. But the payoff is real: your experts will actually trust and use your system because it gives them the sense that they are in control.
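To make the "control point" idea concrete, here is a minimal Python sketch of an alarm-triage flow with an explicit approval gate. The function names (group_alarms, review, draft_tickets) and data shapes are purely illustrative: in a real system the review step would be a UI interaction, and the LLM call is left as a stub.

```python
# Sketch of a co-pilot pipeline: LLM proposes, expert approves, LLM proceeds.
from dataclasses import dataclass, field

@dataclass
class IncidentThread:
    rationale: str                         # why the LLM grouped these alarms
    alarm_ids: list[str] = field(default_factory=list)
    approved: bool = False                 # flipped only by a human reviewer

def group_alarms(alarms: list[dict]) -> list[IncidentThread]:
    """Step 1 (LLM): propose candidate incident threads."""
    ...  # call your LLM here and parse its structured output into threads

def review(threads: list[IncidentThread]) -> list[IncidentThread]:
    """Control point: show rationale + key log lines, let the expert
    merge, split, or approve before anything downstream runs."""
    for t in threads:
        print(t.rationale)                 # in a real UI: a side panel, not print
        t.approved = True                  # placeholder for the expert's decision
    return threads

def draft_tickets(threads: list[IncidentThread]) -> list[str]:
    """Step 2 (LLM): runs only on threads the expert approved."""
    return [f"Ticket draft for {len(t.alarm_ids)} alarms"
            for t in threads if t.approved]
```

The point is not the specific functions but the shape: the pipeline pauses at a human decision, and the second LLM step only ever consumes expert-approved input.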
Lesson 5: Focus on workflow, roles, and data flow before choosing a framework
Once we get into the implementation phase, a common question many developers (including myself in the past) tend to ask first is:
"Which LLM app framework should I use? LangGraph? CrewAI? AutoGen? Or something else?"
This instinct is completely understandable. After all, there are so many shiny frameworks out there, and it does feel like choosing the "right" one is the first big decision. But for prototyping with engineering domain experts, I'd argue that this is usually not the right place to start.
In my own experience, for the first version, you can go a long way with the good old from openai import OpenAI or from google import genai (or whichever LLM provider you prefer).
Why? Because at this stage, the most pressing question is not which framework to build upon, but:
"Does an LLM actually help with this specific domain task?"
And you need to verify that as quickly as possible.
To do that, I like to focus on three pillars instead of frameworks:
- Pipeline design: How do we decompose the problem into clear steps?
- Role design: How should we instruct the LLM at each step?
- Data flow & context design: What goes in and out of each step?
If you treat each LLM call as a pure function, like this:
inputs → LLM reasoning → output
then you can wire these "functions" together with just normal control flow, e.g., if/else conditions, for/while loops, retries, etc., which are already natural to you as a developer.
This applies to tool calling, too. If the LLM decides it needs to call a tool, it can simply output the function name and the relevant parameters, and your regular code can execute the actual function and feed the result back into the next LLM call.
You really don't need frameworks just to express the pipeline.
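As an illustration of this "pure function" wiring, here is a framework-free sketch. The call_llm parameter stands in for whichever provider SDK you use, and the prompt wording and JSON shape are invented for the example:

```python
# Each LLM call is a plain function: inputs -> LLM reasoning -> output.
# Steps are wired with ordinary control flow -- no orchestration framework.
import json
from typing import Callable

def classify_alert(alert: str, call_llm: Callable[[str], str]) -> dict:
    """One pipeline step, with a plain retry loop around a flaky LLM reply."""
    prompt = f"Classify this alert as JSON with a 'severity' key:\n{alert}"
    for attempt in range(3):
        try:
            return json.loads(call_llm(prompt))   # parse structured output
        except json.JSONDecodeError:
            if attempt == 2:
                raise                              # give up after 3 tries
    raise RuntimeError("unreachable")

def triage(alerts: list[str], call_llm: Callable[[str], str]) -> list[str]:
    """Wire steps together with an ordinary if/else."""
    urgent = []
    for alert in alerts:
        result = classify_alert(alert, call_llm)
        if result.get("severity") == "high":       # normal control flow
            urgent.append(alert)
    return urgent
```

Because the LLM call is injected as a callable, you can unit-test the whole pipeline with a fake LLM, and swap in the real SDK client later without restructuring anything.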
Of course, I'm not saying that you should avoid frameworks. They are quite helpful in production, as they provide observability, concurrency, state management, etc., out of the box. But at the early stage, I think it's a good strategy to just keep things simple, so that you can iterate faster with domain experts.
Once you have verified your key assumptions with your experts, it won't be difficult to migrate your pipeline/role/data design to a more production-ready framework.
In my view, this is lean development in action.
Lesson 6: Try workflows before jumping to agents
Recently, there has been a lot of discussion around workflows vs. agents. Every major player in the field seems eager to emphasize that they are "building agents," instead of just "running predefined workflows."
As developers, it's very easy to feel the temptation:
"Yeah, we definitely should build autonomous agents that figure things out on their own, right?"
No.
On paper, AI agents sound super attractive. But in practice, especially in engineering domains, I'd argue that a well-orchestrated workflow with domain-specific logic can already solve a large fraction of the real problems.
And here is the thing: it does so with far less randomness.
Usually, engineers already follow a certain workflow to solve a specific problem. Instead of letting LLM agents "rediscover" that workflow, it's much better to translate that domain knowledge directly into a deterministic, staged workflow. This immediately gives you a few benefits:
- Workflows are much easier to debug. If your system starts to behave strangely, you can easily spot which step is causing the issue.
- Domain experts can easily understand what you are building, because a workflow maps naturally to their mental model.
- Workflows naturally invite human feedback. They can easily be paused, accept new inputs, and then resume.
- You get much more consistent behavior. The same input leads to a similar path or outcome, and that matters a ton in engineering problem-solving.
Again, I'm not saying that AI agents are useless. There are certainly situations where more flexible, agentic behavior is justified. But I'd say always start with a clear, deterministic workflow that explicitly encodes domain knowledge, and validate with experts that it's actually helpful. You can introduce more agentic behavior if you hit limitations that a simple workflow can't solve.
Yes, it might sound boring. But your ultimate goal is to solve the problem in a predictable and explainable way that brings business value, not to ship some fancy agentic demo. It's good to always keep that in mind.
Lesson 7: Structure everything you can – inputs, outputs, and knowledge
A common perception of LLMs is that they are good at handling free-form text. So the natural instinct is: let's just feed reports and logs in and ask the model to reason, right?
No.
In my experience, especially in engineering domains, that's leaving a lot of performance on the table. In fact, LLMs tend to behave much better when you give them structured input and ask them to produce structured output.
Engineering artifacts often come in semi-structured form already. Instead of dumping entire raw documents into the prompt, I find it very helpful to extract and structure the key information first. For example, for free-text incident reports, we can parse them into the following JSON:
{
  "incident_id": "...",
  "equipment": "...",
  "symptoms": ["..."],
  "start_time": "...",
  "end_time": "...",
  "suspected_causes": ["..."],
  "mitigations": ["..."]
}
That structuring step can be done in various ways: we can resort to classic regexes, or develop small helper scripts. We can even employ a separate LLM whose only job is to normalize the free text into a consistent schema.
This way, you give the main reasoning LLM a clean view of what happened. And as a bonus, with this structure in place, you can ask the LLM to cite specific fields when reaching its conclusion. That saves you quite some time in debugging.
If you're doing RAG, this structured layer is also what you should retrieve over, instead of the raw PDFs or logs. You'd get better precision and more reliable citations when retrieving over clean, structured artifacts.
Now, on the output side, structure is especially important if you want to plug the LLM into a larger workflow. Concretely, this means instead of asking:
"Explain what happened and what we should do next."
I prefer something like:
"Fill this JSON schema with your analysis."
{
  "likely_causes": [
    {"cause": "...", "confidence": "medium"}
  ],
  "recommended_next_steps": [
    {"description": "...", "priority": 1}
  ],
  "summary": "short free-text summary for the human"
}
Usually, this is defined as a Pydantic model, and you can leverage the "Structured Output" feature to explicitly instruct the LLM to produce output that conforms to it.
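For illustration, here is roughly what that Pydantic model could look like; the field types are my assumptions about a reasonable shape. Most provider SDKs can accept such a model (or its JSON schema) for structured output, and you can also use it to validate a raw reply instead of trusting it blindly:

```python
# A Pydantic (v2) schema mirroring the JSON shape above.
from typing import Literal
from pydantic import BaseModel, Field

class Cause(BaseModel):
    cause: str
    confidence: Literal["low", "medium", "high"]

class NextStep(BaseModel):
    description: str
    priority: int = Field(ge=1)            # 1 = most urgent

class Analysis(BaseModel):
    likely_causes: list[Cause]
    recommended_next_steps: list[NextStep]
    summary: str                           # short free-text summary for the human

# Validate a raw LLM reply -- a malformed reply raises a ValidationError
# instead of silently flowing downstream. (Example values are made up.)
raw = ('{"likely_causes": [{"cause": "seal wear", "confidence": "medium"}], '
       '"recommended_next_steps": [{"description": "inspect seal", "priority": 1}], '
       '"summary": "Likely seal degradation."}')
analysis = Analysis.model_validate_json(raw)
```

Typos in field names, out-of-range priorities, or invented confidence levels all fail loudly at the validation boundary, which is exactly where you want them to fail.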
I used to see LLMs as "text in, text out". But now I see them more as "structure in, structure out", and that is especially true in engineering domains where we need precision and robustness.
Lesson 8: Don't forget about analytical AI
I know we're building LLM-based solutions. But as we learned in Lesson 1, LLMs are not the only tool in your toolbox. We also have the "old-school" analytical AI models.
In many engineering domains, there is a long track record of applying classic analytical AI/ML methods to address various aspects of the problems, e.g., anomaly detection, time-series forecasting, clustering, classification, you name it.
These methods are still highly valuable, and in many cases, they should be doing the heavy lifting instead of being thrown away.
To effectively solve the problem at hand, it is often worth considering a hybrid approach of analytical AI + GenAI: analytical ML handles the heavy lifting of pattern matching and detection, and LLMs operate on top to reason, explain, and recommend next steps.
For example, say you have thousands of incident events per week. You can start by using classic clustering algorithms to group similar events into patterns, maybe also computing some aggregate stats for each cluster. Then, the workflow can feed these analytical results into an LLM and ask it to label each pattern, describe what it means, and suggest what to check first. Afterward, engineers review and refine the labels.
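Here is a rough sketch of that hybrid split, using scikit-learn for the clustering. The event texts and prompt wording are invented for the example; the key point is that the LLM only ever sees per-cluster aggregates, never the thousands of raw events:

```python
# Hybrid analytical AI + GenAI: deterministic clustering does the heavy
# lifting; the LLM only labels and explains each resulting pattern.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_events(events: list[str], k: int = 2) -> dict[int, list[str]]:
    """Group similar event texts into k patterns -- no LLM involved."""
    X = TfidfVectorizer().fit_transform(events)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    clusters: dict[int, list[str]] = {}
    for event, label in zip(events, labels):
        clusters.setdefault(int(label), []).append(event)
    return clusters

def describe_cluster_prompt(cluster_id: int, samples: list[str]) -> str:
    """Build the LLM prompt from aggregated results, not raw data."""
    joined = "\n".join(samples[:5])        # a few representatives suffice
    return (f"Cluster {cluster_id} ({len(samples)} events). "
            f"Label this pattern and suggest what to check first:\n{joined}")
```

The clustering is reproducible and scales to the full event volume, while the expensive, non-deterministic LLM call happens only once per cluster.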
So why does this matter? Because analytical methods give you speed, reliability, and precision on structured data. They are deterministic, they scale to millions of data points, and they don't hallucinate. LLMs, on the other hand, excel at synthesis, context, and communication. You should use each for what it's best at.
Phase 3: After You Have Built
You've built a system that works technically. Now comes the hardest part: getting it adopted. No matter how good your implementation is, a tool that sits on a shelf is a tool that brings zero value.
In this section, I'd like to share two final lessons on integration and evaluation. You want to make sure your system lands in the real world and earns trust through evidence, right?
Lesson 9: Integrate where engineers actually work
A separate UI, such as a simple web app or a notebook, works perfectly fine for exploration and getting first-hand feedback. But for real adoption, you should think beyond what your app does and focus on where your app shows up.
Engineers already have a set of tools they rely on daily. Now, if your LLM tool presents itself as "yet another web app with a login and a chat box", you can already see that it will struggle to become part of the engineers' routine. People will try it once or twice, and then, when things get busy, they fall back to whatever they're used to.
So, how to address this concern?
I'd ask myself this question at this point:
"Where in the existing workflow would this app actually be used, and what would it look like there?"
In practice, what does this imply?
The most powerful integration is often UI-level embedding. That basically means you embed LLM capabilities directly into the tools engineers already use. For example, in a typical log viewer, besides the usual dashboard plots, you can add a side panel with buttons like "summarize the selected events" or "suggest next diagnostic steps". This gives engineers the LLM's intelligence without interrupting their normal workflow.
One caveat worth mentioning, though: UI-level embedding often requires buy-in from the team that owns that tool. If possible, start building those relationships early.
Then, instead of a generic chat window, I'd focus on buttons with concrete verbs that match how engineers think about their tasks, be it summarize, group, explain, or compare. A chat interface (or something similar) can still exist for when engineers have follow-up questions, need clarifications, or want to enter free-form feedback after the LLM produces its initial output. But the primary interaction should be task-specific actions, not open-ended conversation.
Also important: you should make the LLM's context dynamic and adaptive. If the system already knows which incident or time window the experts are looking at, pass that context directly into the LLM calls. Don't make them copy-paste IDs, logs, or descriptions into yet another UI.
If this integration is done well, the barrier to trying the tool (and eventually adopting it) becomes much lower. And for you as a developer, it's much easier to get richer and more honest feedback, since the tool is tested under real conditions.
Lesson 10: Evaluation, evaluation, evaluation
Once you have shipped the first version, you might think your work is done. Well, the truth is, in practice, that's exactly the point where the real work begins.
It's the beginning of the evaluation.
There are two things I want to discuss here:
- Make the system show its work in a way that engineers can inspect.
- Sit down with experts and walk through real cases together.
Let's discuss them in turn.
First, make the system show its work. When I say "show its work", I don't just mean a final answer. I want the system to expose, at a reasonable level of detail, three concrete things: what it looked at, what steps it took, and how confident it is.
- What it looked at: this is essentially the evidence the LLM uses. It's a good practice to always instruct LLMs to cite specific evidence when they produce a conclusion or recommendation. That evidence can be the exact log lines, the specific incident IDs, or the spec sections that support the claim. Remember in Lesson 7, we talked about structured input? That is helpful for LLM citation management and verification.
- What steps it took: this refers to the reasoning trace produced by the LLM. Here, I'd expose the output produced at key intermediate steps of the pipeline. If you're adopting a multi-step workflow (Lessons 5 & 6), you'll already have these steps as separate LLM calls or functions. And if you're implementing structured output (Lesson 7), surfacing them in the UI becomes straightforward.
- How confident it is: finally, I almost always ask the LLM to output a confidence level (low/medium/high), plus a short rationale for assigning that level. In practice, what you'll get is something like this: "The LLM said A, based on B and C, with medium confidence because of D and E assumptions." Engineers are much more comfortable with that kind of statement, and again, this is a crucial step towards building trust.
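Put together, a "show its work" payload might look something like the following. This is just one illustrative shape with made-up values, not a fixed schema:

```json
{
  "conclusion": "...",
  "evidence": ["log line 1042", "incident INC-017", "spec section 4.2"],
  "steps_taken": ["grouped alarms into threads", "matched against past incidents"],
  "confidence": "medium",
  "confidence_rationale": "assumes sensor calibration records are current"
}
```

Each field maps to one of the three things above, so engineers can verify the evidence, audit the steps, and weigh the conclusion accordingly.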
Now, let's go to the second point: evaluate with experts using real cases.
My suggestion is, once the system can properly show its work, you should schedule dedicated evaluation sessions with domain experts.
It's like doing user testing.
A typical session could look like this: you and the expert pick a set of real cases. These can be a mix of typical ones, edge cases, and a few historical cases with known outcomes. You run them through the tool together. During the process, ask the expert to think aloud: What do you expect the tool to do here? Is this summary accurate? Are these suggested next steps reasonable? Do you agree that the cited evidence actually supports the conclusion? Meanwhile, remember to take detailed notes on things like where the tool clearly saves time, where it still fails, and what important context is currently missing.
After a couple of sessions with the experts, you can tie the results back to the "better" we defined earlier (Lesson 3). This doesn't have to be a "formal" quantitative evaluation, but trust me, even a handful of concrete before/after comparisons can be eye-opening, and they give you a solid foundation to keep iterating on your solution.
Conclusion
Now, looking back at these ten lessons, what recurring themes do you see?
Here is what I see:
First, respect the domain expertise. Start from how domain engineers actually work, and genuinely learn their pain points and wishes. Position your tool as something that helps them, not something that replaces them. Always let experts stay in control.
Second, engineer the system. Start with simple SDK calls, deterministic workflows, and structured inputs/outputs, and mix traditional analytics with the LLM where that makes sense. Remember, LLMs are just one component in a larger system, not the whole solution.
Third, treat deployment as the beginning, not the end. The moment you ship the first working version is when you can finally start having meaningful conversations with experts: walking through real cases together, collecting their feedback, and keeping iterating.
Of course, these lessons are just my current reflections on what seems to work when building LLM applications for engineers, and they are certainly not the only way to go. Still, they have served me well, and I hope they can spark some ideas for you, too.
Happy building!
