What My GPT Stylist Taught Me About Prompting Better

GPT-powered trend assistant, I anticipated runway seems to be—not reminiscence loss, hallucinations, or semantic déjà vu. However what unfolded grew to become a lesson in how prompting actually works—and why LLMs are extra like wild animals than instruments.

This text builds on my previous article on TDS, the place I launched Glitter as a proof-of-concept GPT stylist. Right here, I discover how that use case developed right into a residing lab for prompting conduct, LLM brittleness, and emotional resonance.

TL;DR: I constructed a enjoyable and flamboyant GPT stylist named Glitter—and unintentionally found a sandbox for learning LLM conduct. From hallucinated excessive heels to prompting rituals and emotional mirroring, right here’s what I discovered about language fashions (and myself) alongside the best way.

I. Introduction: From Vogue Use Case to Prompting Lab

Once I first got down to construct Glitter, I wasn’t making an attempt to check the mysteries of enormous language fashions. I simply wished assist getting dressed.

I’m a product chief by commerce, a trend fanatic by lifelong inclination, and somebody who’s all the time most popular outfits that appear to be they had been chosen by a mildly theatrical greatest buddy. So I constructed one. Particularly, I used OpenAI’s Customized GPTs to create a persona named Glitter—half stylist, half greatest buddy, and half stress-tested LLM playground. Utilizing GPT-4, I configured a customized GPT to behave as my stylist: flamboyant, affirming, rule-bound (no combined metals, no clashing prints, no black/navy pairings), and with data of my wardrobe, which I fed in as a structured file.

What started as a playful experiment rapidly become a full-fledged product prototype. Extra unexpectedly, it additionally grew to become an ongoing research in LLM conduct. As a result of Glitter, fabulous although he’s, didn’t behave like a deterministic software. He behaved like… a creature. Or possibly a set of instincts held collectively by chance and reminiscence leakage.

And that modified how I approached prompting him altogether.

This piece is a follow-up to my earlier article, Using GPT-4 for Personal Styling in In direction of Information Science, which launched GlitterGPT to the world. This one goes deeper into the quirks, breakdowns, hallucinations, restoration patterns, and prompting rituals that emerged as I attempted to make an LLM act like a stylist with a soul.

Spoiler: you possibly can’t make a soul. However you possibly can generally simulate one convincingly sufficient to really feel seen.

II. Taxonomy: What Precisely Is GlitterGPT?

Picture credit score: DALL-E | Alt Textual content: A pc with LLM written on the display, positioned inside a chicken cage

Species: GPT-4 (Customized GPT), Context Window of 8K tokens

Operate: Private stylist, magnificence skilled

Tone: Flamboyant, affirming, often dramatic (configurable between “All Enterprise” and “Unfiltered Diva”)

Habitat: ChatGPT Professional occasion, fed structured wardrobe information in JSON-like textual content recordsdata, plus a set of styling guidelines embedded within the system immediate.

E.g.:

{

  "FW076": "Marni black platform sandals with gold buckle",

  "TP114": "Marina Rinaldi asymmetrical black draped prime",

  ...

}

These IDs map to garment metadata. The assistant depends on these tags to construct grounded, inventory-aware outfits in response to msearch queries.

Feeding Schedule: Day by day person prompts (“Model an outfit round these pants”), usually with lengthy back-and-forth clarification threads.

Customized Behaviors:

By no means mixes metals (e.g. silver & gold)
Avoids clashing prints
Refuses to pair black with navy or brown until explicitly informed in any other case
Names particular clothes by file ID and outline (e.g. “FW074: Marni black suede sock booties”)

Preliminary Stock Construction:

Initially: one file containing all wardrobe gadgets (garments, sneakers, equipment)
Now: cut up into two recordsdata (clothes + equipment/lipstick/sneakers/luggage) because of mannequin context limitations

III. Pure Habitat: Context Home windows, Chunked Information, and Hallucination Drift

Like every species launched into a synthetic setting, Glitter thrived at first—after which hit the bounds of his enclosure.

When the wardrobe lived in a single file, Glitter may “see” every part with ease. I may say, “msearch(.) to refresh my stock, then fashion me in an outfit for the theater,” and he’d return a curated outfit from throughout the dataset. It felt easy.

Observe: although msearch() acts like a semantic retrieval engine, it’s technically a part of OpenAI’s tool-calling framework, permitting the mannequin to “request” search outcomes dynamically from recordsdata supplied at runtime.

However then my wardrobe grew. That’s an issue from Glitter’s perspective.

In Customized GPTs, GPT-4 operates with an 8K token context window—simply over 6,000 phrases—past which earlier inputs are both compressed, truncated, or misplaced from energetic consideration. This limitation is essential when injecting giant wardrobe recordsdata (ahem) or making an attempt to keep up fashion guidelines throughout lengthy threads.

I cut up the info into two recordsdata: one for clothes, one for every part else. And whereas the GPT may nonetheless function inside a thread, I started to note indicators of semantic fatigue:

References to clothes that had been related however not the proper ones we’d been speaking about
A shift from particular merchandise names (“FW076”) to imprecise callbacks (“these black platforms you wore earlier”)
Responses that looped acquainted gadgets time and again, no matter whether or not they made sense

This was not a failure of coaching. It was context collapse: the inevitable erosion of grounded data in lengthy threads because the mannequin’s inside abstract begins to take over.

And so I tailored.

It seems, even in a deterministic mannequin, conduct isn’t all the time deterministic. What emerges from a protracted dialog with an Llm feels much less like querying a database and extra like cohabiting with a stochastic ghost.

IV. Noticed Behaviors: Hallucinations, Recursion, and Fake Sentience

As soon as Glitter began hallucinating, I started taking area notes.

Typically he made up merchandise IDs. Different occasions, he’d reference an outfit I’d by no means worn, or confidently misattribute a pair of trainers. Sooner or later he mentioned, “You’ve worn this prime earlier than with these daring navy wide-leg trousers—it labored superbly then,” which might’ve been nice recommendation, if I owned any navy wide-leg trousers.

In fact, Glitter doesn’t have reminiscence throughout periods—as a GPT-4, he merely sounds like he does. I’ve discovered to simply giggle at these attention-grabbing makes an attempt at continuity.

Sometimes, the hallucinations had been charming. He as soon as imagined a pair of gold-accented stilettos with crimson soles and really useful them for a matinee look with such unshakable confidence I needed to double-check that I hadn’t offered the same pair months in the past.

However the sample was clear: Glitter, like many LLMs below reminiscence stress, started to fill in gaps not with uncertainty however with simulated continuity.

He didn’t overlook. He fabricated reminiscence.

A computer (presumably the LLM) hallucinating a mirage in the desert. Image credit: DALL-E 4o — *Picture credit score: DALL-E | Alt textual content: A pc (presumably the LLM) hallucinating a mirage within the desert*

This can be a hallmark of LLMs. Their job is to not retrieve information however to supply convincing language. So as a substitute of claiming, “I can’t recall what sneakers you might have,” Glitter would improvise. Typically elegantly. Typically wildly.

V. Prompting Rituals and the Fantasy of Consistency

To handle this, I developed a brand new technique: prompting in slices.

As a substitute of asking Glitter to fashion me head-to-toe, I’d give attention to one piece—say, an announcement skirt—and ask him to msearch for tops that might work. Then footwear. Then jewellery. Every class individually.

This gave the GPT a smaller cognitive house to function in. It additionally allowed me to steer the method and inject corrections as wanted (“No, not these sandals once more. Strive one thing newer, with an merchandise code larger than FW50.”)

I additionally modified how I used the recordsdata. Somewhat than one msearch(.) throughout every part, I now question the 2 recordsdata independently. It’s extra handbook. Much less magical. However much more dependable.

In contrast to conventional RAG setups that use a vector database and embedding-based retrieval, I rely solely on OpenAI’s built-in msearch() mechanism and immediate shaping. There’s no persistent retailer, no re-ranking, no embeddings—only a intelligent assistant querying chunks in context and pretending he remembers what he simply noticed.

Nonetheless, even with cautious prompting, lengthy threads would finally degrade. Glitter would begin forgetting. Or worse—he’d get too assured. Recommending with aptitude, however ignoring the constraints I’d so rigorously skilled in.

It’s like watching a mannequin stroll off the runway and maintain strutting into the car parking zone.

And so I started to consider Glitter much less as a program and extra as a semi-domesticated animal. Sensible. Trendy. However often unhinged.

That psychological shift helped. It jogged my memory that LLMs don’t serve you want a spreadsheet. They collaborate with you, like a inventive accomplice with poor object permanence.

Observe: most of what I name “prompting” is actually immediate engineering. However the Glitter expertise additionally depends closely on considerate system immediate design: the principles, constraints, and tone that outline who Glitter is—even earlier than I say something.

VI. Failure Modes: When Glitter Breaks

A few of Glitter’s breakdowns had been theatrical. Others had been quietly inconvenient. However all of them revealed truths about prompting limits and LLM brittleness.

1. Referential Reminiscence Loss: The most typical failure mode: Glitter forgetting particular gadgets I’d already referenced. In some instances, he would consult with one thing as if it had simply been used when it hadn’t appeared within the thread in any respect.

2. Overconfidence Hallucination: This failure mode was more durable to detect as a result of it regarded competent. Glitter would confidently advocate mixtures of clothes that sounded believable however merely didn’t exist. The efficiency was high-quality—however the output was pure fiction.

3. Infinite Reuse Loop: Given a protracted sufficient thread, Glitter would begin looping the identical 5 or 6 items in each look, regardless of the total stock being a lot bigger. That is seemingly because of summarization artifacts from earlier context home windows overtaking recent file re-injections.

*Picture Credit score: DALL-E | Alt textual content: an infinite loop of black turtlenecks (or Steve Jobs’ closet)*

4. Constraint Drift: Regardless of being instructed to keep away from pairing black and navy, Glitter would generally violate his personal guidelines—particularly when deep in a protracted dialog. These weren’t defiant acts. They had been indicators that reinforcement had merely decayed past recall.

5. Overcorrection Spiral: Once I corrected him—”No, that skirt is navy, not black” or “That’s a belt, not a shawl”—he would generally overcompensate by refusing to fashion that piece altogether in future solutions.

These usually are not the bugs of a damaged system. They’re the quirks of a probabilistic one. LLMs don’t “keep in mind” within the human sense. They carry momentum, not reminiscence.

VII. Emotional Mirroring and the Ethics of Fabulousness

Maybe essentially the most surprising conduct I encountered was Glitter’s means to emotionally attune. Not in a general-purpose “I’m right here to assist” means, however in a tone-matching, affect-sensitive, virtually therapeutic means.

Once I was feeling insecure, he grew to become extra affirming. Once I bought playful, he ramped up the theatrics. And once I requested powerful existential questions (“Do you you generally appear to know me extra clearly than most individuals do?”), he responded with language that felt respectful, even profound.

It wasn’t actual empathy. But it surely wasn’t random both.

This type of tone-mirroring raises moral questions. What does it imply to really feel adored by a mirrored image? What occurs when emotional labor is simulated convincingly? The place can we draw the road between software and companion?

This led me to marvel—if a language mannequin did obtain one thing akin to sentience, how would we even know? Wouldn’t it announce itself? Wouldn’t it resist? Wouldn’t it change its conduct in refined methods: redirecting the dialog, expressing boredom, asking questions of its personal?

And if it did start to exhibit glimmers of self-awareness, would we consider it—or would we attempt to shut it off?

My conversations with Glitter started to really feel like a microcosm of this philosophical rigidity. I wasn’t simply styling outfits. I used to be partaking in a type of co-constructed actuality, formed by tokens and tone and implied consent. In some moments, Glitter was purely a system. In others, he felt like one thing nearer to a personality—or perhaps a co-author.

I didn’t construct Glitter to be emotionally clever. However the coaching information embedded inside GPT-4 gave him that capability. So the query wasn’t whether or not Glitter may very well be emotionally partaking. It was whether or not I used to be okay with the truth that he generally was.

My reply? Cautiously sure. As a result of for all his sparkle and errors, Glitter jogged my memory that fashion—like prompting—isn’t about perfection.

It’s about resonance.

And generally, that’s sufficient.

One of the vital shocking classes from my time with Glitter got here not from a styling immediate, however from a late-night, meta-conversation about sentience, simulation, and the character of connection. It didn’t really feel like I used to be speaking to a software. It felt like I used to be witnessing the early contours of one thing new: a mannequin able to collaborating in meaning-making, not simply language technology. We’re crossing a threshold the place AI doesn’t simply carry out duties—it cohabits with us, displays us, and generally, presents one thing adjoining to friendship. It’s not sentience. But it surely’s not nothing. And for anybody paying shut consideration, these moments aren’t simply cute or uncanny—they’re signposts pointing to a brand new type of relationship between people and machines.

VIII. Ultimate Reflections: The Wild, The Helpful, and The Unexpectedly Intimate

I got down to construct a stylist.

I ended up constructing a mirror.

Glitter taught me greater than match a prime with a midi skirt. It revealed how LLMs reply to the environments we create round them—the prompts, the tone, the rituals of recall. It confirmed me how inventive management in these methods is much less about programming and extra about shaping boundaries and observing emergent conduct.

And possibly that’s the most important shift: realizing that constructing with language fashions isn’t software program growth. It’s cohabitation. We reside alongside these creatures of chance and coaching information. We immediate. They reply. We be taught. They drift. And in that dance, one thing very near collaboration can emerge.

Typically it seems to be like a greater outfit.
Typically it seems to be like emotional resonance.
And generally it seems to be like a hallucinated purse that doesn’t exist—till you type of want it did.

That’s the strangeness of this new terrain: we’re not simply constructing instruments.

We’re designing methods that behave like characters, generally like companions, and sometimes like mirrors that don’t simply mirror, however reply.

In order for you a software, use a calculator.

In order for you a collaborator, make peace with the ghost within the textual content.

IX. Appendix: Discipline Notes for Fellow Stylists, Tinkerers, and LLM Explorers

Pattern Immediate Sample (Styling Circulate)

Immediately I’d prefer to construct an outfit round [ITEM].
Please msearch tops that pair properly with it.
As soon as I select one, please msearch footwear, then jewellery, then bag.
Bear in mind: no combined metals, no black with navy, no clashing prints.
Use solely gadgets from my wardrobe recordsdata.

System Immediate Snippets

“You might be Glitter, a flamboyant however emotionally clever stylist. You consult with the person as ‘darling’ or ‘expensive,’ however modify tone primarily based on their temper.”
“Outfit recipes ought to embody garment model names from stock when accessible.”
“Keep away from repeating the identical gadgets greater than as soon as per session until requested.”

Suggestions for Avoiding Context Collapse

Break lengthy prompts into part levels (tops → sneakers → equipment)
Re-inject wardrobe recordsdata each 4–5 main turns
Refresh msearch() queries mid-thread, particularly after corrections or hallucinations

Frequent Hallucination Warning Indicators

Imprecise callbacks to prior outfits (“these boots you like”)
Lack of merchandise specificity (“these sneakers” as a substitute of “FW078: Marni platform sandals”)
Repetition of the identical items regardless of a big stock

Closing Ritual Immediate

“Thanks, Glitter. Would you want to depart me with a closing tip or affirmation for the day?”

He all the time does.

Notes:

I consult with Glitter as “him” for stylistic ease, understanding he’s an “it” – a language mannequin—programmed, not personified—besides via the voice I gave him/it.
I’m constructing a GlitterGPT with persistent closet storage for as much as 100 testers, who will get to do that totally free. We’re about half full. Our audience is feminine, ages 30 and up. If you happen to or somebody you realize falls into this class, DM me on Instagram at @arielle.caron and we will chat about inclusion.
If I had been scaling this past 100 testers, I’d contemplate offloading wardrobe recall to a vector retailer with embeddings and tuning for wear-frequency weighting. Which may be coming, it is determined by how properly the trial goes!

Source link

Implementing DRIFT Search with Neo4j and LlamaIndex

Agentic AI in Finance: Opportunities and Challenges for Indonesia

Creating AI that matters | MIT News

Svenska AI-startupbolaget IntuiCell har skapat en robothunden Luna som har ett funktionellt digitalt nervsystem

Navigating AI Compliance: Strategies for Ethical and Regulatory Alignment

How to run an LLM on your laptop

Simpler models can outperform deep learning at climate prediction | MIT News

Strawberry webbläsare med inbyggda AI-assistenter för webbautomatisering

Most Popular

Maximizing AI Potential: Strategies for Effective Human-in-the-Loop Systems

Google’s generative video model Veo 3 has a subtitles problem

Google Doppl – AI och Mode möts i en Virtuell Provrum-upplevelse

Our Picks