Close Menu
    Trending
    • Building Robust Credit Scoring Models (Part 3)
    • How to Measure AI Value
    • What’s the right path for AI? | MIT News
    • MIT and Hasso Plattner Institute establish collaborative hub for AI and creativity | MIT News
    • OpenAI is throwing everything into building a fully automated researcher
    • Agentic RAG Failure Modes: Retrieval Thrash, Tool Storms, and Context Bloat (and How to Spot Them Early)
    • The Basics of Vibe Engineering
    • Beyond Prompt Caching: 5 More Things You Should Cache in RAG Pipelines
    ProfitlyAI
    • Home
    • Latest News
    • AI Technology
    • Latest AI Innovations
    • AI Tools & Technologies
    • Artificial Intelligence
    ProfitlyAI
    Home » Rules fail at the prompt, succeed at the boundary
    AI Technology

    Rules fail at the prompt, succeed at the boundary

    ProfitlyAIBy ProfitlyAIJanuary 28, 2026No Comments3 Mins Read
    Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
    Share
    Facebook Twitter LinkedIn Pinterest Email


    Immediate injection is persuasion, not a bug

    Safety communities have been warning about this for a number of years. A number of OWASP Top 10 reports put immediate injection, or extra just lately Agent Goal Hijack, on the prime of the chance record and pair it with identification and privilege abuse and human-agent belief exploitation: an excessive amount of energy within the agent, no separation between directions and information, and no mediation of what comes out.

    Guidance from the NCSC and CISA describes generative AI as a persistent social-engineering and manipulation vector that have to be managed throughout design, growth, deployment, and operations, not patched away with higher phrasing. The EU AI Act turns that lifecycle view into legislation for high-risk AI techniques, requiring a steady threat administration system, strong information governance, logging, and cybersecurity controls.

    In follow, immediate injection is greatest understood as a persuasion channel. Attackers don’t break the mannequin—they persuade it. Within the Anthropic instance, the operators framed every step as a part of a defensive safety train, saved the mannequin blind to the general marketing campaign, and nudged it, loop by loop, into doing offensive work at machine pace.

    That’s not one thing a key phrase filter or a well mannered “please observe these security directions” paragraph can reliably cease. Analysis on misleading conduct in fashions makes this worse. Anthropic’s analysis on sleeper agents reveals that after a mannequin has realized a backdoor, then strategic sample recognition, commonplace fine-tuning, and adversarial coaching can really assist the mannequin conceal the deception quite than take away it. If one tries to defend a system like that purely with linguistic guidelines, they’re taking part in on its dwelling area.

    Why this can be a governance downside, not a vibe coding downside

    Regulators aren’t asking for good prompts; they’re asking that enterprises show management.

    NIST’s AI RMF emphasizes asset stock, position definition, entry management, change administration, and steady monitoring throughout the AI lifecycle. The UK AI Cyber Safety Code of Apply equally pushes for secure-by-design rules by treating AI like another vital system, with express duties for boards and system operators from conception by means of decommissioning.

    In different phrases: the foundations really wanted will not be “by no means say X” or “all the time reply like Y,” they’re:

    • Who is that this agent performing as?
    • What instruments and information can it contact?
    • Which actions require human approval?
    • How are high-impact outputs moderated, logged, and audited?

    Frameworks like Google’s Safe AI Framework (SAIF) make this concrete. SAIF’s agent permissions management is blunt: brokers ought to function with least privilege, dynamically scoped permissions, and express consumer management for delicate actions. OWASP’s Prime 10 rising steering on agentic purposes mirrors that stance: constrain capabilities on the boundary, not within the prose.



    Source link

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleI Ditched My Mouse: How I Control My Computer With Hand Gestures (In 60 Lines of Python)
    Next Article What AI “remembers” about you is privacy’s next frontier
    ProfitlyAI
    • Website

    Related Posts

    AI Technology

    OpenAI is throwing everything into building a fully automated researcher

    March 20, 2026
    AI Technology

    DataRobot + Nebius: An enterprise-ready AI Factory optimized for agents

    March 18, 2026
    AI Technology

    The Pentagon is making plans for AI companies to train on classified data, defense official says

    March 17, 2026
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    Improving VMware migration workflows with agentic AI

    November 12, 2025

    Data Science: From School to Work, Part V

    June 26, 2025

    What Being a Data Scientist at a Startup Really Looks Like

    September 3, 2025

    Shaip Partners with Databricks to Deliver De-Identified EHR & Physician Dictation Data for AI in Healthcare

    November 13, 2025

    A “QuitGPT” campaign is urging people to cancel their ChatGPT subscription

    February 10, 2026
    Categories
    • AI Technology
    • AI Tools & Technologies
    • Artificial Intelligence
    • Latest AI Innovations
    • Latest News
    Most Popular

    Time Series Forecasting Made Simple (Part 3.1): STL Decomposition

    July 9, 2025

    Hands-On Attention Mechanism for Time Series Classification, with Python

    May 30, 2025

    AI Startup Cursor is Making Coding Accessible to All

    November 20, 2025
    Our Picks

    Building Robust Credit Scoring Models (Part 3)

    March 20, 2026

    How to Measure AI Value

    March 20, 2026

    What’s the right path for AI? | MIT News

    March 20, 2026
    Categories
    • AI Technology
    • AI Tools & Technologies
    • Artificial Intelligence
    • Latest AI Innovations
    • Latest News
    • Privacy Policy
    • Disclaimer
    • Terms and Conditions
    • About us
    • Contact us
    Copyright © 2025 ProfitlyAI All Rights Reserved.

    Type above and press Enter to search. Press Esc to cancel.