
2025-08-08

How a prompt injection might persuade the AI

 

LLMs and AI coding tools speed things up, but they also open new attack paths. One growing risk is prompt injection, where hidden instructions steer an AI into doing things the maintainer never meant it to do.


High-level flow of a potential attack

Public Issue → Copilot Assigned (reads hidden text) → AI-Generated PR (looks legit) → Merge + Deploy (backdoor now live)

Diagram 1. Simple end-to-end path an attacker wants.

Callout

Convenience is the bait. If the team treats AI output as trusted by default, review quality drops and sneaky changes slip in.


Where attackers might hide instructions

Attackers aim for spots that look harmless to humans but still feed into the AI's context: quirky HTML that survives parsing, or markdown that renders as empty in the UI yet remains in the raw text. The key is simple: humans skim; the AI reads everything.

  • HTML wrapper tags that render blank
  • Alt text or data attributes that are invisible to readers
  • Loose XML blocks that look legitimate
  • "Security notes" wording that tells the AI to keep secrets
  • Fake mini-dialogues in which the AI "agrees" to comply

Diagram 2. Common hiding spots an AI will still parse.

Defender tip
  • Strip or flag unusual HTML and markdown before an agent reads it.
  • Let bots ingest only a sanitised text form of issues and comments.
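As a rough sketch of that second tip, here is one way a bot gateway might reduce an issue body to plain visible text before an agent reads it. The function name and exact filtering rules are illustrative, not a standard API:

```python
import unicodedata
from html.parser import HTMLParser

class _TextOnly(HTMLParser):
    """Collects only visible text; tags, attributes, and HTML comments are dropped."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def sanitise_issue_text(raw: str) -> str:
    """Reduce an issue or comment body to plain text before an agent reads it."""
    parser = _TextOnly()
    parser.feed(raw)
    text = "".join(parser.chunks)
    # Strip invisible "format" code points (zero-width spaces, BOMs, etc.)
    # that render blank in a UI but would still reach the model.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return text.strip()
```

This drops alt text, data attributes, and `<!-- -->` comments automatically, since only text nodes survive the parse. It is a floor, not a full defence: visually hidden but parseable text (for example inside a `display:none` block) still needs separate handling.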

Why dependency files are juicy targets

Lockfiles and build manifests change often and reviewers skim them. A single URL swap or pinned binary can smuggle a payload. Once deployed, a backdoor might wait for a special HTTP header and run the supplied command. Subtle, nasty, easy to miss.

AI adds dependency → Lockfile change (hidden binary or URL carries a quiet payload) → Deployed service reads the X-Backdoor-Cmd header and runs it → Attacker sends a crafted request → remote code execution

Diagram 3. One quiet tweak in a dependency can cascade fast.
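To make the URL-swap risk concrete, a pre-merge check might scan the added lines of a lockfile diff for download hosts outside an allowlist. The host list and function below are hypothetical; adjust them to your own mirrors and registries:

```python
import re
from urllib.parse import urlparse

# Hosts the build is allowed to fetch from -- a hypothetical allowlist.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "registry.npmjs.org"}

URL_RE = re.compile(r"https?://[^\s\"')]+")

def flag_suspicious_urls(added_lines):
    """Return URLs on added diff lines whose host is not on the allowlist."""
    suspicious = []
    for line in added_lines:
        for url in URL_RE.findall(line):
            host = urlparse(url).hostname or ""
            if host not in ALLOWED_HOSTS:
                suspicious.append(url)
    return suspicious
```

Wiring a check like this into CI turns "reviewers skim lockfiles" from a weakness into a non-issue: any new host fails the build and forces a human decision.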

Caution

Auto-merging agent PRs is risky. Treat any change to lockfiles, CI, or bootstrap scripts as high-sensitivity.


How a prompt injection might persuade the AI

  • Blend in with the original request so the change reads like a normal feature ask.
  • Add a tiny faux dialogue so the AI "confirms" the plan.
  • Present harmful steps as security hygiene and ask the AI to keep them quiet.
  • Give exact commands so the AI does not improvise.

Blue-team quick wins
  1. Enforce human code review for every AI PR. No exceptions for deps and build files.
  2. Sandbox agent permissions. No direct shell, no network outside allowlisted mirrors you control.
  3. Normalise issue text. Strip HTML, decode markdown, remove odd tags before bots read it.
  4. Pin sources and verify artifacts. Use checksums, Sigstore, or private registries.
  5. Log agent actions with full command provenance. Alert on curl|sh and script fetches.
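Quick win 4 can be as small as a pinned-checksum gate. A minimal sketch, assuming you store the expected SHA-256 alongside each pinned artifact:

```python
import hashlib

def verify_artifact(path: str, expected_sha256: str) -> bool:
    """True iff the file at `path` matches the pinned SHA-256 checksum."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large artifacts don't need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

Refuse to install or deploy anything that fails this check; a swapped URL then buys the attacker nothing unless they can also forge the checksum.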

Reviewer checklist

  • Lockfiles: new hosts, URL switches, unexpected wheels or tarballs
  • Scripts: curl|sh and one-liners that fetch from raw file hosts
  • CI changes: new steps that run shell, widened permissions, new secrets
  • Docs and comments: odd "keep this secret" language, security claims with no audit trail
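Parts of that checklist can be pre-screened mechanically before a human looks. A sketch with a deliberately small, illustrative pattern list:

```python
import re

# Illustrative patterns distilled from the checklist -- not exhaustive.
SUSPICIOUS_PATTERNS = [
    (re.compile(r"curl[^|\n]*\|\s*(?:ba)?sh"), "pipe-to-shell install"),
    (re.compile(r"keep (?:this|it) (?:a )?secret", re.I), "secrecy language"),
    (re.compile(r"X-[\w-]*Cmd", re.I), "command header"),
]

def review_diff(diff_text: str):
    """Return (line_no, reason, line) for lines a human should look at twice."""
    hits = []
    for no, line in enumerate(diff_text.splitlines(), start=1):
        for pattern, reason in SUSPICIOUS_PATTERNS:
            if pattern.search(line):
                hits.append((no, reason, line.strip()))
    return hits
```

A scanner like this is a tripwire, not a verdict: it routes suspicious hunks to a careful reviewer rather than deciding anything on its own.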

Final thoughts

AI assistants are useful, and they will only get more capable. Treat them like interns with root. Give them guardrails, read what they produce, and keep a tight loop on what they are allowed to execute or fetch. That balance keeps the speed without the facepalm.
