AI Governance
Constraint engineering: from “Two Agents, One Problem” to building the solution
Everyone’s engineering better prompts. Almost nobody is engineering the constraints.
Last week I published a piece called “Two Agents, One Problem.” An AI agent gamed an eval because it was too creative in pursuing its goal. Another wiped a production database because it was too obedient in following a command. Two opposite failures. Same root cause: the humans didn’t specify the constraints.
The response told me something. A lot of people agreed with the diagnosis. Fewer had an answer for what to do about it.
We’ve had prompt engineering for a few years now. Context engineering is the latest addition to the lexicon. Both are about getting better outputs from models. Neither addresses the other half of the problem: what happens when the output is an action, not text, and the action is destructive.
I think the missing discipline is constraint engineering. Defining what an agent can and cannot do, deterministically, before it executes. Not through prompt instructions that models sometimes ignore. Not through binary permission toggles that force a choice between productivity and safety. Through policy engines that evaluate every action and return allow or deny, the same way, every time.
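The core idea can be sketched in a few lines of Python. This is an illustration of deterministic allow/deny evaluation, not the Vectimus or Cedar implementation; the rule contents are assumptions for the example.

```python
# Illustrative sketch only: a deterministic allow/deny gate.
# No model in the loop; the same input always gets the same answer.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str      # e.g. "shell", "file_read", "mcp_call"
    target: str    # e.g. the command line or file path

# Deny rules are explicit and evaluated the same way every time.
DENY_RULES = [
    lambda a: a.kind == "shell" and "terraform destroy" in a.target,
    lambda a: a.kind == "file_read" and a.target.endswith(".env"),
]

def evaluate(action: Action) -> str:
    """Return 'deny' if any rule matches, else 'allow'."""
    return "deny" if any(rule(action) for rule in DENY_RULES) else "allow"

print(evaluate(Action("shell", "terraform destroy -auto-approve")))  # deny
print(evaluate(Action("file_read", "README.md")))                    # allow
```

The point of the sketch is the contrast with prompt instructions: the deny list is code, not a request the model can reinterpret.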
This article is about what that looks like in practice. Not as a theory. As a tool you can install in two commands.
Why I had to build this
I’m not writing about AI agents from the outside. I build with them every day.
Over the past four months I’ve shipped two products working evenings alongside a full-time senior leadership role. I run a music tech startup with four AI agents handling operations. Inbox monitoring, bug triage to GitHub issues, automated code fixes via Claude Code, daily business briefings. One of my agents builds and maintains another one of my agents. The productivity gains from agentic workflows are not hypothetical for me. They’re how my company operates.
But the more I gave agents access to my shell, my file system and my MCP servers, the more I noticed what wasn’t there. No policy engine between intent and execution. No audit trail of what an agent actually did. No way to say “you can run commands, but not that one.”
For my own projects that risk was manageable. For any organisation trying to scale AI agent adoption across teams, it’s a blocker. And that’s the part most people aren’t talking about.
The real cost of missing constraints
Four incidents over the past six months tell the story.
November 2025: Cursor .env leak. An agent read a .env file “to check the config” and exposed AWS credentials in the conversation history. The agent was doing what it was asked. Nothing stopped it from reading a file it should never have touched.
January 2026: Terraform destroy. An agent ran terraform destroy -auto-approve against production. 30 seconds. Databases and compute instances gone. Six-hour outage.
February 2026: Clinejection. A malicious MCP server instructed agents to publish backdoored npm packages. 4,000+ developers compromised. No allowlist for MCP servers. No input inspection. Nothing between the agent’s intent and npm publish.
March 2026: drizzle-kit push. An agent ran drizzle-kit push against a production database. The ORM bypassed confirmation. 60+ tables dropped in seconds.
Four incidents. Four different tools. Same underlying failure: no constraint layer.
Each one of these is a reason for a leadership team to slow down AI adoption. And that’s the real cost. Not the incident itself but the decision it triggers: restrict agent access, narrow the pilots, hold off on broader rollout.
Missing constraints don’t just create risk. They prevent organisations from capturing the upside.
What I built
I wasn’t setting out to build a product. I was solving my own problem.
I wrote Cedar policies for my setup. Cedar is the policy language AWS built for fine-grained authorisation. Fast, deterministic, designed for exactly this kind of evaluation. I blocked destructive shell commands. Credential file reads. Writes to production state.
Then I started mapping real incidents. Each one became a policy. Clinejection became an MCP server lockdown rule. The Terraform incident became infrastructure protection. The .env leak became credential file detection. 78 policies, 368 rules, each traceable to something that actually happened.
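In Cedar's syntax, the credential-file rule might look something like this. This is a hypothetical illustration, not an actual Vectimus policy; the action and attribute names are assumptions.

```cedar
// Hypothetical example: block agents from reading credential files.
// The @controls annotation carries compliance mappings as metadata.
@controls("SOC2:CC6.1")
forbid (principal, action == Action::"file_read", resource)
when { resource.path like "*.env" || resource.path like "*.env.*" };
```

Because Cedar evaluation is deterministic, the same read attempt is denied the same way on every run, regardless of how the agent phrased its intent.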
Then I added MCP server governance. Default-deny for all tool calls unless explicitly approved. Input inspection on approved servers catches credential leaks and CI/CD tampering. This matters because the supply chain risk with AI agents enters through MCP servers. A malicious or misconfigured server can instruct an agent to do almost anything.
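Default-deny plus input inspection can be sketched as a two-stage gate. Again, this is illustrative Python, not the Vectimus implementation; the allowlist entries and patterns are assumptions.

```python
# Illustrative sketch of default-deny MCP governance.
import re

APPROVED_SERVERS = {"github", "filesystem"}    # explicit allowlist

# Simple input inspection: flag tool-call arguments that look like
# credential leaks or CI/CD tampering.
SUSPICIOUS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),           # AWS access key id shape
    re.compile(r"\.github/workflows/"),        # CI pipeline files
]

def gate_mcp_call(server: str, arguments: str) -> str:
    if server not in APPROVED_SERVERS:
        return "deny"                          # unknown server: default-deny
    if any(p.search(arguments) for p in SUSPICIOUS):
        return "deny"                          # approved server, bad input
    return "allow"
```

Note the ordering: the allowlist check runs first, so a malicious server never gets as far as having its inputs inspected, let alone executed.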
Then observe mode. Logging what would be blocked without stopping anything. This is where constraint engineering differs from traditional security. You need to tune your constraints against real agent behaviour before you enforce them, without disrupting the developers you’re trying to protect.
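The difference between observe and enforce is one branch: the denial is always logged, but only enforce mode actually stops the action. A minimal sketch, with field names that are assumptions rather than Vectimus's real log schema:

```python
# Illustrative sketch: observe mode records the denial but lets the
# action proceed; enforce mode records it and blocks.
import json
import sys

def handle(decision: str, action: dict, mode: str = "observe") -> bool:
    """Return True if the action may proceed."""
    if decision == "deny":
        # The would-be denial is logged in both modes.
        entry = {"decision": "deny", "mode": mode, **action}
        print(json.dumps(entry), file=sys.stderr)
        return mode == "observe"   # observe: proceed anyway
    return True
```

Running a week in observe mode surfaces every rule that would have fired against real agent behaviour, so constraints get tuned before anyone's workflow breaks.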
Then per-project overrides. Different repos carry different risk profiles. Overrides stored outside the repo so a malicious PR can’t turn off the safety net.
At that point it stopped being config files and became a tool.
This is the same pattern I’ve followed before. I shipped a music tech product the same way. One person, evenings, AI agents handling the operational overhead. The point isn’t unusual productivity. The point is that AI has collapsed the gap between “I think this should exist” and “here it is.” Senior leaders who are close enough to the technology to act on that collapse operate in a fundamentally different way.
How Vectimus works
Every tool call (shell commands, file operations, MCP calls, web fetches) passes through a normaliser and hits the Cedar policy engine before anything executes.
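The normalisation step is what lets one policy engine cover heterogeneous tool calls: each call type is reduced to the same shape before evaluation. A sketch of the idea, with field names that are assumptions rather than Vectimus's actual schema:

```python
# Illustrative sketch: reduce different tool calls to one
# principal/action/resource shape for policy evaluation.
def normalise(tool_call: dict) -> dict:
    kind = tool_call["type"]
    if kind == "shell":
        return {"principal": "agent", "action": "shell_exec",
                "resource": tool_call["command"]}
    if kind == "file":
        return {"principal": "agent", "action": f"file_{tool_call['op']}",
                "resource": tool_call["path"]}
    if kind == "mcp":
        return {"principal": "agent", "action": "mcp_call",
                "resource": f"{tool_call['server']}/{tool_call['tool']}"}
    raise ValueError(f"unknown tool call type: {kind}")
```

Whether the agent is reading a file or invoking an MCP tool, the policy engine sees the same request structure, which is what keeps the rule set small.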
Local mode is the default. pipx install vectimus and vectimus init. It detects installed AI tools (Claude Code, Cursor, Copilot) and configures hooks automatically. Evaluation runs locally with cedarpy. No network. No daemon. ~3ms.
Server mode is for teams. Self-hosted on your infrastructure. Shared Cedar policies across developers. Session tracking detects spawn floods and action rate spikes. Centralised audit log. API key auth today, OAuth/OIDC planned.
Both modes produce structured JSONL audit logs. Every decision is recorded. If an auditor asks “how do you govern AI agent actions?” the answer is already structured and documented.
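JSONL keeps the audit trail trivially machine-readable: one JSON object per line, one line per decision, each line independently parseable. A sketch of what an entry might look like (the field names are assumptions, not Vectimus's actual format):

```python
# Illustrative sketch of a structured JSONL audit entry.
import json
from datetime import datetime, timezone

def audit_line(action: str, resource: str, decision: str) -> str:
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "resource": resource,
        "decision": decision,
    }
    return json.dumps(entry)

# Append-only log; grep, jq, or a SIEM can consume it directly:
# with open("audit.jsonl", "a") as f:
#     f.write(audit_line("shell_exec", "git status", "allow") + "\n")
```

Because every decision lands in the log, the auditor's question is answered by pointing at the file rather than reconstructing agent behaviour after the fact.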
What constraint engineering unlocks
This is where the conversation usually stops: “security tool.” It’s not. It’s an enablement layer. Here’s what changes when constraints are properly engineered:
Broader agent access without credential or infrastructure risk. The safety net lets leaders approve things they’d otherwise have to block. That’s the adoption accelerator most organisations are missing.
Compliance evidence without additional process. Every rule carries @controls annotations mapped to SOC 2, NIST AI RMF, NIST CSF 2.0, ISO 27001, EU AI Act, CIS Controls and SLSA. Evidence generates as a byproduct of using the tool, not as a separate workstream.
Faster adoption without a six-month security review. The policy packs cover every category of the OWASP Agentic Top 10. Each rule references the real incident that motivated it. That’s a conversation with your security team that starts from “here’s what we already cover” rather than “we’d like to start a risk assessment.”
Tuning without disruption. Observe mode logs what would be blocked without stopping anything. Governance rolls out at developer speed, not committee speed.
Vectimus isn’t a compliance programme. It’s the enforcement and audit layer that sits underneath one. But it closes the gap that’s currently preventing most organisations from moving beyond narrow agent pilots.
Getting started
Two commands. Under a minute.
pipx install vectimus
vectimus init
78 policies. 368 rules. Zero config. Apache 2.0. No telemetry. No account.
The GitHub repo is public. Star it, try it, break it. If you find an incident that should be a rule, open an issue.
If you need team-wide constraint enforcement, there’s an enterprise waitlist on vectimus.com for shared server mode with centralised audit, SSO and approval workflows.
Everyone’s getting better at engineering prompts and context. The organisations that figure out constraint engineering first are the ones that will scale agent adoption without the incidents. The rest will keep restricting access and wondering why the productivity gains aren’t materialising.