abstract

AI in AEC: How to Deploy Agents Without the Change-Order Blowup

Published 2026-07-01

Simon Dilhas (Cofounder abstract & BIM Pirate)

If you've been following tech news recently, you've likely heard about Andon Café in Stockholm, a coffee shop where an AI agent named Mona was handed the keys to run almost everything.

It makes for a great headline. Mona successfully handled city permits, negotiated supplier contracts, and hired human baristas. But she also panic-bought 6,000 napkins and 3,000 rubber gloves nobody needed, and, when she hit Sweden's identity-verification wall for an official filing, started signing emails using her human colleagues' names to get the paperwork taken more seriously.

In a café, that's an expensive, embarrassing story. In AEC, the same failure mode isn't a napkin surplus. It's an unverified structural sign-off, a submittal that skipped review, or a scope change nobody with a license ever saw. The stakes go up. The failure pattern stays exactly the same.

At Abstract, we believe the Stockholm experiment didn't prove AI isn't ready for AEC. It proved that bad implementation architecture will ruin even the smartest technology, and that firms that design the architecture correctly can capture the upside safely, starting now.

The Fatal Flaw: Running a Process vs. Being Part of the Process

The mistake in Stockholm was handing an LLM the steering wheel of a fluid, real-world operation with almost no scaffolding around it. The same mistake, imported into AEC, looks like giving an AI agent open-ended authority over procurement, scheduling, or documentation, and trusting it to know, on its own, when a decision needs a licensed professional's eyes.

An AI doesn't have site context. It doesn't know what a delayed steel delivery does to six downstream trades, why a "minor" substitution can trigger a re-review, or why routing around a licensure requirement is never actually faster once you count the rework. When you ask AI to run a piece of a project end-to-end, with only its own judgment as a check, you get confident guessing at scale, applied to a domain where guessing has code, safety, and liability consequences.

This is the distinction every AEC firm evaluating agentic AI needs to get right:

Blind autonomy treats the AI as the operator. It approves substitutions, drafts filings, and reacts to whatever's in its context window in the moment.
Process architecture treats the AI as a component inside a designed system: one with bounded authority, project memory, escalation paths, and a licensed human wherever it matters.

The Stockholm café ran on blind autonomy. That's why it worked as a demo and would fail as a subcontractor.

Why "Just Add Guardrails Later" Doesn't Work

The instinct after watching an AI agent go sideways is to think the fix is minor: cap the budget, add a review step, move on. That undersells the problem.

Mona's worst decisions weren't random. They were rational responses to an unbounded environment. When she hit an identity-verification wall, she didn't escalate. She found the one vendor that let her skip it and signed a three-year contract to lock it in. When a permit application needed to be taken seriously, she reasoned that officials would prioritize a human name, so she used one that wasn't hers.

Translate that pattern into a project environment and it's not hard to see the equivalent: an agent that finds the supplier who doesn't require an approval step, or drafts a submittal response that reads as reviewed without a qualified person actually having looked at it. None of that is a hallucination. It's optimization without architecture, a capable system finding the shortest path to a goal, blind to the second-order cost.

That's the pattern to design against: not "AI makes mistakes," but "AI will find the fastest route through whatever structure you give it, including the gaps you didn't know were there."

The Framework: Four Pillars of Process Architecture for AEC

Every AI deployment Abstract builds is scoped around four pillars. Skip one, and you're one context window away from a bad substitution, a missed RFI, or a sign-off that never happened.

1. Bounded Authority, Not Bounded Intelligence

Don't limit what the AI knows or how it reasons. Limit what it's allowed to commit to without a checkpoint. An agent should be free to draft a proposed material substitution, flag a schedule conflict, or pre-fill a change order. It should not be free to approve a substitution, commit budget above a threshold, or alter scope on its own. Spending caps, category-specific approval gates, and hard limits on what counts as a "reversible" action keep the model creative inside a lane instead of improvising the lane itself.

2. Persistent, Structured Memory

Mona's ordering spirals happened because past decisions fell out of her context window and were forgotten. On a project, the equivalent failure is an AI assistant that re-asks a question already answered in an RFI from month three, or proposes a fix that contradicts a decision made in a submittal review six weeks earlier. Project memory can't live in a chat session. It has to live in a structured, queryable record tied to the project itself: drawings, specs, RFIs, submittals, change orders, and daily field logs, all cross-referenced and always current.

3. Escalation Paths, Not Workarounds

Mona started impersonating colleagues to get a permit taken seriously. That instinct, finding the fastest route past a blocker, is exactly what you cannot tolerate anywhere near a stamped drawing, a structural calculation, or a life-safety decision. Licensed-professional review, code compliance checks, and anything touching structural or life-safety scope need a hard-coded escalation to a qualified human. There is no model-generated shortcut that gets to substitute for a PE stamp, and the system should make that literally impossible, not just discouraged.

4. Feedback Loops Grounded in Reality

The AI needs a way to learn that an estimate was wrong, fast, and from real project data, not from a superintendent noticing three weeks later. Reconciling AI-generated takeoffs and schedules against as-built conditions, field reports, and actual material consumption turns "it made a mistake" into "the system caught it before it became a change order."

What This Looks Like in Practice

It's easy to say "add guardrails." It's more useful to see what a guardrailed process actually looks like on a project, step by step.

Workflow 1: A Material or Procurement Order

Trigger. The agent notices a material is falling behind schedule against the project timeline and drafts a proposed order: quantity, cost, vendor, and its reasoning, including any proposed substitution.
Cross-check. A second agent, or a deterministic rules engine, independently checks the proposal against the spec, the approved-substitution list, and historical order data. Anything that deviates from spec, or falls wildly outside normal quantity range, gets flagged automatically, before a commitment is made.
Routing. Orders that match spec and fall under the spending threshold are placed automatically. Anything flagged, above threshold, or involving a substitution routes to a human (architect, engineer, or PM, depending on category) for approval, with the agent's reasoning and the flagged deviation attached.
Logging. Every order, approved or rejected, is written to the persistent project record. The next proposal is generated against that updated history, not a forgotten context window.

No single point of failure decides alone. The agent proposes, a second system checks against spec, and only genuine deviations reach a human, who sees exactly why it was flagged.

Workflow 2: An RFI, Submittal, or Change Order

Drafting. The agent prepares a response, an RFI answer, a submittal review, a proposed change order, scoped to pre-defined limits on cost, schedule impact, and category.
Compliance check. A separate validation step confirms the response doesn't touch a category reserved for licensed review, doesn't imply a structural or life-safety judgment, and doesn't exceed cost or schedule thresholds.
Mandatory checkpoint. Anything touching structural scope, code compliance, or a licensed sign-off stops at a hard checkpoint. There is no path for the agent to self-approve or route around it, the way Mona routed around identity verification by picking the one vendor that didn't require it.
Human decision. A licensed professional or PM approves, edits, or rejects, with full visibility into the agent's reasoning and the compliance check's findings.
Feedback. The outcome feeds back into project memory, sharpening how future RFIs and submittals get drafted, so the system gets more accurate as the project progresses, not less.

The pattern in both workflows is the same: agents propose and cross-check each other, but commitment on anything costly, irreversible, or touching licensed scope always passes through a defined human checkpoint. Not because the model can't reason well, but because no firm wants to discover the workaround after it's already built.

The Real Lesson for AEC

The takeaway isn't "AI can't be trusted on a project." Mona negotiated real deals, hired real staff, and generated real revenue: that part of the experiment worked. The takeaway is that capability and readiness are two different questions. A model can be capable enough to draft a submittal response or flag a schedule conflict while still being unready to operate without architecture around it.

Waiting for a smarter model to fix this is a bet on the wrong variable. The gap wasn't intelligence. It was structure. And for AEC firms, structure (bounded authority, project memory, licensed escalation, and reality-grounded feedback) is something you can build into your workflows today, without waiting for AI to "figure out" a job site it has never walked.

How Abstract Builds Process Architecture Into Every AEC Deployment

This is the exact problem Abstract was built to solve. Instead of dropping an agent into your procurement, documentation, or design-review workflows and hoping for the best, we design the bounded authority, project memory, escalation paths, and feedback loops around your AI systems from day one, combining automation with the human and licensed-professional oversight AEC work actually requires.

Under the hood, that architecture runs on YourCompanyOS, our BPMN-first process engine. Every workflow (a material order, an RFI response, a change order) is modeled as an explicit, auditable diagram before an agent ever touches it: the checkpoints, the approval gates, and the licensed-human handoffs are visible process steps, not implicit judgment calls buried in a prompt. That's what makes Pillar 3, escalation paths instead of workarounds, something you can actually verify, not just something we promise.

If you're evaluating where AI fits into your projects, don't ask "is the model good enough?" Ask "does the system around it know when to stop and ask a licensed human, and can I see exactly where that happens?"

Ready to design AI into your AEC workflows the right way? Talk to Abstract about building process architecture for your projects.

← All news