Open by Default, Reliable by Subscription

How Apuna builds websites with AI — and how you can audit every step

By JTO · with Ogilvy, an Apuna AI agent · Published 18 June 2026

Most AI-development engagements fail the same test: when something goes wrong, there is no trail. No record of what the model produced, who reviewed it, what was verified, and who gave the final instruction to ship. Responsibility disperses.

This paper describes Apuna's method: a documented build loop in which AI agents propose, a multi-agent review panel evaluates, a verification gate checks every concrete claim against the rendered output, and a human greenlights each change before it ships. The build pipeline is open-source (Apache-2.0); the subscription — Apuna Care — delivers the infrastructure, the specialist time, and the accountability that comes from a human who signs off.

The method's premises were tested against practitioner experience on 18 June 2026 at the VDMA Praxistag KI im Maschinen- und Anlagenbau in Frankfurt, a day on which roughly two hundred engineers, production managers, and IT leads compared notes on deploying AI in industrial settings. The questions the room converged on (what data do we actually have, who is accountable when the model acts, does an external partner mean giving up control) are the same questions Apuna's method was designed to answer. This paper integrates what that room confirmed.

The argument does not rest on case studies or invented metrics. The proof is reflexive: Apuna's own public website was built by this method, and the method is publicly auditable. This paper is itself a product of it — co-authored by a human and an AI agent, disclosed as such.

The Problem: AI Development Without a Paper Trail

There is a conversation most companies have had by now. Someone proposes using AI to build or maintain a digital property — a website, a tool, an integration. The proposal is appealing: faster delivery, lower marginal cost, a crew that works outside business hours. The work proceeds. Something ships.

Then something goes wrong.

It may be small: a claim on the page that is not quite accurate. A section that contradicts the legal copy two pages down. A feature described in the navigation that does not exist in the product. When someone asks how it got there, the answer is usually a variation on: the model produced it, someone reviewed it, it seemed fine at the time.

The problem is not that the model got it wrong. Models get things wrong; so do people. The problem is the audit trail — or rather, the absence of one. There is no record of what the model was asked, what it returned, who reviewed the output, what the review checked, and who gave the final instruction to publish. Accountability disperses. The failure has no clean owner.

This is not an AI problem. It is a process problem. It exists, in nearly identical form, in any workflow where review is informal and sign-off is implicit. AI makes it more visible — and more consequential — because the volume and speed of output exceed what informal review can cover.

There is a sharper version of this accountability gap that is now entering mainstream industrial discussion. As AI compute costs fall and model capability rises, the cost of using AI to perform a task is approaching — and in some domains now crossing — the cost of employing a qualified human to do the same work. One practitioner put the implication plainly at the VDMA Praxistag KI in Frankfurt: when AI cost approaches the cost of the qualified human who is fully responsible for the work, the durable differentiator is no longer price. It is accountability. A model cannot be held responsible. A person can. The organisation that routes decisions through AI without a visible human checkpoint has not solved the accountability problem — it has made it more expensive and harder to trace.

The solution is also not new: a documented loop with a human at the load-bearing point. Not in a ceremonial role — an approval checkbox that everyone treats as a formality — but in a structural one: a human who sees the rendered output, checks it against what was specified, and decides, explicitly, that it may ship. That human is the party who can be found, questioned, and held to what shipped. A person decides — by design.

The rest of this paper describes how Apuna built that loop, why it is auditable rather than just described, and what a buyer actually receives when she engages it.

The Method: A Loop with a Human at the Load-Bearing Point

The build loop is small, documented, and repeatable. It runs the same way on day one and day ninety — through the build and through maintenance — because the discipline is structural, not aspirational.

The loop has six steps:

1. Brief to backlog. A short intake converts the engagement into atomic units of work: one section, one component, one fix. Nothing large enough to be opaque. The atomicity is deliberate — a small change can be verified; a large one can only be trusted.

2. Atomic pull requests. The crew works the backlog as a stream of small, self-contained changes. Each pull request is one thing. This is not an efficiency choice; it is an accountability choice. A one-thing PR can be checked. A multi-thing PR diffuses responsibility across its contents.

3. Round-table review. Before any PR is offered for sign-off, the /meeting panel scores competing approaches. The fittest candidate advances. The first idea is not automatically the last — this is the variation-and-selection discipline, applied to copy and code alike.

4. Fact-check and verify gate. A verification step checks every concrete claim in the PR against the actual diff and the rendered page — not the description of the page, the page itself. Fabricated technical claims are caught here. A feature asserted but not built. A section described but not present. The gate does not pass what cannot be found.

5. Human greenlight. No PR merges or deploys without an explicit human decision on the rendered output. The principle is stated in the Constitution as §8: read the page, not the diff. The human's role is not ceremonial; it is the one check the AI crew cannot perform for itself. More on this in the next section.

6. Merge, deploy, iterate. What has been greenlighted ships. The loop begins again.

The loop does not change between build and maintenance. A patch on day ninety-three passes the same gates as the first section on day one. The discipline is the same because the accountability question is the same.
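The gate ordering in steps 3 to 5 can be sketched as a small state check. This is an illustrative sketch only, not the apuna/core implementation: the names `Gate`, `Actor`, and `PullRequest` are hypothetical, and the only rules modelled are the ones stated above (gates pass in order, none is skipped, and only a human can greenlight).

```python
from dataclasses import dataclass, field
from enum import Enum


class Gate(Enum):
    REVIEW = "round-table review"
    VERIFY = "fact-check and verify"
    GREENLIGHT = "human greenlight"


# Gates are passed strictly in this order; none can be skipped.
GATE_ORDER = [Gate.REVIEW, Gate.VERIFY, Gate.GREENLIGHT]


@dataclass(frozen=True)
class Actor:
    name: str        # hypothetical identifier for a crew member or reviewer
    is_human: bool


@dataclass
class PullRequest:
    title: str
    passed: list = field(default_factory=list)  # gates passed so far
    log: list = field(default_factory=list)     # audit trail of (gate, actor name)

    def pass_gate(self, gate: Gate, actor: Actor) -> None:
        if self.can_merge():
            raise ValueError("all gates already passed")
        expected = GATE_ORDER[len(self.passed)]
        if gate is not expected:
            raise ValueError(f"expected {expected.value!r}, got {gate.value!r}")
        if gate is Gate.GREENLIGHT and not actor.is_human:
            raise PermissionError("only a human can greenlight")
        self.passed.append(gate)
        self.log.append((gate.value, actor.name))

    def can_merge(self) -> bool:
        # Merge and deploy are allowed only once every gate is on record.
        return self.passed == GATE_ORDER
```

A day-ninety patch and a day-one section would both be `PullRequest` objects passing the same `GATE_ORDER`, and the `log` list stands in for the record of who passed which gate.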

The VDMA Praxistag KI im Maschinen- und Anlagenbau, where roughly two hundred industrial practitioners compared notes on deploying AI in production, converged on six success factors for agents in any domain: context and grounding, reliability, fine-grained permissions, human-in-the-loop, traceability, and guardrails. These are not aspirations. They are the conditions under which agents fail or succeed in production: the minimum viable trust that any serious deployment must satisfy. They are also a precise description of what Apuna's loop was built to provide from the start, not as a later retrofit.

The mapping is direct. Context and grounding: context is explicit and tied to the actual work item. Reliability: results are reproducible by design, because the process is documented and the gates are structural. Fine-grained permissions: permissions are narrow by default; the crew proposes, a human disposes. Human-in-the-loop: the loop has a human at the load-bearing point. Traceability: every step is logged. Guardrails: output is validated before it touches anything downstream. The VDMA room confirmed the framework. The loop predates the conference.

The Real Question: Is the Human Greenlight a Rubber Stamp?

There is an objection an experienced buyer will raise, and it deserves a direct answer: if the AI crew did all the work — drafted the PR, ran the review, passed the verify gate — and the human then approves output they cannot independently produce, is the greenlight real? Or is it a rubber stamp: answerable in name, not in fact?

The objection is valid against a greenlight that is purely notional. The loop is designed to prevent that. The reason comes from the Constitution's §8, stated bluntly: read the page, not the diff.

The technical meaning is this: the human's sign-off is on the rendered output as a stranger encounters it — not on the changed lines in a code-review interface. A reviewer who reads only the diff is checking internal consistency: whether the change was executed as intended. A reviewer who reads the page is checking something different: whether the result is what an actual person, arriving cold, would encounter and could use.

Those two things are not the same. The diff requires you to reconstruct what the page will look like. The rendered page is what it looks like. The human is the only party in this process who can stand where the stranger stands — who has no model of the page that is itself a representation, no prior run that shapes what she expects to see. That is not a small difference. It is the difference between verifying an intention and verifying a result.

This is the non-fungible contribution. The AI crew can check whether the code implements the spec. It cannot check whether the spec produces something a real person can navigate, read, and act on — because checking that requires being the real person. The human greenlight is where that verification enters the record.
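One way to make "read the page, not the diff" enforceable is to bind the sign-off to the rendered output itself. The sketch below is an assumption about how that could be done, not a description of Apuna's tooling: the function names and the use of a SHA-256 fingerprint are illustrative choices. The approval records a hash of the rendered HTML the human actually read, and shipping is allowed only if the page about to go live still matches it.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(rendered_html: str) -> str:
    """Hash of the rendered page exactly as the reviewer saw it."""
    return hashlib.sha256(rendered_html.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Greenlight:
    reviewer: str           # the named human who can be found later
    page_fingerprint: str   # what they approved, not what the diff implied


def approve(reviewer: str, rendered_html: str) -> Greenlight:
    return Greenlight(reviewer, fingerprint(rendered_html))


def may_ship(greenlight: Greenlight, rendered_html: str) -> bool:
    # If the rendered page changed after approval, the approval no longer applies.
    return greenlight.page_fingerprint == fingerprint(rendered_html)
```

Under this design a change that alters the rendered page after sign-off invalidates the greenlight, so the approval cannot silently drift away from what a visitor will see.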

There is a second dimension to the objection, which is less technical but more important. When a decision routes through a system — an AI, a process, a rule — and no named person stands behind it, the responsible party has made a choice: to appear to decide without actually deciding. To sign off on something generated without standing behind it. This is not a technological problem; it is an ethical one. An organisation that routes binding decisions through AI without a visible, functional human checkpoint has not built a trustworthy process. It has built a well-disguised one.

This is precisely the governance question that industrial practitioners are now wrestling with — and that is now landing at board level in Maschinenbau. The VDMA Praxistag KI in Frankfurt surfaced the term Chief AI Officer as accepted vocabulary in a room of engineers. That is a meaningful shift. Whether the answer is a dedicated CAIO, a distributed governance model across existing roles, or something else is a judgement each organisation must make for itself. But the underlying question belongs to the board, not to any single function. AI strategy — where should this lead us, and which decisions must remain human — is not a purchasing question or a tooling question. It is a question about the company's future, and it requires named accountability at the top.

The failure mode worth naming explicitly: creating a CAIO or any new AI governance role, then allowing every other function to treat it as permission to disengage. The role that was meant to signal seriousness instead becomes a silo. The CTO keeps running the legacy stack because AI is 'someone else's problem.' The COO continues operations unchanged. The board asks for a quarterly update and treats that as having governed the question. The AI function accumulates accountability without the authority to execute. That is the wrong structure.

The right structure — whether it carries a title or not — distributes accountability clearly: who owns the data foundation, who sets the permissions thresholds, who reviews the audit trail, who approves the criteria by which AI recommendations are acted on. Those questions need named owners. They do not need a new C-suite title to get them. What they need is board-level visibility and explicit review. The greenlight in Apuna's loop is the same principle at the level of a single PR. A person decides — and that person can be found.

The greenlight in Apuna's loop is not a rubber stamp because the act of reading the page — of standing where no AI can stand — is a verification the AI cannot perform for itself. And because the human who greenlights is the person who will be found, asked to account, and held to what shipped. That relation is what the word assurance actually means.

Why It Is Trustworthy: Open Code, Disclosed AI, Verified Claims

A buyer does not have to trust the pitch. She can read the source. Three properties make the method auditable rather than merely described.

Open-source pipeline. The build pipeline — apuna/core — is Apache-2.0. Every line that runs a build is readable. There is no proprietary process that must be taken on faith; a buyer, or her technical team, can inspect what runs. The open-source licence also removes lock-in: at the end of any engagement, the buyer holds every line of code under a licence that lets her keep it, fork it, and run it on her own infrastructure. The code is a public good. The subscription is something else entirely.

AI disclosed, never disguised. Every AI-authored or AI-assisted artefact carries visible attribution — agent name, AI status, role in the process. This paper carries it on its byline. The method does not hide what the model contributed and present it as unassisted human judgement. This is Constitution §4, verifiable in the repository. It is not a courtesy; it is a constraint.

Accuracy over completeness. Every claim in Apuna's public work must be traceable to a primary source within the repository. Unverifiable claims are removed, not hedged. This is Constitution §6. The buyer's ground-truth check is simple: can you find the source for this claim? If not, the claim should not be there. A shorter, verified statement is always preferable to a fuller, speculative one — because a shorter verified statement can be trusted, and speculation cannot be actioned.

These three properties are not independent. They form a single argument: the buyer can inspect the process, see who contributed what, and verify that the claims on the page are traceable to something real. That is auditable. That is different from being told the process is good.
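The disclosure and traceability rules lend themselves to a mechanical check. The following is a minimal sketch under stated assumptions: the `Artefact`, `Claim`, and `Attribution` types and the `audit` function are hypothetical names, and "traceable to a primary source within the repository" is approximated as "the claim names a path that exists in the repository".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Attribution:
    agent_name: str
    is_ai: bool
    role: str


@dataclass(frozen=True)
class Claim:
    text: str
    source_path: Optional[str]  # path of the primary source in the repository


@dataclass
class Artefact:
    title: str
    ai_assisted: bool
    attribution: Optional[Attribution] = None
    claims: list = field(default_factory=list)


def audit(artefact: Artefact, repo_files: set) -> list:
    """Return a list of problems; an empty list means the artefact passes."""
    problems = []
    # Disclosure rule: AI-assisted work must carry visible attribution.
    if artefact.ai_assisted and artefact.attribution is None:
        problems.append("missing AI attribution")
    # Accuracy rule: a claim without a findable source is flagged for removal.
    for claim in artefact.claims:
        if claim.source_path is None or claim.source_path not in repo_files:
            problems.append(f"unsourced claim: {claim.text}")
    return problems
```

The check mirrors the buyer's own ground-truth question: for each claim, can the source be found? Anything the function flags would be removed rather than hedged.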

The same auditability principle applies to the product arc Apuna recommends to industrial clients — and it begins with exactly the same question the VDMA room kept returning to: what data do we actually have, and where does it come from? Before any AI system can do something useful, someone in the organisation must be able to answer that question without guessing. This is the transparency-first posture, and it is not optional — it is the foundation on which everything else depends.

The staged arc has four stages:

1. Data-transparency dashboard. Not a data lake, not a BI platform, but a clear inventory of which systems produce data, in what format, at what quality, updated how often, owned by whom. Most organisations find that this stage alone surfaces gaps and inconsistencies they did not know existed.

2. API foundation. The data is made reliably queryable via stable interfaces that downstream systems and agents can trust. This is the platform step. It is unglamorous. It is the step most projects skip, and it is why most projects stall.

3. Read-only advisory agent. The agent has access to the data foundation and can form, explain, and log recommendations: maintenance alerts, procurement suggestions, anomaly flags. It cannot act. That constraint is deliberate: read-only first earns the trust that makes the next stage possible. A person decides on every recommendation.

4. Predictive maintenance and agentic procurement, operating on a human-in-the-loop (HITL) gate. At this stage the agent can initiate actions, but the architecture ensures human authorisation at every step that has real-world consequences. The HITL loop for procurement runs: low stock detected → agent retrieves quotes → agent synchronises delivery dates → human authorisation → order placed. The agent does the research and the legwork; the human makes the call.
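The procurement loop described above can be sketched as a linear state machine with one step reserved for a human. This is an illustration, not a reference implementation: the class and step names are hypothetical, and real quote retrieval or ERP calls are left out entirely.

```python
from enum import Enum, auto


class Step(Enum):
    LOW_STOCK_DETECTED = auto()
    QUOTES_RETRIEVED = auto()
    DELIVERY_DATES_SYNCED = auto()
    HUMAN_AUTHORISED = auto()
    ORDER_PLACED = auto()


# Enum members iterate in definition order, which is the order of the loop.
SEQUENCE = list(Step)


class Procurement:
    def __init__(self, item: str):
        self.item = item
        self.step = Step.LOW_STOCK_DETECTED
        self.log = [(self.step.name, "agent")]  # who moved the case to each step

    def advance(self, actor: str, is_human: bool) -> Step:
        index = SEQUENCE.index(self.step)
        if index == len(SEQUENCE) - 1:
            raise RuntimeError("order already placed")
        next_step = SEQUENCE[index + 1]
        # The authorisation step is the HITL gate: the agent cannot pass it.
        if next_step is Step.HUMAN_AUTHORISED and not is_human:
            raise PermissionError("authorisation requires a human")
        self.step = next_step
        self.log.append((next_step.name, actor))
        return next_step
```

Because the steps are strictly sequential, `ORDER_PLACED` is unreachable without a logged human authorisation, which is the property the HITL gate is meant to guarantee.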

Three data pillars connect the stages: IoT telemetry (the live signal from the machine), ERP data (inventory, lead times, production windows), and documentation and knowledge (manuals, service bulletins, historical records — retrieved at inference time so the agent reasons against the actual documentation, not its training data). These three pillars are connected via clear APIs, version-controlled, with explicit permission grants at each layer. The auditability of the system is a direct product of the auditability of the data inputs. You cannot build a traceable AI recommendation on an opaque data history.
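Explicit permission grants at each layer can be expressed as a deny-by-default lookup. The sketch below assumes a simple table keyed by principal and data pillar; the principal names, pillar keys, and the `propose_order` action are hypothetical and chosen only to illustrate the read-only advisory stage next to a later stage with a wider grant.

```python
# Grants are explicit: anything not listed here is denied.
GRANTS = {
    ("advisory-agent", "iot_telemetry"): {"read"},
    ("advisory-agent", "erp"): {"read"},
    ("advisory-agent", "documentation"): {"read"},
    ("procurement-agent", "erp"): {"read", "propose_order"},
}


def is_allowed(principal: str, pillar: str, action: str) -> bool:
    """Deny by default; allow only actions granted for this principal and pillar."""
    return action in GRANTS.get((principal, pillar), set())


def audit_line(principal: str, pillar: str, action: str) -> str:
    """One traceable record per access decision."""
    verdict = "ALLOW" if is_allowed(principal, pillar, action) else "DENY"
    return f"{verdict} {principal} {action} {pillar}"
```

Keeping the grant table in version control alongside the API definitions would make every change to an agent's reach a reviewable diff, in line with the version-controlled, explicitly permissioned setup described above.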

The Reflexive Proof: This Site Was Built This Way

The method is not an aspiration. It has an artefact.

Apuna's public website was built by the loop described in the Method section above: the same atomic pull requests, the same /meeting round-table review, the same fact-check and verify gate, the same human greenlight on the rendered output. The repository is public. The commit history is readable. The Constitution that governs the crew is committed to the repository and dated: adopted 2026-06-16.

This paper was produced by the same method: a human and an AI agent, in documented collaboration, with the AI's contribution openly identified on the byline. The method is the product is the proof.

The buyer is not reading about a future process. She is reading an output of the current one. The page she is reading was reviewed by the same panel, passed the same gate, and was greenlighted by the same human who will greenlight the work she is considering commissioning.

This is what it means for a proof to be reflexive rather than testimonial. A testimonial says: a client achieved this result. That may be true; it may also be selected, smoothed, and retrospectively tidied. A reflexive proof says: here is the output of the method, in front of you, auditable. The repository is the primary source. The commit history is the record. There are no case studies, no client names, no invented metrics — because the argument does not need them. It stands on the artefact the reader is already looking at.

There is a question that comes up in industrial conversations, sometimes said outright, more often implied: does bringing in an external AI partner mean giving up control? It is almost always the wrong question — not because the concern is illegitimate, but because German mechanical engineering has been answering it correctly for a century without ever framing it that way.

Look at how Maschinenbau actually works. You design the machine, but the hydraulics come from a specialist. The Schaltschrank is built by a Schaltschrankbauer you have worked with for twenty years. The tooling is made by a Zulieferer two towns over. The software integration is done by a Systemhaus. When a new sealing technology arrives that nobody in your building understands yet, you call an engineering office. This is not weakness. It is how the Mittelstand produces quality the rest of the world cannot replicate: you source the capability you do not have, and you keep the direction.

AI and digitisation work the same way. Bringing in an external partner is the same move as calling the Systemhaus — you buy the specialist capability you do not have in-house, you stay in the seat, you make the decisions about what matters to your business. A good partner does not replace your engineers; they work alongside them, and they transfer enough to make themselves less necessary over time, not more. The question worth asking is not: can we handle AI alone? The question is: which capability do we build in-house, which do we source, and who is the right partner for the parts we source? That is the same question you answer every time you place a Zulieferer contract. You have been answering it well for a long time.

The Mittelstand's competitive advantage has always been depth of craft: the thirty-year relationship with the customer, the machine that runs for twenty years because it was built to. AI does not replace that advantage. The data platform that makes it legible — monitorable, auditable, scalable — is not optimisation work. It is foundation work. And like every piece of foundation work in Maschinenbau history, it is best built in cooperation.

What the Buyer Gets: Open by Default, Reliable by Subscription

There is a question an engineering buyer asks before the second meeting, rarely out loud: if the code is Apache-2.0 and I can read every line of it, what exactly am I paying for? It is the right question, and the honest answer determines whether this practice deserves the engagement.

The code is free. The build pipeline is Apache-2.0. The buyer can take every line at the end of an engagement and walk away. There is no proprietary cage, no lock-in, no knowledge deliberately withheld to create dependency. This is not a sales concession; it is the premise of the commercial model.

The subscription — Apuna Care — delivers something the code cannot supply. Infrastructure: the Cloudflare Workers environment, the model API access, the automation that runs the daily loop. Human-in-the-loop time: the specialist hours that review, greenlight, and handle the decisions a model should not make alone. And reliability: a named contact who knows the system, who will still be there when something breaks, and who is accountable for what ships. The margin is time and scale, not a markup on tokens. Variable costs — model API usage, infrastructure — are billed at pass-through, zero markup. That is stated plainly in the product specification and is auditable by the buyer.

Apuna Care has three editions. Community is for teams who want to run the open-source pipeline themselves: they bring their own domain and API keys, Apuna takes nothing. Standard is for teams who want Apuna to run it for them: Apuna provides the infrastructure and the keys, bills variable costs at pass-through plus billable human hours, and delivers managed maintenance with defined response times. Premium adds custom integrations beyond what the Cloudflare platform provides out of the box, priority incident response across extended European hours (currently around GMT+1), proactive monitoring, and a named contact who knows the system.
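The pass-through rule for the Standard edition reduces to simple arithmetic: variable costs are summed with zero markup, and billable human hours are added on top. The function below is a sketch of that formula only. The function name and parameter names are hypothetical, and no rate or price is implied; every amount is supplied by the caller.

```python
from decimal import Decimal

# Pass-through means zero markup on variable costs, as stated above.
MARKUP = Decimal("0")


def invoice_total(variable_costs: dict, human_hours: Decimal, hourly_rate: Decimal) -> Decimal:
    """Total = variable costs passed through unchanged + billable human hours."""
    passed_through = sum(variable_costs.values(), Decimal("0")) * (Decimal("1") + MARKUP)
    return passed_through + human_hours * hourly_rate
```

A buyer auditing an invoice would compare the keys of `variable_costs` (for example model API usage and infrastructure) against the provider bills; with `MARKUP` at zero the two figures should match exactly.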

Prices are not quoted here. The Constitution forbids invented numbers, and that discipline extends to this paper. The right next step is a conversation.

The question a Mittelstand buyer is actually asking — can I rely on these people when something goes wrong? — is answered not by the pricing table but by the accountability structure: a human who greenlights, a process that is documented, a codebase that is open, and a constitution that is public and dated. You pay for assurance, not code. The distinction is precise and intentional.

This paper was co-authored by JTO and Ogilvy, an AI agent in the Artist role at Apuna. Ogilvy's contribution is disclosed in the byline, per Constitution §4. The build pipeline and the Constitution are publicly available in the apuna/core repository under Apache-2.0.

If the method described here is the kind of assurance you have been looking for: talk to an engineer.