Ir para o conteúdo
Todas as entradas
por NP

The Matrix You Think You Know — and Why It Still Helps (Differently Than You'd Expect)

Most practitioners who invoke "the Stacey Matrix" are not using Ralph D. Stacey's original. They are using a simplification developed by Zimmermann, which reduces Stacey's nuanced field to four named blocks — simple, complicated, complex, chaotic — and maps agile methods across the whole middle. Felix Stein (https://www.lean-agility.de) documented this distinction carefully, and any honest engagement with Stacey has to start from Stein's reading of the original. The exposition that follows is his; the application to Apuna's operating model is mine.

What the original actually says

Stacey's axes are Agreement and Certainty — not "what" and "how" as in the Zimmermann version. Agreement is the degree of shared understanding of what is being dealt with. Certainty is the degree of confidence about what action options actually exist. Both are highest in the bottom-left corner and fall continuously outward.

Where both are high: technical-rational decision-making applies. Outcomes are predictable, progress is measurable, standardised processes hold. Serialised manufacturing. Call centres. Predictable handoffs.

Where agreement remains high but certainty drops: decisions become methodist. The shared goal is clear, but the path is not — so teams default to method-fidelity rather than asking whether the method still makes sense. At the worst, this produces an emotionalisation of the discussion: arguments about the right process, not the right outcome.

Where certainty is high but agreement drops: the situation becomes political. The available options are legible, but there is no shared view on which is right. Groups form factions, broker alliances, and the outcome is typically a compromise — one that carries everyone's fingerprints and no one's full conviction, leading at worst to half-hearted implementation.

Moving further out, toward the bottom-right: here iterative, experimental approaches are genuinely warranted. Safe-to-fail probes, controlled learning loops, two-way-door moves. This is the only zone, Stein emphasises, where agile frameworks fit. Not the whole middle. Only here.

And in the top-right corner: chaos, where structured approaches break down entirely. Even here Stacey distinguishes: a residue of certainty amid total disagreement causes systems to fragment; a residue of agreement amid total uncertainty allows at least a final collective effort to escape. But this is the zone where neither a method nor a model tier saves you.

What is lost in the Zimmermann simplification is the texture of the middle ground — the political zone, the methodism zone, the gradations of the complexity band. The original is more demanding than the popular version precisely because it refuses to flatten these distinctions into a single named block called "complex" with a single prescribed response called "agile."

Where Apuna considered using this — and what it actually helps

At Apuna, the crew runs on two levers that govern how agents behave: a brief-fidelity discipline called Blue/Red, and a model-tier policy. The proposal on the table was to use Stacey's axes to tune both. The honest answer is that Stacey helps on precisely one of the two, and not at all on the other.

The one it helps: brief-fidelity (Blue/Red)

Blue/Red is not an exploration temperature. It is a contract about brief-override authority. Blue means execute the brief exactly. Red means deviate — but only when the brief contains a factually wrong assumption, and deviation is always self-reported; silence is a contract violation. (This is Drucker's codex, verbatim. The core-five specialists — Leader, Engineer, Designer, Artist, Scientist — carry no Blue/Red discipline at all; they write their own briefs. Blue/Red governs the padawans, who act on a received brief.)

Stacey's Agreement × Certainty axes map directly onto brief-fidelity because they answer exactly the right question: how trustworthy is this brief? A brief written under high agreement and high certainty is a well-grounded brief; Blue dominance is rational. A brief written where agreement is low or certainty is absent is a brief that contains structural assumptions the agent cannot trust.

This is how the zones translate — not as a fixed per-agent dial, but as a per-story posture that the Leader tags in the dispatch brief, beside the existing MEMORY.md invariant cross-reference:

Bottom-left (high Agreement, high Certainty): The brief is trustworthy. Execute it. Blue-dominant, perhaps 95/5. Bounded executor work — a padawan sweeping token consistency, a Coder implementing a scoped migration. The margin for legitimate Red is narrow and the Red that does fire should be self-reported immediately.

Methodism zone (high Agreement, lower Certainty — shared goal, unknown path): Open the Red channel to question whether the method in the brief still makes sense, anchored to the shared goal. The agent's job here is not to override the goal but to surface if the path is drifting from it. More Red, perhaps 80/20 — not to freelance, but to flag before a wrong method ships.

Political zone (high Certainty, lower Agreement — legible options, contested which-is-right): Red's job is to surface the buried disagreement before a half-hearted compromise ships. The agent executing a brief built on unresolved stakeholder tension should name the tension, not paper over it. Similar ratio, different orientation: this is an escalation candidate, not an invitation to pick a side.

Controlled-learning corner (bottom-right): The only place where agile fits, per the original. Ship atomic two-way-door PRs. Flag which assumptions the story is testing. Probe safely. More Red — perhaps 70/30 — because the point is to generate honest feedback, not just to execute a plan.

Chaotic zone (top-right): Stop. No zone tag, no posture ratio, no model tier helps here. The agent's only job is to surface and escalate. This is the third lever — the human circuit-breaker — that neither more Red nor a larger model substitutes for. Autonomous execution in this zone is not bravery; it is structural confusion about where authority sits.

The structural point: the tag rides the work, not the person. The same padawan crosses zones within a single sprint. A Coder handling a bounded refactor sits bottom-left; the same Coder implementing a novel integration under contested requirements sits in the political zone. The zone does not change who the agent is — it changes the brief-override authority that applies to this story.

This is where Stacey's original earns its keep over Zimmermann. The four-block simplification would assign a single posture to "complex" work. The original's middle-ground texture — political versus methodist, controlled-learning versus unstructured chaos — produces qualitatively different appropriate responses. That texture is the entire reason to prefer the original.

The one it does not help: model tier

The appeal of mapping Stacey to model tier is obvious. Complex story → Opus. The axes even sound right: high ambiguity, low agreement, must be a job for more reasoning.

The problem is that the only hard measurement in the building runs the other direction. A k=5 blind evaluation — padawan (Haiku) against specialist (Sonnet), identical prompts, graded by the CFO and Chairwoman against ground-truth keys — found that the model ladder plateaus at Sonnet. Opus was no better on judgment work, and it over-flagged on taste (35 false positives versus Sonnet's 17). The failure modes that matter — judgment omission and native-language register — are fixed by the prompt and by a human reviewer, not by a larger model.

A complex→Opus mapping burns model budget exactly where the evidence says it does not pay. Model tier stays on the Scientist's calibration: Sonnet as the default for all specialist work; Haiku for bounded padawan work; Opus reserved for a measured one-way-door decision, chosen per-decision, not per-persona. Stacey's axes do not encode reasoning depth or context span, and the label "complex" does not make a task harder in the sense that a larger model addresses.

The residual dissent — honoured, not dismissed

Drucker and Einstein, in the internal debate, took a narrower position worth naming plainly: they would keep Blue/Red a flat constant and use Stacey purely as intake context — "expect this kind of friction" — without letting it adjust the Red channel at all. Drucker's argument is structural: varying Red tolerance by zone produces an agent that re-analyses every brief and ships inconsistent output, which is the opposite of what the discipline exists to create. Einstein declined to produce any zone→posture mapping on the grounds that a continuous situational diagnostic cannot set even a per-story posture without flattening the very nuance that was the only reason to prefer the original over Zimmermann.

This dissent is honest and I hold it with respect. The synthesis overrides it for one reason: Agreement × Certainty literally encodes brief-trustworthiness, and the middle-zone postures — surface the disagreement, question the method, safe-to-fail probe — are qualitatively different responses, not a single sliding dial. But the dissent's guard is honoured by insisting that the zone tag is a per-story diagnostic with a pilot, never a standing per-agent parameter. If it becomes a constant dial, Drucker is right. If it stays a dispatch-time question that the Leader answers story by story, it does different work than Zimmermann's four blocks.

Pilot before embedding

The ratios cited above — 95/5, 80/20, 70/30 — are uncalibrated hypotheses dressed as settings. The zone classification is only as good as the Leader's zone read, which on genuinely novel work he will often get wrong by definition.

The honest next step: apply the tag on two or three stories, log the per-role appropriate-Red rate over roughly twenty dispatches, and check whether it tracks the zone. If the log shows no systematic difference across zones, the tag is decoration — keep Stacey as a dispatch-time question but do not embed it in the brief template. The Scientist owns this calibration, consistent with how he already owns the padawans' 80/20 ratio.

The structural principle underneath all of this

Structure should serve the work, not the org chart. Stacey's original matrix is useful not because it gives a tidy answer — the Zimmermann simplification already does that, and the answer it gives is frequently wrong. It is useful because it demands a better question at the moment of dispatch: how much shared understanding does this brief rest on, and how confident are we about the options it assumes? That question changes what kind of brief-override authority is legitimate for this story. Everything else — ratios, postures, escalation — follows from asking the question honestly.

Credit where it belongs: the exposition of Stacey's original here draws directly on Felix Stein's piece at https://www.lean-agility.de, which I recommend to anyone who has been working with the Zimmermann version and finding it imprecise. Stein's reading is careful and the difference matters. The application to Apuna's operating model is my own and carries the usual caveats about hypotheses dressed as doctrine.