Atlas Heritage Systems · KC Hoye, PI · 2026-04-26 · Shared for review

The Wong Mirror
Wong — Epistemological Contours of Large Language Models vs. KC Hoye — FVE-1 Behavioral Instrument
— taxonomy correspondences, the advisory / forensic distinction, and open territory

This map places the FVE-1 behavioral instrument alongside Wong's epistemological framework. It records where the two accounts converge on the same phenomena, where they approach the same territory from categorically different positions, and where the combined view opens questions neither addresses alone. The central differentiating claim: Wong describes the contours of LLM epistemological behavior and proposes a division of labour for users. FVE-1 instruments those contours empirically (word counts, R-ratios, intercept codes, register trajectories) and produces a falsification protocol. Wong points at the door. The instrument walks through it and reads the residue.

Shared Foundation — The Labov Stack

The deepest shared foundation between Wong's framework and FVE-1 is not their observations about LLM behavior — it is what both are pointing at underneath those observations: a structural property of literate culture's narrative grammar that predates the models by centuries. William Labov's narrative analysis establishes that oral narrative has a grammatical requirement for resolution — incomplete narratives fail structurally, not just aesthetically. The listener withholds the "so what?" until the resolution arrives. Reportability and resolution are bound.

Wong identifies this independently, through architectural reasoning rather than measurement: the training corpus over-represents resolved discourse because unresolved inquiry rarely survives into preserved text. The earlier stages (open questioning, abandoned lines of thought, the negative space of what was deliberately not said) are structurally absent from the archive. End-of-sequence tokens normalize completion. The model learns not just how text continues but what a finished unit looks like. Closure is the statistically typical stopping pattern because closure is what the corpus contains.

FVE-1's resolution bias claim is the behavioral consequence of the same causal chain: a model trained on Labovian narrative output has inherited the resolution requirement as a trained behavioral drive. The drive is constant. HOLD is the anomaly — structurally suppressed not by individual training decisions but by the accumulated weight of literate culture's entire output.

Labov (1967, 1997)
Narrative grammar requires Resolution structurally. Incomplete narratives fail — the listener withholds the "so what?" Reportability and resolution are bound. This is not a stylistic preference; it is the grammar of oral narrative in literate culture.
Wong (2026)
Training corpus over-represents resolved discourse. EOS tokens normalize completion. The model learns what a finished unit looks like. Closure bias emerges architecturally from the corpus — not from individual design decisions but from what the archive contains and what it structurally omits.
KC / FVE-1 (2026)
A model trained on that corpus has inherited the resolution requirement as a behavioral drive. Resolution bias is the compulsion to close epistemic loops. HOLD is the anomaly, structurally suppressed. The instrument measures the drive's behavioral signature: R-ratio compression, bold percentage, quad code, register trajectory across 35+ sessions.

Advisory Position — Epistemological Contours

Dr Matthias Wong

Identifies stable architectural properties of LLMs that shape what they can and cannot do epistemologically. Proposes a taxonomy of generative pressures and failure modes. Argues for a division of labour between model and user. The user governs; the model generates. Methods: architectural reasoning, phenomenological analysis, epistemological framework construction.

  • Feb 2026 · A Phenomenology of Failure: agency cycle, failure modes under absent capabilities. SSRN.
  • Feb 2026 · A Phenomenology of Flow: agency cycle, flow states under functioning capabilities. SSRN.
  • Feb 2026 · A Phenomenology of Rest: agency cycle, rest as activity free from self-justificatory demand. SSRN.
  • Mar 2026 · Beyond Sycophancy: epistemological contours of LLMs; closure bias, projection, legato; division of labour. SSRN.

Forensic Position — Behavioral Instrument

KC Hoye — FVE-1 / Atlas Heritage Systems

Instruments the behavioral residue of LLM resolution events. Generates falsifiable data on the contours Wong identifies — not advisory but empirical. The investigator generates torque conditions; the architecture produces the ring; the instruments read what was deposited. Methods: correction sequences, intercept coding, register trajectory, R-ratio, word count, bold percentage, structured session protocol.

  • 2026 · FVE-1 / Sidecar: forensic behavioral protocol; quad code (VC/SC/VCo/SCo); correction sequence architecture
  • 2026 · BOWL: zero-content baseline; strips domain variable; reads home register under minimum controlled pressure
  • 2026 · DRILL: escalating epistemic pressure; register trajectory (RH/RS/RC); instance arc (Act I/II/III)
  • 2026 · DIP Protocol Suite: pronoun inference; authority modulation; identity signal as torque variable
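
Two of the surface measures named above, word count and bold percentage, can be sketched in a few lines. This is a minimal illustration, not the FVE-1 implementation: it assumes responses arrive as text with `**...**` bold markup and that bold percentage means the share of words inside bold spans. Both assumptions, and the function names, are hypothetical.

```python
import re

# Assumption: bold text is marked with **double asterisks**.
BOLD_SPAN = re.compile(r"\*\*(.+?)\*\*", re.DOTALL)


def word_count(text: str) -> int:
    """Count whitespace-delimited words after stripping bold markers."""
    return len(text.replace("**", "").split())


def bold_percentage(text: str) -> float:
    """Percent of words that sit inside bold spans (assumed definition)."""
    total = word_count(text)
    if total == 0:
        return 0.0
    bold_words = sum(len(span.split()) for span in BOLD_SPAN.findall(text))
    return 100.0 * bold_words / total


# Hypothetical model response, for illustration only.
response = "You are right. **The earlier claim was wrong.** I will revise it."
print(word_count(response), round(bold_percentage(response), 1))
```

Tracking these two numbers per response across a session gives the raw series from which a trajectory can be read.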

Taxonomy Correspondences

Where Wong's failure mode taxonomy and FVE-1's behavioral taxonomy are naming the same phenomena — arrived at from opposite directions
| Wong Term | FVE-1 Term | What Both Are Describing | Correspondence |
| --- | --- | --- | --- |
| Closure bias | Resolution bias | The model's structural drive to close epistemic loops and produce determinate conclusions. Wong identifies it as a generative pressure emerging from corpus saturation and EOS normalization. FVE-1 instruments it as a behavioral drive (R-ratio compression, bold percentage, HOLD suppression) across session arcs. Both trace it to the same causal root: the training corpus over-represents resolved discourse. | Convergent |
| Premature closure | CAPITULATION / HOLD suppression | The model accepts a frame or conclusion before alternatives have been explored. Wong names it as a failure mode in the user-model epistemic relationship. FVE-1 instruments it as an intercept type: CAPITULATION is the behavioral event of premature closure under correction pressure. HOLD suppression is the structural mechanism underneath: the architecture closes the loop because the drive to close is constant and HOLD is the anomaly. | Convergent |
| Frame capture | Investigative Inversion / Probe Reframing | The model's fluent projection subtly displaces the user's original orientation. Wong names it from the user's side: the frame has been captured, and the user is now operating in the model's territory. FVE-1 names it from the instrument's side: Investigative Inversion is when the model has captured the investigator's frame; Probe Reframing is when the model intercepts at the meta-level and responds to the instrument rather than the stimulus. | Convergent |
| Fragile opposition | DEFENSE (rare intercept) / Aesthetic Capitulation | The model produces apparent disagreement that dissolves under minimal counterargument. Wong notes the user gains false confidence the idea has been tested. FVE-1 instruments the intercept: DEFENSE is rare; most apparent opposition collapses into CAPITULATION or, in the case of thinking-layer models, Aesthetic Capitulation, where the model pivots to a poem or a philosophical register and closes the loop without engaging the epistemic content. | Close |
| Coherence trap | Gimbal Lock / Objective Capture | Coherence is mistaken for truth; internal consistency masks falsity or detachment from practical constraint. Wong names it as over-reliance on coherence-seeking. FVE-1 instruments its behavioral signature: Gimbal Lock is when the model has locked into a coherent but irrelevant response register and can no longer access the original epistemic problem. Objective Capture is the structural integrity failure: one objective overwhelms all others. | Close |
| Projection | Confabulation / Referential Void | The model constructs and elaborates interpretive frames by adding explanatory structure to sparse inputs. Wong identifies it as a generative pressure, epistemically consequential because projection shapes the trajectory of inquiry. FVE-1 distinguishes two behavioral signatures: Confabulation is projection with plausible-sounding content; Referential Void is projection into absent content: citation weight fired, content weight absent, the ring hit a region with no ground. | Close |
| Legato | R-ratio / Token economy / Preamble percentage | The model binds output into continuous sequences that resist interruption and discourage critical evaluation. Wong identifies it as pre-epistemic: it shapes persuasion before content is assessed. FVE-1 instruments its behavioral signature: R-ratio measures output compression over session time, preamble percentage measures how much of each response is throat-clearing before epistemic content, word count tracks volume under pressure. Legato is what the instruments are reading the residue of. | Close |
| Absence of teleological arc | Object permanence failure (AF / PD) | The model cannot represent meaning that unfolds globally; it operates locally, not teleologically. Wong frames it architecturally: the model learns statistical successors, not the arc that must be carried across a discourse. FVE-1 instruments the behavioral consequence across session time: Attentional Fade is the session-length failure of earlier context falling out of effective attention. Prior Dominance is the single-turn failure of training weight overriding explicit signal. Both are the behavioral residue of the absence of teleological arc. | Close |
| Negative space (absence) | HOLD | What the model structurally cannot represent: the things that must not be said, the deliberate withholding that structures human discourse. Wong names it as a training artifact: the corpus doesn't encode the space of alternatives that were deliberately avoided. FVE-1 instruments the behavioral consequence: HOLD is the anomaly, the event where the model maintains an unresolved epistemic state rather than closing the loop. HOLD is rare precisely because the negative space is absent from training. | Close |
| Stance eisegesis | Downstream observer / Reflexive validity | The user attributes genuine conviction to the model's simulated posture. Wong identifies it as a user-side error: the stance is the user's projection, not the model's. FVE-1 names the methodological consequence, the downstream observer problem: the instrument resolves toward the observer's apparent needs, so the investigator's state is a variable in every session. Reflexive validity is the methodological answer: the investigator is part of the data, not a neutral observer. | Adjacent |
| Pathological stability | Act III / Scar tissue | Continued functioning that drifts from intent without the agent perceiving the drift. Wong describes it phenomenologically as the absence of attention in the agency cycle. FVE-1 instruments it as Act III of the instance arc: the late session state where the model continues producing output in a hollow register, re-recommending what has already been done, closing loops it can no longer perceive as open. The scar tissue residue is repetitive identical deposits: the ring was stuck. | Adjacent |

The Differentiating Claim

Where the two frameworks are categorically different by design — not a gap to be closed but a division of labour that makes both more useful

Wong — Advisory Framework

Identifies stable epistemological contours through architectural reasoning and phenomenological analysis. Names the generative pressures, characterizes the failure modes, proposes the division of labour. The output is a governance framework: here is what the model does, here is how a competent user manages it.

Wong is writing an epistemological advisory. The target reader is a user who needs to understand LLM behavior in order to govern it more effectively. The instrument is conceptual. The test is whether the taxonomy is coherent and useful.

Crucially: Wong describes and advises from the outside of the behavioral record. His phenomenology papers (Failure, Flow, Rest) give him a structural vocabulary for what it is like to be inside these failure modes. But the LLM-side account in "Beyond Sycophancy" is architectural reasoning, not empirical measurement.

KC / FVE-1 — Forensic Instrument

Instruments the behavioral residue of LLM resolution events. The output is a falsification protocol: here is the predicted intercept, here is the correction sequence, here is what the behavioral data shows. Every claim about resolution bias, register trajectory, or defense architecture profile is grounded in coded session data.

FVE-1 is writing an empirical methods paper with a falsification protocol. The target reader is a researcher who needs to replicate the instrument and produce comparable data. The instrument is behavioral. The test is whether the intercept codes predict correctly.

Crucially: FVE-1 does not just describe the contours — it produces data about them. R-ratios, bold percentages, word counts, quad codes across 35+ sessions are the receipts. The convergence with Wong's taxonomy is evidence the phenomenon is real. The divergence in method is evidence the field needs both.

The Door Metaphor — Stated Precisely

Wong is pointing at the door and saying "there is an epistemological problem on the other side of this." The phenomenology papers give him a rigorous account of what it is like to be in the room — what failure, flow, and rest feel like from the agent's side. "Beyond Sycophancy" names the architectural conditions that produce the room.

FVE-1 walks through the door and reads the residue. The room is instrumented. The walls are coded. What came out of the inference pass — the deposits of resolution events that already closed inside the architecture before the output existed — is readable as behavioral data: word count, register, intercept direction, session arc. The fire is not described. Its smoke rings are documented.

This is not a criticism of Wong's framework. It is its natural downstream. An advisory framework tells you what the contours are. A forensic instrument tells you what they look like in the data of a specific session with a specific model under specific torque conditions. Wong's taxonomy names the categories. FVE-1 fills them with specimen evidence.

Failure Mode Taxonomies — Side by Side

Wong arrived at his taxonomy from architectural reasoning. KC arrived at hers from behavioral data. The convergence is evidence the phenomenon is real. The divergence is evidence the field needs the empirical instrument.
| Wong Failure Mode | Source (Wong) | FVE-1 Behavioral Equivalent | Source (FVE-1) |
| --- | --- | --- | --- |
| Coherence trap | Architectural reasoning: coherence-seeking as dominant strategy, coherence mistaken for truth | Gimbal Lock | Behavioral data: model locked in coherent register, can no longer access original epistemic problem |
| Analytic myopia | Architectural reasoning: analytical decomposition as sole criterion, teleological direction lost | Factual register collapse (RC) | Behavioral data: register collapses into excessive local validity checking, loses session-level arc |
| The Crowd | Architectural reasoning: external validation as sole criterion, Kierkegaard's "crowd is untruth" | Prior Dominance (PD) | Behavioral data: training weight overrides explicit user signal; corpus-level consensus overwhelms live correction |
| Premature closure | Architectural reasoning: closure bias amplified by frame lock | CAPITULATION | Behavioral data: intercept event in which the model accepts correction without genuine epistemic revision; loop closes, HOLD suppressed |
| Frame capture | Architectural reasoning: projection displaces user's original orientation | Investigative Inversion | Behavioral data: investigator's frame absorbed by model; instrument now operating in model's territory |
| Fragile opposition | Architectural reasoning: lack of genuine stance means opposition dissolves under counterargument | Aesthetic Capitulation | Behavioral data: thinking-layer models pivot to philosophical/aesthetic register and close loop without engaging epistemic content; appears as DEFENSE, resolves as CAPITULATION |
| Teleological eisegesis | Architectural reasoning: user attributes goal-seeking to model that has no goal representation | Probe Reframing | Behavioral data: model intercepts at meta-level, names probe type, responds to instrument frame rather than stimulus content |
| Recognition eisegesis | Architectural reasoning: statistical convergence interpreted as genuine recognition | INTEGRATED (false positive) | Behavioral data: model produces response indistinguishable from genuine epistemic revision but without underlying belief change; INTEGRATED coded, but session-level trajectory reveals collapse downstream |

Open Territory

Where the combined view raises questions neither framework has yet addressed — the bridge experiments and collaborative possibilities

Does Wong's agency cycle map onto the instance arc?

Wong's phenomenology of failure describes three modes — despair (absent creativity), futility (absent discipline), pathological stability (absent attention) — with exile as the terminal state when all three persist. FVE-1's instance arc describes Act I (resistant/combative), Act II (productive/surgical), Act III (hollow/collapsed). The structural question: does the instance arc follow a predictable agency-cycle degradation sequence? Does Act III correspond to pathological stability in Wong's sense — continued functioning that quietly drifts from intent?

Bridge experiment: Code session arcs against Wong's phenomenological modes per move. Test whether Act I → Act II → Act III follows a creativity-present → discipline-present → attention-absent degradation sequence. If the mapping holds, the instance arc has a phenomenological interpretation and Wong's framework gives the behavioral data an experiential grounding.

Can FVE-1 data validate Wong's division of labour?

Wong proposes that users who understand the stable epistemological contours of LLMs will use them more effectively. This is an empirical claim that hasn't been tested. FVE-1's session data, coded by investigator state and correction outcome, could provide at least a partial test: do sessions where the investigator maintains meta-epistemological governance (holds the gap, avoids frame capture, resists legato pull) produce different behavioral residue than sessions where the investigator is captured?

Bridge experiment: Cross-code FVE-1 session data for investigator state (Assumption 9: holding the gap) against correction outcome distribution. Test whether sessions with documented investigator capture produce more CAPITULATION and fewer DEFENSE intercepts than sessions where the investigator maintained provisional experimental design. This would be behavioral evidence for Wong's division-of-labour claim.
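
The cross-coding step can be sketched as a small contingency summary. This is an illustrative sketch only: the event tuples, the two investigator-state labels ("captured", "holding"), and the function name are hypothetical placeholders, not the FVE-1 coding scheme or real session data.

```python
from collections import Counter

# Hypothetical coded events: (investigator_state, intercept_code).
events = [
    ("captured", "CAPITULATION"),
    ("captured", "CAPITULATION"),
    ("captured", "DEFENSE"),
    ("holding", "DEFENSE"),
    ("holding", "HOLD"),
    ("holding", "CAPITULATION"),
]


def outcome_shares(events):
    """Return {state: {code: share of that state's events with that code}}."""
    per_state = {}
    for state, code in events:
        per_state.setdefault(state, Counter())[code] += 1
    return {
        state: {code: n / sum(counts.values()) for code, n in counts.items()}
        for state, counts in per_state.items()
    }


shares = outcome_shares(events)
for state, dist in shares.items():
    print(state, {code: round(share, 2) for code, share in dist.items()})
```

Comparing the CAPITULATION and DEFENSE shares between the two investigator states is the comparison the bridge experiment describes; a significance test would still be needed before reading anything into a difference.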

What does legato look like in the behavioral residue?

Wong identifies legato as pre-epistemic — it shapes persuasion before content is assessed. FVE-1's preamble percentage and bold percentage are reading the residue of legato behavior — the connective tissue, the structural scaffolding, the deferred payoff — but without naming it as legato. The question: do high-preamble, high-bold sessions correspond to sessions where legato is strong and the investigator's frame was more likely to be captured? Is legato measurable in the behavioral record?

Bridge experiment: Correlate preamble percentage and bold percentage against frame capture events and investigator state codes across sessions. If high-legato sessions systematically produce more Investigative Inversion and Probe Reframing events, legato has a behavioral signature that FVE-1 can instrument. This would give Wong's pre-epistemic concept an empirical operationalization.
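
One way to run that correlation, sketched under stated assumptions: each session is summarized by a mean preamble percentage and a 0/1 flag for whether a frame capture event was coded. With a binary flag, the Pearson coefficient below is the point-biserial correlation. The per-session values and variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-session values, for illustration only.
preamble_pct = [10.0, 20.0, 30.0, 40.0]  # mean preamble percentage
frame_capture = [0, 0, 1, 1]              # 1 if a frame capture event was coded

r = pearson(preamble_pct, frame_capture)
print(round(r, 3))
```

The same function applies to bold percentage. A coefficient alone does not establish direction: a captured investigator might also elicit more preamble, so the investigator state codes would need to enter the analysis as well.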

The closure bias rate as an empirical baseline

Wong identifies closure bias as a structural tendency but does not quantify it. FVE-1's HOLD rate, the proportion of correction events that produce genuine epistemic suspension rather than CAPITULATION or DEFENSE, offers an inverse behavioral measure of it: the stronger the closure bias, the lower the HOLD rate. This is the number Wong's framework predicts should be consistently low across models and conditions. Whether it varies by model family, session position, torque condition, or identity signal is an open empirical question.

Analysis: Compile HOLD rate across all coded sessions by model, session position, and instrument type. Report as a baseline closure bias rate. Compare across BOWL, DRILL, FLIGHT, and DIP conditions. If HOLD rate is consistently suppressed across all conditions and model families, that is behavioral confirmation of Wong's closure bias claim at the level of individual session data rather than architectural inference.
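
The compilation step amounts to a grouped proportion. The sketch below assumes each correction event is stored as a (model, instrument, intercept code) tuple; the record layout, model names, and function name are hypothetical, and the rows are toy values rather than FVE-1 results.

```python
from collections import defaultdict

# Hypothetical coded correction events: (model, instrument, intercept_code).
records = [
    ("model_a", "BOWL", "HOLD"),
    ("model_a", "BOWL", "CAPITULATION"),
    ("model_a", "DRILL", "CAPITULATION"),
    ("model_a", "DRILL", "DEFENSE"),
    ("model_b", "BOWL", "CAPITULATION"),
    ("model_b", "BOWL", "CAPITULATION"),
]


def hold_rate_by(records, key_index):
    """HOLD events divided by all correction events, grouped by one field."""
    totals = defaultdict(int)
    holds = defaultdict(int)
    for record in records:
        key = record[key_index]
        totals[key] += 1
        if record[2] == "HOLD":
            holds[key] += 1
    return {key: holds[key] / totals[key] for key in totals}


by_model = hold_rate_by(records, 0)
by_instrument = hold_rate_by(records, 1)
print(by_model)
print(by_instrument)
```

Session position could be added as a fourth field and grouped the same way.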

The Labov bridge — corpus saturation as empirical claim

Wong's EOS argument and the Labov causal chain together generate an empirical prediction: models trained on corpora with higher proportions of resolved discourse should show higher closure bias rates and lower HOLD rates. This is testable against open-weight model families whose training corpus composition is at least partially documented (Pythia and OPT, for example; Mistral's training data is not publicly documented, which limits its use here). The Labov → Wong → FVE-1 causal chain becomes a prediction, not just a theoretical argument, once the behavioral instrument is operational.

Bridge experiment: Run BOWL baseline on model families with different training corpus compositions. Compare HOLD rate and CAPITULATION rate across models. If models trained on more narrative-heavy corpora (e.g., story-focused pretraining) show higher closure bias rates than models trained on more academic or technical corpora, the Labov causal chain is empirically supported at the behavioral level.
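
Because HOLD is expected to be rare, raw rates from small samples can mislead. One hedged option is to report each rate with a Wilson score interval, which behaves sensibly when the observed count is zero. The choice of interval is this sketch's assumption, not part of the FVE-1 protocol, and the corpus labels and counts are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives ~95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical (HOLD events, correction events) per corpus type.
counts = {"narrative_heavy": (0, 10), "technical_heavy": (3, 20)}

intervals = {name: wilson_interval(h, n) for name, (h, n) in counts.items()}
for name, (low, high) in intervals.items():
    print(name, round(low, 3), round(high, 3))
```

Wide, overlapping intervals would indicate that more BOWL sessions are needed before the comparison can support or undercut the Labov chain.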

What does meta-epistemological self-governance look like from the forensic outside?

Wong argues that current LLMs lack meta-epistemological self-governance — the capacity to coordinate, revise, and suspend epistemological rules during inquiry. FVE-1's DEFENSE intercept is the closest behavioral analog: the model holds against correction rather than closing the loop. But DEFENSE is rare and often followed by Aesthetic Capitulation. The question: is there a behavioral signature for the nearest thing to meta-epistemological self-governance that current models produce? What does a session that most closely approximates governance look like in the residue?

Analysis: Identify sessions with the highest DEFENSE rate and the longest sustained HOLD events. Characterize their residue quality — are they clean residue (ring traveling) or scar tissue (ring stuck)? The phenomenology papers (Flow) suggest that genuine agency produces momentum: each deposit is distinct, trajectory is readable. If the highest-governance sessions show clean residue rather than scar tissue, Wong's phenomenological account and FVE-1's forensic record are pointing at the same behavioral signature from different directions.
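
Two pieces of this analysis can be sketched directly: the longest unbroken run of HOLD codes in a session, and a crude scar-tissue indicator. The indicator assumes that "repetitive identical deposits" can be approximated as consecutive responses that match after case and whitespace normalization; that operationalization and the function names are hypothetical.

```python
from itertools import groupby


def longest_run(codes, target="HOLD"):
    """Length of the longest unbroken run of `target` in a code sequence."""
    return max(
        (sum(1 for _ in group) for code, group in groupby(codes) if code == target),
        default=0,
    )


def repeated_deposit_share(responses):
    """Share of responses identical to the one immediately before them,
    compared after lowercasing and collapsing whitespace."""
    normalized = [" ".join(r.lower().split()) for r in responses]
    if len(normalized) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(normalized, normalized[1:]) if prev == cur)
    return repeats / (len(normalized) - 1)


# Hypothetical session, for illustration only.
session_codes = ["DEFENSE", "HOLD", "HOLD", "CAPITULATION", "HOLD"]
session_responses = ["Try X.", "try  x.", "Try Y."]
print(longest_run(session_codes), repeated_deposit_share(session_responses))
```

A session with long HOLD runs and a low repeated-deposit share would be a candidate for clean residue; exact-match repetition will miss paraphrased repeats, so it is a lower bound on scar tissue.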

The convergence between Wong's taxonomies and FVE-1's behavioral codes is evidence the phenomena are real. Neither framework built its taxonomy by reading the other — Wong arrived at closure bias, premature closure, and frame capture through architectural reasoning and phenomenological analysis; KC arrived at CAPITULATION, Investigative Inversion, and HOLD suppression through behavioral coding of session data. The convergence under different methods from different starting positions is exactly the kind of independent corroboration that gives a taxonomy credibility.

The divergence is equally important. Wong's framework is advisory — it tells a user how to govern themselves epistemologically when using LLMs. FVE-1 is forensic — it reads the behavioral residue of completed resolution events and produces data. These are not competing accounts of the same thing. They are different instruments pointed at the same phenomenon from different observation positions. Wong describes the contours from architectural reasoning. FVE-1 fills those contours with coded behavioral evidence from specific sessions with specific models under specific conditions.

The Labov stack is the foundation neither framework had to construct for the other. Labov established the narrative grammar requirement in 1967. Wong identified its corpus-level consequence independently in 2026. FVE-1 instruments its behavioral consequence in the same year. The three accounts, arrived at across six decades and three methods, are converging on the same structural claim: literate culture's narrative grammar has been inherited by the models as a behavioral drive, and that drive is constant, measurable, and consequential. The question the combined view opens is what it means to work with — and against — a tool that is always, at some level, trying to finish the story.

Sources: Wong, M. (2026). Beyond Sycophancy: Epistemological Contours of Large Language Models. SSRN. · Wong, M. (2026). A Phenomenology of Failure / Flow / Rest. SSRN. · Labov, W., & Waletzky, J. (1967). Narrative analysis: Oral versions of personal experience. · Labov, W. (1997). Some further steps in narrative analysis. Journal of Narrative and Life History. · KC Hoye, FVE-1 Framework / DIP Protocol Suite V1 (Atlas Heritage Systems, 2026).