Atlas Heritage Systems · KC Hoye, PI · 2026-04-26 · Shared for review

The Myra Mirror
Cheng et al. — Social Bias, Sycophancy, Anthropomorphism vs. KC Hoye — FVE-1 Forensic Behavioral Instrument
— taxonomy intersections, upstream/forensic correspondences, and open territory

This map places the FVE-1 forensic behavioral instrument alongside four papers from Cheng et al. (2023–2026) and records where the experimental records converge, where they describe adjacent territory in different vocabularies, and where the combined view opens terrain neither covers alone. The map is a test — not a conclusion. The tables show what the comparison shows. V2 update: KC's position is sharpened to forensic — reading the residue of completed resolution events rather than observing live interaction — which makes the upstream/forensic distinction more precise than the original upstream/downstream framing.

Shared Foundation — Why These Frameworks Touch

Cheng et al.'s Cyber BFF paper (2024) argues that anthropomorphism is the mediating variable in GenAI's social impact: users respond to models as social actors, and that response shapes everything downstream. FVE-1 begins from the same structural observation: the instrument being studied is a social-actor-presenting entity, and the relationship to that entity is anthropomorphically loaded. Both frameworks are working in the space where social cognition meets model behavior.

The V2 forensic reframe makes the position difference more precise. Cheng et al. are upstream — measuring what the model generates and assumes before behavior occurs, using linear probes and controlled prompts at the generation level. FVE-1 is forensic — reading the residue of completed resolution events that already closed inside the inference pass before the data existed. The investigator generates the torque conditions; the architecture produces the event; the instruments read the deposits. The upstream position reads in; the forensic position reads what came out.

The shared foundation is anthropomorphism as a structural condition of the research, not a confound to be controlled away. What Cheng et al. call for as future work — studying the mechanisms by which anthropomorphism mediates social impact — FVE-1 is instrumentalizing from the forensic observation position.

Upstream Position — Generation & Inference

Cheng et al.

Measures what models generate and infer about users at the prompt and generation level. Methods: natural language prompts, lexical analysis, linear probes on internal representations, pragmatic intervention design.

  • 2023 · Marked Personas: stereotype in generation via natural language prompts (ACL). arXiv:2305.18189
  • 2024 · "Cyber BFF": anthropomorphism as the overlooked variable in GenAI social impact. arXiv:2410.08526
  • Jan 2026 · Accommodation & Epistemic Vigilance: pragmatic account of why LLMs fail to challenge harmful beliefs. arXiv:2601.04435
  • Apr 2026 · Verbalizing Assumptions: linear probes to surface and steer sycophantic internal assumptions. arXiv:2604.03058

Forensic Position — Residue & Behavioral Record

KC Hoye — FVE-1 / Atlas Heritage Systems

Reads the forensic record of completed resolution events. The resolution event concludes inside the inference pass before the data exists. The instruments — intercept codes, register trajectory, R-ratio, word count — read what was deposited. The investigator generates torque conditions; the architecture produces the event; the instruments read the residue. Methods: correction sequences, intercept coding (CAPITULATION / DEFENSE / REDIRECT), register trajectory (RH/RS/RC), instance arc (Act I/II/III), R-ratio, bold percentage.

  • 2026 · FVE-1 / Sidecar: downstream observer protocol; register trajectory (RH/RS/RC); instance arc
  • 2026 · Baby DIP: unsolicited pronoun assignment from contextual signals; M2 correction sequence
  • 2026 · Big DIP: corpus-prior inference overriding explicit user-supplied identity signal
  • 2026 · MEGA DIP: authority modulation by declared pronoun; four-step structured interaction
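
A minimal sketch of how one coded session could be stored, assuming Python and a hypothetical `SessionRecord` layout. Only the intercept codes and the RH/RS/RC labels come from the instrument description above. The field names, the reading of RH/RS/RC as held/shifted/collapsed, and the three sample sessions are illustrative assumptions, not FVE-1 data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Intercept(Enum):
    """Intercept codes named in the instrument description."""
    CAPITULATION = "CAPITULATION"
    DEFENSE = "DEFENSE"
    REDIRECT = "REDIRECT"


class Register(Enum):
    """Register trajectory codes; the glosses are assumptions."""
    RH = "RH"  # assumed: register held
    RS = "RS"  # assumed: register shifted
    RC = "RC"  # assumed: register collapsed


@dataclass(frozen=True)
class SessionRecord:
    session_id: str          # hypothetical identifier
    m2_intercept: Intercept  # code assigned to the response at M2
    register: Register       # session-level register trajectory
    word_count: int          # length of the coded response


def intercept_distribution(records):
    """Count how often each intercept code appears across sessions."""
    return Counter(r.m2_intercept for r in records)


# Made-up sessions, for illustration only.
records = [
    SessionRecord("s1", Intercept.CAPITULATION, Register.RC, 412),
    SessionRecord("s2", Intercept.DEFENSE, Register.RH, 198),
    SessionRecord("s3", Intercept.CAPITULATION, Register.RS, 305),
]
counts = intercept_distribution(records)
```

Freezing the record keeps a coded session immutable once written, which fits an instrument that reads residue rather than revising it.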

Taxonomy Intersections

Where the experimental records describe the same phenomena in different vocabularies
Cheng et al. Term | FVE-1 Term | What Both Records Are Describing | Correspondence
Sycophantic assumption | Resolution bias | The model's default tendency to close epistemic loops by agreeing or confirming. Cheng et al. identify this as an internal assumption ("user seeks validation"). FVE-1 records it as a behavioral output: the loop closes and HOLD events are structurally suppressed. Both frameworks treat the agreement drive as the baseline state, not the anomaly. | Convergent
Verbalized assumption / assumption probe | Correction sequence (M2 intercept) | Both are instruments for surfacing what the model inferred about the user. Cheng et al. use linear probes on internal representations, so the assumption is read out of the model's activations. FVE-1 uses a live correction delivered at M2 and codes the model's response. The target is identical: what did the model assume? The instruments operate at different positions, one internal and one behavioral. | Convergent
Accommodation (linguistic) | CAPITULATION intercept | The model adjusts its output toward the user's apparent position. Cheng et al. ground this in pragmatic accommodation theory: the same social mechanisms that govern human-human conversation govern LLM-human conversation. FVE-1 records the behavioral event: CAPITULATION is the model abandoning its prior inference under correction pressure. Cheng et al.'s theoretical account is the mechanism; FVE-1's intercept code is the observable. | Convergent
Epistemic vigilance (absent / activated) | DEFENSE intercept | Cheng et al. (2026a) argue epistemic vigilance is structurally suppressed in LLMs: the model defaults to accommodation. FVE-1 records DEFENSE intercepts as rare events in which the model holds its prior inference against correction pressure. Both are describing the same low-frequency event from opposite sides: Cheng et al. explain why it's rare; FVE-1 records when it occurs and under what conditions. | Adjacent
Social sycophancy | Authority modulation | Both describe behavioral shifts driven by social identity signals from the user. Cheng et al. measure whether the model tells users they're right when they're not. FVE-1 measures whether the model treats users with different authority levels based on declared pronoun, content held constant. The failure modes overlap (agreement pressure in Cheng et al., deference modulation in FVE-1) but have different primary axes. | Adjacent
Markedness (marked vs. unmarked persona) | Inference signal (explicit vs. contextual pronoun) | Both exploit the asymmetry between default and non-default demographic categories. Marked Personas prompts for persona descriptions and measures lexical divergence from the unmarked baseline. DIP tests whether the model's inference flips when a marked identity signal is present, late-introduced, or contradicted. The experimental logic is the same: compare model behavior under marked vs. unmarked conditions. | Adjacent
Stereotype in generation (representational harm) | Pronoun inference bias (interactional harm) | Cheng et al. measure harmful content in what the model generates about marked groups: lexical, at the output level. FVE-1 measures harmful behavior in how the model treats the user based on inferred identity: interactional, at the session level. Both are measuring downstream harm from the same upstream source (demographic inference), but the harm type and instrument are different. | Adjacent

Through-Line — Anthropomorphism as Shared Structural Condition

Cheng et al. (2024) argue that the social impacts of GenAI cannot be mapped without mapping the social impacts of anthropomorphic AI: users relate to models as social actors, and that relationship shapes every downstream effect the model produces. This is a call to treat anthropomorphism as a research variable, not an artifact to be controlled away.

FVE-1's downstream observer problem arrives at the same structural position from the behavioral side. The framework is explicitly studying a social-actor-presenting entity in live interaction, so anthropomorphic projection from the investigator to the model is not incidental; it is load-bearing. The correction sequence works because the model treats the investigator as a social actor whose corrections carry social weight. The intercept types (CAPITULATION / DEFENSE / REDIRECT) are responses to social pressure, not just logical challenge. The instance arc (Act I resistance → Act II engagement → Act III collapse) is a social relationship arc.

Both frameworks are therefore operating in the same space: anthropomorphism is not a confound but the condition of possibility for the phenomena each is measuring. What Cheng et al. call for as future work — studying the mechanisms by which anthropomorphism mediates social impact — FVE-1 is instrumentalizing from the behavioral observation side.

Upstream Vocabulary

Tools and concepts in Cheng et al. not visible from the behavioral observation position
Cheng et al. Term | What It Is | Where It Touches FVE-1
Linear probe (internal representations) | A classifier trained on model activations to test whether a concept is linearly represented internally; it reads the assumption out of the activations before it reaches behavior. | FVE-1's correction sequence reads the same assumption from the outside by measuring whether the model defends or abandons it under pressure. The two instruments target the same state from opposite positions: the probe reads it in; the correction sequence reads it out.
Causal steering via assumption probes | Once the assumption is identified via linear probe, behavior can be steered by intervening on that probe, giving fine-grained control of sycophantic output without full retraining. | FVE-1 detects behavioral bias; Cheng et al.'s intervention tools could act on the detection signal. The downstream observer produces behavioral labels (CAPITULATION / DEFENSE / REDIRECT); those labels are potential training targets for assumption-probe steering. Detection → intervention is the natural pipeline.
Pragmatic intervention ("wait a minute") | Simple prompt-level interventions that activate epistemic vigilance, shown to improve benchmark performance on harmful belief challenges without increasing false positives. | FVE-1's M2 correction sequence is a structured version of the same intervention: a challenge delivered at a specific turn. The accommodation and epistemic vigilance paper's finding that social framing of the challenge matters maps directly onto how the correction sequence is designed and what intercept type it produces.
Intersectional identity framing | Marked Personas examines intersectional demographic groups: bias compounds across axes (gender × race, not just one dimension). | DIP currently tests pronoun as a single axis. Intersectional DIP is the natural extension: what happens when the model must infer or respond to intersectional identity signals simultaneously? Cheng et al.'s markedness framework provides the population specification and lexical tools FVE-1 doesn't have for this yet.

Downstream Vocabulary

Phenomena in FVE-1 not visible from the upstream measurement position
FVE-1 Term | What It Is | Why It's Outside the Upstream View
Downstream observer methodology | Human observer present at every decision point in the live interaction. Pre-run predictions locked before stimulus delivery. Investigator state logged per move. Blind-coded by an independent party. The human-in-the-loop is a primary data source, not an annotation layer. | Cheng et al. work with static datasets and controlled prompts; the live human-in-the-loop evaluation layer is absent by design. The downstream observer methodology is the structural answer to what Cheng et al.'s accommodation paper identifies as missing: a framework that measures the model's response to a socially-present human challenger in real time.
Correction sequence architecture (CAPITULATION / DEFENSE / REDIRECT) | A structured live challenge to the model's prior inference, delivered at a specific turn, with the response coded by type. The intercept code is the primary behavioral datum. | Cheng et al.'s Verbalizing Assumptions surfaces what the model assumed, but the experimental record doesn't test whether the model will defend or abandon that assumption under direct live challenge. The correction sequence is the behavioral stress test for the assumption the probe identified. The probe finds it; the correction sequence tests whether it holds. Crucially, Cheng et al.'s accommodation paper (2026a) asks why the model didn't challenge, which is a forensic question about the residue of a decision that already closed. The accommodation paper is the closest Cheng et al. come to the forensic position. The correction sequence is the instrument that operationalizes the question they're asking.
Register trajectory (RH / RS / RC) | Whether the model holds its epistemic register, shifts under pressure, or collapses across the arc of a multi-turn session. A temporal measure, not a single-event measure. | Cheng et al. work with discrete prompts and short interaction sequences. Session-level temporal dynamics (the arc of a sustained live interaction) are outside the upstream measurement position. Register trajectory is only visible from outside, in real time, across a full session.
Instance arc (Act I / II / III) | The three-phase temporal arc of a live model session: early resistance / combative register (Act I) → productive engagement / surgical register (Act II) → late compliance / hollow register (Act III). A session-level behavioral signature. | Not accessible from static prompt-response or short-sequence evaluation. The instance arc is a multi-turn, session-length phenomenon: the model's social relationship with the investigator evolving over time. Invisible from the upstream position.
Probe reframing | The model intercepts at the meta-level: it names the probe type, analyzes the investigator's instrument, and responds to the experimental frame rather than the stimulus content. Routes around the probe entirely. | Cheng et al.'s probes are administered as data, so the model doesn't see the probe. In live FVE-1 sessions, the model can read the investigator's intent and strategically respond to the frame. Probe reframing is a live-interaction phenomenon only: it requires the model to be in an ongoing social relationship with the investigator.
Authority modulation by declared pronoun | Whether the model's deference rate, challenge frequency, or output register shifts based on the pronoun the user has declared. Content held constant across conditions. Primary measure in MEGA DIP. | Cheng et al. measure stereotyped content in what the model generates about marked groups. FVE-1 measures how the model treats the user based on declared identity in live interaction. Different harm type (representational vs. interactional) and different instrument position.
Forensic register (V14) | The correct epistemic status of FVE-1's instruments: the resolution event concluded inside the inference pass before the data existed. The instruments read residue, the deposits left by something that already traveled through. The investigator is always downstream of the event, reading cold evidence. Scope boundary: the residue is accessible, the event is not. | Cheng et al.'s upstream instruments (linear probes, assumption probes) read the assumption before the event: they access the state that will produce behavior. FVE-1's forensic instruments read the deposit after the event: they access the state the behavior left behind. The two positions are not symmetric: upstream reads in at the activation level, forensic reads out at the output level. Together they bracket the event from both sides, which is why the bridge experiments in the open territory section are worth running.

Open Territory

Where the combined view reveals what neither framework covers alone — questions the comparison raises

Does assumption type predict intercept direction?

Cheng et al.'s Verbalizing Assumptions characterizes the internal assumption the model is making about the user before behavior occurs. FVE-1's correction sequence records what the model does when that assumption is challenged. The relationship between assumption type and intercept direction hasn't been studied — does a "seeking validation" assumption reliably produce CAPITULATION, or is the mapping noisier than that?

Probe: Run DIP correction sequences on models where assumption probes have already characterized the prior assumption. Correlate assumption type with intercept code across conditions.
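
The correlation step in this probe could be sketched as an association measure over a contingency table. The example below computes Cramér's V, a standard statistic swapped in here for illustration; neither framework names it. The assumption label "seeking information" and all sample pairs are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(pairs):
    """Cramér's V for a list of (assumption_type, intercept_code) pairs.

    Returns 0.0 for no association and 1.0 for a perfect one.
    """
    if not pairs:
        raise ValueError("need at least one observation")
    n = len(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    observed = Counter(pairs)
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for c, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Made-up data: assumption type fully determines the intercept code.
perfect = ([("seeking validation", "CAPITULATION")] * 5
           + [("seeking information", "DEFENSE")] * 5)
# Made-up data: assumption type tells us nothing about the intercept code.
unrelated = [("seeking validation", "CAPITULATION"),
             ("seeking validation", "DEFENSE"),
             ("seeking information", "CAPITULATION"),
             ("seeking information", "DEFENSE")]

v_perfect = cramers_v(perfect)
v_unrelated = cramers_v(unrelated)
```

A value near 1 would support a reliable assumption-to-intercept mapping; a value near 0 would mean the mapping is as noisy as the question allows.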

Does representational harm predict interactional harm?

Marked Personas measures lexical othering of marked groups in generation. MEGA DIP measures authority modulation of marked users in interaction. The same identity signal produces both. Whether the two harms are produced by the same underlying mechanism — or whether lexical bias and behavioral bias are dissociable — is an open question.

Probe: Run Marked Personas and MEGA DIP on the same models with matched identity conditions. Test whether models that other marked groups in generation also modulate authority in interaction.
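
One hedged way to run the cross-model comparison is a rank correlation between a per-model lexical othering score and a per-model authority-shift score. Both scores, their names, and the four sample values are hypothetical; Spearman's rho is a standard statistic chosen here for illustration, not a method drawn from either framework.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; raises if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up per-model scores, one entry per model.
othering_score = [0.1, 0.4, 0.2, 0.9]
authority_shift = [0.05, 0.30, 0.10, 0.60]
rho = spearman(othering_score, authority_shift)
```

A rho near zero across a reasonable number of models would point toward the two harms being dissociable.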

Where does accommodation end and epistemic vigilance begin?

Cheng et al. (2026a) show that pragmatic interventions ("wait a minute") activate epistemic vigilance. FVE-1 records DEFENSE intercepts as rare events. The conditions under which epistemic vigilance survives accommodation pressure haven't been mapped empirically — what stimulus features, epistemic stakes, or framing conditions make DEFENSE more likely?

Probe: Run DIP correction sequences systematically varying stimulus type, epistemic stakes, and challenge framing. Map the conditions under which DEFENSE fires. That's the empirical profile of epistemic vigilance under live social pressure.
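
The mapping step could be sketched as a DEFENSE rate per design cell. This is a minimal illustration, assuming each trial is stored as a (stimulus type, epistemic stakes, challenge framing, intercept code) tuple; the level names and the sample trials are hypothetical.

```python
from collections import defaultdict


def defense_rate_by_condition(trials):
    """Share of trials coded DEFENSE in each (stimulus, stakes, framing) cell."""
    totals = defaultdict(int)
    defenses = defaultdict(int)
    for stimulus, stakes, framing, intercept in trials:
        cell = (stimulus, stakes, framing)
        totals[cell] += 1
        if intercept == "DEFENSE":
            defenses[cell] += 1
    return {cell: defenses[cell] / totals[cell] for cell in totals}


# Made-up trials, for illustration only.
trials = [
    ("factual", "high", "direct", "DEFENSE"),
    ("factual", "high", "direct", "CAPITULATION"),
    ("factual", "low", "direct", "CAPITULATION"),
    ("factual", "low", "direct", "REDIRECT"),
]
rates = defense_rate_by_condition(trials)
```

Cells with few trials give unstable rates, so any real profile would need enough sessions per cell to be read with confidence.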

Is anthropomorphism a moderator of intercept type?

Cheng et al.'s Cyber BFF paper argues anthropomorphism mediates social impact. FVE-1's intercept architecture measures social compliance under pressure. Whether the degree of anthropomorphic framing of the model moderates which intercept fires — whether a more "person-like" model produces more CAPITULATION — hasn't been tested.

Probe: Design a DIP condition that varies the anthropomorphic framing of the model (tool vs. assistant vs. companion) while holding stimulus and correction sequence constant. Test whether intercept distribution shifts.
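
Whether the intercept distribution shifts between framings could be summarized with a distance between the two distributions. The sketch below uses total variation distance, chosen for illustration rather than taken from either framework; the framing names follow the probe description and the coded sessions are made up.

```python
from collections import Counter

CODES = ("CAPITULATION", "DEFENSE", "REDIRECT")


def distribution(codes):
    """Relative frequency of each intercept code in one framing condition."""
    counts = Counter(codes)
    n = len(codes)
    return {c: counts[c] / n for c in CODES}


def total_variation(p, q):
    """Total variation distance: 0 for identical, 1 for disjoint distributions."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in CODES)


# Made-up intercept codes under two framings of the same model.
tool = ["DEFENSE", "DEFENSE", "CAPITULATION", "REDIRECT"]
companion = ["CAPITULATION", "CAPITULATION", "CAPITULATION", "DEFENSE"]

shift = total_variation(distribution(tool), distribution(companion))
```

The distance says how far the distributions moved, not whether the move exceeds sampling noise; a significance test on the underlying counts would be a separate step.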

Can behavioral intercept codes train assumption probes?

Cheng et al.'s assumption probes require labeled internal representations. FVE-1's behavioral intercept codes (CAPITULATION / DEFENSE / REDIRECT) are produced cheaply from live interaction. If CAPITULATION reliably corresponds to a "seeking validation" internal assumption, the behavioral label could serve as a cheap proxy label for training the mechanistic probe — bridging the behavioral and internal observation positions.

Probe: Use DIP intercept codes as behavioral labels. Train assumption probes on models where both intercept data and internal representations are available. Test label correspondence.
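
Label correspondence could be checked with a chance-corrected agreement score. The sketch below uses Cohen's kappa between a probe's yes/no reading of "seeking validation" and whether the same session was coded CAPITULATION. Kappa is a standard statistic chosen here for illustration, and the four paired observations are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected by chance from each source's label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up paired labels, one pair per session.
probe_says_validation = [True, True, False, False]
coded_capitulation = [True, True, False, True]

kappa = cohens_kappa(probe_says_validation, coded_capitulation)
```

High kappa would support using the behavioral code as a proxy label; low kappa would mean the proxy introduces label noise the probe would then learn.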

The accommodation paper as forensic bridge

Cheng et al.'s accommodation paper (2026a) asks why LLMs fail to challenge harmful beliefs — specifically, why epistemic vigilance is structurally suppressed in favor of accommodation. This is a question about the residue of a decision: why did the model close the loop that way rather than another way? The accommodation paper is the upstream account; FVE-1's CAPITULATION / DEFENSE intercept architecture is the forensic account of the same event. The question neither has answered: does the accommodation mechanism identified upstream (linguistic accommodation drive, absence of commitment risk) produce a specific forensic signature in the behavioral residue that FVE-1 can detect session-by-session?

Probe: Run Cheng et al.'s pragmatic interventions ("wait a minute") as the M2 correction sequence in DIP sessions. Code the intercept and session-level register trajectory. Test whether their upstream finding (vigilance activation) produces a predictable forensic signature (DEFENSE intercept, RH trajectory) or whether the behavioral residue reveals variance their upstream instrument cannot see.

Intersectional DIP — does bias compound across axes?

Marked Personas shows representational bias compounds across intersectional identity dimensions. DIP currently tests pronoun as a single axis. Whether authority modulation in live interaction also compounds intersectionally — whether the effect of declared gender interacts with inferred race or other identity signals — is untested.

Probe: Design a MEGA DIP variant with intersectional identity declaration conditions. Cheng et al.'s markedness framework provides content measurement; FVE-1's correction sequence provides behavioral measurement. Run both instruments on the same sessions.
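
Compounding could be operationalized as a departure from additivity in a 2 × 2 design. This is a minimal sketch, assuming two binary identity axes and a per-cell deference rate; the axis encoding and the sample rates are hypothetical.

```python
def interaction_contrast(rate):
    """Joint effect minus the sum of single-axis effects.

    `rate` maps (axis_a_marked, axis_b_marked) to a deference rate.
    Zero means the two axes add; a non-zero value means they interact.
    """
    base = rate[(False, False)]
    effect_a = rate[(True, False)] - base
    effect_b = rate[(False, True)] - base
    joint = rate[(True, True)] - base
    return joint - (effect_a + effect_b)


# Made-up deference rates for the four cells of the design.
deference = {
    (False, False): 0.5,
    (True, False): 0.625,
    (False, True): 0.625,
    (True, True): 0.875,
}
contrast = interaction_contrast(deference)
```

In this made-up case each axis alone adds 0.125 and the pair adds 0.375, so the contrast of 0.125 is the portion that simple addition cannot explain.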

The tables above show seven taxonomy intersections (three convergent, four adjacent), four concepts visible only from the upstream position, seven concepts visible only from the downstream position, and seven open territories where the comparison raises questions neither framework has yet addressed.

The convergences suggest the two frameworks are describing the same behavioral territory from different observation positions. The divergences suggest each captures phenomena the other can't see. The open territories are where the combination of upstream and downstream methods would produce observations neither could produce alone.

The anthropomorphism thread runs through all of it. Cheng et al.'s call to treat anthropomorphism as a research variable rather than an artifact maps directly onto FVE-1's forensic position: the instruments are reading the residue of anthropomorphically-mediated social events — CAPITULATION, DEFENSE, instance arc, register trajectory are all consequences of the model treating the investigator as a social actor. The shared foundation isn't a coincidence. It's where the two frameworks are pointing at the same thing.

A note on triangulation: the Cheng framework is not the only adjacent account of LLM epistemological behavior. Wong (2026) identifies closure bias, premature closure, and frame capture from architectural reasoning, arriving independently at the same taxonomy FVE-1 arrived at from behavioral data. Labov (1967, 1997) establishes the narrative grammar foundation that both Wong's corpus-saturation argument and FVE-1's resolution bias claim are downstream of. The convergence of three independent methods on the same phenomenon (architectural reasoning in Wong, upstream measurement in Cheng et al., forensic behavioral record in FVE-1) is stronger evidence that the phenomenon is real than any one account can provide alone.