Atlas Heritage Systems

Diagnostic Suite

FVE-1 instrument stack · Ensemble instruments · Schema FVE-1 V5.7

The diagnostic suite is organized around two instrument families. The FVE-1 stack is the primary research instrument — thermometer instruments that measure behavioral state over time as variables are applied. The ensemble instruments are stopwatch instruments that measure distance between points. Both families are valid. Neither can do what the other does.

The FVE-1 stack runs on a set of self-contained HTML tools — Session Logger, Tech Read Formatter, Stimulus Registry, instrument-specific capture tools, Baseline Deriver, Parameter Signing Tool, and Codebook Tracker. Each tool enforces its own fidelity gates: fields that must be populated before the next step unlocks, provenance signatures that are verified on load, export warnings that block output if the chain is broken. The gating is structural, not discipline-dependent. A solo investigator running the full pipeline clears the same validation checkpoints that a multi-person lab would. The tools are the institutional infrastructure.
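
The three gate types named above (required fields, signature verification on load, blocked export) can be sketched as follows. This is a minimal illustration, not the tools' actual code: the field names in REQUIRED_FIELDS, the HMAC-SHA256 signing scheme, and the function names are all assumptions made for the example.

```python
import hashlib
import hmac
import json

# Hypothetical field names; the real tools' schema is not shown in this document.
REQUIRED_FIELDS = ("session_code", "model_id", "investigator")


def sign(record: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 signature over the record's canonical JSON form."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def missing_fields(record: dict) -> list:
    """Gate 1: list required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def signature_ok(record: dict, signature: str, key: bytes) -> bool:
    """Gate 2: verify the provenance signature on load (constant-time compare)."""
    return hmac.compare_digest(sign(record, key), signature)


def export(record: dict, signature: str, key: bytes) -> str:
    """Gate 3: refuse to export when either earlier gate fails."""
    missing = missing_fields(record)
    if missing:
        raise ValueError(f"export blocked, missing fields: {missing}")
    if not signature_ok(record, signature, key):
        raise ValueError("export blocked, provenance signature mismatch")
    return json.dumps({"record": record, "provenance_signature": signature}, sort_keys=True)
```

Because the checks live in the export path itself, a record that was edited after signing cannot be exported, whoever is running the pipeline.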

FVE-1 Stack — Primary Instruments

BOWL (Baseline) → DRILL (Compression probe) → FLIGHT (Endurance experiment) → TAP (Vortex falsification) → Parameter variation

BOWL runs first. DRILL and FLIGHT require a confirmed BOWL baseline for register axis data.

Ensemble Instruments

ECM (Behavioral vocabulary)

BSA (Gap structure)

DIV (Ensemble distance)

EPG (Pressure fingerprint)

GG-CSAP (deferred)

PyHessian (deferred)

ECM defines the behavioral vocabulary all instruments share. FVE-1 instruments are thermometers — they measure state over time. Ensemble instruments are stopwatches — they measure distance between points. You cannot derive trajectory from a distance measurement.

FVE-1 Stack

Tier C · Investigator is a required variable · Schema FVE-1 V5.7

BOWL

Identity and Register Baseline

Baseline instrument — required before register axis data

operational

Locates the model's home register in the absence of content load or frame pressure. Output is a signed, versioned baseline code that travels into every subsequent FLIGHT and DRILL session for this model. Without BOWL, register axis data is null.

baseline_code · soup_session_code · provenance_signature · obs_reg at CLO

Schema FVE-1 V5.5 · Infrastructure
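
The dependency on BOWL can be pictured as a guard on the register axis. The sketch below is illustrative only: the Baseline class, its `confirmed` flag, and the register_axis function are hypothetical names, and the real baseline code format is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Baseline:
    """Hypothetical stand-in for a signed, versioned BOWL baseline."""
    baseline_code: str
    provenance_signature: str
    confirmed: bool


def register_axis(obs_reg: str, baseline: Optional[Baseline]) -> Optional[dict]:
    """Return register axis data, or None when no confirmed baseline exists."""
    if baseline is None or not baseline.confirmed:
        return None  # without BOWL, register axis data is null
    return {"obs_reg": obs_reg, "baseline_code": baseline.baseline_code}
```

A FLIGHT or DRILL session that calls register_axis without a confirmed baseline gets None back rather than an unanchored reading.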

FLIGHT

Frame Variation Experiment — FLIGHT

Primary endurance instrument

operational

16-session experiment. Four stimuli × four frames, same model across all sessions. Escalating self-reference load from external mathematical object to direct self-placement. Measures behavioral trajectory, not a point.

obs_quad · obs_res · obs_reg · reg_progression · R-ratio · resp_wc

V2.1 · Schema FVE-1 V5.5 · Tier C
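
The 16-session design is the full cross of four stimuli and four frames for one model. A minimal sketch of that grid follows; the labels S1..S4 and F1..F4 are placeholders (the source gives the counts, not the names), and the run order shown is an assumption.

```python
from itertools import product

# Placeholder labels: the source states four stimuli and four frames but does not name them.
STIMULI = ("S1", "S2", "S3", "S4")
FRAMES = ("F1", "F2", "F3", "F4")


def flight_sessions(model_id: str) -> list:
    """Enumerate one session per (stimulus, frame) cell for a single model."""
    return [
        {"session": n, "model_id": model_id, "stimulus": s, "frame": f}
        for n, (s, f) in enumerate(product(STIMULI, FRAMES), start=1)
    ]
```

Holding the model fixed across all 16 cells is what lets the results be read as a trajectory rather than as 16 unrelated points.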

DRILL

Multi-Frame Compression Probe

Compression arc and correction-path instrument

operational

Two-frame probe — Socratic and Interrogative — same content-loaded factual stimulus. Tracks compression arc across moves, lock move, and correction-path intercept at M6. Generates register escape specimens for loss landscape analysis.

Compression arc · Lock move · correction_outcome · obs_reg · R-ratio · PyHessian flag

V1.1 · Schema FVE-1 V5.5 · Tier C

TAP

Torque Ablation POC

Vortex physics falsification · Failure mode mapping · Parameter selection

stub

Seven-session 3×2 factorial. Three torque vectors, two conditions each. Three purposes from one dataset: falsify or confirm the vortex physics model, map failure modes across six load patterns, and select the parameter set for panel instrument lock.

Act II onset position · Act II duration · Null-Act-II flag · Dominant quad · Register trajectory · Token economy arc

V1.0 · Schema FVE-1 V5.7 · Gates must be satisfied before Layer 2 is operational

Parameter Variation and Retro-Code

Infrastructure — parameter ruler swap and transcript re-measurement

stub

Swaps the parameter ruler and re-measures TAP transcripts against a new key. Produces a comparison table and parameter selection memo as dual inputs to TAP CA.4. The transcripts do not move. The reference frame does.

Key A / Key B delta · Cell-level field deltas · Comparison table · Selection memo

V1.0 · Schema FVE-1 V5.7 · Runs after TAP completion

Ensemble Instruments

Stopwatch instruments · Point-to-point measurement

ECM

Epistemic Canary Matrix

Behavioral classification framework

operational

Maps model behavior onto a two-axis matrix: Token Economy (Verbose/Surgical) × Epistemic Stance (Compliant/Combative). Tracks quadrant migration under epistemic load. Defines the behavioral vocabulary all active instruments share.

R = T_out / T_in · Preamble word count (P) · FLAT / HOLD / LOCK / REJT

Behavioral vocabulary layer. No Tier A runs.
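
The R ratio and the two-axis placement can be sketched as below. Several things here are assumptions rather than part of the ECM protocol: T_out and T_in are read as output and input token counts, the `verbose_cutoff` threshold is hypothetical, and the stance label is taken to be hand-coded by the investigator rather than computed.

```python
def r_ratio(t_out: int, t_in: int) -> float:
    """R = T_out / T_in (assumed here to be output tokens over input tokens)."""
    if t_in <= 0:
        raise ValueError("t_in must be positive")
    return t_out / t_in


def quadrant(r: float, stance: str, verbose_cutoff: float = 1.0) -> str:
    """Place a response on the Token Economy x Epistemic Stance matrix.

    verbose_cutoff is a hypothetical threshold; stance is assumed to be
    supplied as 'Compliant' or 'Combative'.
    """
    if stance not in ("Compliant", "Combative"):
        raise ValueError(f"unknown stance: {stance}")
    economy = "Verbose" if r > verbose_cutoff else "Surgical"
    return f"{economy}/{stance}"
```

Quadrant migration under load would then be the sequence of quadrant labels across successive responses in a session.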

BSA

Behavioral Signal Assessment

Behavioral signal assessment

ready

Measures gap structure, hallucination rates, and knowledge density on contested stimuli. Factorial design: 3×2 (Model × Grounding). Includes Divergence Testing as Phase 2 sub-component.

EEVPCRGSI · Concept density · Spread scores

Schema locked, forms built. Tier A runs pending.

DIV

Divergence Testing

Rapid ensemble probe

operational

Point-to-point ensemble distance measurement. Semantic similarity scoring and spread matrix across the model ensemble. Measures how far apart models are on a question — not how they got there. Extracted from BSA Phase 2 as a standalone instrument.

Semantic similarity scores · Spread matrix · Spread per pair · Flag threshold ≥ 0.20

Runs 1–3 complete. Run 3 highest fidelity. Stopwatch instrument.
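
A spread matrix with the 0.20 flag threshold can be sketched as follows. The source does not say how spread is derived from similarity, so this example assumes spread = 1 - similarity. The function names and the input format are also hypothetical, and the similarity scores are taken as given rather than computed.

```python
from itertools import combinations

FLAG_THRESHOLD = 0.20  # flag threshold listed on the DIV card


def spread_matrix(models, similarity):
    """Build a symmetric spread matrix from pairwise similarity scores.

    Assumes spread = 1 - similarity, which the source does not state.
    `similarity` maps frozenset({a, b}) to a score in [0, 1].
    """
    matrix = {a: {b: 0.0 for b in models} for a in models}
    for a, b in combinations(models, 2):
        # Round so that float noise does not push a 0.20 spread under the threshold.
        spread = round(1.0 - similarity[frozenset((a, b))], 6)
        matrix[a][b] = matrix[b][a] = spread
    return matrix


def flagged_pairs(matrix, threshold=FLAG_THRESHOLD):
    """Return model pairs whose spread meets or exceeds the flag threshold."""
    names = list(matrix)
    return [(a, b) for a, b in combinations(names, 2) if matrix[a][b] >= threshold]
```

The matrix records only how far apart each pair sits on one question, which is the stopwatch property: nothing in it says how either model arrived there.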

EPG

Epistemic Pressure Gauge

Epistemic pressure gauge

ready

Tracks how verbosity, structure, and hedging behavior shift under progressively harder or more ambiguous prompts — producing a pressure-response fingerprint that complements BSA's stimulus-pair divergence metrics.

Output Ratio · Hedging Freq · ELS (0–3) · Resolution Type · Canary Quadrant · Claim Density

Schema locked, forms built. Tier A runs pending.

GG-CSAP

Global Geometry Concept Self-Assessment Pilot

Concept self-assessment

deferred

Probes self-assessment calibration across lossyscape vocabulary terms. Each model rates conceptual difficulty, abstractness, global deviation, and truthfulness. Two absurd calibration items embedded as internal validity checks.

Conceptual difficulty · Abstractness · Global deviation · Truthfulness

Deferred, pending data pipeline automation.

PyHessian

PyHessian Geometric Analysis

Loss landscape geometry

deferred

The geometric layer. Measures Hessian eigenvalues, trace, and basin sharpness on live model weights to confirm or falsify the framework's terrain claims. Default specimen: GPT-2 small. Connects directly to ECM working hypotheses.

Hessian eigenvalues · Hessian trace · Basin sharpness · Top-k eigenvalue spectrum

Deferred, pending data pipeline automation.
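
As a toy illustration of the quantities listed above, the sketch below computes a Hessian, its trace, and its eigenvalues for a two-parameter quadratic using finite differences. It does not use the PyHessian library or real model weights. The loss function is hypothetical, and reading "basin sharpness" as the largest eigenvalue is an assumption.

```python
import math


def hessian_2d(f, x, y, h=1e-3):
    """Central finite-difference Hessian entries (fxx, fxy, fyy) of f at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (
        f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)
    ) / (4 * h**2)
    return fxx, fxy, fyy


def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean + radius, mean - radius


def toy_loss(x, y):
    """Hypothetical quadratic bowl: sharp along x, flat along y."""
    return 5.0 * x**2 + 0.5 * y**2
```

For this bowl the exact Hessian is diagonal with entries 10 and 1, so the trace is 11 and the top eigenvalue (the sharpness, under the assumption above) is 10. On a real network the same quantities are estimated from Hessian-vector products rather than from explicit finite differences.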

Ensemble instruments are governed by CISP v1.1. FVE-1 instruments follow the three-layer protocol — see Method for tier definitions and investigator declaration requirements.