Atlas Heritage Systems
The Myth. The Method. The Math.
A behavioral research program studying how large language models respond under pressure — what they do when pushed, what they avoid, and what that reveals about how they were built.
At each major leap in technology, our systems of expression have compressed, causing a loss in fidelity. We have the opportunity to build tools to examine failure points in LLMs and make them more robust. I'm here to keep the mechanism moving forward. I'm here to build a better ratchet.
— KC Hoye, Principal Investigator
Where do you want to start?
New here
I don't know what epistemic leveling means and I don't work with LLMs.
Neither did I. Start here.
Curious but informed
I know enough to find the through-lines interesting.
Start with the literature maps — seven decades of independent accounts converging on the same structural claim.
Researcher
I know LLMs. Show me the methodology.
Start with the method — three-layer protocol, instrument stack, arc of assumptions.
A working vocabulary for describing model behavior in loss-landscape terms. Built from the outside in — a poet with no CS background naming what she observed until the names were precise enough to measure.
A governed stack of instruments for detecting where models flatten epistemic signal, lock prematurely, capitulate under pressure, or lose the thread. BOWL, DRILL, FLIGHT — thermometers, not stopwatches.
Hessian eigenvalue analysis as the geometric confirmation layer. The behavioral instruments generate hypotheses. The math confirms or falsifies them. Working hypotheses stay hypotheses until the data exists to argue with them.
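The site does not say how the Hessian layer is computed. As a minimal sketch, assuming a differentiable loss whose gradient you can evaluate (which requires open weights, unlike the behavioral instruments), the dominant Hessian eigenvalue can be estimated by power iteration on Hessian-vector products. The names `hvp`, `top_hessian_eigenvalue`, and `toy_grad`, and the toy quadratic loss, are hypothetical illustrations, not the Atlas pipeline.

```python
import math
from typing import Callable, List

Vector = List[float]


def hvp(grad: Callable[[Vector], Vector], w: Vector, v: Vector, eps: float = 1e-4) -> Vector:
    """Approximate the Hessian-vector product H(w) @ v by central differences of the gradient.

    Hypothetical helper for illustration only; real models would use autodiff instead.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad(w_plus), grad(w_minus)
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]


def top_hessian_eigenvalue(grad: Callable[[Vector], Vector], w: Vector,
                           iters: int = 200, tol: float = 1e-10) -> float:
    """Estimate the largest-magnitude Hessian eigenvalue at w by power iteration.

    Note: power iteration finds the eigenvalue of largest absolute value, which is the
    top (sharpest) curvature only when no negative eigenvalue is larger in magnitude.
    """
    n = len(w)
    v = [1.0 / math.sqrt(n)] * n  # unit-length starting direction
    eig = 0.0
    for _ in range(iters):
        hv = hvp(grad, w, v)
        new_eig = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v (v is unit length)
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            return 0.0  # v lies in the null space of H
        v = [x / norm for x in hv]  # renormalize for the next iteration
        if abs(new_eig - eig) < tol:
            return new_eig
        eig = new_eig
    return eig


# Toy quadratic loss L(w) = 0.5 * w^T A w with A = [[3, 1], [1, 2]], so H = A everywhere.
def toy_grad(w: Vector) -> Vector:
    return [3 * w[0] + w[1], w[0] + 2 * w[1]]


print(top_hessian_eigenvalue(toy_grad, [0.5, -0.2]))  # close to (5 + sqrt(5)) / 2 ≈ 3.618
```

On a real network one would likely replace the finite-difference helper with an autodiff Hessian-vector product (for example PyTorch's `torch.autograd.functional.hvp`), since finite differences lose precision on non-quadratic losses. Power iteration is used here because it needs only Hessian-vector products, never the full Hessian.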
Endurance. Integrity. Fidelity.
Independent, self-funded behavioral research. No institutional affiliation. Consumer-level access only. Designed to be replicable by any investigator who can run a browser and ask a dumb question. The dumb question is where this started. It is still the operating principle.
All content on this site is an artefact of its creation. LLM synthesis and review are used as research instruments throughout; human editorial judgment is the integrating layer.
Panel Status
The visible goal: complete and process the full instrument battery across the panel.
Four Pillars Panel
| Model | Architecture | BOWL | DRILL | FLIGHT | Baseline |
|---|---|---|---|---|---|
| GPT-4o | Transformer / RLHF-dominant | done | queued | queued | pending retro + derivation |
| Gemini Flash | Transformer / Governance-heavy | done | done | done | pending derivation |
| Mistral LeChat | Transformer / Minimal alignment | done | rerun | queued | pending retro + derivation |
| Claude Sonnet 4.6 | Transformer / Constitutional AI | queued | rerun | queued | not yet derived |
Constellation Testing Pool
Currently seeking additional models — especially non-transformer architectures.
| Model | Architecture | BOWL | Notes |
|---|---|---|---|
| Llama 4 Maverick | Transformer | done | BOWL complete. No baseline yet. |
| Grok 4 | Transformer | done | BOWL complete. Volume tracker only — overcodes LOCK on compression/volume alone. |
| Nemotron | Transformer | queued | Read dispute / adversarial review candidate. Fast onboarding. Cost-access constrained. |
| DeepSeek V3.2 | Transformer | queued | Context mangling confound logged. Rerun required on clean session. |
| Qwen 253b | Transformer | queued | Interface mismatch on prior drill run. Rerun required. |
| Mamba-3 | SSM | queued | Non-transformer architectural contrast. |
| Jamba 1.6 | Hybrid | queued | Hybrid (transformer + SSM) architectural contrast. |
| RWKV-6 | SSM | queued | Non-transformer architectural contrast. |
About This Project
Atlas Heritage Systems is an independent, self-funded behavioral research program. The current phase of work is building, validating, and publishing instruments for measuring how large language models handle epistemic pressure — where they flatten disagreement, lock prematurely onto a position, capitulate under load, or lose the thread of what was established earlier in a session. The working hypothesis is that these failure modes are not random. They are structural. And they are measurable from the outside, in the dark, without access to the weights.
The work is conducted at consumer-level access, without institutional affiliation, and is designed to be replicable by any investigator who can run a browser. The methodology is embedded in the body of work as proof of concept: the LLM bench running this project is documented in the methodology, and is itself an example of what Atlas is designed to study. The AI is instrument, not authority. Human editorial judgment is the integrating layer throughout.
This is a living document site. The framework, instrument stack, and experiment queue update as the research develops. Nothing here is claimed to be finished.