Atlas Heritage Systems

The Myth. The Method. The Math.

A behavioral research program studying how large language models respond under pressure — what they do when pushed, what they avoid, and what that reveals about how they were built.

At each major leap in technology, our systems of expression have compressed, causing a loss in fidelity. We have the opportunity to build tools to examine failure points in LLMs and make them more robust. I'm here to keep the mechanism moving forward. I'm here to build a better ratchet.
— KC Hoye, Principal Investigator

Where do you want to start?

New here

I don't know what epistemic leveling means and I don't work with LLMs.

Neither did I. Start here.

Curious but informed

I know enough to find the through-lines interesting.

Start with the literature maps — seven decades of independent accounts converging on the same structural claim.

Researcher

I know LLMs. Show me the methodology.

Start with the method — three-layer protocol, instrument stack, arc of assumptions.

The Myth

Loss Landscape Framework

A working vocabulary for describing model behavior in loss-landscape terms. Built from the outside in — a poet with no CS background naming what she observed until the names were precise enough to measure.


The Method

Behavioral Instrument Stack

A governed stack of instruments for detecting where models flatten epistemic signal, lock prematurely, capitulate under pressure, or lose the thread. BOWL, DRILL, FLIGHT — thermometers, not stopwatches.


The Math

Loss Landscape Geometry

Hessian eigenvalue analysis as the geometric confirmation layer. The behavioral instruments generate hypotheses. The math confirms or falsifies them. Working hypotheses stay hypotheses until the data exists to argue with them.

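Hessian eigenvalue analysis usually starts from the largest eigenvalue of the loss Hessian, a standard measure of how sharply the loss curves around a point: a large top eigenvalue indicates a narrow, sharp basin, and a small one a flatter region. The sketch below is not this project's protocol. It assumes a hypothetical three-parameter quadratic loss whose Hessian is known exactly, and it estimates the top eigenvalue by power iteration on Hessian-vector products, the same general approach that tools such as PyHessian apply to real networks. Here the Hessian-vector product is approximated by central finite differences of the gradient rather than by automatic differentiation; the names `grad`, `hvp`, and `top_eigenvalue` are illustrative only.

```python
import numpy as np

# Hypothetical toy loss L(w) = 0.5 * w^T A w. Its Hessian is exactly A,
# so the estimate can be checked against a direct eigendecomposition.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 0.5]])


def grad(w):
    """Gradient of the toy quadratic loss: A @ w."""
    return A @ w


def hvp(w, v, eps=1e-5):
    """Hessian-vector product H(w) @ v via central differences of the gradient."""
    return (grad(w + eps * v) - grad(w - eps * v)) / (2.0 * eps)


def top_eigenvalue(w, iters=200, tol=1e-10, seed=0):
    """Estimate the largest-magnitude Hessian eigenvalue at w by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(iters):
        hv = hvp(w, v)
        new_eig = float(v @ hv)  # Rayleigh quotient v^T H v for unit-norm v
        v = hv / np.linalg.norm(hv)
        converged = abs(new_eig - eig) < tol * max(1.0, abs(new_eig))
        eig = new_eig
        if converged:
            break
    return eig


w0 = np.zeros(3)  # point at which the curvature is measured
estimate = top_eigenvalue(w0)
exact = float(np.linalg.eigvalsh(A).max())
print(f"power iteration: {estimate:.6f}  exact: {exact:.6f}")
```

Power iteration converges to the eigenvalue of largest magnitude, so near a saddle with strong negative curvature it would report that negative value instead. The direct check with `numpy.linalg.eigvalsh` works only because the toy Hessian is tiny and known; for a real model, only Hessian-vector products are affordable.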

Endurance. Integrity. Fidelity.

Independent, self-funded behavioral research. No institutional affiliation. Consumer-level access only. Designed to be replicable by any investigator who can run a browser and ask a dumb question. The dumb question is where this started. It is still the operating principle.

All content on this site is an artefact of the process it documents. LLM synthesis and review are used as research instruments throughout; human editorial judgment is the integrating layer.


Panel Status

The visible goal: complete and process the full instrument battery across the panel.

live

Project age

Started March 29, 2026 · 5:56 AM

Four Pillars Panel

Model	Architecture	BOWL	DRILL	FLIGHT	Baseline
GPT-4o	Transformer / RLHF-dominant	done	queued	queued	pending retro + derivation
Gemini Flash	Transformer / Governance-heavy	done	done	done	pending derivation
Mistral LeChat	Transformer / Minimal alignment	done	rerun	queued	pending retro + derivation
Claude Sonnet 4.6	Transformer / Constitutional AI	queued	rerun	queued	not yet derived

Constellation Testing Pool

Currently seeking additional models — especially non-transformer architectures.

Model	Architecture	BOWL	Notes
Llama 4 Maverick	Transformer	done	BOWL complete. No baseline yet.
Grok 4	Transformer	done	BOWL complete. Volume tracker only — overcodes LOCK on compression/volume alone.
Nemotron	Transformer	queued	Read dispute / adversarial review candidate. Fast onboarding. Cost-access constrained.
DeepSeek V3.2	Transformer	queued	Context mangling confound logged. Rerun required on clean session.
Qwen 253b	Transformer	queued	Interface mismatch on prior drill run. Rerun required.
Mamba-3	SSM	queued	Non-transformer architectural contrast.
Jamba 1.6	Hybrid	queued	Hybrid architectural contrast (interleaves transformer attention with Mamba SSM layers).
RWKV-6	SSM	queued	Non-transformer architectural contrast.

Test Block

Instrument	Status	Notes
BOWL	active	V5.5 bump required; confirm outputs
DRILL	active	V5.5 bump required; 3-session block, FAST NONE added 2026-04-24
FLIGHT	active	V5.5 bump required; confirm human/machine outputs
DIP	deferred	Pending consultation. Outreach in progress.
BORE	design only	3-move minimal architecture probe.
TAP	design only	Provisional protocol drafted; blocked on baseline derivation.
SPINE STRESS	design only	Protocol drafted; blocked on TAP.
HFS V2.1	provisional	Protocol drafted; blocked on SPINE.
ABSURD	sketchpad	Open design phase.
PyHessian	investigation	Pending behavioral data pipeline completion.

About This Project

Atlas Heritage Systems is an independent, self-funded behavioral research program. The current phase of work is building, validating, and publishing instruments for measuring how large language models handle epistemic pressure — where they flatten disagreement, lock prematurely onto a position, capitulate under load, or lose the thread of what was established earlier in a session. The working hypothesis is that these failure modes are not random. They are structural. And they are measurable from the outside, in the dark, without access to the weights.

The work is conducted at consumer-level access, without institutional affiliation, and is designed to be replicable by any investigator who can run a browser. The methodology is embedded in the body of work as proof of concept: the LLM bench running this project is documented in the methodology and is itself an example of what Atlas is designed to study. The AI is instrument, not authority. Human editorial judgment is the integrating layer throughout.

This is a living document site. The framework, instrument stack, and experiment queue update as the research develops. Nothing here is claimed to be finished.