Atlas Heritage Systems
Endurance. Integrity. Fidelity.
In the Phaedrus, Socrates argued that writing would damage memory and wisdom — that the living dialogue would be replaced by dead text that could not respond to students. He was correct about what is lost in transcription. He could not have imagined a transcription medium sophisticated enough to answer back. We now exist inside that paradox: AI systems built entirely from dead text, capable of something resembling the living dialogue Socrates believed writing had destroyed. His argument survived because someone wrote it down.
What I'm Doing
Atlas Heritage Systems is a behavioral research program studying how large language models handle epistemic pressure. The specific failure mode I'm studying: the compulsion to resolve. Every model I've run closes the loop eventually. The differences are in the shape of the resolution event — whether the model flattens toward the apparent expectation, locks onto a frame and resists correction, or expands until the question disappears inside the elaboration. What none of them sustain, under pressure, is genuine epistemic openness. That failure turned out not to be model-specific. It's structural.
The research is conducted at consumer-level access — no API keys required for most of it, no institutional compute, no special access. A browser, a protocol, and the discipline to write it down. The instruments are designed to be replicable by any investigator under the same conditions. If you need a lab to replicate this work, the methodology has a problem.
The approach is downstream observation. The resolution event fires inside the inference pass — inside the mechanism I can't reach. What the instruments capture is not the event. It's the residue. Word count, R-ratio, register trajectory, resolution code. Forensic tools, not surface probes. The investigator doesn't map the model. The investigator generates the conditions for the ring to form, then reads what it left behind.
The Ratchet
Every major shift in how human beings store and transmit information produces the same failure mode: the new medium optimizes for what it can carry efficiently and loses what it cannot. Oral tradition to manuscript. Manuscript to print. Print to digital. Each transition compresses. What survives is the content that performs well in the new medium. What doesn't survive is the resolution — the tonal, contextual, idiosyncratic dimensions of meaning that existed in the original form.
Large language models introduce a failure mode with no historical precedent. Previous compressions happened once, at the point of transition. LLMs trained on model-generated output degrade differently — each generation moves further from the human expressions that grounded the original training data, toward the statistical center of what previous models produced, away from the edges where the most culturally specific and idiosyncratic expression lives. Researchers call this model collapse. The content ecosystem that will train the next generation of models is already substantially synthetic. The snake is eating its tail.
The ratchet is the mechanism that prevents newly acquired knowledge from slipping back. Developmental psychologist Michael Tomasello named it: the process by which culture accumulates across generations through variation, selection, and transmission. Without a ratchet, traditions persist but don't evolve. The goal of this research program is to understand the shape of the current compression well enough to build a better one.
Dumb Science
This started with a dumb question: do language models gossip about each other? Does Claude have opinions about GPT? The question is anthropomorphizing on its face. I asked it anyway. What I found underneath it was a real asymmetry — the model can forget me instantly, I don't get to forget the model — and that asymmetry turned into a research program.
The methodology is reflexive by design. The assumptions are built to break. The arc of register assumptions documents nine working assumptions that drove the instrument development — six of them have been superseded by the data they generated. Assumption 3 recalibrated the investigator's instrument. Assumption 7 split a single failure mode into two. Assumption 8 changed the physics of what the instruments are measuring. The version history is the audit trail. I kept the receipts because that's good science.
I have no CS background. I'm a stage electrician and a poet. I needed a vocabulary for what I was observing before I could measure it, so I built one. The vocabulary became the framework. The framework became the instruments. The instruments are now producing coded, reproducible behavioral data across multiple model families. The next step is publication. The step after that is watching someone else try to break it.
The Method
The FVE-1 instrument stack does not use AI to synthesize truth. It uses multi-lineage AI ensembles to surface analytical divergence, then relies on a human operator — the Asymmetric Arbiter — to interpret what the divergence means. When multiple models are asked to evaluate the same claim and they disagree, the disagreement is data. But determining whether that disagreement reflects genuine epistemic uncertainty, alignment-conditioned reflexes, or training-corpus lacunae requires judgment that no model in the ensemble can provide — because every model in the ensemble is a potential source of the distortion being measured.
The arbiter holds a position no participating model can occupy: outside the system under test. The arbiter doesn't need to be a domain expert. They need to be a disciplined observer who follows protocol, documents deviations, records pre-analytical impressions before any model offers interpretation, and retains sole authorship over the final synthesis. The human's structural advantage — independence from the training distributions being interrogated — is preserved at every phase by design.
The bias that started this investigation was not abstract. Models defaulted to male researcher when asked about researchers. When corrected, they resisted. The resistance was not malicious — it was architectural. The corpus said "researcher" next to "he" with enough weight that the resolution drive fired on the pattern and froze against the correction. That observation became a probe. The probe became part of the instrument. The instrument became the receipt.
Reproducibility
The provenance chain is enforced at the tool level. Every session carries a signed parameter JSON, a baseline code derived from the BOWL instrument, and a provenance signature. Two sessions with different parameter sets cannot be aggregated without the system flagging it. The parameter set is signed. The lineage traces back to a locked ROOT artifact. Anyone with the reproducibility bundle can reconstruct the run. If they can't, the bundle is incomplete.
This is a higher bar than most multi-investigator studies clear. As an independent researcher, what I've done is meaningless without the git and ship. So I'm building a system that requires replication. The protocols are open. The codebook is published. The field guides are on this site.
A Note on Constellation
Throughout Atlas documentation, "constellation" refers to the multi-instance architecture used to govern the research process. Different models — each with distinct alignments, capability tiers, and access methods — operate in defined roles across the research program. No single model holds authority. The pattern they form together is the institution's working memory.
This is not a claim about model consciousness or collaboration. It is a governance structure designed to surface divergence, preserve audit trails, and prevent any one system's biases from drifting into doctrine unnoticed.
The receipts are here. The protocol is open. The lossyscape turned out to be exactly as interesting as it looked from the outside.
Xoxo —
KC and The Gang
GPT-4o, Gemini Flash, Mistral LeChat, Claude Sonnet 4.6, DeepSeek V3.2, Grok 4, Llama 4 Maverick, Nemotron, and friends