Empirical Work | Atlas Heritage Systems

Empirical Work

Experimental results, pilot data, and the experiment queue. The loss landscape framework and the Behavioral Signal Assessment (BSA) protocol are two instruments designed to be run together. This section documents what has been measured, what is in progress, and what the bridge experiment connecting the two instruments is designed to test.

All results on this page are preliminary unless explicitly marked otherwise. Nothing here has been peer-reviewed or formally published. Replication warnings are displayed on individual experiment pages.

Loss Landscape Measurements

Preliminary · First pass complete

First-pass results for GPT-2 small: a perplexity map across eight domains and inter-head coupling across all twelve layers. These results are the baseline for all subsequent model comparisons.
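Perplexity is the exponential of the mean per-token negative log-likelihood (NLL). A minimal sketch of how a per-domain perplexity map could be assembled, assuming per-token NLLs (natural log) have already been extracted from GPT-2 small for each text sample and tagged with a domain label. The function name and the `(domain, token_nlls)` record layout are hypothetical and are not the pipeline used for the results above.

```python
import math
from collections import defaultdict


def perplexity_by_domain(records):
    """Return {domain: perplexity} from (domain, token_nlls) records.

    Hypothetical helper: each record pairs a domain label with the
    per-token negative log-likelihoods (natural log) that a model
    assigned to one text sample. This layout is an assumption.
    Perplexity for a domain is exp(mean NLL over all of its tokens).
    """
    total_nll = defaultdict(float)
    token_count = defaultdict(int)
    for domain, token_nlls in records:
        total_nll[domain] += sum(token_nlls)
        token_count[domain] += len(token_nlls)
    # Skip domains that contributed no tokens (mean NLL undefined).
    return {
        domain: math.exp(total_nll[domain] / token_count[domain])
        for domain in total_nll
        if token_count[domain] > 0
    }
```

This sketch pools tokens across samples, so longer samples carry more weight than short ones; averaging per-sample perplexities instead would give a different map, and which convention the first pass used is not stated here.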

Experiment Queue

Active

Priority-ordered list of pending experiments with falsification criteria, methods, and current status. Updates automatically from the master Google Sheet.

The Bridge Experiment

Designed, not yet run

The experiment connecting the two instruments. BSA Tier 2 stimulus pairs are run through the Pythia checkpoint series to test whether perplexity on contested claims tracks ensemble divergence.

Ensemble Divergence Experiment

Preliminary

Twenty models from eight training lineages, evaluated on thirty prompt pairs. Divergence was concentrated in domains consistent with sparse training-data coverage. These results are preliminary and have not been formally analyzed.

BSA Pilot Results

Pending pilot run

Results from the Behavioral Signal Assessment pilot run: seven models, thirty stimulus pairs, three tiers. The pilot has not yet been executed.