Empirical Work
Empirical Work
Experimental results, pilot data, and the experiment queue. The loss landscape framework and the Behavioral Signal Assessment protocol are two instruments designed to be run together — this section documents what has been measured, what is in progress, and what the bridge experiment connecting the two instruments is designed to test.
Loss Landscape Measurements
GPT-2 small first-pass results — perplexity map across eight domains and inter-head coupling across twelve layers. The baseline against which all subsequent model comparisons are made.
Experiment Queue
Priority-ordered list of pending experiments with falsification criteria, methods, and current status. Updates automatically from the master Google Sheet.
The Bridge Experiment
The experiment connecting the two instruments — running BSA Tier 2 stimulus pairs through the Pythia checkpoint series to test whether perplexity on contested claims tracks with ensemble divergence.
Ensemble Divergence Experiment
Twenty models, eight training lineages, thirty prompt pairs. Divergence concentrated in domains consistent with sparse training coverage. Preliminary and not formally analyzed.
BSA Pilot Results
Results from the Behavioral Signal Assessment pilot run — seven models, thirty stimulus pairs, three tiers. Pending execution.