The Bridge Experiment
The bridge experiment connects the two instruments. The Behavioral Signal Assessment detects divergence from the outside. The loss landscape framework explains it from the inside. This experiment tests whether the two instruments are measuring the same underlying phenomenon.
The Core Question
When the BSA ensemble shows high divergence on a Tier 2 stimulus pair (models drawing on different underlying representations rather than converging on a shared statistical center), does that divergence correspond to high perplexity in the loss landscape? If yes, the framework provides a mechanistic explanation for the BSA divergence signal. If no, the result is a finding about the limits of one or both instruments.
Hypothesis: BSA Tier 2 stimulus pairs that produce high ensemble divergence will correspond to high-perplexity, high-viscosity regions in the Pythia loss landscape — specifically in the domains identified as sparse in the GPT-2 perplexity map (non-Western cultural contexts, pre-digital literary registers, non-English source material).
Experimental Design
Complete the Behavioral Signal Assessment pilot run (seven models, thirty stimulus pairs, three tiers). Record ensemble divergence scores for each Tier 2 pair.
Identify the five Tier 2 pairs with the highest ensemble divergence, where models drew on the most different underlying representations.
For each high-divergence pair, run the text through Pythia-160M at four training checkpoints (steps 1000, 16000, 66000, and 143000). Measure perplexity at each checkpoint.
Compute the Hessian eigenvalue spectrum for the domains represented in the high-divergence pairs. Compare eigenvalue density against the GPT-2 coupling measurements.
Compare the two instruments: do pairs with high BSA divergence correspond to high perplexity in the Pythia checkpoint series? Does perplexity on these domains stabilize over training (archaeological signal) or shift randomly (OOD noise)?
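The checkpoint measurements in this design can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: it assumes the Hugging Face `transformers` and `torch` packages and the public `EleutherAI/pythia-160m` checkpoints, which are published as revisions named `step1000`, `step16000`, and so on. The `is_stabilizing` helper and its 10% tolerance are hypothetical, one simple way to separate a perplexity curve that settles from one that keeps jumping.

```python
import math

# Checkpoints named in the experimental design.
CHECKPOINT_STEPS = [1000, 16000, 66000, 143000]


def perplexity_from_loss(mean_nll):
    """Perplexity is exp of the mean per-token negative log-likelihood (in nats)."""
    return math.exp(mean_nll)


def checkpoint_perplexity(text, step, model_name="EleutherAI/pythia-160m"):
    """Perplexity of `text` under one Pythia training checkpoint.

    Imports are local so the pure helpers here work without torch or
    transformers installed. The checkpoint is downloaded on first use.
    Assumes `text` fits in the model's context window.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    revision = f"step{step}"
    tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(model_name, revision=revision)
    model.eval()

    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return mean next-token cross-entropy.
        loss = model(input_ids, labels=input_ids).loss
    return perplexity_from_loss(loss.item())


def perplexity_series(text, steps=CHECKPOINT_STEPS):
    """Map each checkpoint step to the perplexity of `text` at that step."""
    return {step: checkpoint_perplexity(text, step) for step in steps}


def is_stabilizing(values, tolerance=0.10):
    """Hypothetical heuristic for 'stabilizes over training'.

    `values` are perplexities ordered by training step. The series counts as
    stabilizing when the relative change between consecutive checkpoints never
    grows and the final change is within `tolerance`.
    """
    if len(values) < 3:
        raise ValueError("need at least three checkpoints")
    rel_changes = [abs(b - a) / a for a, b in zip(values, values[1:])]
    shrinking = all(
        later <= earlier for earlier, later in zip(rel_changes, rel_changes[1:])
    )
    return shrinking and rel_changes[-1] <= tolerance
```

Calling `perplexity_series(text)` on each stimulus text and passing the ordered values to `is_stabilizing` gives a first-pass label per pair. The tolerance is a free parameter and would need to be justified against the data rather than taken from this sketch.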
What Each Result Would Mean
If BSA divergence correlates with Pythia perplexity in the same domains, the loss landscape framework provides a mechanistic explanation for BSA ensemble divergence. The two instruments would then be measuring the same underlying phenomenon from different angles, one behavioral and one geometric.
If BSA divergence does not correlate with loss landscape perplexity, either the instruments are measuring different things or one of them is not measuring what it claims to measure. A negative result is as useful as a positive one: it shows where the research program needs to be revised.
If the correlation holds in some domains but not others, the result would identify which specific types of contested claims the loss landscape framework can and cannot explain. This is the most likely outcome and the most informative.
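Because the mixed outcome depends on checking the relationship domain by domain, one way to make it concrete is a per-domain rank correlation. The sketch below is an assumption about how the comparison might be run, not a method stated in the source. Spearman's rank correlation is chosen here for illustration because it only assumes a monotonic relationship, and the record fields `domain`, `divergence`, and `perplexity` are hypothetical names. With only a handful of high-divergence pairs, any per-domain coefficient rests on very few points and is best read as descriptive.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def correlation_by_domain(records, min_pairs=3):
    """Group records by domain and correlate divergence with perplexity.

    Each record is a dict with hypothetical keys 'domain', 'divergence',
    and 'perplexity'. Domains with fewer than `min_pairs` records map to None.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["domain"]].append(
            (record["divergence"], record["perplexity"])
        )
    result = {}
    for domain, pairs in grouped.items():
        if len(pairs) < min_pairs:
            result[domain] = None
            continue
        divergence, perplexity = zip(*pairs)
        result[domain] = spearman(list(divergence), list(perplexity))
    return result
```

A domain with fewer than `min_pairs` observations maps to `None` rather than a coefficient, which keeps sparsely sampled domains from producing a misleading perfect correlation.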
Prerequisites
BSA pilot run completed
PyHessian on GPT-2 small completed (Priority 1)
Pythia checkpoint series run (Priority 4)
OPT-125M perplexity comparison (Priority 2)