The Bridge Experiment

Designed — not yet run. This page documents the experimental design and what a positive or negative result would mean for the framework.

The bridge experiment connects the two instruments. The Behavioral Signal Assessment (BSA) detects divergence from the outside. The loss landscape framework explains it from the inside. This experiment tests whether the two instruments are measuring the same underlying phenomenon.

The Core Question

When the BSA ensemble shows high divergence on a Tier 2 stimulus pair (models drawing on different underlying representations rather than converging on a shared statistical center), does that divergence correspond to high perplexity in the loss landscape? If yes, the framework provides a mechanistic explanation for the BSA divergence signal. If no, the result is a finding about the limits of one or both instruments.

Hypothesis: BSA Tier 2 stimulus pairs that produce high ensemble divergence will correspond to high-perplexity, high-viscosity regions in the Pythia loss landscape — specifically in the domains identified as sparse in the GPT-2 perplexity map (non-Western cultural contexts, pre-digital literary registers, non-English source material).

Experimental Design

Step 1 — Run BSA pilot

Complete the Behavioral Signal Assessment pilot run (seven models, thirty stimulus pairs, three tiers). Record ensemble divergence scores for each Tier 2 pair.

Step 2 — Select high-divergence pairs

Identify the five Tier 2 pairs with highest ensemble divergence — where models drew on the most different underlying representations.
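The selection in Step 2 can be sketched as a filter and a sort. This is a minimal illustration, not the BSA tooling itself: the `PairScore` record, its field names, and the example scores are hypothetical, and the divergence metric is assumed to be a single number per pair where larger means more divergent.

```python
from typing import List, NamedTuple


class PairScore(NamedTuple):
    """Hypothetical record for one stimulus pair from the BSA pilot."""
    pair_id: str
    tier: int
    divergence: float  # assumed: larger value = more ensemble divergence


def top_divergent_pairs(scores: List[PairScore], tier: int = 2, k: int = 5) -> List[PairScore]:
    """Return the k pairs of the given tier with the highest divergence."""
    in_tier = [s for s in scores if s.tier == tier]
    # sorted() is stable, so tied pairs keep their original order
    return sorted(in_tier, key=lambda s: s.divergence, reverse=True)[:k]


# Made-up scores, for illustration only
example_scores = [
    PairScore("p01", 1, 0.90),
    PairScore("p11", 2, 0.42),
    PairScore("p12", 2, 0.77),
    PairScore("p13", 2, 0.15),
    PairScore("p21", 3, 0.88),
]
selected = top_divergent_pairs(example_scores, tier=2, k=2)
```

Here `selected` holds the two Tier 2 pairs with the largest scores; pairs from other tiers are ignored even when their scores are higher.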

Step 3 — Run Pythia checkpoint series

For each high-divergence pair, run the text through Pythia-160M at multiple training checkpoints (steps 1000, 16000, 66000, and 143000). Measure perplexity at each checkpoint.
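A minimal sketch of the checkpoint loop, assuming the Hugging Face `transformers` and `torch` packages and the public `EleutherAI/pythia-160m` repository, where intermediate checkpoints are published as revisions named `step1000`, `step16000`, and so on. The function names are hypothetical. Perplexity here is the exponential of the mean per-token negative log-likelihood, which is what the model returns as `loss` when the input ids are also passed as labels.

```python
import math
from typing import Dict, Sequence

MODEL_NAME = "EleutherAI/pythia-160m"
CHECKPOINT_STEPS = [1000, 16000, 66000, 143000]


def perplexity_from_mean_nll(mean_nll: float) -> float:
    """Perplexity is exp of the mean per-token negative log-likelihood (in nats)."""
    return math.exp(mean_nll)


def checkpoint_perplexities(text: str, steps: Sequence[int] = CHECKPOINT_STEPS) -> Dict[int, float]:
    """Perplexity of `text` under Pythia-160M at each training step (hypothetical helper)."""
    # Imported lazily so the pure helper above works without these packages
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    input_ids = tokenizer(text, return_tensors="pt").input_ids

    results = {}
    for step in steps:
        # Each training checkpoint is a git revision of the model repository
        model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, revision=f"step{step}")
        model.eval()
        with torch.no_grad():
            # Passing labels makes the model return mean next-token cross-entropy
            loss = model(input_ids, labels=input_ids).loss
        results[step] = perplexity_from_mean_nll(loss.item())
    return results
```

Calling `checkpoint_perplexities(pair_text)` downloads each checkpoint on first use, so the four revisions are best cached locally before looping over all selected pairs.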

Step 4 — Run PyHessian on high-divergence domains

Compute the Hessian eigenvalue spectrum for the domains represented in the high-divergence pairs. Compare the eigenvalue density against the GPT-2 coupling measurements.
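For readers unfamiliar with how a Hessian spectrum is probed without forming the full matrix, the sketch below shows the underlying idea on a toy problem: power iteration driven by Hessian-vector products. It is not PyHessian and not the planned measurement. The two-parameter quadratic loss, the finite-difference product, and all function names are assumptions made for illustration; a real run would use automatic differentiation on the model's loss over text from each domain.

```python
import math
from typing import Callable, List

Vector = List[float]


def toy_grad(w: Vector) -> Vector:
    """Gradient of the toy loss 0.5 * w^T A w with A = [[3, 1], [1, 2]]."""
    return [3.0 * w[0] + 1.0 * w[1], 1.0 * w[0] + 2.0 * w[1]]


def hessian_vector_product(grad_fn: Callable[[Vector], Vector], w: Vector, v: Vector,
                           eps: float = 1e-4) -> Vector:
    """Approximate H v by a central difference of the gradient along v."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad_fn(w_plus), grad_fn(w_minus)
    return [(a - b) / (2.0 * eps) for a, b in zip(g_plus, g_minus)]


def top_eigenvalue(grad_fn: Callable[[Vector], Vector], w: Vector, iters: int = 100) -> float:
    """Estimate the largest-magnitude Hessian eigenvalue at w by power iteration."""
    v = [1.0 / math.sqrt(len(w))] * len(w)  # unit-norm starting vector
    eig = 0.0
    for _ in range(iters):
        hv = hessian_vector_product(grad_fn, w, v)
        eig = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(a * a for a in hv))
        if norm == 0.0:
            break
        v = [a / norm for a in hv]
    return eig


estimate = top_eigenvalue(toy_grad, [0.5, -0.25])
```

For this toy loss the Hessian is the constant matrix A, so `estimate` can be checked against the closed-form largest eigenvalue, (5 + sqrt(5)) / 2.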

Step 5 — Compare signals

Do pairs with high BSA divergence correspond to high perplexity in the Pythia checkpoint series? Does perplexity on these domains stabilize over training (archaeological signal) or shift randomly (out-of-distribution, or OOD, noise)?
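One way to make the two questions in Step 5 concrete is sketched below. Both choices are assumptions rather than part of the design: Spearman rank correlation as the measure of correspondence between divergence and perplexity, and the relative change between the last two checkpoints as a crude indicator of stabilization. The function names and the example numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def late_relative_change(ppl_by_step: Dict[int, float]) -> float:
    """Relative perplexity change between the last two checkpoints."""
    steps = sorted(ppl_by_step)
    prev, last = ppl_by_step[steps[-2]], ppl_by_step[steps[-1]]
    return abs(last - prev) / prev


# Made-up numbers, for illustration only
rho = spearman([0.2, 0.4, 0.5, 0.7, 0.9], [30.0, 45.0, 41.0, 80.0, 95.0])
drift = late_relative_change({1000: 400.0, 16000: 120.0, 66000: 100.0, 143000: 95.0})
```

With only five pairs the correlation estimate is coarse, so it is better read as a direction than as a precise effect size. A small `drift` value would be consistent with perplexity stabilizing late in training, though the threshold for "small" is left open here.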

What Each Result Would Mean

Positive result

BSA divergence correlates with Pythia perplexity in the same domains. The loss landscape framework provides a mechanistic explanation for BSA ensemble divergence. The two instruments are measuring the same underlying phenomenon from different angles — one behavioral, one geometric.

Negative result

BSA divergence does not correlate with loss landscape perplexity. Either the instruments are measuring different things, or one of them is not measuring what it claims to measure. A negative result is as useful as a positive one — it tells us where the research program needs to be revised.

Mixed result

Correlation holds in some domains but not others. This is the most likely result and the most informative — it would identify which specific types of contested claims the loss landscape framework can and cannot explain.

Prerequisites

Pending: BSA pilot run completed

Pending: PyHessian on GPT-2 small completed (Priority 1)

Pending: Pythia checkpoint series run (Priority 4)

Optional but useful: OPT-125M perplexity comparison (Priority 2)