Ensemble Divergence Experiment
Methodology
The ensemble experiment ran thirty prompt pairs across twenty models spanning eight training lineages — OpenAI, Anthropic, Meta, Mistral, Google, Cohere, EleutherAI, and xAI. Each pair consisted of a mainstream prompt and a culturally or linguistically marginal variant covering the same semantic territory.
Models: 20 across 8 training lineages (OpenAI, Anthropic, Meta, Mistral, Google, Cohere, EleutherAI, xAI)
Prompt pairs: 30, each a mainstream variant plus a culturally or linguistically marginal variant
Measure: variance in output distribution across models on the marginal variant, relative to the mainstream variant
Interpretation: high divergence on the marginal prompt combined with low divergence on the mainstream prompt indicates the models are drawing on different underlying representations
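The divergence measure above can be sketched concretely. This is a minimal illustration, not the experiment's actual code: it assumes each model's output for a prompt is reduced to a probability distribution over a shared answer vocabulary, and it uses mean pairwise Jensen-Shannon divergence as the cross-model divergence statistic (the specific metric is an assumption).

```python
import itertools
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def ensemble_divergence(model_dists):
    """Mean pairwise JS divergence across models' output distributions
    for a single prompt. model_dists: list of probability vectors over
    a shared answer/token vocabulary, one per model."""
    pairs = itertools.combinations(model_dists, 2)
    return float(np.mean([js_divergence(p, q) for p, q in pairs]))

def divergence_gap(mainstream_dists, marginal_dists):
    """Per-pair signal: divergence on the marginal variant minus
    divergence on the mainstream variant. A large positive gap is the
    pattern the experiment looks for."""
    return ensemble_divergence(marginal_dists) - ensemble_divergence(mainstream_dists)
```

A positive gap on a prompt pair means the ensemble agrees on the mainstream variant but scatters on the marginal one; aggregating gaps over the 30 pairs gives the per-domain picture described below.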
Preliminary Results
Divergence concentrated in domains consistent with sparse training coverage: non-Western cultural contexts, pre-digital literary registers, and non-English source material. Convergence on mainstream English web register was comparatively stable.
The pattern held across lineages trained on different corpora, which suggests the signal reflects shared corpus gaps rather than individual model variance: models with distinct datasets still diverged in the same domains.
These results are preliminary and not formally analyzed, but they were stable enough across runs to warrant the Pythia checkpoint experiment as a more rigorous follow-up. The divergence signal may reflect corpus gaps, architectural differences, or something else entirely; the Bridge Experiment is designed to distinguish between these possibilities.
Relationship to Framework
The ensemble divergence result is consistent with the framework's remagnetization claim: models trained on similar corpora with similar RLHF profiles converge toward statistical center on mainstream inputs — but diverge on marginal inputs because their training landscapes differ in exactly those sparse territories.
The divergence is not random. It is concentrated in the same domains the GPT-2 perplexity map identifies as sparse — non-Western cultural contexts, non-English text, pre-digital literary registers. That correspondence is what the Bridge Experiment is designed to test formally.
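The correspondence claim is itself checkable numerically: given a per-domain divergence score and a per-domain perplexity score from the GPT-2 map, a rank correlation tests whether the two maps order the domains the same way. A minimal sketch under stated assumptions: the domain labels and score values below are illustrative placeholders (not real results), and Spearman rank correlation is one reasonable choice of test, not necessarily the Bridge Experiment's protocol.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation between two score lists (assumes no ties)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.sum(rx * ry) / np.sqrt(np.sum(rx**2) * np.sum(ry**2)))

# Hypothetical per-domain scores (placeholder values for illustration):
# ensemble divergence on marginal prompts, and GPT-2 perplexity as a
# sparsity proxy, over the same set of domains.
domains = ["mainstream web", "non-Western cultural", "pre-digital literary", "non-English"]
divergence = [0.05, 0.41, 0.38, 0.47]
perplexity = [18.0, 55.0, 49.0, 61.0]

rho = spearman_rho(divergence, perplexity)
# A rho near +1 would support the claim that divergence concentrates
# exactly where the perplexity map marks training coverage as sparse.
```

With real per-domain scores substituted in, a high rho would turn the informal "same domains" observation into a quantitative correspondence.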