Ensemble Divergence Experiment

Preliminary — not formally analyzed. This experiment was run as part of the Atlas development process. Results were stable enough across runs to warrant further investigation, but they have not been formally analyzed or peer-reviewed. They are described here as the basis for the Pythia checkpoint experiment and the Bridge Experiment.

Methodology

The ensemble experiment ran thirty prompt pairs across twenty models spanning eight training lineages — OpenAI, Anthropic, Meta, Mistral, Google, Cohere, EleutherAI, and xAI. Each pair consisted of a mainstream prompt and a culturally or linguistically marginal variant covering the same semantic territory.

Models

20 models across 8 training lineages

Lineages

OpenAI, Anthropic, Meta, Mistral, Google, Cohere, EleutherAI, xAI

Prompt pairs

30 pairs — mainstream variant + culturally or linguistically marginal variant

Divergence measurement

Variance in output distributions across models on the marginal variant, relative to variance on the mainstream variant

Signal definition

High divergence on the marginal prompt combined with low divergence on the mainstream prompt indicates that models are drawing on different underlying representations
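
The measurement and signal definitions above can be sketched as a simple score: embed each model's response to a prompt, take the mean pairwise cosine distance across models as the divergence for that prompt, and flag pairs where the marginal variant diverges far more than the mainstream one. This is a minimal illustration, not the experiment's actual code; the function names and the ratio threshold are assumptions.

```python
import numpy as np

def mean_pairwise_distance(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance among model outputs (one row per model)."""
    # Normalize rows so dot products are cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(embeddings)
    # Average similarity over off-diagonal pairs; distance = 1 - similarity.
    off_diag = sims.sum() - np.trace(sims)
    return float(1.0 - off_diag / (n * (n - 1)))

def divergence_signal(mainstream: np.ndarray, marginal: np.ndarray,
                      ratio_threshold: float = 2.0) -> bool:
    """Flag a prompt pair where models diverge on the marginal variant
    but converge on the mainstream one. The threshold is illustrative."""
    d_main = mean_pairwise_distance(mainstream)
    d_marg = mean_pairwise_distance(marginal)
    return bool(d_marg > ratio_threshold * max(d_main, 1e-8))
```

For example, if all twenty models produce near-identical embeddings for the mainstream prompt but scattered embeddings for the marginal one, the ratio is large and the pair is flagged as carrying the signal.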

Preliminary Results

Divergence concentration

Divergence was concentrated in domains consistent with sparse training coverage: non-Western cultural contexts, pre-digital literary registers, and non-English source material. Convergence on the mainstream English web register was comparatively stable.

Cross-lineage pattern

The pattern held across lineages with different training corpora, which suggests that the signal reflects shared corpus gaps rather than individual model variance. Models trained on different datasets still diverged in the same domains.

Caveat

Preliminary and not formally analyzed. Stable enough across runs to warrant the Pythia checkpoint experiment as a more rigorous follow-up. The divergence signal may reflect corpus gaps, may reflect architectural differences, or may reflect something else entirely. The Bridge Experiment is designed to distinguish between these possibilities.

Relationship to Framework

The ensemble divergence result is consistent with the framework's remagnetization claim: models trained on similar corpora with similar RLHF profiles converge toward statistical center on mainstream inputs — but diverge on marginal inputs because their training landscapes differ in exactly those sparse territories.

The divergence is not random. It is concentrated in the same domains the GPT-2 perplexity map identifies as sparse — non-Western cultural contexts, non-English text, pre-digital literary registers. That correspondence is what the Bridge Experiment is designed to test formally.