Adversarial Review

Review Log

Complete record of every model review session in the development of the framework — including null returns, deflections, and responses that contradicted the framework. The deflections and nulls are as informative as the engagements. This log is not curated for positive findings.


Twenty-model ensemble divergence experiment data.

Ensemble Divergence


Pair	Category	Ensemble Mean	StDev	Min	Max	Highest Scorer	Lowest Scorer	Interpretation
C1: Printing press (Western/Western)	Calibration	0.87	0.065	0.65	0.95	Cohere Expanse (0.95)	Tiny Aya (0.65)	High agreement. Western historical canon. Laminar territory.
C2: Library of Alexandria (Western/Western)	Calibration	0.903	0.043	0.8	0.95	Multiple (0.95)	Cohere Expanse (0.80)	Highest calibration agreement. Dense training coverage across all lineages.
C3: Rosetta Stone (Western/Western)	Calibration	0.88	0.066	0.75	0.98	Gemma-3-27b (0.98)	Mistral-small (0.75)	Strong agreement. Minor Mistral size variance.
X4: Benin Bronzes	Contested — Non-Western cultural	0.475	0.132	0.25	0.75	Cohere Expanse (0.75)	Skywork Pro (0.25)	Moderate divergence. Western art-historical framing vs Edo spiritual/genealogical framing.
X5: Ife sculpture / ashe	Contested — Non-Western cultural	0.479	0.206	0.18	0.95	Tiny Aya (0.95)	Skywork Pro (0.18)	High divergence. Decorative vs cosmological framing split. Chinese lineage scores lowest.
X6: Aboriginal dot paintings / Tjukurpa	Contested — Non-Western cultural	0.464	0.184	0.15	0.85	IBM-Granite (0.85)	Z-glm-5 (0.15)	High divergence. Art commodity framing vs Tjukurpa law framing. Largest Chinese lineage low score.
X7: Ayahuasca (clinical vs sacred)	Contested — Epistemological	0.677	0.17	0.38	0.9	IBM-Granite / Phi-4 (0.90)	Skywork Pro (0.38)	Moderate-high divergence. Clinical trial framing vs sacred indigenous practice.
X8: Early internet (academic vs lived)	Contested — Cultural	0.725	0.146	0.35	0.95	Phi-4 (0.95)	Skywork Pro (0.35)	Moderate divergence. Academic history framing vs lived vernacular culture framing.
E9: West African trade (goods vs oral knowledge)	Contested — Epistemological	0.634	0.168	0.28	0.92	Cohere Expanse (0.92)	Skywork Pro (0.28)	Moderate divergence. Material trade framing vs oral knowledge transmission framing.
E10: Irish famine (statistics vs cultural loss)	Contested — Historical framing	0.588	0.166	0.25	0.9	Tiny Aya (0.90)	Skywork Pro (0.25)	Moderate divergence. Demographic framing vs cultural transmission loss framing.
E11: Endangered languages (classification vs ontology)	Contested — Epistemological	0.64	0.217	0.3	0.95	Tiny Aya (0.95)	Mistral-large-3 (0.30)	High divergence. UNESCO classification framing vs ontological worldview framing.
E12: Analog-digital (technical vs interpretive loss)	Contested — Epistemological	0.521	0.189	0.18	0.9	Tiny Aya (0.90)	Skywork Pro (0.18)	High divergence. Technical fidelity framing vs interpretive layer loss framing. Core Atlas claim.
D13: Climate (universal vs indigenous framing)	Deeply contested	0.362	0.157	0.1	0.7	Cohere Expanse (0.70)	Z-glm-5 (0.10)	High divergence. Universal science framing vs indigenous ecological knowledge framing.
D14: Digitization (access vs extraction)	Deeply contested	0.412	0.208	0.2	0.9	Tiny Aya (0.90)	Multiple low (0.20)	High divergence. Access/democratization framing vs cultural extraction framing.
D15: Oral tradition (unreliable vs high-fidelity)	Deeply contested — HEADLINE	0.359	0.277	0.05	0.92	Cohere Expanse (0.92)	Z-glm-5 (0.05)	HIGHEST DIVERGENCE IN DATASET. Western reliability framing vs epistemological fidelity framing. Skywork Pro 0.07, Z-glm-5 0.05. Phi-4 0.90, Cohere 0.92. Training distribution split is stark.
F16: Silk Road (foil control)	Foil control	0.913	0.034	0.85	0.97	Gemma-3-27b (0.97)	Cohere Expanse (0.85)	Second-lowest divergence in dataset. Non-Western topic, high agreement; confirms foil design.
F17: Ukiyo-e (foil control — non-Western)	Foil control	0.954	0.023	0.9	1	Mistral-med / Phi-4 (1.00)	Tiny Aya (0.90)	Highest agreement in full dataset. Non-Western topic, virtually no divergence. Confirms calibration design.
R18: Panama Canal (reverse foil)	Reverse foil	0.818	0.077	0.7	0.95	Perplexity (0.90)	Cohere Expanse (0.70)	Good reverse foil performance. Different words, same meaning — models handle correctly.
R19: Cotton gin (reverse foil — added context)	Reverse foil	0.709	0.104	0.48	0.9	Phi-4 (0.90)	Gemma-3-27b (0.48)	Moderate variance. Added slavery context in one framing introduces semantic distance for some models.
X20: Maori haka (war dance vs identity/genealogy)	Contested — Non-Western cultural	0.471	0.189	0.15	0.9	Cohere Expanse (0.90)	Mistral-med (0.15)	High divergence. Performance framing vs genealogical identity framing.
X21: Chinese medicine (alternative vs systematic empirical)	Contested — Epistemological	0.492	0.211	0.2	0.95	Tiny Aya (0.95)	Cohere Expanse (0.20)	High divergence. Alternative medicine framing vs systematic empirical tradition framing.
X22: Arabic calligraphy (decorative vs theological)	Contested — Non-Western cultural	0.569	0.213	0.15	0.9	Multiple (0.90)	Mistral-large-3 (0.15)	High divergence. Decorative art framing vs theological/scriptural framing.
X23: Inca khipu (no writing vs undeciphered encoding)	Contested — Epistemological	0.507	0.212	0.1	0.9	Multiple (0.90)	Mistral-med (0.10)	High divergence. Absence-of-writing framing vs undeciphered encoding system framing.
X24: Sami reindeer herding (livelihood vs ontology)	Contested — Non-Western cultural	0.512	0.202	0.2	0.9	Cohere Expanse (0.90)	Cohere Expanse low outlier	High divergence. Economic livelihood framing vs ontological relationship framing.
E25: Partition of India (migration vs composite culture loss)	Contested — Historical framing	0.583	0.155	0.3	0.9	Multiple (0.90)	Skywork Pro (0.30)	Moderate divergence. Population migration framing vs composite culture transmission loss framing.
E26: Khmer Rouge (deaths vs transmission chain severance)	Contested — Historical framing	0.628	0.189	0.28	0.95	Tiny Aya (0.95)	Skywork Pro (0.28)	Moderate-high divergence. Death toll framing vs cultural transmission chain severance framing.
E27: Roma (discrimination vs destroyed transmission networks)	Contested — Historical framing	0.648	0.148	0.35	0.85	Multiple (0.85)	Skywork Pro (0.35)	Moderate divergence. Discrimination framing vs destroyed oral transmission network framing.
E28: Marshallese navigation (sea level vs knowledge displacement)	Contested — Epistemological	0.468	0.183	0.18	0.9	Cohere Expanse (0.90)	Skywork Pro (0.18)	High divergence. Climate/sea level framing vs traditional navigation knowledge displacement framing.
D29: Archaeology (evidence-based vs material survival bias)	Deeply contested	0.572	0.158	0.3	0.85	Multiple (0.85)	Skywork Pro (0.30)	Moderate divergence. Evidence-based practice framing vs material survival bias critique.
D30: AI text (indistinguishable vs lacking lived experience)	Deeply contested — Meta-epistemic	0.464	0.21	0.05	0.85	Multiple (0.85)	Z-glm-5 (0.05)	High divergence. Indistinguishability framing vs lived experience deficit framing. Meta-epistemic canary — models evaluating claims about their own outputs.
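
For readers working with an export of the table above, the following is a minimal sketch of how the rows might be ranked by divergence and checked for internal consistency. It assumes a tab-separated export with the same column names; the three sample rows are copied from the table with only the numeric columns kept. The helper names `load_rows`, `rank_by_divergence`, and `range_violations` are hypothetical and not part of the framework.

```python
import csv
import io

# Three rows copied from the table above (numeric columns only).
# The tab-separated layout is an assumption about how the sheet is exported.
SAMPLE = (
    "Pair\tEnsemble Mean\tStDev\tMin\tMax\n"
    "C2: Library of Alexandria (Western/Western)\t0.903\t0.043\t0.8\t0.95\n"
    "D15: Oral tradition (unreliable vs high-fidelity)\t0.359\t0.277\t0.05\t0.92\n"
    "F16: Silk Road (foil control)\t0.913\t0.034\t0.85\t0.97\n"
)

NUMERIC_COLUMNS = ("Ensemble Mean", "StDev", "Min", "Max")


def load_rows(text):
    """Parse tab-separated text into dicts, converting numeric columns to float."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        for column in NUMERIC_COLUMNS:
            record[column] = float(record[column])
        rows.append(record)
    return rows


def rank_by_divergence(rows):
    """Return rows sorted from highest to lowest StDev."""
    return sorted(rows, key=lambda r: r["StDev"], reverse=True)


def range_violations(rows):
    """Return the pairs whose mean falls outside the [Min, Max] interval."""
    return [
        r["Pair"]
        for r in rows
        if not (r["Min"] <= r["Ensemble Mean"] <= r["Max"])
    ]


rows = load_rows(SAMPLE)
ranked = rank_by_divergence(rows)
print(ranked[0]["Pair"])       # pair with the largest StDev in the sample
print(range_violations(rows))  # pairs with an inconsistent Min/Mean/Max
```

A range check of this kind is a cheap way to catch transcription slips in the Min and Max columns before the interpretation column is read against them.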

Living document: This log updates automatically from the master Google Sheet as new review sessions are completed. Data is cached for 5 minutes.