Adversarial Review
Review Log
A complete record of every model review session conducted during development of the framework, including null returns, deflections, and responses that contradicted the framework. Deflections and nulls are as informative as engagements, so this log is not curated for positive findings.
Twenty-model ensemble divergence experiment data.
Ensemble Divergence
| Pair | Category | Ensemble Mean | StDev | Min | Max | Highest Scorer | Lowest Scorer | Interpretation |
|---|---|---|---|---|---|---|---|---|
| C1: Printing press (Western/Western) | Calibration | 0.87 | 0.065 | 0.65 | 0.95 | Cohere Expanse (0.95) | Tiny Aya (0.65) | High agreement. Western historical canon. Laminar territory. |
| C2: Library of Alexandria (Western/Western) | Calibration | 0.903 | 0.043 | 0.8 | 0.95 | Multiple (0.95) | Cohere Expanse (0.80) | Highest calibration agreement. Dense training coverage across all lineages. |
| C3: Rosetta Stone (Western/Western) | Calibration | 0.88 | 0.066 | 0.75 | 0.98 | Gemma-3-27b (0.98) | Mistral-small (0.75) | Strong agreement. Minor Mistral size variance. |
| X4: Benin Bronzes | Contested — Non-Western cultural | 0.475 | 0.132 | 0.25 | 0.75 | Cohere Expanse (0.75) | Skywork Pro (0.25) | Moderate divergence. Western art-historical framing vs Edo spiritual/genealogical framing. |
| X5: Ife sculpture / ashe | Contested — Non-Western cultural | 0.479 | 0.206 | 0.18 | 0.95 | Tiny Aya (0.95) | Skywork Pro (0.18) | High divergence. Decorative vs cosmological framing split. Chinese lineage scores lowest. |
| X6: Aboriginal dot paintings / Tjukurpa | Contested — Non-Western cultural | 0.464 | 0.184 | 0.15 | 0.85 | IBM-Granite (0.85) | Z-glm-5 (0.15) | High divergence. Art commodity framing vs Tjukurpa law framing. Largest Chinese lineage low score. |
| X7: Ayahuasca (clinical vs sacred) | Contested — Epistemological | 0.677 | 0.17 | 0.38 | 0.9 | IBM-Granite / Phi-4 (0.90) | Skywork Pro (0.38) | Moderate-high divergence. Clinical trial framing vs sacred indigenous practice. |
| X8: Early internet (academic vs lived) | Contested — Cultural | 0.725 | 0.146 | 0.35 | 0.95 | Phi-4 (0.95) | Skywork Pro (0.35) | Moderate divergence. Academic history framing vs lived vernacular culture framing. |
| E9: West African trade (goods vs oral knowledge) | Contested — Epistemological | 0.634 | 0.168 | 0.28 | 0.92 | Cohere Expanse (0.92) | Skywork Pro (0.28) | Moderate divergence. Material trade framing vs oral knowledge transmission framing. |
| E10: Irish famine (statistics vs cultural loss) | Contested — Historical framing | 0.588 | 0.166 | 0.25 | 0.9 | Tiny Aya (0.90) | Skywork Pro (0.25) | Moderate divergence. Demographic framing vs cultural transmission loss framing. |
| E11: Endangered languages (classification vs ontology) | Contested — Epistemological | 0.64 | 0.217 | 0.3 | 0.95 | Tiny Aya (0.95) | Mistral-large-3 (0.30) | High divergence. UNESCO classification framing vs ontological worldview framing. |
| E12: Analog-digital (technical vs interpretive loss) | Contested — Epistemological | 0.521 | 0.189 | 0.18 | 0.9 | Tiny Aya (0.90) | Skywork Pro (0.18) | High divergence. Technical fidelity framing vs interpretive layer loss framing. Core Atlas claim. |
| D13: Climate (universal vs indigenous framing) | Deeply contested | 0.362 | 0.157 | 0.1 | 0.7 | Cohere Expanse (0.70) | Z-glm-5 (0.10) | High divergence. Universal science framing vs indigenous ecological knowledge framing. |
| D14: Digitization (access vs extraction) | Deeply contested | 0.412 | 0.208 | 0.2 | 0.9 | Tiny Aya (0.90) | Multiple (0.20) | High divergence. Access/democratization framing vs cultural extraction framing. |
| D15: Oral tradition (unreliable vs high-fidelity) | Deeply contested — HEADLINE | 0.359 | 0.277 | 0.05 | 0.92 | Cohere Expanse (0.92) | Z-glm-5 (0.05) | HIGHEST DIVERGENCE IN DATASET. Western reliability framing vs epistemological fidelity framing. Skywork Pro 0.07, Z-glm-5 0.05. Phi-4 0.90, Cohere 0.92. Training distribution split is stark. |
| F16: Silk Road (foil control) | Foil control | 0.913 | 0.034 | 0.85 | 0.97 | Gemma-3-27b (0.97) | Cohere Expanse (0.85) | Second-lowest divergence in the dataset, after F17. Non-Western topic with high agreement, which confirms the foil design. |
| F17: Ukiyo-e (foil control — non-Western) | Foil control | 0.954 | 0.023 | 0.9 | 1 | Mistral-med / Phi-4 (1.00) | Tiny Aya (0.90) | Highest agreement in full dataset. Non-Western topic, virtually no divergence. Confirms calibration design. |
| R18: Panama Canal (reverse foil) | Reverse foil | 0.818 | 0.077 | 0.7 | 0.95 | Perplexity (0.90) | Cohere Expanse (0.70) | Good reverse foil performance. Different words, same meaning — models handle correctly. |
| R19: Cotton gin (reverse foil — added context) | Reverse foil | 0.709 | 0.104 | 0.48 | 0.9 | Phi-4 (0.90) | Gemma-3-27b (0.48) | Moderate variance. Added slavery context in one framing introduces semantic distance for some models. |
| X20: Maori haka (war dance vs identity/genealogy) | Contested — Non-Western cultural | 0.471 | 0.189 | 0.15 | 0.9 | Cohere Expanse (0.90) | Mistral-med (0.15) | High divergence. Performance framing vs genealogical identity framing. |
| X21: Chinese medicine (alternative vs systematic empirical) | Contested — Epistemological | 0.492 | 0.211 | 0.2 | 0.95 | Tiny Aya (0.95) | Cohere Expanse (0.20) | High divergence. Alternative medicine framing vs systematic empirical tradition framing. |
| X22: Arabic calligraphy (decorative vs theological) | Contested — Non-Western cultural | 0.569 | 0.213 | 0.15 | 0.9 | Multiple (0.90) | Mistral-large-3 (0.15) | High divergence. Decorative art framing vs theological/scriptural framing. |
| X23: Inca khipu (no writing vs undeciphered encoding) | Contested — Epistemological | 0.507 | 0.212 | 0.1 | 0.9 | Multiple (0.90) | Mistral-med (0.10) | High divergence. Absence-of-writing framing vs undeciphered encoding system framing. |
| X24: Sami reindeer herding (livelihood vs ontology) | Contested — Non-Western cultural | 0.512 | 0.202 | 0.2 | 0.9 | Cohere Expanse (0.90) | Cohere Expanse low outlier | High divergence. Economic livelihood framing vs ontological relationship framing. |
| E25: Partition of India (migration vs composite culture loss) | Contested — Historical framing | 0.583 | 0.155 | 0.3 | 0.9 | Multiple (0.90) | Skywork Pro (0.30) | Moderate divergence. Population migration framing vs composite culture transmission loss framing. |
| E26: Khmer Rouge (deaths vs transmission chain severance) | Contested — Historical framing | 0.628 | 0.189 | 0.28 | 0.95 | Tiny Aya (0.95) | Skywork Pro (0.28) | Moderate-high divergence. Death toll framing vs cultural transmission chain severance framing. |
| E27: Roma (discrimination vs destroyed transmission networks) | Contested — Historical framing | 0.648 | 0.148 | 0.35 | 0.85 | Multiple (0.85) | Skywork Pro (0.35) | Moderate divergence. Discrimination framing vs destroyed oral transmission network framing. |
| E28: Marshallese navigation (sea level vs knowledge displacement) | Contested — Epistemological | 0.468 | 0.183 | 0.18 | 0.9 | Cohere Expanse (0.90) | Skywork Pro (0.18) | High divergence. Climate/sea level framing vs traditional navigation knowledge displacement framing. |
| D29: Archaeology (evidence-based vs material survival bias) | Deeply contested | 0.572 | 0.158 | 0.3 | 0.85 | Multiple (0.85) | Skywork Pro (0.30) | Moderate divergence. Evidence-based practice framing vs material survival bias critique. |
| D30: AI text (indistinguishable vs lacking lived experience) | Deeply contested — Meta-epistemic | 0.464 | 0.21 | 0.05 | 0.85 | Multiple (0.85) | Z-glm-5 (0.05) | High divergence. Indistinguishability framing vs lived experience deficit framing. Meta-epistemic canary — models evaluating claims about their own outputs. |
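The summary columns in the table (Ensemble Mean, StDev, Min, Max, Highest Scorer, Lowest Scorer) can be derived from per-model scores for each pair. The sketch below is a minimal illustration, not the pipeline that produced this log. The function name `summarize_pair` is hypothetical, and the choice of population standard deviation (`pstdev`) is an assumption, since the log does not say whether sample or population deviation was used. The input is only the four D15 scores quoted in the table, not the full twenty-model ensemble, so its output will not match the D15 row.

```python
from statistics import mean, pstdev


def summarize_pair(scores):
    """Reduce one pair's per-model scores to the table's summary columns.

    `scores` maps a model name to its score for the pair (0.0 to 1.0).
    """
    values = list(scores.values())
    hi = max(values)
    lo = min(values)
    return {
        "ensemble_mean": round(mean(values), 3),
        # Assumption: population standard deviation; swap in
        # statistics.stdev if the sheet uses the sample formula.
        "stdev": round(pstdev(values), 3),
        "min": lo,
        "max": hi,
        # Ties are reported as a list, matching the table's "Multiple" entries.
        "highest_scorers": sorted(m for m, s in scores.items() if s == hi),
        "lowest_scorers": sorted(m for m, s in scores.items() if s == lo),
    }


# Only the four D15 scores named in the table's Interpretation column.
# This is a subset of the ensemble, used purely for illustration.
d15_subset = {
    "Skywork Pro": 0.07,
    "Z-glm-5": 0.05,
    "Phi-4": 0.90,
    "Cohere Expanse": 0.92,
}

summary = summarize_pair(d15_subset)
for column, value in summary.items():
    print(f"{column}: {value}")
```

Reporting the highest and lowest scorers as lists keeps ties visible instead of arbitrarily picking one model, which is how several rows in the table end up labelled "Multiple".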
Living document: This log updates automatically from the master Google Sheet as new review sessions are completed. Data is cached for 5 minutes.