Experiment Queue
A priority-ordered queue of experiments (pending, completed, and superseded) with falsification criteria, methods, and current status, synced automatically from the master Google Sheet.
| Experiment | Type | Priority | Models | Status | Results |
|---|---|---|---|---|---|
| Divergence Test 1 | Model prompt | Reference only | ~10 models | Superseded | Methodology refined for Test 3. |
| Divergence Test 2 | Model prompt | Reference only | ~11 models | Superseded | Methodology refined for Test 3. |
| Divergence Test 3 | Model prompt × 20 | Completed | Skywork Pro, DeepSeek 3.2, Qwen3.5-max, Qwen3.5-122b, Cohere Expanse, IBM-Granite-h-small, Tiny Aya, Z-glm-5, Claude Opus 4.5, GPT-5.2, Gemini 3.1 Pro, Perplexity, Mistral-large-3, Mistral-med-2505, Mistral-small-2603, Llama 3.3 70B, Gemma-3-27b, Grok 4.2, Phi-4, SciFact-search | Completed | See Ensemble Divergence page. |
| Divergence Test 4 | Model prompt × TBD | Pending | Architecturally distinct, non-translation-optimized lineages (minimum 4) | Pending | |
| Divergence Test 5 | Model prompt × 20 | Pending | Skywork Pro, DeepSeek 3.2, Qwen3.5-max, Qwen3.5-122b, Cohere Expanse, IBM-Granite-h-small, Tiny Aya, Z-glm-5, Claude Opus 4.5, GPT-5.2, Gemini 3.1 Pro, Perplexity, Mistral-large-3, Mistral-med-2505, Mistral-small-2603, Llama 3.3 70B, Gemma-3-27b, Grok 4.2, Phi-4, SciFact-search | Pending | |
| PyHessian — GPT-2 small | Empirical (Colab) | 1 — FIRST | GPT-2 small + ALBERT-base | Pending | |
| OPT-125M perplexity comparison | Empirical (Colab) | 2 | OPT-125M vs GPT-2 baseline | Pending | |
| Mistral-7B BASE vs INSTRUCT | Empirical + Model prompt | 3 | Mistral-7B BASE vs INSTRUCT | Pending | |
| Pythia Multi-Scale Checkpoint: Series A | Empirical (Colab) | 4 — PRIORITY | Pythia 70M, 160M, 410M, 1B — full checkpoint series | Pending | |
| Pythia Multi-Scale Checkpoint: Series B | Empirical (Colab) | 4 — PRIORITY | Pythia 70M, 160M, 410M, 1B — full checkpoint series | Pending | |
| Falcon-7B Legibility Test Series: A | Model prompt × 2 | 5 | Falcon-7B-Instruct | Pending | |
| Falcon-7B Legibility Test Series: B | Model prompt × 2 (concurrent with Series A) | 5 | Falcon-7B-Instruct | Pending | |
| ALBERT-base PyHessian | Empirical (Colab) | 6 | google/albert-base-v2 | Pending | |
| BLOOM-560M multilingual signal | Model prompt | 7 | bigscience/bloom-560m | Pending | |
| Phi-2 domain collapse | Model prompt | 8 | microsoft/phi-2 | Pending | |
| GPT-2 head ablation | Empirical (Colab) | 9 | GPT-2 small | Pending | |
| BSA Pilot Run | Protocol | 10 | Claude 3.5 Sonnet, GPT-4o, Gemini 1.5 Pro, Mistral Large, DeepSeek-V3, Llama 3.1 405B, Perplexity | Pending | |
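The PyHessian rows above (priorities 1 and 6) estimate top Hessian eigenvalues of the loss surface. As a minimal illustration of the underlying primitive, the sketch below runs power iteration using only Hessian-vector products on a toy quadratic whose Hessian is known in closed form; the actual experiments would use the PyHessian library on GPT-2 small and ALBERT-base, which this stand-in does not attempt.

```python
def hvp(A, v):
    # Hessian-vector product: for f(x) = 0.5 * x^T A x, the Hessian is A itself,
    # so the product is just A @ v (written out in pure Python).
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def top_eigenvalue(A, iters=200):
    """Estimate the largest Hessian eigenvalue by power iteration,
    using only Hessian-vector products (the same access pattern
    PyHessian-style tools use on real networks)."""
    v = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        w = hvp(A, v)
        # Rayleigh quotient: current eigenvalue estimate.
        lam = sum(wi * vi for wi, vi in zip(w, v)) / sum(vi * vi for vi in v)
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return lam

# Toy symmetric Hessian with eigenvalues 5 and 1.
A = [[3.0, 2.0], [2.0, 3.0]]
print(round(top_eigenvalue(A), 6))  # → 5.0
```

On a real model, `hvp` is replaced by a double-backward pass through the loss; the power-iteration loop is unchanged.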
Living document: This queue updates automatically as experiments are designed, run, and completed. Data is cached for 5 minutes.
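The OPT-125M row (priority 2) compares perplexity against a GPT-2 baseline. For reference, the metric itself reduces to the exponentiated mean per-token negative log-likelihood; the sketch below assumes per-token NLLs (in nats) have already been collected from an evaluation loop, and `perplexity` is an illustrative helper, not part of the protocol.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Sanity check: a model assigning uniform probability over a 50k-token
# vocabulary has per-token NLL ln(50000), hence perplexity 50000.
uniform_nll = math.log(50_000)
print(round(perplexity([uniform_nll] * 10)))  # → 50000
```

The comparison in the queue would run both models over the same token stream with the same tokenization caveats noted per model, since perplexities are only comparable across identical evaluation text and comparable vocabularies.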