Experiment Queue

A priority-ordered list of pending experiments, with methods, falsification criteria, and current status; superseded and completed runs are retained for reference. The list updates automatically from the master Google Sheet.

Experiment | Type | Priority | Models | Status | Results
Divergence Test 1 | Model prompt | Reference only | ~10 models | Superseded | Methodology refined for Test 3.
Divergence Test 2 | Model prompt | Reference only | ~11 models | Superseded | Methodology refined for Test 3.
Divergence Test 3 | Model prompt × 20 | Completed | Skywork Pro, DeepSeek 3.2, Qwen 3.5-max, Qwen3.5-122b, Cohere Expanse, IBM-Granite-h-small, Tiny Aya, Z-glm-5, Claude Opus 4.5, GPT-5.2, Gemini 3.1 Pro, Perplexity, Mistral-large-3, Mistral-med-2505, Mistral-small-2603, Llama 3.3 70B, Gemma-3-27b, Grok 4.2, Phi-4, SciFact-search | Completed | See Ensemble Divergence page.
Divergence Test 4 | Model prompt × TBD | Pending | At least 4 architecturally distinct, non-translation-optimized lineages | Pending |
Divergence Test 5 | Model prompt × 20 | Pending | Skywork Pro, DeepSeek 3.2, Qwen 3.5-max, Qwen3.5-122b, Cohere Expanse, IBM-Granite-h-small, Tiny Aya, Z-glm-5, Claude Opus 4.5, GPT-5.2, Gemini 3.1 Pro, Perplexity, Mistral-large-3, Mistral-med-2505, Mistral-small-2603, Llama 3.3 70B, Gemma-3-27b, Grok 4.2, Phi-4, SciFact-search | Pending |
PyHessian: GPT-2 small | Empirical (Colab) | 1 (FIRST) | GPT-2 small + ALBERT-base | Pending |
OPT-125M perplexity comparison | Empirical (Colab) | 2 | OPT-125M vs GPT-2 baseline | Pending |
Mistral-7B BASE vs INSTRUCT | Empirical + Model prompt | 3 | Mistral-7B BASE vs INSTRUCT | Pending |
Pythia Multi-Scale Checkpoint: Series A | Empirical (Colab) | 4 (PRIORITY) | Pythia 70M, 160M, 410M, 1B (full checkpoint series) | Pending |
Pythia Multi-Scale Checkpoint: Series B | Empirical (Colab) | 4 (PRIORITY) | Pythia 70M, 160M, 410M, 1B (full checkpoint series) | Pending |
Falcon-7B Legibility Test Series: A | Model prompt × 2 | 5 | Falcon-7B-Instruct | Pending |
Falcon-7B Legibility Test Series: B | Concurrent with Series A | 5 | Falcon-7B-Instruct | Pending |
ALBERT-base PyHessian | Empirical (Colab) | 6 | google/albert-base-v2 | Pending |
BLOOM-560M multilingual signal | Model prompt | 7 | bigscience/bloom-560m | Pending |
Phi-2 domain collapse | Model prompt | 8 | microsoft/phi-2 | Pending |
GPT-2 head ablation | Empirical (Colab) | 9 | GPT-2 small | Pending |
BSA Pilot Run | Protocol | 10 | Claude 3.5 Sonnet, GPT-4o, Gemini 1.5 Pro, Mistral Large, DeepSeek-V3, Llama 3.1 405B, Perplexity | Pending |

Living document: this queue updates automatically as experiments are designed, run, and completed. Data is cached for 5 minutes, so recent changes to the sheet may take up to 5 minutes to appear here.
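The 5-minute cache on this page behaves like a time-to-live (TTL) cache in front of the sheet export. The following is a minimal Python sketch, assuming the page simply re-reads the master Google Sheet whenever its cached copy is at least 300 seconds old. The class name `TTLCache`, the `fetch` callable, and the injectable `clock` are hypothetical illustrations, not the page's actual implementation.

```python
import time
from typing import Callable, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Hypothetical sketch: returns a cached fetch result until it is ttl_seconds old."""

    def __init__(
        self,
        fetch: Callable[[], T],      # hypothetical loader, e.g. reads the sheet rows
        ttl_seconds: float = 300.0,  # 5 minutes, matching the note above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # Cached entry stored as (time fetched, value); None until the first fetch.
        self._entry: Optional[Tuple[float, T]] = None

    def get(self) -> T:
        now = self._clock()
        # Refetch when nothing is cached yet or the cached copy has expired.
        if self._entry is None or now - self._entry[0] >= self._ttl:
            self._entry = (now, self._fetch())
        return self._entry[1]
```

Injecting the clock keeps the expiry logic easy to check: two `get()` calls less than 300 seconds apart trigger a single fetch, while a call 300 or more seconds after the last fetch triggers a fresh read.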