Multi-model jury evaluation of two AI longevity target identification systems
Overall Verdict: Lobster wins on drug discovery relevance (3/3), clinical translatability (3/3), and accuracy (2/3; DeepSeek awarded accuracy to Claw for fewer per-target errors and no directional contradictions). All three judges recommend Lobster for production use, but unanimously note the ideal system would merge Lobster's systematic framework with Claw's GrimAge-specific depth (CD38, PAI-1, B2M, TIMP-1).
Ideal merged system: Lobster's 5-layer methodology plus Claw's GrimAge depth. Remove the contradictory targets (CPT1A, SOCS3, LEP) and add Claw's unique validated targets.
Claw: Built as part of the LongevityLLM paper (Cell submission). Wraps L-Qwen3.5-9B in an agentic loop. Queries across 14 hallmarks of aging, generates 30 targets per category, repeats 6×, and aggregates. Presents 10 curated targets focused on GrimAge v2 components and clock-validated inhibition targets. Conservative, clinically oriented selection emphasizing established biology with specific CpG coefficients.
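The query, repeat, and aggregate loop described above can be sketched as a frequency count over repeated model calls. This is only an illustration under stated assumptions: the source does not say how aggregation is done, so counting how often each target is proposed is a guess, and `aggregate_targets`, `query_model`, and `stub_model` are hypothetical names rather than part of the actual system.

```python
from collections import Counter
from typing import Callable, Iterable


def aggregate_targets(
    query_model: Callable[[str, int], Iterable[str]],
    hallmarks: list[str],
    per_category: int = 30,
    repeats: int = 6,
    top_k: int = 10,
) -> list[tuple[str, int]]:
    """Count how often each target is proposed across hallmarks and repeats.

    Assumption: aggregation is a simple frequency count; the real system
    may weight or filter proposals differently.
    """
    counts: Counter = Counter()
    for hallmark in hallmarks:
        for _ in range(repeats):
            # Normalize names and deduplicate within one response,
            # so a single run contributes at most one vote per target.
            proposed = {t.strip().upper() for t in query_model(hallmark, per_category)}
            counts.update(proposed)
    # Sort by vote count (descending), then by name for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def stub_model(hallmark: str, n: int) -> list[str]:
    """Hypothetical stand-in for the language model call."""
    if hallmark == "nutrient sensing":
        return ["MTOR", "GDF15"]
    return ["mtor "]


if __name__ == "__main__":
    print(aggregate_targets(stub_model, ["nutrient sensing", "inflammation"], top_k=2))
```

Repeating each query several times and counting votes is one common way to dampen run-to-run variation in model output. Whether the original loop does exactly this is not stated.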
Lobster: Systematic multi-database pipeline integrating 5 evidence layers: HOA database (1,031 targets), 239 aging clocks (377,710 features), druggable genome (13,504 genes), ClinicalTrials.gov, and Geroprotector database. Uses coefficient directionality analysis (positive = increases with age → amenable to inhibition). 20 targets with tiered prioritization. Focus: intersection of clock prominence, hallmark breadth, and druggability.
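The directionality rule (positive clock coefficient = increases with age, so a candidate for inhibition) can be illustrated with a small classifier over a target's coefficients across clocks. This is a minimal sketch, not Lobster's actual logic: the function name `classify_direction`, the label strings, and the 75% agreement threshold are all assumptions.

```python
def classify_direction(coefficients: list[float], min_agreement: float = 0.75) -> str:
    """Label a target by the sign agreement of its clock coefficients.

    Assumption: a target is directionally consistent when at least
    `min_agreement` of its non-zero coefficients share a sign.
    """
    nonzero = [c for c in coefficients if c != 0]
    if not nonzero:
        return "no_signal"
    positive_fraction = sum(1 for c in nonzero if c > 0) / len(nonzero)
    if positive_fraction >= min_agreement:
        # Mostly positive: increases with age, candidate for inhibition.
        return "inhibition_candidate"
    if 1 - positive_fraction >= min_agreement:
        # Mostly negative: decreases with age, candidate for activation.
        return "activation_candidate"
    return "mixed"


if __name__ == "__main__":
    # A target with 12 of 13 negative coefficients (illustrative values).
    print(classify_direction([-0.1] * 12 + [0.2]))
```

Under this rule, a target with 12 of 13 negative coefficients is labeled an activation candidate, so it would not belong on an inhibition list.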
Only 2 of 28 unique targets (10 from Claw plus 20 from Lobster, minus the 2 shared) appear in both outputs: MTOR and GDF15. This is remarkably low overlap given the shared goal.
⚠️ = Flagged by Lobster as contradictory inhibition targets (negative coefficients or paradoxical biology)
| Target | Claw Score | Lobster Score | Assessment |
|---|---|---|---|
| MTOR | 9/10 | 9/10 | Equivalent. Both cite rapamycin 9-14% lifespan extension, Harrison et al. 2009, ongoing trials. Lobster adds NCT numbers; Claw cites PEARL trial. Both accurate. |
| GDF15 | 6/10 | 7.5/10 | Lobster more rigorous. Provides 8 top-10 clock appearances and specific coefficient data, and cites specific papers. Correctly frames GDF15 as a biomarker/proxy. Claw overstates the inhibition evidence with vague animal model claims. Neither fully addresses the causality question (Goodhart's Law risk). |
| Dimension | GPT-5.5 | Opus 4.7 | DeepSeek V4 | Consensus |
|---|---|---|---|---|
| Drug Discovery Relevance | Claw 7 / Lobster 8 | Claw 7 / Lobster 8 | Claw 7 / Lobster 8 | 🦞 Lobster (3/3) |
| Accuracy | Claw 7 / Lobster 8 | Claw 6 / Lobster 7 | Claw 7 / Lobster 6 | 🦞 Lobster (2/3) |
| Clinical Translatability | Claw 7 / Lobster 8 | Claw 7 / Lobster 8 | Claw 7 / Lobster 8 | 🦞 Lobster (3/3) |
| Production Recommendation | Lobster with Claw merger | Lobster with Claw merger | Lobster with Claw merger | 🦞 Lobster (3/3) |
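The consensus column can be reproduced from the per-judge scores with a simple majority vote. The sketch below assumes the consensus is decided this way, which the report does not state. The scores are copied from the table, while `SCORES` and `consensus` are illustrative names.

```python
from collections import Counter

# Per-judge (Claw, Lobster) scores, copied from the table above.
SCORES = {
    "Drug Discovery Relevance": {"GPT-5.5": (7, 8), "Opus 4.7": (7, 8), "DeepSeek V4": (7, 8)},
    "Accuracy": {"GPT-5.5": (7, 8), "Opus 4.7": (6, 7), "DeepSeek V4": (7, 6)},
    "Clinical Translatability": {"GPT-5.5": (7, 8), "Opus 4.7": (7, 8), "DeepSeek V4": (7, 8)},
}


def consensus(per_judge: dict[str, tuple[int, int]]) -> tuple[str, int, int]:
    """Return (winner, votes_for_winner, number_of_judges) by majority vote."""
    votes: Counter = Counter()
    for claw, lobster in per_judge.values():
        if lobster > claw:
            votes["Lobster"] += 1
        elif claw > lobster:
            votes["Claw"] += 1
        else:
            votes["Tie"] += 1
    winner, n = votes.most_common(1)[0]
    total = len(per_judge)
    # Require a strict majority; otherwise report no consensus.
    if n * 2 <= total:
        return ("No majority", n, total)
    return (winner, n, total)


if __name__ == "__main__":
    for dimension, per_judge in SCORES.items():
        print(dimension, consensus(per_judge))
```

A majority vote discards the size of each score gap, which is why the accuracy row still goes to Lobster even though one judge scored Claw higher.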
DeepSeek's dissent on accuracy: "Claw makes fewer outright errors. Its 10 targets are all genuinely appropriate for inhibition. Lobster includes 3 targets (CPT1A, LEP, SOCS3) that contradict its own inhibition premise; even though flagged, this is a significant accuracy issue."
The strongest mammalian longevity pathway (40-50% lifespan extension in GHR-KO mice). Its absence from a top-10 longevity list is a critical omission.
Values like "DNAmPAI1: 60,578" and "DNAmB2M: 2,041,586" are suspiciously large for elastic net regression. May be fabricated or misattributed from surrogate prediction models.
Khan et al. 2017 found heterozygous loss-of-function, not true null. Homozygous nulls would cause bleeding disorders. "+10 years" is an overstatement from a small cohort.
Claims animal model evidence for inflammation reduction. GDF15 is primarily a stress-response biomarker; inhibiting a potentially protective mitokine could be counterproductive.
Multi-component signaling cascade. Lists weak supplements (curcumin, resveratrol) alongside pharmacological agents. Which subunit? Which upstream kinase?
12/13 negative coefficients = decreases with age. Self-flagged as "activation target" yet still included. Internally contradictory. (Self-identified)
Contradicts inhibition hypothesis. Acknowledged as "more of an activation target" but still included at rank #20. (Self-identified)
SOCS3 is a negative regulator of JAK-STAT. Inhibition would amplify inflammatory signaling, the opposite of the longevity goal. (Self-identified)
Zero lifespan extension experiments, zero clinical trials, zero curated database entries. Strong clock signal ≠ validated intervention target.
PMID 41980208 for a "2026" paper is suspicious. Such a high sequential ID suggests either a very recent or a hallucinated publication.
Intracellular intermediate filament in astrocytes. Reflects neuroinflammation but is not a causal drug target. Would need upstream targeting.
"Lobster for production, but merge Claw's GrimAge depth."
Take Lobster's systematic framework → Add CD38, PAI-1, B2M, TIMP-1 from Claw → Remove CPT1A, SOCS3, LEP contradictions → Enforce directional consistency validation layer.
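The four merge steps above can be sketched as a small list transformation. This is a hedged illustration only: `merge_targets` is a hypothetical helper, the Lobster list is an abbreviated subset rather than the full 20 targets, and the direction labels are placeholders standing in for whatever the validation layer would compute.

```python
def merge_targets(
    lobster: list[str],
    claw_additions: list[str],
    contradictory: set[str],
    direction: dict[str, str],
) -> tuple[list[str], list[str]]:
    """Merge two target lists and enforce directional consistency.

    Returns (kept, rejected). A target is kept only if its direction
    label is "inhibit"; unlabeled targets are rejected for review.
    """
    # Steps 1 and 3: start from Lobster's list, dropping known contradictions.
    merged = [t for t in lobster if t not in contradictory]
    # Step 2: append Claw's targets that are not already present.
    for target in claw_additions:
        if target not in merged and target not in contradictory:
            merged.append(target)
    # Step 4: validation layer, a second guard against wrong-direction targets.
    kept = [t for t in merged if direction.get(t) == "inhibit"]
    rejected = [t for t in merged if direction.get(t) != "inhibit"]
    return kept, rejected


if __name__ == "__main__":
    # Abbreviated, illustrative inputs (not the systems' full outputs).
    lobster = ["MTOR", "GDF15", "CPT1A", "SOCS3", "LEP"]
    claw_additions = ["CD38", "PAI-1", "B2M", "TIMP-1"]
    contradictory = {"CPT1A", "SOCS3", "LEP"}
    # Placeholder labels for the sketch, not findings.
    direction = {t: "inhibit" for t in ["MTOR", "GDF15"] + claw_additions}
    direction.update({t: "activate" for t in contradictory})
    print(merge_targets(lobster, claw_additions, contradictory, direction))
```

Keeping the validation step separate from the explicit removal list means a wrong-direction target is still caught if it is missing from that list.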