Longevity Claw vs Longevity Lobster

Multi-model jury evaluation of two AI longevity target identification systems

📅 May 26, 2026 · 🧪 Evaluated by: GPT-5.5 · Claude Opus 4.7 · DeepSeek V4 Flash

Executive Summary

Drug Discovery Relevance

🦞 Lobster

3/3 judges · Claw 7 vs Lobster 8

Accuracy

🦞 Lobster (2/3)

GPT-5.5 & Opus → Lobster · DeepSeek → Claw

Clinical Translatability

🦞 Lobster

3/3 judges · Claw 7 vs Lobster 8

Production Recommendation

🦞 Lobster

3/3 judges · "Merge Claw's GrimAge depth"

Overall Verdict: Lobster wins on drug discovery relevance (3/3), clinical translatability (3/3), and accuracy (2/3 — DeepSeek awarded accuracy to Claw for fewer per-target errors and no directional contradictions). All three judges recommend Lobster for production use, but unanimously note the ideal system would merge Lobster's systematic framework with Claw's GrimAge-specific depth (CD38, PAI-1, B2M, TIMP-1).

🔬 Production Recommendation: Hybrid System

Lobster's 5-layer methodology + Claw's GrimAge depth. Remove contradictory targets (CPT1A, SOCS3, LEP). Add Claw's unique validated targets.

MTOR · IGF1R/GHR · SRC · GDF15 · PAI-1/SERPINE1 · EDA2R · CD38 · GATA4 · HDAC1 · JAK1/2 · EGLN1 · TNIK · TNF · PTGS2 · B2M

System Overview

🦀 Longevity Claw

10 Targets · GrimAge v2 Focus · L-Qwen3.5-9B · 6× Aggregation

Built as part of the LongevityLLM paper (Cell submission). Wraps L-Qwen3.5-9B in an agentic loop. Queries across 14 hallmarks of aging, generates 30 targets per category, repeats 6×, aggregates. Presents 10 curated targets focused on GrimAge v2 components and clock-validated inhibition targets. Conservative, clinically-oriented selection emphasizing established biology with specific CpG coefficients.
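The report does not say how Claw aggregates its six repeated runs. As a minimal sketch, assuming aggregation means counting how many runs nominate each target and ranking by that count, the step could look like the following. The function name `aggregate_runs` and the toy run contents are hypothetical and are not taken from the Claw pipeline.

```python
from collections import Counter

def aggregate_runs(runs, top_k=10):
    """Rank targets by the number of runs that nominated them.

    Assumption: frequency across runs is the aggregation rule; the
    source only states that results are repeated 6x and aggregated.
    """
    counts = Counter()
    for run in runs:
        counts.update(set(run))  # count each target at most once per run
    # Sort by descending run count, then alphabetically for stable ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]

# Toy runs (made-up contents, for illustration only).
toy_runs = [
    ["MTOR", "CD38", "GDF15"],
    ["MTOR", "CD38"],
    ["MTOR", "TIMP-1"],
]
top_two = aggregate_runs(toy_runs, top_k=2)
print(top_two)
```

Counting each target once per run keeps a single verbose run from dominating the ranking, which is one plausible reason to repeat the query at all.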

🦞 Longevity Lobster

20 Targets · 5 Evidence Layers · 239 Aging Clocks · 1,031 HOA Targets

Systematic multi-database pipeline integrating 5 evidence layers: HOA database (1,031 targets), 239 aging clocks (377,710 features), druggable genome (13,504 genes), ClinicalTrials.gov, and Geroprotector database. Uses coefficient directionality analysis (positive = increases with age → amenable to inhibition). 20 targets with tiered prioritization. Focus: intersection of clock prominence, hallmark breadth, and druggability.
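The coefficient directionality rule (positive coefficient = increases with age = candidate for inhibition) can be sketched as a simple sign-majority classifier. This is an illustration under stated assumptions, not Lobster's actual implementation: the function name `direction`, the 50% majority rule, and the unit-magnitude coefficients are all hypothetical. Only the 12-negative, 1-positive sign pattern for CPT1A comes from the report.

```python
def direction(coefficients):
    """Classify a target from the signs of its clock coefficients.

    Assumption: a simple majority of signs decides the label.
    """
    nonzero = [c for c in coefficients if c != 0]
    if not nonzero:
        return "unknown"
    positive_share = sum(c > 0 for c in nonzero) / len(nonzero)
    if positive_share > 0.5:
        return "inhibit"    # mostly increases with age
    if positive_share < 0.5:
        return "activate"   # mostly decreases with age
    return "ambiguous"      # evenly split

# CPT1A: 12 of 13 coefficients negative (signs from the report;
# magnitudes are placeholders).
cpt1a = [-1.0] * 12 + [1.0]
cpt1a_label = direction(cpt1a)
print(cpt1a_label)
```

Under this rule CPT1A is labeled an activation target, which matches the contradiction the jury flags.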

Target Overlap

Only 2 of 28 unique targets appear in both outputs — remarkably low overlap despite the same goal.

🦀 Claw Only (8)

PAI-1/SERPINE1
CD38
p38 MAPK
B2M (β2-Microglobulin)
p16INK4a/CDKN2A
NF-κB pathway
TIMP-1
BCL-2 family

🤝 Shared (2)

MTOR
GDF15

🦞 Lobster Only (18)

EDA2R
GFAP
IGF1R
ESR1
HDAC1
EGLN1
GATA4
SRC
CPT1A ⚠️
GH1/GHR
SOCS3 ⚠️
JAK1/JAK2
PTGS2
TNF
TNIK
EP300
LEP ⚠️
PARP1

⚠️ = Flagged by Lobster as contradictory inhibition targets (negative coefficients or paradoxical biology)
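The overlap figures can be reproduced with plain set operations on the two target lists above. The Jaccard index is an added summary statistic that the report does not use, and the variable names are illustrative.

```python
# Target lists transcribed from the three columns above.
claw = {
    "PAI-1/SERPINE1", "CD38", "p38 MAPK", "B2M", "p16INK4a/CDKN2A",
    "NF-κB pathway", "TIMP-1", "BCL-2 family", "MTOR", "GDF15",
}
lobster = {
    "EDA2R", "GFAP", "IGF1R", "ESR1", "HDAC1", "EGLN1", "GATA4", "SRC",
    "CPT1A", "GH1/GHR", "SOCS3", "JAK1/JAK2", "PTGS2", "TNF", "TNIK",
    "EP300", "LEP", "PARP1", "MTOR", "GDF15",
}

shared = claw & lobster        # targets in both outputs
all_targets = claw | lobster   # unique targets across both outputs
claw_only = claw - lobster
lobster_only = lobster - claw

# Jaccard index: shared / union (added here for illustration).
jaccard = len(shared) / len(all_targets)

print(sorted(shared), len(all_targets), round(jaccard, 3))
```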

Evidence Quality — Shared Targets

Target	Claw Score	Lobster Score	Assessment
MTOR	9/10	9/10	Equivalent. Both cite rapamycin 9-14% lifespan extension, Harrison et al. 2009, ongoing trials. Lobster adds NCT numbers; Claw cites PEARL trial. Both accurate.
GDF15	6/10	7.5/10	Lobster more rigorous. Provides 8 top-10 clock appearances, specific coefficient data, cites specific papers. Correctly frames as biomarker/proxy. Claw overstates inhibition evidence with vague animal model claims. Neither fully addresses causality question (Goodhart's Law risk).

Jury Scores

Dimension	GPT-5.5	Opus 4.7	DeepSeek V4	Consensus
Drug Discovery Relevance	Claw 7 / Lobster 8	Claw 7 / Lobster 8	Claw 7 / Lobster 8	🦞 Lobster (3/3)
Accuracy	Claw 7 / Lobster 8	Claw 6 / Lobster 7	Claw 7 / Lobster 6	🦞 Lobster (2/3)
Clinical Translatability	Claw 7 / Lobster 8	Claw 7 / Lobster 8	Claw 7 / Lobster 8	🦞 Lobster (3/3)
Production Recommendation	All 3 models recommend Lobster with Claw merger			🦞 Lobster (3/3)

DeepSeek's dissent on accuracy: "Claw makes fewer outright errors. Its 10 targets are all genuinely appropriate for inhibition. Lobster includes 3 targets (CPT1A, LEP, SOCS3) that contradict its own inhibition premise — even though flagged, this is a significant accuracy issue."
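The consensus column can be derived from the per-judge scores by majority vote. The sketch below assumes each judge votes for whichever system it scored higher. The `consensus` helper is hypothetical, and the Production Recommendation row is omitted because it has no numeric scores.

```python
from collections import Counter

# (Claw score, Lobster score) per judge, from the Jury Scores table.
scores = {
    "Drug Discovery Relevance": {
        "GPT-5.5": (7, 8), "Opus 4.7": (7, 8), "DeepSeek V4": (7, 8),
    },
    "Accuracy": {
        "GPT-5.5": (7, 8), "Opus 4.7": (6, 7), "DeepSeek V4": (7, 6),
    },
    "Clinical Translatability": {
        "GPT-5.5": (7, 8), "Opus 4.7": (7, 8), "DeepSeek V4": (7, 8),
    },
}

def consensus(per_judge):
    """Return (winner, votes_for_winner, number_of_judges)."""
    votes = Counter()
    for claw_score, lobster_score in per_judge.values():
        if lobster_score > claw_score:
            votes["Lobster"] += 1
        elif claw_score > lobster_score:
            votes["Claw"] += 1
        else:
            votes["Tie"] += 1
    winner, count = votes.most_common(1)[0]
    return winner, count, len(per_judge)

for dimension, per_judge in scores.items():
    print(dimension, consensus(per_judge))
```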

Identified Mistakes

Claw Mistakes

High · Missing GH/IGF-1R axis entirely

The GH/IGF-1R axis is the strongest mammalian longevity pathway (40-50% lifespan extension in GHR-KO mice), so its absence from a top-10 longevity list is a critical omission.

High · GrimAge coefficients potentially hallucinated

Values like "DNAmPAI1: 60,578" and "DNAmB2M: 2,041,586" are suspiciously large for elastic net regression. May be fabricated or misattributed from surrogate prediction models.

Medium · SERPINE1 "null mutation" is heterozygous LoF

Khan et al. 2017 found heterozygous loss-of-function, not true null. Homozygous nulls would cause bleeding disorders. "+10 years" is an overstatement from a small cohort.

Medium · GDF15 inhibition evidence overstated

Claw claims animal-model evidence that GDF15 inhibition reduces inflammation. GDF15 is primarily a stress-response biomarker; inhibiting a potentially protective mitokine could be counterproductive.

Medium · NF-κB is not a single druggable target

NF-κB is a multi-component signaling cascade, and Claw does not specify which subunit or upstream kinase to target. It also lists weak supplements (curcumin, resveratrol) alongside pharmacological agents.

Lobster Mistakes

High · CPT1A: activation target on inhibition list

12 of 13 clock coefficients are negative, meaning CPT1A decreases with age. Lobster flags it as an "activation target" yet still includes it, which is internally contradictory. (Self-identified)

High · LEP: all negative coefficients

Contradicts inhibition hypothesis. Acknowledged as "more of an activation target" but still included at rank #20. (Self-identified)

High · SOCS3 inhibition increases inflammation

SOCS3 is a negative regulator of JAK-STAT. Inhibition would amplify inflammatory signaling — opposite of longevity goal. (Self-identified)

Medium · EDA2R ranked #4 on purely correlative evidence

Zero lifespan extension experiments, zero clinical trials, zero curated database entries. Strong clock signal ≠ validated intervention target.

Medium · Some PMIDs may be fabricated

PMID 41980208, cited for a "2026" paper, is suspicious. Very high sequential IDs suggest either very recent or hallucinated publications.

Medium · GFAP is a structural marker, not druggable

Intracellular intermediate filament in astrocytes. Reflects neuroinflammation but is not a causal drug target. Would need upstream targeting.

Strengths Analysis

Claw Strengths

Deep GrimAge v2 integration — specific CpG coefficients, sub-model architecture details

Human genetic evidence (PAI-1 Amish cohort) — rare for longevity targets

CD38/NAD+ axis — validated lifespan extension (+10% with 78c) that Lobster missed entirely

Concise, focused, actionable (10 targets) — no dilution with weak candidates

Evidence quality stratification table with clinical staging

Combination therapy concepts (NAD+ boosting + CD38 inhibition)

No internally contradictory targets — all 10 are genuine inhibition candidates

B2M neutralization for cognitive aging — specific organ-system application

Lobster Strengths

Systematic reproducible methodology — 5-layer evidence integration, auditable

Self-critical transparency — explicitly flags own limitations and contradictions

Novel target discovery (TNIK senomorphics, GATA4 mitotic clocks, EDA2R)

Broader validated pathway coverage — GH/IGF1R, SRC/dasatinib, JAK inhibitors

Clinical trial NCT numbers provided for verification

Tiered prioritization (Tier 1/2/3) with explicit rationale scores

Clock coefficient directionality as systematic target selection filter

Includes the most validated mammalian longevity pathway (GH/IGF-1R: 40-50% extension)

More clinically translatable targets with approved drugs for repurposing

Jury Verdict

🏛️ Three-Model Jury Decision

GPT-5.5 (OpenAI)

Lobster wins all 4 dimensions.
"Lobster provides a more comprehensive target landscape with systematic druggability assessment. The IDEAL system combines both."

Claude Opus 4.7 (Anthropic)

Lobster wins all 4 dimensions.
"Lobster is superior for production due to systematic multi-layer evidence integration. Claw's specificity-without-verification is a hallucination red flag."

DeepSeek V4 Flash

Lobster wins 3/4. Claw wins accuracy.
"Claw makes fewer outright errors per target. Lobster's directional contradictions (CPT1A, LEP, SOCS3) are more fundamental than Claw's overstatements."

✅ Unanimous Consensus

"Lobster for production, but merge Claw's GrimAge depth."

Take Lobster's systematic framework → Add CD38, PAI-1, B2M, TIMP-1 from Claw → Remove CPT1A, SOCS3, LEP contradictions → Enforce directional consistency validation layer.
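The merge recipe above can be expressed as a small list transformation. This is a sketch of the jury's recommendation, not an existing tool: the helper `build_hybrid` is hypothetical, and the Lobster list is given in the order shown under Target Overlap, which is not Lobster's own ranking.

```python
# Lobster's 20 targets (shared targets first, then the Lobster-only list).
lobster_targets = [
    "MTOR", "GDF15", "EDA2R", "GFAP", "IGF1R", "ESR1", "HDAC1", "EGLN1",
    "GATA4", "SRC", "CPT1A", "GH1/GHR", "SOCS3", "JAK1/JAK2", "PTGS2",
    "TNF", "TNIK", "EP300", "LEP", "PARP1",
]
# Targets flagged as contradicting the inhibition premise.
contradictory = {"CPT1A", "SOCS3", "LEP"}
# Claw targets the jury recommends adding.
claw_additions = ["CD38", "PAI-1/SERPINE1", "B2M", "TIMP-1"]

def build_hybrid(base, remove, add):
    """Drop flagged targets from base, then append new ones without duplicates."""
    kept = [t for t in base if t not in remove]
    extra = [t for t in add if t not in kept]
    return kept + extra

hybrid = build_hybrid(lobster_targets, contradictory, claw_additions)

# Minimal consistency check: no flagged target may survive the merge.
assert contradictory.isdisjoint(hybrid)
print(len(hybrid), hybrid)
```

A fuller directional validation layer would test each target's coefficient signs rather than rely on a fixed removal list. The fixed list here only mirrors the three targets the jury named.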