Longevity Claw vs Longevity Lobster

Multi-model jury evaluation of two AI longevity target identification systems

📅 May 26, 2026 🧪 Evaluated by: GPT-5.5 · Claude Opus 4.7 · DeepSeek V4 Flash

Executive Summary

Drug Discovery Relevance: 🦞 Lobster (3/3 judges · Claw 7 vs Lobster 8)
Accuracy: 🦞 Lobster (2/3 judges · GPT-5.5 & Opus → Lobster · DeepSeek → Claw)
Clinical Translatability: 🦞 Lobster (3/3 judges · Claw 7 vs Lobster 8)
Production Recommendation: 🦞 Lobster (3/3 judges · "Merge Claw's GrimAge depth")

Overall Verdict: Lobster wins on drug discovery relevance (3/3), clinical translatability (3/3), and accuracy (2/3; DeepSeek awarded accuracy to Claw for fewer per-target errors and no directional contradictions). All three judges recommend Lobster for production use, but unanimously note that the ideal system would merge Lobster's systematic framework with Claw's GrimAge-specific depth (CD38, PAI-1, B2M, TIMP-1).

🔬 Production Recommendation: Hybrid System

Lobster's 5-layer methodology + Claw's GrimAge depth. Remove contradictory targets (CPT1A, SOCS3, LEP). Add Claw's unique validated targets.

MTOR · IGF1R/GHR · SRC · GDF15 · PAI-1/SERPINE1 · EDA2R · CD38 · GATA4 · HDAC1 · JAK1/2 · EGLN1 · TNIK · TNF · PTGS2 · B2M

System Overview

Longevity Claw

10 Targets · GrimAge v2 Focus · L-Qwen3.5-9B · 6× Aggregation

Built as part of the LongevityLLM paper (Cell submission). Wraps L-Qwen3.5-9B in an agentic loop that queries across 14 hallmarks of aging, generates 30 targets per category, repeats the process 6×, and aggregates the results. Presents 10 curated targets focused on GrimAge v2 components and clock-validated inhibition targets. Conservative, clinically oriented selection emphasizing established biology with specific CpG coefficients.
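The report does not say how Claw aggregates its repeated runs. As a minimal sketch, assuming a simple vote-count rule (a target's score is the number of runs that proposed it), the repeat-and-aggregate step could look like the following. The function name `aggregate_runs`, the `top_k` parameter, and the toy input are all hypothetical.

```python
from collections import Counter


def aggregate_runs(runs, top_k=10):
    """Rank targets by how many runs proposed them (assumed vote-count rule)."""
    votes = Counter()
    for run in runs:
        # Count each target at most once per run so that duplicates
        # inside a single run do not inflate its vote total.
        votes.update(set(run))
    # Sort by descending vote count, then alphabetically for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy input: three short runs instead of six runs of 30 targets per category.
example_runs = [
    ["MTOR", "CD38", "GDF15"],
    ["MTOR", "CD38", "TIMP-1"],
    ["MTOR", "B2M", "CD38", "CD38"],
]
top_targets = aggregate_runs(example_runs, top_k=3)
print(top_targets)
```

Counting a target once per run keeps a single verbose run from dominating the ranking; other aggregation rules (rank averaging, score weighting) would be equally plausible readings of "aggregates".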

Longevity Lobster

20 Targets · 5 Evidence Layers · 239 Aging Clocks · 1,031 HOA Targets

Systematic multi-database pipeline integrating 5 evidence layers: HOA database (1,031 targets), 239 aging clocks (377,710 features), druggable genome (13,504 genes), ClinicalTrials.gov, and Geroprotector database. Uses coefficient directionality analysis (positive = increases with age → amenable to inhibition). 20 targets with tiered prioritization. Focus: intersection of clock prominence, hallmark breadth, and druggability.
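The directionality rule above (positive coefficient = increases with age = amenable to inhibition) can be illustrated with a small classifier over a target's clock coefficients. This is a sketch under stated assumptions, not Lobster's actual implementation: the function name `classify_direction`, the `min_fraction` threshold of 0.75, and the placeholder coefficient values are all hypothetical.

```python
def classify_direction(coefficients, min_fraction=0.75):
    """Label a target from the signs of its aging-clock coefficients.

    min_fraction is an assumed threshold: the share of non-zero
    coefficients that must agree in sign before a label is assigned.
    """
    nonzero = [c for c in coefficients if c != 0]
    if not nonzero:
        return "undetermined"
    positive_share = sum(c > 0 for c in nonzero) / len(nonzero)
    if positive_share >= min_fraction:
        # Mostly positive: level rises with age, so inhibition is plausible.
        return "inhibition candidate"
    if positive_share <= 1 - min_fraction:
        # Mostly negative: level falls with age, so inhibition contradicts the premise.
        return "activation candidate"
    return "mixed"


# Placeholder magnitudes; only the sign pattern (12 of 13 negative) mirrors
# what the report states for CPT1A.
cpt1a_like = [-0.1] * 12 + [0.1]
print(classify_direction(cpt1a_like))
```

A filter like this only encodes direction of association with age; it says nothing about causality, which is why a strong clock signal alone does not validate a target.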

Target Overlap

Only 2 of 28 unique targets appear in both outputs, a remarkably low overlap given that both systems pursue the same goal.

🦀 Claw Only (8)

  • PAI-1/SERPINE1
  • CD38
  • p38 MAPK
  • B2M (β2-Microglobulin)
  • p16INK4a/CDKN2A
  • NF-κB pathway
  • TIMP-1
  • BCL-2 family

🤝 Shared (2)

  • MTOR
  • GDF15

🦞 Lobster Only (18)

  • EDA2R
  • GFAP
  • IGF1R
  • ESR1
  • HDAC1
  • EGLN1
  • GATA4
  • SRC
  • CPT1A ⚠️
  • GH1/GHR
  • SOCS3 ⚠️
  • JAK1/JAK2
  • PTGS2
  • TNF
  • TNIK
  • EP300
  • LEP ⚠️
  • PARP1

⚠️ = Flagged by Lobster as contradictory inhibition targets (negative coefficients or paradoxical biology)
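The overlap figures can be checked directly from the three lists above. The sketch below rebuilds each system's target set and computes the intersection, the union, and a Jaccard index; the variable names are illustrative, and NF-κB is written as "NF-kB" to keep the code ASCII-only.

```python
# Target names copied from the Claw Only, Shared, and Lobster Only lists.
claw_only = {
    "PAI-1/SERPINE1", "CD38", "p38 MAPK", "B2M",
    "p16INK4a/CDKN2A", "NF-kB pathway", "TIMP-1", "BCL-2 family",
}
shared = {"MTOR", "GDF15"}
lobster_only = {
    "EDA2R", "GFAP", "IGF1R", "ESR1", "HDAC1", "EGLN1", "GATA4", "SRC",
    "CPT1A", "GH1/GHR", "SOCS3", "JAK1/JAK2", "PTGS2", "TNF", "TNIK",
    "EP300", "LEP", "PARP1",
}

claw = claw_only | shared        # Claw's full output
lobster = lobster_only | shared  # Lobster's full output

overlap = claw & lobster
union = claw | lobster
jaccard = len(overlap) / len(union)

print(len(claw), len(lobster), len(overlap), len(union))
print(f"Jaccard index: {jaccard:.3f}")
```

The counts reconcile with the system overview: 8 + 2 = 10 Claw targets, 18 + 2 = 20 Lobster targets, and 8 + 2 + 18 = 28 unique targets, giving a Jaccard index of 2/28, roughly 0.07.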

Evidence Quality: Shared Targets

Target | Claw Score | Lobster Score | Assessment
MTOR | 9/10 | 9/10 | Equivalent. Both cite rapamycin 9-14% lifespan extension, Harrison et al. 2009, ongoing trials. Lobster adds NCT numbers; Claw cites PEARL trial. Both accurate.
GDF15 | 6/10 | 7.5/10 | Lobster more rigorous. Provides 8 top-10 clock appearances, specific coefficient data, cites specific papers. Correctly frames as biomarker/proxy. Claw overstates inhibition evidence with vague animal model claims. Neither fully addresses the causality question (Goodhart's Law risk).

Jury Scores

Dimension | GPT-5.5 | Opus 4.7 | DeepSeek V4 | Consensus
Drug Discovery Relevance | Claw 7 / Lobster 8 | Claw 7 / Lobster 8 | Claw 7 / Lobster 8 | 🦞 Lobster (3/3)
Accuracy | Claw 7 / Lobster 8 | Claw 6 / Lobster 7 | Claw 7 / Lobster 6 | 🦞 Lobster (2/3)
Clinical Translatability | Claw 7 / Lobster 8 | Claw 7 / Lobster 8 | Claw 7 / Lobster 8 | 🦞 Lobster (3/3)
Production Recommendation | All 3 models recommend Lobster with Claw merger | 🦞 Lobster (3/3)

DeepSeek's dissent on accuracy: "Claw makes fewer outright errors. Its 10 targets are all genuinely appropriate for inhibition. Lobster includes 3 targets (CPT1A, LEP, SOCS3) that contradict its own inhibition premise; even though flagged, this is a significant accuracy issue."
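The consensus column can be reproduced from the per-judge scores. The sketch below assumes each judge "votes" for whichever system it scored higher on a dimension, and that the consensus is the majority of those votes; that rule is an inference from the table, and the names `scores` and `consensus` are illustrative.

```python
# (claw_score, lobster_score) per judge, copied from the jury score table.
scores = {
    "Drug Discovery Relevance": {
        "GPT-5.5": (7, 8), "Opus 4.7": (7, 8), "DeepSeek V4": (7, 8),
    },
    "Accuracy": {
        "GPT-5.5": (7, 8), "Opus 4.7": (6, 7), "DeepSeek V4": (7, 6),
    },
    "Clinical Translatability": {
        "GPT-5.5": (7, 8), "Opus 4.7": (7, 8), "DeepSeek V4": (7, 8),
    },
}


def consensus(per_judge):
    """Return (winner, winning_votes, total_judges) by majority of judges."""
    lobster_votes = sum(lob > claw for claw, lob in per_judge.values())
    claw_votes = sum(claw > lob for claw, lob in per_judge.values())
    if lobster_votes > claw_votes:
        winner = "Lobster"
    elif claw_votes > lobster_votes:
        winner = "Claw"
    else:
        winner = "Tie"
    return winner, max(lobster_votes, claw_votes), len(per_judge)


results = {dimension: consensus(votes) for dimension, votes in scores.items()}
for dimension, (winner, votes, total) in results.items():
    print(f"{dimension}: {winner} ({votes}/{total})")
```

Under this rule the only split is Accuracy, where DeepSeek's 7 vs 6 in Claw's favor produces the 2/3 result.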

Identified Mistakes

Claw Mistakes

[High] Missing GH/IGF-1R axis entirely

The strongest mammalian longevity pathway (40-50% lifespan extension in GHR-KO mice). Its absence from a top-10 longevity list is a critical omission.

[High] GrimAge coefficients potentially hallucinated

Values like "DNAmPAI1: 60,578" and "DNAmB2M: 2,041,586" are suspiciously large for elastic net regression coefficients. They may be fabricated or misattributed from surrogate prediction models.

[Medium] SERPINE1 "null mutation" is heterozygous LoF

Khan et al. 2017 found a heterozygous loss-of-function variant, not a true null. Homozygous nulls would cause bleeding disorders. "+10 years" is an overstatement from a small cohort.

[Medium] GDF15 inhibition evidence overstated

Claims animal model evidence for inflammation reduction. GDF15 is primarily a stress-response biomarker; inhibiting a potentially protective mitokine could be counterproductive.

[Medium] NF-κB is not a single druggable target

NF-κB is a multi-component signaling cascade. Claw lists weak supplements (curcumin, resveratrol) alongside pharmacological agents without specifying which subunit or upstream kinase to target.

Lobster Mistakes

[High] CPT1A: activation target on inhibition list

12 of 13 clock coefficients are negative, meaning CPT1A decreases with age. Self-flagged as an "activation target" yet still included, which is internally contradictory. (Self-identified)

[High] LEP: all negative coefficients

Contradicts the inhibition hypothesis. Acknowledged as "more of an activation target" but still included at rank #20. (Self-identified)

[High] SOCS3 inhibition increases inflammation

SOCS3 is a negative regulator of JAK-STAT. Inhibition would amplify inflammatory signaling, the opposite of the longevity goal. (Self-identified)

[Medium] EDA2R ranked #4 on purely correlative evidence

Zero lifespan extension experiments, zero clinical trials, zero curated database entries. Strong clock signal ≠ validated intervention target.

[Medium] Some PMIDs may be fabricated

PMID 41980208 for a "2026" paper is suspicious. Very high sequential IDs suggest either very recent or hallucinated publications.

[Medium] GFAP is a structural marker, not druggable

Intracellular intermediate filament in astrocytes. Reflects neuroinflammation but is not a causal drug target; an intervention would need to target upstream pathways.

Strengths Analysis

Claw Strengths

  • Deep GrimAge v2 integration: specific CpG coefficients, sub-model architecture details
  • Human genetic evidence (PAI-1 Amish cohort), which is rare for longevity targets
  • CD38/NAD+ axis: validated lifespan extension (+10% with 78c) that Lobster missed entirely
  • Concise, focused, actionable (10 targets), with no dilution from weak candidates
  • Evidence quality stratification table with clinical staging
  • Combination therapy concepts (NAD+ boosting + CD38 inhibition)
  • No internally contradictory targets: all 10 are genuine inhibition candidates
  • B2M neutralization for cognitive aging, a specific organ-system application

Lobster Strengths

  • Systematic reproducible methodology: 5-layer evidence integration, auditable
  • Self-critical transparency: explicitly flags own limitations and contradictions
  • Novel target discovery (TNIK senomorphics, GATA4 mitotic clocks, EDA2R)
  • Broader validated pathway coverage: GH/IGF1R, SRC/dasatinib, JAK inhibitors
  • Clinical trial NCT numbers provided for verification
  • Tiered prioritization (Tier 1/2/3) with explicit rationale scores
  • Clock coefficient directionality as systematic target selection filter
  • Includes the most validated mammalian longevity pathway (GH/IGF-1R: 40-50% extension)
  • More clinically translatable targets with approved drugs for repurposing

Jury Verdict

🏛️ Three-Model Jury Decision

GPT-5.5 (OpenAI)
Lobster wins all 4 dimensions.
"Lobster provides a more comprehensive target landscape with systematic druggability assessment. The IDEAL system combines both."

Claude Opus 4.7 (Anthropic)
Lobster wins all 4 dimensions.
"Lobster is superior for production due to systematic multi-layer evidence integration. Claw's specificity-without-verification is a hallucination red flag."

DeepSeek V4 Flash
Lobster wins 3/4. Claw wins accuracy.
"Claw makes fewer outright errors per target. Lobster's directional contradictions (CPT1A, LEP, SOCS3) are more fundamental than Claw's overstatements."

✅ Unanimous Consensus

"Lobster for production, but merge Claw's GrimAge depth."

Take Lobster's systematic framework → Add CD38, PAI-1, B2M, TIMP-1 from Claw → Remove CPT1A, SOCS3, LEP contradictions → Enforce directional consistency validation layer.
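The four steps of the consensus recipe map onto simple set operations plus a guard. The sketch below is one possible reading, not a specification from the jury: the function names `build_hybrid` and `validate_directional_consistency` are hypothetical, and the validation step is reduced to rejecting any list that still contains a target flagged as directionally contradictory.

```python
# Lobster's 20 targets (18 Lobster-only plus the 2 shared with Claw).
lobster_targets = {
    "MTOR", "GDF15", "EDA2R", "GFAP", "IGF1R", "ESR1", "HDAC1", "EGLN1",
    "GATA4", "SRC", "CPT1A", "GH1/GHR", "SOCS3", "JAK1/JAK2", "PTGS2",
    "TNF", "TNIK", "EP300", "LEP", "PARP1",
}
claw_additions = {"CD38", "PAI-1/SERPINE1", "B2M", "TIMP-1"}
contradictory = {"CPT1A", "SOCS3", "LEP"}


def validate_directional_consistency(targets, flagged):
    """Raise if any directionally contradictory target is still present."""
    leftover = set(targets) & set(flagged)
    if leftover:
        raise ValueError(f"contradictory targets present: {sorted(leftover)}")
    return True


def build_hybrid(base, additions, flagged):
    """Merge the base list with the additions, drop flagged targets, validate."""
    merged = (set(base) | set(additions)) - set(flagged)
    validate_directional_consistency(merged, flagged)
    return merged


hybrid = build_hybrid(lobster_targets, claw_additions, contradictory)
print(len(hybrid), sorted(hybrid))
```

Under these assumptions the result is the full merged pool of 20 + 4 - 3 = 21 targets, before any further curation. Running the same validator on Lobster's unmodified list raises an error, which is the behavior a directional consistency layer is meant to enforce.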