
ESM3, Evo 2 and the models that learn the language of life

11 Jun 2026

Proteins are instructions. DNA is the source code. The hypothesis that has guided computational biology for decades is that, if language models can learn the grammar of human text, perhaps they can also learn the grammar of the molecules of life. In 2025, two releases made this hypothesis less theoretical and more experimental: ESM3 from EvolutionaryScale and Evo 2 from the Arc Institute. Both published results that, five years ago, would have been considered science fiction.

ESM3: the model that designed a protein 500 million years ahead

ESM3 is a 98-billion-parameter model trained to reason over three aspects of a protein at once: amino acid sequence (the text), three-dimensional structure (the shape) and biological function (the meaning). Published in Science in January 2025, the model represents a qualitative leap over its predecessor, which only processed sequences.

The paper's most striking result: ESM3 designed a completely new green fluorescent protein (GFP), one it had never "seen" in training. The protein fluoresces as expected, yet it is so different from known natural GFPs that the researchers estimated the gap as equivalent to 500 million years of natural evolution.

This is not "generating a protein similar to existing ones." It is generating a protein that does not exist in nature, that belongs to a known class, and that works correctly. The difference is the same as that between a language model that generates text similar to existing texts and one that generates a new literary genre that did not yet exist.

The ESM3 API opened in public beta in January 2025, and the model is also available on Amazon Bedrock, SageMaker, AWS HealthOmics and NVIDIA BioNeMo. Researchers in any lab with internet access can now query the model for protein design.

Evo 2: 40 billion parameters to read the complete genome

Evo 2, published in Nature in 2025 by the Arc Institute in collaboration with NVIDIA, Stanford, UCSF, UC Berkeley and the University of Washington, is today the largest publicly available biological AI model with open weights. It has 40 billion parameters and was trained on 9 trillion nucleotides drawn from the genomes of more than 100,000 organisms across all domains of life.

Its most distinctive capability is a context window of 1 megabase: 1 million nucleotides at once. For comparison, the average human gene spans about 27 thousand base pairs. With 1 megabase of context, Evo 2 can analyze a complete gene together with its regulatory regions, neighboring non-coding regions and distal control elements in a single inference.
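The comparison can be made concrete with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the two figures quoted in the text; the helper names `fits_in_context` and `max_flank` are made up for illustration and are not part of any Evo 2 tooling.

```python
# Figures quoted in the text; everything else here is illustrative.
CONTEXT_WINDOW_BP = 1_000_000  # Evo 2 context window: 1 megabase
AVERAGE_GENE_BP = 27_000       # approximate average human gene length


def fits_in_context(gene_bp: int, flank_bp: int,
                    window_bp: int = CONTEXT_WINDOW_BP) -> bool:
    """True if a gene plus an equal flank on each side fits in one window."""
    return gene_bp + 2 * flank_bp <= window_bp


def max_flank(gene_bp: int, window_bp: int = CONTEXT_WINDOW_BP) -> int:
    """Largest symmetric flank (in bp) that still fits alongside the gene."""
    return max(0, (window_bp - gene_bp) // 2)


genes_per_window = CONTEXT_WINDOW_BP // AVERAGE_GENE_BP
print(genes_per_window)            # 37 average-length genes per window
print(max_flank(AVERAGE_GENE_BP))  # 486500 bp of flank on each side
```

In other words, an average-length gene leaves room for nearly half a megabase of surrounding sequence on each side, which is where distal control elements tend to sit.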

Capabilities verified in zero-shot (without specific fine-tuning):

  • Prediction of gene essentiality: determining which genes are critical for a cell's survival
  • Prediction of pathogenic mutations at single-nucleotide resolution
  • Generation of eukaryotic mitochondrial genome sequences, with the encoded proteins checked using AlphaFold 3
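Zero-shot mutation scoring with a sequence model generally works by comparing how likely the model finds the mutated sequence versus the reference. The sketch below shows that idea with a toy k-mer counter standing in for the language model; it does not call Evo 2, and `train_kmer_model`, `log_likelihood` and `variant_score` are hypothetical names chosen for this example.

```python
import math
from collections import Counter


def train_kmer_model(sequences, k=3):
    """Count k-mers; a toy stand-in for a genomic language model."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def log_likelihood(seq, counts, k=3):
    """Sum of add-one-smoothed log-probabilities of each k-mer in seq."""
    total = sum(counts.values())
    vocab = 4 ** k  # number of possible DNA k-mers
    return sum(
        math.log((counts[seq[i:i + k]] + 1) / (total + vocab))
        for i in range(len(seq) - k + 1)
    )


def variant_score(reference, position, alt_base, counts, k=3):
    """Log-likelihood of the variant minus that of the reference.

    Negative scores mean the model finds the variant less plausible.
    """
    variant = reference[:position] + alt_base + reference[position + 1:]
    return log_likelihood(variant, counts, k) - log_likelihood(reference, counts, k)


counts = train_kmer_model(["ATGATGATGATGATG"])
reference = "ATGATGATG"
score = variant_score(reference, 4, "C", counts)  # T -> C at index 4
print(score < 0)  # True: the substitution creates k-mers the model never saw
```

With a real model the per-position probabilities come from the network rather than from counts, but the score is still a difference of log-likelihoods, which is why no task-specific fine-tuning is needed.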

This last point deserves attention: the model did not just predict properties of existing sequences. It generated DNA sequences for complex genomes whose encoded proteins, once their structures are predicted with AlphaFold 3, fold into plausible 3D shapes. It is biological generation, not just classification.

ProGen3: antibody design in one step

ProGen3 from Profluent Bio, presented as a Spotlight at NeurIPS 2025, is a generative protein model trained on 3.4 billion full-length sequences, scaling from 339 million to 46 billion parameters. The model uses a generalized Masked Language Model (GLM) architecture that fills in any portion of a protein sequence conditioned on the surrounding context.
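To picture what "filling in any portion of a sequence" means, here is one common way an infilling task can be framed: cut a span out, leave a placeholder, and ask the model to produce the missing piece given what remains on both sides. This is a generic sketch, not ProGen3's actual input format; the `<mask>` and `<sep>` tokens and both function names are assumptions made for illustration.

```python
MASK = "<mask>"  # hypothetical placeholder for the removed span
SEP = "<sep>"    # hypothetical marker after which the model would write the span


def make_infilling_example(sequence: str, start: int, end: int) -> tuple[str, str]:
    """Split a sequence into (prompt, target) for span infilling.

    The prompt keeps the context on both sides of the span; the target is
    the span a model would be asked to generate.
    """
    if not 0 <= start < end <= len(sequence):
        raise ValueError("span must lie inside the sequence")
    prompt = sequence[:start] + MASK + sequence[end:] + SEP
    target = sequence[start:end]
    return prompt, target


def fill_in(prompt: str, completion: str) -> str:
    """Insert a generated completion back into the masked position."""
    body = prompt.removesuffix(SEP)
    return body.replace(MASK, completion, 1)


prompt, target = make_infilling_example("MKTAYIAKQR", 3, 6)
print(prompt)                   # MKT<mask>AKQR<sep>
print(target)                   # AYI
print(fill_in(prompt, target))  # MKTAYIAKQR
```

Because the span can sit anywhere, the same setup covers redesigning a loop in the middle of a protein, extending one end, or generating a whole sequence from nothing.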

The most direct application is OpenAntibodies, a platform for single-shot antibody design against specific molecular targets. The portfolio covers 20 relevant drug targets that correspond to 7 million patients and US$660 billion in historical drug sales. ProGen3 also designed an ultra-compact gene editor, significantly smaller than the standard CRISPR-Cas9, with implications for gene therapies that must be delivered in vivo in size-limited vehicles.

What these models change in practice

For pharmacology researchers, the most immediate change is the cost of hypotheses. Previously, testing whether a given amino acid sequence produced a functional protein with a given structure required weeks of experimental work. With ESM3 or Evo 2, this screening can be done computationally in hours, narrowing millions of candidates down to the dozens that merit synthesis and physical testing.
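The funnel described above boils down to scoring every candidate and keeping only the best few. A minimal sketch follows, assuming some scoring function exists; `toy_score` is a deliberately crude placeholder (fraction of hydrophobic residues), not a model output, and `shortlist` is a hypothetical helper name.

```python
import heapq
from typing import Callable, Iterable


def shortlist(candidates: Iterable[str],
              score: Callable[[str], float],
              keep: int) -> list[tuple[float, str]]:
    """Return the `keep` highest-scoring candidates as (score, sequence), best first.

    Candidates are consumed lazily, so the pool can be a generator
    that never sits in memory all at once.
    """
    return heapq.nlargest(keep, ((score(seq), seq) for seq in candidates))


def toy_score(seq: str) -> float:
    """Placeholder score: fraction of hydrophobic residues in the sequence."""
    hydrophobic = set("AILMFVWY")
    return sum(res in hydrophobic for res in seq) / len(seq)


pool = ["MKTA", "AILM", "GGSG", "FVWY", "MAIG"]
top = shortlist(pool, toy_score, keep=2)
print([seq for _, seq in top])  # ['FVWY', 'AILM']
```

In a real pipeline the score would come from a model call and `keep` would be set by how many constructs the lab can afford to synthesize.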

For antibiotic development, a critical area given rising antimicrobial resistance, the ability to rapidly identify and evaluate pathogen proteins as drug targets is potentially transformative. Evo 2 can analyze complete genomic sequences of resistant bacteria and help identify essential proteins that have no homology with human proteins, which makes them ideal candidates for antibiotics with fewer side effects.

What these models do not replace: experimental validation. A model can predict that a protein is functional, but only synthesis and lab testing can confirm it. The value lies in drastically compressing the list of candidates for physical testing, not in eliminating the test.
