MedGemma and the open source medical AI movement

In May 2025, Google released MedGemma through the Health AI Developer Foundations (HAI-DEF) program, making available an open-weight medical language model that healthcare teams can run on their own infrastructure, fine-tune on their own data, and audit. The launch signals a strategic shift: medical AI no longer has to be the exclusive property of large platforms. Hospitals, universities and health startups can build on a solid base without depending on external APIs.

MedGemma: architecture and capabilities

MedGemma is built on the Gemma 3 architecture and comes in three variants:

4B multimodal: Processes images and text simultaneously. Suited for medical image classification (X-ray, dermatology, ophthalmology), answering questions about clinical images, and initial triage.

27B text-only: A pure language model focused on clinical reasoning, medical literature review, answering structured clinical questions, and text-based decision support. It scored 87.7% on MedQA — within the range of much larger models.

27B multimodal: Combines clinical reasoning in text with image processing and longitudinal reasoning over electronic health records (EHR). The most complete variant for hospital systems.

The model can be self-hosted: on Google Cloud Platform, on the institution's own servers, or on high-performance local hardware. When it runs on-premise, patient data does not need to leave the institution's infrastructure, which addresses the main adoption obstacle for hospitals whose data is regulated by LGPD, HIPAA or GDPR.

MedSigLIP: the vision component

Alongside MedGemma, Google released MedSigLIP — a 400-million-parameter vision-language model specialized in medical images. MedSigLIP can be used independently for medical image classification without the overhead of the full MedGemma. For hospitals that only need X-ray or dermatoscopy analysis, without full text generation, it is the lowest-compute-cost option.
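
As an illustration of how a SigLIP-style model can classify an image without generating text, the sketch below scores an image embedding against one text embedding per candidate label and passes each similarity through a sigmoid, so every label gets an independent probability. It is a minimal sketch, not MedSigLIP's actual API: the three-dimensional embeddings, the label strings, and the `logit_scale` and `logit_bias` values are made-up placeholders, and in practice the embeddings would come from the model's image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def zero_shot_scores(image_emb, label_embs, logit_scale=10.0, logit_bias=-5.0):
    """Score each label independently (sigmoid), not jointly (softmax).

    logit_scale and logit_bias are hypothetical values chosen for this toy
    example; a trained model ships its own learned values.
    """
    return {
        label: sigmoid(logit_scale * cosine(image_emb, emb) + logit_bias)
        for label, emb in label_embs.items()
    }


# Toy embeddings standing in for encoder outputs (assumption, not real data).
image_embedding = [0.9, 0.1, 0.0]
label_embeddings = {
    "chest X-ray with pleural effusion": [1.0, 0.0, 0.0],
    "normal chest X-ray": [0.0, 1.0, 0.0],
}

for label, score in zero_shot_scores(image_embedding, label_embeddings).items():
    print(f"{label}: {score:.3f}")
```

Because each label is scored on its own, the probabilities need not sum to one, which suits findings that can co-occur in the same image.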

OpenBioLLM: the open source competitor

OpenBioLLM-70B from Saama AI Research is the most robust alternative to MedGemma in the open source space. Based on the Llama 3 architecture, it scores 74% on MedQA USMLE, 75% on PubMedQA and 80.85% on complex clinical cases. In 16-bit precision its weights alone take about 140 GB, so running it on a single 80 GB A100 GPU requires quantization. The 8B-parameter model is an option for environments with more limited hardware.
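
Whether a model of this size fits on one GPU depends largely on numeric precision. The sketch below is a rough, weights-only estimate (parameter count times bytes per parameter). It assumes the 80 GB A100 variant and ignores KV cache, activations and runtime overhead, so real requirements are higher than these figures.

```python
# Approximate storage per parameter at common precisions.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions, precision):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9


def fits_on_gpu(params_billions, precision, gpu_gb=80.0):
    """True if the weights alone are smaller than the GPU's memory.

    gpu_gb=80.0 assumes the 80 GB A100; a 40 GB variant also exists.
    """
    return weight_memory_gb(params_billions, precision) < gpu_gb


for size in (70, 8):
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(size, precision)
        verdict = "fits" if fits_on_gpu(size, precision) else "does not fit"
        print(f"{size}B at {precision}: ~{gb:.0f} GB of weights, {verdict} in 80 GB")
```

A result of "fits" here only means the weights load; headroom for the KV cache at the intended context length still has to be checked separately.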

OpenBioLLM's differentiator is its specialization in biomedical literature: it was extensively trained on PubMed, PMC and international medical guidelines. For scientific literature retrieval and article summarization, it surpasses general models of similar capacity.

Meditron3: for low-resource contexts

Meditron, developed by EPFL in partnership with Yale Medicine and the International Committee of the Red Cross (ICRC), was designed for a specific use case: healthcare in resource-limited contexts. The third version (Meditron3), based on Llama 3, was released in 2025 and outperforms all open source models of equivalent size on MedQA and MedMCQA.

Its differentiator is not the top benchmark score: MedGemma 27B is clearly superior in absolute capability. The differentiator is size. Meditron3 runs on hardware that exists in hospitals in low- and middle-income countries, and its training data includes WHO guidelines and international protocols relevant to contexts where access to specialists is limited.

The counterintuitive finding: fine-tuning doesn't always win

An important research finding (arXiv:2408.13833, posted to arXiv in August 2024) questions the basic premise of specialized medical models: fine-tuned biomedical models do not consistently outperform general frontier models on medical data not seen during training.

The reason is that models like GPT-5 and Gemini 3.1 Pro, trained on massive volumes of text, have already processed enormous amounts of medical literature, potentially more than any specialized medical fine-tune sees. On well-known benchmarks, the fine-tune can win in part because benchmark data may have leaked into its training set. On genuinely new data, the gap narrows.
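
One simple way to probe for this kind of leakage is to look for long word n-grams shared between benchmark items and training text. The sketch below is a generic heuristic and not the method of the cited paper. The 8-word window, the 0.5 threshold and the sample strings are illustrative assumptions.

```python
import re


def ngrams(text, n=8):
    """Set of lowercase word n-grams in text (punctuation ignored)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_item, training_docs, n=8):
    """Fraction of the item's n-grams that also occur in the training docs."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0  # item shorter than n words: nothing to compare
    corpus_grams = set()
    for doc in training_docs:
        corpus_grams |= ngrams(doc, n)
    return len(item_grams & corpus_grams) / len(item_grams)


def is_possibly_leaked(benchmark_item, training_docs, n=8, threshold=0.5):
    """Flag an item when at least `threshold` of its n-grams are in the corpus."""
    return overlap_ratio(benchmark_item, training_docs, n) >= threshold


# Hypothetical strings standing in for a training corpus and benchmark items.
corpus = [
    "Forum dump: A 54-year-old man presents with crushing chest pain "
    "radiating to the left arm. Answer: B",
]
seen = "A 54-year-old man presents with crushing chest pain radiating to the left arm"
fresh = ("A 31-year-old woman reports two weeks of intermittent fever "
         "after returning from travel")

print(is_possibly_leaked(seen, corpus), is_possibly_leaked(fresh, corpus))
```

Exact n-gram matching misses paraphrased or translated copies, so a clean result lowers the suspicion of leakage rather than ruling it out.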

This does not invalidate MedGemma or OpenBioLLM. Open source models have advantages that go beyond the benchmark score: they run on-premise (privacy), are fine-tunable on proprietary data (vertical specialization), cost far less at scale (own infrastructure), and can be fully audited (regulatory compliance).

For a hospital that needs to process 10 million records per year on HIPAA-compliant infrastructure, MedGemma 27B with no per-token API fees is a radically different proposition from GPT-5 at US$ 5.00/M tokens, although self-hosting carries its own hardware and operations costs. Even if GPT-5 is slightly superior in quality, the economic and regulatory argument may be decisive.
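
The size of that economic gap depends on inputs the comparison leaves open. The sketch below makes the arithmetic explicit, using the 10 million records and US$ 5.00/M tokens from the text. The tokens per record, GPU count and GPU hourly rate are placeholder assumptions for illustration only, and the API price is treated as a single blended rate.

```python
HOURS_PER_YEAR = 24 * 365


def api_cost_usd(records, tokens_per_record, usd_per_million_tokens):
    """Yearly API bill for a given volume at a flat per-token price."""
    return records * tokens_per_record / 1_000_000 * usd_per_million_tokens


def self_hosted_cost_usd(gpu_count, usd_per_gpu_hour, hours=HOURS_PER_YEAR):
    """Yearly GPU cost only; staff, storage and power are not included."""
    return gpu_count * usd_per_gpu_hour * hours


def break_even_tokens_per_record(records, usd_per_million_tokens, self_hosted_usd):
    """Tokens per record at which the API bill equals the self-hosted cost."""
    return self_hosted_usd / usd_per_million_tokens * 1_000_000 / records


records_per_year = 10_000_000   # from the text
price_per_million = 5.00        # from the text, in US$
tokens_per_record = 2_000       # assumption: depends on record length
gpu_count = 2                   # assumption
usd_per_gpu_hour = 3.00         # assumption

api = api_cost_usd(records_per_year, tokens_per_record, price_per_million)
hosted = self_hosted_cost_usd(gpu_count, usd_per_gpu_hour)
even = break_even_tokens_per_record(records_per_year, price_per_million, hosted)

print(f"API: US$ {api:,.0f} per year")
print(f"Self-hosted GPUs: US$ {hosted:,.0f} per year")
print(f"Break-even: {even:,.0f} tokens per record")
```

With these placeholder inputs the API bill is US$ 100,000 per year. The comparison moves in either direction as record length, GPU utilization and staffing costs change, so it is worth rerunning with an institution's own figures.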
