H-optimus-1, development of a foundation model in histology

This project aims to develop a foundation model for histology. This artificial intelligence (AI) tool, trained on billions of images of human tissue, could revolutionize diagnosis and biomedical research. By analyzing complex cellular patterns, it can improve the characterization of diseases such as cancer and help predict treatment response, paving the way for more effective precision medicine.

Histology and Computational Pathology

Histology is a branch of biology that studies the microscopic structure and organization of biological tissues. By observing the characteristics of these tissues, such as the shape of cells or their arrangement, doctors can, for example, diagnose a cancer and determine its type or aggressiveness. This analysis is therefore crucial in oncology, but also for other diseases, where it helps predict disease progression and guide therapeutic decisions.

Computational pathology is an emerging field that uses AI to analyze digital images of tissue slides. This algorithmic approach can extract complex information, such as tumor cell density, the immune response, or gene mutations, with a speed and accuracy that sometimes exceed those of the human eye. Developing AI models in this field is challenging, primarily because digitized images of histological samples are very large and because annotated data for training and robustly evaluating these models are limited.
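Because a whole slide is far too large to feed to a model in one piece, a common approach is to cut it into small fixed-size patches (the H-optimus-1 training set uses 224 x 224 pixel patches). The sketch below enumerates a non-overlapping grid of patch corners. It is a minimal illustration under stated assumptions: the function names, the `stride` parameter and the `keep` predicate (a placeholder for a tissue/background filter) are hypothetical and do not describe the actual H-optimus pipeline.

```python
from typing import Callable, Iterator, Optional, Tuple


def tile_coordinates(
    width: int,
    height: int,
    tile: int = 224,
    stride: Optional[int] = None,
    keep: Optional[Callable[[int, int], bool]] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (x, y) corner of every tile x tile patch lying fully inside the slide.

    stride defaults to tile (non-overlapping grid). keep is a hypothetical hook,
    e.g. a tissue detector that returns False for blank background.
    """
    stride = tile if stride is None else stride
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            if keep is None or keep(x, y):
                yield x, y


def count_tiles(width: int, height: int, tile: int = 224, stride: Optional[int] = None) -> int:
    """Closed-form count of grid positions, without enumerating them."""
    stride = tile if stride is None else stride
    if width < tile or height < tile:
        return 0
    return ((width - tile) // stride + 1) * ((height - tile) // stride + 1)


# Small example: a 1,000 x 500 pixel region yields a 4 x 2 grid of patches.
print(count_tiles(1000, 500))            # 8
print(list(tile_coordinates(448, 224)))  # [(0, 0), (224, 0)]
```

For a slide of about 100,000 x 100,000 pixels, `count_tiles(100_000, 100_000)` gives 198,916 non-overlapping 224 x 224 patches before any background filtering, which is why slide-level models usually work on patch-level features rather than raw pixels.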

Advent of the first feature extractors specialized for histology

In recent years, the scientific community has made considerable progress in the development of computational pathology models, moving from simple cell detection tools to AI systems capable of predicting the status of complex molecular biomarkers, therapeutic response, and prognosis. The performance of these models on these tasks has also increased greatly, largely thanks to increasingly complex and powerful feature extraction models. In particular, self-supervised learning (SSL) has made it possible to pre-train feature extractors directly on histological datasets, which has led to a breakthrough in the performance of AI systems for computational pathology. In 2019, Kather et al. showed that their model could predict microsatellite instability (MSI) in colorectal cancer with an AUC (area under the ROC curve) of 0.77 [1]. More recent models based on SSL achieve an AUC of approximately 0.94 on the same task [2]. This sharp increase in performance at identifying MSI, a phenotype that predicts response to immunotherapy, is key to integrating this type of tool into clinical practice.
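The AUC figures are easier to interpret with the metric's pairwise definition: the AUC equals the probability that a randomly chosen positive case (here, an MSI tumor) receives a higher model score than a randomly chosen negative case, with ties counting as half. An AUC of 0.5 is chance level and 1.0 is perfect separation. The pure-Python sketch below computes it directly from that definition; the function name and toy scores are illustrative only and are not data from [1] or [2].

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs in which the positive
    case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # O(len(pos) * len(neg)): fine for a sketch, slow for large cohorts
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = MSI, 0 = microsatellite stable (made-up scores, not study data).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Read this way, moving from 0.77 to about 0.94 means the model orders an MSI/non-MSI pair correctly far more often, which matters when the score is used to decide who should be tested or treated.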

Following recent advances in SSL and the emergence of large language models, the scientific community has begun developing feature extractors for computational pathology at an unprecedented scale. The idea is to train massive feature extractors on huge datasets of unannotated histological images. By learning generic and rich visual representations from these vast data collections, these models, often called “foundation models”, can then be adapted to specific tasks with only a small amount of labeled data. This approach not only significantly improves the performance and generalization of AI systems for histological image analysis, but also helps overcome the scarcity of annotated data in pathology.
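One common way to adapt a foundation model with little labeled data is a “linear probe”: keep the pretrained feature extractor frozen, turn each slide into a single vector (here by averaging its patch embeddings), and fit only a logistic-regression classifier on top. The numpy sketch below illustrates that idea under loud assumptions: random vectors stand in for real H-optimus features, the 32-dimensional size, mean pooling and all helper names are hypothetical, and this is not presented as the authors' adaptation method (attention-based multiple-instance learning is another widely used aggregation).

```python
import numpy as np


def slide_embedding(patch_features: np.ndarray) -> np.ndarray:
    """Mean-pool an (n_patches, dim) array of frozen patch features into one (dim,) vector."""
    return patch_features.mean(axis=0)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_linear_probe(X: np.ndarray, y: np.ndarray, lr: float = 0.1,
                       epochs: int = 500, l2: float = 1e-3):
    """Fit logistic regression on fixed embeddings X (n, dim) with 0/1 labels y.
    Only w and b are learned; the feature extractor itself is never updated."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y  # gradient of the log-loss w.r.t. the logits
        w -= lr * (X.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def make_toy_slides(rng: np.random.Generator, n_slides: int,
                    n_patches: int = 50, dim: int = 32):
    """Hypothetical stand-in for foundation-model features: random patch vectors,
    shifted by +0.5 on every dimension for 'positive' slides."""
    y = rng.integers(0, 2, size=n_slides)
    X = np.stack([slide_embedding(rng.normal(size=(n_patches, dim)) + 0.5 * label)
                  for label in y])
    return X, y.astype(float)


rng = np.random.default_rng(0)
X_train, y_train = make_toy_slides(rng, 40)  # a small labeled set
X_test, y_test = make_toy_slides(rng, 40)
w, b = train_linear_probe(X_train, y_train)
accuracy = np.mean((sigmoid(X_test @ w + b) > 0.5) == (y_test == 1))
print(f"held-out accuracy: {accuracy:.2f}")
```

The design choice is deliberate: because only `dim + 1` numbers are trained, a few dozen labeled slides can be enough when the frozen representation already separates the classes well, which is the promise of a good foundation model.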

H-optimus-0 and the development of H-optimus-1

In July 2024, we launched H-optimus-0 [3], then the largest open-source foundation model for histology. This model was trained with SSL on hundreds of millions of images and achieved state-of-the-art performance on several tasks. In this Grand Challenge, we aim to go further by training a new version of the model on a significantly larger and more heterogeneous dataset, while exploring even more complex model architectures. The H-optimus-1 training set contains more than 2 billion histology images (patches of 224 x 224 pixels) extracted from more than 1 million histology slides (images on the order of 100,000 x 100,000 pixels), representing 240 terabytes of data. These images cover more than 50 different organs of the human body and come from more than 800,000 patients. Numerous experimental training runs were carried out before the final model, a vision transformer [4] with 1.1 billion parameters, was trained for several days on around a hundred H100 GPUs.
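The headline figures can be sanity-checked with a little arithmetic. The sketch below derives per-slide and per-patch averages (treating the “more than” figures as exact and 1 TB as 10^12 bytes, both simplifications), the number of tokens a vision transformer [4] cuts a 224 x 224 patch into, and a rough parameter count. The patch sizes 16 and 14 and the 40-block, width-1536 configuration are hypothetical values chosen for illustration; the text only states that the final model has 1.1 billion parameters.

```python
def dataset_averages(n_patches: int = 2_000_000_000, n_slides: int = 1_000_000,
                     n_bytes: int = 240 * 10**12, tile: int = 224, channels: int = 3) -> dict:
    """Order-of-magnitude averages from the headline figures (lower bounds, decimal TB)."""
    return {
        "patches_per_slide": n_patches / n_slides,
        "stored_bytes_per_patch": n_bytes / n_patches,
        "raw_uint8_rgb_bytes_per_patch": tile * tile * channels,
    }


def vit_tokens(image_size: int = 224, patch_size: int = 14) -> int:
    """Number of patch tokens a ViT cuts a square image into (class token excluded)."""
    return (image_size // patch_size) ** 2


def vit_params_estimate(depth: int, width: int) -> int:
    """Rule of thumb ~12 * depth * width^2: per block, 4 * width^2 for the attention
    projections plus 8 * width^2 for an MLP with hidden size 4 * width
    (embeddings, norms and biases ignored)."""
    return 12 * depth * width ** 2


print(dataset_averages())
print(vit_tokens(224, 16), vit_tokens(224, 14))  # 196 256
print(vit_params_estimate(24, 1024))             # 301989888
print(vit_params_estimate(40, 1536))             # 1132462080
```

Under these assumptions there are about 2,000 patches per slide on average and about 120,000 bytes of stored data per patch, the same order as an uncompressed 224 x 224 RGB patch (150,528 bytes). For the ViT-Large configuration of [4] (24 blocks, width 1024), the rule of thumb gives about 302 million parameters, close to the 307 million reported there; the hypothetical 40 x 1536 configuration lands at about 1.13 billion, the same order as the 1.1 billion quoted above.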

H-optimus-1: a reference model for computational pathology

We evaluated H-optimus-1 on 23 tasks, ranging from metastasis detection to the prediction of gene mutations or gene expression from images of histological tissue. H-optimus-1 was the best-performing model on average, outperforming 8 other foundation models used as benchmarks, including H-optimus-0.
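The text does not specify how per-task scores are combined into a single “on average” ranking. Two common summaries over a benchmark are the mean metric across tasks and the mean rank across tasks. The sketch below computes both for a toy score table; the model names, task names and numbers are made up for illustration and are not the reported results.

```python
from statistics import mean
from typing import Dict, List, Tuple

Scores = Dict[str, Dict[str, float]]  # scores[model][task], higher is better


def summarize(scores: Scores) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (mean score per model, mean rank per model), rank 1 = best on a task.
    Ties are broken by dictionary order, a simplification."""
    models = list(scores)
    tasks = list(scores[models[0]])
    mean_score = {m: mean(scores[m][t] for t in tasks) for m in models}
    ranks: Dict[str, List[int]] = {m: [] for m in models}
    for t in tasks:
        ordered = sorted(models, key=lambda m: scores[m][t], reverse=True)
        for rank, m in enumerate(ordered, start=1):
            ranks[m].append(rank)
    mean_rank = {m: mean(r) for m, r in ranks.items()}
    return mean_score, mean_rank


# Made-up numbers for two hypothetical models on three tasks (not the reported results).
toy = {
    "model_a": {"task_1": 0.90, "task_2": 0.80, "task_3": 0.70},
    "model_b": {"task_1": 0.85, "task_2": 0.82, "task_3": 0.60},
}
avg, rank = summarize(toy)
print(max(avg, key=avg.get), min(rank, key=rank.get))  # model_a model_a
```

Mean rank is less sensitive to tasks whose metrics sit on very different scales, while mean score rewards large margins; reporting both is a simple robustness check.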

The model was made available to the academic community in February 2025 (https://huggingface.co/bioptimus/H-optimus-1) to help develop more efficient tools and accelerate research in the field of computational pathology.

Bibliography

[1] Kather, Jakob Nikolas, et al. "Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer." Nature Medicine 25.7 (2019): 1054-1056.

[2] Schirris, Yoni, et al. "DeepSMILE: Contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer." Medical Image Analysis 79 (2022): 102464.

[3] Saillard, Charlie, et al. "H-optimus-0." 2024. URL: https://github.com/bioptimus/releases/tree/main/models/h-optimus/v0.

[4] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).

A key figure:

2 billion: this is the number of histology images used to train H-optimus-1.

Definitions:

Histology: Histology is a branch of biology and medicine that studies the microscopic structure of biological tissues. It helps diagnose diseases and understand how cells are organized.

Self-supervised learning (SSL): This method allows a model to learn without manual labeling, by creating its own tasks. Foundation models use it to pre-train on very large amounts of data.
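To make “creating its own tasks” concrete: one widely used family of SSL objectives is contrastive learning (the approach named in the title of [2]), in which the model must recognize that two randomly augmented views of the same image patch belong together, while views of other patches do not. The numpy sketch below is a simplified, one-directional InfoNCE-style loss under that assumption. The text does not say which SSL objective H-optimus-1 was trained with, and the random vectors stand in for real network outputs.

```python
import numpy as np


def info_nce(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """Simplified one-directional InfoNCE loss.

    Row i of z1 and row i of z2 are embeddings of two augmented views of the
    same image (the positive pair); every other row of z2 acts as a negative.
    The "labels" come from the data itself, so no annotation is needed.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)  # L2-normalize rows
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature                    # (n, n) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # cross-entropy on the diagonal


rng = np.random.default_rng(0)
views = rng.normal(size=(8, 16))                     # hypothetical embeddings of 8 patches
noisy = views + 0.05 * rng.normal(size=views.shape)  # stand-in for a second augmented view
print(info_nce(views, noisy))        # low: each row matches its own partner
print(info_nce(views, noisy[::-1]))  # high: pairs are scrambled
```

Minimizing this loss pushes embeddings of matching views together and apart from the rest of the batch, which is how a network can learn useful features from unlabeled slides.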