A unified foundational model for quantum chemistry

Understanding matter at the quantum scale of atoms and electrons is essential for discovering promising new materials for the energy transition and new medicines. We present a unified foundation model for quantum chemistry, designed to simulate matter at the atomic level at the interface between surface chemistry, materials science and molecular biology. The model is trained on hundreds of millions of approximate solutions to the Schrödinger equation computed with density functional theory (DFT).

The accurate simulation of matter at the atomic scale is one of the most significant computational challenges in science today. Traditional quantum chemistry methods are remarkably accurate, but they are limited by their computational cost, which grows at least cubically with the number of atoms. This has historically restricted DFT simulations to a few hundred atoms over picosecond time scales. That is insufficient to capture many phenomena of scientific and industrial interest in biology and materials science.
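The practical consequence of cubic scaling is easy to quantify. The minimal sketch below compares the relative cost of an idealized cubic-scaling method with that of an idealized linear-scaling method. Linear scaling is typical of machine-learned potentials that only consider neighbors within a fixed cutoff radius. The function name `relative_cost` is hypothetical, and prefactors and crossover points are deliberately ignored.

```python
def relative_cost(n_atoms: int, n_ref: int, exponent: float) -> float:
    """Cost of a calculation on n_atoms relative to one on n_ref atoms,
    assuming cost grows as n**exponent (prefactors ignored)."""
    return (n_atoms / n_ref) ** exponent


if __name__ == "__main__":
    # Going from 200 to 2,000 atoms (10x more atoms):
    cubic = relative_cost(2_000, 200, exponent=3)   # idealized DFT-like scaling
    linear = relative_cost(2_000, 200, exponent=1)  # idealized local ML potential
    print(f"cubic: {cubic:.0f}x more expensive, linear: {linear:.0f}x")
```

Under these assumptions, a tenfold increase in system size costs a thousand times more with the cubic method, but only ten times more with the linear one.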

SIMULATING ATOMS THROUGH DEEP LEARNING

Our project develops a foundation model for quantum chemistry based on the MACE architecture. This equivariant architecture uses deep graph neural networks to learn the potential energy function directly from reference DFT calculations. Its key innovation is the use of many-body representations of the interactions between neighboring atoms, which allow interatomic interactions to be learned accurately. Our first foundation model, MACE-MP-0, was trained on Jean Zay using the Materials Project, a large database of over 150,000 crystalline materials computed with consistent DFT settings. This unprecedented chemical diversity covers 89 elements of the periodic table. It allows the model to generalize to chemical systems never seen during training, an essential property for discovering new materials. The model has been highly successful and is used by thousands of researchers worldwide in fields as diverse as heterogeneous catalysis, battery science, pharmacology and organic chemistry.
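To illustrate two ideas from the paragraph above, consider the graph of neighboring atoms within a cutoff, and a many-body energy built only from interatomic distances. The toy below is a hand-written functional form: it is not learned and is not MACE. MACE builds equivariant higher-order messages, whereas this sketch only shows that a scalar energy built from distances is unchanged by rotations and translations. All function names here are hypothetical.

```python
import math


def neighbor_pairs(positions, cutoff):
    """Directed edges (i, j, r_ij) of the atomic graph: every pair of
    distinct atoms closer than `cutoff`."""
    edges = []
    for i, pi in enumerate(positions):
        for j, pj in enumerate(positions):
            if i != j:
                r = math.dist(pi, pj)
                if r < cutoff:
                    edges.append((i, j, r))
    return edges


def smooth_cutoff(r, cutoff):
    """Cosine envelope that decays smoothly to zero at the cutoff radius."""
    return 0.5 * (math.cos(math.pi * r / cutoff) + 1.0) if r < cutoff else 0.0


def toy_energy(positions, cutoff=5.0):
    """Toy many-body energy: each atom's energy is a nonlinear function
    (-sqrt) of a neighbor 'density', so the total is not a sum of pair terms.
    Only interatomic distances enter, so the energy is unchanged by
    rotations and translations of the whole structure."""
    density = [0.0] * len(positions)
    for i, _, r in neighbor_pairs(positions, cutoff):
        density[i] += math.exp(-r) * smooth_cutoff(r, cutoff)
    return sum(-math.sqrt(rho) for rho in density)


def rotate_z(positions, angle):
    """Rotate every position by `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in positions]


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (0.3, 1.1, 0.4)]
    e0 = toy_energy(atoms)
    e_rot = toy_energy(rotate_z(atoms, 0.7))
    e_shift = toy_energy([(x + 2.0, y - 1.0, z + 0.5) for x, y, z in atoms])
    print(e0, e_rot, e_shift)  # equal up to floating-point rounding
```

A learned model replaces the fixed `-sqrt` and `exp(-r)` with trained neural networks. Because the cutoff keeps each atom's neighborhood finite, the cost still grows only linearly with system size.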

SCALING UP WITH A DATABASE OF BILLIONS OF ATOMS

After the community validated our first model, MACE-MP-0, we used Jean Zay to make a series of improvements. The first line of progress was to substantially increase the amount of training data. One challenge was to develop a training method that could combine large datasets that are mutually incompatible because they were computed at different levels of approximation. To address this, we developed a two-stage approach. First, we pre-train on a massive database (OMAT) of 100 million configurations covering 89 elements, mostly inorganic systems. Its great chemical diversity allows the model to develop a good understanding of basic interatomic interactions across the periodic table. Second, we perform a multi-head fine-tuning step on a large collection of freely available datasets covering very diverse fields such as surface chemistry, organic reactions and enzymatic chemistry. This strategy lets the model learn shared chemical representations while training on incompatible data. The second line of progress was architectural. Access to large numbers of GPUs allowed us to carry out a comprehensive sweep of MACE's hyperparameters. It also let us develop key architectural innovations that allow models to learn effectively at very large data scales. These modifications significantly improved the model's performance on molecular systems and surfaces. Together, these advances culminated in a new model, MACE-MH-1. It achieves an unprecedented level of accuracy for applications at the interface between materials chemistry and molecular chemistry, such as the study of batteries or heterogeneous catalysis.

VALIDATION AND SCIENTIFIC IMPACT

The model's performance was validated on benchmarks covering various fields: heterogeneous catalysis, molecular crystals, chemical reactions, acoustic properties of inorganic crystals, 2D materials and proteins. These benchmarks show that the model is the state of the art among foundation models for quantum chemistry. An emblematic application is the simulation of interactions between water molecules that, at very low temperatures, form different ice structures. A traditional DFT simulation of this would require thousands of hours. Our model reproduces the stability of the known ice phases in a few seconds, and is five times more accurate than the previous-generation model, MACE-MP-0.
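The multi-head fine-tuning step described in the scaling section can be illustrated in miniature. As a general idea, a multi-head model shares the layers that build chemical representations and gives each dataset its own output head. That head absorbs systematic differences between levels of theory. The toy below reduces this to exact least squares: a shared slope `w` stands in for the shared representation, and a per-head offset stands in for each head. This is a hypothetical sketch, not the MACE training code. The names `fit_multihead`, `dataset_A` and `dataset_B` are illustrative.

```python
from collections import defaultdict


def fit_multihead(samples):
    """Exact least-squares fit of y ≈ w * x + b[head].

    `samples` is a list of (x, y, head) triples. The slope `w` is shared by
    every head (a stand-in for the shared chemical representation); each
    head gets its own offset `b[head]`, which absorbs the systematic shift
    between datasets computed at different levels of theory.
    """
    groups = defaultdict(list)
    for x, y, head in samples:
        groups[head].append((x, y))

    # Per-head means of x and y.
    means = {
        h: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for h, pts in groups.items()
    }

    # Shared slope from deviations around each head's own means.
    num = den = 0.0
    for h, pts in groups.items():
        mx, my = means[h]
        for x, y in pts:
            num += (x - mx) * (y - my)
            den += (x - mx) ** 2
    w = num / den

    # Each head's offset given the shared slope.
    offsets = {h: my - w * mx for h, (mx, my) in means.items()}
    return w, offsets


if __name__ == "__main__":
    # Two hypothetical datasets with the same underlying trend (slope 2)
    # but a constant offset between their levels of theory.
    data = [(x, 2 * x + 1, "dataset_A") for x in range(0, 5)]
    data += [(x, 2 * x + 11, "dataset_B") for x in range(5, 10)]

    w, b = fit_multihead(data)
    print("multi-head:", w, b)

    # Forcing everything through one head distorts the shared slope.
    w_single, _ = fit_multihead([(x, y, "single") for x, y, _ in data])
    print("single head:", w_single)
```

With separate heads, the shared slope of 2 is recovered exactly. With a single head, the offset between the two datasets leaks into the slope. This is the kind of contamination the multi-head strategy is meant to avoid.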

DEMOCRATIZATION AND PERSPECTIVES

The code to reproduce our results and use the models is released as open source and integrated into the major atomistic simulation packages (ASE, LAMMPS). This widespread adoption is transforming the practice of computational chemistry: researchers can now explore vast chemical spaces, test complex hypotheses and optimize materials in silico before any experimental synthesis. Future developments aim to extend the model to excited states and magnetic properties, paving the way for complete simulations of photovoltaic and spintronic devices. The modular architecture of the model also allows it to be fine-tuned for specific fields with just a few dozen additional examples. This democratizes access to high-fidelity quantum simulations for the entire scientific community.
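Why can a few dozen examples be enough? Most of what the model knows is reused, and only a small correction has to be learned. The sketch below makes this concrete with a technique swapped in for illustration: delta learning, where a frozen pretrained predictor is corrected by a tiny linear model fitted on a handful of labelled points. Actual MACE fine-tuning retrains the network itself, so this is an analogy under stated assumptions. The functions `fit_correction`, `adapt`, `pretrained` and `target_domain` are all hypothetical.

```python
def fit_correction(xs, residuals):
    """Least-squares fit of residual ≈ a * x + c on a small dataset."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(residuals) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (r - my) for x, r in zip(xs, residuals))
    a = sxy / sxx
    return a, my - a * mx


def adapt(pretrained, xs, targets):
    """Keep `pretrained` frozen and learn only a small additive correction
    from a few labelled examples (a delta-learning stand-in for fine-tuning)."""
    residuals = [t - pretrained(x) for x, t in zip(xs, targets)]
    a, c = fit_correction(xs, residuals)
    return lambda x: pretrained(x) + a * x + c


if __name__ == "__main__":
    def pretrained(x):       # hypothetical frozen foundation-model prediction
        return x ** 2

    def target_domain(x):    # hypothetical higher-fidelity reference
        return x ** 2 + 0.5 * x - 1.0

    xs = [0.1 * i for i in range(24)]   # "a few dozen" examples
    model = adapt(pretrained, xs, [target_domain(x) for x in xs])
    print(model(3.0), target_domain(3.0))
```

Because the pretrained part already captures the bulk of the behavior, only two parameters are fitted here. A couple of dozen points are more than enough to pin them down, even outside the sampled range.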

Definitions:

DENSITY FUNCTIONAL THEORY (DFT):
Method for approximately solving the Schrödinger equation of a system's electrons, used to understand matter at the scale of atoms and electrons.

EQUIVARIANT NEURAL NETWORK:
Neural network whose outputs transform consistently with its inputs under physical symmetries such as rotations and translations, ensuring physically consistent predictions.
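To make the equivariance definition concrete: if a structure is rotated by a rotation R, the predicted forces must rotate with it, F(R x) = R F(x). The check below uses a hypothetical toy repulsive pair potential, E = Σ_{i<j} exp(-r_ij), rather than MACE. It verifies this property numerically, along with Newton's third law (forces sum to zero). Function names are illustrative.

```python
import math


def pair_forces(positions, strength=1.0):
    """Forces from a toy repulsive pair potential E = sum_{i<j} strength*exp(-r_ij).
    F_i = -dE/dr_i = sum_j strength*exp(-r_ij) * (r_i - r_j) / r_ij."""
    forces = []
    for i, pi in enumerate(positions):
        f = [0.0, 0.0, 0.0]
        for j, pj in enumerate(positions):
            if i == j:
                continue
            r = math.dist(pi, pj)
            scale = strength * math.exp(-r) / r
            for k in range(3):
                f[k] += scale * (pi[k] - pj[k])
        forces.append(tuple(f))
    return forces


def rotate(v, alpha, beta):
    """Rotate vector v about z by alpha, then about x by beta."""
    x, y, z = v
    ca, sa = math.cos(alpha), math.sin(alpha)
    x, y = ca * x - sa * y, sa * x + ca * y
    cb, sb = math.cos(beta), math.sin(beta)
    y, z = cb * y - sb * z, sb * y + cb * z
    return (x, y, z)


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (0.3, 1.1, 0.4)]
    # Equivariance: computing forces then rotating them must match
    # rotating the structure then computing forces.
    f_then_rot = [rotate(f, 0.4, 1.1) for f in pair_forces(atoms)]
    rot_then_f = pair_forces([rotate(p, 0.4, 1.1) for p in atoms])
    print(f_then_rot)
    print(rot_then_f)
```

The energy itself is invariant (it does not change under rotation), while the forces are equivariant (they rotate with the structure). An equivariant network builds this behavior into every layer instead of having to learn it from data.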