Fabian Jirasek uses machine learning to understand the behavior of substances and mixtures. This helps the industry develop new production processes.

In search of new materials: AI drives industry forward

These are turbulent times: climate change, rising energy prices, and dependence on raw material imports. Our industries must respond to these challenges and change the way they produce. Fabian Jirasek wants to help them do that — using machine learning.

Thermodynamics deals with the properties of substances and mixtures of substances. Their “behavior” is determined in particular by the interactions between the molecules. This is also the case with solubility, a property that indicates how well a substance dissolves in a particular solvent. In less scientific terms: how much the substance and the solvent like each other. Molecules have something like “sympathy” and “antipathy.” Water and formic acid, for example, like each other – they mix in any ratio. In oil, on the other hand, water dissolves only in very low concentrations – the two prefer to stay separate.

Thermodynamic properties such as solubility play a major role in many processes in nature and technology. They determine, for example, how chemicals are distributed in the environment and accumulate in water, sediments, or organisms. “The properties of substances are also important in industrial processes such as the production of active ingredients, dyes, or fine chemicals,” says Fabian Jirasek, head of the Laboratory of Engineering Thermodynamics in the Department of Mechanical and Process Engineering. This is because the products initially arise in mixtures that still contain many undesirable components: unused raw materials, impurities, additives, or catalysts. A purification step is therefore necessary to separate the actual end product from everything else.

Millions and millions of possibilities

Almost all industrial separation processes are based on the interactions between molecules. “In extraction, for example, an extractant is used to isolate the actual product from a mixture, while the unwanted components remain behind. This only works if the solubility of the product in the extractant is significantly greater than that of the other substances.” In another separation process, distillation, a lot depends on how the substances interact, whether they like each other or not, explains the researcher. In this process, a mixture is brought to its boiling point, and the individual substances that make up the mixture are distributed between the vapor and liquid phase. A well-known example is distilling schnapps. Here, the more volatile ethanol accumulates in the vapor phase, while the water tends to remain in the liquid phase. Thermodynamics describes the temperature at which the mixture boils and the extent to which the substances accumulate in the individual phases.
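The enrichment of the more volatile substance in the vapor can be illustrated with the textbook relation for a constant relative volatility, y = αx / (1 + (α − 1)x). The following is a minimal sketch, not a model from Jirasek's group. The value of `ALPHA_ILLUSTRATIVE` is an assumption chosen for illustration and is not measured data for ethanol and water, whose real behavior is non-ideal and includes an azeotrope.

```python
def vapor_mole_fraction(x_liquid: float, alpha: float) -> float:
    """Vapor-phase mole fraction of the more volatile component.

    Assumes a binary mixture with a constant relative volatility `alpha`.
    """
    if not 0.0 <= x_liquid <= 1.0:
        raise ValueError("mole fraction must lie between 0 and 1")
    if alpha <= 0.0:
        raise ValueError("relative volatility must be positive")
    return alpha * x_liquid / (1.0 + (alpha - 1.0) * x_liquid)


# Illustrative value only, not a measured property of any real mixture.
ALPHA_ILLUSTRATIVE = 2.5

x = 0.4  # mole fraction of the volatile component in the liquid
y = vapor_mole_fraction(x, ALPHA_ILLUSTRATIVE)
print(f"liquid: {x:.3f}, vapor: {y:.3f}")
```

For any α greater than 1, the vapor is richer in the volatile component than the liquid, which is the effect a distillation exploits.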

Therefore, knowledge of the properties of substances is a fundamental prerequisite for efficient production processes. “However, there are currently over 100 million known substances and many more possible mixtures,” Jirasek points out. On top of that, properties depend on additional factors such as temperature, pressure, or the concentrations within a mixture. “Ideally, we would like to investigate all substances, mixtures, and properties in laboratory experiments, but that would take far too long.” Determining a single property of a single mixture of substances can take an entire day. And it's expensive, too: such measurements can easily cost several thousand euros. “So our options are limited. We can only ever investigate a fraction of the possible substances and mixtures in experiments.”
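A rough back-of-the-envelope calculation shows the scale of the problem. The sketch below assumes exactly 100 million substances (the text says "over"), counts only mixtures of two substances, and assumes one measurement per mixture at one day each. All three are simplifying assumptions, so the result is a lower bound rather than an estimate from the researchers.

```python
import math

KNOWN_SUBSTANCES = 100_000_000  # "over 100 million", taken as a round lower bound
DAYS_PER_MEASUREMENT = 1        # assumption: one property of one mixture per day

# Number of distinct pairs of substances (order does not matter).
binary_mixtures = math.comb(KNOWN_SUBSTANCES, 2)

# Lab time if every pair were measured once, one after another.
years_needed = binary_mixtures * DAYS_PER_MEASUREMENT / 365

print(f"binary mixtures: {binary_mixtures:,}")
print(f"years of lab time: {years_needed:.2e}")
```

Different temperatures, pressures, and mixing ratios, as well as mixtures of three or more substances, would multiply this number further.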

That's why Jirasek works with artificial intelligence. He uses it to predict the properties of pure substances and mixtures: at what temperatures they boil, how soluble they are in each other, and much more. “If we have reliable prediction data, we no longer need to test all possible substances and mixtures in experiments. We can make a selection and focus on those that are most promising according to the AI. This way, we save time and money.”

Hybrid is better than classic or AI alone

This is not entirely new. Thermodynamic prediction methods based on physical theories have been around for a long time. And they work well in simpler cases. Example: The boiling point of a substance at a certain pressure is known and now needs to be predicted at a different pressure. It becomes more difficult when no measurement data is available for a substance or mixture of substances. There are so-called group contribution methods that simplify molecules by breaking them down into molecular building blocks, thereby enabling predictions to be made for new molecules. However, their accuracy often leaves much to be desired. In many cases, the right building blocks to describe the substance or mixture are also missing.
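The "simpler case" mentioned above, a known boiling point that has to be transferred to a different pressure, can be sketched with the integrated Clausius-Clapeyron equation. This is a classic physical method and only an illustration. It assumes a constant enthalpy of vaporization and an ideal vapor phase, and the water values used are approximate.

```python
import math

R = 8.314462618  # universal gas constant in J/(mol K)


def boiling_temperature(t_ref_k: float, p_ref_pa: float,
                        p_new_pa: float, dh_vap_j_per_mol: float) -> float:
    """Boiling temperature in K at p_new_pa from one known boiling point.

    Uses ln(p_new / p_ref) = -(dH_vap / R) * (1 / T_new - 1 / T_ref),
    assuming dH_vap does not change with temperature.
    """
    inverse_t_new = 1.0 / t_ref_k - R * math.log(p_new_pa / p_ref_pa) / dh_vap_j_per_mol
    return 1.0 / inverse_t_new


# Approximate values for water at its normal boiling point.
T_REF_K = 373.15
P_REF_PA = 101_325.0
DH_VAP_J_PER_MOL = 40_660.0

t_low = boiling_temperature(T_REF_K, P_REF_PA, 70_000.0, DH_VAP_J_PER_MOL)
t_high = boiling_temperature(T_REF_K, P_REF_PA, 200_000.0, DH_VAP_J_PER_MOL)
print(f"boiling point at 0.7 bar: {t_low - 273.15:.1f} degrees C")
print(f"boiling point at 2.0 bar: {t_high - 273.15:.1f} degrees C")
```

The approach needs one measured boiling point and an enthalpy of vaporization as a starting point, which is exactly what is missing for substances that have never been measured.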

That is why Jirasek is developing novel hybrid models by combining methods from machine learning (ML), a subfield of AI, with physical knowledge. Jirasek pursues different strategies in doing so. In one, ML models are “implanted” into a physical model and trained to predict the parameters of the physical model. “This strategy has the advantage that the industry is already familiar with the physical models. At the same time, AI allows us to expand their scope, for example, by enabling predictions for completely new classes of substances. And we can even significantly improve the quality of the results.”
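The first strategy can be sketched as a two-stage pipeline: a learned model supplies a parameter, and an established physical equation turns that parameter into the prediction. The article does not say which physical models are used, so the one-parameter Margules equation appears here only because it is short. The function `predict_margules_parameter`, its weight, and the descriptor vectors are hypothetical stand-ins for a trained ML model and its inputs.

```python
import math
from typing import Sequence, Tuple


def predict_margules_parameter(desc_a: Sequence[float],
                               desc_b: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained ML model.

    Maps two made-up molecular descriptor vectors to the Margules
    parameter A. A real model would be trained on measured data.
    """
    weight = 0.8  # made-up value, not a fitted parameter
    return weight * sum((a - b) ** 2 for a, b in zip(desc_a, desc_b))


def margules_activity_coefficients(x1: float, a: float) -> Tuple[float, float]:
    """Physical model: ln(gamma1) = A * x2**2 and ln(gamma2) = A * x1**2."""
    x2 = 1.0 - x1
    return math.exp(a * x2 ** 2), math.exp(a * x1 ** 2)


# Made-up descriptors for two dissimilar substances.
substance_a = [1.0, 0.2]
substance_b = [0.1, 0.9]

a_param = predict_margules_parameter(substance_a, substance_b)
gamma1, gamma2 = margules_activity_coefficients(0.3, a_param)
print(f"A = {a_param:.3f}, gamma1 = {gamma1:.3f}, gamma2 = {gamma2:.3f}")
```

Because the final prediction always passes through the physical equation, properties built into that equation hold no matter what the learned part outputs. Here, for example, the activity coefficient of a substance is exactly 1 when it is pure.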

Jirasek finds his second “hybridization strategy” even more exciting. In it, the direction is reversed: physical knowledge is implanted into the architecture of artificial neural networks. Such networks are loosely modeled on the way the brain processes information. They are trained with data, learn from it, and become increasingly better at solving specific tasks over time. The structure of such a network consists of an input layer that receives the data and an output layer that outputs the result. In between are several interconnected layers that process the input data.

AI has to play by the rules

“The networks are so flexible that they can basically describe the behavior of any substance or mixture,” says Jirasek. “But this advantage is also their biggest disadvantage. Because of their flexibility, the networks often produce nonsensical or physically impossible predictions.” For example, a network might predict that the boiling point of a pure substance decreases with increasing pressure, when in fact it increases. Neural networks also have problems predicting the behavior of mixtures, for example, with what is known as permutation invariance. For us humans, it is clear that a mixture of “ethanol + water” is the same as a mixture of “water + ethanol” as long as we do not change the mixing ratio. An unconstrained neural network, however, can predict different properties for the two cases, depending on the order in which it receives the input, i.e., whether it receives “water” first or “ethanol” first.

Jirasek, therefore, forces the neural networks to only make physically meaningful predictions about the behavior and properties of substances and mixtures. To do this, he incorporates detailed knowledge of physical laws and boundary conditions into the network architecture. For example, the order of the substances in a mixture must not influence the result. This disciplines the neural networks, so to speak.
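The difference between a model that depends on the input order and one that cannot depend on it can be shown in a few lines. The article does not describe how Jirasek's networks achieve this, so the sketch below uses one common option, summing the component representations weighted by their mole fractions before any further processing. All embeddings and weights are made-up numbers.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def order_dependent_model(emb_first: Sequence[float],
                          emb_second: Sequence[float],
                          x_first: float) -> float:
    """Concatenates the inputs, so swapping the substances changes the result."""
    features = list(emb_first) + list(emb_second) + [x_first]
    weights = [0.5, -1.0, 2.0, 0.3, 0.7]  # made-up weights
    return dot(weights, features)


def permutation_invariant_model(
        components: List[Tuple[Sequence[float], float]]) -> float:
    """Pools the components with a mole-fraction-weighted sum first.

    A sum does not care about the order of its terms, so neither does
    anything computed from the pooled vector.
    """
    pooled = [0.0, 0.0]
    for embedding, mole_fraction in components:
        for i, value in enumerate(embedding):
            pooled[i] += mole_fraction * value
    return dot([1.5, -0.4], pooled)  # made-up weights


# Made-up two-dimensional representations of the two substances.
ethanol = [1.0, 2.0]
water = [3.0, -1.0]

naive_a = order_dependent_model(ethanol, water, 0.3)
naive_b = order_dependent_model(water, ethanol, 0.7)
invariant_a = permutation_invariant_model([(ethanol, 0.3), (water, 0.7)])
invariant_b = permutation_invariant_model([(water, 0.7), (ethanol, 0.3)])

print("order-dependent:", naive_a, naive_b)
print("permutation-invariant:", invariant_a, invariant_b)
```

The constraint is part of the structure of the second model, so it holds for every possible set of weights rather than having to be learned from data.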

Jirasek's networks receive the molecular structure of the substances or mixtures of substances as input, as well as state variables such as temperature, pressure, and concentration. The molecular structure can be represented in different ways, two of which are particularly interesting. The first is molecular graphs, which contain all the information about the molecular structure. They consist of a series of nodes and edges that connect the nodes to each other. In molecules, the nodes are the atoms, and the edges are the bonds between the atoms. Nodes and edges each have specific properties; for example, each node represents a specific type of atom, and an edge can be a single, double, or triple bond.
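A molecular graph of this kind can be written down as a small data structure. The sketch below is a simplified illustration with hypothetical class names. It stores only the element per atom and the bond order per bond and leaves hydrogen atoms implicit, whereas real representations usually carry more properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    element: str  # node property: the type of atom


@dataclass(frozen=True)
class Bond:
    start: int  # index of the first atom
    end: int    # index of the second atom
    order: int  # edge property: 1 = single, 2 = double, 3 = triple


@dataclass
class MolecularGraph:
    atoms: list[Atom]
    bonds: list[Bond]

    def neighbors(self, index: int) -> list[int]:
        """Indices of all atoms bonded to the atom at `index`."""
        result = []
        for bond in self.bonds:
            if bond.start == index:
                result.append(bond.end)
            elif bond.end == index:
                result.append(bond.start)
        return sorted(result)


# Ethanol (heavy atoms only): C - C - O
ethanol = MolecularGraph(
    atoms=[Atom("C"), Atom("C"), Atom("O")],
    bonds=[Bond(0, 1, 1), Bond(1, 2, 1)],
)

# Formic acid (heavy atoms only): one C with a double-bonded O and a single-bonded O
formic_acid = MolecularGraph(
    atoms=[Atom("C"), Atom("O"), Atom("O")],
    bonds=[Bond(0, 1, 2), Bond(0, 2, 1)],
)

print(ethanol.neighbors(1))
print(formic_acid.neighbors(0))
```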

Atomic neighborhood relationships

The graphs serve as input for special neural networks called graph neural networks (GNNs). In GNNs, processing takes place across several layers, known as graph convolutional layers. The information is sent from one node, i.e., one atom, to the next in order to inform all nodes about their “neighborhoods.” “A molecule is more than the sum of its atoms,” says Jirasek. “It is important which atom is next to which and how they are connected. During training, the model learns how these neighborhoods affect different thermodynamic properties. For example, solubility.”
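How information travels from atom to atom can be sketched with a heavily simplified message-passing step. In the sketch each atom adds the feature vectors of its direct neighbors to its own. Real graph convolutional layers additionally apply learned weights and nonlinear functions, which are omitted here, and the feature encoding is an assumption for illustration.

```python
from typing import Dict, List

Features = Dict[int, List[float]]


def message_passing_step(features: Features,
                         neighbors: Dict[int, List[int]]) -> Features:
    """One simplified layer: new feature = own feature + sum over neighbors."""
    updated = {}
    for node, own in features.items():
        total = list(own)
        for neighbor in neighbors[node]:
            for i, value in enumerate(features[neighbor]):
                total[i] += value
        updated[node] = total
    return updated


def readout(features: Features) -> List[float]:
    """Sums all atom vectors into one vector that describes the molecule."""
    length = len(next(iter(features.values())))
    return [sum(vec[i] for vec in features.values()) for i in range(length)]


# Ethanol, heavy atoms only: atom 0 = C, atom 1 = C, atom 2 = O.
# Feature encoding (assumed): [is_carbon, is_oxygen]
initial = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonded_to = {0: [1], 1: [0, 2], 2: [1]}

after_one = message_passing_step(initial, bonded_to)
after_two = message_passing_step(after_one, bonded_to)

print(after_one)
print(after_two)
print(readout(after_two))
```

After a single step the two carbon atoms already carry different vectors, because only one of them sits next to the oxygen. After the second step, the first carbon has also received information about the oxygen two bonds away.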

The molecular structure can also be fed into the neural network as a text string. Jirasek uses SMILES (Simplified Molecular Input Line Entry System) for this purpose. Here, each molecule is represented in the form of a text string, which can be short or long depending on the complexity of the molecule. A language model is used that works similarly to ChatGPT, but instead of learning the semantics of words, it learns what certain structures in the molecule mean for the thermodynamic properties. Both network variants work very well. They learn to recognize the relationship between the structure of the molecule and its properties. After training, they can make predictions for any molecule that has not yet been measured.

Jirasek and his team achieved a breakthrough with the hybrid model HANNA (Hard-Constraint Neural Network for Consistent Activity Coefficient Prediction). It is a world first: the first neural network that can predict activity coefficients while guaranteeing compliance with all the rules of physics. The technical term “activity coefficient” simply refers to a measure of how comfortable a substance is in a mixture. It is a key parameter in the thermodynamics of mixtures and is directly related to measurable material properties such as solubility. Jirasek notes that HANNA's predictions are more accurate than those of all previous models.
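What "compliance with the rules of physics" can mean for activity coefficients is illustrated by the Gibbs-Duhem equation, a well-known consistency condition. For a binary mixture at constant temperature and pressure it requires x1 · d ln γ1/dx1 + x2 · d ln γ2/dx1 = 0. The article does not list which rules HANNA enforces or how, so the sketch below is not HANNA. It only checks the condition numerically for two hand-written models with made-up parameters.

```python
from typing import Callable, Tuple

# A model maps the mole fraction x1 to (ln gamma1, ln gamma2).
LogGammaModel = Callable[[float], Tuple[float, float]]


def gibbs_duhem_residual(model: LogGammaModel, x1: float, h: float = 1e-6) -> float:
    """Left-hand side of the Gibbs-Duhem equation, via central differences.

    A physically consistent model returns a value close to zero.
    """
    ln_g1_plus, ln_g2_plus = model(x1 + h)
    ln_g1_minus, ln_g2_minus = model(x1 - h)
    d_ln_g1 = (ln_g1_plus - ln_g1_minus) / (2.0 * h)
    d_ln_g2 = (ln_g2_plus - ln_g2_minus) / (2.0 * h)
    return x1 * d_ln_g1 + (1.0 - x1) * d_ln_g2


def consistent_model(x1: float) -> Tuple[float, float]:
    """Two-parameter Margules equation; both values follow from one
    excess Gibbs energy expression, so the condition is satisfied."""
    a12, a21 = 1.0, 2.0  # made-up parameters
    x2 = 1.0 - x1
    ln_g1 = x2 ** 2 * (a12 + 2.0 * (a21 - a12) * x1)
    ln_g2 = x1 ** 2 * (a21 + 2.0 * (a12 - a21) * x2)
    return ln_g1, ln_g2


def inconsistent_model(x1: float) -> Tuple[float, float]:
    """Two independently chosen curves, as an unconstrained model might
    produce; they violate the condition."""
    x2 = 1.0 - x1
    return 1.0 * x2 ** 2, 2.0 * x1 ** 2


residual_ok = gibbs_duhem_residual(consistent_model, 0.3)
residual_bad = gibbs_duhem_residual(inconsistent_model, 0.3)
print(f"consistent model:   {residual_ok:.2e}")
print(f"inconsistent model: {residual_bad:.2e}")
```

A check like this detects violations only after the fact. The idea described in the article goes further and builds the rules into the network so that violations cannot occur in the first place.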

New materials for difficult times

Hybrid models therefore provide more reliable and higher-quality predictions than pure ML methods. However, this does not mean that AI will completely replace laboratory tests at some point. “Experiments remain absolutely essential because we need reliable data to train good models. AI does not change that. But it helps us make the most of the data.” Jirasek explains the significance of his research as follows: There is global warming and rising energy prices. There are dependencies on raw material imports from autocratic and politically unstable countries, accompanied by increasing global political tensions. “It is therefore high time to transform our key industries, such as the chemical and pharmaceutical industries. We need better processes that are more energy-efficient and based on renewable raw materials. To achieve this, we also need to consider new substances and mixtures – and that is only possible if we know their properties.” Since these cannot all be measured individually, reliable prediction methods are more important than ever. They help industries to overcome the major challenges ahead.

Prof. Dr. Fabian Jirasek

Professor of Thermodynamics

"I want to help make our industry more independent, sustainable, and future-proof - with the help of thermodynamics and artificial intelligence."

Prof. Dr. Fabian Jirasek heads the Laboratory of Engineering Thermodynamics and the Emmy Noether Junior Research Group “Hybrid Thermodynamic Models” of the German Research Foundation. His research topics range from the prediction of fluid properties to the design and optimization of production processes and fault detection in chemical plants. He focuses on developing hybrid models by integrating machine learning methods with physical knowledge.


For a deeper dive into the topic

Browse through the selection of media reports and scientific publications:

F. Jirasek, H. Hasse: Combining Machine Learning with Physical Knowledge in Thermodynamic Modeling of Fluid Mixtures. Annual Review of Chemical and Biomolecular Engineering 14 (2023) 31-51.

T. Specht, M. Nagda, S. Fellenz, S. Mandt, H. Hasse, F. Jirasek: HANNA: Hard-constraint Neural Network for Consistent Activity Coefficient Prediction. Chemical Science 15 (2024) 19777-19786.

N. Hayer, T. Wendel, S. Mandt, H. Hasse, F. Jirasek: Advancing Thermodynamic Group-contribution Methods by Machine Learning: UNIFAC 2.0. Chemical Engineering Journal 504 (2025) 158667.

by Andreas Lorenz-Meyer

Andreas Lorenz-Meyer is a freelance journalist and lives in the Palatinate. He writes for trade publications, the magazines of universities and research institutions, and daily newspapers in Germany and Switzerland. His main topics in the field of science include artificial intelligence, biology, and renewable energies. He also covers the energy industry and the hotel and tourism industry.
