Improving the Efficiency and the Validity of Molecular Transformers
Bacciu, Leone; Grazioso, Matteo; Multari, Silvia; Nobile, Marco S.
2025-01-01
Abstract
Since their advent, Transformer models have been applied across a wide range of fields, including cheminformatics. In this context, drug discovery has benefited from Molecular Transformers, which leverage string representations of molecules, such as the Simplified Molecular Input Line Entry System (SMILES), for a variety of tasks. In this study, we present a model focused on optimizing a previously developed Molecular Transformer dedicated to metabolism prediction. Metabolism refers to all the biotransformations a drug undergoes once inside the human body; it directly influences the drug's therapeutic effect and potential toxicity and is therefore a key topic in medicinal chemistry. Framing molecular transformation prediction as a sequence-to-sequence translation task has shown promise, but it suffers from limitations such as the low validity of generated molecules and high computational cost. To address these limitations, we propose an optimized model that integrates pre-training, transfer learning, and fine-tuning, improving validity and reducing computation time. Finally, by separating the metabolism prediction task from SMILES syntax learning, we ensure broader applicability of the proposed model across diverse datasets and a variety of SMILES-based tasks beyond metabolic transformations, expanding its potential utility.
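
Two technical ideas in the abstract are easy to make concrete: the sequence-to-sequence framing (substrate SMILES in, metabolite SMILES out, like a sentence pair in machine translation) and the validity metric for generated molecules. The Python sketch below illustrates both under stated assumptions; it is not the authors' code, the aspirin-to-salicylic-acid example pair is ours, and validity is computed the way it is conventionally done in the SMILES-generation literature, via RDKit parsing.

```python
# Illustrative sketch, not the authors' implementation.
# (1) Seq2seq framing: each training example is a (source, target) pair of
#     SMILES strings, exactly like a sentence pair in machine translation.
# (2) Validity: the fraction of generated strings that RDKit can parse into
#     a molecule; Chem.MolFromSmiles returns None for malformed SMILES.
from rdkit import Chem

# Hypothetical training pair: aspirin -> salicylic acid (ester hydrolysis,
# a well-known metabolic biotransformation).
training_pairs = [
    ("CC(=O)Oc1ccccc1C(=O)O", "Oc1ccccc1C(=O)O"),
]

def validity(generated_smiles):
    """Fraction of generated SMILES strings that parse into valid molecules."""
    if not generated_smiles:
        return 0.0
    ok = sum(1 for s in generated_smiles if Chem.MolFromSmiles(s) is not None)
    return ok / len(generated_smiles)

# One well-formed prediction and one with an unclosed aromatic ring -> 0.5
print(validity(["Oc1ccccc1C(=O)O", "Oc1cccccC(=O)O"]))
```

Under this framing, "separating metabolism prediction from SMILES syntax learning" would amount to first pre-training the model on unlabeled SMILES so it internalizes the grammar, then fine-tuning on substrate-metabolite pairs; the abstract describes this separation only at a high level, so that staging is our reading of it.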



