
In this guest blog, we’re pleased to welcome insights from Dr. Angelo Pugliese, Associate Director of In Silico Discovery at BioAscent. As part of Life Sciences Month at The Data Lab, Dr. Pugliese explores the critical role Artificial Intelligence (AI) is playing in transforming drug discovery.
The Challenge of Developing New Medicines
Bringing a new medicine to patients is a long, complex, and expensive journey. A crucial step is working out whether a potential drug will work in the body and, most importantly, whether it will be safe. This involves understanding its ADMETox properties: essentially, how the drug is Absorbed into the system, Distributed to different parts of the body, Metabolised (broken down), Excreted (removed), and whether it causes any Toxicity (harmful effects). Traditionally, finding these answers requires many lab experiments, often involving cells or in vivo testing. These methods are vital, but they take significant time and resources. Pharmaceutical firms, biotechs, and specialised research partners (Contract Research Organisations, or CROs, such as BioAscent Discovery) need faster, more accurate ways to predict these factors early on. Making the right predictions early saves time and money, increases the chances of success, and helps ensure new drugs meet safety standards.
The Rise of AI in Predicting Drug Success
This is where Artificial Intelligence (AI) comes in. Using advanced computer techniques, particularly Machine Learning (ML), where computers learn patterns from data, researchers can now predict a drug’s ADMETox properties without needing to run as many early-stage lab tests. These AI tools analyse the structure of potential drug molecules and predict how they might behave in the body, offering a faster and potentially more ethical route by reducing the need for extensive testing. This shift towards computational methods aligns strongly with recent regulatory trends, such as the reported plan by the U.S. Food and Drug Administration (FDA), the agency responsible for ensuring the safety and effectiveness of medicines, to phase out traditional animal testing requirements for new drug development (Reuters, April 2025).
Powerful AI Tools for the Job
Scientists now have a toolbox of AI methods to help predict how drug candidates will behave. One category includes specialised prediction tools, like Chemprop. These are AI models trained for very specific jobs, such as predicting one particular type of toxicity. They work well when researchers have good, focused data for that specific task. Think of them like a specialised wrench designed perfectly for one type of bolt.
Another powerful category involves versatile “Foundation Models”, such as Molformer and ChemBERTa. These newer, large-scale AI models are first trained on massive amounts of general scientific data, giving them a broad understanding of chemistry. Scientists can then quickly adapt or “fine-tune” them for specific tasks, like predicting various ADMETox properties, often using less task-specific data than specialised tools need. Think of them as a highly adaptable multi-tool that can be adjusted for many different jobs.
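To make the idea of fine-tuning concrete, the short sketch below shows how a publicly shared ChemBERTa-style checkpoint could be adapted to a single ADMETox-style classification task using the Hugging Face Transformers library. The checkpoint name, the tiny SMILES dataset, and the toxicity labels are illustrative assumptions on our part, not a recipe used by any project mentioned here.

```python
# A minimal sketch (not a production recipe) of fine-tuning a pretrained
# chemistry foundation model to predict one ADMETox-style property from SMILES.
# The checkpoint name and the toy data below are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "seyonec/ChemBERTa-zinc-base-v1"  # assumed public ChemBERTa checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Tiny illustrative dataset: SMILES strings with made-up binary toxicity labels.
smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O"]
labels = torch.tensor([0, 1, 0])

batch = tokenizer(smiles, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):  # a real fine-tune would use far more molecules and epochs
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss = {outputs.loss.item():.4f}")
```

The key point is how little task-specific code and data are needed once the broad chemical knowledge is already baked into the pretrained model.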
Other approaches exist as well. Some tools focus specifically on the 3D shape of molecules (like Graphormer), while others use more established computational methods (like QSAR, often run using accessible, code-free platforms such as KNIME or programming libraries such as Scikit-learn). The best tool depends on the specific question, the available data, and the team’s expertise, but the overall trend is clear: AI is making predictions faster and more powerful.
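For comparison, a traditional QSAR-style workflow can be sketched in a few lines of Scikit-learn. In the example below, molecules are encoded as fingerprints with the open-source RDKit toolkit (our choice for illustration; the post itself does not name a descriptor package), and a random forest learns to predict a property. The SMILES strings and target values are invented purely for illustration.

```python
# A classic QSAR-style sketch: encode molecules as Morgan fingerprints with
# RDKit, then train a scikit-learn random forest to predict a property.
# The SMILES strings and target values are made up for illustration only.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor

smiles = ["CCO", "CCCCCC", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O"]
target = [0.5, -3.2, 0.1, -1.7]  # hypothetical property values (e.g. log solubility)

def featurise(smi, n_bits=2048):
    """Turn a SMILES string into a fixed-length Morgan fingerprint vector."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    arr = np.zeros((n_bits,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

X = np.array([featurise(s) for s in smiles])
y = np.array(target)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(model.predict([featurise("CCN")]))  # prediction for a new molecule
```

The same workflow can be assembled without any code at all in a platform such as KNIME, which is part of why these established methods remain so widely used.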
It’s important to note that training and using these powerful AI models often requires significant computing power, relying on specialised hardware like GPUs and High-Performance Computing (HPC) infrastructure.
The Big Challenge: Getting Enough Data Safely
These AI tools, especially the big foundation models, need vast amounts of high-quality data to learn effectively. The more data they see, the better their predictions become. However, much of the most valuable data is owned by different pharmaceutical companies or research labs (including university groups like those in SULSA – the Scottish Universities Life Sciences Alliance), who need to keep it private due to commercial sensitivity and patient confidentiality. So, how can we train powerful AI models using data from multiple sources without anyone having to reveal their private information?
The Solution: Federated Learning (Learning Together, Privately)
Federated Learning offers a clever solution. Instead of pooling all the sensitive data in one central place, the AI model is sent out to each organisation. Each organisation trains the model locally, only on its own private data. It then shares only the learnings from the model (called model updates) with a central coordinator, never the raw data itself. The coordinator combines these learnings to create an improved “global” model, which is sent back out for further training. This way, the model gets smarter by learning from everyone’s data, but no private data ever leaves its owner’s control. It is collaborative learning that preserves privacy.
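The core mechanic is easier to see in a toy example. The sketch below implements the basic federated averaging idea in plain Python/NumPy: each “organisation” trains a simple linear model on its own randomly generated private data, and only the learned weights are sent to the coordinator, which averages them into the global model. Real deployments use far more sophisticated models, secure aggregation, and dedicated platforms; this sketch is only meant to show that raw data never leaves each site.

```python
# A toy federated-averaging sketch: three "sites" train locally on private,
# randomly generated data and share only their model weights; a coordinator
# averages the weights into a global model. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

def local_training(global_weights, X, y, lr=0.01, steps=100):
    """Continue training from the global model on one site's private data."""
    w = global_weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w  # only the updated weights leave the site, never X or y

# Three organisations, each holding private data from the same underlying model.
true_w = np.array([2.0, -1.0, 0.5])
sites = []
for _ in range(3):
    X = rng.normal(size=(200, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=200)
    sites.append((X, y))

global_w = np.zeros(3)
for round_num in range(5):  # federated rounds
    local_weights = [local_training(global_w, X, y) for X, y in sites]
    global_w = np.mean(local_weights, axis=0)  # coordinator aggregates updates
    print(f"round {round_num}: global weights = {np.round(global_w, 3)}")
```

After a few rounds the global weights approach the values each site would only ever see indirectly through its own data, which is exactly the collaborative benefit federated learning is designed to deliver.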
A major project called MELLODDY successfully used this approach with several large pharmaceutical companies, proving that collaborative AI training for ADMETox is possible while protecting sensitive data. Building on such pioneering work, specialised technology platforms are now available to provide the infrastructure for secure data collaboration. These platforms make it easier for groups to work together on shared challenges (like the AI Structural Biology Consortium, AISB) without directly sharing sensitive information, enabling initiatives that advance drug discovery.
Making AI Discoveries Work for Everyone
While federated learning helps build better models privately, there is a debate about what happens next. Often, the powerful models created in these private collaborations are only available to the members. Making these predictive tools (or versions of them) openly available, however, could significantly speed up drug discovery for everyone. Wider access could spark broader innovation, allow independent checking of the models, and extend the benefits of collaborative research to academic researchers, smaller companies, CROs, and scientists globally, helping to deliver safer medicines for the public good more quickly.
Other exciting AI techniques are also emerging, like training models to handle multiple prediction tasks at once (Multitask Learning) or using AI that understands language (Large Language Models or LLMs) to analyse research papers alongside chemical data. We hope to explore these in a future post.
How Expert Partners (CROs) Fit In
Where do expert research partners like CROs fit into this AI-driven landscape? CROs specialise in performing the hands-on lab experiments needed for drug development. Their evolving role involves becoming expert users of these powerful AI prediction tools, especially if the models become more widely accessible. They can apply the best available AI models to predict the properties of a client’s drug candidates and then use their lab expertise to test and confirm those predictions with real-world experiments. This synergy, combining AI’s speed with the reliability of expert validation, creates a more efficient path to understanding potential medicines, helps reduce the reliance on extensive in vivo testing, and helps bring safer, more effective treatments to patients sooner.
Ultimately, AI is reshaping the landscape of drug discovery, offering faster, more accurate predictions of a drug’s safety and efficacy. By integrating advanced technologies like machine learning and federated learning, we can streamline the early stages of drug development while safeguarding sensitive data. As AI tools become more refined and accessible, their synergy with expert CROs promises an efficient workflow that balances computational predictions with expert validation. Broadening access to these tools can democratise innovation, driving the collective progress needed to develop safer, more effective medicines for the future. The ongoing collaboration between AI and life sciences holds immense potential, promising transformative impacts on healthcare worldwide.
Interested in taking a deeper dive into Data, AI and Life Sciences? The Data Lab and The Scottish Universities Life Sciences Alliance (SULSA) are teaming up to deliver a programme of activity bringing together Life Sciences with Data & AI for the month of April. The programme aims to provide insights into cutting-edge research at the intersection of life sciences and data science being undertaken in Scotland, creating an opportunity for industry, academics, students and staff in the SULSA network and TDL Community to make new connections. At the end of the programme, SULSA and The Data Lab will launch our first joint Innovation Seed to support collaboration between industry and academia.
Events coming up in April:
- From Molecules to Populations: Data-Driven Approaches to Health Research – 30 April 2025 11:00 – 12:00 BST [Online]
Join us via The Data Lab Community as we showcase real-world examples of AI & data in life sciences and discuss the future of these technologies!