Accelerated Discovery: AI and the Scientific Method
Updated: Jan 21
AMT Tech Report Selection by Stephen LaMarca:
Imagine increasing the speed of R&D by a factor of 10. Now imagine this superfast R&D costs 90% less than traditional efforts. Sounds ridiculous, right? Spoiler alert: It’s not.
The urgency of science has never been greater, according to Dario Gil, director of IBM Research. Gil was referring to the race to develop a vaccine for Covid-19, but this urgency is not going anywhere, he said, citing problems on the scale of climate change, future pandemics, food shortages and energy security. What is needed to solve these problems is the acceleration of the rate of discovery.
The solution will lie in a combination of artificial intelligence (AI), quantum computing, high-performance computing (HPC) and hybrid cloud technologies.
“Using [these technologies], we can radically change the process of how we conduct discovery,” Gil said. “We can accelerate and supercharge the traditional scientific method.”
Gil’s example was materials science, which has relied on serendipity throughout history. Teflon was discovered during a search for a new refrigerant, and Vaseline and graphene were similar surprises. When not discovered by accident, useful new materials have typically been found by trial and error, which is extremely expensive and can take years, even with the latest HPC-based modelling and simulation of complex molecules.
This is where AI comes in.
“[AI] is really enabling unprecedented levels of speed and automation at scale, helping us solve ever more complex problems,” said Gil. “And it is AI that can help us usher in this new era of accelerated discovery. It can help us supercharge the scientific method to turn it from a linear process into a closed loop.”
The closed loop Gil envisions uses AI to sift through the existing knowledge base, then HPC (or in the future, quantum computing) to augment the data with simulation and look for gaps in that knowledge base. This information is fed to a generative AI model, which can suggest candidates for new molecules that fill the gaps. Then, AI-powered robots can generate the candidate molecules, based on data and examples of past chemical reactions.
“The process will culminate with having new knowledge that invokes a new question and the loop starts again,” Gil said. “So this would be a continuous loop of discovery, increasingly automated and increasingly autonomous.”
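To make the shape of that loop concrete, here is a toy sketch in Python. It is not IBM's system: every name in it (`KnowledgeBase`, `find_gaps`, `propose_candidates`, `run_experiments`, `discovery_loop`) is hypothetical, and each stage is a trivial stand-in for the far more complex literature mining, simulation, generative modelling and robotic synthesis Gil described. It only illustrates the control flow, in which each round's results feed the next round's question.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical store of everything 'known' so far."""
    known: set = field(default_factory=set)


def find_gaps(kb, search_space):
    # Stand-in for literature mining plus simulation:
    # a gap is anything in the search space that is not yet known.
    return sorted(search_space - kb.known)


def propose_candidates(gaps, batch_size):
    # Stand-in for a generative model: target the first few gaps.
    return gaps[:batch_size]


def run_experiments(candidates):
    # Stand-in for an automated lab: assume every candidate is confirmed.
    return set(candidates)


def discovery_loop(kb, search_space, batch_size=2, max_rounds=10):
    """Repeat gap-finding, proposal and testing until no gaps remain."""
    rounds = 0
    while rounds < max_rounds:
        gaps = find_gaps(kb, search_space)
        if not gaps:
            break
        candidates = propose_candidates(gaps, batch_size)
        kb.known |= run_experiments(candidates)  # new knowledge feeds the next round
        rounds += 1
    return rounds


# Toy data: two "materials" are known, three are missing.
search_space = {"A", "B", "C", "D", "E"}
kb = KnowledgeBase(known={"A", "C"})
rounds = discovery_loop(kb, search_space)
print(rounds, sorted(kb.known))
```

In this toy run the three missing items are filled in over two rounds. In a real setting, experiments can fail and the search space is not known in advance, which is why the sketch caps the number of rounds.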
Gil called this loop “accelerated discovery” and stressed that it is not in the distant future: it is happening now.
PAG discovery
A further example in Gil’s presentation was based on the accelerated discovery of a more sustainable photoresist material. Photoresists are vital to the semiconductor manufacturing process; chemically amplified photoresists use chemicals called photo acid generators (PAGs).
Using the traditional process, discovering a new PAG would likely take 10 years and cost at least $10 million. Scientists would search published literature and use what they could find, plus their own knowledge, to design a molecule and target the properties needed. They would then go through iterative cycles of synthesis, characterization, and testing until they reach a satisfactory compound. Even with supercomputers at their disposal, the process is long and hard, not least because of the large number of chemical compounds to consider.
Accelerated discovery has cut that process to more like one year and $1 million.
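These round numbers line up with the headline claim of 10 times the speed at 90% less cost. A quick check in Python, treating the article's approximate figures ("likely 10 years," "at least $10 million," "more like one year and $1 million") as exact for illustration; the function name `reduction` is our own:

```python
def reduction(traditional, accelerated):
    """Return (speedup factor, percent saved) for two comparable quantities."""
    factor = traditional / accelerated
    percent_saved = 100 * (traditional - accelerated) / traditional
    return factor, percent_saved


time_factor, time_saved = reduction(traditional=10, accelerated=1)  # years
cost_factor, cost_saved = reduction(traditional=10_000_000, accelerated=1_000_000)  # dollars

print(f"Time: {time_factor:.0f}x faster, {time_saved:.0f}% less")
print(f"Cost: {cost_factor:.0f}x cheaper, {cost_saved:.0f}% less")
```

Both time and cost come out at a factor of 10, or a 90% reduction.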
The accelerated discovery workflow has four stages. First, an AI called Deep Search “reads” all the existing scientific literature about PAGs.
“Using Deep Search typically speeds up the process by a thousand times, as the AI can ingest and process about 20 pages per second, per processing core,” Gil said. “Human readers of technical literature, on the other hand, typically need between one and two minutes per page.”
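The "thousand times" figure can be checked against the two reading rates Gil quoted. The short Python sketch below does the arithmetic for a single processing core. The constant and function names are our own, and the calculation assumes the human reads continuously at the stated pace.

```python
# Figures quoted by Gil.
AI_PAGES_PER_SECOND_PER_CORE = 20
HUMAN_SECONDS_PER_PAGE = (60, 120)  # one to two minutes per page


def speedup(human_seconds_per_page, ai_pages_per_second=AI_PAGES_PER_SECOND_PER_CORE):
    """Pages the AI processes in the time a human needs to read one page."""
    return ai_pages_per_second * human_seconds_per_page


fast_reader, slow_reader = (speedup(s) for s in HUMAN_SECONDS_PER_PAGE)
print(f"Per-core speedup: {fast_reader}x to {slow_reader}x")
```

On these numbers a single core is 1,200 to 2,400 times faster than a human reader, consistent with the order of magnitude Gil cited.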
This knowledge base is represented by a dendrogram of the known PAG families (inner gray section of the diagram below). Empty spaces in the dendrogram represent places where data was missing.
The second step is to enrich the data with AI-powered simulations. Another AI simulates the experimental parameters of the known PAGs, 2 to 40 times faster than regular simulations.
“For some properties, the available data was so sparse or noisy and unreliable that it was almost useless,” Gil said. “We have to augment this data set with enough data on predicted properties to train an AI model. Here, we use AI enriched simulation to provide quantitative values for important properties for the PAGs in the dataset.”
These properties include toxicity (purple data in the chart above), biodegradability (blue) and lambda max, the wavelength of light most strongly absorbed by the material (green).
This information is fed into a third type of AI, called a generative model. Generative AIs are used today to generate fake profile images for social media profiles, long essay texts indistinguishable from human-written prose, computer code, and now molecules. The idea is to effectively fill in the gaps in the (gray) dendrogram. In IBM’s case, they wanted to look for a PAG material with better sustainability properties, particularly biodegradability.
“We have seen 10x acceleration using generative models to identify gaps and create materials concepts for tests,” Gil said.
The final step in the process is to test the results in an AI-driven lab, which, according to Gil, achieves synthesis a hundred times faster than traditional methods.
There is some human involvement at this stage; an expert selects the best candidates for experimental validation, though the chemical synthesis which produces real-life chemicals is done by machines. At IBM’s RoboRXN laboratory in Zurich, an incredible combination of AI, automation and cloud technologies has learnt synthetic organic chemistry and can apply it remotely with robots.
IBM discovered its first new PAG material using this accelerated discovery process in November.
“[The new material] brings with it the promise of a world of many, many more accelerated discoveries to come,” Gil said.
Future vision
As well as solving human problems like pandemics and climate change, IBM’s vision is that accelerated discovery will define the most innovative businesses of tomorrow. Companies whose core business is scientific discovery will obviously benefit, including life sciences, chemicals and materials companies. Another huge tranche of today’s big businesses relies on scientific discovery, including automakers, technology companies, healthcare and utilities. And many more are driven by information and discovery; that is, they gain their competitive advantage using data, experimentation and learning. In this third category are software, financial markets, media & entertainment, telcos, banking and retail. Accelerating science and discovery will affect all these businesses.
“I am certain that tomorrow’s most innovative businesses will be discovery-driven enterprises,” Gil said. “They will seek the platforms, tools and technologies that will allow them to accelerate the discovery process that gives them their competitive advantage. But that requires a serious investment in science and R&D.”
Again, Gil referred to the Covid-19 vaccine, developed in 14 months. If discovery had proceeded the traditional way, the vaccine would not have become available until 2033.
“To address our biggest challenges, we need to discover faster,” he said. “We need to unleash the power of accelerated discovery, and we need to do it with purpose, not just digital innovations for digital products and services, but let’s also direct our digital prowess to improve our physical world.”