    MIT’s AI Learns Molecular Language for Rapid Material Development and Drug Discovery

    By Adam Zewe, Massachusetts Institute of Technology | July 13, 2023
    Molecular Properties AI Concept Illustration
    The MIT-IBM Watson AI Lab’s new AI system drastically streamlines drug and material discovery by accurately predicting molecular properties with minimal data. The system leverages a “molecular grammar” learned via reinforcement learning to generate new molecules efficiently. This method has shown remarkable efficacy even with datasets of fewer than 100 samples.

    This AI system only needs a small amount of data to predict molecular properties, which could speed up drug discovery and material development.

    Discovering new materials and drugs typically involves a manual, trial-and-error process that can take decades and cost millions of dollars. To streamline this process, scientists often use machine learning to predict molecular properties and narrow down the molecules they need to synthesize and test in the lab.

    Researchers from MIT and the MIT-IBM Watson AI Lab have developed a new, unified framework that can simultaneously predict molecular properties and generate new molecules much more efficiently than popular deep-learning approaches.

    To teach a machine-learning model to predict a molecule’s biological or mechanical properties, researchers must show it millions of labeled molecular structures — a process known as training. Due to the expense of discovering molecules and the challenges of hand-labeling millions of structures, large training datasets are often hard to come by, which limits the effectiveness of machine-learning approaches.

    By contrast, the system created by the MIT researchers can effectively predict molecular properties using only a small amount of data. Their system has an underlying understanding of the rules that dictate how building blocks combine to produce valid molecules. These rules capture the similarities between molecular structures, which helps the system generate new molecules and predict their properties in a data-efficient manner.

    This method outperformed other machine-learning approaches on both small and large datasets, and was able to accurately predict molecular properties and generate viable molecules when given a dataset with fewer than 100 samples.

    Unified Framework To Predict Molecular Properties
    Researchers from MIT and the MIT-IBM Watson AI Lab have developed a unified framework that uses machine learning to simultaneously predict molecular properties and generate new molecules using only a small amount of data for training. Credit: Jose-Luis Olivares/MIT

    “Our goal with this project is to use some data-driven methods to speed up the discovery of new molecules, so you can train a model to do the prediction without all of these cost-heavy experiments,” says lead author Minghao Guo, an electrical engineering and computer science (EECS) graduate student.

    Guo’s co-authors include MIT-IBM Watson AI Lab research staff members Veronika Thost, Payel Das, and Jie Chen; recent MIT graduates Samuel Song ’23 and Adithya Balachandran ’23; and senior author Wojciech Matusik, a professor of electrical engineering and computer science and a member of the MIT-IBM Watson AI Lab, who leads the Computational Design and Fabrication Group within the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Machine Learning (ICML).

    Learning the language of molecules

    To achieve the best results with machine-learning models, scientists need training datasets with millions of molecules that have similar properties to those they hope to discover. In reality, these domain-specific datasets are usually very small. So, researchers use models that have been pretrained on large datasets of general molecules, which they apply to a much smaller, targeted dataset. However, because these models haven’t acquired much domain-specific knowledge, they tend to perform poorly.

    The MIT team took a different approach. They created a machine-learning system that automatically learns the “language” of molecules — what is known as a molecular grammar — using only a small, domain-specific dataset. It uses this grammar to construct viable molecules and predict their properties.

    In language theory, one generates words, sentences, or paragraphs based on a set of grammar rules. You can think of a molecular grammar the same way. It is a set of production rules that dictate how to generate molecules or polymers by combining atoms and substructures.

    Just like a language grammar, which can generate a plethora of sentences using the same rules, one molecular grammar can represent a vast number of molecules. Molecules with similar structures use the same grammar production rules, and the system learns to understand these similarities.
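
    To make the idea concrete, the short Python sketch below implements a toy, string-based grammar. Every rule, symbol, and fragment here is invented for illustration; the MIT system learns production rules over molecular graphs rather than text, but the generative principle is the same: a small shared rule set can produce many structurally related molecules.

        import random

        # Hypothetical production rules: each nonterminal maps to possible expansions.
        # Terminals are SMILES-like fragments; tokens not in RULES are emitted as-is.
        RULES = {
            "MOL":    ["CHAIN", "CHAIN RING"],
            "CHAIN":  ["C", "C CHAIN", "C ( BRANCH ) CHAIN"],
            "BRANCH": ["O", "N", "C"],
            "RING":   ["c1ccccc1"],  # a benzene ring as a terminal fragment
        }

        def derive(symbol="MOL", rng=random):
            """Expand a symbol by repeatedly applying production rules."""
            if symbol not in RULES:                # terminal: emit as-is
                return symbol
            expansion = rng.choice(RULES[symbol])  # pick one production rule
            return "".join(derive(token, rng) for token in expansion.split())

        random.seed(0)
        for _ in range(5):
            print(derive())  # a handful of structurally related, SMILES-like strings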

    Since structurally similar molecules often have similar properties, the system uses its underlying knowledge of molecular similarity to predict properties of new molecules more efficiently.

    “Once we have this grammar as a representation for all the different molecules, we can use it to boost the process of property prediction,” Guo says.
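
    One simple way a grammar can double as a representation is to describe each molecule by how often each production rule appears in its derivation, then fit a regressor on a small labeled set. The sketch below is only a hypothetical illustration of that idea; the rule names, counts, and property values are made up, and the MIT team’s actual featurization and model differ.

        import numpy as np
        from sklearn.linear_model import Ridge

        # Hypothetical features: how many times each production rule is used
        # in a molecule's derivation (one row per molecule).
        RULE_NAMES = ["MOL->CHAIN", "MOL->CHAIN RING", "CHAIN->C",
                      "CHAIN->C CHAIN", "CHAIN->C(BRANCH)CHAIN", "BRANCH->O"]
        rule_counts = np.array([
            [1, 0, 2, 1, 0, 0],
            [0, 1, 3, 2, 1, 1],
            [1, 0, 1, 0, 1, 1],
            [0, 1, 4, 3, 0, 0],
        ])
        # Invented labels, e.g. a measured glass transition temperature in kelvin.
        measured_tg = np.array([310.0, 355.0, 330.0, 360.0])

        # A small regularized linear model can already exploit shared-rule structure.
        model = Ridge(alpha=1.0).fit(rule_counts, measured_tg)
        for name, weight in zip(RULE_NAMES, model.coef_):
            print(f"{name:>24s}  weight {weight:+.2f}")

        # Predict the property of a new molecule from its derivation alone.
        new_molecule = np.array([[0, 1, 2, 1, 1, 0]])
        print(f"predicted Tg: {model.predict(new_molecule)[0]:.1f} K")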

    The system learns the production rules for a molecular grammar using reinforcement learning — a trial-and-error process where the model is rewarded for behavior that gets it closer to achieving a goal.

    But because there could be billions of ways to combine atoms and substructures, the process to learn grammar production rules would be too computationally expensive for anything but the tiniest dataset.

    The researchers decoupled the molecular grammar into two parts. The first part, called a metagrammar, is a general, widely applicable grammar they design manually and give the system at the outset. Then it only needs to learn a much smaller, molecule-specific grammar from the domain dataset. This hierarchical approach speeds up the learning process.
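
    A toy version of that hierarchy might look like the sketch below: a hand-written metagrammar supplies candidate rules, and a trial-and-error loop, used here as a greedy stand-in for the reinforcement-learning search the researchers describe, keeps a candidate rule only when doing so does not hurt a reward computed on the small domain dataset. All of the rules, data, and the reward function are invented for illustration.

        import random

        # Hypothetical metagrammar: broadly applicable candidate rules, written by hand.
        METAGRAMMAR = ["CHAIN->C CHAIN", "CHAIN->C(O)CHAIN", "CHAIN->C(N)CHAIN",
                       "CHAIN->RING CHAIN", "RING->c1ccccc1", "RING->C1CCCCC1"]

        # Tiny, invented domain dataset: the rules each known molecule's structure needs.
        DATASET = [
            {"CHAIN->C CHAIN", "RING->c1ccccc1", "CHAIN->RING CHAIN"},
            {"CHAIN->C CHAIN", "CHAIN->C(O)CHAIN"},
            {"CHAIN->C CHAIN", "RING->c1ccccc1", "CHAIN->RING CHAIN", "CHAIN->C(O)CHAIN"},
        ]

        def reward(grammar):
            """Reward = molecules fully covered, minus a small penalty per rule kept."""
            covered = sum(needed <= grammar for needed in DATASET)
            return covered - 0.1 * len(grammar)

        random.seed(0)
        grammar, best = set(), reward(set())
        for _ in range(50):                    # trial-and-error proposals
            rule = random.choice(METAGRAMMAR)
            trial = grammar ^ {rule}           # toggle the candidate rule in or out
            if (score := reward(trial)) >= best:
                grammar, best = trial, score   # keep changes that do not hurt
        print(sorted(grammar), "reward:", round(best, 2))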

    Big results, small datasets

    In experiments, the researchers’ new system simultaneously generated viable molecules and polymers, and predicted their properties more accurately than several popular machine-learning approaches, even when the domain-specific datasets had only a few hundred samples. Some other methods also required a costly pretraining step that the new system avoids.

    The technique was especially effective at predicting physical properties of polymers, such as the glass transition temperature, the temperature at which a material shifts from a hard, glassy state to a soft, rubbery one. Obtaining this information experimentally is often very costly because the measurements require extremely high temperatures and pressures.

    To push their approach further, the researchers cut one training set down by more than half — to just 94 samples. Their model still achieved results that were on par with methods trained using the entire dataset.

    “This grammar-based representation is very powerful. And because the grammar itself is a very general representation, it can be deployed to different kinds of graph-form data. We are trying to identify other applications beyond chemistry or materials science,” Guo says.

    In the future, they also want to extend their current molecular grammar to include the 3D geometry of molecules and polymers, which is key to understanding the interactions between polymer chains. They are also developing an interface that would show a user the learned grammar production rules and solicit feedback to correct rules that may be wrong, boosting the accuracy of the system.

    Reference: Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction

    This work is funded, in part, by the MIT-IBM Watson AI Lab and its member company, Evonik.
