RhoFold+: A Deep Learning Framework for Accurate RNA 3D Structure Prediction from Sequences



Predicting RNA 3D structures is critical for understanding RNA’s biological functions, advancing RNA-targeted drug discovery, and designing synthetic biology applications. However, RNA’s structural flexibility and the limited availability of experimentally resolved structures make this difficult. Despite RNA’s importance in gene regulation, RNA-only structures represent less than 1% of the Protein Data Bank, and experimental methods like X-ray crystallography and cryo-EM are slow and resource-intensive. Computational techniques, including template-based methods like ModeRNA and de novo approaches like FARFAR2, have advanced RNA modeling but are often limited in speed and by the availability of data. Deep learning models have emerged as transformative tools that learn directly from RNA sequence data.

Recent deep learning-based methods integrate multiple sequence alignments (MSAs) and secondary structure constraints to enhance RNA 3D structure prediction. Approaches like DeepFoldRNA and trRosettaRNA use MSAs to derive geometric features for energy-based modeling, while end-to-end frameworks like AlphaFold3 and RoseTTAFoldNA directly predict 3D structures from sequences. Although MSA-based methods offer high accuracy, they are computationally expensive due to extensive database searches. Alternatives like DRFold rely solely on single sequences, providing faster results with slightly lower precision. Future developments aim to combine the speed of single-sequence models with the accuracy of MSA-based techniques for more efficient predictions.
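To make the idea of MSA-derived signal concrete, here is a minimal Python sketch of mutual information between two alignment columns, a classical covariation statistic. It is an illustration only: the methods named above learn their features with neural networks, and nothing here is taken from their code. The function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `msa` is a list of equal-length aligned sequences. High values mean
    the two columns vary together, which is what is expected of positions
    that form a base pair.
    """
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)

    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (made up for illustration): columns 0 and 4 always form
# a Watson-Crick pair, while column 1 never changes.
toy_msa = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]

print(column_mutual_information(toy_msa, 0, 4))  # 2.0 bits: perfectly covarying
print(column_mutual_information(toy_msa, 0, 1))  # 0.0 bits: column 1 is constant
```

Building a deep alignment to feed a statistic like this is what requires the database searches mentioned above, which is why single-sequence methods are faster.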

RhoFold+ is a deep learning framework developed by researchers from institutions including The Chinese University of Hong Kong, Shanghai Zelixir Biotech Company Ltd, Shenzhen Institute of Advanced Technology, Fudan University, Shanghai Artificial Intelligence Laboratory, Harvard University, MIT, Broad Institute of MIT and Harvard, Arizona State University, and Integrated Biosciences. Designed for accurate de novo RNA 3D structure prediction, RhoFold+ leverages an RNA language model pretrained on over 23.7 million sequences and incorporates multiple sequence alignments (MSAs) to compensate for the scarcity of structural data. It was validated on benchmarks like RNA-Puzzles and CASP15, and it also predicts secondary structures and interhelical angles, offering broad applicability in RNA biology and functional studies.

The RhoFold+ platform combines several components for RNA structure prediction. It builds MSA features using tools like Infernal and rMSA, which capture co-evolutionary information from RNA sequences. The RNA-FM language model, built on a transformer architecture similar to BERT, is trained on a large dataset of noncoding RNA sequences from RNAcentral. It uses self-supervised learning, predicting masked nucleotides in sequences. RhoFold+ also includes a structure prediction module that uses invariant point attention (IPA), a geometry-aware attention mechanism, to refine the 3D structure. The model is trained with several loss functions, including a masked language modeling (MLM) loss, a distance loss, and a secondary structure loss.
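The masked-nucleotide objective can be pictured with a short Python sketch. This is a simplified illustration, not RNA-FM's actual preprocessing: the 15% masking rate follows the common BERT convention and is an assumption here, the "<mask>" token and the function name are hypothetical, and BERT's additional random-replacement step is left out.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical token name, for illustration only


def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Hide a random subset of nucleotides, BERT-style (simplified).

    Returns (tokens, targets): `tokens` is the sequence with some positions
    replaced by MASK_TOKEN, and `targets` maps each masked position to the
    nucleotide the model would be trained to recover.
    The 15% default rate is an assumption, not a value from the article.
    """
    rng = rng or random.Random()
    n_mask = min(len(seq), max(1, round(len(seq) * mask_rate)))
    positions = sorted(rng.sample(range(len(seq)), n_mask))

    tokens = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        tokens[pos] = MASK_TOKEN
    return tokens, targets


# Example with a made-up hairpin-like sequence and a fixed seed.
example_seq = "GGGAAACUUCGGUUUCCC"
tokens, targets = mask_sequence(example_seq, rng=random.Random(0))
print(tokens)
print(targets)
```

During pretraining, a model sees only `tokens` and is scored on how well it recovers the nucleotides in `targets`, so no structural labels are needed.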

RhoFold+ uses a large RNA language model (RNA-FM) for sequence embeddings and MSAs for structure modeling. In benchmarks on RNA-Puzzles and CASP15 targets, it was more accurate than existing methods, reaching an average RMSD of 4.02 Å on RNA-Puzzles. RhoFold+ performs well even on sequences it has not seen during training and produces predictions faster than other methods. It was tested on a range of RNA structures and achieved consistently high accuracy across multiple validation scenarios.
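For readers unfamiliar with the metric, RMSD (root-mean-square deviation) is the typical distance between matched atoms of a predicted and a reference structure after the two have been optimally superimposed. The sketch below uses the standard Kabsch algorithm with NumPy. It assumes both structures are given as N x 3 coordinate arrays with atoms in the same order; the article does not say which atoms the benchmark uses, and the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(pred, ref):
    """RMSD between two N x 3 coordinate sets after optimal superposition.

    Assumes row k of `pred` and row k of `ref` describe the same atom.
    """
    P = np.asarray(pred, dtype=float)
    Q = np.asarray(ref, dtype=float)

    # Remove translation by centering both structures on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)

    # Kabsch: the optimal rotation comes from the SVD of the covariance matrix.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (a reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    P_rot = P_c @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q_c) ** 2, axis=1))))


# Toy check with made-up coordinates: a rotated and shifted copy of a
# structure should give an RMSD of (numerically) zero.
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 2.0, 0.0], [0.0, 2.0, 3.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(moved, coords))
```

Lower values are better: an RMSD of 0 Å would mean the prediction matches the reference exactly after superposition.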

In conclusion, RhoFold+ is a deep learning-based RNA 3D structure prediction tool that integrates an RNA language model pretrained on 23.7 million sequences. It offers a fully automated, differentiable approach to RNA structure prediction without requiring expert knowledge or computationally intensive processes. RhoFold+ outperforms existing methods in accuracy, particularly for single-chain RNAs, and is effective in predicting both RNA 3D and secondary structures. It generalizes across different datasets and can predict structures for RNAs it has not seen before. Several challenges remain, including the limited diversity of available structural data, difficulties with large RNA sequences, and the modeling of interactions with ligands or proteins. Future improvements could address these limitations.


Check out the Paper. All credit for this research goes to the researchers of this project.



Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.


