Kanikah Mehndiratta, MSc, University of Glasgow
Understanding the association between diseases and the genes potentially involved in them can be challenging. Scientists usually rely on biomedical experiments to explore the genetics underlying a complex disease, but such experiments are expensive and time-consuming. They are also prone to human error, which can lead to unreliable results. Computational approaches based on machine learning are a recent area of interest for deducing the relationship between diseases and a patient's genetic makeup. In machine learning terms, such a relationship can be framed as a link prediction problem on a bipartite network. A recent study published in BMC Genomics (Yang et al., 2021) discusses existing machine learning models and emphasises the use of dual hypergraph regularized least squares to identify such associations.
Machine learning models for gene-disease relationship:
In disease prevention and treatment, knowing which genes are associated with a disease can be very helpful. A typical machine learning model for this task first extracts the relevant genetic information for every disease in the database. The model is then trained to determine which genes are associated with the disease of interest. The challenge lies in diseases that have few known genes or that lack association data and other relevant information. A matrix completion model can be useful here: it uses similarity information to predict gene-disease associations, but it is time-consuming.
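To make the idea concrete, here is a minimal sketch of similarity-based scoring on a toy association table. It is a simple neighbour-based baseline, not the matrix completion model mentioned above, and every disease name, gene name and the `score_candidates` helper are hypothetical.

```python
# Toy gene-disease associations; all names and values are hypothetical.
known = {
    "disease_A": {"gene_1", "gene_2", "gene_3"},
    "disease_B": {"gene_2", "gene_3", "gene_4"},
    "disease_C": {"gene_5"},
}

def jaccard(a, b):
    """Jaccard similarity between two sets of genes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def score_candidates(target, known):
    """Score genes not yet linked to `target` by summing the similarity
    of every other disease that is already linked to them."""
    scores = {}
    for other, genes in known.items():
        if other == target:
            continue
        sim = jaccard(known[target], genes)
        for gene in genes - known[target]:
            scores[gene] = scores.get(gene, 0.0) + sim
    # Highest-scoring candidate genes first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = score_candidates("disease_A", known)
print(ranking)
```

Asking the same question for `disease_C`, which has a single known gene and shares none with the other diseases, gives every candidate a score of zero. This is the sparse-data problem described above.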
A graph convolutional network instead places diseases and genes in two heterogeneous networks whose edges denote similarity. Genes that are more similar in the analysis are thus expected to relate to similar diseases. However, the method is not smooth and remains biased by the network topology. Another machine learning approach is Multiple Kernel Learning (MKL), which combines information from multiple sources. It has previously been used to identify drug-target interactions and associated side effects. It has also been used to predict the subcellular localization of proteins and to predict DNA-binding proteins, among other tasks.
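The core of MKL, weighting several similarity kernels and summing them, can be sketched in a few lines. The example below weights each kernel by its alignment with a target built from known associations, which is one common heuristic. The weighting scheme actually used in the study is not described here, so treat this choice, the toy matrices and the function names as assumptions.

```python
import numpy as np

def alignment(k1, k2):
    """Frobenius alignment (cosine similarity) between two square matrices."""
    return np.sum(k1 * k2) / (np.linalg.norm(k1) * np.linalg.norm(k2))

def combine_kernels(kernels, target):
    """Weight each kernel by its (non-negative) alignment with `target`,
    normalise the weights to sum to 1, and return the weighted sum."""
    raw = np.array([max(alignment(k, target), 0.0) for k in kernels])
    weights = raw / raw.sum()
    combined = sum(w * k for w, k in zip(weights, kernels))
    return weights, combined

# Hypothetical data: 4 genes x 3 diseases, binary association matrix.
Y = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0]], dtype=float)
target = Y @ Y.T  # genes that share diseases should look similar

rng = np.random.default_rng(0)
k_informative = target + 0.1 * np.eye(4)  # kernel close to the target
noise = rng.random((4, 4))
k_noisy = noise @ noise.T                 # random but valid (PSD) kernel

weights, K_gene = combine_kernels([k_informative, k_noisy], target)
print(weights)
```

The informative kernel receives the larger weight, so the combined gene kernel leans on the source that agrees best with the known associations. The same procedure would be repeated for the disease kernels.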
A novel prediction model- DHRLS:
Within the LapRLS (Laplacian regularized least squares) framework, hypergraphs represent complex associations between different objects, here genes and diseases. In related work, two end-to-end trainable operators, hypergraph convolution and hypergraph attention, were introduced to the graph neural network family. Hypergraph convolution defines the basic formulation for performing convolution on a hypergraph, while hypergraph attention further enhances the capacity of representation learning.
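A common way to bring a hypergraph into a Laplacian-regularized model is through its normalized Laplacian, L = I - Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2), where H is the vertex-by-hyperedge incidence matrix, W holds the hyperedge weights, and Dv and De are the vertex and hyperedge degree matrices. The sketch below computes it for a made-up incidence matrix. Whether the study constructs its hypergraphs and Laplacians in exactly this way is an assumption.

```python
import numpy as np

def hypergraph_laplacian(H, edge_weights=None):
    """Normalized hypergraph Laplacian from an incidence matrix H
    (rows = vertices, columns = hyperedges). Every vertex must belong
    to at least one hyperedge, otherwise its degree is zero."""
    H = np.asarray(H, dtype=float)
    n_vertices, n_edges = H.shape
    if edge_weights is None:
        w = np.ones(n_edges)
    else:
        w = np.asarray(edge_weights, dtype=float)
    dv = H @ w            # weighted vertex degrees
    de = H.sum(axis=0)    # number of vertices in each hyperedge
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w / de) @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(n_vertices) - theta

# Hypothetical hypergraph over 4 genes with 2 hyperedges:
# edge 0 groups genes 0, 1, 2; edge 1 groups genes 2, 3.
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]])
L = hypergraph_laplacian(H)
print(np.round(L, 3))
```

Unlike an ordinary graph edge, each hyperedge here can group any number of genes at once, which is what lets a hypergraph capture higher-order relationships.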
Inspired by this, the study proposes a novel prediction model for identifying gene-disease associations, called dual hypergraph regularized least squares (DHRLS). The nodes of the underlying network are divided into two sets, X and Y, here genes and diseases. Because the network is bipartite, a connection is allowed only between a node in one set and a node in the other. Two feature spaces are also used to describe similarity information for the many genes and diseases. MKL is used to determine the weights of the various kernels and to combine them in each of the two spaces. Hypergraphs are embedded to preserve the complex information relating genes and diseases.
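The "regularized least squares" part can be illustrated with a deliberately simplified version: fit scores F that stay close to the known associations Y while varying smoothly over a similarity graph, by minimizing ||F - Y||^2 + lam * tr(F^T L F). Setting the gradient to zero gives F = (I + lam * L)^(-1) Y. The sketch below does this once on the gene side and once on the disease side and averages the two. It uses ordinary graph Laplacians and no kernels, so it is not the DHRLS solver itself, and all matrices and parameter values are hypothetical.

```python
import numpy as np

def graph_laplacian(S):
    """Unnormalized Laplacian L = D - S of a symmetric similarity matrix."""
    return np.diag(S.sum(axis=1)) - S

def smooth(Y, L, lam):
    """Closed-form minimizer of ||F - Y||_F^2 + lam * tr(F^T L F)."""
    n = L.shape[0]
    return np.linalg.solve(np.eye(n) + lam * L, Y)

# Hypothetical data: 4 genes x 3 diseases; gene 3 has no known disease.
Y = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 0]], dtype=float)

# Hypothetical similarities: gene 3 resembles gene 2.
S_gene = np.array([[0.0, 0.9, 0.1, 0.1],
                   [0.9, 0.0, 0.1, 0.1],
                   [0.1, 0.1, 0.0, 0.8],
                   [0.1, 0.1, 0.8, 0.0]])
S_dis = np.array([[0.0, 0.1, 0.1],
                  [0.1, 0.0, 0.7],
                  [0.1, 0.7, 0.0]])

L_gene = graph_laplacian(S_gene)
F_gene = smooth(Y, L_gene, lam=1.0)                       # smooth over genes
F_dis = smooth(Y.T, graph_laplacian(S_dis), lam=1.0).T    # smooth over diseases
F = (F_gene + F_dis) / 2.0                                # average both views
print(np.round(F, 3))
```

Gene 3 starts with no known associations, yet it ends up scoring higher for the diseases linked to its most similar gene than for the unrelated one. This is the behaviour a link prediction model on a bipartite network is meant to deliver.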
Conclusions drawn from the study:
The effectiveness of the method was demonstrated on one gene-disease network and six types of real networks. Two types of cross-validation were used on the gene-disease dataset. The model was also tested on prediction for a novel disease, with excellent results.
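The two cross-validation schemes are not specified here, so the following is only a generic sketch of how k-fold cross-validation is often set up for association data: known gene-disease pairs are split into folds, and each fold is hidden in turn while the rest is used for training. The pair list and the `kfold_pairs` helper are hypothetical.

```python
import random

def kfold_pairs(pairs, k, seed=0):
    """Shuffle known (gene, disease) pairs and split them into k disjoint folds."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

# Ten hypothetical known associations: 5 genes x 2 diseases.
known_pairs = [(gene, disease) for gene in range(5) for disease in range(2)]
folds = kfold_pairs(known_pairs, k=5)

for i, test_pairs in enumerate(folds):
    train_pairs = [p for p in known_pairs if p not in test_pairs]
    # A model would be fitted on train_pairs and evaluated on test_pairs here.
    print(i, len(train_pairs), len(test_pairs))
```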
References:
- Yang, H., Ding, Y., Tang, J., & Guo, F. (2021). Identifying potential association on gene-disease network via dual hypergraph regularized least squares. BMC Genomics, 22(1), 605. https://doi.org/10.1186/s12864-021-07864-z
- Sikandar, M., Sohail, R., Saeed, Y., Zeb, A., Zareei, M., Khan, M. A., Khan, A., Aldosary, A., & Mohamed, E. M. (2020). Analysis for disease gene association using machine learning. IEEE Access, 8, 160616–160626. https://doi.org/10.1109/ACCESS.2020.3020592
Author Info:
Kanikah Mehndiratta is an avid researcher in the field of Genetics with a background in Biotechnology. She is a postgraduate of the University of Glasgow's Medical Genetics and Genomics program. Currently based in Chandigarh as a scientific writer, she works mainly on projects related to neurological disorders. Outside of academics, she likes to read novels and travel, and is involved in volunteer work.
LinkedIn profile- https://www.linkedin.com/in/kanikah-mehndiratta-301830171
Other articles-
1. https://bioxone.in/news/worldnews/natural-killer-cells-defence-against-self-destruction/
2. https://bioxone.in/news/worldnews/crispr-cas9-for-disease-resistance-in-salmon/
3. https://bioxone.in/news/worldnews/sexually-dimorphic-hydrocarbons-pheromones-in-cockroaches/