Debarati Basu, Makaut (WB)
Cancer is characterized by heterogeneous morphological, phenotypic, and genomic profiles. Identifying cancer subtypes therefore plays an important role in choosing the proper treatment for each patient. Large amounts of data are being produced in genomics, epigenomics, and transcriptomics, which can help to identify the underlying causes of cancer at the gene level. It is commonly assumed that using more omics data increases the accuracy of cancer subtype identification.
To test this assumption, the study constructs three categories of benchmarking datasets. These are used to thoroughly assess and compare ten representative multi-omics data integration methods for cancer subtyping in terms of accuracy, robustness, and computational efficiency. The study also investigates the influence of different omics data types and how effective their various combinations are for cancer subtyping.
Multi-omics data integration
Computational integrative analysis is one of the notable techniques in data-driven investigation. Although numerous cancer subtyping methods exist, evaluating them is complicated by a shortage of gold standards. Using the benchmarking datasets, the study carries out a complete assessment of ten representative integration methods for cancer subtyping. Performance is quantified in terms of clinical significance, clustering accuracy, robustness, and computational efficiency. The study also suggests effective data combinations for several cancers under further study, which is useful guidance for researchers working on omics data analysis.
Applying data integration methods to the analysis of cancer subtypes has three main goals: understanding the molecular mechanisms of cancer, clustering disease samples, and predicting outcomes such as survival. An important aim is to identify the molecular subtypes of cancer, that is, groups of patients who share biological characteristics such as drug response and survival time. Because the treatment a patient receives depends on the particular subtype, the data integration approach is used to identify cancer subtypes from a macro point of view, in support of accurate diagnosis and treatment.
Limitations of the study
Various computational integration methods exist for identifying cancer subtypes, but most of them face two types of problems.
- The first problem concerns comparing performance across methods. It arises from the shortage of reliable gold standards and compatible performance criteria, and is made worse because different methods are evaluated on different datasets with different evaluation measures.
- The second problem concerns selecting which of the accessible data types to combine in order to attain good results. To understand this problem better, a survey was carried out covering 58 integration methods for cancer subtyping published between 2009 and 2019. The survey showed that the data combinations used by these methods were inconsistent.
Results
The study builds three groups of benchmarking datasets from all possible combinations of four types of multi-omics data across nine cancer types. The methods are assessed for accuracy, robustness, and computational efficiency. A detailed comparison determines how well the ten integration methods perform in terms of both clustering accuracy and clinical significance. The impact of the various types of omics data, and of their combinations, on cancer subtyping is also analyzed. Subtyping is usually achieved by unsupervised clustering, so determining the number of cancer subtypes is not easy.
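To make "all possible combinations of four types of multi-omics data" concrete, the short sketch below enumerates them. The four labels are placeholders chosen for illustration (the text above does not name the data types), and the sketch assumes that a combination needs at least two data types to count as integration.

```python
from itertools import combinations

# Placeholder labels for four omics data types (an assumption for illustration;
# the summary above does not list them by name).
OMICS_TYPES = ["mRNA", "miRNA", "methylation", "CNV"]


def omics_combinations(types, min_size=2):
    """List every combination of the given data types with at least min_size members."""
    return [
        combo
        for size in range(min_size, len(types) + 1)
        for combo in combinations(types, size)
    ]


combos = omics_combinations(OMICS_TYPES)
for combo in combos:
    print(" + ".join(combo))
```

Under these assumptions there are six pairs, four triples, and one combination of all four types, so each integration method would be run on eleven input combinations per cancer type.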
The silhouette coefficient was used to measure the clustering accuracy of the integration methods. The silhouette coefficient is a measure for interpreting and validating the consistency of data clusters. It can be calculated in either the original space or the integrated space; in this study, it is calculated in the original space using the integrated input data matrices. Another important factor is the computation time needed to complete a task, which is particularly significant when working with large datasets. In such cases, a user may prefer a somewhat less accurate method over one that takes much longer to complete.
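As a rough illustration of how the silhouette coefficient works, the sketch below computes it from scratch using its standard definition: for each sample, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the silhouette value is (b - a) / max(a, b). The function names and the toy points are hypothetical, and Euclidean distance is an assumption; this is not the evaluation code used in the study.

```python
import math


def silhouette_scores(points, labels):
    """Return the silhouette value of every sample (Euclidean distance assumed)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for a single-member cluster
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette value over all samples, in the range [-1, 1]."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Toy data: two well-separated groups of two samples each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1]   # splits each group across clusters

print(mean_silhouette(toy_points, good_labels))
print(mean_silhouette(toy_points, bad_labels))
```

Values close to 1 indicate compact, well-separated clusters, while values near or below 0 indicate that samples sit closer to another cluster than to their own, which is why the mislabeled toy example scores negative.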
Conclusion
Various multi-omics integration methods are being applied to gain a better understanding of cancer. Cancer subtyping can be used to provide patients with personalized and accurate treatments. Two methods commonly used in cancer subtyping tasks are NEMO and SNF. This study is a detailed comparison of multi-omics integration methods. Using the accuracy results, it assesses the impact of the various omics data types and their combinations on subtyping. The experimental results show that integrating two omics data types gave better results than integrating four, and that integrating three omics data types also performed worse than integrating two. This can be attributed to three interrelated factors:
1. Negatively correlated noise in the omics data, which can cancel out useful information.
2. Redundancy among the various types of data.
3. Computational challenges.
The study also observed that DNA methylation data is important to the effectiveness of the integration methods. As new types of omics data emerge, proteomics, single-cell omics data, and machine learning methods are expected to shape the future development of data integration.
Reference:
- Duan, R., Gao, L., Gao, Y., Hu, Y., Xu, H., Huang, M., Song, K., Wang, H., Dong, Y., Jiang, C., Zhang, C., & Jia, S. (2021). Evaluation and comparison of multi-omics data integration methods for cancer subtyping. PLOS Computational Biology, 17(8), e1009224. https://doi.org/10.1371/journal.pcbi.1009224