Madhavi Bhatia, National Pharmaceutical Education and Research, Guwahati
Variant calling is the process of identifying variants in sequence data. It has major applications in modern bioinformatics: it helps in discovering the genetic basis of Mendelian diseases, in understanding an individual’s susceptibility to cancer, and in studying genetic diversity to inform crop-breeding strategies. It also helps characterize small variants, namely single-nucleotide substitutions and small insertions and deletions (indels), which account for most of the mutations in the human genome and are of great interest in most studies. Advanced sequencing technologies allow the human genome to be probed at higher resolution. Reads from next-generation sequencing (NGS) platforms such as Illumina are a few hundred bases long and highly accurate.
Variant calling methods
Traditionally, variant calling methods have used probabilistic models of sequencing errors and read alignment to determine the likelihood of variation at genomic loci from sequencing data. For example, the pair hidden Markov model (pair-HMM) is used to compute alignment probabilities of reads to different candidate haplotypes at a site, which in turn determines the best haplotypes at that site. A more recent tool, DeepVariant, uses a deep neural network (DNN) that makes predictions for candidate alternative alleles and pairs of candidate alternative alleles at a site. These predictions are then converted into variant calls by additional algorithms that filter, sort and rank the different alleles. Another learning-based tool is Clairvoyante, a small variant caller that runs rapidly and calls single-nucleotide variants (SNVs) and indels of length up to 4 bp.
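To make the idea of a probabilistic model of sequencing errors concrete, here is a minimal sketch of genotype likelihoods at a single site. It deliberately uses a simplified per-base error model rather than a pair-HMM, and it assumes a diploid sample, one reference and one alternative allele, and a uniform error rate. The function names and the `error_rate` value are illustrative choices, not taken from any of the tools mentioned above.

```python
import math
from itertools import combinations_with_replacement


def genotype_log_likelihoods(bases, ref, alt, error_rate=0.01):
    """Log-likelihood of each diploid genotype given the bases observed at one site.

    Simplified model (an assumption for illustration): a read shows the true
    allele with probability 1 - error_rate, and each of the three other bases
    with probability error_rate / 3.
    """
    def p_base_given_allele(base, allele):
        return 1.0 - error_rate if base == allele else error_rate / 3.0

    result = {}
    for a1, a2 in combinations_with_replacement((ref, alt), 2):
        ll = 0.0
        for base in bases:
            # Each read is assumed to come from either haplotype with equal probability.
            ll += math.log(
                0.5 * p_base_given_allele(base, a1)
                + 0.5 * p_base_given_allele(base, a2)
            )
        result[(a1, a2)] = ll
    return result


def call_genotype(bases, ref, alt, error_rate=0.01):
    """Return the genotype with the highest log-likelihood."""
    lls = genotype_log_likelihoods(bases, ref, alt, error_rate)
    return max(lls, key=lls.get)


# Five reads support A and five support G, so the heterozygous genotype fits best.
print(call_genotype(list("AAAAAGGGGG"), ref="A", alt="G"))
```

A single discordant read among ten is better explained as a sequencing error than as a second allele, so the same function returns the homozygous reference genotype for a pileup such as `AAAAAAAAAG`.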
HELLO (Hybrid and stand-alone Estimation of smaLL genOmic variants) method
The HELLO authors built a DNN (deep neural network) architecture that treats reads and alleles as the basic elements of the problem and introduces specific constructs that encode the relationship of reads to alleles, as well as of one allele to another. The method is thus designed specifically for small variant calling: it accounts for the nature and structure of genome sequencing data and tailors the DNN to the problem at hand. The DNN predictions for each allele at a site can be combined into a variant call by log-likelihood maximization, without resorting to complex algorithms or a second machine learning method. This may also allow the framework to be extended to polyploid cases with no changes to the underlying DNN.
HELLO was compared with DeepVariant and the Genome Analysis Toolkit (GATK) on sample datasets. The HELLO model was up to 14x smaller than DeepVariant in terms of parameter count, yet performed similarly to or better than DeepVariant in different settings. The experiments showed the following:
- For PacBio variant calling, HELLO outperformed DeepVariant and GATK in all cases.
- For hybrid variant calling, HELLO outperformed DeepVariant in all cases except SNVs, where DeepVariant made approximately 5% fewer errors.
- For Illumina variant calling, HELLO outperformed DeepVariant and GATK in all cases except indels, where DeepVariant made approximately 3% fewer errors.
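The step of combining per-allele predictions by log-likelihood maximization can be illustrated with a small sketch. This is not HELLO's published formulation. It assumes, purely for illustration, that the network emits for each candidate allele an independent probability that the allele is present in the sample, and it scores every set of at most `ploidy` alleles. The function `call_site` and the example probabilities are hypothetical.

```python
import math
from itertools import combinations


def call_site(allele_probs, ploidy=2, eps=1e-9):
    """Pick the set of alleles that maximizes the joint log-likelihood.

    allele_probs maps each candidate allele to an assumed model probability
    that the allele is present in the sample (hypothetical network output).
    """
    alleles = list(allele_probs)
    best_set, best_ll = None, -math.inf
    # A sample with the given ploidy carries between 1 and `ploidy` distinct alleles.
    for k in range(1, min(ploidy, len(alleles)) + 1):
        for chosen in combinations(alleles, k):
            ll = 0.0
            for allele in alleles:
                # Clamp to avoid log(0) for saturated predictions.
                p = min(max(allele_probs[allele], eps), 1.0 - eps)
                ll += math.log(p) if allele in chosen else math.log(1.0 - p)
            if ll > best_ll:
                best_set, best_ll = chosen, ll
    return best_set, best_ll


# Hypothetical predictions at an indel site: reference "A", candidates "AT" and "ATT".
probs = {"A": 0.95, "AT": 0.90, "ATT": 0.05}
print(call_site(probs))            # best pair of alleles for a diploid sample
print(call_site(probs, ploidy=1))  # best single allele for a haploid sample
```

Because the scoring loop only depends on the per-allele probabilities, changing `ploidy` changes which allele sets are enumerated without touching the predictions themselves, which mirrors the point above that polyploid cases may need no change to the underlying DNN.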
Conclusion: The DNN architecture reflects three assumptions: reads are the fundamental units of sequencing data; more reads supporting the same candidate allele reinforce confidence in that allele; and confidence in an allele should be evaluated relative to the other candidate alleles in a simple manner. A DNN architecture for variant calling should therefore be built to fit the problem, by encoding the assumptions that can be made about sequencing data and variant calling directly into its structure.
Reference: Ramachandran, A., Lumetta, S. S., Klee, E. W., & Chen, D. (2021). HELLO: Improved neural network architectures and methodologies for small variant calling. BMC Bioinformatics, 22(1), 404. https://doi.org/10.1186/s12859-021-04311-4