Parnad Basu, Amity University Kolkata
By definition, DNA sequencing is the process of determining the order of nucleotides in a DNA molecule. DNA is made of nucleotides, each consisting of a phosphate group, a sugar group, and a nitrogenous base. The bases found in DNA are adenine (A), guanine (G), thymine (T), and cytosine (C). The order of these nucleotides is key to understanding the structure and function of a gene, and that is exactly why DNA sequencing is done. Through DNA sequencing one can identify the genes responsible for diseases. In forensic science, DNA sequencing is used for identification purposes. It also aids agricultural biotechnology.
DNA sequencing technologies can be divided into two types depending on the length of the reads they produce, measured in base pairs (bp). Short-read sequencing produces much shorter reads (typically tens to a few hundred bp) than long-read sequencing, whose reads can run to thousands of bp. On one hand, short reads typically give higher coverage. On the other, longer reads are easier to process because they can span long, repetitive regions. Long-read technologies include SMRT (single-molecule real-time) sequencing and nanopore DNA sequencing. Short-read technologies include Polony sequencing and 454 pyrosequencing.
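To make the term "coverage" concrete: average coverage is usually taken as the total number of sequenced bases divided by the genome size. The sketch below works through that arithmetic in Python. The read counts, read lengths, and genome size are hypothetical round numbers chosen for illustration, not figures from any sequencing run.

```python
def mean_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp


# Hypothetical runs against a genome of about 3.1 billion bp (illustrative only).
GENOME_BP = 3_100_000_000

# Many short reads versus far fewer long reads, same total number of bases.
short_run = mean_coverage(num_reads=1_800_000_000, read_length_bp=50,
                          genome_size_bp=GENOME_BP)
long_run = mean_coverage(num_reads=6_000_000, read_length_bp=15_000,
                         genome_size_bp=GENOME_BP)

print(f"short reads: {short_run:.1f}x, long reads: {long_run:.1f}x")
```

Both hypothetical runs sequence the same total number of bases, so they reach the same average coverage. In practice, coverage depends on how many reads a platform can produce, not only on how long each read is.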
About Straglr
Despite being very useful, long-read technologies show a higher error rate per nucleotide (10-15%). This led a group of scientists to develop a new software tool named Straglr. Straglr can scan an entire genome for potential tandem repeat (TR) expansions. First, it extracts insertions made up of TRs. Then, it genotypes the expanded loci it has identified.
According to the researchers, it cuts down both run time and computing resources considerably. In addition to the functions above, Straglr supports the discovery of expansions at unannotated loci.
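The two steps described above (find insertions made up of tandem repeats, then genotype the locus) can be illustrated with a toy Python sketch. This is not Straglr's implementation. The function names `find_repeat_motif` and `genotype_locus` are hypothetical, the motif search only accepts perfect repeats (real long reads contain errors), and the median is a simple stand-in for a real genotyping method.

```python
from statistics import median


def find_repeat_motif(seq: str, max_motif_len: int = 6, min_copies: int = 3):
    """Return (motif, copies) if seq is a perfect tandem repeat of a short motif.

    Toy assumption: only exact repeats are accepted; returns None otherwise.
    """
    for k in range(1, max_motif_len + 1):
        if len(seq) % k:
            continue  # the sequence length must be a multiple of the motif length
        copies = len(seq) // k
        if copies >= min_copies and seq == seq[:k] * copies:
            return seq[:k], copies
    return None


def genotype_locus(insertions):
    """Summarise the inserted copy number at one locus as the median across reads.

    Insertions that are not tandem repeats are ignored.
    """
    copies = [hit[1] for ins in insertions if (hit := find_repeat_motif(ins))]
    return median(copies) if copies else None


# Hypothetical inserted sequences taken from four reads covering the same locus.
reads = ["CAG" * 20, "CAG" * 22, "ACGTTGCA", "CAG" * 21]
print(find_repeat_motif("CAGCAGCAGCAG"))  # ('CAG', 4)
print(genotype_locus(reads))              # 21
```

The third read's insertion is not a repeat, so it is skipped. The remaining reads disagree slightly on the copy number, which is why a per-locus summary is needed at all.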
The findings of the study:
Let us look at the study's results to better understand what Straglr does and where it stands out:
- Simulated data: The scientists generated three genomic sequences from hg38 (the human reference genome). Each genome was modified at 17 known STR (short tandem repeat) disease loci on different chromosomes. Straglr was run alongside tandem-genotypes and RepeatHMM to gauge its performance. Straglr showed a p-value smaller than the cut-off only once, whereas tandem-genotypes did so twice and RepeatHMM showed a different value altogether.
- Targeted sequencing data: For this part, the scientists analyzed HiFi sequence data. Seven samples with HTT CAG and FMR1 CGG expansions were analyzed, along with one negative control with no known expansion. When Straglr's results were compared with PacBio’s analysis, a high level of agreement was observed.
- WGS data: To evaluate whole-genome sequencing (WGS) data, Straglr was again compared with RepeatHMM and tandem-genotypes. Of the three tools, Straglr and RepeatHMM achieved the higher Pearson correlation coefficients. Straglr’s genotyping results were also strong, showing a high level of sensitivity (87%) and specificity (93%).
- Runtime and computing resources: This comparison was conducted with Straglr and minimap2. Both tools were run individually and at the same time. Where minimap2 completed the task in about 3 hours, Straglr did it in about one and a half hours. The scientists did not record a completion time for RepeatHMM, as it took more than 24 hours.
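The metrics quoted in the WGS results can be made concrete with a short Python sketch of their standard definitions: sensitivity, specificity, and the Pearson correlation coefficient. The counts and repeat sizes below are hypothetical toy values, not data from the study.

```python
from math import sqrt


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true expansions that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-expanded loci correctly left uncalled: TN / (TN + FP)."""
    return tn / (tn + fp)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical confusion-matrix counts (illustrative only).
print(sensitivity(tp=8, fn=2))  # 0.8
print(specificity(tn=9, fp=1))  # 0.9

# Hypothetical repeat sizes (bp): expected values versus one tool's estimates.
expected = [60, 120, 300, 450]
estimated = [63, 114, 310, 440]
print(round(pearson_r(expected, estimated), 3))
```

A correlation close to 1 means a tool's size estimates track the expected sizes closely. Sensitivity and specificity instead describe how often expansions are correctly called or correctly left uncalled.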
A new in silico alternative to traditional means
All in all, the study showed that Straglr is a robust and efficient alternative. It efficiently identifies TR expansions relative to the reference genome. As it stands, there are many genetic disorders whose causal mutations are unknown, and Straglr can help identify them in a much shorter time. The software's cost-effectiveness is another point in its favor.
Reference:
- Chiu, R., Rajan-Babu, I.-S., Friedman, J. M., & Birol, I. (2021). Straglr: Discovering and genotyping tandem repeat expansions using whole genome long-read sequences. Genome Biology, 22(1), 224. https://doi.org/10.1186/s13059-021-02447-3