Monika R, PSG College of Technology, Coimbatore
Polyploidy has played a central role in shaping the genetic architecture of most plants, particularly angiosperms (flowering plants). There are two main classes of polyploid formation. Autopolyploidy arises by doubling of homologous genomes within one species. In contrast, allopolyploidy (hybrid polyploidy) arises through hybridization, usually between two related species. This merges the genomic content of two divergent species into one and doubles nonhomologous (homoeologous) genomes.
What are homoeologs and how are they inferred?
Homoeologs are pairs of genes within an allopolyploid that originated by speciation, diverged, and were reunited by hybridization.
Source: Glover, N. M., Redestig, H., & Dessimoz, C. (2016). Homoeologs: What Are They and How Do We Infer Them? Trends in Plant Science, 21(7), 609–621. doi:10.1016/j.tplants.2016.02.005
The homoeologous relationships between the subgenomes of a polyploid are often used to study the structural, genetic, and evolutionary consequences of polyploidization. Methods for inferring these relationships range from low-throughput laboratory techniques to high-throughput computational ones. Because laboratory methods have several limitations, evolution-based computational techniques, such as graph-based orthology methods, can be adapted to infer homoeologs. The most commonly used method of homoeolog detection is finding bidirectional best hits (BBHs) between subgenomes, sometimes with an additional requirement of synteny (the degree to which gene position is conserved between two homoeologous chromosomes). The BBH approach uses BLAST or another sequence alignment algorithm to find the set of reciprocally highest-scoring gene pairs between two subgenomes. Because it identifies only the ‘best’ pair (1:1 homoeology), it cannot identify one-to-many or many-to-many homoeology. As a result, BBH between subgenomes can at best infer a subset of the homoeologous relationships, yielding false negatives.
Another graph-based homoeolog inference approach is implemented in the Orthologous Matrix (OMA) database, a method and resource for inferring different kinds of homologous relationships between fully sequenced genomes. OMA identifies mutually closest homologs based on evolutionary distance while allowing for many-to-many relationships.
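The BBH criterion can be made concrete with a small sketch. It assumes that pairwise alignment scores (for example, BLAST bit scores) between A- and D-subgenome genes have already been computed and stored in a dictionary keyed by `(a_gene, d_gene)`. The gene names, scores, and function names are hypothetical. The toy data include a gene duplicated in the D lineage, which shows how BBH keeps only one of two genuine homoeologs.

```python
"""Minimal sketch of bidirectional best hits (BBH) between two subgenomes.

Assumption: alignment scores (e.g. BLAST bit scores) are precomputed and
stored as {(a_gene, d_gene): score}. All names here are hypothetical.
"""
from collections import defaultdict


def best_hits(scores, query_index):
    """Map each query gene to its single top-scoring target gene.

    query_index=0 treats the A gene of each key as the query (A -> D);
    query_index=1 treats the D gene as the query (D -> A).
    Ties are resolved by keeping the first maximal hit encountered.
    """
    by_query = defaultdict(list)
    for pair, score in scores.items():
        query, target = pair[query_index], pair[1 - query_index]
        by_query[query].append((score, target))
    return {q: max(hits, key=lambda h: h[0])[1] for q, hits in by_query.items()}


def bbh(scores):
    """Return the (a, d) pairs that are each other's best hit."""
    a_to_d = best_hits(scores, 0)
    d_to_a = best_hits(scores, 1)
    return {(a, d) for a, d in a_to_d.items() if d_to_a.get(d) == a}


if __name__ == "__main__":
    # Toy data: d1 was duplicated into d1 and d1b in the D lineage, so a1
    # truly has two homoeologs (a one-to-many relationship).
    toy_scores = {
        ("a1", "d1"): 950.0,
        ("a1", "d1b"): 940.0,
        ("a2", "d2"): 800.0,
    }
    print(sorted(bbh(toy_scores)))
```

Here `bbh(toy_scores)` returns only `('a1', 'd1')` and `('a2', 'd2')`. The one-to-many relationship between `a1` and `d1b` is lost, which is the false-negative behaviour described above.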
Case study
A recent case study inferred homoeologs in the TM-1 genome of the allotetraploid cotton Gossypium hirsutum. Modern allotetraploid cotton contains an “A” and a “D” subgenome from an ancestral polyploidy event that occurred approximately 1–2 million years ago. In total, OMA identified 32,426 homoeolog pairs in cotton, and the distribution of synteny scores was computed for all of these pairs. Comparing the A and D subgenomes with the BBH approach combined with synteny missed a considerable proportion of these homoeologs.
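Synteny can be quantified in many ways. As a rough illustration only, the sketch below scores a homoeolog pair `(a, d)` by the fraction of genes flanking `a` whose homoeolog also flanks `d`. The single-chromosome gene order, the window size `k`, the function names, and the scoring rule are all hypothetical simplifications. They are not the synteny score used in the cotton study.

```python
"""Hypothetical synteny score for one homoeolog pair (illustration only).

Assumptions: each subgenome is one ordered list of gene IDs, and `pairs`
is a set of (a_gene, d_gene) homoeolog pairs. The score is the fraction of
genes flanking `a` (up to k per side) whose homoeolog lies among the genes
flanking `d`. This is a simplified stand-in, not the study's method.
"""


def flanking(order, gene, k):
    """Return up to k genes on each side of `gene`, excluding `gene` itself."""
    i = order.index(gene)
    return order[max(0, i - k):i] + order[i + 1:i + 1 + k]


def synteny_score(a, d, order_a, order_d, pairs, k=2):
    """Fraction of A-side neighbours whose homoeolog is a D-side neighbour."""
    neigh_a = flanking(order_a, a, k)
    neigh_d = set(flanking(order_d, d, k))
    conserved = sum(
        1 for x in neigh_a if any((x, y) in pairs for y in neigh_d)
    )
    # Divide by the neighbours actually available, so genes near a
    # chromosome end are not penalised for having fewer flanking genes.
    return conserved / len(neigh_a) if neigh_a else 0.0


if __name__ == "__main__":
    order_a = ["a1", "a2", "a3", "a4", "a5"]
    collinear_d = ["d1", "d2", "d3", "d4", "d5"]
    moved_d = ["d1", "d2", "d4", "d5", "d3"]  # d3 translocated to the end
    pairs = {(f"a{i}", f"d{i}") for i in range(1, 6)}
    print(synteny_score("a3", "d3", order_a, collinear_d, pairs))
    print(synteny_score("a3", "d3", order_a, moved_d, pairs))
```

In the collinear case every neighbour of `a3` has its homoeolog next to `d3`, giving a score of 1.0. After the hypothetical translocation only two of four neighbours remain conserved, giving 0.5.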
Using the synteny information together with the BBH status of every homoeolog pair in the OMA set, all homoeolog pairs were divided into four categories: BBH & syntenic, non-BBH & syntenic, BBH & non-syntenic, and non-BBH & non-syntenic. Most pairs in the set (74%) were both syntenic and BBHs, while the remaining 8,276 pairs (26%) were non-syntenic, non-BBH, or both. To determine whether non-BBH or non-syntenic homoeologs differ from typical homoeologs, the four categories were compared in terms of duplication extent, evolutionary distance (in PAM units), and protein length (in amino acids) as follows:
- BBH & syntenic homoeolog pairs – These pairs were the least likely to have undergone duplication and had the shortest evolutionary distance (median: 2.4, mean: 2.8 PAM units), indicating greater sequence conservation and a slower evolutionary rate. They also had the longest proteins (median: 378 aa).
- Non-BBH & syntenic homoeolog pairs – These pairs had an intermediate duplication extent (median: 1, mean: 2–2.4), an intermediate evolutionary distance (median: 2.8–5.2, mean: 9.0–10.1 PAM units), and intermediate protein lengths (medians: 216–276 aa).
- BBH & non-syntenic homoeolog pairs – These pairs were similar to non-BBH & syntenic homoeolog pairs in duplication extent, evolutionary distance, and protein length.
- Non-BBH & non-syntenic homoeolog pairs – These pairs had the highest duplication extent (median: 2, mean: 3.2) and the largest evolutionary distance (median: 4, mean: 20.8 PAM units). This indicates that, in general, non-BBH genes and non-syntenic genes evolve faster than BBH genes and syntenic genes, respectively. These pairs also had the shortest proteins (median: 157 aa).
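The four-way split and the per-category summaries can be sketched as follows. The field names, the synteny threshold of 0.5, and the toy distance values are hypothetical and invented for illustration. They are not data from the study. The last line checks the arithmetic behind the reported 26%.

```python
"""Sketch: split homoeolog pairs into the four BBH/synteny categories and
summarise each category's evolutionary distance (PAM units).

Assumptions: each pair is a dict with hypothetical keys "is_bbh",
"synteny_score" and "pam"; a pair counts as syntenic when its synteny
score reaches a hypothetical threshold. Toy values are invented.
"""
from collections import defaultdict
from statistics import mean, median

SYNTENY_THRESHOLD = 0.5  # hypothetical cut-off, not the study's value


def category(pair):
    """Label a pair with one of the four BBH/synteny categories."""
    bbh_label = "BBH" if pair["is_bbh"] else "non-BBH"
    syn_label = (
        "syntenic" if pair["synteny_score"] >= SYNTENY_THRESHOLD else "non-syntenic"
    )
    return f"{bbh_label} & {syn_label}"


def summarise(pairs):
    """Return {category: (count, median PAM, mean PAM)}."""
    groups = defaultdict(list)
    for p in pairs:
        groups[category(p)].append(p["pam"])
    return {c: (len(v), median(v), mean(v)) for c, v in groups.items()}


toy_pairs = [
    {"is_bbh": True, "synteny_score": 0.9, "pam": 2.0},
    {"is_bbh": True, "synteny_score": 0.8, "pam": 3.0},
    {"is_bbh": False, "synteny_score": 0.7, "pam": 6.0},
    {"is_bbh": True, "synteny_score": 0.2, "pam": 5.0},
    {"is_bbh": False, "synteny_score": 0.1, "pam": 4.0},
    {"is_bbh": False, "synteny_score": 0.0, "pam": 3.0},
    {"is_bbh": False, "synteny_score": 0.2, "pam": 50.0},
]

for cat, (n, med, avg) in summarise(toy_pairs).items():
    print(f"{cat}: n={n}, median={med}, mean={avg}")

# Share of OMA pairs outside the BBH & syntenic category, from the text.
print(f"{8276 / 32426:.1%}")  # → 25.5%, reported in the text as 26%
```

In the toy non-BBH & non-syntenic group, one distant pair pulls the mean (19.0) far above the median (4.0), which is why reporting both statistics, as the study does, is informative.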
To assess whether the pairs missed by BBH, synteny, or both are functional, gene expression was compared across the four categories. The BBH & syntenic category had the highest proportion of expressed genes (89.9%), while the other three categories (non-BBH & syntenic, BBH & non-syntenic, and non-BBH & non-syntenic) had similar proportions (72%–76%). Genes in all four categories are involved in a variety of biological processes and molecular functions.
BBH vs OMA for Homoeolog Inference in Gossypium hirsutum
By restricting each gene to at most one homoeologous counterpart, the BBH criterion ignores the possibility that gene duplications occurred during the 5–10 Ma since the arboreum (A subgenome) and raimondii (D subgenome) lineages diverged. BBH therefore tends to yield many false negatives: in this case study, requiring both BBH and synteny missed 26% of the cotton homoeologs in the OMA set.
The difference between BBH- and OMA-based inference arises from the assumptions built into BBH with synteny. In many plant species, homoeologous chromosomes show a high degree of collinearity, or conservation of gene order, which led to the concept of positional homoeology. Genes tend to remain in their ancestral positions after divergence, but they can be rearranged through duplication or translocation before or after polyploidization. Because the BBH-with-synteny approach assumes one-to-one (1:1) homoeology and conserved position, it produces false negatives and misses many homoeolog pairs. OMA, by contrast, does not rely on synteny and can identify one-to-one, one-to-many, and many-to-many homoeolog pairs, so it recovers many pairs that BBH-based methods miss. This makes OMA the simpler and more complete of the two approaches for homoeolog inference.
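To contrast with BBH, here is a simplified sketch that keeps pairs which are mutually closest within a tolerance, so a gene can retain several near-equidistant homoeologs. It only loosely mirrors the idea of mutually closest homologs described for OMA. The tolerance value, the distance dictionary layout, and the function name are hypothetical, and OMA's actual inference involves additional steps not shown here.

```python
"""Simplified illustration of mutually closest homologs with a tolerance.

Assumption: `dist` maps (a_gene, d_gene) to an evolutionary distance
(e.g. PAM units). A pair is kept when its distance is within `tol` of the
smallest distance seen for both genes. This is NOT OMA's full algorithm.
"""
from collections import defaultdict


def mutually_closest(dist, tol):
    """Return (a, d) pairs within `tol` of the closest match for both genes."""
    best_a = defaultdict(lambda: float("inf"))
    best_d = defaultdict(lambda: float("inf"))
    for (a, d), x in dist.items():
        best_a[a] = min(best_a[a], x)  # closest D gene distance for a
        best_d[d] = min(best_d[d], x)  # closest A gene distance for d
    return {
        (a, d)
        for (a, d), x in dist.items()
        if x <= best_a[a] + tol and x <= best_d[d] + tol
    }


if __name__ == "__main__":
    # Toy distances: d1 and d1b are recent duplicates, both close to a1.
    dist = {
        ("a1", "d1"): 2.0,
        ("a1", "d1b"): 2.3,
        ("a2", "d2"): 3.0,
        ("a2", "d1"): 40.0,
    }
    print(sorted(mutually_closest(dist, tol=0.5)))
    print(sorted(mutually_closest(dist, tol=0.0)))
```

With `tol=0.5` both `('a1', 'd1')` and `('a1', 'd1b')` are kept, a one-to-many relationship. With `tol=0.0` the rule reduces to strict best matches on distance and drops `('a1', 'd1b')`, reproducing the BBH-style loss.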
From a crop improvement perspective, identifying homoeologs that may be functionally conserved is important for engineering the genetic basis of traits of interest.
Sources:
- Glover, N., Sheppard, S., & Dessimoz, C. (2021). Homoeolog Inference Methods Requiring Bidirectional Best Hits or Synteny Miss Many Pairs. Genome Biology and Evolution, 13(6). doi:10.1093/gbe/evab077
- Glover, N. M., Altenhoff, A., & Dessimoz, C. (2019). Assigning confidence scores to homoeologs using fuzzy logic. PeerJ. doi:10.7717/peerj.6231
- Eriksson, J. S., de Sousa, F., Bertrand, Y. J. K., Antonelli, A., Oxelman, B., & Pfeil, B. E. (2018). Allele phasing is critical to revealing a shared allopolyploid origin of Medicago arborea and M. strasseri (Fabaceae). BMC Evolutionary Biology, 18(1). doi:10.1186/s12862-018-1127-z
- Glover, N. M., Redestig, H., & Dessimoz, C. (2016). Homoeologs: What Are They and How Do We Infer Them? Trends in Plant Science, 21(7), 609–621. doi:10.1016/j.tplants.2016.02.005
- Flagel, L. E., Wendel, J. F., & Udall, J. A. (2012). Duplicate gene evolution, homoeologous recombination, and transcriptome characterization in allopolyploid cotton. BMC Genomics, 13(1), 302. doi:10.1186/1471-2164-13-302
About Author: Monika R is an enthusiastic biotech student aiming to develop skills and grow professionally in the research field. Highly motivated, with strong interpersonal skills and the ability to learn concepts quickly.