Sumedha B S, Bangalore University
Cyclin Proteins:
The cell cycle is the sequence of events in a cell that ultimately leads to its division into two daughter cells. It is regulated by a family of proteins called cyclins, which bind and activate cyclin-dependent kinases (CDKs). Cyclins were discovered by Tim Hunt, who shared the 2001 Nobel Prize in Physiology or Medicine with Leland H. Hartwell and Paul M. Nurse for their discoveries of key regulators of the cell cycle.
Tight regulation of every stage of the cell cycle ensures that genetic information is transmitted precisely to the daughter cells. The stages are the gap 1 (G1) phase, the DNA synthesis (S) phase, the gap 2 (G2) phase, and the mitosis (M) phase. Cyclins are grouped by the stage at which they act: G1 cyclins, G1/S cyclins, S cyclins, and M cyclins, each with different functions. Cyclin levels fluctuate throughout the cycle, rising or falling markedly at each stage according to cellular requirements. Cyclins regulate the cell cycle by forming complexes with cyclin-dependent kinases; the activated complexes then drive the cell from one stage to the next.
Precise identification of cyclin proteins could provide crucial insights into their functions. However, cyclin sequences share little similarity with one another, so approaches based on sequence similarity predict them poorly.
A machine learning model is therefore a useful alternative for identifying cyclin proteins.
The Computational Model:
The study, published in the Computational and Structural Biotechnology Journal, aimed to develop a computational model that distinguishes cyclin proteins from non-cyclin proteins. For this purpose, the authors built an advanced ensemble model.
They used a dataset of 215 cyclin and 204 non-cyclin proteins to build and test the prediction methods. In the model, protein sequences were encoded by seven types of features: amino acid composition, Geary autocorrelation, composition of k-spaced amino acid pairs, pseudo amino acid composition, tripeptide composition, normalized Moreau-Broto autocorrelation, and composition/transition/distribution.
Representing protein sequences in a mathematical form is important but not easy. These seven feature-encoding methods were therefore used to describe each protein sequence.
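To make the idea of feature encoding concrete, here is a minimal Python sketch of two of the simpler encodings, amino acid composition and the composition of k-spaced amino acid pairs. The function names and the toy peptide are illustrative assumptions and are not taken from the study, whose exact implementation is not described here.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence (20 values)."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def k_spaced_pair_composition(sequence, k=0):
    """Return the frequency of each residue pair separated by k positions (400 values)."""
    seq = sequence.upper()
    # Count pairs (seq[i], seq[i + k + 1]); skip pairs with non-standard residues.
    pairs = Counter(
        (seq[i], seq[i + k + 1])
        for i in range(len(seq) - k - 1)
        if seq[i] in AMINO_ACIDS and seq[i + k + 1] in AMINO_ACIDS
    )
    total = sum(pairs.values())
    return [pairs[p] / total if total else 0.0 for p in product(AMINO_ACIDS, repeat=2)]


toy_sequence = "MKTAYIAKQR"  # made-up peptide for illustration, not a real cyclin
aac = amino_acid_composition(toy_sequence)
pairs_k1 = k_spaced_pair_composition(toy_sequence, k=1)
print(len(aac), len(pairs_k1))
```

Whatever the length of the protein, each encoding yields a vector of fixed length, which is what lets a classifier compare sequences of different sizes.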
The features were then optimized using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR), each combined with the incremental feature selection (IFS) technique. These methods suit the task because they require little computation time and give efficient results.
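As a rough illustration of how ANOVA ranking and incremental feature selection fit together, the sketch below scores each feature with a one-way ANOVA F statistic, then grows the feature subset one feature at a time and keeps the best-scoring subset. The nearest-centroid evaluator and the toy data are assumptions chosen to keep the example short; the study evaluated subsets with its own classifier and cross-validation.

```python
def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across the class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    grand_mean = sum(values) / n
    between = sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values())
    within = sum((v - sum(vs) / len(vs)) ** 2 for vs in groups.values() for v in vs)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (g - 1)) / (within / (n - g))


def nearest_centroid_accuracy(X, y):
    """Toy evaluator (an assumption): nearest-centroid accuracy on the same data."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

    return sum(predict(row) == lab for row, lab in zip(X, y)) / len(y)


def incremental_feature_selection(X, y, evaluate):
    """Rank features by F score, add them one at a time, keep the best subset."""
    n_features = len(X[0])
    scores = [anova_f_score([row[j] for row in X], y) for j in range(n_features)]
    ranking = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    best_subset, best_score = [], float("-inf")
    for size in range(1, n_features + 1):
        subset = ranking[:size]
        score = evaluate([[row[j] for j in subset] for row in X], y)
        if score > best_score:  # strict comparison keeps the smallest best subset
            best_subset, best_score = subset, score
    return ranking, best_subset, best_score


# Made-up data: feature 0 separates the classes, feature 1 is noise, feature 2 is weak.
X = [[0.9, 0.2, 0.5], [0.8, 0.7, 0.6], [1.0, 0.4, 0.4],
     [0.1, 0.3, 0.5], [0.2, 0.6, 0.3], [0.0, 0.5, 0.4]]
y = [1, 1, 1, 0, 0, 0]
ranking, best_subset, best_score = incremental_feature_selection(X, y, nearest_centroid_accuracy)
print(ranking, best_subset, best_score)
```

In practice the evaluator would score each subset on held-out data rather than on the training samples, otherwise larger subsets tend to look better than they are.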
Minimum redundancy feature selection is an algorithm commonly used to identify the features of genes and phenotypes accurately and to narrow them down to the most relevant ones. It is usually paired with relevant-feature selection, and the combination is known as minimum redundancy maximum relevance (mRMR).
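The greedy idea behind mRMR can be sketched in a few lines of Python: at each step, pick the feature whose relevance to the class label most exceeds its average redundancy with the features already chosen. This sketch uses absolute Pearson correlation as a stand-in for both quantities, which is an assumption made for brevity; mRMR is more commonly formulated with mutual information, and the data below are made up.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def mrmr_rank(columns, labels):
    """Order features greedily by relevance minus mean redundancy with those already picked."""
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected, remaining = [], list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(abs(pearson(columns[j], columns[s])) for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [1, 1, 1, 0, 0, 0]
columns = [
    [0.9, 0.8, 1.0, 0.1, 0.2, 0.0],  # strongly informative
    [0.9, 0.7, 1.0, 0.2, 0.3, 0.0],  # informative, but nearly a copy of the first
    [0.5, 0.6, 0.4, 0.5, 0.3, 0.4],  # weakly informative, mostly independent
]
print(mrmr_rank(columns, labels))
```

On this toy data the near-duplicate feature is pushed to the end even though it is individually more relevant than the weak one, because it adds little that the first feature does not already provide.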
The gradient boost decision tree (GBDT) algorithm is a widely used learning algorithm that has been applied in many bioinformatics and biological problems. In this study, too, a GBDT classifier was optimized and used to evaluate the model.
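For readers curious about what gradient boosting does internally, the following is a minimal pure-Python sketch that boosts one-split trees (stumps) on the gradient of the logistic loss. It is a simplification under stated assumptions: real GBDT implementations use deeper trees and more refined leaf updates, the hyperparameters and toy data here are arbitrary, and none of it reflects the settings used in the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("no valid split: every feature is constant")
    return best[1:]


def fit_gbdt(X, y, n_trees=20, learning_rate=0.5):
    """Boost stumps on the negative gradient of the logistic loss (labels are 0 or 1)."""
    positive_rate = sum(y) / len(y)
    base = math.log(positive_rate / (1 - positive_rate))  # initial log-odds
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For the logistic loss, the negative gradient is label minus predicted probability.
        residuals = [label - sigmoid(s) for label, s in zip(y, scores)]
        j, threshold, left_value, right_value = fit_stump(X, residuals)
        stumps.append((j, threshold, left_value, right_value))
        scores = [s + learning_rate * (left_value if row[j] <= threshold else right_value)
                  for row, s in zip(X, scores)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    """Probability of the positive class for one feature vector."""
    base, learning_rate, stumps = model
    score = base + sum(learning_rate * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return sigmoid(score)


# Made-up two-feature data; label 1 stands for "cyclin", 0 for "non-cyclin".
X = [[0.9, 0.2], [0.8, 0.7], [1.0, 0.4], [0.1, 0.3], [0.2, 0.6], [0.0, 0.5]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gbdt(X, y)
print(predict_proba(model, [0.95, 0.5]), predict_proba(model, [0.05, 0.5]))
```

Each new stump corrects the errors left by the ones before it, so the combined prediction becomes more confident with every round.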
Utility of the new computational model:
Five-fold cross-validation showed that the new model identifies cyclin proteins with an accuracy of 93.06% and an area under the ROC curve (AUC) of 0.971. This is higher than the results of recent studies on the same data. However, further work is needed to build a user-friendly web server for the model.
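The two reported metrics can be illustrated with a short Python sketch of five-fold cross-validation. The single-feature threshold classifier and the ten data points are made-up assumptions that only serve to show how accuracy and AUC are computed from held-out predictions; they bear no relation to the study's data or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists; every sample is held out exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# Made-up single-feature data; label 1 stands for "cyclin", 0 for "non-cyclin".
x = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1, 0.95, 0.35]
y = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

predictions = [None] * len(y)
for train, test in k_fold_indices(len(y), k=5):
    # "Train" a toy classifier: threshold halfway between the two class means.
    pos = [x[i] for i in train if y[i] == 1]
    neg = [x[i] for i in train if y[i] == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    for i in test:
        predictions[i] = int(x[i] > threshold)

cv_accuracy = accuracy(y, predictions)
cv_auc = roc_auc(y, x)  # the raw feature value doubles as the ranking score here
print(cv_accuracy, cv_auc)
```

Accuracy depends on the decision threshold, whereas AUC only depends on how well the scores rank positives above negatives, which is why studies often report both.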
The rapid development of effective computational tools has allowed scientists to tackle biological problems. Such tools have enabled prediction, analysis, and monitoring at the atomic level.
References:
- Hasan Zulfiqar, Shi-Shi Yuan, Qin-Lai Huang, Zi-Jie Sun, Fu-Ying Dao, Xiao-Long Yu, Hao Lin, Identification of Cyclin Protein Using Gradient Boost Decision Tree Algorithm, Computational and Structural Biotechnology Journal, 2021, ISSN 2001-0370, https://doi.org/10.1016/j.csbj.2021.07.013