PDBeCIF: For manipulating macromolecular Crystallographic Information File

bioxone | July 25, 2021 | July 24, 2021

Monika Raman, PSG College of Technology, Coimbatore

The Protein Data Bank (PDB) archive, managed by the Worldwide Protein Data Bank (wwPDB) organization, is the sole global repository of experimentally determined 3D structure data. The historical, human-readable PDB file format has been used to communicate PDB structures since the 1970s. However, rapid developments in experimental approaches for structure determination, such as cryo-electron microscopy and integrative/hybrid methods, quickly exposed its limitations. Its successor, the PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF), became the master format for the PDB archive in 2014.

Biomacromolecular structural data outgrew the legacy PDB format on which the scientific community relied for decades, yet its successor, PDBx/mmCIF, is still not widely used. One factor could be the availability of easy-to-use tools that only support the legacy format. Another could be the inherent difficulty of processing mmCIF files accurately, given the large number of edge cases that make efficient parsing hard. To make proper use of macromolecular structure data and their associated annotations, however, the new format must be widely adopted as soon as possible.

What is PDBx/mmCIF format?

PDBx/mmCIF is the wwPDB data content definition format used for PDB entry deposition, annotation, and archiving. It is backed by dictionaries and related software tools.

It is an extension of the CIF format, which is the gold standard in small molecule crystallography. Each file contains one or more data blocks, each introduced by the prefix ‘data_’ and populated with data items. Each data item is identified by a leading underscore followed by a name. The name has two parts, a category and a keyword, separated by a period. Categories come in two kinds: key-value and tabular. A key-value category holds a single string value per keyword, whereas a tabular category holds an array of strings per keyword.
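The layout described above can be sketched with a toy parser. This is only an illustration under strong assumptions: it handles whitespace-separated values only (no quoted strings, multi-line text fields, or comments), and the sample data, `split_item_name`, and `parse_toy_cif` are hypothetical names made up for this example, not part of PDBeCIF or any wwPDB tool.

```python
# Hypothetical miniature file: one data block, one key-value category
# (cell) and one tabular category (atom_site) introduced by "loop_".
SAMPLE = """\
data_TOY
_cell.length_a 10.5
_cell.length_b 20.0
loop_
_atom_site.id
_atom_site.type_symbol
1 N
2 C
"""


def split_item_name(name):
    """Split '_category.keyword' into ('category', 'keyword')."""
    category, keyword = name.lstrip("_").split(".", 1)
    return category, keyword


def is_structural(token):
    """True for tokens that start a new item, loop, or data block."""
    return token.startswith(("_", "data_")) or token == "loop_"


def parse_toy_cif(text):
    """Parse the simplified subset into {block: {category: {keyword: value}}}."""
    blocks = {}
    block = None
    tokens = text.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("data_"):
            block = blocks.setdefault(token[len("data_"):], {})
            i += 1
        elif token == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(split_item_name(tokens[i]))
                i += 1
            # Tabular category: every keyword maps to a list of strings.
            columns = [block.setdefault(c, {}).setdefault(k, []) for c, k in names]
            col = 0
            while i < len(tokens) and not is_structural(tokens[i]):
                columns[col % len(columns)].append(tokens[i])
                col += 1
                i += 1
        elif token.startswith("_"):
            # Key-value category: every keyword maps to a single string.
            category, keyword = split_item_name(token)
            block.setdefault(category, {})[keyword] = tokens[i + 1]
            i += 2
        else:
            raise ValueError("unexpected token: " + token)
    return blocks


parsed = parse_toy_cif(SAMPLE)
print(parsed["TOY"]["cell"])       # key-value category
print(parsed["TOY"]["atom_site"])  # tabular category
```

Real mmCIF files contain many constructs this sketch ignores, which is part of why robust parsing is harder than it looks.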

The PDBx/mmCIF format replaced the PDB file format to remove size limits on submitted structures and substantially enhance the representation of extra data provided with the coordinates.

PDBx/mmCIF files include programmatically accessible information on the structural elements of macromolecular assemblies (category: pdbx_struct_assembly), details on assembly generation (pdbx_struct_assembly_gen), properties and features (pdbx_struct_assembly_prop), and much more. This level of detail is made possible by the PDBx/mmCIF Exchange Dictionary, which specifies how data item values are validated using data types, controlled vocabularies, and ranges.

Relying on a controlled dictionary is in line with the FAIR principles (Findable, Accessible, Interoperable, and Reusable). Nevertheless, several prominent software tools still rely on the outdated PDB format, and even recently developed software may lack support for mmCIF.

Making more software compliant with the PDBx/mmCIF format would help the community adopt the new data standard faster. To facilitate this transition, Glen van Ginkel and colleagues at the European Molecular Biology Laboratory, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK, presented a lightweight, general-purpose Python package, PDBeCIF.
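To make the idea of dictionary-driven validation concrete, here is a minimal sketch of checking values against a data type, a controlled vocabulary, and a range. The rule table, the `toy_*` category names, and the `validate` function are all hypothetical and are not taken from the real Exchange Dictionary, which is far richer.

```python
# Hypothetical rules, NOT copied from the PDBx/mmCIF Exchange Dictionary.
RULES = {
    ("toy_assembly", "oligomeric_count"): {"type": int, "min": 1},
    ("toy_cell", "angle_alpha"): {"type": float, "min": 0.0, "max": 180.0},
    ("toy_experiment", "method"): {
        "type": str,
        "allowed": {"X-RAY DIFFRACTION", "ELECTRON MICROSCOPY", "SOLUTION NMR"},
    },
}


def validate(category, keyword, raw):
    """Return True if the raw string satisfies the rule for this data item."""
    rule = RULES[(category, keyword)]
    try:
        value = rule["type"](raw)  # data type check
    except ValueError:
        return False
    if "allowed" in rule and value not in rule["allowed"]:
        return False  # controlled vocabulary check
    if "min" in rule and value < rule["min"]:
        return False  # lower bound check
    if "max" in rule and value > rule["max"]:
        return False  # upper bound check
    return True


print(validate("toy_cell", "angle_alpha", "90.0"))
print(validate("toy_cell", "angle_alpha", "190"))
print(validate("toy_experiment", "method", "GUESSWORK"))
```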

PDBeCIF package

The PDBeCIF package is available for download from PyPI or GitHub. PDBeCIF is a dependency-free Python 2/3 module for manipulating the mmCIF/CIF files issued by the wwPDB partners. It supports reading and writing PDBx/mmCIF files, can also read CIF files, and provides several convenience methods for searching file content.

The package contains several classes.

  • CifFileReader – For reading PDBx/mmCIF files.
  • CifFileWriter – For writing PDBx/mmCIF files.
  • CIFWrapper – It is a wrapper object that lets you use Python dot notation to access the file content and offer search methods for filtering data items using string criteria and regular expressions.
  • CifFile data object – It enables simple changes to mmCIF content like addition and removal of categories and data items.
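The dot-notation access and regular-expression search mentioned for CIFWrapper can be illustrated with a small stand-alone class. This is a sketch of the general technique only: `DotDict` and its `search` method are hypothetical and do not reproduce PDBeCIF's actual classes or method signatures.

```python
import re


class DotDict:
    """Toy wrapper giving attribute-style access to a nested dictionary."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dictionaries so that access can be chained.
        return DotDict(value) if isinstance(value, dict) else value

    def search(self, pattern):
        """Return the keys at this level whose names match the regex."""
        regex = re.compile(pattern)
        return [key for key in self._data if regex.search(key)]


# Hypothetical content shaped like {category: {keyword: value}}.
block = DotDict(
    {"cell": {"length_a": "10.5", "length_b": "20.0", "angle_alpha": "90.0"}}
)
print(block.cell.length_a)
print(block.cell.search(r"^length_"))
```

The appeal of this style is readability: `block.cell.length_a` mirrors the `_cell.length_a` item name used in the file itself.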

The parser can also be told to discard unwanted categories or to extract only the desired ones, which further improves parsing speed and memory efficiency.

PDBe also provides updated PDBx/mmCIF files enriched with additional information. These files add uniform, standardized metadata to the basic PDB archive information and thereby extend the core Exchange Dictionary.
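A rough sketch of category filtering during parsing is shown below. It assumes a simplified, whitespace-separated mmCIF subset with a single data block, and the names `wanted`, `parse_filtered`, `only`, and `ignore` are chosen for this illustration; they are not claimed to match PDBeCIF's own interface.

```python
# Hypothetical miniature file with one small and one "large" category.
SAMPLE = """\
data_TOY
_cell.length_a 10.5
loop_
_atom_site.id
_atom_site.type_symbol
1 N
2 C
"""


def wanted(category, only=None, ignore=None):
    """Decide whether a category should be stored."""
    if only is not None:
        return category in only
    return category not in (ignore or ())


def parse_filtered(text, only=None, ignore=None):
    """Parse into {category: {keyword: value}}, storing wanted categories only."""
    result = {}
    tokens = text.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("data_"):
            i += 1  # single-block toy: the block name is not stored
        elif token == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i].lstrip("_").split(".", 1))
                i += 1
            keep = wanted(names[0][0], only, ignore)
            columns = None
            if keep:
                columns = [result.setdefault(c, {}).setdefault(k, []) for c, k in names]
            col = 0
            while i < len(tokens) and not (
                tokens[i].startswith(("_", "data_")) or tokens[i] == "loop_"
            ):
                if keep:  # unwanted rows are skipped without being stored
                    columns[col % len(names)].append(tokens[i])
                col += 1
                i += 1
        else:
            category, keyword = token.lstrip("_").split(".", 1)
            if wanted(category, only, ignore):
                result.setdefault(category, {})[keyword] = tokens[i + 1]
            i += 2
    return result


without_atoms = parse_filtered(SAMPLE, ignore={"atom_site"})
atoms_only = parse_filtered(SAMPLE, only={"atom_site"})
print(without_atoms)
print(atoms_only)
```

In this toy version the savings come only from not building lists for skipped categories. A production parser can go further and avoid tokenizing skipped sections at all.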

PDBeCIF – Performance analysis

“We conducted a performance comparison between PDBeCIF v1.5 and some of the most prominent mmCIF parsers available in Python, such as Biopython v1.78, py-mmcif v0.67, and Atomium v1.0.9,” van Ginkel explained. Running times were measured over seven consecutive runs and then averaged.

“For comparisons, we chose a tiny protein (PDB id: 1tqn) and a big molecular machine (PDB id: 7cgo),” he added. In both cases, PDBeCIF was the fastest, with parsing times of 0.3 and 2.28 seconds, respectively. Because PDBeCIF only parses the file and performs no structural interpretation, it is faster than Atomium or Biopython.

The project is open-source, which supports its continued development and maintenance as a tool for manipulating mmCIF and CIF files. It can be easily integrated into any Python project or used as a format conversion interface between software modules, which should help the PDBx/mmCIF format gain wider acceptance.

It is included in the wwPDB official list of mmCIF parsers and is widely used in processes at PDBe, the Protein Data Bank in Europe. It can easily be integrated with third-party libraries and used for a wide range of scientific investigations.
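The measurement procedure described here (time several consecutive runs, then average) can be sketched with the standard library. The helper `mean_runtime` and the stand-in workload are assumptions made for illustration; the authors' actual benchmarking script is not reproduced in this article.

```python
import time


def mean_runtime(func, runs=7):
    """Call func `runs` times in a row and return the mean duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


calls = []


def fake_parse():
    """Stand-in workload; a real benchmark would parse an mmCIF file here."""
    calls.append(sum(range(10000)))


average = mean_runtime(fake_parse)
print("mean over", len(calls), "runs:", average, "seconds")
```

Averaging over several runs smooths out one-off effects such as disk caching, although it does not remove systematic differences between machines.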

Source: van Ginkel, G., Pravda, L., Dana, J. M., Varadi, M., Keller, P., Anyango, S., & Velankar, S. (2021). PDBeCIF: An open-source mmCIF/CIF parsing and processing package. BMC Bioinformatics, 22(1), 383. https://doi.org/10.1186/s12859-021-04271-9

About the author: Monika Raman is an undergraduate student in the final year of her B. Tech in Biotechnology. She is an enthusiastic biotechnology student looking for opportunities to develop her skills and grow professionally in research. She is highly motivated and has strong interpersonal skills.
