The BioChemGRAPH collaboration brings together key chemical and biochemical datasets to give researchers deeper insights than ever before.

A collaboration announced today will integrate key structural, functional, and biochemical data across both small molecules and macromolecules. This will allow researchers to quickly access relevant information from trusted but disparate datasets, advancing work in fields such as target validation, drug development, drug repurposing and cross-reactivity studies.

The Aim

The BioChemGRAPH project will create an easily accessible web platform to bring together datasets which are instrumental in many areas of research.

While vast, highly curated databases of quality chemical and biochemical data exist, interpreting them together is difficult. For each small molecule there is a huge variety of experimentally determined and calculated properties that can inform research. Because these data are so varied, they are collected and managed in separate databases, each specializing in a particular area.

Bringing these different approaches together will support both basic and translational research, helping to answer questions such as: How does this target behave? Where can this drug be repurposed? What potential side effects could it have?

The Partners

The project will see PDBe, ChEMBL and CCDC partner to aggregate data on small molecules and related macromolecules into the existing PDBe-KB platform.

The three databases are seen as leaders in their respective fields, known and trusted for their high standards in data management. This means the final BioChemGRAPH platform will have a solid foundation, with quality at its core.

The three datasets that BioChemGRAPH will connect are:

• PDBe - Protein Data Bank in Europe
Managed by EMBL-EBI (European Molecular Biology Laboratory's European Bioinformatics Institute). 174,448 structures have been deposited to the PDB network since 2000.

• ChEMBL
Managed by EMBL-EBI, this database contains 2D structures of small molecules along with calculated properties and abstracted bioactivities. It contains 1,961,462 distinct compounds.

• CSD - Cambridge Structural Database
Managed by CCDC (Cambridge Crystallographic Data Centre), this contains 3D crystal structures of organic and metal-organic compounds. It contains 1,064,756 structures.

The technology will build on the existing community-driven PDBe-KB platform, which brings together structural and functional annotations for macromolecules in the PDB.

The Journey

Following a funding award from the BBSRC’s Bioinformatics and Biological Resources Fund, work will begin this year.

“We’re really excited to begin this project,” said Ilenia Giangreco, Discovery Science Team Leader at CCDC. “Curating high quality data and building great tools to interpret it are our passion at CCDC, so joining forces to bring scientists even more information, in an even easier format, is a great opportunity. This could open possibilities across so many areas of research.”

“We’ve seen a clear need in the research community for integration of structure and chemistry data,” explains Sameer Velankar, PDBe Team Leader at EMBL-EBI. “We’re hoping that this collaboration will allow researchers to get comprehensive answers to a wide range of research questions by opening up protein structure and chemical data to facilitate drug development.”

Media Contact Information:

Sophie Bryant, Marketing Manager, CCDC – sbryant@ccdc.cam.ac.uk