Accelerating Drug Discovery with the CCDC, AWS, and Intel: Curated Data Set of Protein Structures from the Protein Data Bank with Predicted Hydrogen Positions Now Available

Cambridge, UK – 18 June 2024 – The Cambridge Crystallographic Data Centre (CCDC) today announced a potentially significant advance in drug discovery, made possible by the combined computing power of Amazon Web Services (AWS) and Intel. A curated data set of protein structures from the Protein Data Bank (PDB) with predicted hydrogen positions is now available for download. The project was supported by an Intel RISE Technology Initiative contribution.

Historically, collaborations with the pharmaceutical industry have enabled the development of reliable methods for interpreting interactions within protein binding sites, but these methods relied on proprietary data that is not publicly available. Repeating these studies with PDB structures has been challenging because the structures lack hydrogen positions for the water networks within the proteins. Reliable predictions require databases of augmented protein structures in which hydrogen positions have been assigned.

Generating this information is computationally intensive because multiple possible models must be considered. The combined computing power of Intel and AWS enabled the CCDC to overcome this challenge and generate a comprehensive snapshot of protein cavities in the PDB, identifying potential binding sites for small molecules with accurately predicted hydrogen positions for all components.

Key Benefits:

Accessibility: This data set is freely available, enabling widespread use in drug discovery research and development.
Efficiency: Precomputed hydrogen positions save researchers valuable time and resources by eliminating the need for redundant computations.
Environmental Impact: Reducing the need for repeated computation lowers the environmental footprint of large-scale computational tasks.

Dr Juergen Harter, CCDC CEO, said: “We were delighted to partner with AWS and Intel on this project to provide another valuable structural science resource to enhance the drug discovery process in the pharmaceutical industry. The output from the project now being free to all further emphasizes our commitment to FAIR data and our consideration of the environmental impact of repeated computation.”

Jason Cole, Senior Research Fellow at CCDC, highlighted the impact of this project: “With the power of Intel and AWS, we’ve presented researchers with predictions of protonation states in important protein structures, potentially saving hundreds of thousands of hours of life sciences research time across the globe.”

Download Now: Researchers and developers in the field of drug discovery can download the protonated PDB files from the CCDC download page. This initiative democratizes access to critical data, empowering scientific advancement regardless of researchers' access to extensive computational resources.

Notes to Editors:

Interviews with CCDC scientists and project leaders are available upon request.
High-resolution images and additional information are available for press use (credit: Image courtesy of the Cambridge Crystallographic Data Centre (CCDC)).

Media Contact Information:

Press Contact: Michael Francis, CCDC Marketing