New Collaboration Between RCSB Protein Data Bank and Amazon Web Services Provides Expanded Data Storage and Access to Researchers Worldwide
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), headquartered at the Rutgers Institute for Quantitative Biomedicine, announced the expansion of its data storage capacity through the Amazon Web Services (AWS) Open Data Sponsorship Program.
The AWS program is providing the RCSB PDB with more than 100 terabytes of storage for no-cost delivery of Protein Data Bank information to millions of scientists, educators, and students around the world working in fundamental biology, biomedicine, bioenergy, and bioengineering/biotechnology. The partnership with Amazon has more than doubled the data bank’s digital storage capacity at Rutgers.
“For more than five decades, the global Protein Data Bank has enabled basic, translational, and clinical research by providing open access to three-dimensional (3D) biostructure information at the atomic level,” said Stephen K. Burley, M.D., D.Phil., director of the RCSB PDB, founding director of the Rutgers Institute for Quantitative Biomedicine, and University Professor and Henry Rutgers Chair at Rutgers University. “Open access to Protein Data Bank information is central to accelerating scientific discoveries for the benefit of all humanity.”
The AWS Open Data Sponsorship Program covers the cost of storing and transferring out (egress) publicly available, high-value, cloud-optimized datasets for successful applicants. By working with data providers, Amazon aims to make data openly accessible for analysis on AWS, develop new cloud-native techniques, formats, and tools that lower the cost of working with data, and encourage the growth of communities that benefit from access to shared datasets.
“Access to open data sets is improving the way the scientific community can collaborate and accelerate life-changing discoveries,” said Josh Weatherly, Director, US Education, State and Local Government Verticals at AWS. “The Protein Data Bank provides a vast and diverse repository for researchers in government, academia, and industry to use to develop diagnostics, vaccines, drugs, and other therapeutic treatments. AWS can help provide the Protein Data Bank the capacity to scale up to meet increasing demand, continue to provide free and open access to information, and unlock the latest analytic capabilities.”
The Protein Data Bank archive currently houses nearly 190,000 experimentally determined 3D structures of proteins, DNA, and RNA that are freely available with no limitations on usage. The archive is jointly managed by the Worldwide Protein Data Bank partnership, involving data centers in the United States, Europe, and Asia. The U.S. data center is operated by the RCSB PDB, with sites at Rutgers, the University of California, San Diego-San Diego Supercomputer Center, and the University of California, San Francisco.
“The Protein Data Bank plays an important role in facilitating discovery and development of life-changing drugs,” added Burley, who also co-leads the Cancer Pharmacology Research Program at Rutgers Cancer Institute of New Jersey. “Freely available 3D biostructure data constitute a public good with far-reaching impacts on patients and their families.”
The RCSB PDB has been operating the United States data center for the global Protein Data Bank for more than 20 years. Burley is an expert in structural biology, molecular biophysics, computational biology, data science, structure-guided/fragment-based drug discovery, and clinical medicine/oncology.
Researchers using the structure data stored in the Protein Data Bank have published more than two million scientific papers, some of which have helped researchers and pharmaceutical companies tackle major health challenges, including heart disease, cancer, diabetes, Alzheimer’s disease, HIV-AIDS, and most recently, the COVID-19 pandemic.
“The sponsorship of the RCSB Protein Data Bank by the AWS Open Data Sponsorship Program is fantastic. It will provide key data storage and distribution support for one of the most valued and scientifically impactful data resources in the biological sciences,” said Dr. Rommie Amaro, Professor and Endowed Chair at the University of California, San Diego.
“It will also open up new avenues for scientific collaboration built around cloud computing services within and outside AWS, making it easier for scientists to do better science faster and with fewer logistical hurdles, particularly in the areas of computational biology and biochemistry, molecular dynamics simulations, and artificial intelligence,” added Amaro.
Through the Open Data Sponsorship Program, AWS has sponsored access to petabytes of data, including satellite imagery, climate and weather data, genomic data, and data used for natural language processing. The full list of publicly available datasets can be found on the Registry of Open Data on AWS.
To learn more and access the information from the Protein Data Bank, visit RCSB.org. For RCSB PDB education and outreach materials, visit PDB101.RCSB.org.