Amazon Web Services today announced that genomic information of 1,700 individuals has been placed in its public cloud and can be accessed by anyone in the world.
The 200 terabytes of data are part of the 1000 Genomes Project, sponsored by the National Institutes of Health in partnership with more than 75 companies and organizations. The goal is to eventually store genomic information from 2,662 individuals from around the world to advance scientific research. Specifically, researchers are looking for genetic variants that have frequencies of greater than 1% across the sample set in an effort to study diseases.
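As a rough illustration of the 1% frequency cutoff, the sketch below computes how often a variant appears across a set of samples and keeps only the sites above the threshold. It is not the project's actual pipeline, which the article does not describe. The function names, the toy data and the assumption that each person carries two copies of each site are all hypothetical.

```python
from typing import Dict, List


def allele_frequency(genotypes: List[int]) -> float:
    """Return the fraction of chromosome copies carrying the variant.

    Assumption: each entry is the number of variant copies (0, 1 or 2)
    that one diploid individual carries at a single site.
    """
    if not genotypes:
        return 0.0
    return sum(genotypes) / (2 * len(genotypes))


def common_variants(sites: Dict[str, List[int]], threshold: float = 0.01) -> List[str]:
    """Return the names of sites whose variant frequency exceeds the threshold."""
    return [name for name, calls in sites.items() if allele_frequency(calls) > threshold]


# Toy data for 100 hypothetical individuals per site (not real project data).
sites = {
    "site_a": [0] * 99 + [1],            # 1 of 200 copies
    "site_b": [0] * 95 + [1] * 4 + [2],  # 6 of 200 copies
    "site_c": [0] * 100,                 # variant never observed
}

print(common_variants(sites))
```

In this toy set only `site_b` clears the cutoff, since 6 of 200 copies is 3%, while `site_a` sits at 0.5%.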
With the deposit, AWS's servers now hold the largest collection of human genetic data available worldwide, the company says. AWS hosts the data for free but charges users for the compute power required to analyze it. AWS says users can, for example, run Hadoop on its Elastic Compute Cloud (EC2) or use its Elastic MapReduce service to analyze the data stored in its Simple Storage Service (S3).
Most of the 1,700 genomic datasets are from anonymous individuals, and the 1000 Genomes Project has an ethics standard that requires informed consent from participants. Already, the project has collected data samples from populations around the world, including Utah residents with Northern and Western European ancestry, people with Chinese heritage in Denver, people with Mexican heritage in Los Angeles and people with African heritage in the Southwestern United States.
The announcement was made as part of the Big Data Summit being held at the White House, which will include a webcast at 2 p.m. ET in which government officials and researchers will discuss the challenges and opportunities that big data creates.
Network World staff writer Brandon Butler covers cloud computing and social media. He can be found on Twitter at @BButlerNWW.