Excerpt from technologyreview.com
Google is approaching hospitals and universities with a new pitch. Have genomes? Store them with us.
The search giant’s first product for the DNA age is Google Genomics, a cloud computing service that it launched last March but that went mostly unnoticed amid a barrage of high-profile R&D announcements from Google…
Google Genomics could prove more significant than any of these moonshots. Connecting and comparing genomes by the thousands, and soon by the millions, is what’s going to propel medical discoveries for the next decade. The question of who will store the data is already a point of growing competition between Amazon, Google, IBM, and Microsoft.
Google began work on Google Genomics 18 months ago, meeting with scientists and building an interface, or API, that lets them move DNA data into its server farms and do experiments there using the same database technology that indexes the Web and tracks billions of Internet users.
This flow of data is smaller than what is routinely handled by large Internet companies (over two months, Broad will produce the equivalent of what gets uploaded to YouTube in one day) but it exceeds anything biologists have dealt with. That’s now prompting a wide effort to store and access data at central locations, often commercial ones. The National Cancer Institute said last month that it would pay $19 million to move copies of the 2.6 petabyte Cancer Genome Atlas into the cloud. Copies of the data, from several thousand cancer patients, will reside both at Google Genomics and in Amazon’s data centers.
The idea is to create “cancer genome clouds” where scientists can share information and quickly run virtual experiments as easily as a Web search, says Sheila Reynolds, a research scientist at the Institute for Systems Biology in Seattle. “Not everyone has the ability to download a petabyte of data, or has the computing power to work on it,” she says.
Also speeding the move of DNA data to the cloud has been a yearlong price war between Google and Amazon. Google says it now charges about $25 a year to store a genome, and more to do computations on it. Scientific raw data representing a single person’s genome is about 100 gigabytes in size, although a polished version of a person’s genetic code is far smaller, less than a gigabyte. That would cost only $0.25 a year.
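The arithmetic behind those figures can be checked with a short sketch. It assumes, as the numbers above imply, that the roughly $25 annual charge covers about 100 gigabytes of raw data and that price scales linearly with size; the constant and function names are illustrative and are not part of any Google pricing API.

```python
# Figures quoted in the article (approximate): about $25 a year to store
# one genome's raw data, which is about 100 gigabytes.
RAW_GENOME_GB = 100.0
RAW_GENOME_COST_PER_YEAR = 25.0


def price_per_gb_year() -> float:
    """Implied storage price in dollars per gigabyte per year.

    Assumes cost scales linearly with size, which the article implies
    but does not state outright.
    """
    return RAW_GENOME_COST_PER_YEAR / RAW_GENOME_GB


def annual_storage_cost(size_gb: float) -> float:
    """Yearly storage cost in dollars for a dataset of size_gb gigabytes."""
    return size_gb * price_per_gb_year()


if __name__ == "__main__":
    # Raw data for one genome, then a polished genome of at most 1 GB.
    print(f"Raw genome (100 GB): ${annual_storage_cost(100):.2f} a year")
    print(f"Polished genome (1 GB): ${annual_storage_cost(1):.2f} a year")
```

Because a polished genome is described as less than a gigabyte, the 25-cent figure is best read as an upper bound under this linear assumption.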
The bigger point, he says, is that medicine will soon rely on a kind of global Internet-of-DNA that doctors will be able to search. “Our bird’s eye view is that if I were to get lung cancer in the future, doctors are going to sequence my genome and my tumor’s genome, and then query them against a database of 50 million other genomes,” he says. “The result will be ‘Hey, here’s the drug that will work best for you.’ ”
At Google, Glazer says he began working on Google Genomics as it became clear that biology was going to move from “artisanal to factory-scale data production.” He started by teaching himself genetics, taking an online class, Introduction to Biology, taught by Broad’s chief, Eric Lander. He also got his genome sequenced and put it on Google’s cloud.
Glazer wouldn’t say how large Google Genomics is or how many customers it has now, but at least 3,500 genomes from public projects are already stored on Google’s servers. He also says there’s no link, as of yet, between Google’s cloud and its more speculative efforts in health care, like the company Google started this year, called Calico, to investigate how to extend human lifespans. “What connects them is just a growing realization that technology can advance the state of the art in life sciences,” says Glazer.
Datta says some Stanford scientists have started using a Google database system, BigQuery, that Glazer’s team made compatible with genome data. It was developed to analyze large databases of spam, web documents, or consumer purchases. But it can also quickly perform the very large experiments comparing thousands, or tens of thousands, of people’s genomes that researchers want to try. “Sometimes they want to do crazy things, and you need scale to do that,” says Datta. “It can handle the scale genetics can bring, so it’s the right technology for a new problem.”