Next week I'll be a visitor at BGI (formerly Beijing Genomics Institute; see earlier posts here). I'm involved in a GWAS (genome-wide association study) of IQ involving a very high-end sample with a case-control design. More details (perhaps) after my visit.
Long ago I sketched out a science fiction story involving two Junior Fellows, one a bioengineer (a former physicist, building the next generation of sequencing machines) and the other a mathematician. The latter, an eccentric, was known for collecting signatures -- signed copies of papers and books authored by visiting geniuses (Nobelists, Fields Medalists, Turing Award winners) attending the Society's Monday dinners. He would present each luminary with an ornate (strangely sticky) fountain pen and a copy of the object to be signed. Little did anyone suspect the real purpose: collecting DNA samples to be turned over to his friend for sequencing! The mathematician is later found dead under strange circumstances. Perhaps he knew too much! ...
Thanks to recent technological progress, this story is no longer science fiction.
Homework problems: (1) given a high IQ threshold (e.g., +4 SD), what is the most efficient way of collecting thousands of samples from individuals above that threshold? (2) Assuming M alleles, each with equal additive effect on IQ, what will their frequencies in the high group be compared to the general population? How large a population is necessary to resolve the frequency difference beyond statistical error? (Perhaps these problems explain the motivation behind a certain subset of my ruminations on this blog ;-)
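A rough sketch of problem (2), under assumptions that go beyond the problem statement: the trait is standard normal, the M loci together explain a fraction `h2` of its variance, every locus has the same population frequency `p` of the IQ-increasing allele, and effects are small enough for a first-order (linear) approximation. The expected frequency shift in the high group is then about p(1-p) times the per-allele effect times the mean trait value above the threshold. The function names, the significance z-score, and the choice of an equal-sized control group at population frequency are illustrative, not part of the original problem.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # trait assumed standard normal (mean 0, SD 1)


def mean_above_threshold(t_sd):
    """Mean trait value, in SD units, of individuals above t_sd."""
    tail = STD_NORMAL.cdf(-t_sd)  # upper-tail probability, by symmetry
    return STD_NORMAL.pdf(t_sd) / tail


def per_allele_effect(m_loci, p, h2):
    """Effect of one allele in SD units, assuming M equal-effect loci
    that together explain a fraction h2 of the trait variance."""
    return sqrt(h2 / (2.0 * p * (1.0 - p) * m_loci))


def frequency_shift(m_loci, p, h2, t_sd):
    """First-order shift in allele frequency among individuals above t_sd."""
    effect = per_allele_effect(m_loci, p, h2)
    return p * (1.0 - p) * effect * mean_above_threshold(t_sd)


def cases_needed(m_loci, p, h2, t_sd, z_required):
    """Cases N (with N controls at population frequency) such that the
    frequency difference equals z_required standard errors.
    The standard error of the difference is approximated as sqrt(p(1-p)/N)."""
    delta = frequency_shift(m_loci, p, h2, t_sd)
    return z_required ** 2 * p * (1.0 - p) / delta ** 2


if __name__ == "__main__":
    # Hypothetical inputs: 1000 loci, p = 0.5, h2 = 0.5, +4 SD threshold,
    # z = 5.45 (roughly a two-sided p-value of 5e-8).
    M, P, H2, T, Z = 1000, 0.5, 0.5, 4.0, 5.45
    print("frequency shift:", frequency_shift(M, P, H2, T))
    print("cases needed:   ", cases_needed(M, P, H2, T, Z))
```

Under these assumptions `p` cancels out of the required sample size, which reduces to 2 M z² / (h2 z̄²), where z̄ is the mean trait value above the threshold. So the number of cases grows linearly with the number of loci and falls with the square of how extreme the selected group is. The figure corresponds to about 50% power at the chosen z-score; higher power needs a larger sample.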
Below are some recent articles about BGI. They intend to achieve a sequencing rate of 10^4 human genomes per annum by 2011.
MIT-Harvard Broad Institute vs BGI:
... The Broad’s perch as the largest genome center in the world is getting crowded, as BGI fills its Hong Kong facility with more than twice as many HiSeqs as the Broad (see p. 44). Nusbaum, however, says the Broad and BGI enjoy a friendly (if slightly competitive) relationship. “We’re building ongoing collaborations with them. Ideally we want them to be a sister center with us,” he says. “There’s so much sequencing in the world that needs to be done, right now, I don’t see any need to compete with them.”
While Nusbaum concedes the emergence of BGI “upsets the balance of power,” he thinks the added sequencing capacity is a positive trend. Of course, the spread of sequencing democracy in countless small labs also tilts the balance of power, perhaps even more disruptively.
Sequencing the Human Secret:
... Wong says the Illumina machines are currently producing 200 gigabases per run, and expects a higher throughput by the end of 2010. At 40 gigabases per day per machine, he expects to be generating 3-4 terabases daily by year’s end.
The BGI Hong Kong supercomputer currently has 2,400 cores, but Wong and a colleague seem uncertain about the storage capacity. After some back and forth in Cantonese, they settle on a current total of almost three petabytes (PB) of on site storage in Hong Kong (Shenzhen has a little over 7 PB). The calculation capacity is about 25 teraflops. With the scope of work BGI is hoping for, that may not suffice for long.
... BGI is privately held and employs 3,000 people now across five centers in mainland China (Shenzhen, Beijing, Hangzhou, Shanghai, and Guangzhou), and the three existing international centers. In addition to sequencing and bioinformatics, other areas of focus include diagnostics, biofuels, and agriculture.
... With 3,000 employees currently rising to an expected 5,000 by the end of this year, and a fleet of more than 150 Illumina and Life Technologies next-generation sequencing instruments, most of which are being installed in a former printing factory in Hong Kong (see, p. 44), BGI is poised (if it isn’t already) to become the world’s largest genome sequencing center. And it wants to share its extraordinary resources and expertise with, well, everybody.
Last April, BGI Americas was officially incorporated in Delaware as the official interface for BGI in North America. BGI Europe followed suit the next month (See, “European Union”).
... By the time the Hong Kong facility is fully operational at the end of 2010, BGI will have a total sequencing output of 5 terabytes/day—the equivalent of 1500x human genome/day (see, “Lucky Numbers”). The data center now boasts 50,000 CPUs, 200 terabytes of RAM and will reach a whopping 1,000 petabytes—1 exabyte—of data storage within the next 2-3 years. “It’s an awesome machine to play games on,” jokes Tu.
... The average age of the BGI staff is just 24.7. Tu calls the legions of bioinformatics workers “the young and the brightest,” drawn from the top tiers of mathematicians and scientists, supplemented with operations people who have worked abroad. “If they come to BGI, they get to work on real projects. Plus you get to program all day, with these toys in the background! It’s like a video game; they love it!” New recruits cannot rest on their laurels however: every month for the first six months, there’s a test. Fail it, and it’s bye-bye BGI.