Saturday, July 19, 2014

Hail Britannia -- 100k whole genomes

Progress! Genotyping of large, well-phenotyped samples.
TechnologyReview: The British government says that it plans to hire the U.S. gene-sequencing company Illumina to sequence 100,000 human genomes in what is the largest national project to decode the DNA of a populace. ...

Some other countries are also considering large national sequencing projects. The U.K. project will focus on people with cancer, as well as adults and children with rare diseases. Because all Britons are members of the National Health Service, the project expects to be able to compare DNA data with detailed centralized health records (see “Why the U.K. Wants a Genomic National Health Service”).

While the number of genomes to be sequenced is 100,000, the total number of Britons participating in the study is smaller, about 70,000. That is because for cancer patients Genomics England intends to obtain the sequence of both their inherited DNA as well as that of their cancers.
BGI bid for this work but their transition to the upgraded Complete Genomics technology is still in progress. This delay has affected our cognitive genomics project as well.

Big data sets are also being assembled in the US (note in this case only SNP genotyping; cost is less than $100 per person now):
AKESOgen announced today that it has been awarded a $7.5M contract by the U.S. Department of Veterans Affairs (VA) for genotyping samples from U.S. veterans as part of the Million Veteran Program (MVP). This award covers the genotyping of 105,000 veterans in the first year of a five year contract.

"The VA's Million Veteran Program is one of the largest genetic initiatives ever undertaken in the US and its visionary genomics and genetics approach will provide new insights about how genes affect health. The goal is to improve healthcare for veterans by understanding the genetic basis of many common conditions. The data will ultimately be beneficial to the healthcare of all veterans and of the wider community. We are delighted to have been selected by the VA for this unique endeavor and we will provide genetic data of the highest quality to the VA." said Bob Boisjoli, CEO of AKESOgen. To fulfill the VA contract, AKESOgen will utilize a custom designed array based genotyping solution from Affymetrix, Inc. ...
My prediction is that of order a million phenotype:genotype pairs will be enough to deduce the genetic architecture of complex traits such as height or cognitive ability. SNPs will be enough to solve most of the problem, so that cost is now ~ $100M or less -- interested billionaires please contact me :-)
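The scale of that estimate can be sanity-checked with a standard GWAS power calculation. A rough sketch in Python; all parameter values here (SNP heritability, number of causal SNPs, significance threshold) are illustrative assumptions, not figures from the post:

```python
from scipy.stats import norm

# Illustrative assumptions about the trait's genetic architecture:
h2 = 0.5          # heritability captured by common SNPs (assumed)
s = 10_000        # number of causal SNPs (assumed)
v = h2 / s        # variance explained per causal SNP
alpha = 5e-8      # genome-wide significance threshold
power = 0.8       # desired power to detect one causal SNP

z_alpha = norm.isf(alpha / 2)     # two-sided critical value, ~5.45
z_power = norm.ppf(power)         # ~0.84
n = (z_alpha + z_power) ** 2 / v  # required sample size per causal SNP

print(f"samples needed: ~{n:,.0f}")
```

With these assumed numbers the estimate lands in the high hundreds of thousands of samples, i.e. of order a million, consistent with the prediction above; the answer scales linearly with the assumed number of causal SNPs.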


Richard Seiter said...

Any thoughts on what price points will result in bulk sequencing (SNPs and/or whole exome/genome) of the DNA samples taken by forward-thinking researchers in years past? It seems like those should add up quickly.

As an example, it looks like NHANES (~70k people since 2000) has done SNPs on their old samples.

P.S. Thanks for the comment about the cognitive genomics study.

Richard Seiter said...

Off topic, but I thought this might be interesting/fun for your readers:
Cognitive abilities versus forecasting capabilities

Pat Boyle said...

When you say a genome scan is now $100, I presume you are referring to a whole-genome scan, not the partial genome scan that 23andMe sells commercially for $100. I think I remember that a whole-genome scan was around $2,000 last year. Is that right?
What does the term 'genetic architecture' mean? Right now there are about a hundred SNPs known to affect human height. We also seem to know the percent of the total variance that each of these explains. I'm tall. If I get a printout of all my height-relevant SNPs, what more do I know?

steve hsu said...

There are some big biobanks at Vandy, Kaiser-Permanente, etc. Some hospitals are routinely taking blood or other DNA samples from every patient that passes through.

Once the price point is low enough, these samples will all be genotyped. Where exactly the break point lies really depends on funding agencies. Even with the cheapest SNP chips, a 100k sample takes ~$5-10 million to do.

The real problem going forward will be good phenotyping and the ability to pool data. The latter is encumbered by IRBs and a lack of foresight in bio-sample donor release forms.

Richard Seiter said...

Those limitations make me wonder how much of this research will be done outside the US. I've been corresponding with a Finnish researcher and his ability to pool data (from the Young Finns Study and the Finnish Linked Employer-Employee Data) can really help (especially when it allows matching retrospective and current data). Does China have databases of that sort like much of Europe does?

John Salvatier said...

I have a question about the theory that the number of deleterious mutations has a big impact on IQ. The theory seems very easy to test even with pretty small sample sizes, so I'm surprised there isn't already strong evidence for it. The method of testing I have in mind is simply to use the number of rare variants a sample carries as a predictor of IQ. This helpfully turns a high-dimensional set of predictors into a single predictor, so you should be able to use a much smaller dataset. I don't think you even need genome-wide data, as long as your SNP data includes a reasonable number of rare variants.

Is this impossible for some reason? Has it been done already? Seems like we could get really strong evidence about this theory already.

steve hsu said...

Only a small fraction of variants will affect cognitive ability. You have to pick the signal out over a large background of variants that don't affect g score.

Emil Kirkegaard said...

The total burden of rare, non-synonymous exome genetic variants is not associated with childhood or late-life cognitive ability

Emil Kirkegaard said...

He is talking about SNP scans.

What the genes that affect the phenotype actually do: their mechanism of action.

steve hsu said...

Deary's paper puts an upper bound on the effect -- it could still be there but not statistically significant for sample sizes in the low thousands. If only a fraction of rare variants are deleterious for g, then the correlation of the incidence of those variants with g may not be visible against the background: fluctuations in the number of variants that don't affect g. If, say, 5% of rare variants have a negative impact on g, you have a pretty tough signal-to-noise ratio to overcome.
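This signal-to-noise argument is easy to see in a toy simulation. A minimal sketch, where every parameter (variants per person, causal fraction, per-variant effect size) is a hypothetical assumption chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 3000            # sample size, low thousands as in the Deary-type studies
mean_variants = 500 # mean rare variants per person (assumed)
p_causal = 0.05     # fraction of rare variants that actually affect g (assumed)
beta = -0.05        # effect of one causal variant, in SD units of g (assumed)

total = rng.poisson(mean_variants, size=n)   # total rare-variant burden
causal = rng.binomial(total, p_causal)       # the causal subset of that burden
g = beta * causal + rng.normal(0.0, 1.0, n)  # g score: causal burden plus everything else

r = np.corrcoef(total, g)[0, 1]
print(f"correlation of total burden with g: {r:+.3f}")
```

Even with a real negative effect baked in, the correlation of total burden with g comes out small (of order -0.05 under these assumptions), because the neutral variants dominate the variance of the burden count; reliably detecting an effect that size at n in the low thousands is marginal.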

John Salvatier said...

Thanks, that's pretty interesting. Good to know people have tried the obvious stuff and there's at least some evidence about this.
