Information Processing: @Google: Genetics and Intelligence

Wednesday, August 17, 2011

@Google: Genetics and Intelligence

I'll be giving a talk at Google tomorrow (Thursday August 18) at 5 pm. The slides are here. The video will probably be available on Google's TechTalk channel on YouTube, perhaps after some delay.

The Cognitive Genomics Lab at BGI is using this talk to kick off the drive for US participants in our intelligence GWAS. More information at www.cog-genomics.org, including automatic qualifying standards for the study, which are set just above +3 SD. Participants will receive free genotyping and help with interpreting the results. (The functional part of the site should be live after August 18.)

Title: Genetics and Intelligence

Abstract: How do genes affect cognitive ability? I begin with a brief review of psychometric measurements of intelligence, introducing the idea of a "general factor" or IQ score. The main results concern the stability, validity (predictive power), and heritability of adult IQ. Next, I discuss ongoing Genome Wide Association Studies which investigate the genetic basis of intelligence. Due mainly to the rapidly decreasing cost of sequencing, it is likely that within the next 5-10 years we will identify genes which account for a significant fraction of total IQ variation.

We are currently seeking volunteers for a study of high cognitive ability. Participants will receive free genotyping.

80 comments:

James D Miller said...: You might want to contact the Davidson Institute for Talent Development to find volunteers. To get in you must test in the top 0.1% (the 99.9th percentile).
http://www.davidsongifted.org/youngscholars/; 5:04 PM
angela said...: I can't believe Google would allow anyone to deliver a talk on such a controversial topic. Have the organizers seen your blog?; 5:13 PM
Lover of Wisdom said...: Will you update us on how well they received your talk? By the way, how did BGI determine the cutoffs (exam scores) for automatic admission in the study?; 6:07 PM
esmith said...: Just to clarify, by "genotyping", do you mean full genome sequencing (and then providing full 3 GB of data to each participant), or something more limited?

Also, is your US sample open to immigrants, or are you going for any kind of ancestral background restrictions?; 6:29 PM
angela said...: I don't even understand. You're a physicist; since when do physicists give talks about IQ?; 7:32 PM
steve hsu said...: 1. The first stage of genotyping will involve SNPs only; in the future some or all participants may receive more extensive genotyping such as exome or full genome sequencing.

2. There are no restrictions on ethnicity or citizenship. However, if you reside outside the US or Canada there may be some delay in receiving a saliva DNA kit from BGI after qualifying for the study.; 7:54 PM
ziel said...: Angela, you seriously think that Google would not be interested in finding the genetic basis for high IQ?; 8:01 PM
Yan Shen said...: Angela, haven't you been paying attention to this blog? :) If you had been, surely you would've remembered Steve discussing how physicists, with their superior cognitive horsepower, often dabble in other fields such as economics, psychometrics, biology, computer science, etc., but that the converse seldom happens, if ever. And also, as Steve has discussed, many of these other fields could benefit from the superior cognitive horsepower of physicists, as some of the conceptual muddle could certainly be cleared up by intellectually superior minds. ;); 8:18 PM
steve hsu said...: Thanks, it's on our to do list :-); 8:39 PM
botti said...: Heh, yes well biology has certainly benefited from the involvement of physicists.

"According to Crick, the experience of learning physics had taught him something important—hubris—and the conviction that since physics was already a success, great advances should also be possible in other sciences such as biology. Crick felt that this attitude encouraged him to be more daring than typical biologists who tended to concern themselves with the daunting problems of biology and not the past successes of physics."

http://en.wikipedia.org/wiki/Francis_Crick
http://en.wikipedia.org/wiki/Maurice_Wilkins; 9:30 PM
William_JD said...: Can I participate anonymously?; 9:46 PM
angela said...: I don't think Google is stupid enough to step into the political minefield of the genetic basis of IQ. All they need is high IQ employees, not the genetic basis of high IQ. Frankly I'm surprised Steve Hsu still has a job at Univ. of Oregon considering how controversial his views are. He certainly has no shortage of hubris in maintaining an HBD blog using his real name and identity. That takes cajones, considering how liberal and politically correct academia is.; 10:12 PM
RA828 said...: Will you take an LSAT score?; 10:13 PM
RA828 said...: Will you take the LSAT?; 10:15 PM
athelas314 said...: I'm interested in participating. In addition to the raw SNP data, will there be an interpretation like that provided by 23andme, or at least links to places where we can find known associations between SNPs and phenotype?; 10:17 PM
Sud K said...: Question

Is 'Automatic qualifying criteria' the minimum criteria for qualification?

I scored approx. 700, 700, 700! (SAT V, M and GMAT). I am interested in participating just to get my genotype. But if that is too low a score, oh well :-)

Joker; 10:28 PM
Brett Olsen said...: Cool, I'd love to see my genotyping. Mind if I spread this around to some other Tech alums?; 11:08 PM
steve hsu said...: Please do! :-); 11:18 PM
steve hsu said...: Not sure. There's a place in the survey where you can submit your LSAT score (and other additional information), but please submit your other scores or other academic information as well.; 11:22 PM
Mild Speculation said...: I second this question.

I miss the GRE cutoff by 10 verbal points, but I got a 10 on the AIME math exam in high school. Should I apply?; 11:28 PM
Yan Shen said...: I think the math/verbal split poses a bit of a problem for the cutoffs. I know people who qualified for USAMO, but who didn't meet the V cutoff.; 11:39 PM
steve hsu said...: If you miss the automatic criteria you might still get in based on supplemental information. The form allows you to submit additional information and documentation.; 11:40 PM
steve hsu said...: Those are just automatic criteria. USAMO would probably get someone in as long as their other scores are somewhat close to the cutoffs.; 11:42 PM
steve hsu said...: You can't remain anonymous to us but your identity is protected under the privacy agreement. See web site.; 12:13 AM
yahoo-F5TXBM3VHKTITJQWV6IY6I3ZBA said...: Something that came to my mind when looking at this and some of the earlier posts. Aren't IQ scores censored data? Surely a sufficiently large population that gains, on average, a total of 105 points on an IQ test over one hour, but a total of 120 over two, is smarter than one that gains a total of 105 over one hour but makes no improvement in the second hour. Why is this aspect never considered? I suppose one can argue that how 'fast' one can think is an important component of IQ, but in that case you can penalize the score gain by the additional time spent...; 12:31 AM
esmith said...: LSAT should definitely be used; in my opinion, it is a much stronger indicator of high-end g than GRE. (I tried them both.) GRE-M will be maxed out by 6% of all students who try to apply to grad schools in the country (which means that the score of 800 only indicates +2 SD or so). In contrast, LSAT is much tougher to max out. The "perfect 180" is achieved by one out of ten thousand (!) test takers.; 12:41 AM
steve hsu said...: How many people have super high LSATs but can't make the SAT/ACT cutoff?

I want to stress again that people can qualify without meeting any of the automatic criteria. The survey form on the site is sufficiently general that you can make your case: e.g., I got 180 on the LSAT; here is a scan of my score report ...; 12:52 AM
esmith said...: Very true. I guess what I'm trying to say is that the automatic SAT/ACT and especially GRE cutoffs are rather low. You're saying that automatic qualifiers are set above +3 SD, but, in reality, they will allow lots of people below 2.5 SD and occasionally even below 2 SD. If you just want to get a bunch of smart guys, SAT and ACT will suffice. (Or you could just run an ad in a Mensa newsletter.) If you really want a list of people above 3 SD without shelling out $1000/person on a psychometrist to retest everyone using real, in-person IQ tests, that's where LSAT comes in useful.; 1:32 AM
Yan Shen said...: Seems to me like Steve's study is geared more towards people with high mathematical ability, given the various qualifying criteria. It's possible that using the LSAT will lead to some high M people perhaps missing the cutoff.; 1:36 AM
esmith said...: High V low M types will end up buried by the logic games section.

Try this one: http://www.griffonprep.com/logicgame.html (there are answers below, don't peek!); 1:46 AM
MtMoru said...: So 1. Steve finds out who you are and 2. You might get a letter saying you're a lot smarter than you should be.; 2:54 AM
Reactionary_Konkvistador said...: The slides are really familiar; you've already given this talk, or at least a very similar one, before, haven't you? Can't wait to see the video.; 2:54 AM
Matthew Carnegie said...: Yan,

I think that in terms of V and M scores, it's that the study is all about finding g, and having high V and M subtests is a better indicator of having higher g than having a lower M but higher V or, conversely, a lower V but higher M, which are more indicative of having a lower g and a higher subfactor related to maths or verbal ability in some way. Those subfactors may or may not be heritable (current evidence for subfactor heritability, I believe, being low within population), but that isn't what's being looked for here in any case.

There shouldn't be too much difference since g is the largest common factor, but it's probably better to optimise for high g using high V and M.

The LSAT does seem like a fairly general test that isn't strongly skewed towards verbal capabilities, assuming the http://en.wikipedia.org/wiki/Law_School_Admission_Test list of scores is accurate, even if the pool of test takers might be.; 6:08 AM
ben_g said...: In an earlier thread I raised the problem of gene-environment correlations. Here's a specific one to consider. Suppose that parents with high IQ genes are able to impart better environments for their children. If these environments have any effect, then they'll be correlated with the signals being picked up by Steve's study. Furthermore, there's no reason to believe this correlation wouldn't exist in other societies, so replicating in other populations wouldn't avoid this issue.; 7:11 AM
sykes.1 said...: Angela is right. Nowadays, physicists are not permitted to have opinions on IQ.; 7:45 AM
David Coughlin said...: Funny. I used to make it a habit of applying for things I was 'kind-of' to 'mostly' qualified for in case I got lucky.; 7:48 AM
Leor Jacobi said...: I think I'm pretty smart, but I haven't been able to figure out when one can apply.; 8:22 AM
Leor Jacobi said...: When can we apply? Maybe I'm not smart enough to figure that one out...; 8:23 AM
Christopher Chang said...: The volunteer page will go live in a few hours.; 8:42 AM
James_Lee said...: What you are saying is that the "average excess" will exceed the "average effect" as a result of a kind of population structure: ability-increasing alleles are confounded with beneficial environments. See Ronald Fisher's paper on the subject for definitions of these terms and explications of their meanings. I do not think it can be rigorously shown that Fisher's implicit regression of the phenotype on all loci in the genome properly isolates the effect of an actual causal locus (see this book for what this means), but it is certainly a very reasonable notion. In any case many tools devised by statistical geneticists (e.g., EIGENSTRAT and EMMAX) can be seen as approximations of Fisher's ideal, and these have proven to be extremely successful in the control of population structure. See this new paper on multiple sclerosis for an impressive example.; 9:42 AM
James Lee said...: The GIANT Consortium has confirmed that the great majority of their height loci discovered in population samples replicate in within-family designs. Since nature randomly selects which allele a heterozygous parent passes on to an offspring, within-family designs are immune from population structure. In the future, as GWAS expands for any given phenotype, this kind of confirmation of population associations in (smaller) samples of families will be highly desirable.; 9:50 AM
James Lee said...: The relation between time taken and ability is rather complex. A rough generalization is that more able people take less time on easy items and more time on hard items; less able people tend to give up quickly on harder items.

Psychometricians have proposed using the time taken on a given item to update the provisional estimate of an examinee's ability in computer-adaptive testing. Taking temporal information into account should thus extract more information from a fixed number of items. I do not know if any operational testing programs have actually incorporated a proposal of this kind.; 9:54 AM
saucyskeptic said...: ROFL! Y'all are amusingly obsessed (OBSESSED!) with your own brilliance and the need for proof thereof. I look forward to watching the talk. A thought on the use of LSAT scores -- I smoked the LSAT... but that was way back in 1990. Year by year the LSAT has become much harder. Perhaps the exam administrator (is that LSAC?) has a meaningful way to compare scores from different years but I suspect you'll run into the same "low ceiling" problem with old LSAT scores that you have with SAT scores.; 10:38 AM
William_JD said...: My SAT scores make me an automatic qualifier, but how will you verify this claim?; 12:38 PM
5371 said...: Free beer ought to be a bigger draw than free genotyping.; 1:49 PM
ben_g said...: James, thanks for the response.

First, is my example really a case of population stratification? I thought population stratification required that there be sub-populations with systematic differences in allele frequencies outside of the genes that have an effect. What I raised would be a problem even if the high IQ people only correlated on the IQ effect genes. So I don't see how it can be controlled for in the same way as typical population stratification.

Second, isn't the use of principal components to control for population structure itself controversial? For example, see this criticism of the method http://www.cell.com/AJHG/retrieve/pii/S0002929711002187; 2:09 PM
James Lee said...: Thanks for bringing that letter to our attention.

The letter addresses a slightly different issue than the control of population structure in determining the effect of a single locus. The letter criticizes a method introduced by Goddard and colleagues for estimating the total genetic variance associated with the SNPs that happen to be present on a genotyping chip (without regard for individual loci). They point out that the method produces a massively biased estimate if there is extreme population structure, a bias that is only partially removed by the use of PCs as regression covariates. (I use the term "population structure," here, to mean any confounding of genotype with other causes of the trait, including environmental causes. The reply by Goddard et al. makes finer distinctions.)

What are the implications of this for identifying individual causal variants? Well, the letter cites the EMMAX method as being appropriate in this context, so according to the letter--not much. Thinking more about your own example of ability being confounded with the environmental boost of being raised by smart parents, I am no longer certain that genomic background can fully control for it. (Even God would not be able to predict the ability of your parents with complete accuracy from just your genome.) However, as I said, family designs are immune to confounding, and in the future I anticipate that such designs will be used to verify any results in samples of unrelated individuals.

The separate issue of whether the letter casts any doubts on applications of SNP-based heritability estimation is also an interesting one. In their reply, I think Goddard et al. get the better of the argument.; 4:10 PM
ben_g said...: James, thanks for the great replies and good luck on the study!; 4:49 PM
Hao Ye said...: Taking the 2010 stats, ~130k took the AMC 10/12 (about 60k each). About 500 qualify for the USAMO/USAJMO, or about 1 in 250. There's definitely some self-selection bias for people taking the test, so it seems reasonable to me. Oddly enough, if you go back further, there were more students taking the AHSME (240k in 1999) with fewer USAMO qualifiers (around 200 in 1999, I think).

I wonder if the self-selection bias has increased over time? Or maybe the increase of standardized testing has edged out the AMC?; 6:35 PM
Hao Ye said...: What makes you think the cutoffs are low?
2010 stats for the SAT (http://professionals.collegeboard.com/profdownload/sat-percentile-ranks-composite-cr-m-2010.pdf) indicate 4646 with 1560+ for V+M out of ~1500k. Even if you suppose all of those scorers are 800M, that's still +2.7 SD.

Also keep in mind that there's huge selection bias for the GRE already, so the percentiles are not what they would be for the general population.; 6:43 PM
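The conversion from a tail count to an SD figure in the comment above can be checked with a short sketch. It assumes scores are normally distributed across the test-taking population (a simplification, given the selection effects just mentioned); the helper name `tail_to_sd` is made up for illustration, and the counts are the ones quoted in the comment.

```python
from statistics import NormalDist


def tail_to_sd(count_above: int, total: int) -> float:
    """Return the z-score whose upper tail holds count_above / total
    of a standard normal distribution (normality is an assumption)."""
    tail_fraction = count_above / total
    return NormalDist().inv_cdf(1.0 - tail_fraction)


# Figures quoted in the comment: 4646 scorers at 1560+ out of ~1500k.
z = tail_to_sd(4646, 1_500_000)
print(f"1560+ on V+M corresponds to roughly +{z:.2f} SD among test takers")
```

Under that normality assumption the result lands at about +2.7 SD, in line with the figure quoted in the comment. The same helper can be applied to any other "k out of n" claim in this thread.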
Hao Ye said...: There have been plenty of papers showing heritability, so looking for a genetic basis is a logical next step. I don't see it as a political minefield unless you start throwing in race and gender (see Larry Summers).; 6:48 PM
sykes.1 said...: Again, Angela is correct. I taught at colleges and universities for 37 years. Nowadays, each institution has an administrative office dedicated to detecting and suppressing politically incorrect ideas, usually imposing such punishments as outright dismissal, loss of salary or tenure or rank, or public humiliation. Prof. Hsu has a job only because the office at UO hasn't found his blog yet.

By the way, Angela, it's "cojones."; 8:18 AM
David Cohlton Harold Eaton said...: I made a 1500 (740 verbal, 760 math) and then a 1560 (760 verbal, 800 math) on the SATs in late 2006. Would my higher score get me in?; 5:31 PM
Fang said...: Why is the genetic basis of intelligence controversial? This is the mainstream view in psychology and biology. Whether you or millions of average Americans like this or not will not prevent other countries from studying this to benefit their people, and it will not prevent the truth from coming out.
Last I checked, even a scholar like Philippe Rushton is still tenured at the University of Western Ontario. If you call Steve's research controversial, what do you call Philippe Rushton's research?; 9:14 PM
Fang said...: Biology is built on physical laws, unless you think intelligence has no biological basis.
Plus have you ever heard of a term called polymath?; 10:07 PM
James Lee said...: We will count your most recent score.; 10:53 PM
whatisgoingon whatisgoingon said...: Aww. 800 math, 800 math 2, 800 physics, 740 verbal

Damn, so close. I guess I have to wait for graduate school.; 11:17 PM
whatisgoingon whatisgoingon said...: Not quite. The GRE-M is taken by college students applying to graduate school. So that means that those taking the test not only got into college, but then had higher than average GPAs there. So it may be closer to +2.5 SD for the math. You need to account for the fact that those applying to grad school probably have an average IQ of 110 at least. Well, hopefully.; 12:01 AM
James Lee said...: We cannot give many additional details regarding the design of this study (or others we are carrying out) for several reasons. One is that our potential and actual collaborators may not want to be disclosed at the moment.; 10:54 AM
TheGuyFromEarlier said...: My brother makes the cutoff. Alas, I do not. Le sigh.

...but mama says i'm good at other stuff...! I can draw real good.; 3:31 PM
MtMoru said...: I'm an automatic, but in the consent form there's this:
At an advanced stage of the study, BGI-CGL may provide you access to your genetic data and interpretations thereof with respect to ancestry, disease risk, and predicted trait levels (including level of cognitive ability). Is that estimate of cognitive ability for the non-automatics only? If not there's a problem. If you; 11:40 PM
MtMoru said...: I'm an automatic, but in the consent form there's this: At an advanced stage of the study, BGI-CGL may provide you access to your genetic data and interpretations thereof with respect to ancestry, disease risk, and predicted trait levels (including level of cognitive ability). Is that estimate of cognitive ability for the non-automatics only? If not there's a huge problem.; 11:52 PM
Christopher Chang said...: The discrepancy between the trait prediction and your actual phenotype is an (increasingly noisy, as the relative contribution of environment increases) estimator of how much is still unknown about your genome.; 4:07 AM
ben_g said...: I'm interested in the answer to point #1. On that note, what if you have a curiosity gene, which made you want to be a case? Or a teaching gene that made you want to be in a PhD program? Anything that separates the case group from smart people as a whole could confound the study.; 2:17 PM
James Lee said...: Your genetic data will probably never provide as much information about your phenotype as measurements of the phenotype itself. If you want to know how fast you are, use a stopwatch; don't bother to measure your ACTN3 genotype. That said, even elite athletes often *are* curious about their ACTN3 genotype, and there seems to be no harm in allowing that itch to be scratched.

Hopefully we can tell whether an association arises from population stratification.; 7:28 PM
James Lee said...: A case-control design that relies on volunteers to fill out the case group cannot get around this problem. There will need to be replication in other designs that do not suffer from this flaw.; 7:31 PM
esmith said...: I was thinking about prospects of genetic engineering (what would a person with all IQ genes turned on look like?) and estimates on page 28 of the slides made me realize something.

It assumes that intelligence is determined by many (10^3) genes of equal small effect. But it can't work like that! Either the number N must be much smaller, or some genes are significantly more important than others.

Suppose that there are in fact N genes of equal effect. For simplicity, assume that they all have normal allele frequencies of 50%. Then we should be able to construct an "intelligence measure" equal to the share of positive alleles among these N, which correlates linearly with physically measurable quantities, e.g. the speed of solving Raven's matrices of fixed difficulty.

If N=10^3, then the average person has 50% of positive alleles and the person at +3 SD has 57% of positive alleles. It means that the person at +3 SD would only be 10-15% better/faster on any such test. But that is obviously not the case.

The very difficulty of devising tests that measure IQ much further than that would suggest that people at +3..4 SD have their abilities nearly "saturated", which could happen if N is rather low. For example, if N=50, then the median person has 25 positive alleles and the person at +4 SD has 47 positive alleles (and the remaining 3 would not matter much).

Of course, the very idea of additive IQ is rather crude, because, at some points, there are qualitative shifts. Hence IQ is hard to reduce to a linear performance measure like height or a 100-meter sprint time. But still, that is a useful perspective.

If I think about it some more, I should be able to come up with estimates of N and tests that help us measure it.; 5:07 PM
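The allele-share figures in the comment above can be explored with a small sketch. It assumes the simplest reading of the toy model: each of the N loci contributes one binary "positive or not" draw with probability 0.5, independently, so the count of positive alleles is Binomial(N, 0.5). The commenter may have had a different model in mind, and the function name `share_at_sd` is invented here for illustration.

```python
import math


def share_at_sd(n_loci: int, z: float, p: float = 0.5) -> float:
    """Share of positive alleles for a person z SDs above the mean,
    assuming the count of positive alleles is Binomial(n_loci, p)."""
    mean = n_loci * p
    sd = math.sqrt(n_loci * p * (1.0 - p))
    return (mean + z * sd) / n_loci


# The two cases discussed in the comment.
for n_loci, z in [(1000, 3.0), (50, 4.0)]:
    share = share_at_sd(n_loci, z)
    print(f"N={n_loci}, +{z:g} SD: {share:.1%} positive "
          f"(about {share * n_loci:.1f} of {n_loci})")
```

Under this plain binomial assumption the +3 SD person at N=1000 carries a bit under 55% positive alleles, and the +4 SD person at N=50 carries about 39 of 50, so the comment's 57% and 47 presumably rest on somewhat different assumptions. The qualitative contrast between large and small N is the same either way.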
James Lee said...: The simplifying assumptions in that slide are made purely to allow the relevant point to be made with a minimum of complications. The assumptions themselves should not be taken too seriously.

There is a quantitative-genetic literature on the estimation of gene number. To summarize, these methods are not very informative.; 6:12 PM
steve hsu said...: The model in the slides is just a toy model to illustrate scaling. In reality there will be distributions in effect sizes and allele frequencies in a particular population. See the height results which are starting to flesh this out for a different quantitative phenotype.; 6:20 PM
esmith said...: Are you aware of any studies that quantify the relationship between processing speed and IQ? (Preferably the problem-solving processing speed, and not things like reaction time.) I'm trying to quantify it, and I'm getting curious results (see image). The huge dynamic range leads me to suspect that the mean frequency of IQ-positive alleles is very low (maybe 10-20%). But it's hard to reproduce the high-end behavior, regardless of the model I try to use.; 8:48 PM
esmith said...: On second thought, that high-end behavior is EXACTLY what we should expect ... Suppose that we break down the time to execute a task into N pieces, and time to execute each piece depends on a single gene, and total time is a simple sum of all pieces. Having a few "strong" genes which can significantly reduce the processing time, and a lot of "weak" genes, each of which independently shaves off a percent or two, would produce the relationship between 'g' and processing speed as shown above.

Let me see if I can come up with a good fit now.; 11:02 PM
MtMoru said...: "Then we should be able to construct an "intelligence measure" equal to the share of positive alleles among these N, which correlates linearly with physically measurable quantities."

If you wanted to, but there'd still be a bell shaped curve.

"It means that the person at +3 SD would only be 10-15% better/faster on any such test."

It doesn't mean that. The small effect is in IQ points. That's the measure of better or worse. With 500,000 SNPs, all with a +1/2 or -1/2 point effect for homozygotes and 0 points for heterozygotes, with probabilities 1/4, 1/4, 1/2, the SD is sqrt(500,000)*1/2 = A LOT, assuming no covariance.; 9:55 AM
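The SD arithmetic in this reply can be sketched as follows, assuming independent SNPs with exactly the effects and genotype probabilities stated in the comment (the function name `total_sd` is hypothetical). With no covariance, variances add across SNPs, so the total SD is the square root of the number of SNPs times the per-SNP variance.

```python
import math
from typing import Sequence


def total_sd(n_snps: int, effects: Sequence[float],
             probs: Sequence[float]) -> float:
    """SD of a sum of n_snps independent, identically distributed
    per-SNP effects (independence is the 'no covariance' assumption)."""
    mean = sum(e * p for e, p in zip(effects, probs))
    variance = sum(p * (e - mean) ** 2 for e, p in zip(effects, probs))
    return math.sqrt(n_snps * variance)


# +1/2 and -1/2 for the two homozygotes (prob 1/4 each),
# 0 for heterozygotes (prob 1/2), as stated in the comment.
sd = total_sd(500_000, effects=(0.5, 0.0, -0.5), probs=(0.25, 0.5, 0.25))
print(f"Total SD across 500,000 SNPs: {sd:.0f} IQ points")
```

With those genotype probabilities the per-SNP variance works out to 1/8 rather than 1/4, so the total is 250 points rather than sqrt(500,000)*1/2 ≈ 354. Either figure supports the comment's point that the SD would be enormous under these assumptions.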
William_JD said...: On the "Volunteer" page, when I enter my email address and click submit, I receive the following message: "Check your email for instructions." It's been a week since I submitted my email address, and I have yet to receive any instructions. When can I expect them?; 9:46 AM
Christopher Chang said...: After you've entered your email address on the volunteer page once, any further submits don't send additional emails. Check your spam filter on the day you first tried to submit.; 11:04 AM
William_JD said...: I've received an email finally -- thanks.; 11:16 AM
efalken said...: I thought humans had only 25k genes. 1000 relate to intelligence? That seems a lot. Our bodies have a lot going on other than g-related activity.; 6:33 PM
esmith said...: But a surprisingly large part of those 25k is responsible for brain development or functioning of the nervous system. There was an article a few years ago that estimated that 58% of human transcriptome is expressed in brains of at least 5% of humans. The human brain map at http://human.brain-map.org identifies around 1000 genes which may be relevant here.; 3:00 AM
esmith said...: Hmm, I could swear I made a response to this, but it's not visible any more?

Anyway. Have you ever heard of the protein domain DUF1220? This is a protein domain of unknown function that is encoded independently by at least 30 and possibly over 60 different genes (some of them also do it multiple times); it's highly specific to humans (we have 6 times the number of copies of higher apes and it's almost nonexistent in other mammals); it's expressed primarily in regions of the brain responsible for higher cognitive function, and its copy number variation is correlated with things like brain size, the risk of autism, and the risk of schizophrenia. I'd expect to see a correlation with IQ as well. That's 60 genes right there. And it's just one pathway of many.; 4:24 AM
Mike Chew said...: Steve,
Are open discussions allowed on your blog? I felt compelled to comment on this study of intelligence, and yesterday I posted some points on what I felt was an incongruence between your intended study of intelligence and the qualifying criteria listed, but the post seems to have been deleted.

A reply would be appreciated.; 11:40 PM
steve hsu said...: Not sure why the Disqus spam filter grabbed your comment. But I've now released it.; 11:44 PM
