Information Processing: 04/2013

Saturday, April 27, 2013

The bright young man who comes here and tries to build something


New Yorker: ... A sedan swerved in behind him, a man banged on his window, the door opened, a pistol appeared, and soon they were off.

Danny is an immigrant from China who came to Boston as a graduate student. He now works for a start-up in Kendall Square. The Tsarnaev brothers offer a grisly story of American immigration and integration, and Danny offers another: the bright young man who comes here and tries to build something. It’s twinned in some ways with the tale of Lingzi, who also came from China as a graduate student. “Do you remember my face?” Tamerlan yelled at Danny at one point in the car. “No, no, I don’t remember anything,” Danny lied. “It’s like white guys, they look at black guys and think all black guys look the same,” Tamerlan said.

One of the mysteries of the case has been why the brothers killed a cop but didn’t kill the man whose car they had stolen. Now we know. Danny kept his cool when they picked up Dzhokhar and the trunk was loaded with heavy bags. The three men then drove through Boston, and the brothers asked Danny if he could take them to New York. Eventually, they pulled up to a gas station, hoping to use Danny’s credit card. The station took only cash, so Dzhokhar got out to pay. Tamerlan, allegedly an aspiring mass murderer and a man known by some as the best boxer in Boston, put his gun in the door pocket for a moment. Seeing his chance, with one motion Danny unbuckled his seat belt and opened his door. And then he raced off at an angle, fearing a bullet in the back. In a moment he was across the street in another station, and the attendant there was on the phone as the Tsarnaevs drove off. Cops would come, the shoot-out would commence, and the horrible saga would end with no more innocent people killed. If Danny hadn’t had the courage to run or if he hadn’t gotten his seat belt off, more people would have likely died—very possibly including Danny.

... “I don’t want to be a famous person talking on the TV,” he told Eric Moskowitz of the Globe. “I don’t feel like a hero … I was trying to save myself.”

Friday, April 26, 2013

The cognitive ability of US elites

Jonathan Wai sends me his latest paper, which reveals that ~ 40% or more of US Fortune 500 CEOs, billionaires, federal judges and Senators attended elite undergraduate or graduate institutions whose median standardized test scores are above roughly the 99th percentile for the overall US population (i.e., SAT M+CR > 1400). Over 10% of individuals in these categories attended Harvard. (In the table: elite school = top 1% undergrad or MBA/JD from program with top 1% scores; grad school = other graduate education; college = college degree but from non-elite program, and no graduate school.)

To put it another way, top 1% ability individuals could be up to ~ 50x overrepresented among the elite groups listed above -- i.e., they are only 1% of the population (by definition), but could be ~ 50% of the super-elite. See also If you're so smart, why aren't you rich? and posts on elite universities and human capital mongering (top 30 elite universities enroll over half of top 1% ability students in the US).
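A quick check of the arithmetic above, assuming ability is normally distributed: the top 1% starts about 2.33 SD above the mean, and a group made up of ~50% top-1% individuals overrepresents them by 50%/1% = 50x. The 50% share is just the rough upper figure from the post, plugged in for illustration.

```python
from statistics import NormalDist

def top_fraction_cutoff(fraction: float) -> float:
    """z-score above which the given fraction of a standard normal population lies."""
    return NormalDist().inv_cdf(1.0 - fraction)

def overrepresentation(share_in_group: float, share_in_population: float) -> float:
    """How many times more common a trait is in a group than in the whole population."""
    return share_in_group / share_in_population

z_top1 = top_fraction_cutoff(0.01)        # about +2.33 SD
ratio = overrepresentation(0.50, 0.01)    # the post's rough 50% figure -> 50x
print(f"top 1% cutoff: +{z_top1:.2f} SD; overrepresentation: {ratio:.0f}x")
```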

Investigating America's elite: Cognitive ability, education, and sex differences
Intelligence 41 (2013) 203–211 http://dx.doi.org/10.1016/j.intell.2013.03.005
Jonathan Wai
Duke University, Talent Identification Program

Are the American elite drawn from the cognitive elite? To address this, five groups of America's elite (total N = 2254) were examined: Fortune 500 CEOs, federal judges, billionaires, Senators, and members of the House of Representatives. Within each of these groups, nearly all had attended college with the majority having attended either a highly selective undergraduate institution or graduate school of some kind. High average test scores required for admission to these institutions indicated those who rise to or are selected for these positions are highly filtered for ability. Ability and education level differences were found across various sectors in which the billionaires earned their wealth (e.g., technology vs. fashion and retail); even within billionaires and CEOs wealth was found to be connected to ability and education. Within the Senate and House, Democrats had a higher level of ability and education than Republicans. Females were underrepresented among all groups, but to a lesser degree among federal judges and Democrats and to a larger degree among Republicans and CEOs. America's elite are largely drawn from the intellectually gifted, with many in the top 1% of ability.

See Finding the next Einstein (Psychology Today) for a Q&A I did with Jon a couple of years ago.

Thursday, April 25, 2013

How to beat online exam proctoring

Part of the potential of online education is to break the "credentialing chokehold" of traditional universities. But in order for a credential to have value, one has to be sure that the holder has actually mastered the subject matter. Thus, security in testing is important. Certainly, students can cheat at traditional universities, but the problem becomes much more severe for online-only education in which the educational institution may never have physical contact with the student. A security hole in an online proctoring system can be exploited wholesale, by thousands of people in different locations. (See also Magical Mystery Moocs and Whither Higher Education?)

Here is how to beat the online proctoring systems described in a recent Chronicle of Higher Education article.

Attach a second monitor to the test taker's computer (e.g., via a long cable) which is visible only to a hidden confederate, or otherwise arrange to have the image on the main screen visible to the hidden confederate.

The confederate works out problems and transmits answers via, e.g., tapping on the test taker's leg using a long stick (out of the webcam's view). Another possibility is for the confederate to use a laser pointer (or other projection device) pointed at the wall behind the test taker's monitor. The confederate could also just hide under the desk or table at which the test taker sits. Most of these methods will work on multiple-choice tests, but the projection method could even work on essay or programming tests. A commenter also suggests running multiple virtual machines (VMs) on the test taker's computer, one for the testing app and the other for secret communication. A quick keyboard toggle between them would be difficult for existing security measures to detect.

Eventually I can imagine students setting up special "test (cheating) rooms" for this purpose. Ideally the monitoring company should obtain POV data from the test taker to defeat these methods.

Chronicle: ... The old biases against online education have begun to erode, but companies that offer remote-proctoring services still face an uphill battle in persuading skeptics, many of whom believe that the duty of preserving academic integrity should not be entrusted to online watchers who are often thousands of miles from the test-takers. So ProctorU and other players have installed a battery of protocols aimed at making their systems as airtight as possible.

The result is a monitoring regime that can seem a bit Orwellian. Rather than one proctor sitting at the head of a physical classroom and roaming the aisles every once in a while, remote proctors peer into a student's home, seize control of her computer, and stare at her face for the duration of a test, reading her body language for signs of impropriety. ...

Wednesday, April 24, 2013

Dog cognition

Will they discover a "general factor" for dog intelligence? Anyone want to make a prediction? See also here.

NYTimes: ... Dr. Hare, now an associate professor at Duke, has continued to probe the canine mind, but his research has been constrained by the number of dogs he can study. Now he hopes to expand his research geometrically — with the help of dog owners around the world. He is the chief scientific officer of a new company called Dognition, which produces a Web site where people can test their dog’s cognition, learn about their pets and, Dr. Hare hopes, supply him and his colleagues with scientific data on tens of thousands of dogs.

“Because it’s big data, we can ask questions that nobody could have a chance to look at,” he said.

From his previous research, Dr. Hare has argued that dogs evolved their extraordinary social intelligence once their ancestors began lingering around early human settlements. As he and his wife, Vanessa Woods, explain in their new book, “The Genius of Dogs,” natural selection favored the dogs that did a better job of figuring out the intentions of humans.

While this evolution gave dogs one cognitive gift, it didn’t make them more intelligent in general. “If you compare them to wolves as individuals, they look like idiots,” Dr. Hare said. “But if you then show them having a human solve the problem, they’re geniuses.”

To explore dog cognition further, he set up the Duke Canine Cognition Center in 2009. He and his colleagues built a network of 1,000 dog owners willing to bring in their pets for tests.

... He is trying to find the “cognitive style” of the successful service dogs. To do so, he and his colleagues have developed a battery of 30 tests that altogether take four hours to administer. They have tested 200 dogs and are searching for hallmarks that set the service dogs apart.

He helped form Dognition, he said, partly because of interest from dog trainers who asked him if they could test their own dogs’ cognitive style.

The tests are now available online: For a fee, dog owners get video instructions for how to carry them out. (Besides the pointing test, they include a test in which the owner yawns and then watches to see if the dog does too — a potential sign that dog and owner are strongly bonded.) The company then analyzes how a given dog compares with others in its database for qualities like empathy and memory.

... A surprising link turned up between empathy in dogs and deception. The dogs that are most bonded to their owners turn out to be most likely to observe their owner in order to steal food. “I would not have thought to test for that relationship at Duke, but with Dognition we can see it,” said Dr. Hare.

Tuesday, April 23, 2013

Jon Jones Nike Pro Training

MMA training regimens are among the most advanced in all of sports today.

See also Jon Jones, phenom.

Anyone who has watched Jones fight knows he is incredibly talented. His background is wrestling -- he gave up a scholarship to Iowa State when he started his pro career. I notice he hits a lot of Judo throws in his fights. Most fans think those are Greco-Roman throws but they aren't -- he's using his legs, which is illegal in Greco-Roman. When I investigated this, expecting to find that he had trained in Judo as many wrestlers have, I was amazed to discover that he taught himself using internet videos!

Check out the outside leg trip at about 56 seconds in the video.

Monday, April 22, 2013

The Econ Con: Rogoff and Reinhart edition

Rather than tease my friends in economics I'll just refer you, e.g., to this blog post by Krugman and this thread on Econ Job Rumors.

Ulam to Samuelson: name a single result in economics that is both true and non-trivial. (Don't give me comparative advantage. I doubt Ulam would find that non-trivial. Ulam and Samuelson were Harvard Junior Fellows together.)

For more fun see, e.g., Confessions of an Economist.

Common variants vs mutational load

I recommend this blog post (The Differentialist) by Timothy Bates of the University of Edinburgh. (I met Tim there at last year's Behavior Genetics meeting.) He discusses the implications of GCTA results showing high heritability of IQ as measured using common SNPs (see related post Eric, why so gloomy?). One unresolved issue (see comments there) is to what extent mutational load (deleterious effects due to very rare variants) can account for population variation in IQ. The standard argument is that very rare variants will not be well tagged by common SNPs and hence the heritability results (e.g., of about 0.5) found by GCTA suggest that a good chunk of variation is accounted for by common variants (e.g., MAF > 0.05). The counterargument (which I have not yet seen investigated fully) is that relatedness defined over a set of common SNPs is correlated with the similarity in mutational load of a pair of individuals, due to the complex family history of human populations. IIRC, "unrelated" individuals selected at random from a common ethnic group and region are, on average, roughly as related as third cousins (say, r ~ 1E-02?).

Is the heritability detected using common SNPs due to specific common variants tagged by SNPs, or due to a general correlation between SNP relatedness and overall similarity of genomes?
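For readers wondering what "relatedness over common SNPs" means operationally: GCTA builds a genomic relationship matrix from standardized genotypes, A_jk = (1/N) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / (2p_i(1 − p_i)), where x_ij ∈ {0, 1, 2} counts alleles at SNP i and p_i is the allele frequency. Below is a minimal sketch on simulated data; the simulation (unrelated individuals, independent SNPs, frequencies drawn uniformly from 0.05 to 0.5) is my own assumption, and the sketch uses the simple formula for every entry, whereas GCTA uses a modified estimator on the diagonal.

```python
import numpy as np

def genomic_relationship_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Relatedness from an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Simple standardized form A = Z Z^T / N for every entry; GCTA itself
    uses a slightly different estimator for the diagonal (self-relatedness).
    """
    p = genotypes.mean(axis=0) / 2.0                  # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)                          # drop monomorphic SNPs
    z = (genotypes[:, keep] - 2.0 * p[keep]) / np.sqrt(2.0 * p[keep] * (1.0 - p[keep]))
    return z @ z.T / keep.sum()

rng = np.random.default_rng(0)
n_people, n_snps = 200, 5000
freqs = rng.uniform(0.05, 0.5, size=n_snps)           # toy "common" SNPs, MAF >= 0.05
geno = rng.binomial(2, freqs, size=(n_people, n_snps)).astype(float)

grm = genomic_relationship_matrix(geno)
off_diag = grm[~np.eye(n_people, dtype=bool)]
print(f"mean self-relatedness: {np.diag(grm).mean():.3f}")
print(f"pairwise relatedness: mean {off_diag.mean():.4f}, sd {off_diag.std():.4f}")
```

For these simulated unrelated people the pairwise values scatter around zero with an SD of roughly 1/sqrt(N), about 0.014 for N = 5000 SNPs. That estimation noise is comparable to the r ~ 1E-02 guessed above, which is why such faint background relatedness is hard to see pair by pair.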

My guess is that we'll find that both common variants and mutational load are responsible for variation in cognitive ability. Does existing data place any bound on their relative contributions? This requires a calculation, but my intuition is that mutational load cannot account for everything. Fortunately, with whole genome data you can look both for common variants and at mutational load at the same time.

In the case of height it's now clear that common variants account for a significant fraction of heritability, but there is also evidence for a mutational load component. Note that we don't expect to discover any common variants for IQ until sample sizes pass a threshold, which for height turned out to be about 10k.

Hmm, now that I think about it ... there does seem to be a relevant calculation :-)

In the original GCTA paper (Yang et al. Nature Genetics 2010), it was found that relatedness computed on a set of common genotyped SNPs is a poor predictor of relatedness on rare SNPs (e.g., MAF < 0.1). The rare SNPs are in poor linkage disequilibrium (LD) with the genotyped SNPs, due to the difference in MAF. This was proposed as a plausible mechanism for the still-missing heritability (e.g., 0.4 vs 0.8 expected from classical twin/sib studies; Yang et al. specifically looked at height): if the actual causal variants tend to be rarer than the common genotyped SNPs, the genotypic similarity of two individuals where it counts -- on the causal variants -- would be incorrectly estimated, leading to an underestimate of heritability.

If these simulations are any guide, rare mutations are unlikely to account for the GCTA heritability, but rather may account for (some of) the gap between it and the total additive heritability. See, for example, the following discussion:

A commentary on “Common SNPs explain a large proportion of the heritability for human height” by Yang et al. (2010)

(p.6) ... We cannot measure the LD between causal variants and genotyped SNPs directly because we do not know the causal variants. However, we can estimate the LD between SNPs. If the causal variants have similar characteristics to the SNPs, the LD between causal variants and SNPs should be similar to that between the SNPs themselves. One causal variant can be in LD with multiple SNPs and so the SNPs collectively could trace the causal variant even though no one SNP was in perfect LD with it. Therefore we divided the SNPs randomly into two groups and treated the first group as if they were causal variants and asked how well the second group of SNPs tracked these simulated causal variants. This can be judged by the extent to which the relationship matrices calculated from the SNPs agree with the relationship matrix calculated from the ‘causal variants’. The covariance between the estimated relationships for the two sets of SNPs equals the true variance of relatedness whereas the variance of the estimates of relatedness for each set of SNPs equals true variation in relatedness plus estimation error. Therefore, from the regression of pairwise relatedness estimated from one of the set of SNPs onto the estimated pairwise relatedness from the other set of SNPs we can quantify the amount of error and ‘regress back’ or ‘shrink’ the estimate of relatedness towards the mean to take account of the prediction error.

... If causal variants have a lower MAF than common SNPs the LD between SNPs and causal variants is likely to be lower than the LD between random SNPs. To investigate the effect of this possibility we used SNPs with low MAF to mimic causal variants. We found that the relationship estimated by random SNPs (with MAF typical of the genotyped SNPs on the array) was a poorer predictor of the relationship at these ‘causal variants’ than it was of the relationship at other random SNPs. When the relationship matrix at the SNPs is shrunk to provide an unbiased estimate of the relationship at these ‘causal variants’, we find that the ‘causal variants’ would explain 80% of the phenotypic variance ...
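The split-SNP check described in the commentary can be mimicked in a few lines. This is a toy reconstruction, not the authors' code: I assume full-sibling pairs (so that true relatedness actually varies across pairs; with only unrelated people the slope would sit near zero), unlinked SNPs, and uniform allele frequencies. The SNPs are split at random into two halves, relatedness is estimated from each, and the regression slope of one estimate on the other, cov(a, b)/var(b) = true variance / (true variance + error), serves as the shrinkage factor.

```python
import numpy as np

rng = np.random.default_rng(1)

def grm(genotypes):
    """Standardized genomic relationship matrix, A = Z Z^T / N."""
    p = genotypes.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)                          # drop monomorphic SNPs
    z = (genotypes[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    return z @ z.T / keep.sum()

def simulate_sibships(n_families, n_snps):
    """Two children per family; parents unrelated, SNPs unlinked (toy assumptions)."""
    freqs = rng.uniform(0.05, 0.5, size=n_snps)
    kids = []
    for _ in range(n_families):
        mom, dad = rng.binomial(2, freqs), rng.binomial(2, freqs)
        for _ in range(2):
            # Each parent passes on one allele; a heterozygote transmits either with prob 1/2.
            kids.append(rng.binomial(1, mom / 2) + rng.binomial(1, dad / 2))
    return np.array(kids, dtype=float)

geno = simulate_sibships(n_families=50, n_snps=5000)

# Split SNPs at random into two halves and estimate pairwise relatedness from each.
perm = rng.permutation(geno.shape[1])
half_a, half_b = perm[: len(perm) // 2], perm[len(perm) // 2 :]
upper = np.triu_indices(geno.shape[0], k=1)           # each pair counted once
rel_a = grm(geno[:, half_a])[upper]
rel_b = grm(geno[:, half_b])[upper]

# Slope of set-A relatedness on set-B relatedness = fraction of variance that is real.
slope = np.cov(rel_a, rel_b)[0, 1] / np.var(rel_b, ddof=1)
shrunk_b = rel_b.mean() + slope * (rel_b - rel_b.mean())  # "regress back" toward the mean
print(f"regression slope (shrinkage factor): {slope:.2f}")
```

Replacing one half of the SNPs with low-MAF variants (the 'causal variant' mimics in the quote) is the natural next step; the commentary reports that the common-SNP relatedness then predicts the rare-variant relatedness less well.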

Sunday, April 21, 2013

Dismal Science

Economics Shapes Science by Paula Stephan.

At a time when science is seen as an engine of economic growth, Paula Stephan brings a keen understanding of the ongoing cost-benefit calculations made by individuals and institutions as they compete for resources and reputation. She shows how universities offload risks by increasing the percentage of non–tenure-track faculty, requiring tenured faculty to pay salaries from outside grants, and staffing labs with foreign workers on temporary visas. With funding tight, investigators pursue safe projects rather than less fundable ones with uncertain but potentially path-breaking outcomes. Career prospects in science are increasingly dismal for the young because of ever-lengthening apprenticeships, scarcity of permanent academic positions, and the difficulty of getting funded.

Working paper version of the book.

From the abstract: Scientific research has properties of a public good; there are few monetary incentives for individuals to undertake basic research and the conventional wisdom is that the market, if left to its own devices, would under-invest in research in terms of social benefits relative to social costs. (emphasis mine)...

From the conclusions: ... In one sense, U.S. universities behave like high-end shopping malls. They are in the business of building state-of-the-art facilities and a reputation that attracts good students and faculty. They then turn around and “rent” the facilities to faculty in the form of indirect costs on grants and the buy-out of salary. Faculty, in turn, create research programs, staffing them with graduate students and postdocs, who contribute to the research enterprise by their labor and the fresh ideas that they bring, but who can also be easily downsized, if and when times get tough. Universities leverage these outcomes into reputation. The amount of funding universities receive, as well as the citations and prizes awarded to their faculty, determine their peer group—the club to which they belong. They also attract donations and students and affect the university’s ranking.

Saturday, April 20, 2013

A blog is born

Raghu Parthasarathy, a biophysicist at U Oregon, and my correspondent in this previous post on faculty blogging, has decided to try it out. Raghu is a deep and creative thinker, so I'm sure we have some interesting contributions to look forward to!

My key motivation ... hopefully recording a variety of thoughts will help them persist, and perhaps coalesce into something useful. And maybe some of the topics I expect to write about — the structure of higher education, biophysics and animal/microbe interactions, my sporadic efforts at painting — will be of interest.

Here's some video from recent imaging work Raghu has done (microbiome of a zebrafish).

Thursday, April 18, 2013

Genius at work

Stephen Smale gave a lecture today in the MSU math department, on protein binding and folding. He mainly presented the mathematical aspects and it was a bit like magic. If (as claimed) his methods, which could be described as coming from machine learning, actually give the best results to date on these problems, it really is magic.

Title: Mathematics of Protein Folding

Abstract: Learning methods are used to create a geometry on spaces of amino acid sequences. This geometry is used to study immunology, in particular in the peptide binding problem. Then these ideas are used to obtain new results in protein folding.

Tuesday, April 16, 2013

Digit ratio

Sounds nutty, but what the heck! Don't blame me if I lose my temper or become violent -- my digit ratio made me do it! ;-)

Finger Length Predicts Health and Behavior (Discover): ... In boys, “during fetal development there’s a surge in testosterone in the middle of the second trimester” that seems to influence future health and behavior, says Pete Hurd, a neuroscientist at the University of Alberta. One easy-to-spot result of this flood of testosterone: a ring finger that’s significantly longer than the index finger.

Scientists are not at the point where they can factor in finger length to arrive at a diagnosis, but they’ve gathered evidence that shows how this prenatal hormone imbalance can affect a person for life, from increasing or decreasing your risk of certain diseases, to predicting how easily you get lost or lose your temper. ...

Increased verbal aggression Fq < 1

Improved athletic ability Fq < 1

Improved sense of direction Fq < 1

More physical aggression Fq < 1

More risk taking Fq < 1
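Assuming "Fq" in the list above is the usual 2D:4D digit ratio (index-finger length divided by ring-finger length), a ring finger longer than the index finger is exactly what gives Fq < 1. A trivial sketch; the measurements are made-up numbers for illustration.

```python
def digit_ratio(index_mm: float, ring_mm: float) -> float:
    """2D:4D ratio: second-digit (index) length over fourth-digit (ring) length."""
    if index_mm <= 0 or ring_mm <= 0:
        raise ValueError("finger lengths must be positive")
    return index_mm / ring_mm

# Hypothetical measurements in millimetres.
fq = digit_ratio(index_mm=72.0, ring_mm=76.0)
print(f"Fq = {fq:.3f}; ring finger longer than index: {fq < 1}")
```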

Sunday, April 14, 2013

Why blog? A professor responds

A colleague responds to my earlier post Blogging professors, on how universities might encourage more faculty blogging.

What I had in mind was a university-wide platform that would aggregate the output of participating faculty. This kind of branded expert channel might have a place amid the economic collapse in journalism we are currently experiencing. If Huffington Post is worth $315 million (OK, not really, just another dumb move by AOL), what might a platform showcasing 100 clever faculty from a major research university be worth? 100 bloggers (say, each posting once every 10 days or so = 10 new posts per day) out of 2000 MSU faculty doesn't sound too crazy, does it?

Hi Steve,

I liked reading your "Blogging Professors" post, since I've thought several times, "Should I write a blog?" But I've also thought, "Why does anyone bother to write a blog?" The reasons to write are, as you note, to propagate one's "fabulous ideas and opinions worthy of wider attention and discussion" and to create dialogs and conversations. My own reasons not to write have been (1) that it would take time, and I have too little time as it is, and (2) that I doubt I'd be likely to make even the slightest ripple in the vast pool of the internet.

Reason (1) is, I'm sure, obvious. It's hard to find "work time" between experiments, meetings, classes, seminars, journal clubs, staring at data, writing analysis code, talking to students, planning classes, teaching classes, reading papers, reading books, and probably several other things I'm forgetting. And "free time" has its own constraints, and any new activities would have to compete with things I'm very fond of, like wandering the public library with the kids, or playing games with them in random taquerias, or painting pictures myself (which, sadly, has been steadily dwindling in frequency).

Of course, I'm sure most commenters will point out that it's all a matter of incentives: I have no incentive, as a faculty member, to blog. This is true, but not very explanatory in itself. We all do plenty of things that don't have concrete incentives. This past week, I've spent about two hours reviewing a paper. Next week I'll spend at least half an hour with a postdoc (not from my lab) starting a faculty position (elsewhere) giving advice on grants. Later this term, I'll probably put a lot of work into a talk on [ geeky science topic involving microscopy; unspecified to preserve anonymity ] for a journal club I don't usually attend -- it's a fascinating topic I've gotten increasingly involved with. I certainly don't get any reward from the University (or even the department) for doing these sorts of things. So why do them? In all these cases, there's some combination of reciprocity (I publish articles in journal X, so I should review papers for journal X), or personal interactions (I like to have conversations with colleagues), or both. Is any of this the case for blogging?

I'd guess -- though I have no data on this -- that most blogs, especially new ones, have very little readership. Certainly one often stumbles on blogs with a total absence of comments. (Not that blog comments in general are often worth reading…) And even if posts are read, is there likely to be much interaction or dialog, compared to the other activities noted above?

As you note, one way out of this would be group blogs, which might expand readership and reduce writing effort. Another would be if the university actively promoted blogs. (I'm constantly amazed at how little work the university puts into describing to the public what faculty do, and how ineptly what little they do is done.)

And, of course, another solution is to simply look at blogging as a way of recording and refining one's thoughts -- regardless of whether they're read or not. I've toyed with this; maybe I'll take it up…

I certainly view blogging as a means of recording and organizing my thoughts. Sometimes I get really thoughtful and insightful feedback in the comments (although sometimes not). There's also the pleasure of self-expression! As James Salter wrote:

There comes a time when you realize that everything is a dream, and only those things preserved in writing have any possibility of being real.

Bezos on the big brains

I recall reading this quote (or something similar) when Bezos was Time magazine's Person of the Year in 1999.

Jeff Bezos: Yeah. So, I went to Princeton primarily because I wanted to study physics, and it's such a fantastic place to study physics. Things went fairly well until I got to quantum mechanics and there were about 30 people in the class by that point and it was so hard for me. I just remember there was a point in this where I realized I'm never going to be a great physicist. There were three or four people in the class whose brains were so clearly wired differently to process these highly abstract concepts, so much more. I was doing well in terms of the grades I was getting, but for me it was laborious, hard work. And, for some of these truly gifted folks -- it was awe-inspiring for me to watch them because in a very easy, almost casual way, they could absorb concepts and solve problems that I would work 12 hours on, and it was a wonderful thing to behold. At the same time, I had been studying computer science, and was really finding that that was something I was drawn toward. I was drawn to that more and more and that turned out to be a great thing. So I found -- one of the great things Princeton taught me is that I'm not smart enough to be a physicist.

However, from the perspective of others:

... To the amazement and irritation of employees, Bezos’s criticisms are almost always on target. Bruce Jones, a former Amazon supply chain vice president, describes leading a five-engineer team figuring out ways to make the movement of workers in fulfillment centers more efficient. The group spent nine months on the task, then presented their work to Bezos. “We had beautiful documents, and everyone was really prepared,” Jones says. Bezos read the paper, said, “You’re all wrong,” stood up, and started writing on the whiteboard.

“He had no background in control theory, no background in operating systems,” Jones says. “He only had minimum experience in the distribution centers and never spent weeks and months out on the line.” But Bezos laid out his argument on the whiteboard, and “every stinking thing he put down was correct and true,” Jones says. “It would be easier to stomach if we could prove he was wrong, but we couldn’t. That was a typical interaction with Jeff. He had this unbelievable ability to be incredibly intelligent about things he had nothing to do with, and he was totally ruthless about communicating it.”

It turns out I know several of Bezos' contemporaries at Princeton (class of 1986), including some members of his eating club, and probably some of the individuals described above. See this old post, Living Like Kings:

Physics library, LeConte Hall, Berkeley, 1987. Studying string theory and Calabi-Yau tomfoolery about 100m from the Campanile in the picture above. We'll never have it better than that.

Me: Mike, I can't believe we're in here working on such a beautiful afternoon. Look at that sunshine!

Mike C. (the pride of Jadwin Hall): Hsu, we're doing exactly what we want to be doing. We're livin' like kings, man! Livin' like kings (big grin).

See also One hundred thousand brains and Defining Merit:

... Bender also had a startlingly accurate sense of how many truly intellectually outstanding students were available in the national pool. He doubted whether more than 100-200 candidates of truly exceptional promise would be available for each year's class. This number corresponds to (roughly) +4 SD in mental ability. Long after Bender resigned, Harvard still reserved only 10 percent of its places (roughly 150 spots) for "top brains".
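Bender's 100-200 figure can be sanity-checked with a normal-tail calculation. Assuming ability is normally distributed and a US birth cohort of roughly 3 to 4 million (an assumed round number for illustration, not a figure from the quote), the fraction above +4 SD is about 3.2 in 100,000, which gives on the order of 100 people per year.

```python
import math

def fraction_above(z: float) -> float:
    """Upper-tail probability of a standard normal, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

tail = fraction_above(4.0)                    # about 3.17e-5
for cohort in (3_000_000, 4_000_000):         # assumed US birth-cohort sizes
    print(f"cohort {cohort:,}: ~{tail * cohort:.0f} people above +4 SD")
```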

Saturday, April 13, 2013

Blogging professors

When I first started blogging in 2004, I thought it would be only a matter of a few years before a significant fraction -- say 10-30% -- of all professors would have their own blogs. Surely, I thought, many brilliant professors would have no shortage of (and no shortage of interest in expressing) fabulous ideas and opinions worthy of wider attention and discussion.

But I was wrong. My rough estimate is that, currently, typical research universities (with, say, 1000 or so professors!) have no more than a handful of active faculty bloggers (for some reasonable definition of active, which might include a minimum traffic or readership level).

However, it's not too late. With the continuing collapse of the economic model for traditional journalism, there is significant demand for expert opinion and new ideas. How should a university encourage faculty blogging?

Set up branded group blogs for faculty, using a common template, perhaps organized by themes: health science, engineering and technology, basic science, politics and economics, psychology and cognition, etc. These don't even need to be hosted by the university -- they could be on WordPress or Blogger.

Group blogs can regularly produce fresh content, even if each contributor posts infrequently.

Hire a grad student to do some light editing, manage comments, and occasionally stimulate the faculty if the rate of posting falls off. Make posting really easy for the professors -- allow them just to shoot off an email with the post content, and have the student clean it up and upload it to the site.

Advertise the blogs in alumni communications, campus news, and other university publications.

Will it work? Ultimately it depends on the faculty...

Tuesday, April 09, 2013

Meeting Watson

I was at the IBM campus in Austin yesterday for some meetings. No sign of the Singularity just yet ;-)

Here's a talk by Manoj Saxena, IBM General Manager of the Watson division.

Sunday, April 07, 2013

Myths, Sisyphus and g

As a punishment, Sisyphus was made to roll a huge boulder up a steep hill. Before he could reach the top, however, the massive stone would always roll back down, forcing him to begin again.

I recommend this well-written refutation of Cosma Shalizi's much loved (in certain quarters) g, a Statistical Myth, an attack on the general factor of intelligence. Over the years I have not encountered a single endorser of Shalizi's article who actually understands the relevant subject matter. His article is loved for its reassuring conclusions, not the strength of its arguments. I am sure many "thinkers" resisted Darwinism, the abandonment of geocentrism, and even the notion that the Earth is a sphere, for similar psychological reasons. Some pessimists (speaking, for example, of the quantum revolution in the early 20th century) remarked that science advances one funeral at a time, as the older generation passes away in favor of the next, more open-minded, one. In the case of g it appears we have regressed significantly under relentless attack; social science papers from 50 years ago often seem more clear-headed and precise than ones I read today. All battles must be fought and refought a decade or two later.

As I write here:

We can (crudely) measure cognitive ability using simple tests. (It is amazing to me that this is a controversial statement.) Randomly sampled eminent scientists have (very) high IQs, and given the observed stability of adult IQ the causality is clear ...

Optimistically, we are only a decade away from genomic prediction of g scores (see Eric, why so gloomy?). The existence of such a predictor may allow us to finally push the boulder to the top, and keep it there.

As I mention in talks on this subject, the fact that measures of different cognitive abilities are reliably positively correlated is highly nontrivial. Add to this the well-established validity and stability of g and you have a construct that must be taken seriously. See also IQ, Compression and Simple Models.
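A small illustration of why the positive manifold matters: if every off-diagonal entry of a correlation matrix is positive, the Perron-Frobenius theorem guarantees that its leading eigenvector has entries all of one sign, so there is a single axis on which every test loads in the same direction. The sketch below is a toy under stated assumptions, not real psychometric data: it builds a correlation matrix from four made-up loadings and extracts the leading principal component by power iteration, a simpler stand-in for the factor-analytic methods psychometricians actually use.

```python
import math

def one_factor_correlations(loadings):
    """Correlation matrix implied by one common factor: r_ij = l_i * l_j off the diagonal."""
    n = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def leading_component(m, iterations=500):
    """Power iteration: returns the largest eigenvalue and its unit-length eigenvector."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    return eigenvalue, v

# Hypothetical loadings for four tests; not estimated from any real battery.
R = one_factor_correlations([0.8, 0.7, 0.6, 0.5])
eigenvalue, loadings = leading_component(R)
print(eigenvalue / len(R))  # share of total variance on the first component
```

With these assumed loadings every test loads positively on the leading component, which accounts for more than half of the total variance. The theorem only guarantees the sign pattern; how much variance the general dimension captures is an empirical question about real test batteries.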

Is Psychometric g a Myth?

...

V. Conclusions

Shalizi’s first error is his assertion that cognitive tests correlate with each other because IQ test makers exclude tests that do not fit the positive manifold. In fact, more or less the opposite is true. Some of the greatest psychometricians have devoted their careers to disproving the positive manifold only to end up with nothing to show for it. Cognitive tests correlate because all of them truly share one or more sources of variance. This is a fact that any theory of intelligence must grapple with.

Shalizi’s second error is to disregard the large body of evidence that has been presented in support of g as a unidimensional scale of human psychological differences. The g factor is not just about the positive manifold. A broad network of findings related to both social and biological variables indicates that people do in fact vary, both phenotypically and genetically, along this continuum that can be revealed by psychometric tests of intelligence and that has widespread significance in human affairs.

Shalizi’s third error is to think that were it shown that g is not a unitary variable neurobiologically, it would refute the concept of g. However, for most purposes, brain physiology is not the most relevant level of analysis of human intelligence. What matters is that g is a remarkably powerful and robust variable that has great explanatory force in understanding human behavior. Thus g exists at the behavioral level regardless of what its neurobiological underpinnings are like.

In many ways, criticisms of g like Shalizi’s amount to “sure, it works in practice, but I don’t think it works in theory”. Shalizi faults g for being a “black box theory” that does not provide a mechanistic explanation of the workings of intelligence, disparaging psychometric measurement of intelligence as a mere “stop-gap” rather than a genuine scientific breakthrough. However, the fact that psychometricians have traditionally been primarily interested in validity and reliability is a feature, not a bug. Intelligence testing, unlike most fields of psychology and social science, is highly practical, being widely applied to diagnose learning problems and medical conditions and to select students and employees. What is important is that IQ tests reliably measure an important human characteristic, not the particular underlying neurobiological mechanisms. Nevertheless, research on general mental ability extends naturally into the life sciences, and continuous progress is being made in understanding g in terms of neurobiology (e.g., Lee et al. 2012, Penke et al. 2012, Kievit et al. 2012) and molecular genetics (e.g., Plomin et al., in press, Benyamin et al., in press).

Saturday, April 06, 2013

Genetic prediction: autism

Some time ago I posted on a striking claim of genetic prediction for autism risk that appeared in Nature Molecular Psychiatry:

Predicting the diagnosis of autism spectrum disorder using gene pathway analysis (Nature Molecular Psychiatry)

Abstract
Autism spectrum disorder (ASD) depends on a clinical interview with no biomarkers to aid diagnosis. The current investigation interrogated single-nucleotide polymorphisms (SNPs) of individuals with ASD from the Autism Genetic Resource Exchange (AGRE) database. SNPs were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG)-derived pathways to identify affected cellular processes and develop a diagnostic test. This test was then applied to two independent samples from the Simons Foundation Autism Research Initiative (SFARI) and Wellcome Trust 1958 normal birth cohort (WTBC) for validation. Using AGRE SNP data from a Central European (CEU) cohort, we created a genetic diagnostic classifier consisting of 237 SNPs in 146 genes that correctly predicted ASD diagnosis in 85.6% of CEU cases. This classifier also predicted 84.3% of cases in an ethnically related Tuscan cohort; however, prediction was less accurate (56.4%) in a genetically dissimilar Han Chinese cohort (HAN). Eight SNPs in three genes (KCNMB4, GNAO1, GRM5) had the largest effect in the classifier with some acting as vulnerability SNPs, whereas others were protective. Prediction accuracy diminished as the number of SNPs analyzed in the model was decreased. Our diagnostic classifier correctly predicted ASD diagnosis with an accuracy of 71.7% in CEU individuals from the SFARI (ASD) and WTBC (controls) validation data sets. In conclusion, we have developed an accurate diagnostic test for a genetically homogeneous group to aid in early detection of ASD. While SNPs differ across ethnic groups, our pathway approach identified cellular processes common to ASD across ethnicities. Our results have wide implications for detection, intervention and prevention of ASD.

From my comments:

The approach taken here first selects 775 SNPs of interest based on pathway information (not considered in standard GWAS) and then only requires 5E-03 significance. A linear predictor is formed from the 237 SNPs that pass this threshold. The ultimate test is, of course, whether the predictor actually works on (independent) validation samples. Once you have a statistically valid predictor, it doesn't matter how you arrived at it.
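A rough sketch of the last steps described above: keep the pre-selected SNPs that pass the 5E-03 threshold, combine them into a linear score, and judge the result on a held-out sample. The SNP identifiers, weights, p-values and cutoff below are hypothetical, and the paper's pathway mapping and model fitting are not reproduced.

```python
from dataclasses import dataclass

@dataclass
class SnpEffect:
    snp_id: str     # hypothetical SNP identifier
    beta: float     # per-allele weight estimated in the training sample
    p_value: float  # association p-value in the training sample

def select_snps(effects, threshold=5e-3):
    """Keep the pre-selected SNPs whose association p-value is below the threshold."""
    return [e for e in effects if e.p_value < threshold]

def linear_score(selected, dosages):
    """Weighted sum of allele counts (0, 1 or 2) over the selected SNPs."""
    return sum(e.beta * dosages[e.snp_id] for e in selected)

def accuracy(scores, labels, cutoff):
    """Fraction classified correctly when score > cutoff is called a case (label 1)."""
    correct = sum((s > cutoff) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)

# Hypothetical toy inputs, for illustration only.
effects = [
    SnpEffect("snp_a", 0.4, 1e-4),
    SnpEffect("snp_b", -0.3, 2e-3),  # protective direction
    SnpEffect("snp_c", 0.9, 0.2),    # fails the threshold and is dropped
]
selected = select_snps(effects)
person = {"snp_a": 2, "snp_b": 1, "snp_c": 2}
print(linear_score(selected, person))  # about 0.5
```

The important design point is that `accuracy` should be computed on individuals who played no role in choosing the SNPs or their weights; accuracy on the training cohort says little about whether the predictor is valid.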

This recent letter to the editor of Molecular Psychiatry claims that the predictive power came from the fact that the case and control groups had slightly different ancestral origins (via Neuroskeptic):

... cases have more diverse ancestral origins within Europe than controls. The putative risk alleles are more common in the Northeastern than in the Northwestern Europe, whereas the putative protective alleles reflect the opposite trend.

But I don't understand how this explains the moderate success of the classifier in the Chinese cohort -- I'll have to look more carefully. Here are the title and abstract of the new paper.

Population structure confounds autism genetic classifier

T G Belgard, I Jankovic, J K Lowe and D H Geschwind

Abstract
A classifier was recently reported to predict with 70% accuracy if an individual has an autism spectrum disorder using 237 single-nucleotide polymorphisms (SNPs).[1] Biomarkers, genetic or otherwise, that would facilitate earlier autism spectrum disorder diagnosis are crucial; therefore, these results warrant careful scrutiny. One potential confounder of such genetic studies is bias when cases and controls have different ancestral origins.
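The confound the letter describes can be seen with simple arithmetic. Suppose a SNP has no effect on autism at all, but its allele frequency differs between two European regions, and cases are drawn partly from the region where it is more common while controls are not. The frequencies and ancestry mixes below are hypothetical, chosen only to illustrate the mechanism.

```python
def mixture_freq(freq_by_pop, mix):
    """Allele frequency in a sample drawn from populations in the given proportions."""
    return sum(weight * freq_by_pop[pop] for pop, weight in mix.items())

def allelic_odds_ratio(case_freq, control_freq):
    """Odds of the allele among case chromosomes divided by the odds among controls."""
    return (case_freq / (1 - case_freq)) / (control_freq / (1 - control_freq))

# Hypothetical numbers: by construction the SNP has no effect on ASD.
freq_by_pop = {"northwest": 0.3, "northeast": 0.5}
cases = mixture_freq(freq_by_pop, {"northwest": 0.5, "northeast": 0.5})
controls = mixture_freq(freq_by_pop, {"northwest": 1.0})
print(cases, controls, allelic_odds_ratio(cases, controls))
```

With these assumed numbers the allele shows an apparent odds ratio of about 1.56 even though it does nothing. A classifier built from many such SNPs can separate cases from controls by ancestry alone.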

Friday, April 05, 2013

Faculty research productivity distribution

Recently I looked at some national-level data for university researchers in physics, chemistry, EE, molecular biology and zoology.

The data confirm my "moneyball" suspicion: the research funding and citations of the top 20% of faculty typically exceed the bottom 60-80% combined!

In other words, one excellent researcher is worth several mediocre ones. The average annual funding for top-quintile researchers in the fields listed above is in the neighborhood of $1 million, except in zoology, where it is roughly half that. Citation numbers vary widely but, again, the top-quintile researchers typically generate as many cites as the bottom 60-80% combined.
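A sketch of how such concentration can be checked. The `share` helper computes what fraction of the total is held by the top or bottom part of a list. These are not the national data mentioned above; as an assumption, funding is modeled as a Pareto distribution with shape 1.5, built from evenly spaced quantiles so the result is deterministic.

```python
def share(values, frac, top):
    """Fraction of the total held by the top (or bottom) `frac` of the values."""
    ordered = sorted(values, reverse=top)
    k = round(frac * len(ordered))
    return sum(ordered[:k]) / sum(ordered)

def pareto_quantiles(n, alpha):
    """n evenly spaced quantiles of a Pareto distribution with minimum 1 and shape alpha."""
    return [(1 - (i + 0.5) / n) ** (-1 / alpha) for i in range(n)]

funding = pareto_quantiles(1000, alpha=1.5)  # hypothetical, arbitrary units
top_20 = share(funding, 0.2, top=True)
bottom_80 = share(funding, 0.8, top=False)
print(f"top 20%: {top_20:.2f}, bottom 80%: {bottom_80:.2f}")
```

Under this assumed distribution the top quintile holds more than the bottom 80% combined, the same pattern the post describes; a lighter tail (larger shape parameter) would weaken it.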

Moneyball in academia

Let's suppose you're trying to hire a star STEM researcher. For our purposes, define "star" as someone who is roughly top 10% in his or her department at a good research university. Although assistant professors are hired through a very competitive process, the success rate for hiring stars in good (but not the very top-ranked) departments is, by the definition above, only about 10%.

Let's suppose you wait a while to do your hiring. Look only at researchers who have already been professors for 5-10 years (i.e., at other schools), and have a significant track record of grants, papers, citations, etc. It seems plausible that at this stage of career (late assistant and early associate professors) one can pick out top 10% candidates with reasonably high accuracy.
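The argument above turns on how well an observable record predicts eventual performance. A toy model, assuming ability and the visible record are jointly normal with correlation r (the r values are illustrative, not estimates), shows how the hit rate when picking the top 10% on the record rises with r. With r = 0 it falls back to the roughly 10% base rate mentioned above; the post's premise is that a 5-10 year track record corresponds to a larger r than a new PhD's file.

```python
import random

def star_hit_rate(signal_corr, n=100_000, top=0.10, seed=0):
    """Fraction of candidates picked in the top `top` on the signal who are truly top `top`."""
    rng = random.Random(seed)
    noise_sd = (1 - signal_corr ** 2) ** 0.5
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Observable record = correlated signal plus independent noise (unit variance overall).
    signal = [signal_corr * a + noise_sd * rng.gauss(0.0, 1.0) for a in ability]
    k = round(top * n)
    ability_cut = sorted(ability, reverse=True)[k - 1]  # k-th highest true ability
    picked = sorted(range(n), key=lambda i: signal[i], reverse=True)[:k]
    hits = sum(1 for i in picked if ability[i] >= ability_cut)
    return hits / k

for r in (0.0, 0.3, 0.8):
    print(r, star_hit_rate(r))
```

Comparing the printed hit rates for small and large r shows how much a more informative record improves the odds of landing a star under this model.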

Suppose that, on average, researchers in the top 10% bring in $400k more per year than the average professor (e.g., one additional NIH grant). This generates about $200k per year in additional overhead return to the university, which is much greater than the salary bump required to bid such a person away from their home university. If the difference in startup cost between hiring a new assistant professor and someone with 5-10 years of experience is, say, $500k, then it would take only a few years to recoup this cost. The numbers have to be adjusted for different fields (in physics the overhead differential might be smaller, perhaps $100k per year), but the expected return still seems attractive if you can keep the researcher for at least 5 or possibly 10 years.
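The payback arithmetic can be written out directly. The post does not give the salary bump, so the $25k figure below is a hypothetical placeholder; the overhead and startup numbers are the ones quoted above.

```python
import math

def payback_years(extra_overhead, salary_bump, extra_startup):
    """Years for net extra overhead (overhead gain minus salary bump) to repay extra startup."""
    net_per_year = extra_overhead - salary_bump
    if net_per_year <= 0:
        return math.inf  # the hire never pays back
    return extra_startup / net_per_year

SALARY_BUMP = 25_000  # hypothetical; not stated in the post
print(payback_years(200_000, SALARY_BUMP, 500_000))  # roughly 2.9 years
print(payback_years(100_000, SALARY_BUMP, 500_000))  # physics-like case, roughly 6.7 years
```

Under these assumptions the $200k case pays back within about three years, while the physics case uses up most of the 5 to 10 year horizon the post mentions.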

Cancer Genomics

Special issue of Science. If genomic methods (e.g., genotyping tumor cells to identify the most promising treatment) fulfill their promise in cancer therapy, the number of human genomes available for other research will skyrocket.

INTRODUCTION—With the completion of the human genome in 2001, many researchers immediately set their sights on using this information to better understand the genetics and, more recently, epigenetic effects identified during the initiation, development, and progression of cancer. Moving from the pre–genome-era identification of single gene variants associated with hereditary cancers, advances in sequencing technology have enabled the use of a whole-genome approach to examine the differences between the genomes, and epigenetic regulation, of tumor and patient DNA. This issue of Science examines how these advances are shaping our current understanding of cancer at the genomic level.

Thursday, April 04, 2013

All That Is

So far I'm enjoying it very much. Amazing that he is 87 years old. Amazon reviews here.

“Forgive him anything, he writes like an angel.”

Salter: To write? Because all this is going to vanish. The only thing left will be the prose and poems, the books, what is written down. Man was very fortunate to have invented the book. Without it the past would completely vanish, and we would be left with nothing, we would be naked on earth.

"There comes a time," James Salter writes in the epigraph for his new novel, All That Is, "when you realize that everything is a dream, and only those things preserved in writing have any possibility of being real." (NPR)

Monday, April 01, 2013

Training days: Nathan Adrian

I've been teaching my kids how to swim, and started showing them some technique and training videos from YouTube. Eventually I came across these Nathan Adrian videos. Yikes -- 6,000 to 8,000 calories a day!

Go Bears! :-)
