Information Processing: 08/2014

Sunday, August 31, 2014

Metabolic costs of human brain development

This paper quantifies the unusually high energetic cost of brain development in humans. Brain energy requirements and body-weight growth rate are anti-correlated in childhood. Given these results it would be surprising if nutritional limitations that prevented individuals from achieving their genetic potential in height didn't also lead to sub-optimal cognitive development. Nutritional deprivation likely stunts both mind and body.

See also Brainpower ain't free.

Metabolic costs and evolutionary implications of human brain development
(PNAS doi:10.1073/pnas.1323099111)

Significance
The metabolic costs of brain development are thought to explain the evolution of humans’ exceptionally slow and protracted childhood growth; however, the costs of the human brain during development are unknown. We used existing PET and MRI data to calculate brain glucose use from birth to adulthood. We find that the brain’s metabolic requirements peak in childhood, when it uses glucose at a rate equivalent to 66% of the body’s resting metabolism and 43% of the body’s daily energy requirement, and that brain glucose demand relates inversely to body growth from infancy to puberty. Our findings support the hypothesis that the unusually high costs of human brain development require a compensatory slowing of childhood body growth.

Abstract
The high energetic costs of human brain development have been hypothesized to explain distinctive human traits, including exceptionally slow and protracted preadult growth. Although widely assumed to constrain life-history evolution, the metabolic requirements of the growing human brain are unknown. We combined previously collected PET and MRI data to calculate the human brain’s glucose use from birth to adulthood, which we compare with body growth rate. We evaluate the strength of brain–body metabolic trade-offs using the ratios of brain glucose uptake to the body’s resting metabolic rate (RMR) and daily energy requirements (DER) expressed in glucose-gram equivalents (glucose_rmr% and glucose_der%). We find that glucose_rmr% and glucose_der% do not peak at birth (52.5% and 59.8% of RMR, or 35.4% and 38.7% of DER, for males and females, respectively), when relative brain size is largest, but rather in childhood (66.3% and 65.0% of RMR and 43.3% and 43.8% of DER). Body-weight growth (dw/dt) and both glucose_rmr% and glucose_der% are strongly, inversely related: soon after birth, increases in brain glucose demand are accompanied by proportionate decreases in dw/dt. Ages of peak brain glucose demand and lowest dw/dt co-occur and subsequent developmental declines in brain metabolism are matched by proportionate increases in dw/dt until puberty. The finding that human brain glucose demands peak during childhood, and evidence that brain metabolism and body growth rate covary inversely across development, support the hypothesis that the high costs of human brain development require compensatory slowing of body growth rate.

Thursday, August 28, 2014

Determination of Nonlinear Genetic Architecture using Compressed Sensing

It is a common belief in genomics that nonlinear interactions (epistasis) in complex traits make the task of reconstructing genetic models extremely difficult, if not impossible. In fact, it is often suggested that overcoming nonlinearity will require much larger data sets and significantly more computing power. Our results show that in broad classes of plausibly realistic models, this is not the case.

Determination of Nonlinear Genetic Architecture using Compressed Sensing (arXiv:1408.6583)
Chiu Man Ho, Stephen D.H. Hsu
Subjects: Genomics (q-bio.GN); Applications (stat.AP)

We introduce a statistical method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. The computational and data resource requirements are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exist. It seems plausible that most genetic architectures fall in this category. Our method uses a generalization of compressed sensing (L1-penalized regression) applied to nonlinear functions of the sensing matrix. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using both real and simulated human genomes.
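To make the idea in the abstract concrete, here is a minimal sketch of L1-penalized regression applied to nonlinear functions of the genotype matrix. It is an illustration only, not the paper's code: the pairwise-product feature map, the coordinate-descent solver, the function names, the genotype coding (-1, 0, 1), and the toy parameters in `demo` are all assumptions made for this example.

```python
import random

def expand_with_interactions(row):
    """Return the linear terms followed by all pairwise products x_i * x_j (i < j)."""
    p = len(row)
    pairs = [row[i] * row[j] for i in range(p) for j in range(i + 1, p)]
    return list(row) + pairs

def soft_threshold(z, t):
    """Shrink z toward zero by t; this is what produces exact zeros (sparsity)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    resid = list(y)  # residual y - X beta, kept up to date after every update
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def demo(n=200, p=6, lam=0.05, seed=0):
    """Toy data (hypothetical): two linear effects plus one epistatic term g1*g2."""
    rng = random.Random(seed)
    G = [[rng.choice((-1, 0, 1)) for _ in range(p)] for _ in range(n)]
    X = [expand_with_interactions(g) for g in G]
    y = [1.0 * g[0] - 0.8 * g[3] + 1.5 * g[1] * g[2] for g in G]
    beta = lasso_coordinate_descent(X, y, lam)
    # With p = 6, columns 0-5 are linear terms and column 11 is the g1*g2 product.
    return [(j, round(b, 3)) for j, b in enumerate(beta) if abs(b) > 1e-6]

if __name__ == "__main__":
    print(demo())
```

The design choice worth noting is that the solver never knows which columns are "nonlinear": once the products are appended as extra columns, the problem is an ordinary sparse linear regression in a larger feature space, which is why sparsity in the number of interactions is the key assumption.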


Cosmopolitans -- Whit Stillman returns on Amazon

The pilot isn't bad -- American expats in Paris :-) The cinematography is beautiful, but then it's hard to go wrong in Paris.

More Whit Stillman.

Rabbit genome: domestication via soft sweeps

Domestication -- genetic change in response to a drastic change in environment -- happened via allele frequency changes at many loci. I expect a similar pattern in humans due to, e.g., agriculture.

I don't know why some researchers find this result surprising -- it seemed quite likely to me that "adaptation to domestication" is a complex trait controlled by many loci. Hence a shift in the phenotype is likely to be accomplished through allele frequency changes at many loci.

Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication (Science DOI: 10.1126/science.1253714)

The genetic changes underlying the initial steps of animal domestication are still poorly understood. We generated a high-quality reference genome for the rabbit and compared it to resequencing data from populations of wild and domestic rabbits. We identified more than 100 selective sweeps specific to domestic rabbits but only a relatively small number of fixed (or nearly fixed) single-nucleotide polymorphisms (SNPs) for derived alleles. SNPs with marked allele frequency differences between wild and domestic rabbits were enriched for conserved noncoding sites. Enrichment analyses suggest that genes affecting brain and neuronal development have often been targeted during domestication. We propose that because of a truly complex genetic background, tame behavior in rabbits and other domestic animals evolved by shifts in allele frequencies at many loci, rather than by critical changes at only a few domestication loci.

From the paper:

... directional selection events associated with rabbit domestication are consistent with polygenic and soft sweep modes of selection (18) that primarily acted on standing genetic variation in regulatory regions of the genome. This stands in contrast with breed-specific traits in many domesticated animals that often show a simple genetic basis with complete fixation of causative alleles (19). Our finding that many genes affecting brain and neuronal development have been targeted during rabbit domestication is fully consistent with the view that the most critical phenotypic changes during the initial steps of animal domestication probably involved behavioral traits that allowed animals to tolerate humans and the environment humans offered. On the basis of these observations, we propose that the reason for the paucity of specific fixed domestication genes in animals is that no single genetic change is either necessary or sufficient for domestication. Because of the complex genetic background for tame behavior, we propose that domestic animals evolved by means of many mutations of small effect, rather than by critical changes at only a few domestication loci.

I'll repeat that simply changing a few hundred allele frequencies in humans could make us much, much smarter ...

Wednesday, August 27, 2014

Neural Networks and Deep Learning 2

Inspired by the topics discussed in this earlier post, I've been reading Michael Nielsen's online book on neural nets and deep learning. I particularly liked the subsection quoted below. For people who think deep learning is anything close to a solved problem, or who anticipate a near-term, quick take-off to the Singularity, I suggest they read the passage below and grok it deeply.

Neural Networks and Deep Learning (Chapter 3):

You have to realize that our theoretical tools are very weak. Sometimes, we have good mathematical intuitions for why a particular technique should work. Sometimes our intuition ends up being wrong [...] The questions become: how well does my method work on this particular problem, and how large is the set of problems on which it works well. -- Question and answer with neural networks researcher Yann LeCun

Once, attending a conference on the foundations of quantum mechanics, I noticed what seemed to me a most curious verbal habit: when talks finished, questions from the audience often began with "I'm very sympathetic to your point of view, but [...]". Quantum foundations was not my usual field, and I noticed this style of questioning because at other scientific conferences I'd rarely or never heard a questioner express their sympathy for the point of view of the speaker. At the time, I thought the prevalence of the question suggested that little genuine progress was being made in quantum foundations, and people were merely spinning their wheels. Later, I realized that assessment was too harsh. The speakers were wrestling with some of the hardest problems human minds have ever confronted. Of course progress was slow! But there was still value in hearing updates on how people were thinking, even if they didn't always have unarguable new progress to report.

You may have noticed a verbal tic similar to "I'm very sympathetic [...]" in the current book. To explain what we're seeing I've often fallen back on saying "Heuristically, [...]", or "Roughly speaking, [...]", following up with a story to explain some phenomenon or other. These stories are plausible, but the empirical evidence I've presented has often been pretty thin. If you look through the research literature you'll see that stories in a similar style appear in many research papers on neural nets, often with thin supporting evidence. What should we think about such stories?

In many parts of science - especially those parts that deal with simple phenomena - it's possible to obtain very solid, very reliable evidence for quite general hypotheses. But in neural networks there are large numbers of parameters and hyper-parameters, and extremely complex interactions between them. In such extraordinarily complex systems it's exceedingly difficult to establish reliable general statements. Understanding neural networks in their full generality is a problem that, like quantum foundations, tests the limits of the human mind. Instead, we often make do with evidence for or against a few specific instances of a general statement. As a result those statements sometimes later need to be modified or abandoned, when new evidence comes to light.

[ Sufficiently advanced AI will come to resemble biology, even psychology, in its complexity and resistance to rigorous generalization ... ]

One way of viewing this situation is that any heuristic story about neural networks carries with it an implied challenge. For example, consider the statement I quoted earlier, explaining why dropout works (from ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012): "This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is, therefore, forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons." This is a rich, provocative statement, and one could build a fruitful research program entirely around unpacking the statement, figuring out what in it is true, what is false, what needs variation and refinement. Indeed, there is now a small industry of researchers who are investigating dropout (and many variations), trying to understand how it works, and what its limits are. And so it goes with many of the heuristics we've discussed. Each heuristic is not just a (potential) explanation, it's also a challenge to investigate and understand in more detail.

Of course, there is not time for any single person to investigate all these heuristic explanations in depth. It's going to take decades (or longer) for the community of neural networks researchers to develop a really powerful, evidence-based theory of how neural networks learn. Does this mean you should reject heuristic explanations as unrigorous, and not sufficiently evidence-based? No! In fact, we need such heuristics to inspire and guide our thinking. It's like the great age of exploration: the early explorers sometimes explored (and made new discoveries) on the basis of beliefs which were wrong in important ways. Later, those mistakes were corrected as we filled in our knowledge of geography. When you understand something poorly - as the explorers understood geography, and as we understand neural nets today - it's more important to explore boldly than it is to be rigorously correct in every step of your thinking. And so you should view these stories as a useful guide to how to think about neural nets, while retaining a healthy awareness of the limitations of such stories, and carefully keeping track of just how strong the evidence is for any given line of reasoning. Put another way, we need good stories to help motivate and inspire us, and rigorous in-depth investigation in order to uncover the real facts of the matter.
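For readers who have not seen it, the dropout heuristic discussed in the passage above is mechanically very simple, which is part of why the "why does it work" question is interesting. Below is a minimal sketch of the commonly used "inverted dropout" variant (survivors are rescaled during training so nothing changes at test time). This is an assumption chosen for illustration; it is not necessarily the exact formulation used in the quoted paper, and the function name and signature are hypothetical.

```python
import random

def dropout(activations, p_drop, rng, training=True):
    """Randomly zero each activation with probability p_drop (inverted dropout).

    Surviving activations are divided by (1 - p_drop) so the expected value of
    each output matches its input. At test time the input is returned unchanged.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [0.2, 1.5, 0.7, 0.9]
    print(dropout(hidden, 0.5, rng))                   # a different random mask per call
    print(dropout(hidden, 0.5, rng, training=False))   # unchanged at test time
```

Because a fresh mask is drawn on every call, no unit can count on any particular other unit being present, which is the "co-adaptation" argument in the quoted sentence.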

See also here from an earlier post on this blog:

... evolution has [ encoded the results of a huge environment-dependent optimization ] in the structure of our brains (and genes), a process that AI would have to somehow replicate. A very crude estimate of the amount of computational power used by nature in this process leads to a pessimistic prognosis for AI even if one is willing to extrapolate Moore's Law well into the future. [ Moore's Law (Dennard scaling) may be toast for the next decade or so! ] Most naive analyses of AI and computational power only ask what is required to simulate a human brain, but do not ask what is required to evolve one. I would guess that our best hope is to cheat by using what nature has already given us -- emulating the human brain as much as possible.

If indeed there are good (deep) generalized learning architectures to be discovered, that will take time. Even with such a learning architecture at hand, training it will require interaction with a rich exterior world -- either the real world (via sensors and appendages capable of manipulation) or a computationally expensive virtual world. Either way, I feel confident in my bet that a strong version of the Turing test (allowing, e.g., me to communicate with the counterpart over weeks or months; to try to teach it things like physics and watch its progress; eventually for it to teach me) won't be passed until at least 2050 and probably well beyond.

Turing as polymath: ... In a similar way Turing found a home in Cambridge mathematical culture, yet did not belong entirely to it. The division between 'pure' and 'applied' mathematics was at Cambridge then as now very strong, but Turing ignored it, and he never showed mathematical parochialism. If anything, it was the attitude of a Russell that he acquired, assuming that mastery of so difficult a subject granted the right to invade others.

Friday, August 22, 2014

Two reflections on SCI FOO 2014

Two excellent blog posts on SCI FOO by Jacob VanderPlas (Astronomer and Data Scientist at the University of Washington) and Dominic Cummings (former director of strategy for the Conservative Party in the UK).

Hacking Academia: Data Science and the University (VanderPlas)

Almost a year ago, I wrote a post I called the Big Data Brain Drain, lamenting the ways that academia is neglecting the skills of modern data-intensive research, and in doing so is driving away many of the men and women who are perhaps best equipped to enable progress in these fields. This seemed to strike a chord with a wide range of people, and has led me to some incredible opportunities for conversation and collaboration on the subject. One of those conversations took place at the recent SciFoo conference, and this article is my way of recording some reflections on that conversation. ...

The problem we discussed is laid out in some detail in my Brain Drain post, but a quick summary is this: scientific research in many disciplines is becoming more and more dependent on the careful analysis of large datasets. This analysis requires a skill-set as broad as it is deep: scientists must be experts not only in their own domain, but in statistics, computing, algorithm building, and software design as well. Many researchers are working hard to attain these skills; the problem is that academia's reward structure is not well-poised to reward the value of this type of work. In short, time spent developing high-quality reusable software tools translates to less time writing and publishing, which under the current system translates to little hope for academic career advancement. ...

Few scientists know how to use the political system to effect change. We need help from people like Cummings.

AUGUST 19, 2014 BY DOMINIC CUMMINGS

... It was interesting that some very eminent scientists, all much cleverer than ~100% of those in politics [INSERT: better to say 'all with higher IQ than ~100% of those in politics'], have naive views about how politics works. In group discussions, there was little focused discussion about how they could influence politics better even though it is clearly a subject that they care about very much. (Gershenfeld said that scientists have recently launched a bid to take over various local government functions in Barcelona, which sounds interesting.)

... To get things changed in politics, scientists need mechanisms a) to agree priorities in order to focus their actions on b) roadmaps with specifics. Generalised whining never works. The way to influence politicians is to make it easy for them to fall down certain paths without much thought, and this means having a general set of goals but also a detailed roadmap the politicians can apply, otherwise they will drift by default to the daily fog of chaos and moonlight.

...

3. High status people have more confidence in asking basic / fundamental / possibly stupid questions. One can see people thinking ‘I thought that but didn’t say it in case people thought it was stupid and now the famous guy’s said it and everyone thinks he’s profound’. The famous guys don’t worry about looking stupid and they want to get down to fundamentals in fields outside their own.

4. I do not mean this critically but watching some of the participants I was reminded of Freeman Dyson’s comment:

‘I feel it myself, the glitter of nuclear weapons. It is irresistible if you come to them as a scientist. To feel it’s there in your hands. To release the energy that fuels the stars. To let it do your bidding. And to perform these miracles, to lift a million tons of rock into the sky, it is something that gives people an illusion of illimitable power, and it is in some ways responsible for all our troubles... this is what you might call ‘technical arrogance’ that overcomes people when they see what they can do with their minds.’

People talk about rationales for all sorts of things but looking in their eyes the fundamental driver seems to be – am I right, can I do it, do the patterns in my mind reflect something real? People like this are going to do new things if they can and they are cleverer than the regulators. As a community I think it is fair to say that outside odd fields like nuclear weapons research (which is odd because it still requires not only a large collection of highly skilled people but also a lot of money and all sorts of elements that are hard (but not impossible) for a non-state actor to acquire and use without detection), they believe that pushing the barriers of knowledge is right and inevitable. ...

Sunday, August 17, 2014

Genetic Architecture of Intelligence (arXiv:1408.3421)

This paper is based on talks I've given in the last few years. See here and here for video. Although there isn't much that hasn't already appeared in the talks or on this blog (other than some Compressed Sensing results for the nonlinear case) it's nice to have it in one place. The references are meant to be useful to people seriously interested in this subject, although I imagine they are nowhere near comprehensive. Apologies to anyone whose work I missed.

If you don't like the word "intelligence" just substitute "height" and everything will be OK. We live in strange times.

On the genetic architecture of intelligence and other quantitative traits (arXiv:1408.3421)
Categories: q-bio.GN
Comments: 30 pages, 13 figures

How do genes affect cognitive ability or other human quantitative traits such as height or disease risk? Progress on this challenging question is likely to be significant in the near future. I begin with a brief review of psychometric measurements of intelligence, introducing the idea of a "general factor" or g score. The main results concern the stability, validity (predictive power), and heritability of adult g. The largest component of genetic variance for both height and intelligence is additive (linear), leading to important simplifications in predictive modeling and statistical estimation. Due mainly to the rapidly decreasing cost of genotyping, it is possible that within the coming decade researchers will identify loci which account for a significant fraction of total g variation. In the case of height analogous efforts are well under way. I describe some unpublished results concerning the genetic architecture of height and cognitive ability, which suggest that roughly 10k moderately rare causal variants of mostly negative effect are responsible for normal population variation. Using results from Compressed Sensing (L1-penalized regression), I estimate the statistical power required to characterize both linear and nonlinear models for quantitative traits. The main unknown parameter s (sparsity) is the number of loci which account for the bulk of the genetic variation. The required sample size is of order 100 × s, or roughly a million in the case of cognitive ability.
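The last sentence of the abstract is easy to misread, so here is the arithmetic spelled out: the required sample size is about 100 times the sparsity s, and with s of roughly 10k causal variants that gives about a million individuals. The function name and the `factor` parameter are hypothetical; the constant 100 and the 10k figure are taken from the abstract, and the scaling is an order-of-magnitude rule, not a precise threshold.

```python
def required_sample_size(sparsity, factor=100):
    """Order-of-magnitude sample size n ~ factor * s for sparsity s."""
    return factor * sparsity

if __name__ == "__main__":
    s = 10_000  # roughly 10k causal variants, per the abstract
    print(required_sample_size(s))  # order of a million individuals
```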

Saturday, August 16, 2014

Neural Networks and Deep Learning

One of the SCI FOO sessions I enjoyed the most this year was a discussion of deep learning by AI researcher Juergen Schmidhuber. For an overview of recent progress, see this paper. Also of interest: Michael Nielsen's pedagogical book project.

An application which especially caught my attention is described by Schmidhuber here:

Many traditional methods of Evolutionary Computation [15-19] can evolve problem solvers with hundreds of parameters, but not millions. Ours can [1,2], by greatly reducing the search space through evolving compact, compressed descriptions [3-8] of huge solvers. For example, a Recurrent Neural Network [34-36] with over a million synapses or weights learned (without a teacher) to drive a simulated car based on a high-dimensional video-like visual input stream.

More details here. They trained a deep neural net to drive a car using visual input (pixels from the driver's perspective, generated by a video game); output consists of steering orientation and accelerator/brake activation. There was no hard-coded structure corresponding to physics -- the neural net optimized a utility function primarily defined by time between crashes. It learned how to drive the car around the track after fewer than 10k training sessions.

For some earlier discussion of deep neural nets and their application to language translation, see here. Schmidhuber has also worked on Solomonoff universal induction.

These TED videos give you some flavor of Schmidhuber's sense of humor :-) Apparently his younger brother (mentioned in the first video) has transitioned from theoretical physics to algorithmic finance. Schmidhuber on China.

Friday, August 15, 2014

Y Combinator: "fund for the pivot"

I'm catching up on podcasts a bit now that I'm back in Michigan. I had an iTunes problem and was waiting for the next version release while on the road.

EconTalk did a nice interview with Y Combinator President Sam Altman. Y Combinator has always been entrepreneur-centric, to the point that the quality of the founders is one of the main factors they consider (i.e., more important than startup idea or business plan). At around 19 minutes, Altman reveals that they often "fund for the pivot" -- meaning that sometimes they want to place a bet on the entrepreneur even if they think the original idea is doomed. Altman also reveals that Y Combinator never looks at business plans or revenue projections. I can't count the number of times an idiot MBA demanded a detailed revenue projection from one of my startups, at a stage where the numbers and projections were completely meaningless.

Another good observation is about the importance of communication skills in a founder. The leadership team is a central nexus that has to informationally equilibrate the rest of the company + investors + partners + board members + journalists + customers ... It helps tremendously to have someone who is articulate, succinct, and can "code switch" so as to speak the native language of an engineer or sales rep or VC.

@30 min or so:

Russ: ... one of the things that happens to me when I come out here in the summer--I live outside of Washington, D.C. and I come out every 6 or 7 weeks in the summer, and come to Stanford--I feel like I'm at the center of the universe. You know, Washington is--everyone in Washington, except for me--

Guest: Thinks they are--

Russ: Thinks they are in the center. And there are things they are in the center in. Obviously. But it's so placid there. And when I come to Stanford, the intellectual, the excitement about products and transforming concepts into reality, is palpable. And then I run into start-up people and venture capitalists. And they are so alive, compared to, say, a lobbyist in Washington, say, just to pick a random example. And there are certain things that just--again, it's almost palpable. You can almost feel them. So the thing is that I notice being here--which are already the next big thing, which at least they feel like they are. [ Visiting Washington DC gives me hives! ]

I recall a Foo Camp (the O'Reilly one, not SCI FOO at Google; perhaps 2007-2010 or so) session led by Paul Graham and some of the other Y Combinator founders/funders. At the time they weren't sure at all that their model would work. It was quite an honest discussion and I think even they must be surprised at how successful they've been since then.

Wednesday, August 13, 2014

Designer babies: selection vs editing

The discussion in this video is sophisticated enough to make the distinction between embryo selection -- the parents get a baby whose DNA originates from them, but the "best baby possible" -- and active genetic editing, which can give the child genes that neither parent had.

The movie GATTACA focuses on selection -- the director made a deliberate decision to eliminate reference to splicing or editing of genes. (Possibly because Ethan Hawke's character Vincent would have no chance competing against edited people.)

At SCI FOO, George Church seemed confident that editing would be an option in the near future. He is convinced that off-target mutations are not a problem for CRISPR. I have not yet seen this demonstrated in the literature, but of course George knows a lot more than what has been published. (Warning: I may have misunderstood his comments as there was a lot of background noise when we were talking.)

One interesting genetic variant (Lrp5?) that I learned about at the meeting, of obvious interest to future splicers and editors, apparently confers a +8 SD increase in bone strength!

My views on all of this:

... given sufficient phenotype|genotype data, genomic prediction of traits such as cognitive ability will be possible. If, for example, 0.6 or 0.7 of total population variance is captured by the predictor, the accuracy will be roughly plus or minus half a standard deviation (e.g., a few cm of height, or 8 IQ points). The required sample size to extract a model of this accuracy is probably on the order of a million individuals. As genotyping costs continue to decline, it seems likely that we will reach this threshold within five years for easily acquired phenotypes like height (self-reported height is reasonably accurate), and perhaps within the next decade for more difficult phenotypes such as cognitive ability. At the time of this writing SNP genotyping costs are below $50 USD per individual, meaning that a single super-wealthy benefactor could independently fund a crash program for less than $100 million.

Once predictive models are available, they can be used in reproductive applications, ranging from embryo selection (choosing which IVF zygote to implant) to active genetic editing (e.g., using powerful new CRISPR techniques). In the former case, parents choosing between 10 or so zygotes could improve their expected phenotype value by a population standard deviation. For typical parents, choosing the best out of 10 might mean the difference between a child who struggles in school, versus one who is able to complete a good college degree. Zygote genotyping from single cell extraction is already technically well developed [25], so the last remaining capability required for embryo selection is complex phenotype prediction. The cost of these procedures would be less than tuition at many private kindergartens, and of course the consequences will extend over a lifetime and beyond.

The corresponding ethical issues are complex and deserve serious attention in what may be a relatively short interval before these capabilities become a reality. Each society will decide for itself where to draw the line on human genetic engineering, but we can expect a diversity of perspectives. Almost certainly, some countries will allow genetic engineering, thereby opening the door for global elites who can afford to travel for access to reproductive technology. As with most technologies, the rich and powerful will be the first beneficiaries. Eventually, though, I believe many countries will not only legalize human genetic engineering, but even make it a (voluntary) part of their national healthcare systems [26]. The alternative would be inequality of a kind never before experienced in human history.
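The numbers in the excerpt above can be checked with a few lines of arithmetic. The sketch below is a back-of-envelope illustration, not the calculation from the paper. It assumes (1) a normally distributed trait, so the residual SD is sqrt(1 - R^2) when the predictor captures a fraction R^2 of variance; (2) an IQ-style scale with SD 15; and (3) that half of the predictor's variance lies within families, so sibling embryos differ in predicted value with SD sqrt(R^2 / 2). Assumption (3) in particular is mine, and all function names are hypothetical.

```python
import math

def residual_sd(variance_captured, trait_sd=1.0):
    """SD of the prediction error when the predictor captures this fraction of variance."""
    return trait_sd * math.sqrt(1.0 - variance_captured)

def expected_max_standard_normal(n, lo=-8.0, hi=8.0, steps=16000):
    """E[max of n iid N(0,1)] = integral of x * n * pdf(x) * cdf(x)^(n-1), midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        total += x * n * pdf * cdf ** (n - 1)
    return total * h

def selection_gain_sd(n_embryos, variance_captured):
    """Expected gain, in population SDs, from implanting the top-scoring embryo.

    ASSUMPTION (not from the text): half of the predictor's variance is
    within-family, so sibling scores have SD sqrt(variance_captured / 2).
    """
    sibling_sd = math.sqrt(variance_captured / 2.0)
    return expected_max_standard_normal(n_embryos) * sibling_sd

if __name__ == "__main__":
    for r2 in (0.6, 0.7):
        print(r2, residual_sd(r2, trait_sd=15.0), selection_gain_sd(10, r2))
    # Genotyping budget: a million individuals at $50 each.
    print("genotyping cost (USD):", 1_000_000 * 50)
```

Under these assumptions the residual SD comes out between roughly 0.55 and 0.63 of a trait SD (about 8 to 9.5 points on an SD-15 scale), the best-of-10 gain comes out somewhat below one population SD, and the genotyping bill is $50 million, all in the same ballpark as the figures quoted above.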

Here is the version of the GATTACA scene that was cut. The parents are offered the choice of edited or spliced genes conferring rare mathematical or musical ability.

Monday, August 11, 2014

SCI FOO 2014: photos

The day before SCI FOO I visited Complete Genomics, which is very close to the Googleplex.

Self-driving cars:

SCI FOO festivities:

I did an interview with O'Reilly. It should appear in podcast form at some point and I'll post a link.

Obligatory selfie:

Friday, August 08, 2014

Next Super Collider in China?

If you're in particle physics you may have heard rumors that the Chinese government is considering getting into the collider business. Since no one knows what will happen in our field post-LHC, this is a very interesting development. A loose international collaboration has been pushing a new linear collider for some time, perhaps to be built in Japan. But since (1) the results from LHC are thus far not as exciting as some had anticipated, and (2) colliders are very very expensive, the future is unclear.

While in China and Taiwan I was told that it was very likely that a next-generation collider project would make it into the coming five-year science plan. It was even said that the location for the new machine (combining both linear and hadronic components) would be in my maternal ancestral homeland of Shandong province. (Korean physicists will be happy about the proximity of the site :-)

Obviously for the Chinese government the symbolic value of taking the lead in high energy physics is very high -- perhaps on par with putting a man on the moon. In the case of a collider, we're talking about 20-year timescales, so this is a long-term project. Stay tuned!

On the importance of experiments, from Voting and Weighing:

There is an old saying in finance: in the short run, the market is a voting machine, but in the long run it's a weighing machine. ...

You might think science is a weighing machine, with experiments determining which theories survive and which ones perish. Healthy sciences certainly are weighing machines, and the imminence of weighing forces honesty in the voting. However, in particle physics the timescale over which voting is superseded by weighing has become decades -- the length of a person's entire scientific career. We will very likely (barring something amazing at the LHC, like the discovery of mini-black holes) have the first generation of string theorists retiring soon with absolutely no experimental tests of their *lifetime* of work. Nevertheless, some have been lavishly rewarded by the academic market for their contributions.

Thursday, August 07, 2014

@ SCI FOO 2014

Sorry for the lack of blog activity. I just returned from Asia and am in Palo Alto for SCI FOO 2014. Hopefully I'll post some cool photos from the event, which starts tomorrow evening. If you are there and read this blog then come over and say hello. If I had free t-shirts I'd give you one, but don't get your hopes up!

Earlier SCI FOO posts.

Sunday, August 03, 2014

It's all in the gene: cows

Some years ago a German driver took me from the Perimeter Institute to the Toronto airport. He was an immigrant to Canada and had a background in dairy farming. During the ride he told me all about driving German farmers to buy units of semen produced by highly prized Canadian bulls. The use of linear polygenic models in cattle breeding is already widespread, and the review article below gives some idea as to the accuracy.

See also Genomic Prediction: No Bull and Plenty of room at the top.

Invited Review: Reliability of genomic predictions for North American Holstein bulls

Journal of Dairy Science Volume 92, Issue 1, Pages 16–24, January 2009.
DOI: http://dx.doi.org/10.3168/jds.2008-1514

Genetic progress will increase when breeders examine genotypes in addition to pedigrees and phenotypes. Genotypes for 38,416 markers and August 2003 genetic evaluations for 3,576 Holstein bulls born before 1999 were used to predict January 2008 daughter deviations for 1,759 bulls born from 1999 through 2002. Genotypes were generated using the Illumina BovineSNP50 BeadChip and DNA from semen contributed by US and Canadian artificial-insemination organizations to the Cooperative Dairy DNA Repository. Genomic predictions for 5 yield traits, 5 fitness traits, 16 conformation traits, and net merit were computed using a linear model with an assumed normal distribution for marker effects and also using a nonlinear model with a heavier tailed prior distribution to account for major genes. The official parent average from 2003 and a 2003 parent average computed from only the subset of genotyped ancestors were combined with genomic predictions using a selection index. Combined predictions were more accurate than official parent averages for all 27 traits. The coefficients of determination (R²) were 0.05 to 0.38 greater with nonlinear genomic predictions included compared with those from parent average alone. Linear genomic predictions had R² values similar to those from nonlinear predictions but averaged just 0.01 lower. The greatest benefits of genomic prediction were for fat percentage because of a known gene with a large effect. The R² values were converted to realized reliabilities by dividing by mean reliability of 2008 daughter deviations and then adding the difference between published and observed reliabilities of 2003 parent averages. When averaged across all traits, combined genomic predictions had realized reliabilities that were 23% greater than reliabilities of parent averages (50 vs. 27%), and gains in information were equivalent to 11 additional daughter records. Reliability increased more by doubling the number of bulls genotyped than the number of markers genotyped. Genomic prediction improves reliability by tracing the inheritance of genes even with small effects.
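
For readers who want a feel for what a "linear model with an assumed normal distribution for marker effects" looks like in practice: with a normal prior on the effects, the estimate reduces to ridge regression. The following is a rough sketch on simulated toy data, not the authors' implementation (which also combines parent averages through a selection index). The names `fit_marker_effects`, `predict`, and `shrinkage`, and all the toy sizes, are assumptions made for illustration.

```python
import numpy as np

def fit_marker_effects(genotypes, phenotypes, shrinkage):
    """Ridge estimate of per-marker effects.

    genotypes:  (n_animals, n_markers) array of allele counts (0, 1, 2)
    phenotypes: (n_animals,) array of trait values
    shrinkage:  ridge penalty; under a normal prior on marker effects it
                plays the role of residual variance / marker-effect variance
    """
    col_means = genotypes.mean(axis=0)
    X = genotypes - col_means            # center each marker column
    y_mean = phenotypes.mean()
    y = phenotypes - y_mean
    n_markers = X.shape[1]
    # Solve (X'X + shrinkage * I) beta = X'y
    beta = np.linalg.solve(X.T @ X + shrinkage * np.eye(n_markers), X.T @ y)
    return beta, col_means, y_mean

def predict(genotypes, beta, col_means, y_mean):
    """Predicted trait value for each row of genotypes."""
    return y_mean + (genotypes - col_means) @ beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_train, n_test, n_markers = 400, 200, 100   # toy sizes, not the paper's
    true_beta = rng.normal(0.0, 0.1, n_markers)  # many small effects
    G = rng.integers(0, 3, size=(n_train + n_test, n_markers)).astype(float)
    y = G @ true_beta + rng.normal(0.0, 1.0, n_train + n_test)

    beta, means, y_mean = fit_marker_effects(G[:n_train], y[:n_train], shrinkage=50.0)
    pred = predict(G[n_train:], beta, means, y_mean)
    r2 = np.corrcoef(pred, y[n_train:])[0, 1] ** 2
    print(f"held-out R^2 on simulated data: {r2:.2f}")
```

The nonlinear model in the abstract replaces the normal prior with a heavier-tailed one, so that a few markers with large effects are shrunk less. That is not shown here.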

Results and Discussion: ... Marker effects for most other traits were evenly distributed across all chromosomes with only a few regions having larger effects, which may explain why the infinitesimal model and standard quantitative genetic theories have worked well. The distribution of marker effects indicates primarily polygenic rather than simple inheritance and suggests that the favorable alleles will not become homozygous quickly, and genetic variation will remain even after intense selection. Thus, dairy cattle breeders may expect genetic progress to continue for many generations.

... Most animal breeders will conclude that these gains in reliability are sufficient to make genotyping profitable before breeders invest in progeny testing or embryo transfer. Rates of genetic progress should increase substantially as breeders take advantage of these new tools for improving animals (Schaeffer, 2008). Further increases in number of genotyped bulls, revisions to the statistical methods, and additional edits should increase the precision of future genomic predictions.

Table 3

Trait	Parent average (published)	Parent average (observed)	Genomic prediction (expected)	Genomic prediction (linear)	Genomic prediction (nonlinear)	Gain from nonlinear genomic prediction compared with published parent average
Net merit	30	14	67	53	53	23
Milk yield	35	32	69	56	58	23
Fat yield	35	17	69	65	68	33
Protein yield	35	31	69	58	57	22
Fat percentage	35	29	69	69	78	43
Protein percentage	35	32	69	62	69	34
Productive life	27	28	55	42	45	18
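
The last column of Table 3 is just the nonlinear genomic reliability minus the published parent-average reliability, which is easy to verify. The snippet below is a small sketch using only the seven rows excerpted above. The names `table3` and `gain_over_parent_average` are made up for illustration.

```python
# (published parent average, nonlinear genomic prediction), both in %,
# copied from the seven Table 3 rows excerpted above.
table3 = {
    "Net merit": (30, 53),
    "Milk yield": (35, 58),
    "Fat yield": (35, 68),
    "Protein yield": (35, 57),
    "Fat percentage": (35, 78),
    "Protein percentage": (35, 69),
    "Productive life": (27, 45),
}

def gain_over_parent_average(table):
    """Gain in reliability (percentage points) from nonlinear genomic prediction."""
    return {
        trait: nonlinear - published
        for trait, (published, nonlinear) in table.items()
    }

if __name__ == "__main__":
    for trait, gain in gain_over_parent_average(table3).items():
        print(f"{trait}: +{gain}")
```

Note that the 23-point average gain quoted in the abstract is taken over all 27 traits, so it need not match the average of these seven rows.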

"Horses ain't like people, man. They can't make themselves better than they're born. See, with a horse, it's all in the gene. It's the fucking gene that does the running. The horse has got absolutely nothing to do with it." --- Paulie (Eric Roberts) in The Pope of Greenwich Village
