Information Processing: 07/2010

Saturday, July 31, 2010

SciFoo 2010 notes

There seem to be a lot of physicists here this year. A partial list of theorists: Adi Stern, Chetan Nayak, Frank Wilczek, David Gross, David Tong, Eva Silverstein, Lee Smolin, Erik Verlinde, Alan Guth, Max Tegmark, Paul Davies, Giovanni Amelino-Camelia. Do I count Ed Lu? He was an astronaut for a long time. Guth, who is a very level-headed guy, told me he's now 99 percent confident that inflation is correct, given the CMB results from the last decade. I think I convinced Chetan and maybe Adi that they are actually many worlders ("... if you do a decoherence calculation, and at the end don't insist on throwing away all the parts of the wavefunction except one of the decoherent parts, then you're a many worlder" ;-) Max claims to have a way to get the Born rule from many worlds, but I don't believe him :-) Guth is a many worlder.

I could easily spend all my time at the physics talks, but I think it's a better use of this kind of meeting to attend talks outside my specialty. At dinner I met a guy who does fMRI on psychopaths and the guy who built a wind-powered car that goes faster than the wind.

Larry Page addressing the campers.

The campers introducing themselves.

Goofing around with a super croc fossil.

Two Caltechers of my vintage: Tsutomu Shimomura and Ed Felten. We talked about the huge cognitive surplus in physics -- both of these guys were trained in physics before going on to other things.

Thursday, July 29, 2010

The Price of Altruism

I've been meaning to read this book since listening to an interview with the author. See also this review by Razib of the blog Gene Expression.

Amazon: ... The Price of Altruism tells for the first time the moving story of the eccentric American genius George Price (1922–1975), as he strives to answer evolution's greatest riddle. An original and penetrating picture of twentieth century thought, it is also a deeply personal journey. From the heights of the Manhattan Project to the inspired equation that explains altruism to the depths of homelessness and despair, Price's life embodies the paradoxes of Darwin’s enigma. His tragic suicide in a squatter’s flat, among the vagabonds to whom he gave all his possessions, provides the ultimate contemplation on the possibility of genuine benevolence.

The book works well as a biography, and as an intellectual / scientific history for the lay person, but anyone who understands some math will probably want a more precise discussion of what Price actually did. For that, I highly recommend this paper:

George Price’s Contributions to Evolutionary Genetics

J. theor. Biol. (1995) 175, 373–388

George Price studied evolutionary genetics for approximately seven years between 1967 and 1974. During that brief period Price made three lasting contributions to evolutionary theory; these were: (i) the Price Equation, a profound insight into the nature of selection and the basis for the modern theories of kin and group selection; (ii) the theory of games and animal behavior, based on the concept of the evolutionarily stable strategy; and (iii) the modern interpretation of Fisher’s fundamental theorem of natural selection, Fisher’s theorem being perhaps the most cited and least understood idea in the history of evolutionary genetics. This paper summarizes Price’s contributions and briefly outlines why, toward the end of his painful intellectual journey, he chose to focus his deep humanistic feelings and sharp, analytical mind on abstract problems in evolutionary theory.

Here is a wonderful Price quote from the paper (Price was not trained as a geneticist and came to it as an outsider):

When Shannon’s “Mathematical Theory of Communication” appeared in 1948, many scientists must have felt surprise to find that at so late a date there had still remained an opportunity to develop so fundamental a scientific area. Perhaps a similar opportunity exists today in respect to ‘selection theory’. ...

[*Sigh* where is all the low-hanging fruit today? ;-) ]

Harman's book was partially inspired by this article on Price which appeared originally in (the now defunct) Lingua Franca.

I'll go out on a limb and say that for a certain kind of person books like The Price of Altruism can be downright *painful* to read. The whole time I was reading the book I kept thinking: when is the author going to get to the point and give us a concise (compressed = mathematical!) and precise summary of Price's contributions? He finally does in the appendices, but they look like they were cribbed from papers like the one I've linked to above.

Neal Stephenson summarizes my point well in Cryptonomicon, writing about the math prodigy character Lawrence Pritchard Waterhouse (who, in the novel, spends time with Alan Turing at Princeton):

Cryptonomicon: ... The basic problem for Lawrence was that he was lazy. He had figured out that everything was much simpler if, like Superman with his x-ray vision, you just stared through the cosmetic distractions and saw the mathematical skeleton. Once you found the math in the thing, you knew everything about it, and you could manipulate it to your heart's content with nothing more than a pencil and a napkin ...

Of course, not everyone can do this, which brings us to a common theme on this blog: bounded cognition.

Wednesday, July 28, 2010

SciFoo 2010

See you there on Friday! This video is from last year.

BP oil spill Fermi estimates

This isn't meant to minimize the environmental horror of the BP oil spill, but I can't resist some rough estimates. (I did this quickly, so please correct my errors.)

100 days x 50k barrels/day x 150 liters/barrel = 750 million liters

Call it a billion liters of oil: 10^9 liters

Gulf of Mexico: over 2M cubic kilometers of water, or 2 x 10^18 liters

Suppose the spill is concentrated in 1 percent of the Gulf's area (a region 10% by 10% of the Gulf's linear dimensions - about 50 miles by 100 miles). This would presumably only be the case for a limited amount of time, and concentrations would fall off as the oil disperses further. Of course, if the oil is concentrated on a 2-dimensional surface slick, that would be quite bad for anything in the slick.

Then, assuming uniform dispersal within this sub-region, the oil concentration is about 1 part in ten million, or .1 ppm.
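The arithmetic above can be sketched in a few lines of Python. This is only a rough check under the same assumptions as the text (150 liters per barrel, a Gulf volume of 2 million cubic kilometers, oil mixed uniformly through 1 percent of it); the function and parameter names are mine. With the spill rounded up to 10^9 liters it gives about 0.05 ppm by volume, which is the same order of magnitude as the .1 ppm quoted above.

```python
# Fermi estimate of oil concentration in the Gulf, by volume.
# All inputs are the rough figures from the post, not measured values.

LITERS_PER_KM3 = 1e12  # 1 km^3 = 10^9 m^3 = 10^12 liters


def spill_liters(days=100, barrels_per_day=50_000, liters_per_barrel=150):
    """Total oil released, in liters."""
    return days * barrels_per_day * liters_per_barrel


def concentration_ppm(oil_liters, gulf_km3=2e6, fraction_of_gulf=0.01):
    """Parts per million (by volume) if the oil is uniformly mixed
    into the given fraction of the Gulf's water."""
    water_liters = gulf_km3 * LITERS_PER_KM3 * fraction_of_gulf
    return oil_liters / water_liters * 1e6


if __name__ == "__main__":
    raw = spill_liters()
    print(f"spill: {raw:.2e} liters")
    print(f"concentration (raw spill): {concentration_ppm(raw):.3f} ppm")
    print(f"concentration (10^9 liters): {concentration_ppm(1e9):.3f} ppm")
```

Changing `fraction_of_gulf` shows how sensitive the conclusion is to the dispersal assumption: confining the same oil to 0.1 percent of the Gulf raises the figure tenfold.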

Googling around (e.g., ppm oil toxic), I couldn't find evidence of toxicity at any concentrations lower than 1 ppm.

So, aside from shocks to otherwise already endangered species, it seems the long-run effects of the spill won't be that bad. Don't yell at me -- I'm an environmentalist! But numbers don't lie...

Sunday, July 25, 2010

Wikileaks' Afghanistan docs: Pentagon Papers 2.0

It's deja vu all over again... This time, I'm sure no one who is informed will be surprised about the real situation in Afghanistan (the people who would be surprised are probably not going to follow this Wikileaks story anyway). I don't know how it was in 1971, but it seems the Pentagon Papers had a big impact on public opinion.

NYTimes: A six-year archive of classified military documents made public on Sunday offers an unvarnished, ground-level picture of the war in Afghanistan that is in many respects more grim than the official portrayal.

The secret documents, released on the Internet by an organization called WikiLeaks, are a daily diary of an American-led force often starved for resources and attention as it struggled against an insurgency that grew larger, better coordinated and more deadly each year.

The New York Times, the British newspaper The Guardian and the German magazine Der Spiegel were given access to the voluminous records several weeks ago on the condition that they not report on the material before Sunday.

The documents — some 92,000 reports spanning parts of two administrations from January 2004 through December 2009 — illustrate in mosaic detail why, after the United States has spent almost $300 billion on the war in Afghanistan, the Taliban are stronger than at any time since 2001. ...

Julian Assange, meet Dan Ellsberg. See here for more on Ellsberg, including an interview in which he compares Afghanistan and Vietnam.

Thursday, July 22, 2010

Assortative mating, regression and all that: offspring IQ vs parental midpoint

In an earlier post I did a lousy job of trying to estimate the effect of assortative mating on the far tail of intelligence.

Thankfully, James Lee, a real expert in the field, sent me a current best estimate for the probability distribution of offspring IQ as a function of parental midpoint (the average of the parents' IQs). James is finishing his Ph.D. at Harvard under Steve Pinker -- you might have seen his review of R. Nisbett's book Intelligence and How to Get It: Why Schools and Cultures Count.

The results are stated further below. Once you plug in the numbers, you get (roughly) the following:

Assuming parental midpoint of n SD above the population average, the kids' IQ will be normally distributed about a mean which is around +.6n with residual SD of about 12 points. (The .6 could actually be anywhere in the range (.5, .7), but the SD doesn't vary much from choice of empirical inputs.)

So, e.g., for n = 4 (parental midpoint of 160 -- very smart parents!), the mean for the kids would be 136 with only a few percent chance of any kid to surpass 160 (requires +2 SD fluctuation). For n = 3 (parental midpoint of 145) the mean for the kids would be 127 and the probability of exceeding 145 less than 10 percent.

No wonder so many physicists' kids end up as doctors and lawyers. Regression indeed! ;-)

Below are some more details; see here for calculations. In my earlier post I arrived at the same formulae as below, but I had rho = 0.

Assuming bivariate normality (and it appears that IQ has been successfully scaled to produce this), the offspring density function is normal with mean n*h^2 and variance 1-(1/2)(1+rho)h^2, where rho is the correlation between mates attributable to assortative mating and h^2 is the narrow-sense heritability.

I put h^2 between .5 and .7. Bouchard and McGue found a median correlation between husband and wife of .33 in their review many years back, but not all of that may be attributable to assortative mating. So anything in (.20, .25) may be a reasonable guesstimate for rho.
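Plugging numbers into the formula above: a minimal sketch, taking the mean and variance expressions exactly as stated in the text and assuming an IQ scale with mean 100 and SD 15. The defaults h^2 = 0.6 and rho = 0.2 are just one choice from the quoted ranges, and the function names are mine.

```python
import math


def offspring_distribution(n, h2=0.6, rho=0.2, pop_mean=100.0, pop_sd=15.0):
    """Mean and SD (in IQ points) of offspring IQ for a parental
    midpoint n SD above the population average.

    Uses the expressions quoted in the post:
      mean     = n * h^2                  (in SD units)
      variance = 1 - (1/2)(1 + rho) h^2   (in SD^2 units)
    """
    mean = pop_mean + n * h2 * pop_sd
    variance = 1.0 - 0.5 * (1.0 + rho) * h2
    return mean, math.sqrt(variance) * pop_sd


def prob_above(threshold, mean, sd):
    """Upper-tail probability of a normal distribution."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for n in (3, 4):
        mean, sd = offspring_distribution(n)
        midpoint = 100 + 15 * n
        p = prob_above(midpoint, mean, sd)
        print(f"n={n}: mean {mean:.0f}, SD {sd:.1f}, "
              f"P(child > {midpoint}) = {p:.3f}")
```

With these inputs the sketch gives a mean of 136 and an SD of 12 for n = 4, and the chance of a child exceeding the parental midpoint comes out at roughly 2 percent for n = 4 and 7 percent for n = 3.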

In discussing this topic with smart and accomplished parents (e.g., at foo camp, in academic science, or on Wall Street), I've noticed very strong interest in the results ...

See related posts: mystery of non-shared environment, regression to the mean

Note: Some people are confused that the value of h^2 = narrow sense (additive) heritability is not higher than (.5 - .7). You may have seen *broad sense* heritability H^2 estimated at values as large as .8 or .9 (e.g., from twin studies). But H^2 includes genetic sources of variation such as dominance and epistasis (interactions between genes, which violate additivity). Because children are not clones of their parents (they only get half of their genes from each parent, and in a random fashion), the correlation between midparent IQ and offspring IQ is not as large as the correlation between the IQs of identical twins. See here and here for more.

Tuesday, July 20, 2010

CrossFit Games

I love this video. I don't do CrossFit, mostly because I'm a bit too old and creaky, although I do use HIIT and Tabata.

Check out the 2010 CrossFit Games -- see link to videos, of training as well as competition. The sport is kind of wacky -- kind of like the early days of triathlon, I guess. The competitors are wannabes in each of the core movements: weak Olympic lifters, clumsy gymnasts, slow sprinters, etc. But they have an all-around versatility.

I like the sisu (toughness) of this Finnish guy, Mikko Salo.

Recent human evolution

Nicholas Wade writes about recent human evolution in the NY Times.

NYTimes: ... Scientists from the Beijing Genomics Institute last month discovered another striking instance of human genetic change. Among Tibetans, they found, a set of genes evolved to cope with low oxygen levels as recently as 3,000 years ago. This, if confirmed, would be the most recent known instance of human evolution.

[I believe the common Tibetan variant is found in about 8% of Han Chinese, so it isn't necessarily that a new mutation swept the Tibetan population during the 3k years. Rather, modern Tibetans are likely to be descended from the people who were well adapted to living at high altitude. As in the case of height (below), extant variation can be acted on by selection faster than new mutations are likely to appear. Most people who don't believe in rapid human evolution don't understand that new mutations are not required. All that is required is a redistribution of allele frequencies. See here for more on the Tibetan results.]

Many have assumed that humans ceased to evolve in the distant past, perhaps when people first learned to protect themselves against cold, famine and other harsh agents of natural selection. But in the last few years, biologists peering into the human genome sequences now available from around the world have found increasing evidence of natural selection at work in the last few thousand years, leading many to assume that human evolution is still in progress.

“I don’t think there is any reason to suppose that the rate has slowed down or decreased,” says Mark Stoneking, a population geneticist at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany.

... Soft sweeps work on traits affected by many genes, like height. Suppose there are a hundred genes that affect height (about 50 are known already, and many more remain to be found). Each gene exists in a version that enhances height and a version that does not. The average person might inherit the height-enhancing version of 50 of these genes, say, and be of average height as a result.

Suppose this population migrates to a region, like the Upper Nile, where it is an advantage to be very tall. Natural selection need only make the height-enhancing versions of these 100 genes just a little more common in the population, and now the average person will be likely to inherit 55 of them, say, instead of 50, and be taller as a result. Since the height-enhancing versions of the genes already exist, natural selection can go to work right away and the population can adapt quickly to its new home.
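A quick check of the numbers in this passage, as a sketch under the article's simplified picture: 100 independent genes of equal effect, with each person carrying the height-enhancing version of a gene with probability equal to its population frequency. The binomial model and the function names are my assumptions. Moving every frequency from 0.50 to 0.55 shifts the expected count from 50 to 55, which is a full standard deviation of the original distribution.

```python
import math


def enhancing_count_stats(n_genes, freq):
    """Mean and SD of the number of height-enhancing versions a person
    carries, treating the genes as independent coin flips (binomial)."""
    mean = n_genes * freq
    sd = math.sqrt(n_genes * freq * (1.0 - freq))
    return mean, sd


def shift_in_sd(n_genes, freq_before, freq_after):
    """Change in the population mean, in units of the original SD."""
    mean_before, sd_before = enhancing_count_stats(n_genes, freq_before)
    mean_after, _ = enhancing_count_stats(n_genes, freq_after)
    return (mean_after - mean_before) / sd_before


if __name__ == "__main__":
    print(enhancing_count_stats(100, 0.50))
    print(enhancing_count_stats(100, 0.55))
    print(f"shift: {shift_in_sd(100, 0.50, 0.55):.2f} SD")
```

So a small, simultaneous nudge in many allele frequencies is enough to move the population average substantially, with no new mutations involved.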

In the case of height, no individual gene variant has yet been found which accounts for more than a fraction of a percent of total variance. But, it's possible that half or more of total variance is captured by the 300k SNPs used in recent GWAS studies.

It's very plausible that IQ is like height -- a large number of genes, each of small effect, control most of the variance. I've done some rough calculations, and it seems that the proposed BGI study (which will genotype both an ordinary and an extreme group) has a good chance of detecting a large number of those genes -- stay tuned.

See related post: Recent evolution in humans.

Monday, July 19, 2010

White holes and eternal black holes

New paper! I got interested in this topic because I realized that I didn't understand the quantum aspects of white holes -- i.e., what is the time-reversed equivalent of Hawking radiation (or lack thereof)?

arXiv:1007.2934

Title: White holes and eternal black holes
Authors: Stephen D.H. Hsu
Categories: gr-qc hep-ph hep-th

We investigate isolated white holes surrounded by vacuum, which correspond to the time reversal of eternal black holes that do not evaporate. We show that isolated white holes explode into quasi-thermal radiation.

Here is a figure from the paper:

Caption: A white hole spacetime. We impose the condition that past null infinity $\cal{J}_-^{\rm wh}$ is in the vacuum state -- there is no incoming radiation from the far past. The dotted black line is the initial singularity, and the dashed blue line is the path of a null ray on the anti-horizon. The curved line indicates matter which explodes out of the hole. The dashed black lines refer to modes discussed in the text, which we can think of as originating from the white hole, its ejecta, or from the past, $\cal{J}_-^{\rm wh}$.

Sunday, July 18, 2010

Social Darwinism: 21st century edition

This is a nice summary of economic historian Gregory Clark's views on recent human evolution. See related posts. I think one standard deviation of change in population averages is possible over 1000 years, given plausible values of heritability and correlation between reproductive success and quantitative trait values.
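To make the one-standard-deviation claim concrete, here is a rough sketch using the standard breeder's equation, R = h^2 S, with the selection differential written as S = r x CV (the trait-fitness correlation times the coefficient of variation of reproductive success, in trait SD units). The specific inputs (h^2 = 0.5, r = 0.1, CV = 0.5, 25-year generations) are illustrative assumptions of mine, not numbers from Clark, and the model ignores any change in variance or heritability over time.

```python
def response_per_generation(h2, r_trait_fitness, cv_fitness):
    """Breeder's equation R = h^2 * S, in trait SD units per generation.

    S = r * CV: correlation between trait and reproductive success,
    times the coefficient of variation of reproductive success.
    """
    selection_differential = r_trait_fitness * cv_fitness
    return h2 * selection_differential


def cumulative_shift(years, generation_years, h2, r_trait_fitness, cv_fitness):
    """Total shift in the population mean (in SD), assuming the same
    response every generation."""
    generations = years / generation_years
    return generations * response_per_generation(h2, r_trait_fitness, cv_fitness)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to illustrate the scale involved.
    shift = cumulative_shift(years=1000, generation_years=25,
                             h2=0.5, r_trait_fitness=0.1, cv_fitness=0.5)
    print(f"shift after 1000 years: {shift:.2f} SD")
```

Under these assumptions each generation moves the mean by only 0.025 SD, but 40 generations of that add up to one SD.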

Clark makes a good case (please follow the link and read the paper!). Will modern research rehabilitate the old Social Darwinist ideas of the 19th century?

The Domestication of Man: The Social Implications of Darwin

... Until recently, however, the one creature in the modern farmyard that was believed to be unchanged from Paleolithic times was man himself. We are assumed to still remain in our original wild form. “Our modern skulls house a stone age mind.” For humans the Darwinian era was presumed to have ended with the Neolithic Revolution. Based on ethnographies of modern forager societies, at the dawn of the settled agrarian era people were impulsive, violent, innumerate, and lazy. Abstract reasoning abilities were limited. If we are biologically identical with these populations then only the thin patina of civilization separates us from the underlying violence and impulsivity of human nature. Scratch away that restraint and we would revert to our natural passions.

In my recent book, A Farewell to Alms: A Brief Economic History of the World I argue two things. First that all societies remained in a state I label the “Malthusian economy” up until the onset of the Industrial Revolution around 1800. In that state crucially the economic laws governing all human societies before 1800 were those that govern all animal societies. Second that man was thus subject to natural selection throughout the Malthusian era, even after the arrival of settled agrarian societies with the Neolithic Revolution.

The Darwinian struggle that shaped human nature did not end with the Neolithic Revolution but continued right up until the Industrial Revolution. But the arrival of settled agriculture and stable property rights set natural selection on a very different course. It created an accelerated period of evolution, rewarding with reproductive success a new repertoire of human behaviors – patience, self-control, passivity, and hard work – which consequently spread widely.

And we see in England, from at least 1250, that the kind of people who succeeded in the economic system – who accumulated assets, got skills, got literacy – increased their representation in each generation. Through the long agrarian passage leading up to the Industrial Revolution man was becoming biologically more adapted to the modern economic world. Modern people are thus in part a creation of the market economies that emerged with the Neolithic Revolution. Just as people shaped economies, the pre-industrial economy shaped people. This has left the people of long settled agrarian societies substantially different now from our hunter gatherer ancestors, in terms of culture, and likely also in terms of biology. We are also presumably equivalently different from groups like Australian Aboriginals that never experienced the Neolithic Revolution before the arrival of the English settlers in 1788.

The argument here thus unites the doctrines of Malthus and Darwin in studying human history. This is intellectually satisfying since Charles Darwin himself proclaimed his inspiration for On the Origin of Species was Malthus’s On a Principle of Population. ...

Wednesday, July 14, 2010

Max the METs

According to this study, regular exercise does not entirely counteract the negative health effects of a sedentary lifestyle. There is an independent benefit from low-level activity throughout the day. That is, risk of heart problems is a function of two (exercise-related) parameters:

1. whether you work out
2. your baseline level of activity.

Remind me to do a few burpees in between every article or calculation! A treadmill in the office isn't a bad idea... :-)

NYTimes: ... In a study published in May in the journal Medicine and Science in Sports and Exercise, they reported that, to no one’s surprise, the men who sat the most had the greatest risk of heart problems. Men who spent more than 23 hours a week watching TV and sitting in their cars (as passengers or as drivers) had a 64 percent greater chance of dying from heart disease than those who sat for 11 hours a week or less. What was unexpected was that many of the men who sat long hours and developed heart problems also exercised. Quite a few of them said they did so regularly and led active lifestyles. The men worked out, then sat in cars and in front of televisions for hours, and their risk of heart disease soared, despite the exercise. Their workouts did not counteract the ill effects of sitting.

Most of us have heard that sitting is unhealthy. But many of us also have discounted the warnings, since we spend our lunch hours conscientiously visiting the gym. We consider ourselves sufficiently active. But then we drive back to the office, settle at our desks and sit for the rest of the day. We are, in a phrase adopted by physiologists, “active couch potatoes.”

... adults spend more than nine hours a day in oxymoronic “sedentary activities.” For studies like these, scientists categorize activities by the number of METs they demand. A MET, or metabolic equivalent of task, is a measure of energy, with one MET being the amount of energy you burn lying down for one minute. Sedentary behaviors demand one to one and a half METs, or very little exertion.

Decades ago, before the advent of computers, plasma TVs and Roombas, people spent more time completing “light-intensity activities,” which require between one and a half and three METs. Most “home activities,” like mopping, cooking and changing light bulbs, demand between two and three METs. (One exception is “butchering animals,” a six-MET activity, according to a bogglingly comprehensive compilation from 2000 of the METs associated with different activities.) Nowadays, few of us accumulate much light-intensity activity. We’ve replaced those hours with sitting.

The physiological consequences are only slowly being untangled. In a number of recent animal studies, when rats or mice were not allowed to amble normally around in their cages, they rapidly developed unhealthy cellular changes in their muscles. The animals showed signs of insulin resistance and had higher levels of fatty acids in their blood. Scientists believe the changes are caused by a lack of muscular contractions. If you sit for long hours, you experience no “isometric contraction of the antigravity (postural) muscles,” according to an overview of the consequences of inactivity published this month in Exercise and Sports Sciences Reviews. Your muscles, unused for hours at a time, change in subtle fashion, and as a result, your risk for heart disease, diabetes and other diseases can rise.

Regular workout sessions do not appear to fully undo the effects of prolonged sitting. ...

One MET for a 180 pound male is just over 80 calories per hour, or about 2000 calories per day. Walking is 3-4 METs and doing light work is 1-3 METs. I guess that means on those long travel days I'm burning hundreds of extra calories by walking through airports, standing in line, and staying awake for 24 hours.
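The numbers above follow from the usual convention that one MET is roughly 1 kcal per kilogram of body weight per hour. A minimal sketch under that assumption (the function names and the two-hour airport walk are mine, purely for illustration):

```python
LB_TO_KG = 0.45359237  # kilograms per pound


def kcal_per_hour(weight_lb, mets=1.0):
    """Energy use at a given MET level, assuming
    1 MET ~ 1 kcal per kg of body weight per hour."""
    return mets * weight_lb * LB_TO_KG


def extra_kcal(weight_lb, mets, hours):
    """Energy burned above the 1-MET resting baseline."""
    return (mets - 1.0) * kcal_per_hour(weight_lb) * hours


if __name__ == "__main__":
    resting = kcal_per_hour(180)
    print(f"1 MET at 180 lb: {resting:.1f} kcal/hour, "
          f"{24 * resting:.0f} kcal/day")
    # Hypothetical travel day: two hours of walking at 3.5 METs.
    print(f"extra from walking: {extra_kcal(180, 3.5, 2):.0f} kcal")
```

Two hours of airport walking at 3.5 METs comes to about 400 calories above baseline, which fits the "hundreds of extra calories" guess.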

Tuesday, July 13, 2010

More perils of precocity

Driving the kids to preschool this morning.

I: "Daddy, does everybody see the same sun?"

M: "If it's day here, is it night in China?"

A short lecture on the earth-sun geometry and rotation of the earth ensues.

I: "But daddy, if the earth is spinning why don't all of these cars fall off?"

Monday, July 12, 2010

BGI: Beijing Genomics Institute

In an earlier post I mentioned BGI (formerly Beijing Genomics Institute, now located in Shenzhen). Below are some excerpts from a Nature article about the Institute, which is funded by a $1.5 billion (!) loan from the China Development Bank.

Some of the researchers at BGI are very young -- the article profiles two who are in their early 20s and already have significant responsibility. Is this really so strange? After all, people who lead teams at startups or at Google or Facebook, developing key infrastructure, are often not much older.

Nature even ran an editorial about this: Do scientists really need a PhD? Bioinformatics is a good field to try this in as it is computing intensive (even youngsters can produce good code) and a relatively new field (the background genetics and statistics can be taught fairly quickly to smart kids). On the opposite end of the spectrum: particle or string theory, in which even supersmart kids will barely have their footing after 3-5 years of post-BA work. The contrast between fields in which people can quickly get started in research, versus those that have a steep learning curve and lots of accumulated depth, makes for constant misunderstandings and debates over how graduate education should be structured.

Below are pictures of BGI's director and two of the young researchers.

Nature: In 2006, Li Yingrui left Peking University for the BGI, China's premier genome-sequencing institute. Now, freckled and fresh-faced at 23 years old, he baulks at the way a senior BGI colleague characterized his college career — saying Li was wasting time playing video games and sleeping during class. "I didn't sleep in lectures," Li says. "I just didn't go."

He runs a team of 130 bioinformaticians, most no older than himself. His love of games has served him well when deciphering the flood of data spilling out of the BGI's sequencers every day. But "science is more satisfying" than video games, he says. "There's more passion."

The people at the BGI -- which stopped officially using the name Beijing Genomics Institute in 2007 after moving its headquarters to Shenzhen -- brim with passion, and an ambition so naked that it unsettles some. In the past few years the institute has leapt to the forefront of genome sequencing with a bevy of papers in top-tier journals. Some recent achievements include the genomes of the cucumber, the giant panda, the first complete sequence of an ancient human and, in this issue of Nature, the genomes of more than 1,000 species of gut bacteria, compiled from 577 billion base pairs of sequence data.

The mission, BGI staff say with an almost rehearsed uniformity, is to prove that genomics matters to ordinary people. "The whole institute feels this huge responsibility," says Wang Jun, executive director of the BGI and a professor at the University of Copenhagen. The strategy is to sequence — well, pretty much anything that the BGI or its expanding list of collaborators wants to sequence (see 'Mass production'). It has launched projects to tackle 10,000 microbial genomes and those of 1,000 plants and animals as part of an effort to create a genomic tree of life covering the major evolutionary branches. Important species, such as rice, will be sequenced 100 times over, and for humans there seems no limit to the number the institute would like sequenced.

To fulfil that mission, the BGI is transforming itself into a genomics factory, producing cheap, high-quality sequence with an army of young bioinformaticians and a growing arsenal of expensive equipment. In January, the BGI announced the purchase of 128 of the world's newest, fastest sequencers, the HiSeq 2000 from Illumina, each of which can produce 25 billion base pairs of sequence in a day. When all are running at full tilt, the BGI could theoretically sequence more than 10,000 human genomes in a year. This puts it on track to surpass the entire sequencing output of the United States, says David Wheeler, director of the Molecular Biology Computational Resource at Baylor College of Medicine in Houston, Texas. "It is clear there is a new map of the genomics world," he says.
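[A rough check of the throughput claim in this paragraph, as a sketch. The machine count and per-machine output are from the article; the genome size (about 3 billion base pairs) and the 30x coverage needed per genome are my assumptions, as are the function names. Under those assumptions 128 machines running every day of the year come out at roughly 13,000 genomes, consistent with "more than 10,000".]

```python
def bases_per_year(machines=128, bases_per_machine_day=25e9, days=365):
    """Total raw sequence output per year, in base pairs."""
    return machines * bases_per_machine_day * days


def genomes_per_year(genome_bp=3e9, coverage=30, **kwargs):
    """Number of genomes that output could cover.

    genome_bp and coverage are assumed values, not from the article:
    each base must be read `coverage` times on average.
    """
    return bases_per_year(**kwargs) / (genome_bp * coverage)


if __name__ == "__main__":
    print(f"raw output: {bases_per_year():.3e} bp/year")
    print(f"genomes at 30x: {genomes_per_year():.0f}")
```

The answer scales inversely with the coverage assumed, so at lower coverage the same machines would handle proportionally more genomes.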

The charge that the BGI has reduced science to brute mechanization does little to ruffle feathers in Shenzhen. Wang himself quips that the BGI brings little intellectual capital into projects: "We are the muscle, we have no brain." But such comments belie a quiet confidence, in everyone from the BGI's seasoned management to its youngest recruits, that they can make an impact not just to the balance of sequencing power but also in biology, medicine and agriculture. This will be a challenge given the significant loans taken out to expand capacity. Torn between scientific and financial goals, even its founder can't seem to decide whether the BGI is a business or a non-profit research institute. Genome scientists around the world are watching to see how it will strike a balance.

... With this breathing room, the BGI has grown to employ 1,500 people nationwide, more than two-thirds of them in Shenzhen, and this is expected to jump to 3,500 by the end of the year. With the investment in new sequencers, provided by a 10-billion-renminbi loan from the China Development Bank, the BGI's capacity will grow, but so will costs.

... The BGI's Luo Ruibang, also a student at the South China University of Technology in Guangzhou, turned 21 while at his last scientific meeting. He says he's had trouble convincing other scientists that, lacking doctoral training, he can do top-notch science. "A lot of the foreigners wonder if I'm really capable," he says. Luo and Li were co-first authors on a paper describing the discovery of large DNA segments in the Asian and African genomes that are absent in the Caucasian genome.

Li and his bosses are confident that this youth brigade can piece together and verify sequences. "It is a new field," says Wang. "There is not much experience anyway." But interpreting data and designing experiments are two different things, and BGI staff admit a dearth of knowledge in the latter. "We don't know much about biology," Li says. Liu says the BGI needs to overcome its biological blindspot, but he is supportive of its mission. "They are primarily sequencers, but smart ones with big guns," he says.

... Research alone is not going to pay back the 10-billion renminbi bank loan. The BGI makes some income from collaborations, which account for 40% of the sequencing workload. Outsourced sequencing services for universities, breeding companies or pharmaceutical companies bring in higher margins and account for another 55% of the workload (the final 5% is the BGI's own projects). In 2009, the BGI pulled in 300 million renminbi in revenue. That is not enough, says BGI marketing director Hongsheng Liang. In 2010, Liang hopes to pull in 1.2 billion renminbi.

New income could come from proprietary rights to agricultural applications. The BGI, which owns more than 200 patents, has been attempting to do genomics-based breeding with foxtail millet in Hebei and has other agricultural projects in Laos. More cash could come from expansion of services overseas. Within three years, the institute plans to open offices in Copenhagen and San Francisco. The BGI may also charge for access to its Yanhuang database, a project launched in 2008 to sequence the genomes of 100 Chinese; BGI scientists say they would like to expand this number into the thousands. Although according to Yang, it would be charging "at cost" — to cover computational expenses and maintenance, not for the data. ...

Saturday, July 10, 2010

Beyond Bayes: causality vs correlation

A draft paper by Harvard graduate student James Lee (student of Steve Pinker; I'd love to post the paper here but don't know yet if that's OK) got me interested in the work of statistical learning pioneer Judea Pearl. I found the essay Bayesianism and Causality, or, why I am only a half-Bayesian (excerpted below) a concise, and provocative, introduction to his ideas.

Pearl is correct to say that humans think in terms of causal models, rather than in terms of correlation. Our brains favor simple, linear narratives. The effectiveness of physics is a consequence of the fact that descriptions of natural phenomena are compressible into simple causal models. (Or, perhaps it just looks that way to us ;-)

Judea Pearl: I turned Bayesian in 1971, as soon as I began reading Savage’s monograph The Foundations of Statistical Inference [Savage, 1962]. The arguments were unassailable: (i) It is plain silly to ignore what we know, (ii) It is natural and useful to cast what we know in the language of probabilities, and (iii) If our subjective probabilities are erroneous, their impact will get washed out in due time, as the number of observations increases.

Thirty years later, I am still a devout Bayesian in the sense of (i), but I now doubt the wisdom of (ii) and I know that, in general, (iii) is false. Like most Bayesians, I believe that the knowledge we carry in our skulls, be its origin experience, schooling or hearsay, is an invaluable resource in all human activity, and that combining this knowledge with empirical data is the key to scientific enquiry and intelligent behavior. Thus, in this broad sense, I am still a Bayesian. However, in order to be combined with data, our knowledge must first be cast in some formal language, and what I have come to realize in the past ten years is that the language of probability is not suitable for the task; the bulk of human knowledge is organized around causal, not probabilistic relationships, and the grammar of probability calculus is insufficient for capturing those relationships. Specifically, the building blocks of our scientific and everyday knowledge are elementary facts such as “mud does not cause rain” and “symptoms do not cause disease” and those facts, strangely enough, cannot be expressed in the vocabulary of probability calculus. It is for this reason that I consider myself only a half-Bayesian. ...

Friday, July 09, 2010

New Books in History

This is a consistently good podcast in which historian Marshall Poe (University of Iowa) conducts in-depth (usually an hour or so) discussions with other historians about their recent books. One of the things I enjoy about each podcast is that the author is invited to give a brief precis of their education and intellectual history. You can find it on iTunes or here.

Some specific recommendations:

Jerry Muller, Capitalism and the Jews

Heather Cox Richardson, Wounded Knee: Party Politics and the Road to an American Massacre

P. Bingham and J. Souza, Death From a Distance and the Birth of a Humane Universe

Hilary Earl, The Nuremberg SS-Einsatzgruppen Trial, 1945-1958: Atrocity, Law, and History

Jared Diamond and James A. Robinson, Natural Experiments of History

Benjamin Binstock, Vermeer’s Family Secrets: Genius, Discovery, and the Unknown Apprentice

Mark Bradley and Marilyn Young, Making Sense of the Vietnam Wars

Jennifer Burns, Goddess of the Market: Ayn Rand and the American Right

Peter Fritzsche, Life and Death in the Third Reich

Thomas Wheatland, The Frankfurt School in Exile

James Banner, Jr. and John Gillis, Becoming Historians

Yuma Totani, The Tokyo War Crimes Trials: The Pursuit of Justice in the Wake of World War II

James Mann, The Rebellion of Ronald Reagan: A History of the End of the Cold War

Gregory Cochran, The 10,000 Year Explosion: How Civilization Accelerated Human Evolution (To Poe's credit, his head did not explode during this interview.)

Mark Mazower, Hitler’s Empire: Nazi Rule in Occupied Europe

There are many more, but that's probably enough for now. I listen to these at the gym or while running or in the car. In case Marshall Poe reads this, thanks for many stimulating and enjoyable hours.

Wednesday, July 07, 2010

Chown interview

A nice discussion with Marcus Chown on Bloggingheads. You can get a podcast version as well (iTunes). I wrote about his new book here.

Tuesday, July 06, 2010

Andy Grove on jobs, manufacturing and startups

Sorry for the lack of posts. I am traveling in Canada at the moment. Here's a photo from Mont Tremblant (near Montreal):

This essay by Andy Grove was recommended by commenter LondonYoung, and I thought I'd post an excerpt here. (LY, greetings from 4 techers!)

Bloomberg: Recently an acquaintance at the next table in a Palo Alto, California, restaurant introduced me to his companions: three young venture capitalists from China. They explained, with visible excitement, that they were touring promising companies in Silicon Valley. I’ve lived in the Valley a long time, and usually when I see how the region has become such a draw for global investments, I feel a little proud.

Not this time. I left the restaurant unsettled. Something didn’t add up. Bay Area unemployment is even higher than the 9.7 percent national average. Clearly, the great Silicon Valley innovation machine hasn’t been creating many jobs of late -- unless you are counting Asia, where American technology companies have been adding jobs like mad for years.

The underlying problem isn’t simply lower Asian costs. It’s our own misplaced faith in the power of startups to create U.S. jobs. Americans love the idea of the guys in the garage inventing something that changes the world. New York Times columnist Thomas L. Friedman recently encapsulated this view in a piece called “Start-Ups, Not Bailouts.” His argument: Let tired old companies that do commodity manufacturing die if they have to. If Washington really wants to create jobs, he wrote, it should back startups.

Mythical Moment

Friedman is wrong. Startups are a wonderful thing, but they cannot by themselves increase tech employment. Equally important is what comes after that mythical moment of creation in the garage, as technology goes from prototype to mass production. This is the phase where companies scale up. They work out design details, figure out how to make things affordably, build factories, and hire people by the thousands. Scaling is hard work but necessary to make innovation matter.

The scaling process is no longer happening in the U.S. And as long as that’s the case, plowing capital into young companies that build their factories elsewhere will continue to yield a bad return in terms of American jobs.

Scaling used to work well in Silicon Valley. Entrepreneurs came up with an invention. Investors gave them money to build their business. If the founders and their investors were lucky, the company grew and had an initial public offering, which brought in money that financed further growth.

...

U.S. Versus China

Today, manufacturing employment in the U.S. computer industry is about 166,000 -- lower than it was before the first personal computer, the MITS Altair 8800, was assembled in 1975. Meanwhile, a very effective computer-manufacturing industry has emerged in Asia, employing about 1.5 million workers -- factory employees, engineers and managers.

The largest of these companies is Hon Hai Precision Industry Co., also known as Foxconn. The company has grown at an astounding rate, first in Taiwan and later in China. Its revenue last year was $62 billion, larger than Apple Inc., Microsoft Corp., Dell Inc. or Intel. Foxconn employs more than 800,000 people, more than the combined worldwide head count of Apple, Dell, Microsoft, Hewlett-Packard Co., Intel and Sony Corp.

10-to-1 Ratio

Until a recent spate of suicides at Foxconn’s giant factory complex in Shenzhen, China, few Americans had heard of the company. But most know the products it makes: computers for Dell and HP, Nokia Oyj cell phones, Microsoft Xbox 360 consoles, Intel motherboards, and countless other familiar gadgets. Some 250,000 Foxconn employees in southern China produce Apple’s products. Apple, meanwhile, has about 25,000 employees in the U.S. -- that means for every Apple worker in the U.S. there are 10 people in China working on iMacs, iPods and iPhones. The same roughly 10-to-1 relationship holds for Dell, disk-drive maker Seagate Technology, and other U.S. tech companies.

You could say, as many do, that shipping jobs overseas is no big deal because the high-value work -- and much of the profits -- remain in the U.S. That may well be so. But what kind of a society are we going to have if it consists of highly paid people doing high-value-added work -- and masses of unemployed?

Since the early days of Silicon Valley, the money invested in companies has increased dramatically, only to produce fewer jobs. Simply put, the U.S. has become wildly inefficient at creating American tech jobs. We may be less aware of this growing inefficiency, however, because our history of creating jobs over the past few decades has been spectacular -- masking our greater and greater spending to create each position.

Tragic Mistake

Should we wait and not act on the basis of early indicators? I think that would be a tragic mistake because the only chance we have to reverse the deterioration is if we act early and decisively.

Already the decline has been marked. It may be measured by way of a simple calculation: an estimate of the employment cost-effectiveness of a company. First, take the initial investment plus the investment during a company’s IPO. Then divide that by the number of employees working in that company 10 years later. For Intel, this worked out to be about $650 per job -- $3,600 adjusted for inflation. National Semiconductor Corp., another chip company, was even more efficient at $2,000 per job.

Making the same calculations for a number of Silicon Valley companies shows that the cost of creating U.S. jobs grew from a few thousand dollars per position in the early years to $100,000 today. The obvious reason: Companies simply hire fewer employees as more work is done by outside contractors, usually in Asia.
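
(An aside, not part of Grove's essay.) The cost-per-job measure described above is simple enough to sketch in a few lines of Python. The function name and the sample inputs below are hypothetical, chosen only for illustration; the essay does not give the underlying investment or headcount figures. The only numbers taken from the text are Intel's $650 and $3,600, which together imply the inflation multiplier being used.

```python
def cost_per_job(initial_investment, ipo_investment, employees_after_10_years):
    """Dollars invested per job: (initial + IPO investment) / headcount 10 years later."""
    if employees_after_10_years <= 0:
        raise ValueError("employee count must be positive")
    return (initial_investment + ipo_investment) / employees_after_10_years


# Hypothetical inputs for illustration only (not figures from the essay).
example = cost_per_job(
    initial_investment=2_000_000,
    ipo_investment=8_000_000,
    employees_after_10_years=5_000,
)

# Inflation multiplier implied by the essay's Intel figures ($650 -> $3,600).
implied_inflation_factor = 3600 / 650

print(f"cost per job: ${example:,.0f}")
print(f"implied inflation factor: {implied_inflation_factor:.2f}")
```

Note that the measure counts only employees of the company itself, which is why, as the excerpt says, shifting work to outside contractors pushes the figure up.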

Alternative Energy

The job-machine breakdown isn’t just in computers. Consider alternative energy, an emerging industry where there is plenty of innovation. Photovoltaics, for example, are a U.S. invention. Their use in home-energy applications was also pioneered by the U.S.

Last year, I decided to do my bit for energy conservation and set out to equip my house with solar power. My wife and I talked with four local solar firms. As part of our due diligence, I checked where they get their photovoltaic panels -- the key part of the system. All the panels they use come from China. A Silicon Valley company sells equipment used to manufacture photo-active films. They ship close to 10 times more machines to China than to manufacturers in the U.S., and this gap is growing. Not surprisingly, U.S. employment in the making of photovoltaic films and panels is perhaps 10,000 -- just a few percent of estimated worldwide employment.

Advanced Batteries

There’s more at stake than exported jobs. With some technologies, both scaling and innovation take place overseas. Such is the case with advanced batteries. It has taken years and many false starts, but finally we are about to witness mass-produced electric cars and trucks. They all rely on lithium-ion batteries. What microprocessors are to computing, batteries are to electric vehicles. Unlike with microprocessors, the U.S. share of lithium-ion battery production is tiny.

That’s a problem. A new industry needs an effective ecosystem in which technology knowhow accumulates, experience builds on experience, and close relationships develop between supplier and customer. The U.S. lost its lead in batteries 30 years ago when it stopped making consumer-electronics devices. Whoever made batteries then gained the exposure and relationships needed to learn to supply batteries for the more demanding laptop PC market, and after that, for the even more demanding automobile market. U.S. companies didn’t participate in the first phase and consequently weren’t in the running for all that followed. I doubt they will ever catch up. ...

Comments on this essay from Tyler Cowen (via Dave Backus). Grove's conclusions may be suspect, but the facts he describes are nevertheless very interesting.
