Information Processing: 08/2012

Thursday, August 30, 2012

Genomic secrets of the dead: high-coverage Denisovan sequence

This new technique may make it easier to obtain sequences of long dead individuals whose burial sites are known.

Science: A High-Coverage Genome Sequence from an Archaic Denisovan Individual

We present a DNA library preparation method that has allowed us to reconstruct a high-coverage (30X) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity, indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans.

Science News and Analysis: ... As the international team reports in a paper published online in Science this week, more than 99% of the nucleotides are sequenced at least 10 times, so researchers have as sharp a picture of this ancient genome as of a living person's. “No one thought we would have an archaic human genome of such quality,” Meyer says. “Everyone was shocked by the counts. That includes me.”

That precision allows the team to compare the nuclear genome of this girl, who lived in Siberia's Denisova Cave more than 50,000 years ago, directly to the genomes of living people, producing a “near-complete” catalog of the small number of genetic changes that make us different from the Denisovans, who were close relatives of Neandertals. “This is the genetic recipe for being a modern human,” says team leader Svante Pääbo, a paleogeneticist at the institute.

Ironically, this high-resolution genome means that the Denisovans, who are represented in the fossil record by only one tiny finger bone and two teeth, are much better known genetically than any other ancient human—including Neandertals, of which there are hundreds of specimens. The genome offers a glimpse of what the Denisovan girl looked like—her eyes, hair, and skin were brown—and new details about how her lineage evolved. The team confirms that the Denisovans interbred with the ancestors of some living humans and found that Denisovans had little genetic diversity, suggesting that their small population waned further as populations of modern humans expanded.

... By binding special molecules to the ends of a single strand, the ancient DNA was held in place while enzymes copied its sequence. The result was a sixfold to 22-fold increase in the amount of Denisovan DNA sequenced from a meager 10-milligram sample from the girl's finger. The team was able to cover 99.9% of the mappable nucleotide positions in the genome at least once, and more than 92% of the sites at least 20 times, which is considered a benchmark for identifying sites reliably.

... Most exciting to Pääbo is the “nearly complete catalog” of differences in genes between the groups. This includes 111,812 single nucleotides that changed in modern humans in the past 100,000 years or so.

Sunday, August 26, 2012

Back in the MACT

Neal Stephenson, in the foreword to David Foster Wallace's Everything and More, wrote about growing up as a faculty kid in a Midwestern American College Town (MACT). The six-year-old Stephenson, whose father was a professor of Electrical Engineering, moved to Ames, Iowa the year I was born. He attended Ames High School, as did I. When I met Stephenson in person, we compared notes and discovered that some of my friends were the younger siblings of his friends.

Now I am back in the Midwest -- in the MACT of East Lansing, Michigan. Feels like home.

Everything and More foreword (2003): When I was a boy growing up in Ames, Iowa, I belonged to a Boy Scout troop whose adult supervision consisted almost entirely of professors from the Iowa State University of Science and Technology ...

Click the link above or the images below to read more.

Wednesday, August 22, 2012

Beating down hash functions

The state of the art in GPU- and statistics-enhanced password cracking. Crackers beating down information entropy just like in the old days at Bletchley Park! (Trivia question: what are "bans" and "cribs"? Answers)

Ars Technica: ... An even more powerful technique is a hybrid attack. It combines a word list, like the one used by Redman, with rules to greatly expand the number of passwords those lists can crack. Rather than brute-forcing the five letters in Julia1984, hackers simply compile a list of first names for every single Facebook user and add them to a medium-sized dictionary of, say, 100 million words. While the attack requires more combinations than the mask attack above -- specifically about 1 trillion (100 million * 10^4) possible strings -- it's still a manageable number that takes only about two minutes using the same AMD 7970 card. The payoff, however, is more than worth the additional effort, since it will quickly crack Christopher2000, thomas1964, and scores of others.
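The arithmetic in the hybrid-attack passage above can be checked with a few lines of Python. This is only a back-of-envelope sketch: the word-list size and the "about two minutes" figure come from the quote, the four appended digits (10,000 possible suffixes) are read off the Julia1984 example, and the names `hybrid_keyspace`, `WORDLIST_SIZE`, `SUFFIX_DIGITS` and `QUOTED_SECONDS` are made up for illustration.

```python
# Back-of-envelope check of the hybrid-attack numbers quoted above.
# All constant names are illustrative; the values are taken from the quote.
WORDLIST_SIZE = 100_000_000   # "dictionary of, say, 100 million words"
SUFFIX_DIGITS = 4             # assumed: four appended digits, as in Julia1984
QUOTED_SECONDS = 120          # "about two minutes" on one AMD 7970


def hybrid_keyspace(wordlist_size, suffix_digits):
    """Number of candidates when every word gets every possible digit suffix."""
    return wordlist_size * 10 ** suffix_digits


candidates = hybrid_keyspace(WORDLIST_SIZE, SUFFIX_DIGITS)
implied_rate = candidates / QUOTED_SECONDS  # guesses per second implied by the quote

print(f"{candidates:.1e} candidate strings")
print(f"{implied_rate:.1e} guesses per second implied")
```

The candidate count comes out at 10^12, matching the quoted "about 1 trillion", and the implied rate is roughly 8 billion guesses per second. That rate depends heavily on which hash function is being attacked, which the excerpt does not specify.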

"The hybrid is my favorite attack," said Atom, the pseudonymous developer of Hashcat, whose team won this year's Crack Me if You Can contest at Defcon. "It's the most efficient. If I get a new hash list, let's say 500,000 hashes, I can crack 50 percent just with hybrid."

With half the passwords in a given breach recovered, cracking experts like Atom can use Passpal and other programs to isolate patterns that are unique to the website from which they came. They then write new rules to crack the remaining unknown passwords. More often than not, however, no amount of sophistication and high-end hardware is enough to quickly crack some hashes exposed in a server breach. To ensure they keep up with changing password choices, crackers will regularly brute-force crack some percentage of the unknown passwords, even when they contain as many as nine or more characters.

"It's very expensive, but you do it to improve your model and keep up with passwords people are choosing," said Moxie Marlinspike, another cracking expert. "Then, given that knowledge, you can go back and build rules and word lists to effectively crack lists without having to brute force all of them. When you feed your successes back into your process, you just keep learning more and more and more and it does snowball."

deCODE, de novo mutations, and autism risk

An interesting Nature paper from deCODE researchers. I find the result stated in the Times coverage of the article a bit shocking. (I haven't read the paper carefully enough to know whether this is an accurate summary.) If these de novo mutations are responsible for a big chunk of the increased autism risk in children of older fathers, then of order 100 additional mutations increase the probability of autism by about 1 percent. But this would imply that about 1 in 10k mutations fall in a region strongly affecting brain development, specifically autism risk. That is, either over 100k loci (out of 3 billion in the whole genome) have a large effect on autism risk (i.e., increasing the probability to close to unity), or a much larger number of loci have at least a small effect.
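The estimate above is simple enough to spell out. The sketch below uses only the round numbers from the paragraph (100 extra mutations, 1 percent added risk, a 3 billion base genome) and assumes the "large effect" scenario in which a single hit at a sensitive locus raises the probability to near unity. The variable names are mine.

```python
# Order-of-magnitude version of the argument above; all inputs are the
# round numbers from the text, not values taken from the Nature paper.
GENOME_SIZE = 3_000_000_000   # loci in the whole genome
EXTRA_MUTATIONS = 100         # additional de novo mutations (older father)
ADDED_RISK = 0.01             # roughly 1 percent higher probability of autism

# If one hit at a sensitive locus is enough, the added risk is approximately
# (number of extra mutations) * (fraction of the genome that is sensitive).
sensitive_fraction = ADDED_RISK / EXTRA_MUTATIONS
sensitive_loci = sensitive_fraction * GENOME_SIZE

# Sanity check on the linear approximation: exact probability of at least one hit.
exact_risk = 1.0 - (1.0 - sensitive_fraction) ** EXTRA_MUTATIONS

print(f"sensitive fraction: about 1 in {round(1 / sensitive_fraction):,}")
print(f"implied large-effect loci: about {round(sensitive_loci):,}")
print(f"exact added risk under this model: {exact_risk:.4f}")
```

With these inputs the sensitive fraction is 1 in 10k and the implied number of large-effect loci is about 300k, consistent with "over 100k" in the text.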

An obvious confound is that older fathers may tend to be geekier than younger fathers, thereby correlating paternal age and autism risk independent of age-related sperm mutation rate.

NYTimes: Older men are more likely than young ones to father a child who develops autism or schizophrenia, because of random mutations that become more numerous with advancing paternal age, scientists reported on Wednesday, in the first study to quantify the effect as it builds each year. The age of mothers had no bearing on the risk for these disorders, the study found.

... The new investigation, led by the Icelandic firm deCODE Genetics, analyzed genetic material taken from blood samples of 78 parent-child trios, focusing on families in which parents with no signs of a mental disorder gave birth to a child who developed autism or schizophrenia. This approach allows scientists to isolate brand-new mutations in the genes of the child that were not present in the parents.

Most people have many of these so-called de novo mutations, which occur spontaneously at or near conception, and a majority of them are harmless. But recent studies suggest that there are several such changes that can sharply increase the risk for autism and possibly schizophrenia — and the more a child has, the more likely he or she is by chance to have one of these rare, disabling ones.

Some difference between the paternal and maternal side is to be expected. Sperm cells divide every 15 days or so, whereas egg cells are relatively stable, and continual copying inevitably leads to errors, in DNA as in life.

Still, when the researchers removed the effect of paternal age, they found no difference in genetic risk between those who had a diagnosis of autism or schizophrenia and a control group of Icelanders who did not. “It is absolutely stunning that the father’s age accounted for all this added risk, given the possibility of environmental factors and the diversity of the population,” said Dr. Kari Stefansson, the chief executive of deCODE and the study’s senior author. “And it’s stunning that so little is contributed by the age of the mother.”

Note Added: After reading the Nature paper more carefully I find the Times article take on the results hard to justify. Yes, de novo mutation rate is significantly higher in sperm from older men. However, a consequent large increase in autism risk is not (as far as I can tell) strongly supported by the results in the paper. It might be true, but I don't regard the evidence as very strong yet.

Monday, August 20, 2012

Genomic prediction: no bull

This Atlantic article discusses the application of genomic prediction to cattle breeding. Breeders have recently started switching from pedigree-based methods to statistical models incorporating SNP genotypes. We can now make good predictions of phenotypes like milk and meat production using genetic data alone. Cows are easier than people because, as domesticated animals, they have a smaller effective breeding population and less genetic diversity. Nevertheless, I expect very similar methods to be applied to humans within the next 5-10 years.

The Atlantic: ... the semen that Badger-Bluff Fanny Freddie produces has become such a hot commodity in what one artificial-insemination company calls "today's fast paced cattle semen market." In January of 2009, before he had a single daughter producing milk, the United States Department of Agriculture took a look at his lineage and more than 50,000 markers on his genome and declared him the best bull in the land. And, three years and 346 milk- and data-providing daughters later, it turns out that they were right. "When Freddie [as he is known] had no daughter records our equations predicted from his DNA that he would be the best bull," USDA research geneticist Paul VanRaden emailed me with a detectable hint of pride. "Now he is the best progeny tested bull (as predicted)."

Data-driven predictions are responsible for a massive transformation of America's dairy cows. While other industries are just catching on to this whole "big data" thing, the animal sciences -- and dairy breeding in particular -- have been using large amounts of data since long before VanRaden was calculating the outsized genetic impact of the most sought-after bulls with a pencil and paper in the 1980s.

Dairy breeding is perfect for quantitative analysis. Pedigree records have been assiduously kept; relatively easy artificial insemination has helped centralize genetic information in a small number of key bulls since the 1960s; there are a relatively small and easily measurable number of traits -- milk production, fat in the milk, protein in the milk, longevity, udder quality -- that breeders want to optimize; each cow works for three or four years, which means that farmers invest thousands of dollars into each animal, so it's worth it to get the best semen money can buy. The economics push breeders to use the genetics.

The bull market (heh) can be reduced to one key statistic, lifetime net merit, though there are many nuances that the single number cannot capture. Net merit denotes the likely additive value of a bull's genetics. The number is actually denominated in dollars because it is an estimate of how much a bull's genetic material will likely improve the revenue from a given cow. A very complicated equation weights all of the factors that go into dairy breeding and -- voila -- you come out with this single number. For example, a bull that could help a cow make an extra 1000 pounds of milk over her lifetime only gets an increase of $1 in net merit while a bull who will help that same cow produce a pound more protein will get $3.41 more in net merit. An increase of a single month of predicted productive life yields $35 more.

When you add it all up, Badger-Bluff Fanny Freddie has a net merit of $792. No other proven sire ranks above $750 and only seven bulls in the country rank above $700.
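To make the "single number" idea concrete, here is a toy linear index built from the three dollar weights quoted above. It is only a sketch: the article describes the actual equation as very complicated, with more traits than these, and the function name, trait keys and the example bull's trait values are all invented for illustration.

```python
# Toy net-merit index using only the three weights quoted in the article.
# The real USDA formula includes more traits; this is not that formula.
WEIGHTS = {
    "milk_lb": 1.0 / 1000,           # $1 per extra 1000 lb of lifetime milk
    "protein_lb": 3.41,              # $3.41 per extra pound of protein
    "productive_life_months": 35.0,  # $35 per extra month of productive life
}


def net_merit_contribution(traits):
    """Weighted sum of predicted trait improvements, in dollars."""
    return sum(WEIGHTS[name] * value for name, value in traits.items())


# Hypothetical bull: trait improvements made up for the example.
example_bull = {"milk_lb": 1000.0, "protein_lb": 10.0, "productive_life_months": 2.0}

print(f"${net_merit_contribution(example_bull):.2f}")
```

The weights show why breeders care far more about protein and longevity than about raw milk volume: in this toy index two extra months of productive life are worth seventy times as much as an extra 1000 pounds of milk.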

See below -- theoretical calculations suggest that even outliers with net merit of $700-800 will be eclipsed by specimens with 10x higher merit that can be produced by further selection on existing genetic variation. Similar results apply to humans.

... It turned out they were in the perfect spot to look for statistical rules. They had databases of old and new bull semen. They had old and new production data. In essence, it wasn't that difficult to generate rules for transforming genomic data into real-world predictions. Despite -- or because of -- the effectiveness of traditional breeding techniques, molecular biology has been applied in the field for years in different ways. Given that breeders were trying to discover bulls' hidden genetic profiles by evaluating the traits in their offspring that could be measured, it just made sense to start generating direct data about the animals' genomes.

"Each of the bulls on the sire list, we have 50,000 genetic markers. Most of those, we have 700,000," the USDA's VanRaden said. "Every month we get another 12,000 new calves, the DNA readings come in and we send the predictions out. We have a total of 200,000 animals with DNA analysis. That's why it's been so easy. We had such a good phenotype file and we had DNA stored on all these bulls."

... Nowadays breeders can choose between "genomic bulls," which have been evaluated based purely on their genes and "proven bulls," for which real world data is available. Discussions among dairy breeders show that many are beginning to mix in younger bulls with good-looking genomic data into the breeding regimens. How well has it gone? The first of the bulls who were bred from their genetic profiles alone, are receiving their initial production data. So far, it seems as if the genomic estimates were a little high, but more accurate than traditional methods alone.

The unique dataset and success of dairy breeders now has other scientists sniffing around their findings. Leonid Kruglyak, a genomics professor at Princeton, told me that "a lot of the statistical techniques and methodology" that connect phenotype and genotype were developed by animal breeders. In a sense, they are like codebreakers. If you know the rules of encoding, it's not difficult to put information in one end and have it pop out the other as a code. But if you're starting with the code, that's a brutally difficult problem. And it's the one that dairy geneticists have been working on.

(Kruglyak was a graduate student in biophysics at Berkeley under Bill Bialek when I was there.)

... John Cole, yet another USDA animal improvement scientist, generated an estimate of the perfect bull by choosing the optimal observed genetic sequences and hypothetically combining them. He found that the optimal bull would have a net merit value of $7,515, which absolutely blows any current bull out of the water. In other words, we're nowhere near creating the perfect milk machine.

Here's a recent paper on the big data aspects of genomic selection applied to animal breeding.

Sunday, August 19, 2012

Recent human evolution: European height

These results were announced last year at a conference talk, now hot off the press at Nature Genetics. As stated in the abstract below, the results are important because they show that selection pressure can work on existing variation in polygenic quantitative traits such as height (no new mutations required! See also here). Group differences in the phenotype are (according to the analysis) not due to drift or founder effects -- it's selection at work. Of course, this has been demonstrated many times in the lab, but certain people refuse to believe the results could apply to Homo sapiens, over timescales of order 10ky.

Nature Genetics: Evidence of widespread selection on standing variation in Europe at height-associated SNPs

Strong signatures of positive selection at newly arising genetic variants are well documented in humans [1-8], but this form of selection may not be widespread in recent human evolution [9]. Because many human traits are highly polygenic and partly determined by common, ancient genetic variation, an alternative model for rapid genetic adaptation has been proposed: weak selection acting on many pre-existing (standing) genetic variants, or polygenic adaptation [10-12]. By studying height, a classic polygenic trait, we demonstrate the first human signature of widespread selection on standing variation. We show that frequencies of alleles associated with increased height, both at known loci and genome wide, are systematically elevated in Northern Europeans compared with Southern Europeans (P < 4.3 × 10^-4). This pattern mirrors intra-European height differences and is not confounded by ancestry or other ascertainment biases. The systematic frequency differences are consistent with the presence of widespread weak selection (selection coefficients ~10^-3 to 10^-5 per allele) rather than genetic drift alone (P < 10^-15).
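For a rough sense of what selection coefficients of this size do to a single allele over roughly 10ky, here is a deterministic textbook sketch. It is not the paper's method: it ignores drift, dominance and linkage, uses the simple genic-selection recursion in which the odds p/(1-p) grow by a factor (1+s) per generation, and assumes a 25-year generation time. The function and constant names are mine.

```python
# Deterministic single-locus sketch (not the analysis used in the paper).
# Assumptions: genic selection, no drift, 25-year generations.
YEARS = 10_000
GENERATION_YEARS = 25            # assumed generation time
GENERATIONS = YEARS // GENERATION_YEARS


def allele_frequency_after(p0, s, generations):
    """Frequency after selection; the odds p/(1-p) multiply by (1+s) each generation."""
    odds = p0 / (1.0 - p0) * (1.0 + s) ** generations
    return odds / (1.0 + odds)


for s in (1e-3, 1e-4, 1e-5):
    p = allele_frequency_after(0.5, s, GENERATIONS)
    print(f"s = {s:g}: frequency 0.5 -> {p:.4f} after {GENERATIONS} generations")
```

Under these assumptions even the strongest coefficient in the quoted range shifts an allele from 50 percent to only about 60 percent, and the weakest barely moves it. The point of the polygenic picture is that many such small shifts, summed over a large number of loci, can still move the mean of the trait.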

Here's part of what I wrote a year ago about this result:

3. If the results on selection hold up this will be clear evidence for differential selection between groups of a quantitative trait (as opposed to lactose or altitude tolerance, which are controlled by small sets of loci). We may soon be able to conclude that there has been enough evolutionary time for selection to work within European populations on a trait that is controlled by hundreds (probably thousands) of loci.

Current best guess for number of height loci is of order 10k.

Friday, August 17, 2012

"For the historians and the ladies"

The excerpts below are from interviews with Benoit Mandelbrot.

On the birth of molecular biology under Max Delbruck at Caltech:

I would say the more important event was quite outside of my life's work, the arrival of Max Delbruck. Now Max Delbruck was, I think, one of the great personalities of those times. He was a physicist by training, a man belonging to one of the very highest families in Prussia, in many ways a great liberal, in many ways a great authoritarian.

Max Delbruck had been told in the '30s, according to rumour, that he was just not good enough to be doing physics as well as he hoped; that while Bethe would spend a few hours and write seventeen pages of flawless mathematics, and Weisskopf wrote only fifteen, but equally flawless - and Pauli always preferred Bethe to Weisskopf because of this difference - well, Delbruck only wrote five and there were bugs to fix. He was not up to this competition, which he was finding himself in.

But he had branched into biology, the first physicist (to do so). Schrödinger had written about him in his book What Is Life? Delbruck had suddenly become a challenge. And the challenge was met by an equally remarkable man named Beadle who was a farmer from Nebraska who had gone to university to learn more about corn to improve the yield of his farm and moved on, and by that time had become the chairman of the Biology division at Caltech. Beadle was a very bold person. He hired Delbruck, who knew no biology, and told him, "You go and learn biology by teaching Biology One to freshmen, and then go on and do your thing," - which was to introduce quantum mechanics ideas into biology.

Now, I never knew what they were doing. They were very - I wouldn't say secretive, they were just a little elusive. They never said what they were doing. Well it was clear they were doing something extraordinarily exciting which for me was again this very strong hope of bringing hard mathematical or physical thinking to a field, which had been very soft before.

"For the historians and the ladies"; saved by Oppenheimer and von Neumann:

... I met Oppenheimer now and then, and then saw him in a train from Princeton to New York. We chatted. He asked me what I was doing, and I explained to him what I was doing. He became extraordinarily excited. He said, "It's fantastic. It's extraordinary, that you could have found a way of applying, genuinely thinking of applying thermodynamics to something so different, and that you find everything will be decided upon analytic properties like partition function," and of course he understood instantly.

And he asked me to give a lecture, and he said, "Make it a lecture in the evening, for the historians and the ladies." Those were his words. "Make it easy."

So one evening, which was fixed with his secretary, I came up and I was expecting to find "historians and ladies." To my surprise, Oppenheimer was there. I tried to block his way, and he went, "No, I'm very much interested." And then Von Neumann came. I said, "You know my story completely. I didn't prepare a lecture for you." "Well," he says, "perhaps not, but I'm the chairman and the discussion may be interesting." Now, the lecture was, again, to explain to this group that the kind of thinking represented by thermodynamics, as Tolman had made me understand it, could be applied outside and that in a way the abyss between, how to say, humanities and sciences was something you could bridge. That was Oppenheimer's idea, or something like it.

Well, I saw all these great men and their spouses arriving, and my heart sank to my heels. I became totally incoherent. I had prepared a very simplified lecture, which I could not do at that age - I could do it now, perhaps, but not then. I tried to change it into a lecture for them. They started falling asleep and snoring. It was a horrible sight. I stopped after forty-five minutes saying, "Ladies and gentlemen, thank you for your attention," etc., etc. Mild applause. Von Neumann stands up and says, "Any questions?" Dead silence. Then a friend asked a question. "Any other question?" Dead silence. Another friend asked a question. And when Von Neumann was about to close the lecture, somebody named Otto Neumenbour stands up in the last row and says, "I have, not a question, but a statement. This is the worst lecture I have attended in my life. I have not understood one word the speaker said. I don't see any relation with the title," and he went on like that, until Oppenheimer stopped him. "Otto, Otto. Please, let me respond, if Dr Mandelbrot would be so kind as to allow me to respond instead for him. The title is very unfortunate. I gave it to my secretary. Dr Mandelbrot should have changed it to be more appropriate. As to the content, well, it may well be that he didn't make justice to his own work, but I believe I understand his every word, and he has a point."

Now he went on into the celebrated Oppenheimer lecture. In fact he was the fear of any lecturer in physics - after he had struggled for an hour explaining things, Oppenheimer would stand up and say, "Well, if I understand correctly, this is what you said." In ten minutes he would speak flawlessly - finished sentences and everything! At that time everybody woke up. They said, "That miserable lecturer, he was trying to say these things!" Well, then when he sat down, Von Neumann stood up and said, "Well, I too have some comments to make about Dr Mandelbrot's lecture. We've had a number of interesting discussions and it may well be that he didn't do much justice to his work." Well, it was an abominable lecture, and he went on in his style which was very different, explaining what he saw in my work and why it was interesting, why this and that. Well, needless to say, I was taken in time by my friends to the only place in Princeton that served beer at that time.

Next day I went to see Neumenbour. When I entered he said, "Oh, I'm very sorry. I made a fool of myself yesterday. Please excuse me," etc., etc. (I replied) "No, I am coming to thank you. It was you, in a way, that started the real lecture." Well, so you see what was happening was that Von Neumann - and Oppenheimer, I think - understood what I was trying to do, and Von Neumann wanted to encourage me very strongly.

Note added: In this Caltech oral history, Delbruck recalls that at one point he was studying Fisher's The Genetical Theory of Natural Selection, and mentioned this to a visitor from the Rockefeller Foundation (the sponsor of Delbruck's fellowship at the time). He was immediately offered the chance to work with Fisher, but decided to go to Caltech (as a postdoc; he was hired back as a professor some years later) instead. Also, see comments for corrections to Mandelbrot's story from a biologist.

Wednesday, August 15, 2012

Better to be lucky than good

Shorter Taleb (much of this was discussed in his first book, Fooled by Randomness):

Fat tails + nonlinear feedback means that the majority of successful traders were successful due to luck, not skill. It's painful to live in the shadow of such competitors.

What other fields are dominated by noisy feedback loops? See Success vs Ability, Nonlinearity and noisy outcomes, The illusion of skill and Fake alpha.

Why It is No Longer a Good Idea to Be in The Investment Industry

Nassim N. Taleb

Abstract: A spurious tail is the performance of a certain number of operators that is entirely caused by luck, what is called the “lucky fool” in Taleb (2001). Because of winner-take-all-effects (from globalization), spurious performance increases with time and explodes under fat tails in alarming proportions. An operator starting today, no matter his skill level, and ability to predict prices, will be outcompeted by the spurious tail. This paper shows the effect of powerlaw distributions on such spurious tail. The paradox is that increase in sample size magnifies the role of luck.

... The “spurious tail” is therefore the number of persons who rise to the top for no reasons other than mere luck, with subsequent rationalizations, analyses, explanations, and attributions. The performance in the “spurious tail” is only a matter of number of participants, the base population of those who tried. Assuming a symmetric market, if one has for base population 1 million persons with zero skills and ability to predict starting Year 1, there should be 500K spurious winners Year 2, 250K Year 3, 125K Year 4, etc. One can easily see that the size of the winning population in, say, Year 10 depends on the size of the base population Year 1; doubling the initial population would double the straight winners. Injecting skills in the form of better-than-random abilities to predict does not change the story by much.
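The halving sequence in this excerpt is easy to reproduce. The sketch below computes expected counts only, assuming each zero-skill operator independently has a 50 percent chance of "winning" each year (the symmetric market of the excerpt). The function name and the `p_win` parameter are my own, added so that a small better-than-random edge can be tried as well.

```python
# Expected size of the "spurious tail" year by year.
# Assumption: independent yearly win probability p_win (0.5 = symmetric market).
def spurious_winners(base_population, years, p_win=0.5):
    """Expected number of operators with an unbroken winning record.

    Index 0 is Year 1 (everyone who starts), index 1 is Year 2, and so on.
    """
    return [base_population * p_win ** year for year in range(years)]


counts = spurious_winners(1_000_000, 10)
for year, count in enumerate(counts, start=1):
    print(f"Year {year:2d}: {count:,.0f} operators with an unbroken winning record")

# Doubling the base population doubles the survivors in every year.
doubled = spurious_winners(2_000_000, 10)
```

The first few values reproduce the 500K, 250K, 125K sequence in the excerpt, and about 2,000 of the original million still have a perfect record in Year 10 on luck alone. Nudging `p_win` slightly above 0.5 lets one compare against the excerpt's claim that modest skill does not change the picture much.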

Because of scalability, the top, say 300, managers get the bulk of the allocations, with the lion’s share going to the top 30. So it is obvious that the winner-take-all effect causes distortions ...

Conclusions: The “fooled by randomness” effect grows under connectivity where everything on the planet flows to the “top x”, where x is becoming a smaller and smaller share of the top participants. Today, it is vastly more acute than in 2001, at the time of publication of (Taleb 2001). But what makes the problem more severe than anticipated, and causes it to grow even faster, is the effect of fat tails. For a population composed of 1 million track records, fat tails multiply the threshold of spurious returns by between 15 and 30 times.

Generalization: This condition affects any business in which prevail (1) some degree of fat-tailed randomness, and (2) winner-take-all effects in allocation.

To conclude, if you are starting a career, move away from investment management and performance related lotteries as you will be competing with a swelling future spurious tail. Pick a less commoditized business or a niche where there is a small number of direct competitors. Or, if you stay in trading, become a market-maker.

Bonus question: what are the ramifications for tax and economic policies (i.e., meant to ensure efficiency and just outcomes) of the observation that a particular industry is noise dominated?

Tuesday, August 14, 2012

Knightmare

I received this from a practitioner (former physicist) a while ago:

The best I have seen on the Knightmare is this: http://www.nanex.net/aqck2/3525.html

Also worth noting that the two big shots that made Knight what it is left for Citadel a year ago. Perhaps related, but perhaps not. What a debacle...

Earlier post: $440M in 45 minutes.

Monday, August 13, 2012

Chomsky on po-mo

Noam Chomsky stirs up some trouble. See also here.

A commenter provides another Chomsky quote:

"Since no one has succeeded in showing me what I'm missing, we're left with the second option: I'm just incapable of understanding. I'm certainly willing to grant that it may be true, though I'm afraid I'll have to remain suspicious, for what seem good reasons. There are lots of things I don't understand -- say, the latest debates over whether neutrinos have mass or the way that Fermat's last theorem was (apparently) proven recently. But from 50 years in this game, I have learned two things: (1) I can ask friends who work in these areas to explain it to me at a level that I can understand, and they can do so, without particular difficulty; (2) if I'm interested, I can proceed to learn more so that I will come to understand it. Now Derrida, Lacan, Lyotard, Kristeva, etc. --- even Foucault, whom I knew and liked, and who was somewhat different from the rest --- write things that I also don't understand, but (1) and (2) don't hold: no one who says they do understand can explain it to me and I haven't a clue as to how to proceed to overcome my failures. That leaves one of two possibilities: (a) some new advance in intellectual life has been made, perhaps some sudden genetic mutation, which has created a form of "theory" that is beyond quantum theory, topology, etc., in depth and profundity; or (b) ... I won't spell it out."

Sunday, August 12, 2012

On doping

All dopers? Or just a minority? I had friends in high school who took steroids. At the time, many coaches, doctors and "sports scientists" claimed the drugs didn't work (placebo effect, they said). But it was obvious that they did. I distinctly remember this as the point at which I became very suspicious of statements by medical and "scientific" authorities. If they were wrong about something as simple as this, what else could they be wrong about?

Der Spiegel: Angel Heredia, once a doping dealer and now a chief witness for the U.S. Justice Department, talks about the powerlessness of the investigators, the motives of athletes who cheat and the drugs of the future.

He had been in hiding under an assumed name in a hotel in Laredo, Texas, for two years when the FBI finally caught up with him. The agents wanted to know from Angel Heredia if he knew a coach by the name of Trevor Graham, whether he carried the nickname “Memo”, and what he knew about doping. "No", "no", "nothing" – those were his replies. But then the agents laid the transcripts of 160 wiretapped telephone conversations on the table, as well as the e-mails and the bank statements. That’s when Angel "Memo" Heredia knew that he had lost. He decided to cooperate, and he also knew that he would only have a chance if he didn’t lie – not a single time. “He’s telling the truth,” the investigators say about Heredia today.

SPIEGEL: Mr. Heredia, will you watch the 100 meter final in Beijing?

Heredia: Of course. But the winner will not be clean. Not even any of the contestants will be clean.

SPIEGEL: Of eight runners ... Heredia: ... eight will be doped. ...

Heredia: Yes. When the season ended in October, we waited for a couple of weeks for the body to cleanse itself. Then in November, we loaded growth hormone and epo, and twice a week we examined the body to make sure that no lumps were forming in the blood. Then we gave testosterone shots. This first program lasted eight to ten weeks, then we took a break.

SPIEGEL: And then the goals for the season were established?

Heredia: Yes, that depended on the athlete. Some wanted to run a good time in April to win contracts for the tournaments. Others focused on nothing but the trials, the U.S. qualification for international championships. Others cared only about the Olympics. Then we set the countdown for the goal in question, and the next cycle began. I had to know my athletes well and have an overview of what federation tested with which methods.

SPIEGEL: Where does one get this information?

Heredia: Vigilance. Informers. ...

SPIEGEL: What trainers have you worked together with? Heredia: Particularly with Trevor Graham.

SPIEGEL: Graham has a lifetime ban because he purportedly helped Marion Jones, Tim Montgomery, Justin Gatlin and many others to cheat. Who else?

Heredia: With Winthrop Graham, his cousin. With John Smith, Maurice Greene’s coach. With Raymond Stewart, the Jamaican. With Dennis Mitchell ...

SPIEGEL: ... who won gold in the 4 x 100 meters in 1992 and today is a coach. How did the collaboration work?

Heredia: It’s a small world. It gets around who can provide you with something how quickly and at what price, who is discreet. The coaches approached me and asked if I could help them, and I said: yes. Then they gave me money, $15,000 or thereabouts, we got a first shipment and then we did business. At some point it led to one-on-one cooperation with the athletes.

SPIEGEL: Was there a regimen of sorts?

Heredia: Yes. I always combined several things. For example, I had one substance called actovison that increased blood circulation – not detectable. That was good from a health standpoint and even better from a competitive standpoint. Then we had the growth factors IGF-1 and IGF-2. And epo. Epo increases the number of red blood cells and thus the transportation of oxygen, which is the key for every athlete: the athlete wants to recover quickly, keep the load at a constantly high level and achieve a constant performance.

SPIEGEL: Once again: a constant performance at the world-class level is unthinkable without doping?

Heredia: Correct. 400 meters in 44 seconds? Unthinkable. 71 meters with a discus? No way. You might be able to run 100 meters in 9.8 seconds once with a tailwind. But ten times a year under 10 seconds, in the rain or heat? Only with doping.

SPIEGEL: Testosterone, growth hormone, epo – that was your combination?

Heredia: Yes, with individual variations. And then amazing things are possible. In 2002 Jerome Young was ranked number 38 in the 400 meters. Then we began to work together, and in 2003 he won almost every big race.

SPIEGEL: How were you paid?

Heredia: I had an annual wage. For big wins I got a $40,000 bonus. ...

Thursday, August 09, 2012

Gell-Mann, Feynman, Everett

This site is a treasure trove of interesting video interviews -- including with Francis Crick, Freeman Dyson, Sydney Brenner, Marvin Minsky, Hans Bethe, Donald Knuth, and others. Many of the interviews have transcripts, which are much faster to read than listening to the interviews themselves.

Here's what Murray Gell-Mann has to say about quantum foundations:

In '63…'64 I worked on trying to understand quantum mechanics, and I brought in Felix Villars and for a while some comments... there were some comments by Dick Feynman who was nearby. And we all agreed on a rough understanding of quantum mechanics and the second law of thermodynamics and so on and so on, that was not really very different from what I'd been working on in the last ten or fifteen years.

I was not aware, and I don't think Felix was aware either, of the work of Everett when he was a graduate student at Princeton and worked on this, what some people have called 'many worlds' idea, suggested more or less by Wheeler. Apparently Everett was, as we learned at the Massagon [sic] meeting, Everett was an interesting person. He… it wasn't that he was passionately interested in quantum mechanics; he just liked to solve problems, and trying to improve the understanding of quantum mechanics was just one problem that he happened to look at. He spent most of the rest of his life working for the Weapon System Evaluation Group in Washington, WSEG, on military problems. Apparently he didn't care much as long as he could solve some interesting problems! [Some of these points, concerning Everett's life and motivations, and Wheeler's role in MW, are historically incorrect.]

Anyway, I didn't know about Everett's work so we discovered our interpretation independent of Everett. Now maybe Feynman knew about… about Everett's work and when he was commenting maybe he was drawing upon his knowledge of Everett, I have no idea, but… but certainly Felix and I didn't know about it, so we recreated something related to it.

Now, as interpreted by some people, Everett's work has two peculiar features: one is that this talk about many worlds and equally… many worlds equally real, which has confused a lot of people, including some very scholarly students of quantum mechanics. What does it mean, 'equally real'? It doesn't really have any useful meaning. What the people mean is that there are many histories of the… many alternative histories of the universe, many alternative coarse-grained, decoherent histories of the universe, and the theory treats them all on an equal footing, except for their probabilities. Now if that's what you mean by equally real, okay, but that's all it means; that the theory treats them on an equal footing apart from their probabilities. Which one actually happens in our experience, is a different matter and it's determined only probabilistically. Anyway, there's considerable continuity between the thoughts of '63-'64 and the thoughts that, and… and maybe earlier in the ‘60s, and the thoughts that Jim Hartle and I have had more recently, starting around '84-'85.

Indeed, Feynman was familiar with Everett's work -- see here and here.

Where Murray says "it's determined only probabilistically" I would say there is a subjective probability which describes how surprised one is to find oneself on a particular decoherent branch or history of the overall wavefunction -- i.e., how likely or unlikely we regard the outcomes we have observed to have been. For more see here.

Murray against Copenhagen:

... although the so-called Copenhagen interpretation is perfectly correct for all laboratory physics, laboratory experiments and so on, it's too special otherwise to be fundamental and it sort of strains credulity. It's… it’s not a convincing fundamental presentation, correct though… though it is, and as far as quantum cosmology is concerned it's hopeless. We were just saying, we were just quoting that old saw: describe the universe and give three examples. Well, to apply the… the Copenhagen interpretation to quantum cosmology, you'd need a physicist outside the universe making repeated experiments, preferably on multiple copies of the universe and so on and so on. It's absurd. Clearly there is a definition to things happening independent of human observers. So I think that as this point of view is perfected it should be included in… in teaching fairly early, so that students aren't convinced that in order to understand quantum mechanics deeply they have to swallow some of this…very… some of these things that are very difficult to believe. But in the end of course, one can use the Copenhagen interpretations perfectly okay for experiments.

The greatest of all time

Bolt becomes the first man to repeat in the 100m and 200m, holding the world and Olympic records in both. As I said in my earlier post, he is a 1 in a billion (or more) talent.

“If he wins, that should end the debate about who is the greatest sprinter in history,” Ato Boldon of Trinidad and Tobago, who won bronze medals in the 200 at the 1996 Atlanta Games and the 2000 Sydney Games, said before the race. “Anyone saying it is not Bolt would be doing it without objectivity.”

In other news, as I predicted years ago, former UO competitor Ashton Eaton is the Olympic champion and world record holder in the decathlon.

Wednesday, August 08, 2012

The path not taken

Parvez did a PhD in theoretical physics at Oregon and then joined my first startup. Subsequently he worked at Microsoft and then founded his own company, which was recently acquired (I was briefly on the board of advisors). We had a nice breakfast this morning, dwelling on both the past and future. Note the shiny new Porsche :-)

See A tale of two geeks, Path integrals.

Tuesday, August 07, 2012

Quantum correspondence

I've been corresponding with a German theoretical physicist ("R") recently about quantum mechanics and thought I would share some of it here.

[R] Dear Prof. Hsu: I enjoyed reading your recent, very clearly written paper On the origin of probability in quantum mechanics very much. I discussed its subject matter oftentimes with Hans-Dieter Zeh ... We both think that many worlds is an idea that is probably true in some sense.

[ME] I have corresponded with Dieter over the years and read most (all?) of his work in this area. I would say we do not really disagree about anything.

To me many worlds (MW) is very appealing and should really be considered the "minimal" interpretation of QM since I do not know of any other logically complete interpretations.

However, anyone who endorses MW should think very carefully about the origin of probability. Since MW is really a deterministic theory (at least from the viewpoint of a "global" observer not subject to decoherence), the only kind of probabilities it allows are subjective ones.

It is disturbing to me that most versions of me in the multiverse do not believe in the Born Rule (and probably then don't believe in QM!). MW proponents (e.g., Deutsch) would like to argue that, subjectively, I should not be "surprised" to be one of the few versions of me that see experimental verification of the Born Rule, but I am still uncomfortable about this. (The use of "most" above implies adopting a measure, and that is the root of all problems here.)

I hope this helps -- all I've done in the above paragraphs is recapitulate the paper you already read!
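The parenthetical about "most" can be made concrete with a toy calculation. The sketch below is only an illustration, not the calculation in the paper: it assumes a two-outcome measurement repeated n times, so that there are 2**n outcome sequences (branches), and it compares two ways of weighting them. The function name `branch_fractions` and the values n = 100, p = 0.9 and eps = 0.05 are hypothetical choices made for the example.

```python
from math import comb


def branch_fractions(n, p, eps):
    """Compare two ways of weighting the 2**n outcome sequences (branches).

    Returns (by_count, by_born): the share of branches whose observed
    frequency of outcome "1" lies within eps of p, first counting every
    branch equally, then weighting each branch by its Born probability.
    """
    # Numbers of "1" outcomes whose frequency k/n is close to p.
    near = [k for k in range(n + 1) if abs(k / n - p) <= eps]

    # Equal weight per branch: comb(n, k) branches have exactly k ones.
    by_count = sum(comb(n, k) for k in near) / 2 ** n

    # Born weight: each branch with k ones carries p**k * (1-p)**(n-k).
    by_born = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in near)
    return by_count, by_born


if __name__ == "__main__":
    by_count, by_born = branch_fractions(n=100, p=0.9, eps=0.05)
    print(f"equal weight per branch: {by_count:.3e}")
    print(f"Born-weighted:           {by_born:.3f}")
```

With equal weight per branch, only a tiny fraction of branches record frequencies near p, whereas the Born-weighted measure concentrates on exactly those branches. Which weighting one adopts is the choice of measure referred to above.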

[ME] The "subjective" nature of probability is because the theory is actually deterministic. (Einstein would have liked it, except for the many branches in the wavefunction.)

Let's suppose you live in a deterministic world and are about to flip a coin. You assign a probability to the outcome because you don't know what it will be. In secret, the outcome is already determined. To you, the process appears probabilistic, but really it is not. That is actually how MW works, but this is not widely appreciated. See esp. eqn 4 and figure in my paper.

Copenhagen is not logically complete because it does not explain how QM applies to the particles in the observer (which is always treated classically). Collapse theories have different physical predictions than MW because collapse is not unitary.

[R] Without going into the details, it seems absolutely clear to me that the main protagonists of Copenhagen, Heisenberg, Pauli, Bohr etc. did not believe that there is some explicit, QM-violating collapse mechanism. Do u agree?

[ME] I can't read the minds of the ancients. The only clear formulation is that of von Neumann, and there a measurement outcome requires collapse = non-unitary projection.

[R] A lack of free will is actually also the way out of Bell for Gerard ('t Hooft), and he convinced me that the idea is not so crazy at all. I don't know why this loophole got so little attention in Bell experiments. What is your take?

[ME] ... it is funny that everyone (physicists should know better) assumes a priori that we have free will. For example, the Free Will Theorem guys (admittedly, they are only mathematicians ;-) take it for granted.

... Strangely, not many people understand how MWI evades Bell without non-locality. There are a couple of papers on this but they are not well appreciated. Actually the result is kind of trivial.

... MW has no problem with Bell's inequality because MW reproduces [see footnote #] the experimental predictions of the CI (Conventional or Copenhagen or Collapse Interpretation). An experimenter in a MW universe will not observe violation of Bell's inequality, or of the GHZ prediction, etc.

Does this mean that MW avoids non-locality? That depends on what you mean by non-locality (I imagine this is relevant to your H-D anecdote). On the one hand the Hamiltonian is local and the evolution of Psi is deterministic, so from that perspective there is obviously nothing non-local going on: Psi(x,t) only affects Psi(x',t') if (x',t') is in the forward lightcone of (x,t). From other perspectives one can speak of "non-local correlations" or influences, but I find this to be simply creating mystery where there is none.

More succinctly, in a deterministic theory with a local evolution equation (Schrodinger equation with local Hamiltonian), there cannot be any non-locality. Just think about the wave equation.

# The exception is macroscopic interference experiments as proposed by Deutsch that can tell the difference between reversible (unitary) and irreversible (collapse) theories. But these experiments are not yet technically feasible.
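The "just think about the wave equation" remark can be illustrated with a toy sketch. The code below is an assumption-laden illustration rather than anything from the correspondence: it uses a discretized one-dimensional wave equation with a nearest-neighbor update rule (not the Schrodinger equation itself), and the names `step`, `evolve` and `influence_radius` are hypothetical. It perturbs one lattice site and checks how far the perturbation has spread after a given number of time steps.

```python
def step(prev, curr, c2=0.25):
    """One leapfrog update of the discrete wave equation.

    c2 is (c * dt / dx)**2; each site is updated using only itself and
    its two nearest neighbors, so the rule is local by construction.
    The two end sites are held fixed at zero.
    """
    n = len(curr)
    nxt = [0.0] * n
    for i in range(1, n - 1):
        laplacian = curr[i + 1] - 2.0 * curr[i] + curr[i - 1]
        nxt[i] = 2.0 * curr[i] - prev[i] + c2 * laplacian
    return nxt


def evolve(initial, steps, c2=0.25):
    """Evolve from rest (previous state equal to the initial state)."""
    prev, curr = list(initial), list(initial)
    for _ in range(steps):
        prev, curr = curr, step(prev, curr, c2)
    return curr


def influence_radius(size=101, source=50, steps=20):
    """Largest distance from `source` at which a unit bump changes the field."""
    base = [0.0] * size
    bumped = list(base)
    bumped[source] = 1.0
    a = evolve(base, steps)
    b = evolve(bumped, steps)
    changed = [i for i in range(size) if a[i] != b[i]]
    return max(abs(i - source) for i in changed)


if __name__ == "__main__":
    for steps in (5, 10, 20):
        print(steps, influence_radius(steps=steps))
```

In this toy model a change at one site cannot alter the field more than `steps` sites away after `steps` updates, which is the discrete analogue of influence being confined to the forward lightcone.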

[R] No sorry, I must think beyond "just the wave equation". I must think about "result of a measurement" when facing the Bell trouble.

[ME] The great beauty of decoherence and MW is that it takes the mystery out of "measurement" and shows it to simply result from the unitary evolution of the wavefunction. There is no mystery and, indeed, everything is governed by a causal wave-like equation (Schrodinger equation).

Rather than belabor this further I will refer you to more detailed treatments like the ones below:

The EPR paradox, Bell’s inequality, and the question of locality, Am. J. Phys. 78 (1), January 2010.

[Reference 36] Our explanation of the many-worlds interpretation branching in the text follows similar descriptions by Don N. Page, “The Einstein–Podolsky–Rosen physical reality is completely described by quantum mechanics,” Phys. Lett. A 91, 57–60 (1982); Michael Clive Price, “The Everett FAQ,” www.hedweb.com/manworld.htm; and C. Hewitt-Horsman and V. Vedral, “Entanglement without nonlocality,” Phys. Rev. A 76, 062319-1–8 (2007).

... As I said, "non-locality" must be defined carefully. Even standard QFT can appear "non-local" to the foolish (positrons go backwards in time!). Recall that MW is the most "realistic" of all QM interpretations -- Psi contains all information (including about what is happening in a given mind, the process of measurement, etc.), and Psi evolves entirely causally in spacetime. So any mystery about this is manufactured. In the papers linked to above you can track exactly what happens in an EPR/Bell experiment in MW and see that everything is local; but the result is trivial from the beginning if you grasp the points I made above.

Monday, August 06, 2012

Curiosity has landed

The crazy sky crane landing worked!

First images of Mars from the nuclear powered rover. Check out the laser armed robot's menacing shadow :-)

Geeks exuberant!

Landing in progress, captured by MRO (Mars Reconnaissance Orbiter).

Sunday, August 05, 2012

Bolt, again!

From 2008: Phelps, shmelps -- Bolt is the man!

That was a pretty impressive final: 9.63, 9.75, 9.79, 9.80

Phelps may be a 1 in 100 million talent (maybe not), but Bolt is 1 in a billion and possibly 1 in 10 billion.

Friday, August 03, 2012

Correlation, Causation and Personality

A new paper from my collaborator James Lee. Ungated copy here, including commentary from other researchers including Judea Pearl.

Correlation and Causation in the Study of Personality

Abstract: Personality psychology aims to explain the causes and the consequences of variation in behavioural traits. Because of the observational nature of the pertinent data, this endeavour has provoked many controversies. In recent years, the computer scientist Judea Pearl has used a graphical approach to extend the innovations in causal inference developed by Ronald Fisher and Sewall Wright. Besides shedding much light on the philosophical notion of causality itself, this graphical framework now contains many powerful concepts of relevance to the controversies just mentioned. In this article, some of these concepts are applied to areas of personality research where questions of causation arise, including the analysis of observational data and the genetic sources of individual differences.

From the conclusions:

... This article is in part an effort to unify the contributions of three innovators in causal reasoning: Ronald Fisher, Sewall Wright, and Judea Pearl.

Fisher began his career at a time when the distinction between correlation and causation was poorly understood and indeed scorned by leading intellectuals. Nevertheless, he persisted in valuing this distinction. This led to his insight that randomization of the putative cause—whether by the deliberate introduction of ‘error’, as his biologist colleagues thought of it, or ‘beautifully . . . by the meiotic process’—in fact reveals more than it obscures. His subsequent introduction of the average excess and average effect is perhaps the first explicit use of the distinction between correlation and causation in any formal scientific theory.

Structural equation modelers will know Wright—Fisher’s great rival in population genetics—as the ingenious inventor of path analysis. Wright’s diagrammatic approach to cause and effect serves as a conceptual bridge toward Pearl’s graphical formalization, which has greatly extended the innovations developed by both of the population-genetic pioneers.

The fruitfulness of Pearl’s graphical framework when applied to the problems discussed in this article bear out its utility to personality psychology. Perhaps the most surprising instance of the theory’s fruitfulness concerns the role of colliders. Although obscure before Pearl’s seminal work, this role turns out to be obvious in retrospect and a great aid to the understanding of covariate choice, assortative mating, selection bias, and a myriad of other seemingly unrelated problems. This article has surely only scratched the surface of the ramifications following from our recognition of colliders.

Conspicuous from these accolades by his absence is Charles Spearman—the inventor of factor analysis and thereby a founder of personality psychology. Spearman (1927) did conceive of his g factor as a hidden causal force. However, new and brilliant ideas are often only partially understood, even by their authors. After a century of theoretical scrutiny and empirical applications, common factors appear to be more plausibly defended as mild formalizations of folk-psychological terms than as causal forces uncovered by matrix algebra. I have thus advocated a sharp distinction between the measurement of personality traits (factor analysis) and the study of their causal relations (graphical SEM).

... The puzzle is that by using common factors in our causal explanations, we seem to be retreating from this reductionistic approach. A single node called g sending an arrow to a single node called liberalism is surely an approximation to the true and extraordinarily more complicated graph entangling the various physical mechanisms that underlie mental characteristics. Why this compromise? Is it sensible to test models of ethereal emergent properties shoving and being shoved by corporeal bits of matter—or, perhaps even worse, by other emergent properties? If we are committing to a calculus of causation, should we not also discard the convenient fictions of folk psychology?

The answer to this puzzle may be that reductionistic decomposition is not always the royal road to scientific understanding. ... [[In physics we refer to "effective descriptions" or "effective degrees of freedom" appropriate to a particular scale of dynamics or organization -- no need to invoke quarks to explain the mechanics of a baseball.]]

From the author's response to commentary:

... It was the genius of Darwin to realize the power of explanation (4): phenotypes and environments cohere in such an uncanny way because nature is a statistician who has allowed only a subset of the logically possible combinations to persist over time.

Although phenotypes are what nature selects, it cannot be phenotypes alone that preserve the record of natural selection. Phenotypes typically lack the property that variations in them are replicated with high fidelity across an indefinite number of generations. DNA, however, does have this property— hence the memorable phrase “the immortal replicator” (Dawkins, 1976). If DNA is furthermore causally efficacious, such that the possession of one variant rather than another has phenotypic consequences that are reasonably robust, then we have the potential for natural selection to bring about a lasting correlation between environmental demands and the causes of adaptation to those very same demands.

When statistically controlling fitness, nature does not actually use the [causal] average effect of any allele. If an allele has a positive average excess in [is correlated with] fitness, for any reason whatsoever, it will tend to displace its alternatives. Nevertheless, it seems to be the case that nature correctly picks out alleles for their effects often enough; the results are evident in the living world all around us. Davey Smith and I are confident that where nature has succeeded, patient and ingenious human scientists will be able to follow.

For more on Judea Pearl's work, see the earlier post: Beyond Bayes: causality vs correlation.
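The role of colliders mentioned in the conclusions above can be seen in a small simulation. This is a minimal sketch under stated assumptions, not an analysis from the paper: two traits are drawn independently, and a sample is then selected on their sum (the collider). The names `pearson` and `collider_demo`, the threshold of 1.0, the sample size and the seed are all hypothetical choices made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def collider_demo(n=20000, threshold=1.0, seed=0):
    """Return (correlation in everyone, correlation in the selected group).

    Traits a and b are independent standard normals. Selection depends on
    a + b, a common effect of both, so conditioning on it induces a
    correlation between a and b that is absent in the full population.
    """
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]

    selected = [(x, y) for x, y in zip(a, b) if x + y > threshold]
    sel_a = [x for x, _ in selected]
    sel_b = [y for _, y in selected]
    return pearson(a, b), pearson(sel_a, sel_b)


if __name__ == "__main__":
    r_all, r_selected = collider_demo()
    print(f"full population: r = {r_all:+.3f}")
    print(f"selected group:  r = {r_selected:+.3f}")
```

In the full population the two traits are essentially uncorrelated, while within the selected group they are negatively correlated. This is the mechanism behind selection bias, and a reason to be careful about which covariates one conditions on.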

Thursday, August 02, 2012

$440M in 45 minutes

Does this qualify as the most expensive software bug of all time? Raises concerns about our future as passengers in driverless vehicles ;-)

I suppose it's a positive that Knight had to recognize the losses immediately, instead of sweeping them under the rug by adjusting a parameter in a risk model (see, e.g., JP Morgan whale + a million other recent examples). Would Knight have lost even more money if the exchange hadn't shut down trading in the affected names?

NYTimes: $10 million a minute.

That’s about how much the trading problem that set off turmoil on the stock market on Wednesday morning is already costing the trading firm.

The Knight Capital Group announced on Thursday that it lost $440 million when it sold all the stocks it accidentally bought Wednesday morning because of a computer glitch. ...

The problem on Wednesday led the firm’s computers to rapidly buy and sell millions of shares in over a hundred stocks for about 45 minutes after the markets opened. Those trades pushed the value of many stocks up, and the company’s losses appear to have occurred when it had to sell the overvalued shares back into the market at a lower price.

The company said the problems happened because of new trading software that had been installed. The event was the latest to draw attention to the potentially destabilizing effect of the computerized trading that has increasingly dominated the nation’s stock markets.

This says it all. Previous posts on high frequency trading.

Update: My representative is on the job!

NYTimes: ... Some critics of the current market structure have said that much bolder reform is needed. One change that has been contemplated is a financial transaction tax, which would force firms to pay a small levy on each trade. At the right level, this could pare back high-frequency trading without undermining other types, supporters say.

“It would benefit investors because there would be less volatility in the market,” said Representative Peter DeFazio, a Democrat of Oregon. He introduced a bill containing a financial transaction tax last year.

Opponents of such a levy say that it could hurt the markets and even make it more expensive for companies to raise capital.

“I would be very concerned about unintended consequences,” said Mr. Sauter.

But Representative DeFazio, who favors a levy of three-hundredths of a percentage point on each trade, says he thinks the benefits of high-frequency trading are overstated. “Some people say it’s necessary for liquidity, but somehow we built the strongest industrial nation on earth without algorithmic trading,” he said.
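To get a sense of scale for a levy of three-hundredths of a percentage point, here is a rough arithmetic sketch. It assumes the rate applies to the dollar value of each trade, which the article does not spell out, and the trade sizes and trade counts are hypothetical numbers chosen only for illustration.

```python
# Three-hundredths of a percentage point, expressed as a fraction.
LEVY_RATE = 0.03 / 100


def levy(notional_usd, rate=LEVY_RATE):
    """Levy on one trade, assuming it is charged on the trade's dollar value."""
    return notional_usd * rate


def annual_levy(notional_usd, trades_per_day, trading_days=252, rate=LEVY_RATE):
    """Total yearly levy for a trader making identical trades every day."""
    return levy(notional_usd, rate) * trades_per_day * trading_days


if __name__ == "__main__":
    # A single large trade (hypothetical size).
    print(f"levy on a $1,000,000 trade: ${levy(1_000_000):,.2f}")

    # An occasional investor versus a very active trader (hypothetical counts).
    print(f"2 trades/day of $10,000:     ${annual_levy(10_000, 2):,.2f} per year")
    print(f"1,000 trades/day of $10,000: ${annual_levy(10_000, 1_000):,.2f} per year")
```

Under these assumptions the levy is small on any single trade and scales with the number of trades, which is why supporters expect it to weigh mainly on high-frequency strategies.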
