Monday, August 20, 2012

Genomic prediction: no bull

This Atlantic article discusses the application of genomic prediction to cattle breeding. Breeders have recently started switching from pedigree-based methods to statistical models incorporating SNP genotypes. We can now make good predictions of phenotypes like milk and meat production using genetic data alone. Cows are easier than people because, as domesticated animals, they have a smaller effective breeding population and less genetic diversity. Nevertheless, I expect very similar methods to be applied to humans within the next 5-10 years.
The Atlantic: ... the semen that Badger-Bluff Fanny Freddie produces has become such a hot commodity in what one artificial-insemination company calls "today's fast paced cattle semen market." In January of 2009, before he had a single daughter producing milk, the United States Department of Agriculture took a look at his lineage and more than 50,000 markers on his genome and declared him the best bull in the land. And, three years and 346 milk- and data-providing daughters later, it turns out that they were right. "When Freddie [as he is known] had no daughter records our equations predicted from his DNA that he would be the best bull," USDA research geneticist Paul VanRaden emailed me with a detectable hint of pride. "Now he is the best progeny tested bull (as predicted)." 
Data-driven predictions are responsible for a massive transformation of America's dairy cows. While other industries are just catching on to this whole "big data" thing, the animal sciences -- and dairy breeding in particular -- have been using large amounts of data since long before VanRaden was calculating the outsized genetic impact of the most sought-after bulls with a pencil and paper in the 1980s. 
Dairy breeding is perfect for quantitative analysis. Pedigree records have been assiduously kept; relatively easy artificial insemination has helped centralize genetic information in a small number of key bulls since the 1960s; there are a relatively small and easily measurable number of traits -- milk production, fat in the milk, protein in the milk, longevity, udder quality -- that breeders want to optimize; each cow works for three or four years, which means that farmers invest thousands of dollars into each animal, so it's worth it to get the best semen money can buy. The economics push breeders to use the genetics. 
The bull market (heh) can be reduced to one key statistic, lifetime net merit, though there are many nuances that the single number cannot capture. Net merit denotes the likely additive value of a bull's genetics. The number is actually denominated in dollars because it is an estimate of how much a bull's genetic material will likely improve the revenue from a given cow. A very complicated equation weights all of the factors that go into dairy breeding and -- voila -- you come out with this single number. For example, a bull that could help a cow make an extra 1000 pounds of milk over her lifetime only gets an increase of $1 in net merit while a bull who will help that same cow produce a pound more protein will get $3.41 more in net merit. An increase of a single month of predicted productive life yields $35 more. 
When you add it all up, Badger-Bluff Fanny Freddie has a net merit of $792. No other proven sire ranks above $750 and only seven bulls in the country rank above $700.
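To make the arithmetic concrete, here is a minimal sketch of net merit as a weighted sum of trait gains. It uses only the three dollar weights quoted above; the real index weights many more traits, and the function name, dictionary keys, and example gains are hypothetical choices for illustration.

```python
# Dollar weights quoted in the article. The actual net merit index
# combines many more traits; these three are the only ones given here.
WEIGHTS_USD = {
    "milk_lb": 1.00 / 1000,           # $1 per extra 1000 lb of lifetime milk
    "protein_lb": 3.41,               # $3.41 per extra lb of protein
    "productive_life_months": 35.00,  # $35 per extra month of productive life
}


def net_merit_change(trait_gains):
    """Return the dollar change in net merit for the given trait gains.

    trait_gains maps a trait name (a key of WEIGHTS_USD) to the predicted
    gain in that trait's units.
    """
    unknown = set(trait_gains) - set(WEIGHTS_USD)
    if unknown:
        raise KeyError(f"no weight available for: {sorted(unknown)}")
    return sum(WEIGHTS_USD[trait] * gain for trait, gain in trait_gains.items())


# Hypothetical bull: +500 lb milk, +10 lb protein, +2 months productive life.
example = {"milk_lb": 500, "protein_lb": 10, "productive_life_months": 2}
print(f"${net_merit_change(example):.2f}")
```

With these weights, productive life and protein dominate: the two extra months contribute $70 and the ten pounds of protein $34.10, while 500 pounds of milk add only $0.50.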
See below -- theoretical calculations suggest that even outliers with net merit of $700-800 will be eclipsed by specimens with 10x higher merit that can be produced by further selection on existing genetic variation. Similar results apply to humans.
... It turned out they were in the perfect spot to look for statistical rules. They had databases of old and new bull semen. They had old and new production data. In essence, it wasn't that difficult to generate rules for transforming genomic data into real-world predictions. Despite -- or because of -- the effectiveness of traditional breeding techniques, molecular biology has been applied in the field for years in different ways. Given that breeders were trying to discover bulls' hidden genetic profiles by evaluating the traits in their offspring that could be measured, it just made sense to start generating direct data about the animals' genomes. 
"Each of the bulls on the sire list, we have 50,000 genetic markers. Most of those, we have 700,000," the USDA's VanRaden said. "Every month we get another 12,000 new calves, the DNA readings come in and we send the predictions out. We have a total of 200,000 animals with DNA analysis. That's why it's been so easy. We had such a good phenotype file and we had DNA stored on all these bulls."
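The article does not spell out the USDA's equations, so the following is only a sketch of one standard approach to this kind of prediction: ridge regression of a phenotype on SNP allele counts, with one additive effect per marker. The data are simulated, and the function names, sample sizes, and penalty value are assumptions made for illustration. It requires NumPy.

```python
import numpy as np


def fit_snp_ridge(genotypes, phenotypes, ridge_penalty=1.0):
    """Estimate one additive effect per SNP by ridge regression.

    genotypes: (n_animals, n_snps) array of allele counts coded 0, 1, 2.
    phenotypes: (n_animals,) array of trait measurements.
    Returns (snp_means, phenotype_mean, effects) for use in predict().
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    snp_means = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - snp_means  # center each SNP column
    # Solve (Xc'Xc + penalty * I) effects = Xc'(y - mean(y)).
    lhs = Xc.T @ Xc + ridge_penalty * np.eye(X.shape[1])
    effects = np.linalg.solve(lhs, Xc.T @ (y - y_mean))
    return snp_means, y_mean, effects


def predict(genotypes, snp_means, y_mean, effects):
    """Predict phenotypes from genotypes alone, using fitted SNP effects."""
    X = np.asarray(genotypes, dtype=float)
    return y_mean + (X - snp_means) @ effects


# Simulated toy data (not real cattle data): 50 SNPs with additive effects.
rng = np.random.default_rng(0)
n_train, n_test, n_snps = 200, 50, 50
true_effects = rng.normal(0.0, 1.0, n_snps)
geno = rng.integers(0, 3, size=(n_train + n_test, n_snps))
pheno = geno @ true_effects + rng.normal(0.0, 1.0, n_train + n_test)

# Fit on animals with records, then predict animals from DNA alone.
model = fit_snp_ridge(geno[:n_train], pheno[:n_train])
predicted = predict(geno[n_train:], *model)
accuracy = np.corrcoef(predicted, pheno[n_train:])[0, 1]
print(f"correlation between predicted and observed: {accuracy:.2f}")
```

The held-out animals play the role of young bulls with no daughter records: their predictions come from genotypes and from effects estimated on animals that do have phenotypes. Real evaluations involve far more markers than animals, which is where the penalty term becomes essential.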
... Nowadays breeders can choose between "genomic bulls," which have been evaluated based purely on their genes and "proven bulls," for which real-world data is available. Discussions among dairy breeders show that many are beginning to mix in younger bulls with good-looking genomic data into the breeding regimens. How well has it gone? The first of the bulls who were bred from their genetic profiles alone are receiving their initial production data. So far, it seems as if the genomic estimates were a little high, but more accurate than traditional methods alone.
The unique dataset and success of dairy breeders now has other scientists sniffing around their findings. Leonid Kruglyak, a genomics professor at Princeton, told me that "a lot of the statistical techniques and methodology" that connect phenotype and genotype were developed by animal breeders. In a sense, they are like codebreakers. If you know the rules of encoding, it's not difficult to put information in one end and have it pop out the other as a code. But if you're starting with the code, that's a brutally difficult problem. And it's the one that dairy geneticists have been working on.
(Kruglyak was a graduate student in biophysics at Berkeley under Bill Bialek when I was there.)
... John Cole, yet another USDA animal improvement scientist, generated an estimate of the perfect bull by choosing the optimal observed genetic sequences and hypothetically combining them. He found that the optimal bull would have a net merit value of $7,515, which absolutely blows any current bull out of the water. In other words, we're nowhere near creating the perfect milk machine.  
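The idea behind that estimate can be sketched in a few lines: split the genome into segments, take the best value observed anywhere in the population for each segment, and add them up. The version below assumes segment values are purely additive and uses made-up numbers for three hypothetical animals; it is not Cole's actual procedure or data.

```python
def best_observed_total(animals):
    """Highest total value among real animals (sum over their own segments)."""
    return max(sum(segments) for segments in animals.values())


def best_composite_total(animals):
    """Total for a hypothetical animal built from the best value at each segment.

    Assumes every animal has the same number of segments and that segment
    values simply add up, with no interactions between segments.
    """
    per_segment = zip(*animals.values())  # regroup values segment by segment
    return sum(max(values) for values in per_segment)


# Toy dollar values per genome segment for three hypothetical animals.
animals = {
    "A": [10, -5, 20],
    "B": [-2, 15, 5],
    "C": [8, 3, -1],
}

print(best_observed_total(animals))   # best real animal
print(best_composite_total(animals))  # ideal combination of observed segments
```

Here the best real animal totals 25 while the composite reaches 45, because no single animal carries the top value at every segment. The same logic, applied across many segments and many animals, is why the selection limit can sit far above the current best bull.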
Here's a recent paper on the big data aspects of genomic selection applied to animal breeding.


Richard Seiter said...

Very interesting (I need to read this more carefully). It is a bit entertaining that they give the bull optimal net merit estimate to four significant digits.

5371 said...

So what are the human equivalents of milk, fat, protein and udder quality?

Adam Sears said...

This was fascinating... I wonder what an Amazon Engineer would say about their techniques? Is there industry crossover with Hadoop et al.?

MtMoru said...

Again I will wail on a dead horse:

Those cattle strains producing the most meat or milk in the US or UK, will they produce the most in Chihuahua, the Great Karoo, or Jamaica?

botti said...

In terms of genetic engineering an Oxford ethics Professor has an article in Readers Digest suggesting there is a moral duty to do so (if the technology permits).

Most respondents to this Daily Telegraph poll are against the idea of screening for behavioral traits.

stevesailer said...

How does thoroughbred breeding compare for scientific sophistication? Last I checked it seemed like horse breeders still had a lot of rules of thumb rather than data-driven models. Thoroughbreds don't seem to be getting any faster. Michael Phelps is a lot faster than Mark Spitz, but current thoroughbreds aren't faster than Secretariat.

Richard Seiter said...

Good question. Here is a paper that does some analysis of the difference you are observing:
