Physicist, Startup Founder, Blogger, Dad

Thursday, August 11, 2011

The rise of data science

See also this follow-up article from O'Reilly Radar, and the earlier post Exuberant geeks.

What is data science: ... Data science requires skills ranging from traditional computer science to mathematics to art. Describing the data science group he put together at Facebook (possibly the first data science group at a consumer-oriented web property), Jeff Hammerbacher said:

"... on any given day, a team member could author a multistage processing pipeline in Python, design a hypothesis test, perform a regression analysis over data samples with R, design and implement an algorithm for some data-intensive product or service in Hadoop, or communicate the results of our analyses to other members of the organization."

Where do you find the people this versatile? According to DJ Patil, chief scientist at LinkedIn (@dpatil), the best data scientists tend to be "hard scientists," particularly physicists, rather than computer science majors. Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data. They have to think about the big picture, the big problem. When you've just spent a lot of grant money generating data, you can't just throw the data out if it isn't as clean as you'd like. You have to make it tell its story. You need some creativity for when the story the data is telling isn't what you think it's telling.

... Entrepreneurship is another piece of the puzzle. Patil's first flippant answer to "what kind of person are you looking for when you hire a data scientist?" was "someone you would start a company with." That's an important insight: we're entering the era of products that are built on data. We don't yet know what those products are, but we do know that the winners will be the people, and the companies, that find those products. Hilary Mason came to the same conclusion. Her job as scientist at bit.ly is really to investigate the data that bit.ly is generating, and find out how to build interesting products from it. No one in the nascent data industry is trying to build the 2012 Nissan Stanza or Office 2015; they're all trying to find new products. In addition to being physicists, mathematicians, programmers, and artists, they're entrepreneurs.

Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: "here's a lot of data, what can you make from it?"

The future belongs to the companies who figure out how to collect and use data successfully. Google, Amazon, Facebook, and LinkedIn have all tapped into their datastreams and made that the core of their success. They were the vanguard, but newer companies like bit.ly are following their path. Whether it's mining your personal biology, building maps from the shared experience of millions of travellers, or studying the URLs that people pass to others, the next generation of successful businesses will be built around data.

Here is a nice talk on machine learning and data science by Hilary Mason of bit.ly. One of my students will be working with her starting in the fall.


bigdata said...

You mean Data Alchemy?

5371 said...

Another depressing glance at an economy focused on pimping rather than producing.

LondonYoung said...

5371 - would you like to figure out how to produce cars like Mercedes and BMW in the USA? Please help ... Or plastic toys at prices competitive with China? We are all ears ....

MtMoru said...

"Search results are an obvious example"

But it's paid for with advertising.

MtMoru said...

"We are all ears ...."

Is that the royal we?

One of the costs of automation is unemployment or pseudo-employment. More Americans are employed in hospitality and tourism than in manufacturing. Tasteless LY cheers.

Neo-liberal ideology insists that everyone "contribute". The result is 20% of the labor force produces and 80% steals.

Jake said...

Took the words right out of my mouth. The whole time I was reading this I was thinking about Steve Jobs's old, circa-1997 WWDC talk where he's talking about how much more progress there is to be made in tech product development even without resorting to exotic R&D. 

5371 said...

The economy isn't like physical reality. When you exchange your classical theory for a quantum theory, the universe doesn't change. But when a society adopts a concept of value that is completely detached from the production of commodities by means of commodities, that directly influences its economic organisation.

edwin kim said...

I cannot help noticing that Jeff Hammerbacher was actually a math major, not a physics major, at Harvard.
