Saturday, August 31, 2013

"Hadoop wildly overhyped" and other database arcana

Michael Stonebraker gives a talk on the current state of database architecture and technology, entitled The Traditional RDBMS Wisdom is (Almost Certainly) All Wrong. This field was pretty dead for a long time, but big changes are afoot now. Readers might be familiar with StreamBase, one of his startups.

A friend summarizes the talk as follows:
1. Relational databases are going through large transformations. The threat comes not just from the non-relational NoSQL databases: the internal model that Oracle and MySQL themselves use is breaking apart and needs to be rewritten.

2. In large companies, the current relational model has effectively already split into two solutions. Huge datasets for after-the-fact statistical analysis (e.g., Walmart analyzing a recent sale) have already moved over to column-grouped data (as opposed to row records), which lets you stream through a given column of data quickly. Real-time data, on the other hand, is kept to the terabyte range, and the whole database is loaded into memory. Although this speeds things up significantly, the next bottlenecks appear around thread locking and similar overheads. Fixing this is an area of current research, but the gains are ultimately orders of magnitude over the old relational database model, so this will probably all change soon.

3. The speaker described Hadoop as being terrible at everything except embarrassingly parallel multi-machine computations. He mentioned that Google itself doesn't use MapReduce anymore and is probably amused at all the attention that it is getting in the world.
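
To make point 2 a little more concrete, here is a minimal sketch of why column-grouped data helps analytical scans. It is not taken from the talk; the sales records, field names, and helper functions are all made up for illustration, and a real column store adds compression, vectorized execution, and on-disk layout that this toy ignores.

```python
from array import array

# Hypothetical sales records; the field names are assumptions for illustration.
rows = [
    {"store": 1, "item": "tv", "amount": 300.0},
    {"store": 2, "item": "radio", "amount": 45.5},
    {"store": 1, "item": "phone", "amount": 120.0},
]


def total_row_store(records):
    # Row layout: every record is stored together, so totalling one
    # field still walks through each whole record.
    return sum(record["amount"] for record in records)


# Column layout: each field is stored contiguously on its own.
columns = {
    "store": array("i", (r["store"] for r in rows)),
    "item": [r["item"] for r in rows],
    "amount": array("d", (r["amount"] for r in rows)),
}


def total_column_store(cols):
    # Only the "amount" column is read; the other columns are never touched.
    return sum(cols["amount"])


print(total_row_store(rows), total_column_store(columns))
```

Both functions return the same total, but the column version only reads one tightly packed array, which is roughly the access pattern that makes streaming through a single column fast.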

There were some other interesting points in the talk, but the main takeaway is that we are probably in for some huge changes in the next decade.

4 comments:

David Coughlin said...

I frequently ask myself, "How brittle is it going to be if I lock it all in RAM?"

sudheer 1414 said...

Thanks for providing the best information, it's very useful for HADOOP learners. 123trainings also provide the best Hadoop Online Training; you can see a free demo: Hadoop Online Training Demo in Hyderabad, India

oregonlocal said...

The purpose of an RDBMS is to allow multiple processes/persons to perform set operations on data simultaneously. This need is not going to disappear. Also, most non-relational physical implementations could stand a good dose of normalization at the logical level before they are implemented.

Sundara Rami Reddy said...

Hi, it's a really nice post on Hadoop. I appreciate your post. Thanks for sharing it with us. Keep it up.

Hadoop Training in Hyderabad
