Last week we had Jiawei Han of UIUC here to give a talk: Exploring the Power of Links in Information Network Mining. He's the author of a well-known book on data mining.
During our conversation we discussed a number of projects his group has worked on in the past, all of which involve teasing out the structure in large bodies of data. Being a lazy theorist, my attitude in the past about data mining has been as follows: sit and think about the problem, come up with a list of potential signals, then analyze the data to see which signals actually work. The point is that the good signals would turn out to be a subset (or possibly a combination) of the ones you could think of a priori -- i.e., those for which there is a plausible, human-comprehensible reason.
In many of the examples we discussed I was able to guess the main signals that turned out to be useful. However, Han impressed on me that, these days, with gigantic corpora of data available, one often encounters very subtle signals that are identified only by algorithm -- that human intuition completely fails to identify. (Gee, why that weird linear combination of those inputs, with alternating signs, even?! :-)
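To make the "weird linear combination" point concrete, here is a toy sketch in Python. It is not anything from Han's group; the data, the names (`x1`, `x2`, `fit_two_weights`), and the setup are all made up for illustration. Two inputs share a large common component, so each one looks nearly useless on its own, yet a least-squares fit recovers weights of roughly +1 and -1, and the resulting difference tracks the target almost perfectly.

```python
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def fit_two_weights(x1, x2, y):
    """Least-squares fit of y ~ w1*x1 + w2*x2 (no intercept),
    solved from the 2x2 normal equations by Cramer's rule."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s2y * s11 - s1y * s12) / det
    return w1, w2


# Synthetic data (an assumption for illustration, not real measurements):
# both inputs are dominated by a shared component; the target depends
# only on their small difference, plus a little noise.
rng = random.Random(0)
n = 2000
common = [rng.gauss(0, 10) for _ in range(n)]
x1 = [c + rng.gauss(0, 1) for c in common]
x2 = [c + rng.gauss(0, 1) for c in common]
y = [a - b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]

# Screening each input on its own: both look like weak signals.
print("corr(x1, y):", pearson(x1, y))
print("corr(x2, y):", pearson(x2, y))

# Letting the fit choose the combination: weights with opposite signs.
w1, w2 = fit_two_weights(x1, x2, y)
combo = [w1 * a + w2 * b for a, b in zip(x1, x2)]
print("fitted weights:", w1, w2)
print("corr(combo, y):", pearson(combo, y))
```

With only two inputs a human might still guess "try the difference"; the point of the anecdote is that with hundreds of inputs the analogous combination is not something one would write down a priori.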
Our conversation made me want to get my hands dirty on some big data mining project. Of course, it's much easier for him -- his group has something like ten graduate students at a time! Interestingly, he identified this ability to tap into large chunks of manpower as an advantage of being in academia as opposed to, e.g., at Microsoft Research. Of course, if you are doing very commercially applicable research you can access even greater resources at a company lab/startup, but for blue sky academic work it wouldn't be the case.
Sunday, May 11, 2008