Aaron Swartz: I first met Jimbo Wales, the face of Wikipedia, when he came to speak at Stanford. Wales told us about Wikipedia’s history, technology, and culture, but one thing he said stands out. “The idea that a lot of people have of Wikipedia,” he noted, “is that it’s some emergent phenomenon — the wisdom of mobs, swarm intelligence, that sort of thing — thousands and thousands of individual users each adding a little bit of content and out of this emerges a coherent body of work.”† But, he insisted, the truth was rather different: Wikipedia was actually written by “a community … a dedicated group of a few hundred volunteers” where “I know all of them and they all know each other”. Really, “it’s much like any traditional organization.”
The difference, of course, is crucial. Not just for the public, who wants to know how a grand thing like Wikipedia actually gets written, but also for Wales, who wants to know how to run the site. “For me this is really important, because I spend a lot of time listening to those four or five hundred and if … those people were just a bunch of people talking … maybe I can just safely ignore them when setting policy” and instead worry about “the million people writing a sentence each”.
So did the Gang of 500 actually write Wikipedia? Wales decided to run a simple study to find out: he counted who made the most edits to the site. “I expected to find something like an 80-20 rule: 80% of the work being done by 20% of the users, just because that seems to come up a lot. But it’s actually much, much tighter than that: it turns out over 50% of all the edits are done by just .7% of the users … 524 people. … And in fact the most active 2%, which is 1400 people, have done 73.4% of all the edits.” The remaining 25% of edits, he said, were from “people who [are] contributing … a minor change of a fact or a minor spelling fix … or something like that.” ...
[But what if we analyze the amount of text contributed by each person, not just the number of edits? See original for analysis of edit patterns of specific articles, including amount of text added.]
... When you put it all together, the story becomes clear: an outsider makes one edit to add a chunk of information, then insiders make several edits tweaking and reformatting it. In addition, insiders rack up thousands of edits doing things like changing the name of a category across the entire site, the kind of thing only insiders deeply care about. As a result, insiders account for the vast majority of the edits. But it’s the outsiders who provide nearly all of the content.
And when you think about it, this makes perfect sense. Writing an encyclopedia is hard. To do anywhere near a decent job, you have to know a great deal of information about an incredibly wide variety of subjects. Writing so much text is difficult, but doing all the background research seems impossible.
On the other hand, everyone has a bunch of obscure things that, for one reason or another, they’ve come to know well. So they share them, clicking the edit link and adding a paragraph or two to Wikipedia. At the same time, a small number of people have become particularly involved in Wikipedia itself, learning its policies and special syntax, and spending their time tweaking the contributions of everybody else.
Pessimism of the Intellect, Optimism of the Will Favorite posts | Manifold podcast | Twitter: @hsu_steve
Thursday, January 07, 2010
Wikipedia: emergent phenomenon?
Is Wikipedia a magical aggregator and filter of expertise from millions of different contributors? Or is it more like traditional encyclopedia projects, with a thousand or so core Wikipedians doing most of the work? The distribution of edits (a typical power law) supports the latter interpretation, but a detailed analysis of particular articles shows that important knowledge is injected by individuals who are not part of the core group.
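As a rough illustration of why the two measurements can disagree, here is a minimal Python sketch on a made-up edit log. The editor names, the `EDITS` list, and the character counts are all hypothetical; they are not Swartz's data or Wikipedia's, and counting characters added is only one possible proxy for "content".

```python
from collections import defaultdict

# Hypothetical toy edit log: (editor, characters_added). Not real Wikipedia data.
EDITS = [
    ("insider_a", 5), ("insider_a", 3), ("insider_a", 0), ("insider_a", 2),
    ("insider_b", 4), ("insider_b", 1), ("insider_b", 0),
    ("outsider_1", 400),
    ("outsider_2", 350),
    ("outsider_3", 250),
]


def shares(edits):
    """Return {editor: (share_of_edits, share_of_characters_added)}."""
    edit_counts = defaultdict(int)
    chars_added = defaultdict(int)
    for editor, added in edits:
        edit_counts[editor] += 1
        chars_added[editor] += added
    total_edits = sum(edit_counts.values())
    total_chars = sum(chars_added.values())
    return {
        editor: (edit_counts[editor] / total_edits,
                 chars_added[editor] / total_chars)
        for editor in edit_counts
    }


def top_share(values, fraction):
    """Share of the total held by the top `fraction` of contributors."""
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # always count at least one person
    return sum(ranked[:k]) / sum(ranked)


def group_shares(edits, prefix):
    """Sum the edit share and text share of editors whose name starts with `prefix`."""
    per_editor = shares(edits)
    edit_share = sum(e for name, (e, _) in per_editor.items() if name.startswith(prefix))
    text_share = sum(t for name, (_, t) in per_editor.items() if name.startswith(prefix))
    return edit_share, text_share


insider_edits, insider_text = group_shares(EDITS, "insider")
print(f"insiders: {insider_edits:.0%} of edits, {insider_text:.1%} of text added")
```

On this toy log the two insiders make 7 of the 10 edits but add under 2% of the characters, so ranking contributors by edit count and ranking them by text added give opposite pictures. Whether the real data looks like this is exactly what the commenters below dispute.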
Labels: complexity, internet
8 comments:
What I've always wondered is if Wikipedia articles tend to improve (i.e. be more accurate and complete) over time. It seems like many of them improve until a peak then degrade from there. Also, it seems like that pattern is especially pronounced on controversial articles.
Based on my own experience, that's exactly what I always suspected. I have only ever edited three entries, on subjects where I am likely to be among the top ~10^2 experts in the world (and I do wish that people did not touch subjects they do not know in depth). But then, I am HTML-illiterate and I can't be bothered with a non-GUI interface to "Wiki" the edits. Someone else then took care of the formatting, sectioning and style. Those entries are all noncontroversial and the articles do improve over time (though only marginally past a certain point).
I also noticed that those insiders frequently start new entries even though they have very limited knowledge of the subject. I suppose this sort of "priming" can be good but as a result many articles on relatively obscure things look like a total joke.
Swartz's research is very old and his conclusions are not shared by more recent researchers. It was done well before the 2007-2008 bump in the number of new regular editors of Wikipedia.
I highly recommend the work of PARC's Augmented Social Cognition Group as an alternative.
Their work clearly demonstrates that not only are the majority of contributions derived from a core of dedicated contributors, but that the margin between those who contribute casually and those who contribute a great deal is growing.
Steven Walling: Are you saying that an overwhelming majority of the actual Wikipedia expertise resides with Wikipedians? I would actually prefer the alternative that Nanonymous describes, in which real experts contribute their knowledge, which is then cross-checked, cleaned up, etc. by Wikipedian editors.
Your description and Nanonymous' experience could be reconciled if there are a lot of mediocre articles -- that is, the number of entries written by dedicated (but non-expert) Wikipedians dominates the number written (or contributed to) by real experts. I do notice a lot of "stub" entries on specialized, but important, subjects that seem to have been merely cribbed from a generic source.
I think you will get a laugh out of this. It isn't related to Wikipedia at all, though.
http://online.wsj.com/article/SB126238854939012923.html
Steve:
To ask whether the expertise comes from amateur Wikipedians or professionals is a question based on a fundamental misunderstanding of what Wikipedia is. Wikipedia is a tertiary source that depends on what published sources, not individuals, say. Yes, some poorer articles don't cite their sources well, but no fact in Wikipedia ever rests on personal expertise.
If personal expertise is not a substantial influence on the site, then asking whether the expertise comes from amateur Wikipedians or anonymous experts is pointless. As Cory Doctorow explains, "While the Britannica says, These facts are true, Wikipedia says, It is true that these facts were reported by these sources. The Britannica contains facts, Wikipedia contains facts about facts."
Swartz's research isn't about where the facts within Wikipedia come from; it's about who contributes the majority of valuable textual content, whether it's a core community of editors or a distributed, anonymous crowd. As a Wikipedia editor himself, he knows that, no matter who they are, none of the knowledge added to Wikipedia comes directly from the individual making the edits.
Where more contemporary research disagrees with Swartz is in his conclusion that the majority of substantive content is added by anonymous, casual contributors rather than devoted Wikipedians who contribute regularly.
"To ask whether the expertise comes from amateur Wikipedians or professionals is a question based on a fundamental misunderstanding of what Wikipedia is. Wikipedia is a tertiary source that depends on what published sources, not individuals, say."
This is a misunderstanding of how knowledge is actually filtered and aggregated. It requires significant expertise to understand *which* references are important, *which* points are the most salient, *which* claims are correct or well-supported by facts or data. To claim that the process of assembling an encyclopedia does not rely on specialized expertise is completely incorrect. "Whose expertise is being used to create the entries?" is an extremely important question. Many scientific encyclopedias rely on individual contributors to cover particular topics, and the quality of an entry is usually a direct reflection of the quality of the expert. (This is rather obvious to me when I look at Wikipedia, btw. I strongly suspect that crappy articles are often assembled in a low-quality way by enthusiastic but non-expert core Wikipedians or outsiders who are not really experts, whereas very good articles have had outside input from real experts.)
If Wikipedia were only about "facts about facts" it would be unreadable and many entries would be 100x longer than they currently are. One cannot report all statements ever made about the big bang, even all the correct ones. An article of finite length has to rely on expert judgements about *which* facts or theories are important enough to be included (as well as which ones are correct to begin with!). This is related to the concept of compression, and many experts believe that AI is more or less equivalent to the ability to compress information. So to generate a compressed representation (i.e., encyclopedia entry) of a body of knowledge requires intelligence. The question is: whose intelligence? That of 1000 Wikipedians, or some larger set, with crucial contributions from outside?
From the reading I've done and my experience as a Wikipedian, the answer to your question is that it's both, depending on how you look at it.
Statistical studies I trust show unambiguously that the majority of significant additive edits to Wikipedia are the product of a dedicated community of a few thousand people.
The degree to which those high frequency editors are professionals/experts varies. If you're talking about very technical subjects, maths and sciences, there are more experts. But those are generally exceptions when you look at Wikipedia as a whole. On average Wikipedians are passionate, knowledge-hungry amateurs who write about something to learn as much as to share what they already know.
But regardless of the subject and whether the editor is a professional or not, Wikipedia relies on verification from outside expert source material. If we want to talk about crucial contributions from those who consider themselves outside the community, I would say that's where it's at, rather than in actual edits to the site.