found 2220 obsolete 1-gram headwords (“diestock”,
“alkalescent”) in AHD4. Their mean frequency declined
throughout the 20th century, and dipped below 10⁻⁹ decades
ago (Fig. 2D, Inset).
Our results suggest that culturomic tools will aid
lexicographers in at least two ways: (i) finding low-frequency
words that they do not list; and (ii) providing accurate
estimates of current frequency trends to reduce the lag
between changes in the lexicon and changes in the dictionary.
The Evolution of Grammar
Next, we examined grammatical trends. We studied the
English irregular verbs, a classic model of grammatical
change (14-17). Unlike regular verbs, whose past tense is
generated by adding –ed (jump/jumped), irregulars are
conjugated idiosyncratically (stick/stuck, come/came, get/got)
(15).
All irregular verbs coexist with regular competitors (e.g.,
“strived” and “strove”) that threaten to supplant them (Fig.
2E). High-frequency irregulars, which are more readily
remembered, hold their ground better. For instance, we found
“found” (frequency: 5×10⁻⁴) 200,000 times more often than
we finded “finded.” In contrast, “dwelt” (frequency: 1×10⁻⁵)
dwelt in our data only 60 times as often as “dwelled” dwelled.
We defined a verb’s “regularity” as the percentage of
instances in the past tense (i.e., the sum of “drived”, “drove”,
and “driven”) in which the regular form is used. Most
irregulars have been stable for the last 200 years, but 16%
underwent a change in regularity of 10% or more (Fig. 2F).
These changes occurred slowly: it took 200 years for our
fastest moving verb, “chide”, to go from 10% to 90% regularity.
Otherwise, each trajectory was sui generis; we observed no
characteristic shape. For instance, a few verbs, like “spill”,
regularized at a constant speed, but others, such as “thrive”
and “dig”, transitioned in fits and starts (7). In some cases, the
trajectory suggested a reason for the trend. For example, with
“sped/speeded” the shift in meaning away from “to move rapidly”
and towards “to exceed the legal limit” appears to have been
the driving cause (Fig. 2G).
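
As an illustration of the regularity measure defined above, the following minimal Python sketch computes the percentage of past-tense instances that use the regular form. The function names, the example counts, and the reading of "a change of 10% or more" as a difference of 10 percentage points are assumptions made for this example; they are not taken from the study's data or code.

```python
def regularity(counts, regular_form):
    """Percentage of past-tense instances in which the regular form is used.

    counts: mapping from each past-tense form to its number of occurrences
    regular_form: the key of the regular (-ed) form in that mapping
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no past-tense instances observed")
    return 100.0 * counts.get(regular_form, 0) / total


def changed_substantially(reg_start, reg_end, threshold=10.0):
    """True if regularity moved by at least `threshold` percentage points.

    Assumption: the 10% criterion is an absolute difference in regularity.
    """
    return abs(reg_end - reg_start) >= threshold


# Hypothetical counts, chosen only to illustrate the arithmetic.
drive_counts = {"drived": 5, "drove": 600, "driven": 395}
print(regularity(drive_counts, "drived"))   # 5 of 1000 instances -> 0.5
print(changed_substantially(10.0, 90.0))    # True
```

Applied to per-year counts, the same function would yield a regularity trajectory for each verb.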
Six verbs (burn, chide, smell, spell, spill, thrive)
regularized between 1800 and 2000 (Fig. 2F). Four are
remnants of a now-defunct phonological process that used –t
instead of –ed; they are members of a pack of irregulars that
survived by virtue of similarity (bend/bent, build/built,
burn/burnt, learn/learnt, lend/lent, rend/rent, send/sent,
smell/smelt, spell/spelt, spill/spilt, and spoil/spoilt). Verbs
have been defecting from this coalition for centuries
(wend/went, pen/pent, gird/girt, geld/gelt, and gild/gilt all
blend/blent into the dominant –ed rule). Culturomic analysis
reveals that the collapse of this alliance has been the most
significant driver of regularization in the past 200 years. The
regularization of burnt, smelt, spelt, and spilt originated in the
US; the forms still cling to life in British English (Fig. 2E,F).
But the –t irregulars may be doomed in England too: each
year, a population the size of Cambridge adopts “burned” in
lieu of “burnt.”
Though irregulars generally yield to regulars, two verbs
did the opposite: light/lit and wake/woke. Both were irregular
in Middle English, were mostly regular by 1800, and have
since backtracked to become irregular again. The
fact that these verbs have been going back and forth for
nearly 500 years highlights the gradual nature of the
underlying process.
Still, there was at least one instance of rapid progress by
an irregular form. Presently, 1% of the English speaking
population switches from “sneaked” to “snuck” every year:
someone will have snuck off while you read this sentence. As
before, this trend is more prominent in the United States, but
recently sneaked across the Atlantic: America is the world’s
leading exporter of both regular and irregular verbs.
Out with the Old
Just as individuals forget the past (18, 19), so do societies
(20). To quantify this effect, we reasoned that the frequency
of 1-grams such as “1951” could be used to measure interest
in the events of the corresponding year, and created plots for
each year between 1875 and 1975.
The plots had a characteristic shape. For example, “1951”
was rarely discussed until the years immediately preceding
1951. Its frequency soared in 1951, remained high for three
years, and then underwent a rapid decay, dropping by half
over the next fifteen years. Finally, the plots entered a regime
marked by slower forgetting: collective memory has both a
short-term and a long-term component.
But there have been changes. The amplitude of the plots is
rising every year: precise dates are increasingly common.
There is also a greater focus on the present. For instance,
“1880” declined to half its peak value in 1912, a lag of 32
years. In contrast, “1973” declined to half its peak by 1983, a
lag of only 10 years. We are forgetting our past faster with
each passing year (Fig. 3A).
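
The half-peak lag used in this comparison can be sketched as follows. This is a minimal illustration, assuming a yearly frequency series stored as a dictionary and measuring the lag from the named year (as in 1912 - 1880 = 32); the function name and the toy series are hypothetical and do not reproduce the study's data.

```python
def half_peak_lag(series, named_year):
    """Years from `named_year` until the frequency first falls to half its peak.

    series: dict mapping publication year -> frequency of the 1-gram
    named_year: the year the 1-gram names (e.g. 1880 for "1880")
    Returns None if the series never drops to half the peak after peaking.
    """
    peak_year = max(series, key=series.get)
    half = series[peak_year] / 2.0
    for year in sorted(y for y in series if y > peak_year):
        if series[year] <= half:
            return year - named_year
    return None


# Toy series (made-up numbers): peaks in 1880, reaches half the peak in 1883.
toy = {1878: 1.0, 1879: 2.0, 1880: 8.0, 1881: 7.0,
       1882: 6.0, 1883: 4.0, 1884: 3.0}
print(half_peak_lag(toy, 1880))  # 3
```

Computing this lag for every year's 1-gram would give one value per year, which could then be compared across years.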
We were curious whether our increasing tendency to forget
the old was accompanied by more rapid assimilation of the
new (21). We divided a list of 154 inventions into time-
resolved cohorts based on the forty-year interval in which
they were first invented (1800-1840, 1840-1880, and 1880-1920)
(7). We tracked the frequency of each invention in the nth year
after it was invented as compared to its maximum value,
and plotted the median of these rescaled trajectories for each
cohort.
The inventions from the earliest cohort (1800-1840) took
over 66 years from invention to widespread impact
(frequency >25% of peak). Since then, the cultural adoption
of technology has become more rapid: the 1840-1880
invention cohort was widely adopted within 50 years; the
1880-1920 cohort within 27 (Fig. 3B).
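
The cohort analysis described above (rescale each invention's trajectory by its maximum, take the median across the cohort, and find when it exceeds 25% of peak) can be sketched in Python as below. This is an illustrative reading of the procedure, not the authors' code: the function names and toy trajectories are hypothetical, and applying the 25% threshold to the median rescaled curve is an assumption.

```python
from statistics import median


def rescale(trajectory):
    """Divide each yearly frequency by the trajectory's maximum value."""
    peak = max(trajectory)
    if peak == 0:
        raise ValueError("trajectory has no nonzero frequency")
    return [f / peak for f in trajectory]


def cohort_median(trajectories):
    """Median rescaled frequency in the nth year after invention.

    Assumes all trajectories have equal length and that index n is the
    number of years since the invention.
    """
    rescaled = [rescale(t) for t in trajectories]
    return [median(values) for values in zip(*rescaled)]


def years_to_impact(median_curve, threshold=0.25):
    """First n at which the median curve exceeds `threshold` of peak, else None."""
    for n, value in enumerate(median_curve):
        if value > threshold:
            return n
    return None


# Toy cohort of three made-up trajectories, indexed by years since invention.
cohort = [
    [0.0, 1.0, 2.0, 4.0, 4.0],
    [0.0, 0.0, 1.0, 2.0, 10.0],
    [1.0, 1.0, 2.0, 5.0, 10.0],
]
curve = cohort_median(cohort)
print(curve)                   # [0.0, 0.1, 0.2, 0.5, 1.0]
print(years_to_impact(curve))  # 3
```

The median is used here, as in the text, so that a single unusually fast or slow invention does not dominate a cohort's curve.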
Downloaded from www.sciencemag.org on December 16, 2010
Sciencexpress / www.sciencexpress.org / 16 December 2010 / Page 3 / 10.1126/science.1199644