HOUSE_OVERSIGHT_017027.jpg

2.45 MB

Extraction Summary

People: 9
Organizations: 3
Locations: 1
Events: 0
Relationships: 1
Quotes: 3

Document Information

Type: Scientific paper / technical appendix
File Size: 2.45 MB
Summary

This document appears to be page 19 of a scientific paper or technical appendix (likely related to evolutionary dynamics or cultural analytics) analyzing the concept of 'Fame.' It details a methodology for extracting biographical data from Encyclopedia Britannica and Wikipedia via DBPedia to study the frequency of names over time. The text uses famous historical figures like Albert Einstein, Henry Kissinger, and Rupert Murdoch as examples of how data is categorized by birth year.

People (9)

Name | Role | Context
Albert Einstein | Example Subject | Used as an example for Wikipedia category extraction (German physicist, American physicist, etc.).
Joseph Heller | Example Subject | Used as an example for Wikipedia category extraction (American novelist, Catch-22).
Wallace Stevens | Example Subject | Listed in the '1879_births' category example.
Leon Trotsky | Example Subject | Listed in the '1879_births' category example.
Henry Kissinger | Example Subject | Listed in the '1923_births' category example.
Maria Callas | Example Subject | Listed in the '1923_births' category example.
Mikhail Gorbachev | Example Subject | Listed in the '1931_births' category example.
Raul Castro | Example Subject | Listed in the '1931_births' category example.
Rupert Murdoch | Example Subject | Listed in the '1931_births' category example.

Organizations (3)

Locations (1)

Location Context
Ulm

Relationships (1)

Albert Einstein Data Subject DBPedia
Article for Albert Einstein is a member of 73 categories in DBPedia.

Key Quotes (3)

"We study the fame of individuals appearing in the biographical sections of Encyclopedia Britannica and Wikipedia."
"For our purposes, the most relevant component of DBPedia is the 'Categories' relational database."
"We recognize articles referring to non-fictional people by their membership in a 'year_births' category."

Full Extracted Text

Complete text extracted from the document (3,646 characters)

handful of years after the year X itself. The lag between a year and its peak is partly due to the length of the authorship and publication process. For instance, a book about the events of 1950 may be written over the period from 1950 to 1952 and only published in 1953.
For each year Y, we estimated the rate of the exponential decay shortly past its peak. The exponent was estimated from the slope of the curve on a logarithmic plot of frequency between the years Y+5 and Y+25. This estimate is robust to the specific endpoints of the interval, as long as the first (here, Y+5) is past the peak of Y and the second falls within the fifty years that follow Y. The inset in Figure 4A was generated using 5 and 25. For an exponential decay with rate λ per year, the half-life then follows as ln(2)/λ.
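The slope estimate described above can be sketched as an ordinary least-squares fit of log-frequency against year over the window Y+5 to Y+25, followed by the usual exponential-decay conversion from slope to half-life. This is a minimal illustration rather than the authors' code: the function names, the dictionary input, the use of natural logarithms and the synthetic series are all assumptions.

```python
import math

def decay_slope(freq_by_year, year, start=5, end=25):
    """Least-squares slope of ln(frequency) over [year+start, year+end].

    freq_by_year is a hypothetical {calendar year: frequency} mapping.
    """
    xs = list(range(year + start, year + end + 1))
    ys = [math.log(freq_by_year[x]) for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def half_life_from_slope(slope):
    """Half-life in years for a decaying series (slope must be negative)."""
    if slope >= 0:
        raise ValueError("series is not decaying over the fitted window")
    return math.log(2) / -slope

# Synthetic series (an assumption, not data from the paper): the frequency
# of "1950" decaying at 5% per year after 1950.
series = {y: 1e-4 * math.exp(-0.05 * (y - 1950)) for y in range(1950, 2000)}
slope = decay_slope(series, 1950)
print(slope, half_life_from_slope(slope))
```

Changing `start` and `end` on the synthetic series leaves the fitted slope unchanged, which mirrors the robustness claim made for the interval endpoints.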
Half-life can also be estimated directly by asking how many years past the peak elapse before frequency drops below half its peak value. These values are noisier, but exhibit the same trend as in Figure 4A, Inset (not shown).
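The direct estimate can be sketched in a few lines: find the peak, then count the years until the frequency first falls below half the peak value. The function name, the dictionary input and the toy numbers are assumptions made for illustration only.

```python
def direct_half_life(freq_by_year):
    """Years from the peak until frequency first drops below half the peak.

    freq_by_year is a hypothetical {calendar year: frequency} mapping.
    Returns None if the series never falls below half its peak.
    """
    peak_year = max(freq_by_year, key=freq_by_year.get)
    half_peak = freq_by_year[peak_year] / 2
    for year in sorted(y for y in freq_by_year if y > peak_year):
        if freq_by_year[year] < half_peak:
            return year - peak_year
    return None

# Toy series (an assumption, not data from the paper): peak in 1953.
toy = {1951: 2, 1952: 8, 1953: 10, 1954: 9, 1955: 7,
       1956: 6, 1957: 4.5, 1958: 3}
print(direct_half_life(toy))
```

Because this estimate depends on a single threshold crossing, one noisy year can shift the result, which is consistent with the remark that these values are noisier than the fitted ones.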
Trends similar to those described here may capture more general events, such as those shown in Figure S9.
III.7. The Pursuit of Fame
We study the fame of individuals appearing in the biographical sections of Encyclopedia Britannica and Wikipedia. Given the encyclopedic objective of these sources, we argue that they represent comprehensive lists of notable individuals. Thus, from Encyclopedia Britannica and Wikipedia, we produce databases of all individuals born between 1800 and 1980, recording their full name and year of birth. We develop a method to identify the most common, relevant names used to refer to all individuals in our databases. This method enables us to deal with potentially complicated full names, sometimes including multiple titles and middle names. On the basis of the amount of biographical information available for each individual, we resolve the ambiguity that arises when multiple individuals share part or all of their name. Finally, using the time series of the word frequency of people’s names, we compare the fame of individuals born in the same year or having the same occupation.
III.7A) Complete procedure
7.A.1 - Extraction of individuals appearing in Wikipedia.
Wikipedia is a large encyclopedic information source, with a large number of articles referring to people. We identify biographical Wikipedia articles through the DBPedia engine (Ref S9), a relational database created by extensively parsing Wikipedia. For our purposes, the most relevant component of DBPedia is the “Categories” relational database.
Wikipedia categories are structural entities which unite articles related to a specific topic. The DBPedia “Categories” database includes, for every article within Wikipedia, a complete listing of the categories of which that article is a member. As an example, the article for Albert Einstein (http://en.wikipedia.org/wiki/Albert_Einstein) is a member of 73 categories, including “German physicists”, “American physicists”, “Violinists”, “People from Ulm” and “1879_births”. Likewise, the article for Joseph Heller (http://en.wikipedia.org/wiki/Joseph_Heller) is a member of 23 categories, including “Russian-American Jews”, “American novelists”, “Catch-22” and “1923_births”.
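As a minimal sketch of how such a category listing can be used, the snippet below scans an article's categories for a name of the form “1879_births” and returns the year. The function name, the regular expression and the plain list-of-strings input are assumptions made for illustration; they are not DBPedia's actual interface or the authors' code.

```python
import re

# Four digits followed by "_births", e.g. "1879_births" (assumed format).
BIRTH_CATEGORY = re.compile(r"(\d{4})_births")

def birth_year(categories):
    """Return the year from the first 'year_births' category, else None."""
    for name in categories:
        match = BIRTH_CATEGORY.fullmatch(name)
        if match:
            return int(match.group(1))
    return None

# Category lists abbreviated from the examples in the text.
einstein = ["German physicists", "American physicists",
            "People from Ulm", "1879_births"]
heller = ["Russian-American Jews", "American novelists",
          "Catch-22", "1923_births"]

print(birth_year(einstein), birth_year(heller))
```

An article whose categories contain no such entry yields `None`, so it would simply be left out of a list of people built this way.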
We recognize articles referring to non-fictional people by their membership in a “year_births” category. The category “1879_births” includes Albert Einstein, Wallace Stevens and Leon Trotsky; likewise, “1923_births” includes Henry Kissinger, Maria Callas and Joseph Heller, while “1931_births” includes Mikhail Gorbachev, Raul Castro and Rupert Murdoch. If only the approximate birth year of a person is
