HOUSE_OVERSIGHT_017024.jpg


Extraction Summary

People: 2
Organizations: 3
Locations: 0
Events: 0
Relationships: 0
Quotes: 3

Document Information

Type: Academic paper / scientific report (appendix or supplementary material)
File Size: 2.29 MB
Summary

This document is page 16 of a scientific or academic paper regarding quantitative linguistics, specifically the 'Estimation of Lexicon Size.' It details a methodology for analyzing word frequency (1-grams) over time (1900-2000) using the Oxford English Dictionary and other sources. The text explains a classification system for filtering words (e.g., typos, proper nouns, foreign words) to estimate the size of the English lexicon. While the content is purely academic, the footer 'HOUSE_OVERSIGHT_017024' indicates this document was collected as part of a House Oversight Committee investigation, likely related to the broader Epstein document production.

People (2)

Name: Native English Speaker (Unnamed)
Role: Annotator
Context: Classified random samples of alphabetical forms into categories.

Name: Different Native Speaker (Unnamed)
Role: Annotator
Context: Repeated the sampling process for the year 2000 lexicon to confirm independence.

Organizations (3)

Name: Oxford English Dictionary
Context: Used to estimate the upper bound of unique 1-grams.

Name: House Oversight Committee
Context: The document bears the Bates stamp 'HOUSE_OVERSIGHT', indicating it was part of a congressional document production.

Name: AHD4 (American Heritage Dictionary, 4th Ed.)
Context: Used to plot frequency histograms of 1-grams.

Key Quotes (3)

"Therefore, we estimate an upper bound of the number of unique 1-grams defined by this dictionary as 615,100-169,000 which is approximately 446,000."
Source
HOUSE_OVERSIGHT_017024.jpg
Quote #1
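
The arithmetic in Quote #1 can be checked with a short sketch. The excerpt does not say what each of the two figures counts, so the variable names below are placeholders chosen for illustration, not terms from the paper.

```python
# Figures taken verbatim from Quote #1. What each count represents is not
# stated in the excerpt, so these names are assumptions.
total_count = 615_100
excluded_count = 169_000

# Upper bound as described in the quote: the difference of the two figures.
upper_bound = total_count - excluded_count

# Round to the nearest thousand to compare with the quoted approximation.
rounded = round(upper_bound, -3)

print(upper_bound, rounded)
```

The exact difference is 446,100, which agrees with the quoted "approximately 446,000" once rounded to the nearest thousand.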
"We found that 90% of 1-gram headwords had a frequency greater than 10^-9, but only 70% were more frequent than 10^-8."
Source
HOUSE_OVERSIGHT_017024.jpg
Quote #2
"A typo is a one-time typing error by someone who presumably knows the correct spelling (as in improtant); a misspelling, which generally has the same pronunciation as the correct spelling, arises when a person is ignorant of the correct spelling (as in abberation)."
Source
HOUSE_OVERSIGHT_017024.jpg
Quote #3
