This is page 22 of a technical document produced to the House Oversight Committee (Bates stamp HOUSE_OVERSIGHT_017030). The text describes a data processing methodology (Section III.7.A.5) for standardizing and extracting individual names from databases like Encyclopedia Britannica and Wikipedia to create 'query names.' It outlines specific algorithmic rules for handling titles, prefixes (e.g., 'von', 'de'), and formatting issues to accurately identify individuals despite variations in how their names appear in text.
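The rules summarized above can be illustrated with a minimal Python sketch. It is not the document's actual algorithm: the function name `query_names`, the `TITLES` list, and the rule that a particle directly before the final token belongs to the surname are all assumptions made for illustration. Only the 'von' and 'de' prefixes and the restriction of query names to 2- and 3-word forms come from the page itself.

```python
# Hypothetical title list; the source does not enumerate the titles it strips.
TITLES = {"sir", "lord", "lady", "dr", "prof"}
# The two prefixes named in the summary; a real list would likely be longer.
PARTICLES = {"von", "de"}


def query_names(full_name):
    """Return candidate 2- and 3-word query names for a full name (sketch)."""
    # Drop surrounding punctuation, then remove title tokens.
    tokens = [t.strip(".,") for t in full_name.split()]
    tokens = [t for t in tokens if t and t.lower() not in TITLES]
    if len(tokens) < 2:
        return []

    # Assumption: a particle directly before the final token is part of the surname.
    split = len(tokens) - 1
    if split > 1 and tokens[split - 1].lower() in PARTICLES:
        split -= 1
    given, surname = tokens[:split], tokens[split:]

    # Candidate forms: first name + surname, and first two given names + surname.
    candidates = [given[:1] + surname, given[:2] + surname]

    names = []
    for cand in candidates:
        # Keep only 2- and 3-word forms, without duplicates.
        if 2 <= len(cand) <= 3:
            name = " ".join(cand)
            if name not in names:
                names.append(name)
    return names
```

For the two example subjects listed on this page, `query_names("Henry David Thoreau")` yields both `"Henry Thoreau"` and `"Henry David Thoreau"`, and `query_names("Oliver Joseph Lodge")` yields `"Oliver Lodge"` alongside the three-word form, so that either naming convention can be matched in text.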
| Name | Role | Context |
|---|---|---|
| Henry David Thoreau | Example Subject | Used as an example of naming conventions in Encyclopedia Britannica/Wikipedia. |
| Oliver Joseph Lodge | Example Subject | Used as an example of naming conventions where the middle name is dropped. |

| Name | Type | Context |
|---|---|---|
| Encyclopedia Britannica | | Source database mentioned for name extraction rules. |
| Wikipedia | | Source database mentioned for name extraction rules. |
| US House Committee on Oversight and Accountability | | Implied by the 'HOUSE_OVERSIGHT' Bates stamp. |

"Find possible names used to refer to individuals."Source
"Given a full name with complex structure potentially containing details such as titles, initials, nobility rights and ranks, in addition to multiple first and last names, we must extract a list of simple names"Source
"Query names are (2,3) grams which will be used in order to measure the fame of the individual."Source
Complete text extracted from the document (3,487 characters)