HOUSE_OVERSIGHT_017030.jpg

Extraction Summary

People: 2; Organizations: 3; Locations: 0; Events: 0; Relationships: 0; Quotes: 3

Document Information

Type: Technical methodology report / data processing protocol (House Oversight Committee production)
File Size: 2.23 MB
Summary

This is page 22 of a technical document produced to the House Oversight Committee (Bates stamp HOUSE_OVERSIGHT_017030). The text describes a data processing methodology (Section III.7.A.5) for standardizing and extracting individual names from databases like Encyclopedia Britannica and Wikipedia to create 'query names.' It outlines specific algorithmic rules for handling titles, prefixes (e.g., 'von', 'de'), and formatting issues to accurately identify individuals despite variations in how their names appear in text.
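The summary does not reproduce the document's actual rules, so the following is only a minimal Python sketch of the kind of name standardization it describes. It assumes a simple whitespace-tokenized name. The title and particle lists and the function name `normalize_full_name` are hypothetical illustrations, not taken from the source. Titles and initials are dropped, and nobility particles such as 'von' or 'de' are kept attached to the surname.

```python
# Hypothetical rule lists: the source does not enumerate its actual titles or particles.
TITLES = {"sir", "dr", "prof", "lord", "lady", "baron", "count", "mr", "mrs", "ms"}
PARTICLES = {"von", "van", "de", "der", "du", "la"}


def normalize_full_name(full_name: str) -> list[str]:
    """Reduce a full name to simple name parts (hypothetical helper).

    Titles and single-letter initials are dropped, and any run of nobility
    particles directly before the final token is fused into the surname.
    """
    parts = []
    for raw in full_name.replace(",", " ").split():
        word = raw.strip(".")
        if word.lower() in TITLES:
            continue  # drop honorifics and ranks such as 'Sir' or 'Dr.'
        if len(word) <= 1:
            continue  # drop initials such as 'J.' and stray punctuation
        parts.append(word)
    # Walk left from the surname while the preceding token is a particle.
    start = len(parts) - 1
    while start > 0 and parts[start - 1].lower() in PARTICLES:
        start -= 1
    if 0 <= start < len(parts) - 1:
        parts = parts[:start] + [" ".join(parts[start:])]
    return parts


print(normalize_full_name("Sir Oliver Joseph Lodge"))    # ['Oliver', 'Joseph', 'Lodge']
print(normalize_full_name("John von Neumann"))           # ['John', 'von Neumann']
print(normalize_full_name("Dr. J. Robert Oppenheimer"))  # ['Robert', 'Oppenheimer']
```

Fusing the particle into the surname keeps a name like 'von Neumann' from being split into a meaningless standalone 'von'.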

People (2)

Henry David Thoreau (example subject): used as an example of naming conventions in Encyclopedia Britannica/Wikipedia.
Oliver Joseph Lodge (example subject): used as an example of naming conventions where the middle name is dropped.
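The Thoreau and Lodge examples suggest that a single full name should yield several shorter variants, some with the middle name dropped. Below is a hedged Python sketch of one way to build such variants, assuming the name has already been split into simple parts. The function `query_names` and its rule ("First Last" plus "First Middle Last" for each middle name) are illustrative assumptions, not the document's stated procedure.

```python
def query_names(tokens: list[str]) -> list[str]:
    """Build 2-word and 3-word name variants from cleaned name parts (hypothetical).

    tokens: simple name parts in order, e.g. ['Henry', 'David', 'Thoreau'].
    Returns 'First Last' plus 'First Middle Last' for each middle name.
    """
    if len(tokens) < 2:
        return []  # a single token is too ambiguous to use on its own
    first, middles, last = tokens[0], tokens[1:-1], tokens[-1]
    names = [f"{first} {last}"]  # 2-word variant: middle name dropped
    for middle in middles:
        names.append(f"{first} {middle} {last}")  # 3-word variant: one middle name kept
    return names


print(query_names(["Henry", "David", "Thoreau"]))  # ['Henry Thoreau', 'Henry David Thoreau']
print(query_names(["Oliver", "Joseph", "Lodge"]))  # ['Oliver Lodge', 'Oliver Joseph Lodge']
```

Generating both forms lets a lookup match a person whether a text uses the full name or the shorter form with the middle name dropped.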

Organizations (3)

Encyclopedia Britannica: source database mentioned for name extraction rules.
Wikipedia: source database mentioned for name extraction rules.
US House Committee on Oversight and Accountability: implied by the 'HOUSE_OVERSIGHT' Bates stamp.

Key Quotes (3)

All three quotes are taken from HOUSE_OVERSIGHT_017030.jpg.

Quote #1: "Find possible names used to refer to individuals."
Quote #2: "Given a full name with complex structure potentially containing details such as titles, initials, nobility rights and ranks, in addition to multiple first and last names, we must extract a list of simple names"
Quote #3: "Query names are (2,3) grams which will be used in order to measure the fame of the individual."
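Quote #3 says only that query names are 2-grams and 3-grams used to measure fame. It does not say how they are scored. The Python sketch below shows one plausible reading: it counts word 2-grams and 3-grams in a text and sums the counts for a person's query names. The helper names `ngram_counts` and `fame_score`, the tokenization regex, and the use of raw occurrence counts as a fame proxy are all assumptions for illustration.

```python
import re
from collections import Counter


def ngram_counts(text: str, sizes: tuple[int, ...] = (2, 3)) -> Counter:
    """Count every word n-gram of the requested sizes in the text (hypothetical helper).

    Words are runs of letters, digits, apostrophes or hyphens, so trailing
    punctuation such as 'Thoreau.' still matches 'Thoreau'.
    """
    words = re.findall(r"[\w'-]+", text)
    counts: Counter = Counter()
    for n in sizes:
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def fame_score(query_names: list[str], text: str) -> int:
    """Hypothetical fame proxy: total occurrences of a person's query names."""
    counts = ngram_counts(text)
    return sum(counts[name] for name in query_names)


sample = "Henry David Thoreau wrote Walden. Later, Henry Thoreau left the pond."
print(fame_score(["Henry Thoreau", "Henry David Thoreau"], sample))  # 2
```

Matching is case-sensitive and purely lexical. A real measurement would also need to handle ambiguous names shared by several people, which this sketch does not attempt.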
