This document appears to be a methodology appendix for a study on 'Historical N-grams Corpora' utilizing Google Books data. It describes the technical process of filtering metadata to ensure accuracy, specifically removing serial publications via an algorithm dubbed 'Serial Killer.' The document bears a 'HOUSE_OVERSIGHT_017013' Bates stamp, indicating it was part of a document production to the House Oversight Committee, though the text itself contains no direct references to Epstein, Maxwell, or specific criminal activities.
| Name | Role | Context |
|---|---|---|
| Annotator | Researcher/Verifier | An individual with no knowledge of the study who manually determined date-of-publication for 1000 volumes. |
| Name | Type | Context |
|---|---|---|
| Google |  | Digitized 15 million books used as the source for the study. |
| US Government |  | Mentioned in the context of 'US Government report' as a filter phrase. |
| House Oversight Committee |  | Document bears the Bates stamp 'HOUSE_OVERSIGHT'. |
"As noted in the paper text, we did not analyze the entire set of 15 million books digitized by Google."
"Our 'Serial Killer' algorithm removed serial publications by looking for suggestive metadata entries"
"For English books, 29.4% of books were filtered using the 'Serial Killer'"
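The quoted description of 'Serial Killer' (removing serial publications by looking for suggestive metadata entries) can be illustrated with a minimal sketch. This is not the study's implementation: the function names, the metadata layout (a dict of string fields), and the case-insensitive substring match are all assumptions. The only filter phrase taken from the document is 'US Government report'.

```python
# Hypothetical phrase list: only "US Government report" is mentioned in the
# source document; a real filter would presumably use many more entries.
SERIAL_PHRASES = ("us government report",)


def looks_like_serial(metadata, phrases=SERIAL_PHRASES):
    """Return True if any metadata field contains a suggestive phrase.

    `metadata` is assumed to be a dict of field name -> string value.
    Matching is a case-insensitive substring test (an assumption).
    """
    text = " ".join(str(value) for value in metadata.values()).lower()
    return any(phrase in text for phrase in phrases)


def filter_serials(volumes):
    """Drop volumes flagged as serials; also report the fraction removed."""
    kept = [v for v in volumes if not looks_like_serial(v)]
    removed_fraction = 1 - len(kept) / len(volumes) if volumes else 0.0
    return kept, removed_fraction
```

Under these assumptions, `filter_serials` returns both the surviving volumes and the share that was filtered out, which is the kind of figure the 29.4% quote reports for English books.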
Complete text extracted from the document (3,399 characters)