HOUSE_OVERSIGHT_017009.jpg

Extraction Summary

People: 1
Organizations: 2
Locations: 0
Events: 0
Relationships: 1
Quotes: 4

Document Information

Type: Scientific paper supplement / table of contents / government investigation evidence
File Size: 1.28 MB

Summary

This document is the table of contents for the "Materials and Methods" supplement to the scientific paper "Quantitative analysis of culture using millions of digitized books" by Michel et al. It outlines the technical methodology (Google Books digitization, metadata accuracy, OCR quality, and n-gram corpus construction) and the culturomic analyses built on it. Although the page was catalogued as Epstein-related, it contains no direct mention of Jeffrey Epstein. The Bates stamp "HOUSE_OVERSIGHT_017009" indicates that it was part of a larger document production to the House Oversight Committee, possibly as an attachment within a cache of emails or other documents.

People (1)

Michel (Primary Author): Lead author of the referenced paper 'Quantitative analysis of culture using millions of digitized books'

Organizations (2)

Google Books: Subject of the digitization and analysis described in the document
House Oversight Committee: Implied by the Bates stamp 'HOUSE_OVERSIGHT'

Relationships (1)

Michel and co-authors ("et al."), Academic/Professional: Cited as 'Michel et al.' in the document header

Key Quotes (4)

Quote #1: "Quantitative analysis of culture using millions of digitized books" (Source: HOUSE_OVERSIGHT_017009.jpg)
Quote #2: "Overview of Google Books Digitization" (Source: HOUSE_OVERSIGHT_017009.jpg)
Quote #3: "Construction of Historical N-grams Corpora" (Source: HOUSE_OVERSIGHT_017009.jpg)
Quote #4: "Culturomic Analyses" (Source: HOUSE_OVERSIGHT_017009.jpg)

Full Extracted Text

Complete text extracted from the document (3,913 characters)

Materials and Methods
“Quantitative analysis of culture using millions of digitized books”,
Michel et al.
Contents
I. Overview of Google Books Digitization (p. 3)
    I.1. Metadata (p. 3)
    I.2. Digitization (p. 4)
    I.3. Structure Extraction (p. 4)
II. Construction of Historical N-grams Corpora (p. 5)
    II.1. Additional filtering of books (p. 5)
        II.1A. Accuracy of Date-of-Publication metadata (p. 5)
        II.1B. OCR quality (p. 6)
        II.1C. Accuracy of language metadata (p. 6)
        II.1D. Year Restriction (p. 7)
    II.2. Metadata based subdivision of the Google Books Collection (p. 7)
        II.2A. Determination of language (p. 7)
        II.2B. Determination of book subject assignments (p. 7)
        II.2C. Determination of book country-of-publication (p. 7)
    II.3. Construction of historical n-grams corpora (p. 8)
        II.3A. Creation of a digital sequence of 1-grams and extraction of n-gram counts (p. 8)
        II.3B. Generation of historical n-grams corpora (p. 10)
III. Culturomic Analyses (p. 12)
    III.0. General Remarks (p. 12)
        III.0.1 On Corpora (p. 12)
        III.0.2 On the number of books published (p. 13)
    III.1. Generation of timeline plots (p. 13)
        III.1A. Single Query (p. 13)
        III.1B. Multiple Query/Cohort Timelines (p. 14)
    III.2. Note on collection of historical and cultural data (p. 14)
    III.3. Controls (p. 15)
    III.4. Lexicon Analysis (p. 15)
HOUSE_OVERSIGHT_017009