Scientific Paper Supplement / Table Of Contents / Government Investigation Evidence - HOUSE_OVERSIGHT_017009

Extraction Summary

People: 1
Organizations: 2
Locations: 0
Events: 0
Relationships: 1
Quotes: 4

Document Information

Type: Scientific paper supplement / table of contents / government investigation evidence

File Size: 1.28 MB

Summary

This document is the table of contents for the 'Materials and Methods' section of the scientific paper 'Quantitative analysis of culture using millions of digitized books' by Michel et al. It outlines the technical methodology covered there: Google Books digitization, metadata accuracy, OCR quality, and n-gram corpus construction. Although the document was identified as Epstein-related, the page itself does not mention Jeffrey Epstein. The Bates stamp 'HOUSE_OVERSIGHT_017009' indicates that it was part of a larger production of evidence to the House Oversight Committee, possibly within a cache of emails or other documents.

People (1)

Name	Role	Context
Michel	Primary Author	Lead author of the referenced paper 'Quantitative analysis of culture using millions of digitized books'

Organizations (2)

Name	Type	Context
Google Books		Subject of the digitization and analysis described in the document
House Oversight Committee		Implied by the Bates stamp 'HOUSE_OVERSIGHT'

Relationships (1)

Michel → Academic/Professional → et al. (Co-authors)

Cited as 'Michel et al.' in the document header

Key Quotes (4)

Quote	Text	Source
1	"Quantitative analysis of culture using millions of digitized books"	HOUSE_OVERSIGHT_017009.jpg
2	"Overview of Google Books Digitization"	HOUSE_OVERSIGHT_017009.jpg
3	"Construction of Historical N-grams Corpora"	HOUSE_OVERSIGHT_017009.jpg
4	"Culturomic Analyses"	HOUSE_OVERSIGHT_017009.jpg

Full Extracted Text

Complete text extracted from the document (3,913 characters)

Materials and Methods

“Quantitative analysis of culture using millions of digitized books”,
Michel et al.

Contents
I. Overview of Google Books Digitization	3
I.1. Metadata	3
I.2. Digitization	4
I.3. Structure Extraction	4

II. Construction of Historical N-grams Corpora	5
II.1. Additional filtering of books	5
II.1A. Accuracy of Date-of-Publication metadata	5
II.1B. OCR quality	6
II.1C. Accuracy of language metadata	6
II.1D. Year Restriction	7
II.2. Metadata based subdivision of the Google Books Collection	7
II.2A. Determination of language	7
II.2B. Determination of book subject assignments	7
II.2C. Determination of book country-of-publication	7
II.3. Construction of historical n-grams corpora	8
II.3A. Creation of a digital sequence of 1-grams and extraction of n-gram counts	8
II.3B. Generation of historical n-grams corpora	10

III. Culturomic Analyses	12
III.0. General Remarks	12
III.0.1 On Corpora.	12
III.0.2 On the number of books published	13
III.1. Generation of timeline plots	13
III.1A. Single Query	13
III.1B. Multiple Query/Cohort Timelines	14
III.2. Note on collection of historical and cultural data	14
III.3. Controls	15
III.4. Lexicon Analysis	15
HOUSE_OVERSIGHT_017009