HOUSE_OVERSIGHT_017038.jpg


Extraction Summary

19 People
5 Organizations
1 Location
3 Events
3 Relationships
5 Quotes

Document Information

Type: Bibliography / supplementary material / evidence document
File Size: 1.67 MB
Summary

This document is a page of supplementary references for the scientific paper 'Quantitative analysis of culture using millions of digitized books' by Michel et al. (likely the 2011 'Culturomics' paper). It lists ten citations covering OCR (the Tesseract engine and anomalous-text detection), n-gram data (Web 1T 5-gram), large-scale data processing (MapReduce), estimates of the world's information output ('How Much Information'), DBpedia, and several Wikipedia lists. The document bears a 'HOUSE_OVERSIGHT' Bates stamp, indicating it was part of evidence collected during a congressional investigation, likely one related to Jeffrey Epstein's connections to scientific research and funding.

People (19)

L. Taycher (Author): Cited in reference S1 regarding Google Books
Ray Smith (Author): Cited in reference S2 regarding Tesseract OCR
Daria Antonova (Author): Cited in reference S2
Dar-Shyang Lee (Author): Cited in reference S2
Ashok Popat (Author): Cited in reference S3 regarding anomalous text detection
Thorsten Brants (Author): Cited in reference S4
Alex Franz (Author): Cited in reference S4
Jeffrey Dean (Author): Cited in reference S5 regarding MapReduce
Sanjay Ghemawat (Author): Cited in reference S5 regarding MapReduce
Peter Lyman (Author): Cited in reference S6
Hal R. Varian (Author): Cited in reference S6
Christian Bizer (Author): Cited in reference S9 regarding DBpedia
Jens Lehmann (Author): Cited in reference S9
Georgi Kobilarov (Author): Cited in reference S9
Sören Auer (Author): Cited in reference S9
Christian Becker (Author): Cited in reference S9
Richard Cyganiak (Author): Cited in reference S9
Sebastian Hellmann (Author): Cited in reference S9
Michel et al. (Primary Author): Author of the main paper 'Quantitative analysis of culture using millions of digitized books' for which this document...

Organizations (5)

ACM: Association for Computing Machinery, publisher of cited proceedings
LDC: Linguistic Data Consortium, publisher of cited dataset
University of California, Berkeley: Associated with the URL in reference S6
Wikipedia: Cited in references S7, S8, and S10
House Oversight Committee: Source of the document (footer: HOUSE_OVERSIGHT_017038)

Timeline (3 events)

2004: OSDI '04 (Operating Systems Design and Implementation); location unknown
2009: International Conference on Multilingual OCR; Barcelona, Spain
2009: 9th ACM Symposium on Document Engineering (DocEng '09); location unknown

Locations (1)

Barcelona, Spain: Location of the International Conference on Multilingual OCR mentioned in S2

Relationships (3)

Jeffrey Dean and Sanjay Ghemawat (co-authors): Co-authored the MapReduce paper cited in S5
Peter Lyman and Hal R. Varian (co-authors): Co-authored "How Much Information", cited in S6
Ray Smith and Daria Antonova (co-authors): Co-authored the OCR paper cited in S2

Key Quotes (5)

Quote #1: "Quantitative analysis of culture using millions of digitized books" (Source: HOUSE_OVERSIGHT_017038.jpg)
Quote #2: "Books of the world stand up and be counted" (Source: HOUSE_OVERSIGHT_017038.jpg)
Quote #3: "Adapting the Tesseract open source OCR engine for multilingual OCR" (Source: HOUSE_OVERSIGHT_017038.jpg)
Quote #4: "MapReduce: Simplified Data Processing on Large Clusters" (Source: HOUSE_OVERSIGHT_017038.jpg)
Quote #5: "How Much Information" (Source: HOUSE_OVERSIGHT_017038.jpg)

Full Extracted Text

Complete text extracted from the document (1,559 characters)

Supplementary References
“Quantitative analysis of culture using millions of digitized books”,
Michel et al.
S1. L. Taycher, “Books of the world stand up and be counted”, 2010. http://booksearch.blogspot.com/2010/08/books-of-world-stand-up-and-be-counted.html
S2. Ray Smith, Daria Antonova, and Dar-Shyang Lee, “Adapting the Tesseract open source OCR engine for multilingual OCR”, Proceedings of the International Conference on Multilingual OCR, Barcelona, Spain, 2009. http://doi.acm.org/10.1145/1577802.1577804
S3. Ashok Popat, “A panlingual anomalous text detector”, DocEng '09: Proceedings of the 9th ACM Symposium on Document Engineering, 2009, pp. 201–204.
S4. Thorsten Brants and Alex Franz, “Web 1T 5-gram Version 1”, LDC2006T13. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13
S5. Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, OSDI '04, pp. 137–150.
S6. Peter Lyman and Hal R. Varian, “How Much Information”, 2003. http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/print.htm#books
S7. http://en.wikipedia.org/wiki/List_of_treaties
S8. http://en.wikipedia.org/wiki/Geographical_renaming
S9. Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann, “DBpedia – A Crystallization Point for the Web of Data”, Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 2009, pp. 154–165.
S10. http://en.wikipedia.org/wiki/Timeline_of_historic_inventions
HOUSE_OVERSIGHT_017038
