a. Conflict resolution is the process of deciding whether a query name associated with
multiple records can unambiguously refer to a single one of them.
b. Wikipedia. Conflict resolution for Wikipedia records is carried out on the basis of the main
article word count and traffic statistics. A conflict is resolved as follows:
i. Find the total number of words written in the articles in conflict.
ii. Find the total number of page views of the articles in conflict.
iii. For every record in the conflict, find its fraction of the words written and of the page
views by dividing its counts by these totals.
iv. Does a record have the largest fraction of both words written and page views?
v. Does this record have more than 66% of either the words written or the page views?
vi. If both conditions hold, the query name in conflict is considered sufficiently specific
to refer to this record.
c. Encyclopedia Britannica. Conflict resolution for Encyclopedia Britannica records is carried
out on the basis of the number of information snippets present in the dataset.
i. Find the total number of information snippets related to the records in
conflict.
ii. For every record in the conflict, find its fraction of information snippets by
dividing its snippet count by the total.
iii. If a record has greater than 66% of the cumulative total, the query name in
conflict is considered to refer to this record.
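The two conflict-resolution rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `WikiRecord` container, the function names, and the choice to return `None` both for an unresolved conflict and when all counts are zero are hypothetical. The 66% threshold comes from the text and is applied as a strict inequality; ties for the largest fraction are broken arbitrarily by Python's `max`.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

THRESHOLD = 0.66  # "above 66%" from the text, applied as a strict inequality


@dataclass
class WikiRecord:
    """Hypothetical container for one Wikipedia record in a conflict."""
    record_id: str
    word_count: int  # words written in the main article
    page_views: int  # traffic to the article


def resolve_wikipedia(records: Sequence[WikiRecord]) -> Optional[str]:
    """Return the record the query name refers to, or None if it stays ambiguous."""
    total_words = sum(r.word_count for r in records)  # step i
    total_views = sum(r.page_views for r in records)  # step ii
    if total_words == 0 or total_views == 0:
        return None  # assumption: no evidence either way
    # Step iii: each record's share of words and of views.
    word_frac = {r.record_id: r.word_count / total_words for r in records}
    view_frac = {r.record_id: r.page_views / total_views for r in records}
    # Step iv: the same record must lead on both measures (ties broken by max()).
    leader = max(word_frac, key=word_frac.get)
    if leader != max(view_frac, key=view_frac.get):
        return None
    # Steps v-vi: the leader must hold more than 66% of either total.
    if word_frac[leader] > THRESHOLD or view_frac[leader] > THRESHOLD:
        return leader
    return None


def resolve_britannica(snippet_counts: Mapping[str, int]) -> Optional[str]:
    """snippet_counts maps a record id to its number of information snippets."""
    total = sum(snippet_counts.values())  # step i
    if total == 0:
        return None
    for record_id, count in snippet_counts.items():
        # Steps ii-iii: at most one record can exceed 66%, so the first match wins.
        if count / total > THRESHOLD:
            return record_id
    return None
```

For example, `resolve_britannica({"A": 70, "B": 30})` returns `"A"`, whereas a 60/40 split leaves the conflict unresolved and returns `None`.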
III.7.A.9 Identify the most relevant name used to refer to an individual.
So far, we have obtained, for all individuals in both our databases, a set of names by which they can
plausibly be mentioned. From this set, we wish to identify the best candidate and use its word
frequency to measure the fame of the person at hand. This optimal name is identified on the basis of the
amplitude of the word frequency, the potential ambiguities that arise from homonymy, and the
quality of the word frequency time series. Examples are shown in Figs. S11 and S12.
9) Determine the best query name for every record.
a. Order all the query names associated with a record on the basis of the integral of the
fame signal from the year of birth until the year 2000.
b. Iterating from the strongest fame signal to the weakest, select the first query name
with the following properties:
i. Unambiguously refers to the record (as determined by conflict resolution, if
needed).
ii. The average fame signal in the window [year of birth ± 10 years] is less than 10⁻⁹
or an order of magnitude less than the average fame signal from the year of birth
to the year 2000.
iii. (Wikipedia only) The query name, when converted to a Wikipedia URL by
replacing whitespace with underscores, refers either to the record or to a nonexistent
article. If the name refers to another article or to a disambiguation page, the query
name is rejected.
c. If the best query name is a 2-gram matching the last two names of a 3-gram
query name, and if the fame integral of the 3-gram is at least 80% of the fame integral of
the 2-gram, the best query name is replaced by the 3-gram.
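Step 9 can be sketched in Python under several assumptions the text leaves open: each fame signal is a mapping from year to word frequency, the "integral" is approximated by summing yearly values, and criteria i and iii are supplied by the caller as the hypothetical callables `is_unambiguous` and `wiki_ok` (the latter receives the underscore-joined URL title and should return `True` for this record's article or a nonexistent one). Step c is read as "at least 80%", and if several 3-grams qualify, the strongest is taken. Birth years are assumed to be at most 2000.

```python
from typing import Callable, Mapping, Optional

# Assumed representation: year -> yearly word frequency of a query name.
FameSignal = Mapping[int, float]


def fame_integral(signal: FameSignal, start: int, end: int = 2000) -> float:
    """Discrete approximation of the integral: sum of yearly values in [start, end]."""
    return sum(signal.get(year, 0.0) for year in range(start, end + 1))


def mean_fame(signal: FameSignal, start: int, end: int) -> float:
    """Average yearly value over [start, end]; missing years count as zero."""
    return fame_integral(signal, start, end) / (end - start + 1)


def clean_birth_window(signal: FameSignal, birth_year: int) -> bool:
    """Criterion ii: low usage around the birth year, in absolute or relative terms."""
    window = mean_fame(signal, birth_year - 10, birth_year + 10)
    lifetime = mean_fame(signal, birth_year, 2000)
    return window < 1e-9 or window < lifetime / 10


def best_query_name(
    signals: Mapping[str, FameSignal],
    birth_year: int,
    is_unambiguous: Callable[[str], bool],
    wiki_ok: Optional[Callable[[str], bool]] = None,
) -> Optional[str]:
    # Step a: rank names by fame integral from birth year to 2000, strongest first.
    ranked = sorted(
        signals, key=lambda name: fame_integral(signals[name], birth_year), reverse=True
    )
    best = None
    # Step b: keep the first name passing every criterion.
    for name in ranked:
        if not is_unambiguous(name):  # criterion i
            continue
        if not clean_birth_window(signals[name], birth_year):  # criterion ii
            continue
        # Criterion iii (Wikipedia only): whitespace replaced by underscores.
        if wiki_ok is not None and not wiki_ok("_".join(name.split())):
            continue
        best = name
        break
    if best is None:
        return None
    # Step c: prefer a 3-gram ending with the two words of a 2-gram best name.
    words = best.split()
    if len(words) == 2:
        threshold = 0.8 * fame_integral(signals[best], birth_year)
        for name in ranked:
            parts = name.split()
            if (
                len(parts) == 3
                and parts[1:] == words
                and fame_integral(signals[name], birth_year) >= threshold
            ):
                return name
    return best
```

With a strong but ambiguous 1-gram `"Key"`, a 2-gram `"Scott Key"`, and a 3-gram `"Francis Scott Key"` whose integral is 90% of the 2-gram's, the 1-gram is skipped, the 2-gram is selected in step b, and step c then swaps in the 3-gram.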
HOUSE_OVERSIGHT_017032