This paper describes PageRank, a mathod for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
A new algorithm for finding large itemsets which uses fewer passes over the data than classic algorithms, and yet uses fewer candidate itemsets than methods based on sampling and a new way of generating “implication rules” which are normalized based on both the antecedent and the consequent.
This work develops the notion of mining rules that identify correlations (generalizing associations), and proposes measuring significance of associations via the chi-squared test for correlation from classical statistics, enabling the mining problem to reduce to the search for a border between correlated and uncorrelated itemsets in the lattice.
This paper presents a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample and uses it to extract a relation of (author,title) pairs from the World Wide Web.
A data structure to solve the problem of finding approximate matches in a large database called a GNAT { Geometric Near-neighbor Access Tree} is introduced based on the philosophy that the data structure should act as a hierarchical geometrical model of the data as opposed to a simple decomposition of theData that does not use its intrinsic geometry.
This paper proposes a system for registering documents and then detecting copies, either complete copies or partial copies, and describes algorithms for such detection, and metrics required for evaluating detection mechanisms (covering accuracy, efficiency, and security).
This work develops the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets in the lattice and develops pruning strategies based on the closure property that lead to an efficient algorithm for discovering dependence rules.
The results indicate that the approach proposed here is both computationally feasible and successful in identifying interesting causal structures, and an interesting outcome is that it is perhaps easier to infer the lack of causality than to infer causality, information that is useful in preventing erroneous decision making.