Learn More
We consider the problem of evaluating a large number of XPath expressions on an XML stream. Our main contribution consists in showing that Deterministic Finite Automata (DFA) can be used effectively for this problem: in our experiments we achieve a throughput of about 5.4MB/s, independent of the number of XPath expressions (up to 1,000,000 in our tests).(More)
We consider the problem of evaluating a large number of XPath expressions on a stream of XML packets. We contribute two novel techniques. The first is to use a single Deterministic Finite Automaton (DFA). The contribution here is to show that the DFA can be used effectively for this problem: in our experiments we achieve a constant throughput, independently(More)
We describe a toolkit for highly scalable XML data processing, consisting of two components. The first is a collection of stand-alone XML tools, s.a. sorting , aggregation, nesting, and unnesting, that can be chained to express more complex restructurings. The second is a highly scalable XPath processor for XML streams that can be used to develop scalable(More)
Graphs are fundamental data structures and have been employed for centuries to model real-world systems and phenomena. Random walk with restart (RWR) provides a good proximity score between two nodes in a graph, and it has been successfully used in many applications such as automatic image captioning, recommender systems, and link prediction. The goal of(More)
Personalize PageRank (PPR) is an effective relevance (proximity) measure in graph mining. The goal of this paper is to efficiently compute single node relevance and top-k/highly relevant nodes without iteratively computing the relevances of all nodes. Based on a "random surfer model", PPR iteratively computes the relevances of all nodes in a graph until(More)
Graphs are a fundamental data structure and have been employed to model objects as well as their relationships. The similarity of objects on the web (e.g., webpages, photos, music, micro-blogs, and social networking service users) is the key to identifying relevant objects in many recent applications. SimRank, proposed by Jeh and Widom, provides a good(More)
Interleukin IL-17 is a proinflammatory cytokine that has been implicated in the pathogenesis of various autoimmune diseases. The single nucleotide polymorphism (SNP), rs2275913, in the promoter region of the IL-17 gene is associated with susceptibility to ulcerative colitis. When we examined the impact of rs2275913 in a cohort consisting of 438 pairs of(More)
Graph clustering is one of the key techniques for understanding the structures present in graphs. Besides cluster detection, identifying hubs and outliers is also a key task, since they have important roles to play in graph data mining. The structural clustering algorithm SCAN, proposed by Xu et al., is successfully used in many application because it not(More)
This paper proposes an incremental maintenance algorithm that efficiently updates the materialized XPath/XSLT views defined using XPath expressions in XP<sup>([],*,//,vars)</sup>. The algorithm consists of two processes. 1) The dynamic execution flow of an XSLT program is stored as an XT (XML Transformation) tree during the full transformation. 2) In(More)