We consider the problem of evaluating a large number of XPath expressions on an XML stream. Our main contribution consists in showing that Deterministic Finite Automata (DFA) can be used effectively for this problem: in our experiments we achieve a throughput of about 5.4MB/s, independent of the number of XPath expressions (up to 1,000,000 in our tests).
We consider the problem of evaluating a large number of XPath expressions on a stream of XML packets. We contribute two novel techniques. The first is to use a single Deterministic Finite Automaton (DFA). The contribution here is to show that the DFA can be used effectively for this problem: in our experiments we achieve a constant throughput, independently…
We describe a toolkit for highly scalable XML data processing, consisting of two components. The first is a collection of stand-alone XML tools, such as sorting, aggregation, nesting, and unnesting, that can be chained to express more complex restructurings. The second is a highly scalable XPath processor for XML streams that can be used to develop scalable…
Interleukin-17 (IL-17) is a proinflammatory cytokine that has been implicated in the pathogenesis of various autoimmune diseases. The single nucleotide polymorphism (SNP) rs2275913, in the promoter region of the IL-17 gene, is associated with susceptibility to ulcerative colitis. When we examined the impact of rs2275913 in a cohort consisting of 438 pairs of…
Graphs are fundamental data structures and have been employed for centuries to model real-world systems and phenomena. Random walk with restart (RWR) provides a good proximity score between two nodes in a graph, and it has been successfully used in many applications such as automatic image captioning, recommender systems, and link prediction. The goal of…
Personalized PageRank (PPR) is an effective relevance (proximity) measure in graph mining. The goal of this paper is to efficiently compute single-node relevance and top-k/highly relevant nodes without iteratively computing the relevances of all nodes. Based on a "random surfer model", PPR iteratively computes the relevances of all nodes in a graph until…
Previous studies have repeatedly reported that increasing age is a significant risk factor for worse outcomes after allogeneic hematopoietic stem cell transplantation (allo-HSCT) among patients with acute myeloid leukemia (AML). However, more recent studies reported conflicting results regarding the association between age and outcomes in elderly patients.
Graphs are a fundamental data structure and have been employed to model objects as well as their relationships. The similarity of objects on the web (e.g., webpages, photos, music, micro-blogs, and social networking service users) is the key to identifying relevant objects in many recent applications. SimRank, proposed by Jeh and Widom, provides a good…
Graph clustering is one of the key techniques for understanding the structures present in graphs. Besides cluster detection, identifying hubs and outliers is also a key task, since they have important roles to play in graph data mining. The structural clustering algorithm SCAN, proposed by Xu et al., is successfully used in many applications because it not…
The goal of this work is to identify the diameter, the maximum distance between any two nodes, of graphs that evolve over time. This problem is useful for many applications such as improving the quality of P2P networks. Our solution, G-Scale, can track the diameter of time-evolving graphs in the most efficient and correct manner. G-Scale is based on two…