The volume of RDF data has continued to grow over the past decade, and many known RDF datasets have billions of triples. A grand challenge of managing such large RDF data is how to access it efficiently. A popular approach to addressing the problem is to build a full set of permutations of (S, P, O) indexes. Although this approach has been shown to…
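The permutation-index idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's system: the toy triples and the helper names `build_indexes` and `lookup` are assumptions made for illustration, and sorted in-memory lists stand in for whatever on-disk structures a real store would use.

```python
from bisect import bisect_left
from itertools import permutations

# Hypothetical toy data; these triples are illustrative only.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
]


def build_indexes(triples):
    """Build one sorted list per ordering of (S, P, O): SPO, SOP, PSO, POS, OSP, OPS."""
    indexes = {}
    for order in permutations("SPO"):
        positions = ["SPO".index(c) for c in order]
        rows = sorted(tuple(t[i] for i in positions) for t in triples)
        indexes["".join(order)] = rows
    return indexes


def lookup(indexes, s=None, p=None, o=None):
    """Answer a triple pattern with a prefix scan on a suitable index.

    Bound terms are placed first, so the matching rows are contiguous
    in the index whose ordering starts with those terms.
    """
    bound = {"S": s, "P": p, "O": o}
    known = [k for k in "SPO" if bound[k] is not None]
    free = [k for k in "SPO" if bound[k] is None]
    order = "".join(known + free)
    prefix = tuple(bound[k] for k in known)

    index = indexes[order]
    start = bisect_left(index, prefix)  # first row that is >= the prefix
    results = []
    for row in index[start:]:
        if row[:len(prefix)] != prefix:
            break  # left the contiguous block of matching rows
        fields = dict(zip(order, row))
        results.append((fields["S"], fields["P"], fields["O"]))
    return results


indexes = build_indexes(TRIPLES)
knows = lookup(indexes, p="knows")
at_acme = lookup(indexes, o="acme")
```

The sketch also shows the cost side of the trade-off: every triple is stored six times, once per ordering.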
Large-scale graph processing represents an interesting challenge due to the lack of locality. This paper presents PathGraph for improving iterative graph computation on graphs with billions of edges. Our system design has three unique features: First, we model a large graph using a collection of tree-based partitions and use a path-centric computation…
With the wide application of virtualization technology, demand is increasing for performance analysis and system diagnosis in virtualized environments. There are some profiling toolkits based on hardware events, such as OProfile in native Linux and Xenoprof in the Xen virtual machine environment. However, sometimes users in different domains need to monitor…
The flexibility of the RDF data model has attracted an increasing number of organizations to store their data in an RDF format. With the rapid growth of RDF datasets, we envision that deploying a cluster of computing nodes will become necessary to process large-scale RDF data and deliver desirable query performance. In this paper, we address the…
Keywords: Group Steiner tree; Keyword search; RDF graph; Top-K. Abstract: This paper…
Server consolidation, an application of virtualization technology, consolidates multiple physical servers onto one or a few physical machines. It results in higher resource utilization and a smaller space footprint, and is considered a trend in enterprise application deployment. Some research has been carried out to evaluate its static performance…
Text categorization is the process of assigning documents to a set of previously fixed categories. It is widely used in many data-oriented management applications. Many popular algorithms for text categorization have been proposed, such as Naive Bayes, k-Nearest Neighbor (k-NN), and Support Vector Machine (SVM). However, those classification approaches do not…
The emerging need to conduct complex analysis over big RDF datasets calls for scale-out solutions that can harness a computing cluster to process them. Queries over RDF data often involve complex self-joins, which would be very expensive to run if the data are not carefully partitioned across the cluster and hence distributed joins over…