Learn More
An accurate performance model for MapReduce is increasingly important for analyzing and optimizing MapReduce jobs. It is also a precondition to implement cost-based scheduling strategies or to translate Hive like query jobs into sets of low cost MapReduce jobs. However, the multiple processing steps in MapReduce task, as well as the complexity of(More)
MapReduce is a distributed computing programming framework which provides an effective solution to the data processing challenge. As an open-source implementation of MapReduce, Hadoop has been widely used in practice. The performance of Hadoop MapReduce heavily depends on its configuration settings, so tuning these configuration parameters could be an(More)
On one hand, compared with traditional relational and XML models, graphs have more expressive power and are widely used today. On the other hand, various applications of social computing trigger the pressing need of a new search paradigm. In this article, we argue that big graph search is the one filling this gap. We first introduce the application of graph(More)
MapReduce is a widely used programming model for large scale data processing. In order to estimate the performance of MapReduce job and analyze the bottleneck of MapReduce job, a practical performance model for MapReduce is needed. Many works have been done on modeling the performance of MapReduce jobs. However, existing performance models ignore some(More)
More and more Internet companies rely on large scale data analysis as part of their core services for tasks such as log analysis, feature extraction or data filtering. Map-Reduce, through its Hadoop implementation, has proved to be an efficient model for dealing with such data. One important challenge when performing such analysis is to predict the(More)
Temporal graphs are a class of graphs whose nodes and edges, together with the associated properties, continuously change over time. Recently, systems have been developed to support snapshot queries over temporal graphs. However, these systems barely support aggregate time range queries. Moreover, these systems cannot guarantee ACID transactions, an(More)
Nowadays, with the rapid advance of network and computer technology, sharing and integration of across-field, across-organization government information resources to improve the quality of government service and management is urgently needed in the informatization of e-government in China. As the information resources are enormous, various and(More)
There have been recently quite a few works on optimizing the MapReduce execution plans, which either optimize the join operators or apply a set of translation rules to reduce the number of MapReduce jobs in an execution plan. However, none of these works has put into consideration and utilized how MapReduce jobs are generated and combined. To further(More)
As an efficient parallel computing system based on MapReduce model, Hadoop is widely used for large-scale data analysis such as data mining, machine learning and scientific simulation. However, there are still some performance problems in MapReduce, especially the situation in the shuffle phase. In order to solve these problems, in this paper, a lightweight(More)
Nowadays, various sensors are collecting, storing and transmitting tremendous trajectory data, and it is known that raw trajectory data seriously wastes the storage, network band and computing resource. Line simplification (LS) algorithms are an effective approach to attacking this issue by compressing data points in a trajectory to a set of continuous line(More)