MapReduce and Spark are two very popular open-source cluster computing frameworks for large-scale data analytics. These frameworks hide the complexity of task parallelism and fault tolerance by exposing a simple programming API to users. In this paper, we evaluate the major architectural components of the MapReduce and Spark frameworks, including shuffle, …
Clipping Web pages, namely extracting the informative clips (areas) from Web pages, has many applications, such as Web printing and e-reading on small handheld devices. Although many existing methods attempt to address this task, most of them either work only on certain types of Web pages (e.g., news- and blog-like Web pages) or perform …
The variety of applications makes performance evaluation of data-intensive large-scale systems an important task. General test methods pursue the peak value as the final result without paying enough attention to resource utilization. However, recent studies have shown that the behavior of resources can reveal latent problems. In this …
Vast amounts of data are now produced continuously, coming from sensors, smart meters, application logs, monitoring software, etc. The data need to be processed in real time to gain actionable insights, so that services such as smart grid load balancing and cloud platform maintenance can be carried out efficiently. Stream processing is the …
Fork-join is a basic query processing model in shared-nothing parallel database systems. A query Q is decomposed into a number of sub-queries, each of which is processed independently on a processing element (PE); the results of all sub-queries are then "joined" and returned as Q's result. In this scheme, the query processing time of Q depends on the …
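The fork-join scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's system: it assumes the query is a simple filter, that each list in `partitions` stands in for the data held by one PE, and that threads stand in for PEs. The names `sub_query` and `fork_join_query` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sub_query(partition, predicate):
    """Run the sub-query on one partition (a stand-in for one PE)."""
    return [row for row in partition if predicate(row)]


def fork_join_query(partitions, predicate):
    """Fork one sub-query per partition, then join the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # Fork: each partition is processed independently of the others.
        partials = list(pool.map(lambda p: sub_query(p, predicate), partitions))
    # Join: concatenate the partial results in partition order.
    return [row for part in partials for row in part]


# Hypothetical data: three partitions, query "values greater than 5".
partitions = [[1, 8, 3], [10, 2], [7, 9, 4]]
result = fork_join_query(partitions, lambda x: x > 5)
print(result)  # [8, 10, 7, 9]
```

Because the join step waits for every sub-query to return, the caller sees no result until all partitions have been processed.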
We describe a method to extract style and branding elements from multiple web pages in a given site for content repurposing. Style and branding elements convey the values of the site owners effectively and connect with the target prospects. They are manifested through logos, graphical elements, background color, font styles, font colors and other …