Graph analysis performs many random reads and writes, so these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so that the aggregate memory exceeds the graph size. We demonstrate that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing…
A statistical semantic parser trained on sufficient in-domain data has shown robustness to speech recognition errors in end-to-end spoken dialogue systems. However, when the dialogue domain is extended, the parsing performance may degrade significantly due to the introduction of new semantic slots, values, and unknown speech patterns. Effective retraining of…
We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a userspace file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in user space.
We present a set-associative page cache for scalable parallelism of IOPS in multicore systems. The design eliminates lock contention and hardware cache misses by partitioning the global cache into many independent page sets, each requiring a small amount of metadata that fits in a few processor cache lines. We extend this design with message passing among…
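The partitioning idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `PageSet` and `SetAssociativeCache`, the `load_page` callback, the modulo hash, and the per-set LRU eviction are all assumptions chosen for illustration. The point it shows is that a lookup touches only one small set and only that set's lock, so threads working on different sets never contend.

```python
import threading
from collections import OrderedDict


class PageSet:
    """One independent set: a few page slots guarded by its own lock (hypothetical name)."""

    def __init__(self, ways):
        self.ways = ways                # maximum number of pages held by this set
        self.lock = threading.Lock()    # protects only this set, not the whole cache
        self.pages = OrderedDict()      # page_id -> data, least recently used first


class SetAssociativeCache:
    """Cache split into many small sets; each page maps to exactly one set."""

    def __init__(self, num_sets, ways, load_page):
        self.sets = [PageSet(ways) for _ in range(num_sets)]
        self.load_page = load_page      # assumed callback that reads a page from storage

    def get(self, page_id):
        # Pick the set by hashing the page id (a simple modulo is an assumption).
        page_set = self.sets[hash(page_id) % len(self.sets)]
        with page_set.lock:
            if page_id in page_set.pages:
                # Hit: mark the page as most recently used within its set.
                page_set.pages.move_to_end(page_id)
                return page_set.pages[page_id]
            # Miss: load the page while holding only this set's lock.
            data = self.load_page(page_id)
            if len(page_set.pages) >= page_set.ways:
                # Evict the least recently used page of this set only.
                page_set.pages.popitem(last=False)
            page_set.pages[page_id] = data
            return data
```

A caller would construct the cache once, for example `SetAssociativeCache(num_sets=1024, ways=8, load_page=read_from_ssd)` with a user-supplied `read_from_ssd` function, and share it among worker threads. Eviction decisions stay local to a set, which keeps the per-set metadata small.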
Gliomas are the most common type of primary brain tumor. Despite improvements in current treatments for gliomas, including surgical resection, radiation, and chemotherapy, there has been very little progress in curing this kind of disease. Stat3 is a member of the signal transducer and activator of transcription family. It plays an important role in…
We present work on the automatic parallelization of array-oriented programs for multi-core machines. Source programs written in standard APL are translated by a parallelizing APL-to-C compiler into parallelized C code, i.e., C mixed with OpenMP directives. We describe techniques such as virtual operations and data partitioning used to effectively exploit…
We present a study of the execution performance of APL and MATLAB on a suite of five programs, ranging from one of a highly iterative nature to ones that mainly perform array operations. The performance comparison is carried out in three different modes of execution: interpreted, compiled, and parallel. We found that the MATLAB interpreter is in general much faster than…
Many eigensolvers, such as ARPACK and Anasazi, have been developed to compute eigenvalues of a large sparse matrix. These eigensolvers are limited by the capacity of RAM. They run in the memory of a single machine for smaller eigenvalue problems and require distributed memory for larger problems. In contrast, we develop an SSD-based eigensolver framework…
A canonical problem in graph mining is the detection of dense communities. This problem is exacerbated for a graph with a large order and size – the number of vertices and edges – as many community detection algorithms scale poorly. In this work we propose a novel framework for detecting active communities that consist of the most active vertices in massive…