Graph analysis performs many random reads and writes, so these workloads are typically run in memory. Traditionally, analyzing large graphs requires a cluster of machines so that the aggregate memory exceeds the graph size. We demonstrate that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing…
A statistical semantic parser trained on sufficient in-domain data has shown robustness to speech recognition errors in end-to-end spoken dialogue systems. However, when the dialogue domain is extended, the introduction of new semantic slots, values, and unknown speech patterns may significantly degrade parsing performance. Effective retraining of…
Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse-matrix dense-matrix multiplication (SpMM) in a semi-external-memory (SEM) fashion; i.e., we keep the sparse matrix on…
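The semi-external-memory idea above can be sketched as follows: the sparse matrix is consumed one row block at a time, as if streamed sequentially from SSD, while the dense input and result matrices stay in RAM. This is a minimal illustrative sketch, not the paper's implementation; the `spmm_sem` function, the block format, and the tiny example matrices are all assumptions for illustration.

```python
# Hypothetical sketch of semi-external-memory SpMM: the sparse matrix A
# (in CSR-like row blocks) is read block by block, as if streamed from
# SSD, while the dense matrix B and the result C remain in memory.

def spmm_sem(row_blocks, B, n_rows, n_cols_B):
    """row_blocks yields (start_row, indptr, indices, values) chunks of A."""
    C = [[0.0] * n_cols_B for _ in range(n_rows)]
    for start, indptr, indices, values in row_blocks:
        for i in range(len(indptr) - 1):              # local row index in block
            row = C[start + i]
            for k in range(indptr[i], indptr[i + 1]):
                a, j = values[k], indices[k]
                Bj = B[j]                             # dense row of B, in RAM
                for c in range(n_cols_B):
                    row[c] += a * Bj[c]
    return C

# A = [[1, 0], [0, 2]] split into two single-row blocks; B = all-ones 2x2
blocks = [(0, [0, 1], [0], [1.0]), (1, [0, 1], [1], [2.0])]
print(spmm_sem(blocks, [[1.0, 1.0], [1.0, 1.0]], 2, 2))  # → [[1.0, 1.0], [2.0, 2.0]]
```

The key property is that each nonzero of the sparse matrix is touched once per pass, so I/O is sequential and only the dense operands need to fit in memory.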
We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS, based on a userspace file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in user space…
Gliomas are the most common type of primary brain tumor. Despite improvements in current treatments for gliomas, including surgical resection, radiation, and chemotherapy, there has been very little progress in curing the disease. Stat3 is a member of the signal transducer and activator of transcription family. It plays an important role in…
We present a set-associative page cache for scalable IOPS parallelism in multicore systems. The design eliminates lock contention and hardware cache misses by partitioning the global cache into many independent page sets, each requiring a small amount of metadata that fits in a few processor cache lines. We extend this design with message passing among…
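The partitioning scheme described above can be sketched in a few lines: a page number hashes to one of many small sets, each with its own lock and LRU order, so threads touching different sets never contend on a shared lock. This is a minimal sketch of the general set-associative idea, not the paper's cache; the `SetAssocCache` class, its parameters, and the `load` callback are assumptions for illustration.

```python
# Hypothetical sketch of a set-associative page cache: pages hash to one
# of many independent sets, each with its own lock and small LRU list,
# so there is no single global lock to contend on.
from collections import OrderedDict
from threading import Lock

class SetAssocCache:
    def __init__(self, n_sets=64, ways=8):
        self.sets = [OrderedDict() for _ in range(n_sets)]  # per-set LRU order
        self.locks = [Lock() for _ in range(n_sets)]        # per-set lock
        self.ways = ways                                    # associativity

    def get(self, page_no, load):
        s = hash(page_no) % len(self.sets)
        with self.locks[s]:                    # lock only this set
            pages = self.sets[s]
            if page_no in pages:
                pages.move_to_end(page_no)     # cache hit: refresh LRU position
                return pages[page_no]
            if len(pages) >= self.ways:
                pages.popitem(last=False)      # evict least-recently-used page
            pages[page_no] = data = load(page_no)
            return data
```

Because each set holds only `ways` entries, its metadata stays small enough to sit in a handful of processor cache lines, which is the property the abstract highlights.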
Circadian rhythms of heart rate variability have been widely studied in recent years. However, most previous reports described such rhythms in terms of normalized units of the low- and high-frequency (LF and HF) spectral components. In this study, we analyzed circadian rhythms of spectral components in absolute units and found unexpected results in normal…
We present work on the automatic parallelization of array-oriented programs for multicore machines. Source programs written in standard APL are translated by a parallelizing APL-to-C compiler into parallelized C code, i.e., C mixed with OpenMP directives. We describe techniques such as virtual operations and data partitioning used to effectively exploit…
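The data-partitioning idea can be illustrated with a small sketch: an APL-style reduction such as `+/A` is split into per-chunk partial sums that are combined at the end, analogous to the OpenMP parallel-for loop the compiler would emit. This is an assumption-laden illustration, not the compiler's output; the `parallel_sum` helper and the use of a thread pool (rather than OpenMP threads in C) are stand-ins for exposition.

```python
# Hypothetical sketch of data partitioning for an APL-style reduction +/A:
# the array is split into chunks, each worker reduces its chunk, and the
# partial results are combined (the same shape as an OpenMP reduction).
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(a, n_workers=4):
    chunk = (len(a) + n_workers - 1) // n_workers          # ceil-divide
    parts = [a[i:i + chunk] for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor(n_workers) as pool:
        return sum(pool.map(sum, parts))                   # combine partials

print(parallel_sum(list(range(1, 101))))  # → 5050
```

In the compiled C code the same structure would appear as `#pragma omp parallel for reduction(+:s)` over the partitioned index range; threads are used here only to show the partition/combine shape.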
Many eigensolvers, such as ARPACK and Anasazi, have been developed to compute eigenvalues of a large sparse matrix. These eigensolvers are limited by RAM capacity: they run in the memory of a single machine for smaller eigenvalue problems and require distributed memory for larger ones. In contrast, we develop an SSD-based eigensolver framework…
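The core pattern behind an SSD-based eigensolver can be sketched with the simplest possible method: power iteration in which each matrix-vector product streams the matrix row by row, as if read sequentially from SSD, so that only the vectors need to live in RAM. This is a minimal sketch of the out-of-core matvec idea, not the framework's algorithm (production solvers use Krylov methods like those in ARPACK); `stream_rows` and `power_iteration` are hypothetical names.

```python
# Hypothetical sketch of an out-of-core eigensolver step: power iteration
# where the matrix is streamed row by row (standing in for sequential SSD
# reads) on every matrix-vector product; only vectors stay in memory.
import math

def stream_rows(A):                # stand-in for sequential reads from SSD
    for row in A:
        yield row

def power_iteration(A, n, iters=100):
    v = [1.0] * n
    for _ in range(iters):
        # one full streaming pass over the matrix per matvec
        w = [sum(a * x for a, x in zip(row, v)) for row in stream_rows(A)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v estimates the dominant eigenvalue
    Av = [sum(a * x for a, x in zip(row, v)) for row in stream_rows(A)]
    return sum(x * y for x, y in zip(Av, v))

print(round(power_iteration([[2.0, 0.0], [0.0, 1.0]], 2), 6))  # → 2.0
```

Each iteration costs one sequential sweep over the matrix, which is exactly the access pattern SSDs serve well; the working set in RAM is just a few length-n vectors.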