
A statistical semantic parser trained on sufficient in-domain data has shown robustness to speech recognition errors in end-to-end spoken dialogue systems. However, when the dialogue domain is extended, the introduction of new semantic slots, values, and unknown speech patterns may significantly degrade parsing performance. Effective retraining of…

We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a userspace file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space…


We present a set-associative page cache for scalable parallelism of IOPS in multicore systems. The design eliminates lock contention and hardware cache misses by partitioning the global cache into many independent page sets, each requiring a small amount of metadata that fits in a few processor cache lines. We extend this design with message passing among…
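The partitioning idea in this abstract can be pictured as a tiny set-associative cache in C. This is an illustrative reconstruction, not the paper's implementation: the set count, associativity, hash constant, and CLOCK-style eviction below are all assumptions of the sketch.

```c
#include <stdint.h>

#define NSETS    256   /* number of independent page sets (assumption) */
#define SET_WAYS 8     /* pages per set; set metadata stays within a few cache lines */

typedef struct {
    uint64_t tag[SET_WAYS];   /* page offsets currently cached in this set */
    uint8_t  valid[SET_WAYS];
    uint8_t  clock_hand;      /* per-set CLOCK eviction state */
    /* a per-set lock would live here; sets are independent, so contention is rare */
} page_set;

static page_set cache[NSETS];

/* Map a page offset to its set: pages compete only within one small set. */
static unsigned set_index(uint64_t page_off) {
    return (unsigned)((page_off * 0x9E3779B97F4A7C15ULL) >> 32) % NSETS;
}

/* Look up a page; returns its way index, or -1 on a miss. */
static int cache_lookup(uint64_t page_off) {
    page_set *s = &cache[set_index(page_off)];
    for (int w = 0; w < SET_WAYS; w++)
        if (s->valid[w] && s->tag[w] == page_off)
            return w;
    return -1;
}

/* Insert a page, evicting only within its own set (CLOCK-like sweep). */
static void cache_insert(uint64_t page_off) {
    page_set *s = &cache[set_index(page_off)];
    for (int w = 0; w < SET_WAYS; w++)
        if (!s->valid[w]) { s->valid[w] = 1; s->tag[w] = page_off; return; }
    s->tag[s->clock_hand] = page_off;            /* evict the victim in this set */
    s->clock_hand = (uint8_t)((s->clock_hand + 1) % SET_WAYS);
}
```

Because lookup, insertion, and eviction all touch a single set, threads operating on different sets never contend, which is the property the abstract attributes to the design.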

Many eigensolvers, such as ARPACK and Anasazi, have been developed to compute eigenvalues of a large sparse matrix. These eigensolvers are limited by the capacity of RAM: they run in the memory of a single machine for smaller eigenvalue problems and require distributed memory for larger ones. In contrast, we develop an SSD-based eigensolver framework…
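A minimal sketch of the semi-external idea behind such a framework: keep the vectors in RAM and consume the sparse matrix in chunks, as sequential SSD reads would deliver them. The nonzero-triple format, the chunk interface, and the infinity-norm power iteration are assumptions of this sketch, not the paper's actual design.

```c
#include <stddef.h>

typedef struct { int row, col; double val; } nz_triple;  /* one nonzero of A */

/* y += (chunk of A) * x, consuming nonzeros as one SSD read would deliver them.
   Only the vectors x and y need to be memory-resident. */
static void spmv_chunk(const nz_triple *nz, size_t nnz,
                       const double *x, double *y) {
    for (size_t i = 0; i < nnz; i++)
        y[nz[i].row] += nz[i].val * x[nz[i].col];
}

/* Scale v by its infinity norm; the norm is the current eigenvalue estimate
   in a power iteration built on spmv_chunk. */
static double normalize_inf(double *v, int n) {
    double m = 0.0;
    for (int i = 0; i < n; i++) {
        double a = v[i] < 0 ? -v[i] : v[i];
        if (a > m) m = a;
    }
    for (int i = 0; i < n; i++) v[i] /= m;
    return m;
}
```

Repeating spmv_chunk over all chunks and renormalizing converges to the dominant eigenpair while streaming the matrix from storage each iteration.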

A canonical problem in graph mining is the detection of dense communities. This problem is exacerbated for a graph of large order and size (its number of vertices and edges), as many community detection algorithms scale poorly. In this work we propose a novel framework for detecting active communities that consist of the most active vertices in massive…

Owing to random memory access patterns, sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication by utilizing commodity SSDs. We implement sparse-matrix dense-matrix multiplication (SpMM) in a semi-external memory (SEM)…
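One way to picture SEM SpMM, assuming (hypothetically) a CSR row-chunk layout streamed from the SSD while the dense matrices B and C stay resident in memory; the struct and function names are invented for this sketch:

```c
#include <stddef.h>

/* A CSR-like row chunk as it would arrive from one sequential SSD read. */
typedef struct {
    int first_row, nrows;
    const int    *row_ptr;  /* nrows+1 offsets into col_idx / vals */
    const int    *col_idx;
    const double *vals;
} row_chunk;

/* C += (chunk of A) * B, with B and C (n x k, row-major) resident in memory.
   Each sparse row touches only k-element rows of the dense operands, so the
   random access stays in RAM while the SSD traffic is purely sequential. */
static void spmm_chunk(const row_chunk *a, const double *B, double *C, int k) {
    for (int r = 0; r < a->nrows; r++) {
        double *crow = C + (size_t)(a->first_row + r) * k;
        for (int p = a->row_ptr[r]; p < a->row_ptr[r + 1]; p++) {
            const double *brow = B + (size_t)a->col_idx[p] * k;
            double v = a->vals[p];
            for (int j = 0; j < k; j++)
                crow[j] += v * brow[j];
        }
    }
}
```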

- Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, Alexander S. Szalay
- 2015

Graph analysis performs many random reads and writes; thus, these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so that the aggregate memory exceeds the graph size. We demonstrate that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing…

We present work on the automatic parallelization of array-oriented programs for multi-core machines. Source programs written in standard APL are translated by a parallelizing APL-to-C compiler into parallelized C code, i.e., C mixed with OpenMP directives. We describe techniques such as virtual operations and data partitioning used to effectively exploit…
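To illustrate the target form, here is the kind of C mixed with OpenMP directives that such a compiler might emit for an elementwise APL expression like `a + b × c`; the function name and the loop fusion are assumptions of this sketch, not the compiler's actual output:

```c
#include <stddef.h>

/* Fused scalar loop for r <- a + b x c over large arrays: the OpenMP
   worksharing directive gives each thread a contiguous partition of the
   iteration space (static schedule = simple data partitioning). */
void fused_add_mul(const double *a, const double *b, const double *c,
                   double *r, size_t n) {
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        r[i] = a[i] + b[i] * c[i];
}
```

Compiled without `-fopenmp` the pragma is ignored and the loop runs sequentially, so the emitted code stays correct on non-OpenMP toolchains.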