Machine learning and data mining are attracting increasing attention from the computing community. FPGAs provide a highly parallel, low-power, and flexible hardware platform for this domain, but the difficulty of programming them greatly limits their adoption. MapReduce is a parallel programming framework that can easily exploit the inherent parallelism in…
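To illustrate the MapReduce programming model on a machine-learning kernel, here is a minimal sketch of one k-means iteration expressed as map, shuffle, and reduce phases in plain Python. The choice of k-means and all function names are hypothetical. The sketch shows only the programming pattern, not the FPGA framework the abstract refers to.

```python
from collections import defaultdict
from functools import reduce


def map_phase(point, centroids):
    """Map: emit (index of nearest centroid, (point, 1)) for one data point."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    k = dists.index(min(dists))
    return k, (point, 1)


def combine(a, b):
    """Reduce: add partial coordinate sums and partial counts."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def kmeans_step(points, centroids):
    """One k-means iteration; clusters that receive no points keep their centroid."""
    # Shuffle: group the mapped values by key (centroid index).
    groups = defaultdict(list)
    for point in points:
        k, value = map_phase(point, centroids)
        groups[k].append(value)
    new_centroids = list(centroids)
    for k, values in groups.items():
        total, count = reduce(combine, values)
        new_centroids[k] = tuple(x / count for x in total)
    return new_centroids
```

Each `map_phase` call is independent, and `combine` is associative, so both phases can be spread across parallel workers, which is the property MapReduce frameworks rely on.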
Sparse matrix-vector multiplication (SpMV) is a fundamental operation in many applications. Many studies have implemented SpMV on different platforms, but few have focused on very large-scale datasets with millions of dimensions. This paper addresses the challenges of implementing large-scale SpMV with FPGA and GPU in the application…
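For reference, a minimal SpMV sketch using the Compressed Sparse Row (CSR) format is shown below. CSR is one common storage format and is an assumption here; the abstract does not say which format the paper uses.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x, where A is stored in CSR form.

    values  -- nonzero entries, stored row by row
    col_idx -- column index of each entry in values
    row_ptr -- row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):  # rows are independent, so they can run in parallel
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Irregular, data-dependent access into x.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
y = spmv_csr([1.0, 2.0, 3.0, 4.0, 5.0], [0, 2, 2, 0, 1], [0, 2, 3, 5], [1.0, 1.0, 1.0])
# y == [3.0, 3.0, 9.0]
```

The indirect read `x[col_idx[k]]` is what makes SpMV memory-bound and hard to accelerate when x has millions of entries and no longer fits in on-chip memory.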
The domain of stereo vision is highly important in the fields of autonomous cars, video tolling, robotics, and aerial surveys. A distinctive feature of this domain is that it requires not only pixel-by-pixel 2D processing within one image but also 3D processing for depth estimation by comparing information about a scene from several images with…
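As a reminder of how comparing two views yields depth, the sketch below applies the standard relation for a rectified stereo pair, Z = f·B / d, where f is the focal length in pixels, B the camera baseline, and d the disparity in pixels. The rectified pinhole-camera assumption and the function name are ours, not the abstract's.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (result in baseline units)."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative is invalid.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# f = 1000 px, B = 0.5 m, d = 25 px  ->  Z = 20.0 m
z = depth_from_disparity(25, 1000, 0.5)
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant objects than for near ones.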
This paper presents an FPGA-based stereo vision system for future video tolling that achieves real-time processing of high-resolution video streams. The key component of the system is SAD (Sum of Absolute Differences) based stereo matching. Although simple and effective, this method usually requires substantial computational power to satisfy real-time…
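A minimal winner-takes-all SAD block matcher is sketched below on images stored as 2D lists. It assumes rectified images, horizontal disparity only, and the convention that a left-image pixel at column x matches right-image column x − d. The window half-width `half` and search range `max_disp` are hypothetical parameters, and the caller must keep the window inside the image vertically and on the right.

```python
def sad(left, right, y, x, d, half):
    """Sum of absolute differences between the (2*half+1)^2 window centred at
    (y, x) in the left image and the same window shifted by d in the right image."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total


def disparity_at(left, right, y, x, max_disp, half=1):
    """Return the disparity in [0, max_disp] with the lowest SAD cost."""
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x - half - d < 0:
            break  # the shifted window would leave the right image
        cost = sad(left, right, y, x, d, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Every candidate disparity, and every pixel, is evaluated independently using only subtractions, absolute values, and additions. This is why the method is simple, yet costly at high resolution, and well suited to parallel hardware pipelines.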
We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning. The key components are a custom-built supercomputer dedicated to deep learning, a highly optimized parallel algorithm using new strategies for data partitioning and communication, larger deep neural network models, novel data augmentation…
Sparse matrix factorization is a critical step in circuit simulation, since it is time-consuming and is computed repeatedly throughout the simulation flow. To accelerate the factorization of sparse matrices, this paper proposes a parallel CPU+FPGA architecture. While the pre-processing of the matrix is implemented on the CPU, the…
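To make the factorization step concrete, the sketch below performs a Doolittle LU factorization without pivoting that skips zero entries. It is illustrative only. It omits the pre-processing the abstract assigns to the CPU, uses a dense list-of-lists for readability, and the function name is hypothetical. Production circuit solvers use genuinely sparse data structures and handle pivoting.

```python
def lu_nopivot(a):
    """Return one matrix holding L strictly below the diagonal (unit diagonal
    implied) and U on and above it, so that A = L @ U. No pivoting is done."""
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy
    for k in range(n):
        pivot = lu[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            if lu[i][k] == 0:
                continue  # sparsity: nothing to eliminate in this row
            lu[i][k] /= pivot  # multiplier, stored in L
            m = lu[i][k]
            for j in range(k + 1, n):
                if lu[k][j] != 0:  # only nonzeros of the pivot row update row i
                    lu[i][j] -= m * lu[k][j]
    return lu


# A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]] gives L = [[1,0,0],[0,1,0],[2,0,1]]
# and U = [[2,0,1],[0,3,0],[0,0,3]], packed as [[2,0,1],[0,3,0],[2.0,0,3.0]].
packed = lu_nopivot([[2, 0, 1], [0, 3, 0], [4, 0, 5]])
```

The zero checks show why sparsity pays off. Only rows with a nonzero below the pivot, and only nonzero columns of the pivot row, generate work.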
Google's famous PageRank algorithm is widely used by search engines to determine the importance of web pages. Given the large number of pages on the World Wide Web, computing PageRank efficiently is a challenging problem. We accelerated the power method for computing PageRank on AMD GPUs. The core component of the power method is the Sparse…
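The sketch below illustrates a power-method PageRank in plain Python, where each iteration is dominated by one sparse matrix-vector product over the link matrix stored in CSR form. The damping factor of 0.85 and the uniform redistribution of rank from pages without out-links follow a common textbook formulation. They are assumptions here, because the abstract does not give these details, and the code is not the GPU implementation.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-method PageRank; out_links[i] lists the pages that page i links to."""
    n = len(out_links)
    # Column-stochastic link matrix M in CSR form, one row per target page.
    incoming = [[] for _ in range(n)]
    for src, targets in enumerate(out_links):
        for dst in targets:
            incoming[dst].append((src, 1.0 / len(targets)))
    row_ptr, col_idx, values = [0], [], []
    for row in incoming:
        for src, weight in row:
            col_idx.append(src)
            values.append(weight)
        row_ptr.append(len(col_idx))
    dangling = [i for i, targets in enumerate(out_links) if not targets]

    x = [1.0 / n] * n
    for _ in range(max_iter):
        # Core step: sparse matrix-vector product y = M x.
        y = [sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
             for i in range(n)]
        # Rank held by pages with no out-links is spread evenly over all pages.
        leak = sum(x[i] for i in dangling) / n
        x_new = [damping * (yi + leak) + (1.0 - damping) / n for yi in y]
        diff = sum(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if diff < tol:
            break
    return x
```

Only the SpMV line touches the link structure. The other per-iteration work is simple vector arithmetic, so accelerating PageRank largely reduces to accelerating SpMV.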
The cessation of Moore's Law has limited further improvements in power efficiency. In recent years, the physical realization of the memristor has offered a promising path toward ultra-integrated hardware implementations of neural networks, which can be leveraged for gains in both performance and power efficiency. In this work, we introduce a power-efficient…
A balanced pool of hematopoietic stem cells (HSCs) in bone marrow is tightly regulated, and this regulation is disturbed in hematopoietic malignancies such as chronic myeloid leukemia (CML). The underlying mechanisms are largely unknown. Here we show that the Lin(-)Sca-1(+)c-Kit(-) (LSK(-)) cell population derived from HSC-containing Lin(-)Sca-1(+)c-Kit(+)…