Srinivas Chellappa

This article gives an overview of the techniques needed to implement the discrete Fourier transform (DFT) efficiently on current multicore systems. The focus is on Intel-compatible multicores, but we also discuss the IBM Cell and, briefly, graphics processing units (GPUs). The performance optimization is broken down into three key challenges: …
Kimberley is a system that simplifies transient use of fixed hardware infrastructure by a mobile device. It uses virtual machine (VM) technology to resolve the tension between standardizing infrastructure for ease of deployment and maintenance, and customizing that infrastructure to meet the specific needs of a user. Kimberley decomposes the state of …
The complexity of modern computing platforms has made it extremely difficult to write numerical code that achieves the best possible performance. Straightforward implementations based on algorithms that minimize the operations count often fall short in performance by at least one order of magnitude. This tutorial introduces the reader to a set of general …
A methodology for surveillance of multiple targets through a distributed mobile sensor network is proposed in this paper. We examine coordination among sensors that monitor a rectangular surveillance zone that is crisscrossed by targets. After a target is detected, monitoring sensors either remain stationary or begin following their targets. The decision to …
The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has an impressive floating point peak performance, programming and optimizing for it is difficult as it requires explicit memory management, multithreading, streaming, and …
In navigation that involves several moving agents or robots that are not in possession of each other’s plans, a scheme for resolution of collision conflicts between them becomes mandatory. A resolution scheme is proposed in this paper specifically for the case where it is not feasible to have a priori the plans and locations of all other robots, robots can …
This paper presents a program generator for fast software Viterbi decoders for arbitrary convolutional codes. The input to the generator is a specification of the code and a single-instruction multiple-data (SIMD) vector length. The output is an optimized C implementation of the decoder that uses explicit Intel SSE vector instructions. At the heart of the …
We overview a library generation framework called Spiral. For the domain of linear transforms, Spiral automatically generates implementations for parallel platforms including SIMD vector extensions, multicore processors, field-programmable gate arrays (FPGAs), and FPGA-accelerated processors. The performance of the generated code is competitive with the best …
High-performance discrete Fourier transform (DFT) libraries are an important requirement for many computing platforms. Unfortunately, developing and optimizing these libraries for modern, complex platforms has become extraordinarily difficult. To make things worse, performance often does not port, thus requiring permanent re-optimizations. Overcoming this …