We present lower bounds on the amount of communication that matrix multiplication algorithms must perform on a distributed-memory parallel computer. We denote the number of processors by P and the dimension of square matrices by n. We show that the most widely used class of algorithms, the so-called 2-dimensional (2D) algorithms, is optimal, in the sense …
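As a rough, back-of-the-envelope illustration (not taken from the paper), the sketch below compares the per-processor communication volume of a classical 2D algorithm such as Cannon's, which shifts one block of A and one block of B in each of sqrt(P) rounds, against an Omega(n^2/sqrt(P)) bound; the values of n and P are arbitrary.

    # Illustrative comparison only; constants and problem sizes are assumptions.
    from math import sqrt

    def cannon_words_per_processor(n, P):
        """Each of sqrt(P) rounds shifts one (n/sqrt(P))^2 block of A and one of B."""
        block = (n / sqrt(P)) ** 2
        return 2 * sqrt(P) * block            # = 2 * n**2 / sqrt(P)

    def lower_bound_words(n, P):
        """Omega(n^2 / sqrt(P)) bound with the constant omitted."""
        return n ** 2 / sqrt(P)

    n, P = 8192, 64
    print(cannon_words_per_processor(n, P))   # ~1.7e7 words
    print(lower_bound_words(n, P))            # ~8.4e6 words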
In this paper we present a speaker recognition algorithm that explicitly models intra-speaker inter-session variability. Such variability may be caused by changing speaker characteristics (mood, fatigue, etc.), channel variability, or noise variability. We define a session-space in which each session (either train or test session) is a vector. We then …
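The sketch below is a hypothetical illustration of the general idea of compensating for intra-speaker inter-session variability when sessions are represented as vectors: it estimates the dominant within-speaker variation directions from same-speaker session differences and projects them out before scoring. The data, dimensions, and projection scheme are assumptions for illustration, not the paper's session-space model.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 64
    # Hypothetical session vectors: 5 sessions for each of 20 development speakers.
    dev = rng.standard_normal((20, 5, d))

    # Within-speaker differences capture session-to-session (nuisance) variability.
    diffs = (dev - dev.mean(axis=1, keepdims=True)).reshape(-1, d)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    U = vt[:8].T                        # top nuisance directions (8 is arbitrary)
    proj = np.eye(d) - U @ U.T          # projection that removes them

    def score(train_session, test_session):
        """Cosine score between variability-compensated session vectors."""
        a, b = proj @ train_session, proj @ test_session
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(score(rng.standard_normal(d), rng.standard_normal(d)))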
We describe the design, implementation, and performance of a new parallel sparse Cholesky factorization code. The code uses a multifrontal factorization strategy. Operations on small dense submatrices are performed using new dense-matrix subroutines that are part of the code, although the code can also use the BLAS and LAPACK. The new code is recursive at …
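As an illustration of the kind of recursive dense kernel such a factorization code can build on, here is a minimal recursive blocked Cholesky sketch; the block size, the use of Python/NumPy, and the structure are assumptions for illustration, not the code described in the paper.

    import numpy as np

    def rchol(A, block=2):
        """Recursive Cholesky: returns lower-triangular L with A = L L^T."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        if n <= block:
            return np.linalg.cholesky(A)
        k = n // 2
        L11 = rchol(A[:k, :k], block)
        L21 = np.linalg.solve(L11, A[k:, :k].T).T      # solve L21 L11^T = A21
        L22 = rchol(A[k:, k:] - L21 @ L21.T, block)    # factor the Schur complement
        L = np.zeros((n, n))
        L[:k, :k], L[k:, :k], L[k:, k:] = L11, L21, L22
        return L

    M = np.random.default_rng(1).standard_normal((6, 6))
    A = M @ M.T + 6 * np.eye(6)          # symmetric positive definite test matrix
    L = rchol(A)
    print(np.allclose(L @ L.T, A))       # True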
We present new communication-efficient parallel dense linear solvers: a solver for triangular linear systems with multiple right-hand sides and an LU factorization algorithm. These solvers are highly parallel and they perform a factor of 0.4P^{1/6} less communication than existing algorithms, where P is the number of processors. The new solvers reduce …
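The following is a sketch of the sequential kernel underlying a triangular solve with multiple right-hand sides, written as a blocked forward substitution; it does not show the parallel, communication-reducing schedule the abstract refers to, and the block size is arbitrary.

    import numpy as np

    def blocked_trsm(L, B, nb=2):
        """Solve L X = B for X, with L lower triangular and B holding several right-hand sides."""
        X = np.array(B, dtype=float)
        n = L.shape[0]
        for j in range(0, n, nb):
            e = min(j + nb, n)
            X[j:e] = np.linalg.solve(L[j:e, j:e], X[j:e])  # solve the diagonal block
            X[e:] -= L[e:, j:e] @ X[j:e]                   # update the rows below
        return X

    rng = np.random.default_rng(2)
    L = np.tril(rng.standard_normal((6, 6))) + 6 * np.eye(6)
    B = rng.standard_normal((6, 3))
    print(np.allclose(L @ blocked_trsm(L, B), B))          # True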
The four existing stable factorization methods for symmetric indefinite matrices suffer serious defects when applied to banded matrices. Partial pivoting (row or column exchanges) maintains a band structure in the reduced matrix and the factors, but destroys symmetry completely once an off-diagonal pivot is used. Two-by-two block pivoting maintains symmetry at all times, but quickly destroys the band structure. Gaussian …
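A tiny numeric check (not from the paper) of the symmetry problem described here: a one-sided row exchange, as performed by partial pivoting, destroys the symmetry of a symmetric matrix.

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [2., 1., 3.],
                  [0., 3., 1.]])
    print(np.allclose(A, A.T))              # True: A is symmetric

    rowswap = A[[1, 0, 2], :]               # exchange rows 0 and 1 only
    print(np.allclose(rowswap, rowswap.T))  # False: symmetry is lost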
We describe the design, implementation, and performance of a new parallel sparse Cholesky factorization code. The code uses a supernodal multifrontal factorization strategy. Operations on small dense submatrices are performed using new dense-matrix subroutines that are part of the code, although the code can also use the BLAS and LAPACK. The new code is …
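As a hypothetical illustration of the assembly step in a multifrontal factorization, the sketch below scatter-adds a child's dense update matrix into its parent's frontal matrix through index lists (the extend-add operation); the names, index lists, and sizes are made up, not taken from the code described here.

    import numpy as np

    def extend_add(parent_front, parent_rows, child_update, child_rows):
        """Add child_update (indexed by child_rows) into parent_front (indexed by parent_rows)."""
        pos = {r: i for i, r in enumerate(parent_rows)}
        idx = np.array([pos[r] for r in child_rows])
        parent_front[np.ix_(idx, idx)] += child_update
        return parent_front

    parent = np.zeros((4, 4))
    child = np.full((2, 2), 1.0)
    # The child's rows/columns {3, 7} map into the parent's index list [2, 3, 7, 9].
    print(extend_add(parent, [2, 3, 7, 9], child, [3, 7]))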
We present a new out-of-core sparse symmetric-indefinite factorization algorithm. The most significant innovation of the new algorithm is a dynamic partitioning method for the sparse factor. This partitioning method results in very low I/O traffic and allows the algorithm to run at high computational rates, even though the factor is stored on a slow disk. …
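A minimal out-of-core sketch of the general idea, under the simplifying assumption of a dense, right-looking panel Cholesky: each finished panel is written to disk so that only the active part of the factor stays in memory. The file layout, panel width, and use of NumPy are illustrative; the paper's dynamic partitioning of a sparse, indefinite factor is considerably more involved.

    import numpy as np

    def out_of_core_cholesky(A, path, nb=2):
        """Factor A = L L^T panel by panel, streaming each finished panel to disk."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        with open(path, "wb") as f:
            for j in range(0, n, nb):
                e = min(j + nb, n)
                L11 = np.linalg.cholesky(A[j:e, j:e])     # factor the diagonal block
                panel = [L11]
                if e < n:
                    L21 = np.linalg.solve(L11, A[e:, j:e].T).T
                    A[e:, e:] -= L21 @ L21.T              # right-looking trailing update
                    panel.append(L21)
                np.vstack(panel).tofile(f)                # the panel leaves memory here

    M = np.random.default_rng(3).standard_normal((6, 6))
    out_of_core_cholesky(M @ M.T + 6 * np.eye(6), "factor.bin")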
The four existing stable factorization methods for symmetric indefinite matrices suffer serious defects when applied to banded matrices. Partial pivoting (row or column exchanges) maintains a band structure in the reduced matrix and the factors, but destroys symmetry completely once an off-diagonal pivot is used. Two-by-two block pivoting and Gaussian …
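To complement the partial-pivoting check above, a small numeric illustration (again not from the paper) of the band-structure problem: a symmetric two-sided exchange keeps the matrix symmetric but pulls entries far from the diagonal of a banded matrix.

    import numpy as np

    def bandwidth(A):
        i, j = np.nonzero(A)
        return int(np.max(np.abs(i - j)))

    n = 6
    A = 4.0 * np.eye(n)
    for i in range(n - 1):                  # tridiagonal (bandwidth 1) test matrix
        A[i, i + 1] = A[i + 1, i] = 1.0

    p = [4, 1, 2, 3, 0, 5]                  # exchange indices 0 and 4 on both sides
    B = A[np.ix_(p, p)]
    print(np.allclose(B, B.T), bandwidth(A), bandwidth(B))   # True 1 5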
We present an extension for HEVC intra-frame coding with trapezoidal splits and orthogonal transforms. A block can be split into two 180-degree rotationally symmetric (C2) trapezoidal parts, each coded separately using a standard DCT implementation. We also introduce part-to-part prediction from a diagonal edge. The optimal trapezoidal split of a quadtree …
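The construction below is only an illustrative sketch, not the codec extension itself: it builds a mask that splits an N-by-N block into two parts related by a 180-degree rotation (C2), with a staircase boundary approximating a diagonal edge. The function name and parameters are hypothetical.

    import numpy as np

    def c2_trapezoid_mask(N, c0):
        """Part A = leftmost cols[y] pixels of row y; its complement equals rot180(part A)."""
        cols = np.empty(N, dtype=int)
        half = N // 2
        for y in range(half):
            # Boundary column moves from c0 toward the block centre down the top half.
            cols[y] = c0 + ((half - c0) * y) // max(half - 1, 1)
            cols[N - 1 - y] = N - cols[y]    # pairing that enforces the C2 relation
        mask = np.zeros((N, N), dtype=bool)
        for y in range(N):
            mask[y, :cols[y]] = True
        return mask

    A = c2_trapezoid_mask(8, 2)
    B = ~A
    print(np.array_equal(B, np.rot90(A, 2)))  # True: the two parts are C2-symmetric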