Peter Y. K. Cheung

This paper presents an approach to the wordlength allocation and optimization problem for linear digital signal processing systems implemented as custom parallel processing units. Two techniques are proposed, one which guarantees an optimum set of wordlengths for each internal variable, and one which is a heuristic approach. Both techniques allow the user…
This paper compares three heuristic search algorithms: genetic algorithm (GA), simulated annealing (SA) and tabu search (TS), for hardware-software partitioning. The algorithms operate on functional blocks for designs represented as directed acyclic graphs, with the objective of minimising processing time under various hardware area constraints. The…
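As a rough illustration of the partitioning problem described above, the following is a minimal simulated-annealing sketch. It is not the paper's algorithm: the block data, the cooling schedule, and the names `cost` and `anneal` are assumptions made for this example, and processing time is modelled as a plain sum over blocks, which ignores the graph dependencies and any communication cost.

```python
import math
import random

def cost(mapping, blocks):
    """Return (total time, hardware area) for a mapping.

    blocks[i] is a hypothetical (software_time, hardware_time, hardware_area)
    tuple; mapping[i] is True when block i is placed in hardware.
    Assumption: blocks run one after another, so times simply add up.
    """
    time = sum(hw_t if in_hw else sw_t
               for in_hw, (sw_t, hw_t, _) in zip(mapping, blocks))
    area = sum(a for in_hw, (_, _, a) in zip(mapping, blocks) if in_hw)
    return time, area

def anneal(blocks, area_limit, steps=2000, t0=10.0, alpha=0.995, seed=0):
    """Search for a low-time mapping whose hardware area fits area_limit."""
    rng = random.Random(seed)
    current = [False] * len(blocks)      # all-software start is always feasible
    cur_time, _ = cost(current, blocks)
    best, best_time = list(current), cur_time
    temp = t0
    for _ in range(steps):
        i = rng.randrange(len(blocks))   # move: flip one block between SW and HW
        cand = list(current)
        cand[i] = not cand[i]
        t, a = cost(cand, blocks)
        if a <= area_limit:              # infeasible moves are rejected outright
            delta = t - cur_time
            # Accept improvements always, and worse moves with a probability
            # that shrinks as the temperature falls.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, cur_time = cand, t
                if t < best_time:
                    best, best_time = list(cand), t
        temp *= alpha                    # geometric cooling (an assumed schedule)
    return best, best_time

# Made-up example data: four blocks and an area budget of 7 units.
blocks = [(10, 2, 4), (8, 3, 3), (6, 5, 1), (12, 4, 6)]
best, best_time = anneal(blocks, area_limit=7)
```

A more faithful cost function would schedule the blocks along the directed acyclic graph; only `cost` would need to change for that.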
This paper presents a method that offers a uniform treatment for bit-width optimisation of both fixed-point and floating-point designs. Our work utilises automatic differentiation to compute the sensitivities of outputs to the bit-width of the various operands in the design. This sensitivity analysis enables us to explore and compare fixed-point and…
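The idea of using automatic differentiation for sensitivity analysis can be sketched as follows. This is an illustrative simplification, not the paper's method: it uses forward-mode differentiation with a hand-written `Dual` class, supports only addition and multiplication, and assumes a first-order error model in which each input is rounded to a given number of fractional bits. All names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A value together with its partial derivatives w.r.t. every input."""
    val: float
    grad: tuple

    def __add__(self, other):
        other = _lift(other, len(self.grad))
        return Dual(self.val + other.val,
                    tuple(a + b for a, b in zip(self.grad, other.grad)))
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other, len(self.grad))
        # Product rule applied to each partial derivative.
        return Dual(self.val * other.val,
                    tuple(self.val * b + other.val * a
                          for a, b in zip(self.grad, other.grad)))
    __rmul__ = __mul__

def _lift(x, n):
    """Treat a plain number as a constant (all derivatives zero)."""
    return x if isinstance(x, Dual) else Dual(float(x), (0.0,) * n)

def sensitivities(f, inputs):
    """Partial derivatives of f's output with respect to each input."""
    n = len(inputs)
    duals = [Dual(float(v), tuple(1.0 if i == j else 0.0 for j in range(n)))
             for i, v in enumerate(inputs)]
    return f(*duals).grad

def error_bound(grads, frac_bits):
    """First-order output error bound, assuming input i is rounded to
    frac_bits[i] fractional bits (rounding error at most 2^-(bits+1))."""
    return sum(abs(g) * 2.0 ** -(b + 1) for g, b in zip(grads, frac_bits))

# Example design y = a*b + c, evaluated at an arbitrary operating point.
grads = sensitivities(lambda a, b, c: a * b + c, [3.0, 0.5, 1.0])
bound = error_bound(grads, [8, 8, 8])
```

Inputs with large sensitivities dominate the bound, so under this model they are the ones that need the most bits.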
Automatic bitwidth analysis is a key ingredient for high-level programming of FPGAs and high-level synthesis of VLSI circuits. The objective is to find the minimal number of bits to represent a value in order to minimize the circuit area and to improve the efficiency of the respective arithmetic operations, while satisfying user-defined numerical constraints. We…
This paper presents a method for evaluating functions in hardware based on polynomial approximation with non-uniform segments. The novel use of non-uniform segments enables us to approximate non-linear regions of a function particularly well. The appropriate segment address for a given function can be rapidly calculated at run time by a simple combinational…
This paper presents a method for evaluating functions based on piecewise polynomial approximation with a novel hierarchical segmentation scheme. The hierarchy of uniform segments and segments whose sizes vary by powers of two enables us to approximate nonlinear regions of a function particularly well. This partitioning is automated:…
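To make the idea of power-of-two segments concrete, here is a small sketch of piecewise linear approximation in which segment widths halve towards zero. It is an assumption-laden illustration rather than the published scheme: the 8-bit input format, the endpoint-fitted degree-1 polynomials, the use of a leading-zero count as the segment address, and all function names are choices made for this example, and the hierarchy of uniform sub-segments is not modelled.

```python
import math

FRAC_BITS = 8  # assumed input format: unsigned fixed point, x in [0, 1)

def segment_index(x_fixed: int) -> int:
    """Segment address = number of leading zeros in the input word,
    capped so that the smallest inputs share the last segment."""
    leading_zeros = FRAC_BITS - x_fixed.bit_length()
    return min(leading_zeros, FRAC_BITS - 1)

def segment_bounds(k: int):
    """Segment k covers [2^-(k+1), 2^-k); the last one is extended down to 0."""
    hi = 2.0 ** -k
    lo = 0.0 if k == FRAC_BITS - 1 else hi / 2
    return lo, hi

def build_table(f):
    """Per-segment coefficients (value at lo, slope), fitted through the endpoints."""
    table = []
    for k in range(FRAC_BITS):
        lo, hi = segment_bounds(k)
        table.append((f(lo), (f(hi) - f(lo)) / (hi - lo)))
    return table

def evaluate(table, x_fixed: int) -> float:
    """Look up the segment, then evaluate its degree-1 polynomial."""
    k = segment_index(x_fixed)
    lo, _ = segment_bounds(k)
    c0, c1 = table[k]
    return c0 + c1 * (x_fixed / 2 ** FRAC_BITS - lo)

# sqrt is steep near zero, which is where the narrow segments are placed.
table = build_table(math.sqrt)
max_err = max(abs(evaluate(table, n) - math.sqrt(n / 2 ** FRAC_BITS))
              for n in range(2 ** FRAC_BITS))
```

In hardware, a leading-zero count of this kind maps to a small priority encoder, which is one plausible way to compute a segment address with combinational logic.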
This paper presents a method for producing hardware designs for elliptic curve cryptography (ECC) systems over the finite field GF(2 ), using the optimal normal basis for the representation of numbers. Our field multiplier design is based on a parallel architecture containing multiple -bit serial multipliers; by changing the number of such serial…
Extracting video structures is important for video indexing and navigation in large digital video archives. It is usually achieved by video segmentation algorithms. Little research effort has been invested in segmentation solutions that utilize the video's emotional content. These solutions not only have the potential of providing better performance than…
This paper describes a framework and tools for automating the production of designs which can be partially reconfigured at run time. The tools include: (i) a partial evaluator, which produces configuration files for a given design, where the number of configurations can be minimised by a process known as compile-time sequencing; (ii) an incremental configuration…