Cheng-Yang Fu

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default …
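The SSD abstract describes a grid of default boxes with several scales and aspect ratios at every feature-map location. The sketch below shows one way to generate such a grid. It is illustrative only and is not taken from the abstract. The linear scale rule, the values `s_min = 0.2` and `s_max = 0.9`, the aspect ratios, the extra square box at scale $\sqrt{s_k s_{k+1}}$, and all function names are assumptions for this example.

```python
import math
from itertools import product


def layer_scale(k, m, s_min=0.2, s_max=0.9):
    """Scale of the k-th of m feature maps (1-indexed), linearly spaced (assumed rule)."""
    return s_min + (s_max - s_min) * (k - 1) / (m - 1)


def default_boxes(feature_sizes, aspect_ratios=(1.0, 2.0, 0.5), s_min=0.2, s_max=0.9):
    """Return default boxes as (cx, cy, w, h) tuples in normalized [0, 1] coordinates.

    feature_sizes: side length f of each square feature map, coarsest maps last.
    Hypothetical helper for illustration; parameters are assumptions, not SSD's exact config.
    """
    m = len(feature_sizes)
    if m < 2:
        raise ValueError("need at least two feature maps for the linear scale rule")
    boxes = []
    for k, f in enumerate(feature_sizes, start=1):
        s_k = layer_scale(k, m, s_min, s_max)
        s_next = layer_scale(k + 1, m, s_min, s_max)  # used only for the extra square box
        for i, j in product(range(f), repeat=2):
            # Box centre sits in the middle of feature-map cell (i, j).
            cx, cy = (j + 0.5) / f, (i + 0.5) / f
            for a in aspect_ratios:
                # Width/height keep area ~ s_k^2 while setting w / h = a.
                boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
            # Extra square box at an intermediate scale between this layer and the next.
            s_prime = math.sqrt(s_k * s_next)
            boxes.append((cx, cy, s_prime, s_prime))
    return boxes


boxes = default_boxes([4, 2])
print(len(boxes))  # 80: (4*4 + 2*2) locations x 4 boxes each
```

Each location gets one box per aspect ratio plus one extra square box. A 4×4 map and a 2×2 map with three aspect ratios therefore yield 20 locations × 4 = 80 boxes. Coarser maps get larger scales, so they cover larger objects.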
The main contribution of this paper is an approach for introducing additional context into state-of-the-art general object detection. To achieve this, we first combine a state-of-the-art classifier (Residual-101 [14]) with a fast detection framework (SSD [18]). We then augment SSD+Residual101 with deconvolution layers to introduce additional large-scale …
For applications in navigation and robotics, estimating the 3D pose of objects is as important as detection. Many approaches to pose estimation rely on detecting or tracking parts or keypoints [11, 21]. In this paper we build on a recent state-of-the-art convolutional network for sliding-window detection [10] to provide detection and rough pose estimation in …
This paper proposes a synchronization approach for fast and accurate Multi-Core Instruction-Set Simulation (MCISS). An ideal MCISS should run accurately in real time. To achieve accurate MCISS results, a lock-step approach, which synchronizes every cycle, is commonly used. However, this approach introduces immense overhead …
The multicore revolution is having limited impact in safety-critical application domains. A key reason is the “one-out-of-m” problem: when validating real-time constraints on an m-core platform, excessive analysis pessimism can effectively negate the processing capacity of the additional $m-1$ cores so that only “one core’s worth” of capacity is …
Ideally, multi-core instruction-set simulation should run in parallel to improve simulation performance. However, the conventional low-parallelism centralized scheduler greatly constrains simulation performance. To resolve this issue, we propose a high-parallelism distributed scheduling mechanism. The experimental results show that our proposed approach …
As multi-core architecture has become mainstream, a corresponding multi-core instruction-set simulation (MCISS) is also needed to aid system development. Ideally, we may run an MCISS in parallel to increase simulation speed. However, the conventional centralized timing synchronization mechanism would greatly constrain the parallelism of an MCISS, so …
In this article, we propose an extended SystemC framework that directly enables software simulation in SystemC. Although SystemC is now widely adopted for system-level simulation of hardware designs, completing HW/SW co-simulation still requires an additional instruction set simulator (ISS) for software execution. However, the heavy …
This paper proposes a timing synchronization method for fast and accurate Multi-Core Instruction-Set Simulation (MCISS). In order to achieve accurate simulation results of MCISS, a lock-step approach, which synchronizes every cycle, is commonly used. However, this approach introduces immense overhead and lowers the simulation speed. Instead of synchronizing …
This paper proposes a shared-variable-based approach for fast and accurate multi-core cache coherence simulation. The intuitive, conventional approach (synchronizing at either every cycle or every memory access) gives accurate simulation results, but it performs poorly due to huge simulation overhead. We observe that timing synchronization …