Paolo Faraboschi

Embedded Computing: A VLIW Approach to Architecture...
Lx is a scalable and customizable VLIW processor technology platform designed by Hewlett-Packard and STMicroelectronics that allows variations in instruction issue width, the number and capabilities of structures and the processor instruction set. For Lx we developed the architecture and software from the beginning to support both scalability (variable...
The Dynamic Execution Layer Interface (DELI) offers the following unique capability: it provides fine-grain control over the execution of programs by allowing its clients to observe and optionally manipulate every single instruction, at run time, just before it runs. DELI accomplishes this by opening up an interface to the layer between the execution of...
Simulation has historically been the primary technique used for evaluating the performance of new proposals in computer architecture. Speed and complexity considerations have traditionally limited its applicability to single-thread processors running application-level code. This is no longer sufficient to model modern multicore systems running the complex...
In this paper we report on a system which automatically designs realistic VLIW architectures highly optimized for one given application (the input for this system), while running all other code correctly. The system uses a product-quality compiler that generates very aggressive VLIW code. We retarget the compiler until we have found a VLIW architecture...
Cloud offerings are increasingly serving workloads with a large variability in terms of compute, storage and networking resources. Computing requirements (all the way to High Performance Computing or HPC), criticality, communication intensity, memory requirements, and scale can vary widely. Virtual Machine (VM) placement and consolidation for effective...
This paper proposes a novel methodology to efficiently simulate shared-memory multiprocessors composed of hundreds of cores. The basic idea is to use thread-level parallelism in the software system and translate it into core-level parallelism in the simulated world. To achieve this, we first augment an existing full-system simulator to identify and separate...