MMX™ Microarchitecture of Pentium® Processors With MMX Technology and Pentium® II Microprocessors
@inproceedings{Kagan1997MMXTM,
title={MMX{\texttrademark} Microarchitecture of Pentium{\textregistered} Processors With MMX Technology and Pentium{\textregistered} II Microprocessors},
author={Michael Kagan and Doron Orenstien and Derrick Lin},
year={1997}
}
The MMX™ technology is an extension to the Intel Architecture (IA) aimed at boosting the performance of multimedia applications. This technology is the most significant IA extension since the introduction of the Intel386™ microprocessor. The challenge in implementing this technology came from retrofitting the new functionality into existing Pentium® and Pentium® Pro processor designs. The main challenge was how to incorporate the new instructions while also keeping upcoming products on the…
8 Citations
Efficient orchestration of sub-word parallelism in media processors
- Computer Science
- SPAA '04
- 2004
This work proposes to make sub-word data movement a first-class operation in microprocessor architectures by introducing a Sub-word Permutation Unit (SPU) in the execution pipeline, and introduces a decoupled SPU control mechanism at the basic block level which allows static optimization to eliminate data-movement overhead in tight loops.
Listing 1 : Implementation for MOVS in AMD processors
- Computer Science
- 2014
It is shown that a malicious microcode update can potentially implement new malicious instructions or alter the functionality of existing instructions, including processor-accelerated virtualization or cryptographic primitives, in order to subvert all software-enforced security policies and access controls.
Register file optimizations for superscalar microprocessors
- Computer Science
- 2005
This dissertation proposes several microarchitectural techniques, which also make use of some hardware support, in order to decrease the register file pressure by implementing more efficient register allocation and deallocation policies.
Macro-op scheduling and execution
- Computer Science
- 2004
This thesis presents the concept of coarse-grained instruction processing, which reduces the hardware overhead involved in coordinating all of the concurrent actions in a modern out-of-order processor, and shows that a significant portion of the instructions can be processed together in groups without requiring fine-grained controls for scheduling and execution.
Mostly-Static Program Partitioning for Dynamic Parallelization of Binary Executables
- Computer Science
- 2006
An off-line preprocessing step that extracts a mostly correct control flow graph from the binary program ahead of time enables us to statically partition a binary executable into single-entry multiple-exit regions and to identify potential parallelization candidates ahead of execution.
Mostly static program partitioning of binary executables
- Computer Science
- TOPL
- 2009
A runtime compilation system that takes unmodified sequential binaries and improves their performance on off-the-shelf multiprocessors using dynamic vectorization and loop-level parallelization techniques, and describes how these techniques are discovered and handled at runtime to ensure an incomplete static analysis never leads to an incorrect optimization result.
A Bibliography of Publications on Visual Instruction Sets
- Physics
- 2013
accelerate [TONH96]. Accelerates [DDHS00]. Accelerating [DDC98, Lee96, Sun96a]. achieve [Smi94]. Across [DDC98]. algorithms [TONH96]. Alpha [RRM96]. Alternatives [Ano98]. AltiVec [DDC98, DDHS00,…
The Algorithm and Circuit Design of a 400MHz 16-Bit Hybrid Multiplier
- Computer Science
- Asia-Pacific Computer Systems Architecture Conference
- 2006
The algorithm of a 16-bit hybrid multiplier, which can work in two modes, is presented, based on the radix-4 modified Booth’s algorithm, which is adopted by YHFT-DSP/800, a high performance fixed-point DSP.





