Multiprocessors based on processors with multiple cores usually include a non-uniform memory architecture (NUMA); even current 2-processor systems with 8 cores exhibit non-uniform memory access times. As the cores of a processor share a common cache, the issues of memory management and process mapping must be revisited. We find that optimizing only for data …
Many recent multiprocessor systems are realized with a nonuniform memory architecture (NUMA), and accesses to remote memory locations take more time than local memory accesses. Optimizing NUMA memory system performance is difficult and costly for three principal reasons: (1) Today’s programming languages/libraries have no explicit support for NUMA …
An important aspect of workload characterization is understanding memory system performance (i.e., understanding a workload’s interaction with the memory system). On systems with a non-uniform memory architecture (NUMA), the performance critically depends on the distribution of data and computations. The actual memory access patterns have a large influence …
Future exascale systems will be based on multi-core processors, but even today’s multi-core processors can be asymmetric and exhibit limitations and bottlenecks that are different from those found on a symmetric multiprocessor. In this paper we investigate the performance of a cluster node based on the Intel Xeon E5345 quad-core processor and note that …
Many multicore multiprocessors have a non-uniform memory architecture (NUMA), and for good performance, data and computations must be partitioned so that (ideally) all threads execute on the processor that holds their data. However, many multithreaded applications show heavy use of shared data structures that are accessed by all threads of the application. …
The Java® HotSpot Virtual Machine includes a multi-tier compilation system that may invoke a compiler at any time. Lower tiers instrument the program to gather information for the highly optimizing compiler at the top tier, and this compiler bases its optimizations on these profiles. But if the assumptions made by the top-tier compiler are proven wrong …
This paper presents the design and implementation of an embedded system for real-time network flow identification. The system identifies data flows based on packet inspection. The main advantage of this system is that it significantly reduces the processing time required for flow identification. For the hardware implementation, a Xilinx Virtex-II Pro …