Energy reduction in multiprocessor systems using transactional memory

@article{Moreshet2005EnergyRI,
  title={Energy reduction in multiprocessor systems using transactional memory},
  author={Tali Moreshet and R. Iris Bahar and Maurice Herlihy},
  journal={ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.},
  year={2005},
  pages={331-334}
}
  • Tali Moreshet, R. I. Bahar, M. Herlihy
  • Published 8 August 2005
  • Computer Science, Medicine
  • ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.
The emphasis in microprocessor design has shifted from high performance to a combination of high performance and low power. Until recently, this trend was mostly true for uniprocessors. In this work the authors focused on new energy consumption issues unique to multiprocessor systems: synchronization of accesses to shared memory. The authors investigated and compared different means of providing atomic access to shared memory, including locks and lock-free synchronization (i.e., transactional…
The Implications of Shared Data Synchronization Techniques on Multi-Core Energy Efficiency
TLDR
It is shown that Software Transactional Memory (STM) systems can perform better than locks for workloads where a significant portion of the running time is spent in critical sections, and the work examines how power-conserving techniques available on modern processors, such as C-states and clock frequency scaling, impact energy consumption and performance.
Energy-Performance Tradeoffs in Software Transactional Memory
TLDR
This work characterizes the behavior of three state-of-the-art lock-based STM algorithms, along with three different conflict resolution schemes, and proposes a DVFS-based technique that can be integrated into the resolution policies so as to improve the energy-delay product (EDP).
STM versus lock-based systems: An energy consumption perspective
TLDR
This work presents a comprehensive study on the energy consumption of a state-of-the-art STM (Software Transactional Memory) implementation using STAMP, a representative set of transactional workloads, comparing it to its lock-based counterpart.
Energy Implications of Transactional Memory for Embedded Architectures
Roughly ninety percent of all microprocessors manufactured in any one year are intended for embedded devices such as cameras, cell-phones, or machine controllers. We evaluate the energy-efficiency and
Adaptive Transactional Memories: Performance and Energy Consumption Tradeoffs
TLDR
The results provided show that adaptivity is a strictly necessary requirement for reducing energy consumption in STM systems: without it, it is not possible to reach any acceptable level of energy efficiency.
On the energy-efficiency of software transactional memory
TLDR
Experimental results show that the proposed novel scratchpad-based energy-aware STM design strategies can achieve an energy improvement of up to ~36% with regard to the base STM for applications characterized by short-lived transactions and relatively high abort rate.
Characterizing the Energy Consumption of Software Transactional Memory
TLDR
A thorough evaluation of energy consumption in a state-of-the-art STM shows that energy and performance results do not always follow the same trend and, therefore, it might be appropriate to consider different strategies depending on the focus of the optimization.
Adaptive transaction scheduling for transactional memory systems
TLDR
This paper proposes a new paradigm called adaptive transaction scheduling, based on the parallelism feedback from applications, that dynamically dispatches and controls the number of concurrently executing transactions and significantly improves performance for both hardware and software transactional memory systems.
Energy efficient synchronization techniques for embedded architectures
TLDR
A novel energy-efficient hardware semaphore construction, in which cores spin on local scratchpad memory and thereby reduce the load on the shared bus, is proposed and evaluated.
Using Transactional Memory to Avoid Blocking in OpenMP Synchronization Directives - Don't Wait, Speculate!
TLDR
This paper presents methods based on hardware transactional memory (HTM) for executing OpenMP barrier, critical, and taskwait directives without blocking, and shows a 73% performance improvement over traditional locking approaches, and 23% better than other HTM approaches on critical sections.

References

The thrifty barrier: energy-aware synchronization in shared-memory multiprocessors
TLDR
This work presents the thrifty barrier, a hardware-software approach to saving energy in parallel applications that exhibit barrier synchronization imbalance, and leverages the coherence protocol and proposes small hardware extensions to achieve timely wake-up of dormant threads.
Transactional Memory: Architectural Support For Lock-free Data Structures
  • M. Herlihy, J. Moss
  • Computer Science
    Proceedings of the 20th Annual International Symposium on Computer Architecture
  • 1993
TLDR
Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.
Power and performance tradeoffs using various caching strategies
  • R. I. Bahar, G. Albera, S. Manne
  • Computer Science
    Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379)
  • 1998
TLDR
It is shown that, by using buffers, energy consumption of the memory subsystem may be reduced by as much as 13% for certain data cache configurations and by as much as 23% for certain instruction cache configurations without adversely affecting processor performance or on-chip energy consumption.
Transactional lock-free execution of lock-based programs
TLDR
This paper proposes Transactional Lock Removal (TLR) and shows how a program that uses lock-based synchronization can be executed by the hardware in a lock-free manner, even in the presence of conflicts, without programmer support or software changes.
Performance and power impact of issue-width in chip-multiprocessor cores
  • M. Ekman, P. Stenström
  • Computer Science
    2003 International Conference on Parallel Processing, 2003. Proceedings.
  • 2003
TLDR
This work shows that scalable parallel applications from SPLASH-2 can be run as efficiently and with comparable power consumption on a chip-multiprocessor (CMP) with fewer, but wider-issue cores.
SafetyNet: improving the availability of shared memory multiprocessors with global checkpoint/recovery
TLDR
Using full-system simulation of a 16-way multiprocessor running commercial workloads, it is found that SafetyNet adds statistically insignificant runtime overhead in the common-case of fault-free execution, and avoids a crash when tolerated faults occur.
Speculative lock reordering: optimistic out-of-order execution of critical sections
TLDR
It is shown that SLR can be implemented in a chip-multiprocessor by only modest extensions to already published thread-level data dependence speculation systems, and since an execution order can be selected that removes as many data dependences as possible, it can expose more concurrency.
Checkpoint processing and recovery: towards scalable large instruction window processors
TLDR
The CPR proposal incorporates novel microarchitecture schemes for addressing design issues: a selective checkpoint mechanism for recovering from mispredicts, a hierarchical store queue organization for fast store-load forwarding, and an effective algorithm for aggressive physical register reclamation.
Unbounded Transactional Memory
TLDR
A hardware implementation of unbounded transactional memory, called UTM, is described, which exploits the common case for performance without sacrificing correctness on transactions whose footprint can be nearly as large as virtual memory.
Software transactional memory for dynamic-sized data structures
TLDR
A new form of software transactional memory designed to support dynamic-sized data structures, and a novel non-blocking implementation of this STM that uses modular contention managers to ensure progress in practice.