Computing on many cores

  • Bernard Goossens, David Parello, Katarzyna Porada, Djallal Rahmoune
  • Computer Science
    Concurrency and Computation: Practice and Experience
This paper presents an alternative method of parallelizing programs, better suited to many-core processors than current operating-system- or API-based approaches such as OpenMP and MPI. The method relies on parallelizing hardware and an adapted programming style; it frees and captures the instruction-level parallelism (ILP). A many-core design is presented in which cores are multithreaded and able to fork new threads. The programming style is based on functions. The hardware creates a concurrent thread… 
Exploring Parallelism in MiBench with Loop and Procedure Level Speculation
  • Deqing Bu, Yaobin Wang, Ling Li, Zhiqin Liu, Wenxin Yu, Manasah Musariri
  • Computer Science
    2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom)
  • 2018
The experimental results show, first, that speculative thread-level parallelism outperforms instruction-level parallelism and, second, that the best case, the dijkstra application, achieves a 13.3x speedup with loop-level speculation and a 29.7x speedup with procedure-level speculation.
A Many-Core Parallelizing Processor
  • Katarzyna Porada
  • Computer Science
    2017 International Conference on High Performance Computing & Simulation (HPCS)
  • 2017
A new many-core processor design that parallelizes in hardware is presented; work is distributed according to the sequential order in a way that favors communication between neighboring cores and simplifies both the processor interconnect and memory sharing.
Foreword to the Special Issue on the Seventh International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2016)
This special issue collates representative research articles presented at the Seventh International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2016), held in conjunction with the Twenty-First ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2016).
Parallel Locality and Parallelization Quality
This paper shows how a consumer can be placed on the same core as its producer, improving parallel locality and parallelization quality.
Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory
STMlite is proposed, a light-weight software transactional memory model that is customized to facilitate profile-guided automatic loop parallelization and enables sequential applications to extract meaningful performance gains on commodity multicore hardware.
Programming with POSIX threads
This book offers an in-depth description of the IEEE operating system interface standard, POSIX (Portable Operating System Interface) threads, commonly called Pthreads, and explains basic concepts such as asynchronous programming, the lifecycle of a thread, and synchronization.
Fg-STP: Fine-Grain Single Thread Partitioning on Multicores
Fine-Grain Single-Thread Partitioning (Fg-STP), a hardware-only scheme that takes advantage of CMP designs to speed up single-threaded applications, improves single-thread performance by reconfiguring two cores to collaborate on the fetching and execution of the instructions.
Disjoint out-of-order execution processor
A new architecture, called Disjoint Out-of-Order Execution (DOE), uses multiple small latency-tolerant out-of-order cores to improve single-thread performance; it improves throughput by a significant amount over a large superscalar core, up to 2.5 times, when running multitasking applications.
Limits of Instruction-Level Parallelism Capture
Programming models and applications for multicores and manycores
This paper proposes an efficient CPU–GPU cooperative computing scheme for solving the subset-sum problem, which enables the full utilization of all the computing power of both CPUs and GPUs.
PerPI: A Tool to Measure Instruction Level Parallelism
PerPI is a programmer-oriented tool whose purpose is to improve the understanding of how the algorithm and the (micro-)architecture interact; it introduces reproducible measures of the average number of instructions per cycle executed on an ideal machine.
Simultaneous multithreading: Maximizing on-chip parallelism
Simultaneous multithreading has the potential to achieve four times the throughput of a superscalar and double that of fine-grain multithreading, making it an attractive alternative to single-chip multiprocessors.
Limits of instruction-level parallelism
The results of simulating 18 different test programs under 375 different models of available parallelism are presented, showing how simulations based on instruction traces can model techniques at the limits of feasibility and even beyond.