Knights Landing: Second-Generation Intel Xeon Phi Product

@article{Sodani2016KnightsLS,
  title={Knights Landing: Second-Generation Intel Xeon Phi Product},
  author={Avinash Sodani and Roger Gramunt and Jes{\'u}s Corbal and Ho-Seop Kim and Krishna Vinod and Sundaram Chinthamani and Steven Hutsell and Rajat Agarwal and Yen-Chen Liu},
  journal={IEEE Micro},
  year={2016},
  volume={36},
  pages={34--46}
}
This article describes the architecture of Knights Landing, the second-generation Intel Xeon Phi product family, which targets high-performance computing and other highly parallel workloads. It provides a significant increase in scalar and vector performance and a big boost in memory bandwidth compared to the prior generation, called Knights Corner. Knights Landing is a self-booting, standard CPU that is completely binary compatible with prior Intel Xeon processors and is capable of running all… 
Performance comparison of Intel Xeon Phi Knights Landing
The Intel Xeon Phi is a many-core processor with a theoretical peak performance of over 3 TFLOP/s of double precision. We contrast the performance of the second-generation Intel Xeon Phi, code-named
Early Experience on Using Knights Landing Processors for Lattice Boltzmann Applications
TLDR
In its OpenMP code, this work considers several memory data layouts that meet the conflicting computing requirements of distinct parts of the application and sustain a large fraction of peak performance; it also makes some performance comparisons with other processors and accelerators.
Performance of Popular HPC Applications on the Intel Knights Landing Platform
We present direct performance measurements for six popular HPC applications on the Knights Landing (KNL) platform. Performance numbers for Sandy Bridge and Haswell processors are provided for
Evaluating the Intel Skylake Xeon Processor for HPC Workloads
TLDR
Together, the new hardware functions provide up to 1.8x speedup on HPC benchmark codes when compared with the previous generation Haswell processor core, providing much greater utility to a broader range of HPC applications that rely on this class of compute node.
Scalability of Hybrid SpMV on Intel Xeon Phi Knights Landing
  • Brian A. Page, P. Kogge
  • Computer Science
    2019 International Conference on High Performance Computing & Simulation (HPCS)
  • 2019
TLDR
This study develops and evaluates a hybrid implementation for strong scaling of the Compressed Vectorization-oriented sparse Row (CVR) approach to SpMV on a cluster of Intel Xeon Phi Knights Landing (KNL) processors and shows how this implementation achieves increased computational performance, yet does not address the dominant communication overhead factor at extreme scale.
Kernel-Assisted Communication Engine for MPI on Emerging Manycore Processors
TLDR
The experimental evaluation shows that the proposed designs provide up to 2.5X improvement at the microbenchmark-level and improve the total execution time of the MPI+OpenMP version of HPCG by up to 15% when compared with other approaches.
Evaluation of Intel Omni-Path on the Intel Knights Landing Processor
TLDR
This paper presents a set of studies that investigate the effectiveness of a system comprising this processor and network; the results can serve as guidelines for better exploiting these resources on production systems.
Empirical Analysis of the I/O Characteristics of a Highly Integrated Many-Core Processor
  • Cheongjun Lee, J. Lee, +4 authors Hyeonsang Eom
  • Computer Science
    2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)
  • 2020
TLDR
It is determined that KNL has a bottleneck in its buffered write operation, which uses the page cache, and the characteristics of KNL’s I/O performance, including its bottlenecks, are discussed.
Performance Characterization of Parallel Discrete Event Simulation on Knights Landing Processor
TLDR
It is shown that in most cases the performance of ROSS scales well with the best results achieved when thread affinity is assigned, CPU cores are evenly loaded, cache sharing is exploited and communication is limited to small clusters of cores.
Evaluating the Impact of High-Bandwidth Memory on MPI Communications
  • Giuseppe Congiu, P. Balaji
  • Computer Science
    2018 IEEE 4th International Conference on Computer and Communications (ICCC)
  • 2018
TLDR
A fine-grained evaluation of HBM usage in MPI using Knights Landing Multi-Channel DRAM shows that although MCDRAM can improve MPI communication performance, this improvement comes at the cost of higher memory usage.

References

Through-Silicon Via (TSV)
  • M. Motoyoshi
  • Engineering, Computer Science
    Proceedings of the IEEE
  • 2009
TLDR
Current and future 3D-LSI technologies with through-silicon vias (TSVs) have the simplest structure and are expected to realize a high-performance, high-functionality, and high-density LSI cube.
Intel Xeon Phi X100 Family Coprocessor—The Architecture, white
  • 2012