3.3 A 14nm 1GHz FPGA with 2.5D transceiver integration

@article{Greenhill201733A1,
  title={3.3 A 14nm 1GHz FPGA with 2.5D transceiver integration},
  author={David Greenhill and Ron Ho and David M. Lewis and Herman Schmit and Kok Hong Chan and Andy Tong and Sean Atsatt and Dana How and Peter McElheny and Keith Duwel and Jeffrey Schulz and Darren Faulkner and Gopal Iyer and George Chen and Hee Kong Phoon and Han Wooi Lim and Wei-Yee Koay and Ty Garibay},
  journal={2017 IEEE International Solid-State Circuits Conference (ISSCC)},
  year={2017},
  pages={54--55}
}
  • D. Greenhill, Ron Ho, +15 authors Ty Garibay
  • Published 1 February 2017
  • Engineering, Computer Science
  • 2017 IEEE International Solid-State Circuits Conference (ISSCC)
A Field Programmable Gate Array (FPGA) family was designed to match a programmable fabric die built in 14nm process technology with 28Gb/s transceiver dice. The 2.5D packaging (Fig. 3.3.1) uses embedded interconnect bridges (EMIB) [1]. 20nm transceivers were reused, enabling a transceiver roadmap independent of FPGA fabric. Fig. 3.3.2 shows a 560mm² fabric die and six transceiver dice. The programmable fabric contains 2.8M logic elements, DSP, memory components, and routing interconnect…
2.2 AMD Chiplet Architecture for High-Performance Server and Desktop Products
AMD's “Rome” and “Matisse” are second-generation AMD Infinity Fabric-based SoCs using 3 unique hybrid process technology chiplets to achieve leading performance, performance/$ and performance/W,
2.3 A 220GOPS 96-Core Processor with 6 Chiplets 3D-Stacked on an Active Interposer Offering 0.6ns/mm Latency, 3Tb/s/mm² Inter-Chiplet Interconnects and 156mW/mm² @ 82%-Peak-Efficiency DC-DC Converters
TLDR
An active interposer integrating a Switched Capacitor Voltage Regulator (SCVR) for on-chip power management, flexible system interconnect topologies between all chiplets for scalable cache coherency support, and energy-efficient 3D-plugs for dense inter-layer communication is presented.
Multi-die Integration Using Advanced Packaging Technologies
TLDR
Three new multi-die platforms which embody recent innovations in system platform integration to create new FPGA and CPU architectures which can be used to efficiently implement AI, HPC, and machine learning algorithms are presented.
Design of a cost-efficient controller for realizing a data-shift-minimized nonvolatile field-programmable gate array
TLDR
This paper proposes a cost-efficient controller for realizing a data-shift-minimized MTJ-based nonvolatile FPGA and demonstrates that the hardware overhead of the proposed controller is significantly reduced by sharing common functionality.
A 7.5-mW 10-Gb/s 16-QAM wireline transceiver with carrier synchronization and threshold calibration for mobile inter-chip communications in 16-nm FinFET
TLDR
A compact, energy-efficient 16-QAM wireline transceiver is proposed to leverage high-density fine-pitch interconnects, using a carrier synchronization algorithm and threshold calibration to overcome nontrivial current and phase mismatches.
A 2.1 pJ/bit, 8 Gb/s Ultra-Low Power In-Package Serial Link Featuring a Time-based Front-end and a Digital Equalizer
TLDR
An 8 Gb/s time-to-digital converter (TDC) based receiver with a time-based front-end in 65nm CMOS is designed specifically for in-package serial link applications; it is digital-intensive and hence highly resilient to voltage headroom and/or PVT issues.
IntAct: A 96-Core Processor With Six Chiplets 3D-Stacked on an Active Interposer With Distributed Interconnects and Integrated Power Management
TLDR
This article presents the first CMOS active interposer, integrating: 1) power management without any external components; 2) distributed interconnects enabling any chiplet-to-chiplet communication; and 3) system infrastructure, design-for-test, and circuit IOs.
Enabling scalable chiplet-based uniform memory architectures with silicon photonics
TLDR
This paper proposes the use of integrated silicon-photonic (SiPh) interconnects on an organic package substrate which combines low material costs with a high IO bandwidth, distance-independent energy consumption, and low-latency point-to-point interconnection fabric to effectively overcome current interconnect and packaging limitations.
Interconnect Aware Power Optimization of Low Swing Driver for Multi-Chip Interfaces
TLDR
This paper proposes an optimization approach for an interconnect-aware low-swing driver, with a case study of a source-follower-based architecture, and shows that by using this strategy the driver can reach an energy efficiency of 0.15 pJ/bit at a 1 Gb/s data rate on a 3.8 mm organic substrate interconnect.
A 1.02-pJ/b 20.83-Gb/s/Wire USR Transceiver Using CNRZ-5 in 16-nm FinFET
TLDR
Correlated non-return to zero (CNRZ) signaling with low sensitivity to inter-symbol interference (ISI) has been developed to improve the link budget and provide very good resistance against common-mode and crosstalk noise sources, allowing for dense routing.

References

The Stratix™ 10 Highly Pipelined FPGA Architecture
This paper describes architectural enhancements in the Altera Stratix™ 10 HyperFlex™ FPGA architecture, fabricated in the Intel 14nm FinFET process. Stratix 10 includes ubiquitous flip-flops in the
The case for registered routing switches in field programmable gate arrays
TLDR
This paper investigates architectural features that could allow us to automatically pipeline the delay associated with long routes without an excessive area penalty, and is the first study that attempts to evaluate the tradeoffs associated with switches required in FPGA architectures.
Embedded Multi-die Interconnect Bridge (EMIB) -- A High Density, High Bandwidth Packaging Interconnect
The EMIB dense MCP technology is a new packaging paradigm that provides localized high density interconnects between two or more die on an organic package substrate, opening up new opportunities for
Stratix™ 10 High Performance Routable Clock Networks
TLDR
It is shown how this capability to generate customized clock trees can provide better performance through reduced clock loss while maintaining the ability to handle the large number of clock domains that modern systems require.