AMD’s Instinct MI200 Exascale GPU: 128GB HBM2E

AMD will not talk about its next-generation Instinct MI200 compute GPU for a few more months, but its Linux patches continue to disclose new features and capabilities of the upcoming product. As it turns out, the compute GPU set to be used in the Frontier exascale supercomputer (due to be delivered this year) will have a fairly large memory subsystem that supports up to 128GB of HBM2E DRAM.
We already know that AMD’s Instinct MI200 compute GPU, codenamed Aldebaran, is based on the CDNA 2 architecture and uses two dies in a single package, connected by AMD’s high-performance Infinity interconnect. One of AMD’s latest Linux patches, for its AMD64 EDAC driver (which handles ECC for system DRAM), reveals the memory architecture of the Instinct MI200, reports Phoronix.
It turns out that each Aldebaran die has four unified memory controllers (UMCs). Each UMC supports eight memory channels, and each channel is connected to 2GB of second-generation High Bandwidth Memory (HBM2). Although AMD’s description of the Aldebaran memory subsystem is quite detailed, it may actually cause some confusion, so let’s try to explain it.
An HBM2 stack exposes a 1024-bit wide interface, which is commonly referred to as an HBM2 channel. Internally, however, an HBM2 stack consists of two, four, or eight DDR DRAM devices, each with two 128-bit channels, sitting on a base logic die. Essentially, an HBM2 stack supports up to eight 128-bit channels on its 1024-bit interface.
It is not entirely clear what AMD means by channels, but it seems likely to refer to the eight 128-bit DDR channels in a 1024-bit HBM2 stack. Essentially, this means that each UMC serves one HBM2 stack, so each Aldebaran die can connect to four HBM2 stacks through a 4096-bit memory interface. Each channel addresses 2GB of memory, so one die (4 UMCs × 8 channels) can address up to 64GB, and the two dies together can handle up to 128GB. The actual bandwidth of Aldebaran’s memory subsystem is unknown, but assuming AMD uses SK Hynix’s latest 3.6 Gbps HBM2E stacks, its 8192-bit memory subsystem would provide up to roughly 3.69TB/s of bandwidth.
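The interface-width and bandwidth figures can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the 128-bit channel width follows from the interpretation of "channel" given above, and the 3.6 Gbps per-pin data rate is the article's assumption about which HBM2E stacks AMD might use, not a confirmed specification. All constant and function names are made up for this example.

```python
# Back-of-the-envelope estimate of Aldebaran's memory interface width and
# peak bandwidth. Every constant here is an assumption taken from the
# article's reading of the Linux patch, not an official AMD specification.

CHANNEL_WIDTH_BITS = 128   # assumed: one UMC channel = one 128-bit HBM2 channel
CHANNELS_PER_UMC = 8       # from the patch description
UMCS_PER_DIE = 4           # from the patch description
DIES = 2                   # from the patch description
PIN_RATE_GBPS = 3.6        # assumed per-pin data rate (SK Hynix HBM2E)


def bus_width_bits(dies: int) -> int:
    """Total memory interface width, in bits, for the given number of dies."""
    return dies * UMCS_PER_DIE * CHANNELS_PER_UMC * CHANNEL_WIDTH_BITS


def peak_bandwidth_gb_s(width_bits: int, pin_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bits per transfer times rate, over 8."""
    return width_bits * pin_rate_gbps / 8


per_die_width = bus_width_bits(1)
total_width = bus_width_bits(DIES)
peak = peak_bandwidth_gb_s(total_width, PIN_RATE_GBPS)

print(f"Interface per die: {per_die_width} bits")
print(f"Interface total:   {total_width} bits")
print(f"Peak bandwidth:    {peak:.1f} GB/s")
```

Changing `PIN_RATE_GBPS` shows how sensitive the estimate is to the memory speed grade; slower HBM2E parts would lower the peak proportionally.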
Compute GPUs must use ECC for their memory, so part of the memory bandwidth and capacity is spent on error correction. For this reason, not all of the Instinct MI200’s 128GB of memory will actually be available to applications.
AMD has not yet commented on Aldebaran.
AMD’s official description of the Aldebaran memory subsystem is as follows:
Aldebaran has 2 dies (enumerated as MCx, x = 8 ~ 15)
Each die has 4 UMCs (enumerated as csrowx, x = 0 ~ 3)
Each die has 2 root ports, and each root port has 4 misc ports.
Each UMC manages 8 UMC channels, and each channel is connected to 2GB of HBM memory.
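The hierarchy in AMD's description maps directly onto the 128GB total. The following sketch models it as a small data structure; the class and method names are hypothetical and exist only for this illustration, and the counts are simply the ones quoted in the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryTopology:
    """Illustrative model of the die -> UMC -> channel hierarchy.

    Default values are the counts quoted from the patch description;
    the class itself is not part of any AMD driver or tool.
    """
    dies: int = 2
    umcs_per_die: int = 4
    channels_per_umc: int = 8
    gb_per_channel: int = 2

    def channels_per_die(self) -> int:
        return self.umcs_per_die * self.channels_per_umc

    def capacity_per_die_gb(self) -> int:
        return self.channels_per_die() * self.gb_per_channel

    def total_capacity_gb(self) -> int:
        return self.dies * self.capacity_per_die_gb()


topo = MemoryTopology()
print(f"Channels per die: {topo.channels_per_die()}")
print(f"Capacity per die: {topo.capacity_per_die_gb()} GB")
print(f"Total capacity:   {topo.total_capacity_gb()} GB")
```

This is raw addressable capacity only; as noted earlier, the amount usable by applications would be lower if ECC consumes part of it.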