If Samsung has its way, the memory chips in future desktop computers, laptops, and GPUs will be able to think for themselves. At Hot Chips 33, Samsung announced that, in addition to its HBM2 chip, it will expand its processing-in-memory technology to DDR4 modules, GDDR6, and LPDDR5X. Earlier this year, Samsung released HBM2 memory with an integrated processor that can deliver up to 1.2 TFLOPS of compute for AI workloads, allowing the memory itself to perform operations normally reserved for CPUs, GPUs, ASICs, or FPGAs. Today's announcement marks further progress on that chip, but Samsung also has more powerful variants on its roadmap for next-generation HBM3. Given the rise of AI-based rendering techniques (such as upscaling), we could even see this technology play a role in gaming GPUs.
Today’s announcement revealed the official Aquabolt-XL branding for the HBM2 memory, along with AXDIMM DDR4 memory modules and LPDDR5 memory that also have embedded compute capabilities. We covered the details of the first HBM-PIM (Processing-in-Memory) chip in depth; in short, these chips have an AI engine injected into each DRAM bank. This allows the memory itself to process data, meaning the system does not have to move data between the memory and the processor, saving both time and power. There is a capacity trade-off with the current generation of the technology, but Samsung says that HBM3 and future memories will have the same capacity as ordinary memory chips.
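The data-movement savings described above can be illustrated with a toy sketch. This is purely conceptual: the class and method names are our own invention, not Samsung's architecture, and the "words transferred" counter just stands in for traffic over the memory bus.

```python
# Conceptual sketch only: why processing-in-memory saves data movement.
# Names and structure are illustrative, not Samsung's actual design.

class DramBank:
    """A DRAM bank holding data, with a hypothetical tiny compute engine."""
    def __init__(self, data):
        self.data = list(data)

    def pim_reduce(self):
        # In-memory path: the embedded engine reduces the bank's data locally;
        # only one scalar result crosses the memory bus.
        return sum(self.data), 1  # (result, words transferred)

    def read_all(self):
        # Conventional path: every word is shipped to the host processor.
        return list(self.data), len(self.data)

def host_sum(banks):
    """Conventional flow: move all data to the CPU, then compute there."""
    total, moved = 0, 0
    for bank in banks:
        words, n = bank.read_all()
        moved += n
        total += sum(words)
    return total, moved

def pim_sum(banks):
    """PIM flow: each bank reduces locally; the host only combines partials."""
    total, moved = 0, 0
    for bank in banks:
        partial, n = bank.pim_reduce()
        moved += n
        total += partial
    return total, moved

banks = [DramBank(range(i, i + 1024)) for i in range(0, 4096, 1024)]
host_result, host_moved = host_sum(banks)
pim_result, pim_moved = pim_sum(banks)
assert host_result == pim_result
print(f"host path moved {host_moved} words, PIM path moved {pim_moved}")
```

Both paths produce the same answer, but the PIM path moves three orders of magnitude fewer words, which is where the time and power savings come from.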
Samsung’s Aquabolt-XL HBM-PIM slots directly into the company’s existing product stack and works with JEDEC-compliant standard HBM2 memory controllers, making it a drop-in replacement for standard HBM2 memory. Samsung recently demonstrated this by swapping its HBM2-PIM memory into a standard Xilinx Alveo FPGA card without any modifications, increasing overall system performance by 2.5 times and cutting energy consumption by 62%.
Although Samsung’s PIM technology already works with any standard memory controller, enhanced support from CPU vendors will yield higher performance in some cases (for example, fewer threads would be needed to fully utilize the processing elements). Samsung told us it is testing HBM2-PIM with an unnamed CPU supplier for use in that supplier’s future products. This could be any number of potential manufacturers, whether x86 or Arm: Intel’s Sapphire Rapids, AMD’s Genoa, and Arm’s Neoverse platforms all support HBM memory.
Naturally, Samsung’s PIM technology is well suited to data centers, mainly because it excels at memory-bound AI workloads that don’t require much compute, such as speech recognition. Nevertheless, the company also envisions the technology moving into more mainstream settings. To that end, it also demonstrated AXDIMM, a new accelerated-DIMM prototype that performs processing in the DIMM’s buffer chip. Like the HBM2-PIM chip, it can use standard TensorFlow and Python code for FP16 processing, although Samsung is working to extend support to other types of software. Samsung says this DIMM can be dropped into any server with DDR4 LRDIMM or UDIMM slots, and we expect DDR5 support to follow in due course.
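Recommendation-style workloads of the kind Facebook runs lean heavily on embedding-table lookups that are gathered and summed, which is exactly the memory-bound pattern a buffer-chip accelerator targets. The sketch below shows the general idea of a rank-level gather-and-reduce; the `RankBuffer` class and its interface are hypothetical and do not reflect AXDIMM's actual programming model.

```python
# Illustrative sketch of the rank-level idea behind an "accelerated DIMM":
# each rank's buffer chip gathers and sums embedding rows locally, so the
# host receives one reduced vector per lookup instead of every raw row.
# Class and method names are hypothetical, not Samsung's AXDIMM interface.

class RankBuffer:
    def __init__(self, table):
        self.table = table  # embedding rows stored in this rank's DRAM

    def gather_reduce(self, indices):
        # Near-memory sparse-sum performed inside the buffer chip.
        dim = len(self.table[0])
        out = [0.0] * dim
        for i in indices:
            row = self.table[i]
            for d in range(dim):
                out[d] += row[d]
        return out

# Two ranks, each holding part of an embedding table (toy 4-dim rows).
rank0 = RankBuffer([[float(r + d) for d in range(4)] for r in range(8)])
rank1 = RankBuffer([[float(r * 2 + d) for d in range(4)] for r in range(8)])

# The host issues index lists and only ever sees the reduced vectors;
# the two ranks can work in parallel.
v0 = rank0.gather_reduce([1, 3])
v1 = rank1.gather_reduce([0, 2])
pooled = [a + b for a, b in zip(v0, v1)]
print(pooled)
```

Because each rank reduces its own rows, the bus traffic per lookup shrinks from many rows to one vector per rank, and the per-rank work is naturally parallel, which is consistent with the rank-level gains Samsung reports below.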
The company said its tests (conducted on Facebook AI workloads) found that with a two-rank kit, performance increased by 1.8 times, energy consumption dropped by 42.6%, and tail latency fell by 70%, all of which is very impressive, especially considering that Samsung slotted the DIMMs into a standard server without modification. Samsung has already tested the technology in customer servers, so we can expect it to come to market in the near future.
Samsung’s PIM technology can be ported to any of its memory processes or products, so it has even begun experimenting with PIM in LPDDR5 chips, meaning the technology could come to laptops, tablets, and even phones in the future. Samsung is still at the simulation stage with this work. Nevertheless, its tests of a simulated LPDDR5X-6400 chip show a 2.3-times performance gain on speech recognition workloads, a 1.8-times gain on transformer-based translation, and a 2.4-times gain on GPT-2 text generation. These performance improvements come alongside power reductions of 3.85, 2.17, and 4.35 times, respectively.
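Taken at face value, those pairs of figures imply large efficiency gains. Assuming the speedup and the power reduction apply to the same run and multiply (our reading of the numbers, not something Samsung stated explicitly), the combined performance-per-watt works out as follows:

```python
# Back-of-envelope perf-per-watt from the LPDDR5X-PIM simulation figures.
# Assumes speedup and power reduction are independent factors on the same
# workload, so they multiply; this is our interpretation of the numbers.

workloads = {
    "speech recognition": (2.3, 3.85),
    "transformer translation": (1.8, 2.17),
    "GPT-2 text generation": (2.4, 4.35),
}

for name, (speedup, power_reduction) in workloads.items():
    # (perf2/P2) / (perf1/P1) = (perf2/perf1) * (P1/P2)
    perf_per_watt = speedup * power_reduction
    print(f"{name}: ~{perf_per_watt:.1f}x perf/W")
```

Under that assumption, speech recognition lands at roughly 8.9x performance per watt and GPT-2 generation at roughly 10.4x, which is the kind of margin that matters in battery-powered devices.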
The technology is developing rapidly and works with standard memory controllers and existing infrastructure, but it has not yet been ratified by the JEDEC standards committee, a key hurdle Samsung must clear before widespread adoption. However, the company hopes its initial PIM specification will be accepted into the HBM3 standard later this year.
Speaking of HBM3, Samsung said it will move from the FP16 SIMD processing in HBM2 to FP64 in HBM3, meaning the chips will have expanded capabilities. FP16 and FP32 will be reserved for data-center use, while INT8 and INT16 will serve the LPDDR5, DDR5, and GDDR6 segments.
In addition, whereas today’s HBM2-PIM sacrifices half the capacity of an 8GB chip in exchange for its compute capability, there will be no such trade-off in the future: the chips will offer full standard capacity regardless of their compute features.
Samsung will also bring the capability to other types of memory, such as GDDR6, broadening the range of possible applications, and CXL support may be on the way as well. Samsung said its Aquabolt-XL HBM2 chips are available for purchase and integration now, with the rest of its PIM products in the development pipeline.
Who knows: with the rise of AI-based upscaling and rendering techniques, this technology could prove more game-changing for enthusiasts than it appears on the surface. In the future, GPU memory could take on some compute workloads itself, improving GPU performance while reducing energy consumption.