AMD Continues Work on Cutting-Edge Exascale Supercomputer Support
AMD is building Frontier, set to be the fastest supercomputer in the world, which will provide exascale performance for the Oak Ridge National Laboratory (ORNL) in the United States. Cutting-edge supercomputers bring many new technologies with them, and AMD is laying the groundwork in the software stack so that Frontier runs smoothly. As reported by Phoronix, this work continues in the form of newly submitted Linux kernel patches.
The Frontier supercomputer is a $600 million project designed to provide more than 1.5 ExaFLOPs of computing power, which ORNL will use to carry out various government projects. The system uses AMD’s next-generation EPYC processors and Radeon Instinct graphics cards, combining novel memory, storage and processing elements into one system.
According to the Linux kernel patch submitted by AMD today, “AMD is building a system architecture for the Frontier supercomputer with a coherent interconnect between CPUs and GPUs. This hardware architecture allows the CPUs to coherently access GPU device memory. With hardware in our lab, we are working with our partner HPE to develop the BIOS, firmware and software for delivery to the DOE.”
This is in stark contrast to Intel’s Aurora, which at the time of its announcement was expected to be the first exascale supercomputer in the United States. However, that system has now been postponed to the 2022-2023 time frame, which means that the AMD-powered Frontier will be not only the world’s fastest exascale computer, but also the first.
The code work continues. Back in May, AMD started work to ensure proper support for Frontier’s cutting-edge storage subsystems. Frontier involves one of the first large-scale deployments with GPU-to-CPU memory coherence, which will require additional code work and qualification. As you can see in the patch notes below, today’s work improves Frontier’s memory management capabilities.
“The system BIOS advertises the GPU device memory (aka VRAM) as SPM (special purpose memory) in the UEFI system address map. The amdgpu driver uses devm_memremap_pages to register this memory with devmap as MEMORY_DEVICE_PUBLIC. This patch series adds MEMORY_DEVICE_PUBLIC, which is similar to MEMORY_DEVICE_GENERIC in that it can be mapped for CPU access, but adds support for migrating this memory, similar to MEMORY_DEVICE_PRIVATE.”
“We also included and updated two patches from Ralph Campbell (Nvidia), which change ZONE_DEVICE reference counting as requested in previous reviews of this patch series (see https://patchwork.freedesktop.org/series/90706/). Finally, we extended hmm_test to cover migration of MEMORY_DEVICE_PUBLIC. This work is based on HMM and our SVM memory manager, which recently landed in Linux 5.14.”
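To make the distinction in the patch notes concrete, the three device memory types can be summarized as a small property table. The following is only a toy model in Python, not kernel code: the class and field names (DeviceMemoryType, Traits, describe) are invented for illustration, and the entry marking MEMORY_DEVICE_PRIVATE as not directly CPU-accessible reflects general Linux kernel background rather than anything stated in the quoted notes.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceMemoryType(Enum):
    # Values mirror the identifiers quoted in the patch notes.
    # The enum itself is a hypothetical illustration, not a kernel API.
    PRIVATE = "MEMORY_DEVICE_PRIVATE"
    GENERIC = "MEMORY_DEVICE_GENERIC"
    PUBLIC = "MEMORY_DEVICE_PUBLIC"


@dataclass(frozen=True)
class Traits:
    cpu_accessible: bool  # can the CPU map and access this memory directly?
    migratable: bool      # can pages be migrated to and from this memory?


# Assumed summary of the behaviour described in the patch notes:
# PUBLIC combines the CPU access of GENERIC with the migration
# support of PRIVATE.
TRAITS = {
    DeviceMemoryType.PRIVATE: Traits(cpu_accessible=False, migratable=True),
    DeviceMemoryType.GENERIC: Traits(cpu_accessible=True, migratable=False),
    DeviceMemoryType.PUBLIC: Traits(cpu_accessible=True, migratable=True),
}


def describe(kind):
    """Return a one-line, human-readable summary of a memory type."""
    t = TRAITS[kind]
    cpu = "CPU can map it" if t.cpu_accessible else "CPU cannot map it directly"
    mig = "pages can migrate" if t.migratable else "no page migration"
    return f"{kind.value}: {cpu}, {mig}"


if __name__ == "__main__":
    for kind in DeviceMemoryType:
        print(describe(kind))
```

Read this way, the new type fills the one combination the existing types did not cover: memory that the CPU can access coherently and that the kernel can also migrate.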