Cerebras Systems announced today that it has created what it claims is the first brain-scale AI solution: a single system that can support a 120-trillion-parameter AI model, exceeding the roughly 100 trillion synapses in the human brain. By contrast, the GPU clusters most commonly used for AI workloads typically top out at around 1 trillion parameters. Cerebras achieves this industry first with a single 850,000-core system, but it can also distribute workloads across up to 192 CS-2 systems, for a combined 163 million AI-optimized cores, to unlock even more performance.
As the fastest AI processor known, the Cerebras CS-2 is undoubtedly one of the most unusual semiconductor devices on earth. With 46,225 square millimeters of silicon, 2.6 trillion transistors, and 850,000 AI-optimized cores, all packed onto a single wafer-sized 7nm processor, its computing power is in a class of its own.
However, each of these huge chips lives inside a single CS-2 system, and even with its considerable memory, that arrangement limits the size of the AI models it can run. The chip carries 40 GB of on-chip SRAM, but adding an external cabinet with additional memory allows the company to run larger, brain-scale AI models.
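A quick back-of-the-envelope check shows why on-chip memory alone caps model size; the 2-bytes-per-parameter figure assumes FP16 weights, a precision the article does not specify:

```python
# Rough capacity check: how many parameters fit in 40 GB of on-chip SRAM?
# Assumes 2 bytes per parameter (FP16); the article does not state precision.
SRAM_BYTES = 40 * 10**9          # 40 GB of on-chip SRAM
BYTES_PER_PARAM = 2              # FP16 weight (assumption)

max_params = SRAM_BYTES // BYTES_PER_PARAM
print(f"~{max_params / 1e9:.0f} billion parameters fit on-chip")
# => ~20 billion parameters, far short of a 120-trillion-parameter model,
#    hence the need for an external parameter store.
```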
Scalability is also a challenge. With 20 PB/s of memory bandwidth and 220 Pb/s of total fabric bandwidth, communicating between multiple chips using the traditional techniques that split a workload across processors is difficult. The system's extreme computing power makes scaling across multiple systems particularly challenging, especially considering the chip's 15 kW power consumption. That demands customized cooling and power delivery, and it is practically impossible to pack more wafer-sized chips into one system.
Cerebras' multi-node solution takes a different approach: it stores the model parameters off-chip in the MemoryX cabinet and streams them to the chip as the model executes, so the weights behave as though they were stored on-chip. This not only allows a single system to compute larger AI models than ever before, it also sidesteps the latency and memory-bandwidth issues that typically limit the scalability of groups of "smaller" processors such as GPUs. In addition, Cerebras says this technique lets the system scale performance in a near-linear fashion across up to 192 CS-2 systems.
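Cerebras has not published the implementation, but the general idea of streaming weights from an external store rather than holding them in device memory can be sketched as follows; every name here (MemoryXStore, forward_pass, and so on) is hypothetical and only illustrates the pattern:

```python
# Minimal sketch of a weight-streaming execution loop: the model's layers are
# computed one at a time while their weights are fetched from an external
# parameter store. Names are illustrative, not Cerebras' actual API.
class MemoryXStore:
    """Hypothetical external parameter store holding all layer weights."""
    def __init__(self, layer_weights):
        self.layer_weights = layer_weights   # e.g. dict: layer_id -> weights

    def fetch(self, layer_id):
        return self.layer_weights[layer_id]  # streamed in over the fabric

def forward_pass(activations, store, num_layers, compute_layer):
    # Only one layer's weights are resident on the chip at a time, so model
    # size is bounded by external capacity rather than on-chip SRAM.
    for layer_id in range(num_layers):
        weights = store.fetch(layer_id)      # stream weights in
        activations = compute_layer(weights, activations)
    return activations

# Toy usage: "weights" are scalars and a layer just multiplies.
store = MemoryXStore({0: 2.0, 1: 3.0})
out = forward_pass(1.0, store, num_layers=2,
                   compute_layer=lambda w, x: w * x)
print(out)  # 6.0
```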
The company uses its SwarmX fabric to scale workloads across nodes. This interconnect consists of the company's AI-optimized communication fabric, which uses Ethernet at the PHY level but runs a custom protocol that transmits compressed and reduced data across the fabric. Each SwarmX switch supports up to 32 Cerebras CS-2 systems and provides nearly 1 TB/s of bandwidth to each node.
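The protocol itself is proprietary, but the broadcast-and-reduce role described for SwarmX follows a familiar data-parallel pattern, sketched below with hypothetical function names:

```python
# Sketch of the broadcast/reduce pattern attributed to SwarmX: weights are
# broadcast from MemoryX to every CS-2 node, and each node's gradients are
# reduced on the way back. Purely illustrative; the real fabric protocol
# is not public.
def swarmx_broadcast(weights, num_nodes):
    # every node receives the same weight block
    return [weights] * num_nodes

def swarmx_reduce(per_node_gradients):
    # element-wise sum of gradients contributed by all nodes
    return [sum(grads) for grads in zip(*per_node_gradients)]

grads = [[1.0, 2.0], [3.0, 4.0]]   # gradients from two nodes
print(swarmx_reduce(grads))        # -> [4.0, 6.0]
```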
These switches connect the systems to the MemoryX box, which offers memory capacities from 4 TB to 2.4 PB. The memory mixes flash and DRAM, though the company has not yet shared the ratio between the two. A single box can store up to 120 trillion weights, and a "few" x86 processors run the system's software and data planes.
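That 2.4 PB ceiling is consistent with 120 trillion weights. Assuming 2-byte FP16 weights (again, a precision the article does not state), the raw weights alone occupy 240 TB:

```python
# Does 2.4 PB plausibly hold 120 trillion weights? Assumes 2-byte FP16
# weights; precision and optimizer-state overhead are not specified.
NUM_WEIGHTS = 120 * 10**12
BYTES_PER_WEIGHT = 2

weight_bytes = NUM_WEIGHTS * BYTES_PER_WEIGHT
print(f"{weight_bytes / 10**15} PB for raw weights")   # 0.24 PB = 240 TB
# 240 TB of raw weights fits well within the 2.4 PB maximum, leaving
# headroom for optimizer state and other training metadata.
```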
Of course, only a few hundred customers in the world could use such a system, but Cerebras' goal is to simplify the operation of AI models that can easily exceed the size of anything in existence today. Many of those customers likely include the military and intelligence communities, which can put these systems to multiple uses, including nuclear modeling, though Cerebras cannot disclose several of its customers (for obvious reasons). We do know that the company has worked with Argonne National Laboratory, which commented on the new system:
"The past few years have shown us that, for NLP models, insights scale directly with parameters: the more parameters, the better the results," said Rick Stevens, associate laboratory director at Argonne National Laboratory. "Cerebras' invention, which will provide a 100x increase in parameter capacity, may have the potential to transform the industry. For the first time, we will be able to explore brain-sized models, opening up vast new avenues of research and insight."