According to a report by Data Center Dynamics, Baidu spun off its semiconductor design business into an independent company, Kunlun Chip Technology, in June; the unit is valued at approximately US$2 billion.
Kunlun Chip, a wholly owned subsidiary of Chinese tech giant Baidu, said this week that it has begun mass production of its Kunlun II processors for artificial intelligence applications. The new AI chip is based on the second-generation XPU microarchitecture, is manufactured on a 7 nm process, and is expected to deliver two to three times the performance of its predecessor.
The first-generation Kunlun K200 processor, released three years ago, is designed for cloud, edge, and autonomous-vehicle applications. It delivers approximately 256 INT8 TOPS, 64 INT/FP16 TOPS, and 16 INT/FP32 TOPS at 150 watts.
If the claim that Kunlun II performs two to three times better than its predecessor holds, the new chip should deliver 512 to 768 INT8 TOPS, 128 to 192 INT/FP16 TOPS, and 32 to 48 INT/FP32 TOPS. For comparison, Nvidia’s A100 offers 19.5 FP32 TFLOPS and 624/1248 (sparse) INT8 TOPS. On paper, at least for AI compute, Kunlun II can compete with processors such as Nvidia’s A100.
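The projected ranges above follow directly from scaling the first-generation peak figures by the claimed 2-3x uplift. A minimal sketch of that arithmetic (the function name and the uplift bounds are illustrative assumptions, not anything Baidu has published):

```python
# First-generation Kunlun peak throughput, in TOPS, per the article.
KUNLUN_I_TOPS = {"INT8": 256, "INT/FP16": 64, "INT/FP32": 16}

def projected_range(base_tops, low=2, high=3):
    """Scale a first-generation peak figure by the claimed 2-3x uplift."""
    return base_tops * low, base_tops * high

for fmt, tops in KUNLUN_I_TOPS.items():
    lo, hi = projected_range(tops)
    print(f"{fmt}: {lo} ~ {hi} TOPS")
# INT8: 512 ~ 768 TOPS
# INT/FP16: 128 ~ 192 TOPS
# INT/FP32: 32 ~ 48 TOPS
```

These are back-of-the-envelope projections from the vendor's relative claim, not measured numbers.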
Baidu Kunlun II’s Relative Performance

| | Baidu Kunlun | Baidu Kunlun II | NVIDIA A100 |
|---|---|---|---|
| INT8 | 256 TOPS | 512 ~ 768 TOPS | 624/1248* TOPS |
| INT/FP16 | 64 TOPS | 128 ~ 192 TOPS | 312/624* TFLOPS (BF16/FP16 tensor) |
| TensorFloat-32 (TF32) | — | — | 156/312* TFLOPS |
| INT/FP32 | 16 TOPS | 32 ~ 48 TOPS | 19.5 TFLOPS |
| FP64 tensor core | — | — | 19.5 TFLOPS |

\* with sparsity
Comparing AI platforms on peak throughput alone is not always meaningful, since much of the delivered performance depends on software, but these numbers give a sense of what the second-generation Kunlun processors are capable of.
Baidu started its Kunlun AI chip project in 2011. Initially the company used FPGAs to research and prototype its many-core XPU microarchitecture, but in 2018 it finally built a dedicated chip at one of Samsung’s foundries on a 14 nm process (probably 14LPP). The chip carries two 8 GB HBM memory packages that provide a peak bandwidth of 512 GB/s.
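The 512 GB/s figure is consistent with two standard HBM2 stacks. A quick sanity check, assuming a conventional 1024-bit interface per stack and a 2 Gbit/s pin rate (the pin rate is an assumption inferred from the stated total, not a published spec):

```python
# Rough HBM bandwidth check for the first-generation Kunlun chip.
BUS_WIDTH_BITS = 1024   # standard HBM2 interface width per stack
PIN_RATE_GBPS = 2.0     # assumed data rate per pin, Gbit/s
STACKS = 2              # two 8 GB packages, per the article

per_stack_gbs = BUS_WIDTH_BITS * PIN_RATE_GBPS / 8  # bits -> bytes
total_gbs = per_stack_gbs * STACKS
print(per_stack_gbs, total_gbs)  # 256.0 512.0
```

So 256 GB/s per stack times two stacks matches the quoted 512 GB/s peak.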
Today, the first-generation Kunlun is used in Baidu’s cloud data centers, the company’s Apollo self-driving platform, and many other applications.