Cerebras Systems, a prominent player in the chip industry, recently unveiled its latest innovation, the “Wafer Scale Engine 3” (WSE-3), the world’s largest semiconductor chip. This third-generation AI chip is designed for training AI models and delivers a substantial performance jump over its predecessor, the WSE-2. CEO Andrew Feldman highlighted that the new chip doubles the instruction throughput of the WSE-2 without increasing power draw or cost, echoing the essence of Moore’s Law in chip evolution.
The WSE-3 is colossal, occupying nearly an entire 12-inch wafer, and packs 4 trillion transistors, up from 2.6 trillion in the WSE-2, with a corresponding boost in processing capability. Notably, Cerebras has kept a careful balance between logic transistors and memory circuits, nudging on-chip memory up to 44GB and the compute core count to 900,000. Design refinements, including larger compute cores and improved SIMD capabilities, contribute to the chip’s performance gains.
Cerebras’ WSE-3 ships inside the CS-3 computer, and up to 2,048 CS-3 machines can now be clustered together for a combined 256 exaFLOPS of compute. The chip’s tight integration within the CS-3 system enables AI model training at a scale well beyond traditional computing architectures.
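Those cluster figures also imply the per-machine throughput. As a back-of-the-envelope sketch (assuming the quoted 256 exaFLOPS is simply the per-system figure multiplied across the 2,048-machine maximum), each CS-3 would contribute 125 petaFLOPS:

```python
# Back-of-the-envelope check of the quoted cluster figures (illustrative only).
cluster_total_exaflops = 256   # combined compute quoted for a maximum cluster
machines = 2048                # maximum number of CS-3 systems per cluster

per_machine_petaflops = cluster_total_exaflops * 1000 / machines
print(f"Implied per-CS-3 throughput: {per_machine_petaflops:.0f} petaFLOPS")
# Implied per-CS-3 throughput: 125 petaFLOPS
```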
The comparison between the WSE-3 and Nvidia’s H100 GPU underscores the gulf between the two: Cerebras’ chip leads Nvidia’s offering in die size, core count, memory capacity, and memory bandwidth by a substantial margin. Cerebras also argues that the chip simplifies programming compared with GPUs, pointing to the far fewer lines of code required for training tasks as evidence of its ease of use and efficiency.
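To make that contrast concrete, here is a minimal sketch (in plain PyTorch, not Cerebras’ own software stack) of a single-device training loop; the point is that when a model fits on one very large chip, the distributed scaffolding a multi-GPU cluster needs, such as process-group initialization, DistributedDataParallel or FSDP wrapping, and per-rank data sharding, never has to appear in user code:

```python
# Illustrative single-device training loop (plain PyTorch; not Cerebras' API).
# On a multi-GPU cluster, roughly the same loop would additionally need
# torch.distributed.init_process_group(...), model wrapping with
# DistributedDataParallel or FSDP, and a DistributedSampler for the data.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for step in range(10):                       # toy loop over random data
    x = torch.randn(32, 512, device=device)
    y = torch.randn(32, 512, device=device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```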
Feldman’s announcement of a partnership with Qualcomm signals Cerebras’ continued commitment to AI innovation, aiming to reduce inference costs through cutting-edge techniques like sparsity, speculative decoding, and network architecture search. These methods enhance processing efficiency and reduce computational expenses, ensuring that the transition from training to production inference is seamless and cost-effective.
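Of those techniques, speculative decoding is the most algorithmic: a small, cheap draft model proposes several tokens ahead, and the large target model verifies them, so the expensive model is invoked less often per generated token. The following is a minimal, greedy sketch of the idea (illustrative only; the Cerebras–Qualcomm pipeline has not been published, and production implementations verify all draft tokens in a single batched forward pass and use a probabilistic accept/reject rule rather than this exact-match simplification):

```python
# Minimal greedy speculative-decoding sketch (illustrative, not a real system).
from typing import Callable, List

def speculative_decode(
    draft_next: Callable[[List[int]], int],    # cheap model: next-token guess
    target_next: Callable[[List[int]], int],   # expensive model: reference output
    prompt: List[int],
    max_new_tokens: int = 32,
    draft_window: int = 4,                     # tokens speculated per round
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft model speculates a short run of tokens.
        guesses, ctx = [], list(tokens)
        for _ in range(draft_window):
            t = draft_next(ctx)
            guesses.append(t)
            ctx.append(t)
        # 2) Target model verifies the run: keep the longest agreeing prefix,
        #    substituting its own token at the first mismatch. (A real system
        #    would score all speculated positions in one forward pass; here the
        #    target is called per position for simplicity.)
        accepted = 0
        for g in guesses:
            t = target_next(tokens)
            tokens.append(t)
            if t == g:
                accepted += 1
            else:
                break
        if accepted == len(guesses):
            tokens.append(target_next(tokens))  # bonus token when all accepted
    return tokens[: len(prompt) + max_new_tokens]

# Toy usage: both "models" emit the same repeating pattern, so drafts verify.
pattern = lambda ctx: (ctx[-1] + 1) % 100
print(speculative_decode(pattern, pattern, prompt=[0], max_new_tokens=10))
```

In the toy run above the draft and target always agree, so every speculated token is accepted; in practice the speedup depends on how often the small model matches the large one.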