El Capitan Installation Begins: First APU-based Exascale System Shaping Up For 2024
by Anton Shilov on July 6, 2023 8:30 AM EST
Posted in: Supercomputers, AMD, AMD Instinct, El Capitan, MI300
Lawrence Livermore National Laboratory has received the first components of its upcoming El Capitan supercomputer and begun installing them, the laboratory announced on Wednesday. The system is set to come online in mid-2024 and is expected to deliver performance of over 2 ExaFLOPS.
LLNL's El Capitan is based on Cray's Shasta supercomputer architecture and will be built by HPE, just like two other exascale systems in the U.S., Frontier and Aurora. Unlike the first two exascale machines, which use a traditional discrete CPU plus discrete GPU configuration, the El Capitan supercomputer will be the first one based on AMD server-grade APUs that integrate both processor types into a single, highly connected package.
AMD's Instinct MI300A APU incorporates both CPU and GPU chiplets, offering 24 general-purpose Zen 4 cores, compute GPUs powered by the CDNA 3 architecture, and 128 GB of unified on-package HBM3 memory. AMD has been internally evaluating its Instinct MI300A APU for months, and it appears that AMD and HPE are now ready to start installing the first pieces of hardware that make up El Capitan.
According to pictures released by the Lawrence Livermore National Laboratory, its engineers have already put a substantial number of servers into racks. However, LLNL's announcement leaves it unclear whether these are "completed" servers with production-quality silicon or pre-production servers that will be filled out with production silicon at a later date. Notably, parts of Aurora were initially assembled with pre-production CPUs, which were only swapped out for Xeon CPU Max chips over the past couple of months. Given the amount of validation work required to stand up a world-class supercomputer, AMD and HPE may be employing a similar strategy here.
"We have begun receiving & installing components for El Capitan, first #exascale #supercomputer," a Tweet by LLNL reads. "While we are still a ways from deploying it for national security purposes in 2024, it is exciting to see years of work becoming reality."
When it comes online in 2024, LLNL is expecting El Capitan to be the fastest supercomputer in the world. With its full specifications still being held back, though, it's not clear how much faster it will be on paper than the 2 EFLOPS Aurora, let alone in real-world performance. Part of the design goal of AMD's MI300A APU is to exploit the additional performance and efficiency gains that come from placing CPU and GPU blocks so close together, so it will be interesting to see what the software development teams programming for El Capitan can achieve, especially as they optimize their software further.
LLNL's El Capitan is expected to cost $600 million. The system will be used for nuclear weapons simulations and will be crucial for U.S. national security. It replaces Sierra, a supercomputer based on IBM POWER9 CPUs and NVIDIA Volta accelerators, and promises to offer performance that is 16 times higher.
Source: LLNL Twitter
18 Comments
lemurbutton - Thursday, July 6, 2023
As a tax payer, they should have picked Nvidia H100 + Grace combination. It'd be more useful.
Yojimbo - Thursday, July 6, 2023
There are a few reasons it wasn't. Number one is that AMD offered a huge discount on Frontier and El Capitan pricing. Nvidia also hasn't been willing to share information on future features because they are more worried about being copied in the commercial space than winning DOE contracts. But yeah, the DOE probably should have used Nvidia for at least one of the three exascale systems. But the timing due to the supposed "race to exascale" made it difficult. El Capitan is a specialized supercomputer running a specific set of code bases, though, and it's possible they really wanted the extra FP64 throughput that the AMD architecture gives them. Hopper is set up for comparatively more AI performance and less FP64 performance. Aurora was already contracted for (from Intel), and Frontier was promised by Cray and AMD earlier than it was eventually delivered, long before Hopper, let alone Grace, was set to be delivered (and the US wanted to be "first to exascale", which they weren't, anyway, with the delays to Frontier and Aurora, and which never really mattered because the Chinese decided not to list their machines for whatever reason). Also, I believe one thing the DOE really wanted for Frontier was coherency between the CPUs and GPUs, which couldn't be provided with an Nvidia GPU after IBM dropped out of the bidding and before Grace was available.
Curiousland - Thursday, July 6, 2023
As a tax payer, you don't want them to waste money, and that's why after intensive study of proposals they skipped H100 with lower performance/cost.
lemurbutton - Thursday, July 6, 2023
That's poor logic. There's a reason why Nvidia is a $1 trillion company now. Their hardware is irreplaceable and simply better than AMD and Intel's.
Just leave it to the government for wasting money on AMD hardware. In the business world, where performance/cost/utility actually matter, Nvidia is picked over and over again.
turtile - Thursday, July 6, 2023
The performance numbers for the MI300A haven't been disclosed, yet you know that the Nvidia product is better?
Nvidia is chosen because of their software. However, this system is written with custom code so it doesn't matter. AMD's performance is probably compatible at a lower price.
These government contracts are meant to help businesses to compete so that we don't have monopolies that result in overpriced goods with little innovation.
duploxxx - Friday, July 7, 2023
You just live on the AI hype. Some companies are skyrocketing in stock price and will fall hard soon enough when revenue is far off target.
Dante Verizon - Saturday, July 8, 2023
It's not better. Only the software ecosystem that is already mature...
Zoolook - Saturday, July 8, 2023
More like not enough knowledge and understanding on your part or pure Nvidia bias, the logical choice was AMD.
lemurbutton - Monday, July 10, 2023
Businesses don't have Nvidia bias. They buy what they think has the best value.
Eliadbu - Monday, July 10, 2023
AMD hardware is very capable, but what always has held them back, and still does, is software and the complete package. Nvidia for years built very effective CUDA and cuDNN libraries, SDKs and other software. Add that to complete systems like DGX and it makes a lot of sense for businesses to buy from Nvidia: as a business you want something that is as close as possible to doing work for you. You don't want to buy something that you need to create the entire software stack for. For a government entity, depending on what they do, that might not be so necessary.