ATS-5, the fifth Advanced Technology System (ATS) in the Advanced Simulation and Computing (ASC) program, will be critical to the success of these NNSA missions. ATS-5 will support current and future simulation codes and will tackle some of the largest-scale 3D simulation workloads in support of the stockpile stewardship mission. These large-scale simulations are known as “hero”-class simulations, and ATS-5 will reduce their time-to-completion from months to days. Additionally, ATS-5 will provide the ability to run several of these hero-class simulations simultaneously. Vastly reduced time-to-solution, combined with the ability to run multiple large-scale simulations, will dramatically improve NNSA’s ability and agility to manage the stockpile.
In 2027, Crossroads (ATS-3) will be nearing the end of its useful lifetime. ATS-5 will replace Crossroads and will be the first NNSA HPC system in the post-exascale era, providing a large portion of the simulation resources for the NNSA ASC tri-lab community of Lawrence Livermore National Laboratory (LLNL), Los Alamos National Laboratory (LANL), and Sandia National Laboratories (SNL).
ATS-5 will support a diverse set of applications, spanning the analysis of manufacturing defects, changes in material properties that result from using new materials with both current and new manufacturing techniques, the impact of combined environments, aging defects, and potentially alternate delivery vehicles. New and emerging workloads, including digital engineering and machine learning (ML), will also be enabled by ATS-5.
Numerous architectural advancements in ATS‑5 system design will support the NNSA Office of Defense Programs mission needs. ATS-5 will be designed with the following architectural advancements and major project goals:
- Overcoming the memory wall—continued memory bandwidth performance improvements for tri-lab applications
- Improved efficiency—programmer productivity, energy usage, and increased processor utilization
- Architectural diversity—ensuring that the high-performance computing ecosystem remains vibrant with multiple advanced technology solutions
- Time-to-solution—advancing strong scaling improvements to tackle the most pressing challenge of major improvements in time-to-solution for NNSA’s largest and most complex stockpile simulations
Achieving these goals will enable weapons designers, analysts, and computational scientists to make more routine use of today’s hero-class simulations in support of the stockpile stewardship certification and assessments to ensure that the Nation’s nuclear stockpile is safe, secure, and reliable.
ATS-5 Timeline
- Draft Technical Specification - September 2023
- RFP release - September 3rd, 2024
- Award NRE/Build subcontract - May 2025
- System Delivery - late 2026/early 2027
- Acceptance - August/September 2027
Triad National Security, LLC (Triad), which manages and operates Los Alamos National Laboratory (LANL), is planning to release a Request for Proposal (RFP) for NNSA’s next-generation Advanced Technology System (ATS), ATS-5, to be delivered in the 2027 time frame.
As part of Triad’s RFP release in September 2024, a Technical Requirements document (current version 1.0, dated September 3rd, 2024) was included and is also linked below. The following technical documents are posted at the links below and may be subject to change.
- RFP Technical Requirements Document
- ATS-5 Offeror Guidance (pdf)
- SSNI Example (xls)
Assuring that real applications perform efficiently on ATS-5 is key to their success. A suite of benchmarks has been developed for RFP response evaluation and system acceptance. These codes are representative of the workloads of the NNSA laboratories.
The benchmarks contained within this site represent a pre-RFP draft state. Over the next few months the benchmarks will change somewhat. While we expect most of the changes to be additions and modifications, it is possible that we will remove benchmarks prior to the RFP.
To use these benchmarks, please refer to the ATS-5 benchmark documentation as well as the benchmark code repository.
Benchmark changes from Crossroads
The key differences between the Crossroads and ATS-5 benchmarks are summarized below:
| Crossroads | ATS-5 | Notes |
|---|---|---|
| Few GPU-ready benchmarks | All proxy benchmarks have GPU implementations. | |
| System level performance metric: Scalable System Improvement geometric mean of app FOMs. Use of single node benchmarks for RFP. | Multi-node benchmarking for system acceptance based on RFP benchmarks, negotiated with vendor as part of SOW. | Attempting to limit multi-node benchmarking for RFP to communication (MPI), and IO (IOR). Expect responses to include multiple node configurations and ability to compose them to meet our needs in a codesign partnership. Will use scaled single node improvement to assess proposals (along with other factors) and SSI for acceptance. |
| Mini-Apps + full scale apps some of which were export controlled. | Mini-apps only - all open source. | |
| No Machine Learning. | ML training and inference included. | Focuses on material science workloads of relevance. |
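The table describes the Crossroads system-level metric as a geometric mean of application figures of merit (FOMs). A minimal sketch of that kind of aggregate is shown below, assuming an unweighted geometric mean of per-application FOM ratios (proposed system over baseline). The function names, application names, and FOM values are hypothetical and chosen only for illustration; the actual SSI definition used for evaluation and acceptance may include weights or scaling factors not shown here.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def system_improvement(baseline_foms, proposed_foms):
    """Geometric mean of per-application FOM ratios (proposed / baseline).

    Assumption: every application is weighted equally.
    """
    ratios = [proposed_foms[app] / baseline_foms[app] for app in baseline_foms]
    return geometric_mean(ratios)


# Hypothetical FOM values for three made-up applications.
baseline = {"app_a": 1.0, "app_b": 2.0, "app_c": 4.0}
proposed = {"app_a": 2.0, "app_b": 8.0, "app_c": 32.0}

ssi = system_improvement(baseline, proposed)
print(f"improvement = {ssi:.2f}x")  # ratios 2, 4, 8 give 4.00x
```

A geometric mean is a common choice for this kind of metric because a large gain on one application cannot fully mask a regression on another, unlike an arithmetic mean of ratios.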
Benchmark Overview
| Benchmark | Description | Language | Parallelism |
|---|---|---|---|
| Branson | Implicit Monte Carlo transport | C++ | MPI+CUDA/HIP |
| AMG2023 | AMG solver of sparse matrices using Hypre | C | MPI+CUDA/HIP/SYCL; OpenMP on CPU |
| MiniEM | Electro-Magnetics solver | C++ | MPI+Kokkos |
| MLMD | ML training of interatomic potential model using HIPYNN on VASP simulation data. ML inference using LAMMPS, Kokkos, and HIPYNN trained interatomic potential model. | Python, C++, C | MPI+CUDA/HIP |
| Parthenon-VIBE | Block structured AMR proxy using the Parthenon framework. | C++ | MPI+Kokkos |
| Sparta | Direct Simulation Monte Carlo | C++ | MPI+Kokkos |
| UMT | Deterministic (Sn) transport | Fortran | MPI+OpenMP and OpenMP Offload |
Microbenchmark Overview
| Benchmark | Description | Language | Parallelism | Multi-node |
|---|---|---|---|---|
| Stream | Streaming memory bandwidth test | C/Fortran | OpenMP | No |
| Spatter | Sparse memory bandwidth test driven by application memory access patterns. | C++ | MPI+OpenMP/CUDA/OpenCL | No |
| OSU MPI + Sandia SMB message rate | MPI Performance Benchmarks | C++ | MPI | Yes |
| DGEMM | Single node floating-point performance on matrix multiply. | C/Fortran | Various | No |
| DAXPY | Single node floating-point performance of a scaled vector plus a vector. | C/Fortran | Various | No |
| IOR | Performance testing of parallel file system using various interfaces and access patterns. | C | MPI | No |
| mdtest | Metadata benchmark that performs open/stat/close operations on files and directories. | C | MPI | Yes |
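To make the DAXPY entry concrete, the operation is y = a*x + y applied element by element. The sketch below is a pure-Python reference, not the benchmark code itself, and it assumes 8-byte double-precision elements and a simple traffic model (read x, read y, write y, ignoring cache effects such as write-allocate). The helper names `daxpy` and `daxpy_costs` are hypothetical. Timings from an interpreted loop like this say nothing about hardware capability; the point is only to show how FLOP and byte counts are derived.

```python
import time


def daxpy(a, x, y):
    """Return a*x + y elementwise (reference version, not a tuned kernel)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


def daxpy_costs(n, bytes_per_element=8):
    """FLOPs and bytes moved for DAXPY on n elements (assumed traffic model)."""
    flops = 2 * n                            # one multiply and one add per element
    bytes_moved = 3 * n * bytes_per_element  # read x, read y, write y
    return flops, bytes_moved


n = 100_000
x = [1.0] * n
y = [2.0] * n

start = time.perf_counter()
result = daxpy(3.0, x, y)
elapsed = time.perf_counter() - start

flops, bytes_moved = daxpy_costs(n)
print(f"arithmetic intensity: {flops / bytes_moved:.4f} FLOP/byte")
if elapsed > 0:
    print(f"interpreter rate: {flops / elapsed / 1e6:.1f} MFLOP/s")
```

Under this model DAXPY performs 2 FLOPs for every 24 bytes moved, so it is limited by memory bandwidth on most systems, much like Stream. DGEMM, by contrast, performs on the order of 2n^3 FLOPs on 3n^2 matrix elements and is typically compute-bound.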
Offerors shall submit any written questions related to the RFP or Technical Requirements through the Ariba system. Questions will be sanitized to maintain anonymity and posted with corresponding answers below. Questions and answers will be grouped and posted by week. These questions must be submitted through the Ariba system no later than 7 calendar days prior to the proposal due date.
Questions & Answers, 09-05-2024