Trinity: Advanced Technology System

[Image: Trinity supercomputer, August 2015]

Advancing Predictive Capability for Stockpile Stewardship

The Trinity supercomputer is designed to provide increased computational capability for the NNSA Nuclear Security Enterprise in support of increasingly demanding workloads, such as simulations with higher geometric and physics fidelity, while still meeting expectations for total time to solution. Trinity’s capabilities are required to support the NNSA Stockpile Stewardship program’s certification and assessments, which ensure that the nation’s nuclear stockpile is safe, reliable, and secure.

The Trinity project is managed and operated by Los Alamos National Laboratory and Sandia National Laboratories under the Alliance for Computing at Extreme Scale (ACES) partnership. The system is physically located in Los Alamos at the Nicholas Metropolis Center for Modeling and Simulation.

The first instantiation of an AT system

The National Nuclear Security Administration (NNSA) Office of Advanced Simulation and Computing (ASC) faces significant challenges from ongoing technology advancements: it must continue to meet the mission needs of current applications while adapting to both revolutionary and evolutionary changes in computing technology. ASC recognizes that the simulation environment of the future will be transformed by new computing architectures and new programming models, and it has initiated the development and deployment of a series of Advanced Technology (AT) systems. The ASC roadmap states, “Work in this timeframe will establish the technological foundation to build toward exascale computing environments, which predictive capability may demand.” It is critical for ASC both to explore the rapidly changing technology of future systems and to provide platforms with more capability and higher performance for predictive capability. Trinity is the first instantiation of an AT system and is intended to balance usability for current simulation codes with adaptation to new computing technologies and programming methodologies.

The Trinity supercomputer is provided by Cray, Inc. and is based on its XC40 platform architecture. Trinity combines Intel Xeon (Haswell) and Intel Xeon Phi (Knights Landing) processors. The Haswell partition provides a natural transition path for many of the legacy codes running on the Cielo supercomputer, Trinity’s predecessor. To use the Knights Landing processor to its full potential, the ASC code teams must expose higher levels of thread- and vector-level parallelism than traditional multicore architectures have required. To facilitate this transition, the Trinity Center of Excellence was established, with staff from the ASC tri-labs, Cray, and Intel.

Trinity introduces tightly integrated, nonvolatile “burst buffer” storage capabilities. Embedded within the high-speed fabric are nodes with attached solid-state disk drives. The burst buffer capability will allow for accelerated checkpoint/restart performance and relieve much of the pressure normally placed on the back-end storage arrays. In addition, the burst buffer will support new workload management strategies such as in-situ analysis, which opens up a broad range of options for how projects manage their overall workflows.
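To give a rough sense of what “accelerated checkpoint/restart” could mean, the sketch below estimates how long it would take to write a checkpoint equal to Trinity’s full memory capacity at the sustained bandwidths listed in the technical specifications. It is a back-of-the-envelope illustration only: the function name is hypothetical, TB/s is assumed to mean 10^12 bytes per second, and real checkpoints typically cover only part of memory and rarely achieve sustained peak bandwidth.

```python
# Rough lower-bound estimate of checkpoint write time.
# Assumptions (not stated in the source): TB is decimal (10**12 bytes),
# the checkpoint is as large as total system memory, and the full
# sustained bandwidth is available to the checkpoint.

PIB = 2**50   # bytes in one pebibyte (binary prefix)
TB = 10**12   # bytes in one terabyte (decimal prefix, assumed)


def drain_time_seconds(data_bytes, bandwidth_bytes_per_s):
    """Time to move data_bytes at a constant bandwidth (hypothetical helper)."""
    return data_bytes / bandwidth_bytes_per_s


memory_bytes = 2.07 * PIB      # memory capacity from the specifications
burst_buffer_bw = 3.3 * TB     # burst buffer bandwidth (sustained)
file_system_bw = 1.45 * TB     # parallel file system bandwidth (sustained)

t_burst_buffer = drain_time_seconds(memory_bytes, burst_buffer_bw)
t_file_system = drain_time_seconds(memory_bytes, file_system_bw)

print(f"Burst buffer:         {t_burst_buffer / 60:.1f} minutes")
print(f"Parallel file system: {t_file_system / 60:.1f} minutes")
print(f"Speedup:              {t_file_system / t_burst_buffer:.2f}x")
```

Under these assumptions, a full-memory checkpoint would take roughly 12 minutes on the burst buffer versus roughly 27 minutes on the parallel file system.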

Trinity also introduces advanced power management functionality that allows monitoring and control of power consumption at the system, application, and component levels. Although advanced power management is not needed for the current power and operational budget, its functionality is being used to gain a better understanding of future system requirements and features.

Technical Specifications

Trinity High-level Technical Specifications

Operational lifetime: 2015 to 2020
Capability: 8x improvement over Cielo in fidelity, physics, and performance capabilities
Architecture: Cray XC40
Memory capacity: 2.07 PiB
Peak performance: 41.5 PF/s
Number of compute nodes: 19,420
Processor architecture: Intel Xeon (Haswell) and Intel Xeon Phi (Knights Landing)
Parallel file system capacity (usable): 78 PB
Parallel file system bandwidth (sustained): 1.45 TB/s
Burst buffer storage capacity (usable): 3.7 PB
Burst buffer bandwidth (sustained): 3.3 TB/s
Footprint: 3773 sq ft (system) and 832 sq ft (storage)
Power requirement: 8.6 MW
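A few derived figures can help put the headline numbers in proportion. The sketch below computes system-wide averages from the specifications above. The variable names are illustrative, PB is assumed to mean 10^15 bytes, and PF/s is assumed to mean 10^15 floating-point operations per second. Because Trinity mixes Haswell and Knights Landing nodes, the per-node values are averages rather than the configuration of any actual node, and the power figure may cover more than the compute nodes alone.

```python
# Derived averages from the published specifications.
# Assumptions (not stated in the source): PB = 10**15 bytes,
# PF/s = 10**15 floating-point operations per second, MW = 10**6 watts.

PIB = 2**50
GIB = 2**30
PB = 10**15

memory_bytes = 2.07 * PIB
peak_flops = 41.5e15
compute_nodes = 19_420
burst_buffer_bytes = 3.7 * PB
file_system_bytes = 78 * PB
power_watts = 8.6e6

# Average memory and peak performance per compute node
mem_per_node_gib = memory_bytes / compute_nodes / GIB
tflops_per_node = peak_flops / compute_nodes / 1e12

# Peak performance per watt of the stated power requirement
gflops_per_watt = peak_flops / power_watts / 1e9

# Storage capacity relative to total memory
burst_buffer_to_memory = burst_buffer_bytes / memory_bytes
file_system_to_memory = file_system_bytes / memory_bytes

print(f"Average memory per node: {mem_per_node_gib:.1f} GiB")
print(f"Average peak per node:   {tflops_per_node:.2f} TF/s")
print(f"Peak per watt:           {gflops_per_watt:.2f} GF/s per W")
print(f"Burst buffer / memory:   {burst_buffer_to_memory:.2f}x")
print(f"File system / memory:    {file_system_to_memory:.1f}x")
```

Under these assumptions, the burst buffer can hold about one and a half full-memory images, while the parallel file system can hold a little over thirty.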

Resources