The US Army has procured a $12m supercomputer to serve users across all of the Department of Defense's services and agencies.
The IBM system is housed in a shipping container with an on-board uninterruptible power supply (UPS), chilled water cooling, and fire suppression systems.
It will be deployed later this year at the US Army Combat Capabilities Development Command Army Research Laboratory DoD Supercomputing Resource Center at Aberdeen Proving Ground, Maryland.
Box it up
"This is a singularly important achievement for its end-users," Sally Parsons, Huntsville Center Information Technology Systems Division chief, said.
"Because of the sensitive nature of the work involved, we here at Huntsville Center will never know exactly the solutions this unique tool will provide in the field, but I am quite confident that our work will result in both lives saved and problems avoided."
Capable of six petaflops of single-precision performance, the system consists of:
- 22 nodes for machine learning training workloads, each with two IBM Power9 processors, 512GB of system memory, eight Nvidia V100 GPUs with 32GB of high-bandwidth memory apiece, and 15TB of local solid-state storage
- 128 nodes for inferencing workloads, each with two IBM Power9 processors, 256GB of system memory, four Nvidia T4 GPUs with 16GB of GDDR6 memory apiece, and 4TB of local solid-state storage
- Three solid-state parallel file systems, totaling 1.3PB
- A 100Gbps InfiniBand network, as well as dual 10Gbps Ethernet networks
- Platform LSF HPC job scheduling integrated with a Kubernetes container orchestration solution
- Integrated support for TensorFlow, PyTorch, and Caffe, in addition to traditional HPC libraries and toolsets, including FFTW and Dakota
"The system brings a significant capability to support militarily significant use-cases that were not possible with supercomputers installed in fixed facilities," the DoD said.
The HPC-in-a-Container is designed to be deployable to the tactical edge, with deployment opportunities to remote locations "currently being explored and evaluated."
Because the system needs external power beyond what its UPS can supply temporarily, it is unlikely to be deployed on a battlefield itself, but it could be placed near a theater of war.
The US maintains hundreds of military bases around the world (around 800 in more than 70 countries as of 2015), any of which could host such a system.
In England, the US plans to deploy a scalable modular data center at a base in Suffolk next year, while the NSA has operated a 10,000 sq ft Tier III facility in North Yorkshire for some time. Over in the Middle East, the Defense Information Systems Agency (DISA) operates a data center in Bahrain; its scale and scope are unknown, but last year the facility suffered layoffs as part of a wider DISA consolidation effort.
Elsewhere in the region, we broke the news earlier this year that the US Army Contracting Command is conducting market research into requirements for building a Network Operations Center in Iraq, which will include two modular data centers.
Back on the US mainland, the DoD's High Performance Computing Modernization Program (HPCMP) aims to have a 100 petaflops system by 2025, a cognitive production system in 2026, an exaflops system in 2031, and a 10 exaflops system and a quantum pilot in 2036. Finally, in 2040, it hopes to have a quantum production system.
A smarter war
Unsurprisingly, the workloads expected to run on the HPC-in-a-Container have not been shared, but HPCMP said it would welcome the "mobile, containerized supercomputer to support artificial intelligence, machine learning, and data analytics" workloads.
Earlier this year, the DoD's chief information officer, Dana Deasy, gave some insight into the military's aims: "One of the things traditional computing has always had a problem with is the warfighter sitting out on the tactical edge, [with the] cloud sitting [elsewhere].
"Now imagine a world where we can take that compute power with new applications on top of it, and put the cloud right into the hands of the tactical fighter on the edge. That's why the cloud is so important to us."
Part of that cloud push is the Joint Enterprise Defense Infrastructure (JEDI), the repeatedly delayed $10bn program currently set to go to either AWS or Azure. But JEDI will likely have to work hand in hand with the Distributed Common Ground System-Army (DCGS-A) program, currently being built by Silicon Valley data analytics company Palantir Technologies for $800m.
DCGS-A, a mixture of software, ruggedized computer networks, and sensors, acts as the Army's primary system for posting data, processing information, and disseminating intelligence, surveillance, and reconnaissance (ISR) information about enemy troop movements, weather, and terrain.
"The Army Operating Concept, Win in a Complex World, requires intelligence warfighting function training to increasingly focus on employing the DCGS-A as a 'weapons system' to support expeditionary operations with light and lethal formations capable of deploying quickly," a 2014 US Army article on DCGS-A by Maj. Gen. Robert P. Ashley and Col. William L. Edwards states.