NASA Case Study

NASA's Flexible Cloud Fabric:
Moving Cluster Applications to the Cloud
Network architects at the NASA Center for Climate Simulation recently
began investigating the viability of running the organization’s modeling
and simulation applications on cloud infrastructure, as an alternative to
its purpose-built computing cluster named Discover. Hoping to capture
the inherent advantages of cloud infrastructure, such as agility and
elasticity, they want to establish whether an open cloud architecture can
meet the applications’ rigorous throughput and latency requirements.
In particular, they need to confirm that virtualization overhead does not
limit performance.
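The case study does not say how the team measured virtualization overhead. As a minimal sketch, assuming the same benchmark is run once on a bare-metal node and once inside a virtual machine, the overhead can be expressed as a relative throughput loss and a relative latency increase. The names `Measurement`, `virtualization_overhead`, and `within_tolerance`, and the 10% default tolerance, are hypothetical illustrations, not NASA's method or acceptance criterion.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One benchmark result: throughput in Gbit/s, latency in microseconds."""
    throughput_gbps: float
    latency_us: float


def virtualization_overhead(bare_metal: Measurement, virtual: Measurement) -> dict[str, float]:
    """Return the relative overhead of a virtualized run versus bare metal.

    Positive values mean the VM is worse: lower throughput or higher latency.
    """
    throughput_loss = (bare_metal.throughput_gbps - virtual.throughput_gbps) / bare_metal.throughput_gbps
    latency_increase = (virtual.latency_us - bare_metal.latency_us) / bare_metal.latency_us
    return {"throughput_loss": throughput_loss, "latency_increase": latency_increase}


def within_tolerance(overhead: dict[str, float], tolerance: float = 0.10) -> bool:
    """True if every overhead metric is at or below the tolerance.

    The 10% default is a hypothetical placeholder, not a NASA requirement.
    """
    return all(value <= tolerance for value in overhead.values())
```

For example, a VM reaching 9.5 Gbit/s against a bare-metal 10 Gbit/s shows a 5% throughput loss; whether that is acceptable depends on how much degradation the application can absorb, which is why the tolerance is a parameter rather than a constant.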
As part of the shift to the cloud, the team hopes to converge the
environment's backbone and management networks onto
10 Gigabit Ethernet. A single network fabric is expected to improve
the flexibility and cost-effectiveness of the overall solution.
Traditional Architecture for Cluster Computing
The NASA Center for Climate Simulation’s research on climate change and related
phenomena, which requires extensive computer modeling, contributes to efforts such
as hurricane prediction, analysis of past weather patterns, and scientific support of
government climate policy. Discover, the cluster that has done this work for several
years, uses an integrated set of supercomputing, visualization, and data-management
technologies to deliver roughly 400 teraflops of capacity:
Compute resources: 30,000 conventional Intel® Xeon® processor cores and 64 GPUs
Inter-node backbone: DDR and QDR InfiniBand*
Management networking: Gigabit and 10 Gigabit Ethernet (GbE and 10GbE)
Data store: ~4-petabyte RAID-based parallel file system (GPFS), plus a ~20-petabyte tape archive
Discover is based entirely on non-virtualized machines, so adding capacity requires
additional physical servers to be provisioned. Reducing the traditional cost and
complexity of those changes is one benet of cloud computing. Moreover, cloud
architectures add elasticity that aids in job scheduling and helps avoid operational
bottlenecks associated with long-running jobs.
Intel® Ethernet 10 Gigabit Server Adapters
Single-Root I/O Virtualization
The NASA Center for Climate Simulation found that an open cloud architecture using 10 Gigabit Ethernet for both inter-node communication and management traffic is a viable alternative to its purpose-built InfiniBand*-based cluster for many large-scale modeling applications. The organization hopes to capture the elasticity and flexibility benefits of both cloud computing and converged networking on Ethernet.
