Trestles: A high-productivity HPC system targeted to modest-scale and gateway users

Richard L. Moore, David L. Hart, Wayne Pfeiffer, Mahidhar Tatineni, Kenneth Yoshimoto, William S. Young

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

    12 Scopus citations

    Abstract

    Trestles is a new 100TF HPC resource at SDSC designed to enhance scientific productivity for modest-scale and gateway users within the TeraGrid. This paper discusses the Trestles hardware and user environment, as well as the rationale for targeting this user base and the planned operational policies and procedures to optimize scientific productivity, including a focus on turnaround time in addition to the traditional system utilization. A surprisingly large fraction of TeraGrid users run modest-scale jobs (e.g. <1K cores), and an increasing fraction of TeraGrid users access HPC resources via gateways; while these users represent a large percentage of the user base, they consume a smaller fraction of the TeraGrid resources. Thus, while Trestles is not the largest HPC resource in TeraGrid, it will be able to support this large class of TeraGrid users in an environment designed to enhance their productivity. This targeted usage model also frees up other TeraGrid systems for users/jobs that require large-scale, SMP or other specific resource features. One of the key differentiators for Trestles is that it will be allocated and scheduled to optimize queue wait times and expansion factors, as well as the traditional system utilization metric. In addition, the node design, with 32 cores and 64GB DRAM, will accommodate many jobs without inter-node communications, while the 120GB local flash memory will speed up many applications. A robust set of application software, including Gaussian, BLAST, Abaqus, GAMESS, Amber and NAMD, is installed on the system. Standard job limits are 32 nodes (1K cores) and 48 hours runtime, but exceptions can be made, particularly for long jobs up to 2 weeks. Standing system reservations ensure that some nodes are always set aside for shorter, smaller jobs, and user-settable reservations are available to ensure users predictable access to the system. Nodes can be accessed in exclusive or shared mode. 
    Finally, Trestles is the only TeraGrid resource with automatic on-demand access; a limited number of nodes is configured for jobs to "run at risk" (with a discount in the usage rate charged) and be subject to being pre-emptively killed by on-demand jobs (which carry a premium in the usage rate). The allocation, scheduling and software environments will be adjusted and tuned over time as usage patterns emerge and users provide feedback to further enhance their productivity.
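
    The abstract names queue wait time and expansion factor as scheduling targets but does not define them. The sketch below is illustrative only: it assumes the common definition of expansion factor, (queue wait + run time) / run time, which the abstract itself does not state, and it encodes the standard job limits quoted above (32 nodes of 32 cores each, 48 hours). All function and constant names are hypothetical and are not taken from the Trestles scheduler.

```python
# Illustrative sketch; names and the expansion-factor definition are assumptions.

CORES_PER_NODE = 32        # from the abstract: 32 cores per node
STANDARD_MAX_NODES = 32    # from the abstract: standard limit of 32 nodes
STANDARD_MAX_HOURS = 48    # from the abstract: standard limit of 48 hours

# 32 nodes x 32 cores = 1024 cores, the "1K cores" quoted in the abstract.
max_cores = CORES_PER_NODE * STANDARD_MAX_NODES


def expansion_factor(wait_hours: float, run_hours: float) -> float:
    """Assumed definition: (queue wait + run time) / run time.

    A value of 1.0 means the job started immediately; larger values mean
    the job spent proportionally longer waiting in the queue.
    """
    if run_hours <= 0:
        raise ValueError("run_hours must be positive")
    if wait_hours < 0:
        raise ValueError("wait_hours must be non-negative")
    return (wait_hours + run_hours) / run_hours


def within_standard_limits(nodes: int, hours: float) -> bool:
    """Check a request against the standard limits quoted in the abstract.

    Exceptions (for example, long jobs up to 2 weeks) are not modeled here.
    """
    return 1 <= nodes <= STANDARD_MAX_NODES and 0 < hours <= STANDARD_MAX_HOURS


if __name__ == "__main__":
    print(max_cores)
    print(expansion_factor(wait_hours=2, run_hours=8))
    print(within_standard_limits(nodes=32, hours=48))
```

    Under this assumed definition, a short job is penalized more by a given wait than a long job, which is one reason a system aimed at modest-scale users might track expansion factor alongside utilization.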

    Original language: English
    Title of host publication: Proceedings of the TeraGrid 2011 Conference
    Subtitle of host publication: Extreme Digital Discovery, TG'11
    State: Published - 2011
    Event: TeraGrid 2011 Conference: Extreme Digital Discovery, TG'11 - Salt Lake City, UT, United States
    Duration: Jul 18 2011 to Jul 21 2011

    Publication series

    Name: Proceedings of the TeraGrid 2011 Conference: Extreme Digital Discovery, TG'11

    Conference

    Conference: TeraGrid 2011 Conference: Extreme Digital Discovery, TG'11
    Country/Territory: United States
    City: Salt Lake City, UT
    Period: 07/18/11 to 07/21/11

    Keywords

    • allocations
    • capacity computing
    • gateways
    • on-demand
    • scheduling
