TY - GEN
T1 - Trestles
T2 - TeraGrid 2011 Conference: Extreme Digital Discovery, TG'11
AU - Moore, Richard L.
AU - Hart, David L.
AU - Pfeiffer, Wayne
AU - Tatineni, Mahidhar
AU - Yoshimoto, Kenneth
AU - Young, William S.
PY - 2011
Y1 - 2011
AB - Trestles is a new 100TF HPC resource at SDSC designed to enhance scientific productivity for modest-scale and gateway users within the TeraGrid. This paper discusses the Trestles hardware and user environment, as well as the rationale for targeting this user base and the planned operational policies and procedures to optimize scientific productivity, including a focus on turnaround time in addition to traditional system utilization. A surprisingly large fraction of TeraGrid users run modest-scale jobs (e.g., <1K cores), and an increasing fraction access HPC resources via gateways; while these users represent a large percentage of the user base, they consume a smaller fraction of TeraGrid resources. Thus, while Trestles is not the largest HPC resource in TeraGrid, it can support this large class of TeraGrid users in an environment designed to enhance their productivity. This targeted usage model also frees other TeraGrid systems for users and jobs that require large-scale, SMP, or other specific resource features. One of the key differentiators for Trestles is that it will be allocated and scheduled to optimize queue wait times and expansion factors, as well as the traditional system utilization metric. In addition, the node design, with 32 cores and 64GB DRAM, accommodates many jobs without inter-node communication, while the 120GB of local flash memory speeds up many applications. A robust set of application software, including Gaussian, BLAST, Abaqus, GAMESS, Amber, and NAMD, is installed on the system. Standard job limits are 32 nodes (1K cores) and 48 hours of runtime, but exceptions can be made, particularly for long jobs of up to 2 weeks. Standing system reservations ensure that some nodes are always set aside for shorter, smaller jobs, and user-settable reservations are available to give users predictable access to the system. Nodes can be accessed in exclusive or shared mode. Finally, Trestles is the only TeraGrid resource with automatic on-demand access: a limited number of nodes is configured for jobs to "run at risk" (at a discounted usage rate), subject to preemption by on-demand jobs (which carry a premium in the usage rate). The allocation, scheduling, and software environments will be adjusted and tuned over time as usage patterns emerge and users provide feedback, further enhancing their productivity.
KW - allocations
KW - capacity computing
KW - gateways
KW - on-demand
KW - scheduling
UR - https://www.scopus.com/pages/publications/80052337243
U2 - 10.1145/2016741.2016768
DO - 10.1145/2016741.2016768
M3 - Conference contribution
AN - SCOPUS:80052337243
SN - 9781450308885
T3 - Proceedings of the TeraGrid 2011 Conference: Extreme Digital Discovery, TG'11
BT - Proceedings of the TeraGrid 2011 Conference
Y2 - 18 July 2011 through 21 July 2011
ER -