A data-driven support strategy for a sustainable research software repository

Mehmet Belgin, Tyler A. Perini, Fang Liu, Nuyun Zhang, Semir Sarajlic, Andre McNeill, Paul Manno, Neil C. Bright

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

We describe a sustainable strategy to support a large number of researchers with widely varying scientific software needs, which is a common problem for most centralized Research Computing Centers on university campuses. Changes in systems and hardware, coupled with aging software, often necessitate re-compilation of existing software. The naive approach of re-compiling all of the existing packages is not only counterproductive but may also become unrealistic, especially for small support teams such as Georgia Tech's PACE Team. Instead, we analyze job scheduling data to identify actively used software, then rank it and distribute it across three support tiers, which define the level of support we provide. The distribution of software into multiple tiers is a non-trivial problem. We use a heuristic ranking algorithm based on four metrics: the number of users, groups, and jobs, and their collective runtime. The results revealed a surprisingly small subset of software that is sufficient to support a very large portion of the overall research computing activity on campus. This approach allows us to make data-driven strategic technical and policy decisions, to provide high-quality support for the software that really matters, and to sustain these services with a relatively small team in the long term.
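The abstract does not spell out the ranking heuristic, so the following is only an illustrative sketch, not the authors' method. It assumes a Pareto-style ranking (suggested by the "pareto ranking" keyword) over the four metrics named above: packages are peeled off in successive non-dominated fronts, and the fronts are then mapped to three tiers. The package names, the usage numbers, and the `tier1_fronts` / `tier2_fronts` cut-offs are all hypothetical.

```python
from typing import Dict, List, Tuple

# (users, groups, jobs, runtime_hours) per package; higher means more heavily used.
Metrics = Tuple[int, int, int, float]


def dominates(a: Metrics, b: Metrics) -> bool:
    """True if a is >= b on every metric and strictly greater on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(usage: Dict[str, Metrics]) -> List[List[str]]:
    """Peel off successive non-dominated fronts, most heavily used first."""
    remaining = dict(usage)
    fronts: List[List[str]] = []
    while remaining:
        # A package is in the current front if no remaining package dominates it.
        front = sorted(
            name
            for name, metrics in remaining.items()
            if not any(dominates(other, metrics) for other in remaining.values())
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


def assign_tiers(
    fronts: List[List[str]], tier1_fronts: int = 1, tier2_fronts: int = 1
) -> Dict[str, int]:
    """Map fronts to tiers 1-3; the cut-offs are assumed policy knobs."""
    tiers: Dict[str, int] = {}
    for index, front in enumerate(fronts):
        if index < tier1_fronts:
            tier = 1
        elif index < tier1_fronts + tier2_fronts:
            tier = 2
        else:
            tier = 3
        for name in front:
            tiers[name] = tier
    return tiers


# Hypothetical usage data, as might be aggregated from scheduler job logs.
usage = {
    "pkg_a": (120, 30, 50000, 900000.0),
    "pkg_b": (80, 25, 60000, 400000.0),
    "pkg_c": (40, 10, 8000, 100000.0),
    "pkg_d": (5, 2, 300, 1200.0),
    "pkg_e": (3, 1, 20, 50.0),
}

fronts = pareto_fronts(usage)
tiers = assign_tiers(fronts)
for name in sorted(tiers):
    print(name, "-> tier", tiers[name])
```

Pareto fronts avoid having to choose weights for combining the four metrics into a single score, at the cost of coarser ranks. A weighted sum or a rank-average would be an equally plausible reading of "heuristic ranking".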

Original language: English
Article number: e5338
Journal: Concurrency and Computation: Practice and Experience
Volume: 31
Issue number: 20
State: Published - Oct 25 2019

Keywords

  • compilation
  • optimization
  • pareto ranking
  • repository
  • research software
