Using data science to understand tape-based archive workloads

Bill Anderson, Marc Genty, David L. Hart, Erich Thanhardt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Data storage needs continue to grow in most fields, and the cost per byte for tape remains lower than the cost for disk, making tape storage a good candidate for cost-effective long-term storage. However, the workloads suitable for tape archives differ from those for disk file systems, and archives must handle internally generated workloads that can be more demanding than those generated by end users (e.g., migration of data from an old tape technology to a new one). To better understand the variegated workloads, we have followed the first steps in the data science methodology. For anyone considering the use or deployment of a tape-based data archive or for anyone interested in details of data archives in the context of data science, this paper describes key aspects of data archive workloads.

Original language: English
Title of host publication: Proceedings of the XSEDE 2015 Conference
Subtitle of host publication: Scientific Advancements Enabled by Enhanced Cyberinfrastructure
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450337205
State: Published - Jul 26 2015
Event: 4th Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2015 - St. Louis, United States
Duration: Jul 26 2015 → Jul 30 2015

Publication series

Name: ACM International Conference Proceeding Series
Volume: 2015-July

Conference

Conference: 4th Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2015
Country/Territory: United States
City: St. Louis
Period: 07/26/15 → 07/30/15

Keywords

  • Analysis
  • Archive
  • Data science
  • Metrics
