Using K-means clustering to detect anomalous file removes

B. Anderson, M. Genty

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

One of the purposes of a data archive is to preserve irreplaceable data for future studies and generations. There are a number of ways that data can be lost from an archive, including accidental or malicious deletion of data. While there is a lot of software that can check for specific known threats or problems on a system, detecting non-specific anomalous behavior, such as unusual file removal patterns, is harder. One approach to detecting this kind of problem is machine learning. Machine learning algorithms can build a statistical model of what constitutes normal behavior and then flag data points that are outliers. To help protect the 87 petabytes of data in the National Center for Atmospheric Research's data archive, we explored our file removal patterns and implemented a k-means clustering solution to detect anomalous file removes. This approach can also be used to detect other anomalies, such as operational inconsistencies.
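The approach the abstract outlines (model normal behavior, then flag outliers) can be illustrated with a minimal sketch. This is not the authors' implementation. The two features (log10 of files removed and log10 of bytes removed per session), the synthetic history, the choice of k = 2, and the "mean plus three standard deviations" distance threshold are all assumptions made for illustration only.

```python
import math
import random
import statistics

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    dists = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=dists.__getitem__)
    return idx, dists[idx]

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx, _ = nearest(p, centroids)
            clusters[idx].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = [
            tuple(statistics.fmean(dim) for dim in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def fit_threshold(points, centroids, n_sigma=3.0):
    """Assumed rule: flag distances above mean + n_sigma * std of history."""
    dists = [nearest(p, centroids)[1] for p in points]
    return statistics.fmean(dists) + n_sigma * statistics.pstdev(dists)

def is_anomalous(point, centroids, threshold):
    """A point is anomalous if it lies far from every learned cluster."""
    return nearest(point, centroids)[1] > threshold

# Synthetic history (hypothetical): (log10 files removed, log10 bytes removed).
rng = random.Random(42)
history = (
    [(rng.gauss(1.0, 0.2), rng.gauss(6.0, 0.3)) for _ in range(200)]    # routine cleanups
    + [(rng.gauss(3.0, 0.2), rng.gauss(9.0, 0.3)) for _ in range(200)]  # bulk purges
)

centroids = kmeans(history, k=2)
threshold = fit_threshold(history, centroids)

routine_session = (1.0, 6.0)    # resembles past behavior
massive_removal = (6.0, 13.0)   # far larger than anything in the history
print(is_anomalous(routine_session, centroids, threshold))
print(is_anomalous(massive_removal, centroids, threshold))
```

In practice the features would probably need scaling so that no single dimension dominates the Euclidean distance, and both k and the threshold would have to be tuned against real removal logs.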

Original language: English
Title of host publication: 2018 World Congress in Computer Science, Computer Engineering and Applied Computing, CSCE 2018 - Proceedings of the 2018 International Conference on Artificial Intelligence, ICAI 2018
Editors: Hamid R. Arabnia, David de la Fuente, Elena B. Kozerenko, Jose A. Olivas, Fernando G. Tinetti
Publisher: CSREA Press
Pages: 454-458
Number of pages: 5
ISBN (Electronic): 1601324804, 9781601324801
State: Published - 2018
Event: 2018 International Conference on Artificial Intelligence, ICAI 2018 at 2018 World Congress in Computer Science, Computer Engineering and Applied Computing, CSCE 2018 - Las Vegas, United States
Duration: Jul 30 2018 to Aug 2 2018

Publication series

Name: 2018 World Congress in Computer Science, Computer Engineering and Applied Computing, CSCE 2018 - Proceedings of the 2018 International Conference on Artificial Intelligence, ICAI 2018

Conference

Conference: 2018 International Conference on Artificial Intelligence, ICAI 2018 at 2018 World Congress in Computer Science, Computer Engineering and Applied Computing, CSCE 2018
Country/Territory: United States
City: Las Vegas
Period: 07/30/18 to 08/2/18

Keywords

  • Analysis
  • Archive
  • Cybersecurity
  • Data science
  • Machine learning
  • Metrics
