General Information
    • ISSN: 1793-8201 (Print), 2972-4511 (Online)
    • Abbreviated Title: Int. J. Comput. Theory Eng.
    • Frequency: Quarterly
    • DOI: 10.7763/IJCTE
    • Editor-in-Chief: Prof. Mehmet Sahinoglu
    • Associate Editor-in-Chief: Assoc. Prof. Alberto Arteta, Assoc. Prof. Engin Maşazade
    • Managing Editor: Ms. Cecilia Xie
    • Abstracting/Indexing: Scopus (since 2022), INSPEC (IET), CNKI, Google Scholar, EBSCO, etc.
    • Average Days from Submission to Acceptance: 192 days
    • APC: 800 USD
    • E-mail: editor@ijcte.org

IJCTE 2012 Vol.4(5): 726-730 ISSN: 1793-8201
DOI: 10.7763/IJCTE.2012.V4.566

A Failure Detection and Prediction Mechanism for Enhancing Dependability of Data Centers

Qiang Guan, Ziming Zhang, and Song Fu

Abstract—Modern data centers continue to grow in scale and complexity. They also change dynamically through the addition and removal of system components, changing execution environments, frequent updates and upgrades, online repairs, and more. Classical reliability theory and conventional methods rarely consider the actual state of a system and therefore cannot reflect the dynamics of runtime systems and failure processes. In this paper, we present an unsupervised failure detection and prediction method that uses an ensemble of Bayesian models. It characterizes the normal execution states of the system and detects anomalous behaviors. We implement a prototype of our failure detection and prediction mechanism and evaluate its performance on a data center test platform. Experimental results show that the proposed method can forecast failure dynamics with high accuracy.

Index Terms—Data centers, failure detection, failure management, dependable computing.

Q. Guan, Z. Zhang, and S. Fu are with the Department of Computer Science and Engineering, University of North Texas, Denton, Texas 76203, USA (e-mail: QiangGuan@my.unt.edu; ZimingZhang@my.unt.edu; Song.Fu@unt.edu; tel.: +1-940-565-2341; fax: +1-940-565-2799).


Cite: Qiang Guan, Ziming Zhang, and Song Fu, "A Failure Detection and Prediction Mechanism for Enhancing Dependability of Data Centers," International Journal of Computer Theory and Engineering, vol. 4, no. 5, pp. 726-730, 2012.

