Coding techniques for handling failures in large disk arrays

Lisa Hellerstein, G. A. Gibson, R. M. Karp, R. H. Katz, D. A. Patterson

    Research output: Contribution to journal (Article)

    Abstract

    A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although today single disks are highly reliable, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints.
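To make the idea of parity-based reconstruction concrete, the following is a minimal sketch and not the paper's own construction. It assumes a hypothetical layout of four data disks in a 2x2 grid with one parity disk per row and per column, so each parity check (one row of a parity check matrix) is a set of disk indices whose blocks XOR to zero. The names `xor_blocks`, `reconstruct`, and `CHECKS` are illustrative assumptions. An erased disk is rebuilt whenever some check contains it as the only missing member.

```python
from typing import List, Optional, Sequence


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks (addition over GF(2))."""
    return bytes(x ^ y for x, y in zip(a, b))


def reconstruct(checks: Sequence[Sequence[int]],
                disks: Sequence[Optional[bytes]],
                block_size: int) -> List[Optional[bytes]]:
    """Rebuild erased disks (marked None) by repeatedly applying any
    parity check that has exactly one erased member.

    Disks that cannot be recovered this way are left as None.
    """
    out = list(disks)
    progress = True
    while progress:
        progress = False
        for check in checks:
            missing = [i for i in check if out[i] is None]
            if len(missing) != 1:
                continue
            # XOR of the surviving members equals the missing block.
            acc = bytes(block_size)
            for i in check:
                if i != missing[0]:
                    acc = xor_blocks(acc, out[i])
            out[missing[0]] = acc
            progress = True
    return out


# Hypothetical layout: disks 0-3 hold data, disks 4-7 hold parity.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]
parity = [
    xor_blocks(data[0], data[1]),  # disk 4: row 0
    xor_blocks(data[2], data[3]),  # disk 5: row 1
    xor_blocks(data[0], data[2]),  # disk 6: column 0
    xor_blocks(data[1], data[3]),  # disk 7: column 1
]
stored = data + parity
CHECKS = [[0, 1, 4], [2, 3, 5], [0, 2, 6], [1, 3, 7]]

# Lose two disks from the same row; the column checks recover them.
damaged = list(stored)
damaged[0] = None
damaged[1] = None
recovered = reconstruct(CHECKS, damaged, block_size=2)
```

In this sketch a single parity group could not repair two losses in the same row, but the overlapping column checks can, which illustrates why the structure of the parity check matrix determines which failure patterns are survivable.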

    Original language: English (US)
    Pages (from-to): 182-208
    Number of pages: 27
    Journal: Algorithmica (New York)
    Volume: 12
    Issue number: 2-3
    DOI: https://doi.org/10.1007/BF01185210
    State: Published - Sep 1994

    Fingerprint

    Disk Array
    Coding
    Binary Code
    Linear Codes
    Combinatorial Problems
    Parity
    Transform

    Keywords

    • Availability
    • Error-correcting codes
    • Input/output architecture
    • RAID
    • Redundant disk arrays
    • Reliability

    ASJC Scopus subject areas

    • Applied Mathematics
    • Safety, Risk, Reliability and Quality
    • Software
    • Computer Graphics and Computer-Aided Design

    Cite this

    Hellerstein, L., Gibson, G. A., Karp, R. M., Katz, R. H., & Patterson, D. A. (1994). Coding techniques for handling failures in large disk arrays. Algorithmica (New York), 12(2-3), 182-208. https://doi.org/10.1007/BF01185210

    Coding techniques for handling failures in large disk arrays. / Hellerstein, Lisa; Gibson, G. A.; Karp, R. M.; Katz, R. H.; Patterson, D. A.

    In: Algorithmica (New York), Vol. 12, No. 2-3, 09.1994, p. 182-208.


    Hellerstein, L, Gibson, GA, Karp, RM, Katz, RH & Patterson, DA 1994, 'Coding techniques for handling failures in large disk arrays', Algorithmica (New York), vol. 12, no. 2-3, pp. 182-208. https://doi.org/10.1007/BF01185210
    Hellerstein L, Gibson GA, Karp RM, Katz RH, Patterson DA. Coding techniques for handling failures in large disk arrays. Algorithmica (New York). 1994 Sep;12(2-3):182-208. https://doi.org/10.1007/BF01185210
    Hellerstein, Lisa ; Gibson, G. A. ; Karp, R. M. ; Katz, R. H. ; Patterson, D. A. / Coding techniques for handling failures in large disk arrays. In: Algorithmica (New York). 1994 ; Vol. 12, No. 2-3. pp. 182-208.
    @article{2de8df1dc6854877bb841cbbd4cc028b,
    title = "Coding techniques for handling failures in large disk arrays",
    abstract = "A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although today single disks are highly reliable, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints.",
    keywords = "Availability, Error-correcting codes, Input/output architecture, RAID, Redundant disk arrays, Reliability",
    author = "Hellerstein, Lisa and Gibson, {G. A.} and Karp, {R. M.} and Katz, {R. H.} and Patterson, {D. A.}",
    year = "1994",
    month = "9",
    doi = "10.1007/BF01185210",
    language = "English (US)",
    volume = "12",
    pages = "182--208",
    journal = "Algorithmica",
    issn = "0178-4617",
    publisher = "Springer New York",
    number = "2-3"
    }

    TY - JOUR

    T1 - Coding techniques for handling failures in large disk arrays

    AU - Hellerstein, Lisa

    AU - Gibson, G. A.

    AU - Karp, R. M.

    AU - Katz, R. H.

    AU - Patterson, D. A.

    PY - 1994/9

    Y1 - 1994/9

    AB - A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although today single disks are highly reliable, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints.

    KW - Availability

    KW - Error-correcting codes

    KW - Input/output architecture

    KW - RAID

    KW - Redundant disk arrays

    KW - Reliability

    UR - http://www.scopus.com/inward/record.url?scp=0028483751&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0028483751&partnerID=8YFLogxK

    U2 - 10.1007/BF01185210

    DO - 10.1007/BF01185210

    M3 - Article

    VL - 12

    SP - 182

    EP - 208

    JO - Algorithmica

    JF - Algorithmica

    SN - 0178-4617

    IS - 2-3

    ER -