HadoopDB

An architectural hybrid of MapReduce and DBMS technologies for analytical workloads

Azza Abouzied, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin

    Research output: Contribution to journal (Article)

    Abstract

    The production environment for analytical data management applications is rapidly changing. Many enterprises are shifting away from deploying their analytical databases on high-end proprietary machines, and moving towards cheaper, lower-end, commodity hardware, typically arranged in a shared-nothing MPP architecture, often in a virtualized environment inside public or private "clouds". At the same time, the amount of data that needs to be analyzed is exploding, requiring hundreds to thousands of machines to work in parallel to perform the analysis. There tend to be two schools of thought regarding what technology to use for data analysis in such an environment. Proponents of parallel databases argue that the strong emphasis on performance and efficiency of parallel databases makes them well-suited to perform such analysis. On the other hand, others argue that MapReduce-based systems are better suited due to their superior scalability, fault tolerance, and flexibility to handle unstructured data. In this paper, we explore the feasibility of building a hybrid system that takes the best features from both technologies; the prototype we built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.

    Original language: English (US)
    Pages (from-to): 922-933
    Number of pages: 12
    Journal: Proceedings of the VLDB Endowment
    Volume: 2
    Issue number: 1
    DOI: https://doi.org/10.14778/1687627.1687731
    State: Published - Jan 1 2009

    Fingerprint

    Fault tolerance
    Scalability
    Hybrid systems
    Information management
    Hardware
    Industry

    ASJC Scopus subject areas

    • Computer Science (miscellaneous)
    • Computer Science (all)

    Cite this


    Abouzied, A, Bajda-Pawlikowski, K, Abadi, D, Silberschatz, A & Rasin, A 2009, 'HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads', Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 922-933. https://doi.org/10.14778/1687627.1687731
    @article{acaa9228ce4340c8977cb284ec70f70f,
    title = "HadoopDB: An architectural hybrid of {MapReduce} and {DBMS} technologies for analytical workloads",
    author = "Azza Abouzied and Kamil Bajda-Pawlikowski and Daniel Abadi and Avi Silberschatz and Alexander Rasin",
    year = "2009",
    month = "1",
    day = "1",
    doi = "10.14778/1687627.1687731",
    language = "English (US)",
    volume = "2",
    pages = "922--933",
    journal = "Proceedings of the VLDB Endowment",
    issn = "2150-8097",
    publisher = "Very Large Data Base Endowment Inc.",
    number = "1",

    }
