Invisible loading

Access-driven data transfer from raw files into database systems

Azza Abouzied, Daniel J. Abadi, Avi Silberschatz

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Commercial analytical database systems suffer from a high "time-to-first-analysis": before data can be processed, it must be modeled and schematized (a human effort), transferred into the database's storage layer, and optionally clustered and indexed (a computational effort). For many types of structured data, this upfront effort is unjustifiable, so the data are processed directly over the file system using the Hadoop framework, despite the cumulative performance benefits of processing this data in an analytical database system. In this paper we describe a system that achieves the immediate gratification of running MapReduce jobs directly over a file system, while still making progress towards the long-term performance benefits of database systems. The basic idea is to piggyback on MapReduce jobs, leverage their parsing and tuple extraction operations to incrementally load and organize tuples into a database system, while simultaneously processing the file system data. We call this scheme Invisible Loading, as we load fractions of data at a time at almost no marginal cost in query latency, but still allow future queries to run much faster.
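The piggybacking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an in-memory SQLite table in place of the analytical database, a Python list of `key,value` lines in place of files in a distributed file system, and a plain function call in place of a MapReduce job. The names `InvisibleLoader`, `split_size`, and `splits_per_job` are hypothetical. Each job parses the splits that are still raw, as it would have to anyway, and uses the already parsed tuples to load a small, bounded number of splits into the database; later jobs read those splits from the database instead of re-parsing them.

```python
import sqlite3
from typing import Callable, List, Tuple

Row = Tuple[str, int]


def parse(line: str) -> Row:
    """Parse one raw 'key,value' line; a map task does this work regardless."""
    key, value = line.split(",")
    return key, int(value)


class InvisibleLoader:
    """Hypothetical sketch: load a few splits per job as a side effect of the job."""

    def __init__(self, raw_lines: List[str], split_size: int = 2,
                 splits_per_job: int = 1) -> None:
        # Partition the raw data into fixed-size splits (stand-in for file splits).
        self.splits = [raw_lines[i:i + split_size]
                       for i in range(0, len(raw_lines), split_size)]
        self.loaded = set()                   # ids of splits already in the database
        self.splits_per_job = splits_per_job  # bounds the extra work added to a job
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE t (split_id INTEGER, key TEXT, value INTEGER)")

    def run_job(self, map_fn: Callable[[Row], object]) -> List[object]:
        results = []
        budget = self.splits_per_job
        for split_id, lines in enumerate(self.splits):
            if split_id in self.loaded:
                # Already loaded: read tuples from the database, no parsing needed.
                rows = self.db.execute(
                    "SELECT key, value FROM t WHERE split_id = ? ORDER BY rowid",
                    (split_id,)).fetchall()
            else:
                # Still raw: parse as usual, then reuse the parsed tuples to load
                # the split if this job's loading budget is not yet spent.
                rows = [parse(line) for line in lines]
                if budget > 0:
                    self.db.executemany(
                        "INSERT INTO t VALUES (?, ?, ?)",
                        [(split_id, k, v) for k, v in rows])
                    self.loaded.add(split_id)
                    budget -= 1
            results.extend(map_fn(row) for row in rows)
        return results


if __name__ == "__main__":
    loader = InvisibleLoader(["a,1", "b,2", "c,3", "d,4", "e,5"])
    for job in range(4):
        values = loader.run_job(lambda row: row[1])
        print(job, sum(values), sorted(loader.loaded))
```

Every job returns the same answer regardless of how much has been loaded. Only the source of the tuples changes, which is the sense in which the loading is "invisible" to the user. In this sketch the `splits_per_job` budget is the knob that trades per-job overhead against how quickly the data migrates.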

    Original language: English (US)
    Title of host publication: Advances in Database Technology - EDBT 2013
    Subtitle of host publication: 16th International Conference on Extending Database Technology, Proceedings
    Pages: 1-10
    Number of pages: 10
    DOI: https://doi.org/10.1145/2452376.2452377
    State: Published - May 2 2013
    Event: 16th International Conference on Extending Database Technology, EDBT 2013 - Genoa, Italy
    Duration: Mar 18 2013 to Mar 22 2013

    Other

    Other: 16th International Conference on Extending Database Technology, EDBT 2013
    Country: Italy
    City: Genoa
    Period: 3/18/13 to 3/22/13

    Fingerprint

    Data transfer
    Processing
    Costs

    ASJC Scopus subject areas

    • Software
    • Human-Computer Interaction
    • Computer Vision and Pattern Recognition
    • Computer Networks and Communications

    Cite this

    Abouzied, A., Abadi, D. J., & Silberschatz, A. (2013). Invisible loading: Access-driven data transfer from raw files into database systems. In Advances in Database Technology - EDBT 2013: 16th International Conference on Extending Database Technology, Proceedings (pp. 1-10) https://doi.org/10.1145/2452376.2452377

    Invisible loading: Access-driven data transfer from raw files into database systems. / Abouzied, Azza; Abadi, Daniel J.; Silberschatz, Avi.

    Advances in Database Technology - EDBT 2013: 16th International Conference on Extending Database Technology, Proceedings. 2013. p. 1-10.

    Abouzied, A, Abadi, DJ & Silberschatz, A 2013, Invisible loading: Access-driven data transfer from raw files into database systems. in Advances in Database Technology - EDBT 2013: 16th International Conference on Extending Database Technology, Proceedings. pp. 1-10, 16th International Conference on Extending Database Technology, EDBT 2013, Genoa, Italy, 3/18/13. https://doi.org/10.1145/2452376.2452377
    Abouzied A, Abadi DJ, Silberschatz A. Invisible loading: Access-driven data transfer from raw files into database systems. In Advances in Database Technology - EDBT 2013: 16th International Conference on Extending Database Technology, Proceedings. 2013. p. 1-10 https://doi.org/10.1145/2452376.2452377
    Abouzied, Azza ; Abadi, Daniel J. ; Silberschatz, Avi. / Invisible loading : Access-driven data transfer from raw files into database systems. Advances in Database Technology - EDBT 2013: 16th International Conference on Extending Database Technology, Proceedings. 2013. pp. 1-10
    @inproceedings{8b9a712478e6424da744256c4d63c749,
    title = "Invisible loading: Access-driven data transfer from raw files into database systems",
    abstract = "Commercial analytical database systems suffer from a high {"}time-to-first-analysis{"}: before data can be processed, it must be modeled and schematized (a human effort), transferred into the database's storage layer, and optionally clustered and indexed (a computational effort). For many types of structured data, this upfront effort is unjustifiable, so the data are processed directly over the file system using the Hadoop framework, despite the cumulative performance benefits of processing this data in an analytical database system. In this paper we describe a system that achieves the immediate gratification of running MapReduce jobs directly over a file system, while still making progress towards the long-term performance benefits of database systems. The basic idea is to piggyback on MapReduce jobs, leverage their parsing and tuple extraction operations to incrementally load and organize tuples into a database system, while simultaneously processing the file system data. We call this scheme Invisible Loading, as we load fractions of data at a time at almost no marginal cost in query latency, but still allow future queries to run much faster.",
    author = "Azza Abouzied and Abadi, {Daniel J.} and Avi Silberschatz",
    year = "2013",
    month = "5",
    day = "2",
    doi = "10.1145/2452376.2452377",
    language = "English (US)",
    isbn = "9781450315975",
    pages = "1--10",
    booktitle = "Advances in Database Technology - EDBT 2013"
    }

    TY - GEN

    T1 - Invisible loading

    T2 - Access-driven data transfer from raw files into database systems

    AU - Abouzied, Azza

    AU - Abadi, Daniel J.

    AU - Silberschatz, Avi

    PY - 2013/5/2

    Y1 - 2013/5/2

    N2 - Commercial analytical database systems suffer from a high "time-to-first-analysis": before data can be processed, it must be modeled and schematized (a human effort), transferred into the database's storage layer, and optionally clustered and indexed (a computational effort). For many types of structured data, this upfront effort is unjustifiable, so the data are processed directly over the file system using the Hadoop framework, despite the cumulative performance benefits of processing this data in an analytical database system. In this paper we describe a system that achieves the immediate gratification of running MapReduce jobs directly over a file system, while still making progress towards the long-term performance benefits of database systems. The basic idea is to piggyback on MapReduce jobs, leverage their parsing and tuple extraction operations to incrementally load and organize tuples into a database system, while simultaneously processing the file system data. We call this scheme Invisible Loading, as we load fractions of data at a time at almost no marginal cost in query latency, but still allow future queries to run much faster.

    UR - http://www.scopus.com/inward/record.url?scp=84876789119&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84876789119&partnerID=8YFLogxK

    U2 - 10.1145/2452376.2452377

    DO - 10.1145/2452376.2452377

    M3 - Conference contribution

    SN - 9781450315975

    SP - 1

    EP - 10

    BT - Advances in Database Technology - EDBT 2013

    ER -