Algorithms for low-latency remote file synchronization

Hao Yan, Utku Irmak, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    The remote file synchronization problem is how to update an outdated version of a file located on one machine to the current version located on another machine with a minimal amount of network communication. It arises in many scenarios including web site mirroring, file system backup and replication, or web access over slow links. A widely used open-source tool called rsync uses a single round of messages to solve this problem (plus an initial round for exchanging meta information). While research has shown that significant additional savings in bandwidth are possible by using multiple rounds, such approaches are often not desirable due to network latencies, increased protocol complexity, and higher I/O and CPU overheads at the endpoints. In this paper, we study single-round synchronization techniques that achieve savings in bandwidth consumption while preserving many of the advantages of the rsync approach. In particular, we propose a new and simple algorithm for file synchronization based on set reconciliation techniques. We then show how to integrate sampling techniques into our approach in order to adaptively select the most suitable algorithm and parameter setting for a given data set. Experimental results on several data sets show that the resulting protocol gives significant benefits over rsync, particularly on data sets with high degrees of redundancy between the versions.
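    The single-round, block-based delta transfer that rsync builds on can be sketched as follows. This is a minimal illustration under assumed helper names (`signatures`, `delta`, `patch` are not from the paper), not the paper's set-reconciliation algorithm; it also omits rsync's weak rolling checksum, using only fixed-offset strong hashes:

    ```python
    import hashlib

    BLOCK = 4  # tiny block size for illustration; rsync uses much larger blocks

    def signatures(old: bytes) -> dict:
        """Receiver side: hash each fixed-size block of the outdated file."""
        sigs = {}
        for i in range(0, len(old), BLOCK):
            block = old[i:i + BLOCK]
            sigs[hashlib.md5(block).digest()] = i // BLOCK
        return sigs

    def delta(new: bytes, sigs: dict) -> list:
        """Sender side: emit block references where the new file matches a
        block of the old file, and literal bytes elsewhere (greedy scan;
        real rsync slides a cheap rolling checksum instead of re-hashing
        a strong hash at every offset)."""
        ops, i, lit = [], 0, bytearray()
        while i < len(new):
            block = new[i:i + BLOCK]
            idx = sigs.get(hashlib.md5(block).digest()) if len(block) == BLOCK else None
            if idx is not None:
                if lit:
                    ops.append(("lit", bytes(lit)))
                    lit = bytearray()
                ops.append(("ref", idx))
                i += BLOCK
            else:
                lit.append(new[i])
                i += 1
        if lit:
            ops.append(("lit", bytes(lit)))
        return ops

    def patch(old: bytes, ops: list) -> bytes:
        """Receiver side: rebuild the current file from its old copy plus the delta."""
        out = bytearray()
        for kind, val in ops:
            if kind == "ref":
                out += old[val * BLOCK:(val + 1) * BLOCK]
            else:
                out += val
        return bytes(out)
    ```

    Only the signatures and the delta cross the network, which is why redundancy between the two versions translates directly into bandwidth savings; the paper's contribution is to shrink the signature exchange further via set reconciliation.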

    Original language: English (US)
    Title of host publication: INFOCOM 2008: 27th IEEE Communications Society Conference on Computer Communications
    Pages: 655-663
    Number of pages: 9
    DOIs: https://doi.org/10.1109/INFOCOM.2007.40
    State: Published - 2008
    Event: INFOCOM 2008: 27th IEEE Communications Society Conference on Computer Communications - Phoenix, AZ, United States
    Duration: Apr 13, 2008 - Apr 18, 2008

    Other

    Other: INFOCOM 2008: 27th IEEE Communications Society Conference on Computer Communications
    Country: United States
    City: Phoenix, AZ
    Period: 4/13/08 - 4/18/08

    Fingerprint

    Synchronization
    Bandwidth
    Network protocols
    Telecommunication networks
    Program processors
    Redundancy
    Websites
    Sampling

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
    • Hardware and Architecture

    Cite this

    Yan, H., Irmak, U., & Suel, T. (2008). Algorithms for low-latency remote file synchronization. In INFOCOM 2008: 27th IEEE Communications Society Conference on Computer Communications (pp. 655-663). [4509635] https://doi.org/10.1109/INFOCOM.2007.40

    @inproceedings{eaf499369f2b4fc3ad719162f101375b,
    title = "Algorithms for low-latency remote file synchronization",
    abstract = "The remote file synchronization problem is how to update an outdated version of a file located on one machine to the current version located on another machine with a minimal amount of network communication. It arises in many scenarios including web site mirroring, file system backup and replication, or web access over slow links. A widely used open-source tool called rsync uses a single round of messages to solve this problem (plus an initial round for exchanging meta information). While research has shown that significant additional savings in bandwidth are possible by using multiple rounds, such approaches are often not desirable due to network latencies, increased protocol complexity, and higher I/O and CPU overheads at the endpoints. In this paper, we study single-round synchronization techniques that achieve savings in bandwidth consumption while preserving many of the advantages of the rsync approach. In particular, we propose a new and simple algorithm for file synchronization based on set reconciliation techniques. We then show how to integrate sampling techniques into our approach in order to adaptively select the most suitable algorithm and parameter setting for a given data set. Experimental results on several data sets show that the resulting protocol gives significant benefits over rsync, particularly on data sets with high degrees of redundancy between the versions.",
    author = "Hao Yan and Utku Irmak and Torsten Suel",
    year = "2008",
    doi = "10.1109/INFOCOM.2007.40",
    language = "English (US)",
    isbn = "9781424420261",
    pages = "655--663",
    booktitle = "INFOCOM 2008: 27th IEEE Communications Society Conference on Computer Communications",

    }
