Extending the effective throughput of NoCs with distributed shared-buffer routers

Rohit Sunkam Ramanujam, Vassos Soteriou, Bill Lin, Li Shiuan Peh

    Research output: Contribution to journal (Article)

    Abstract

    Router microarchitecture plays a central role in the performance of networks-on-chip (NoCs). Buffers are needed in routers to house incoming flits that cannot be immediately forwarded due to contention. This buffering can be done at the inputs or the outputs of a router, corresponding to an input-buffered router (IBR) or an output-buffered router (OBR). OBRs are attractive because they can sustain higher throughputs and have lower queuing delays under high loads than IBRs. However, a direct implementation of an OBR requires a router speedup equal to the number of ports, making such a design prohibitive under aggressive clocking needs and limited power budgets of most NoC applications. In this paper, a new router design based on a distributed shared-buffer (DSB) architecture is proposed that aims to practically emulate an OBR. The proposed architecture introduces innovations to address the unique constraints of NoCs, including efficient pipelining and novel flow control. Practical DSB configurations are also presented with reduced power overheads while exhibiting negligible performance degradation. Compared to a state-of-the-art pipelined IBR, the proposed DSB router achieves up to 19% higher throughput on synthetic traffic and reduces packet latency on average by 61% when running SPLASH-2 benchmarks with high contention. On average, the saturation throughput of DSB routers is within 7% of the theoretically ideal saturation throughput under the synthetic workloads evaluated.

    Original language: English (US)
    Article number: 5737868
    Pages (from-to): 548-561
    Number of pages: 14
    Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
    Volume: 30
    Issue number: 4
    DOI: 10.1109/TCAD.2011.2110550
    State: Published - Apr 1 2011

    Keywords

    • Network throughput
    • networks-on-chip
    • on-chip interconnection networks
    • router microarchitecture

    ASJC Scopus subject areas

    • Software
    • Computer Graphics and Computer-Aided Design
    • Electrical and Electronic Engineering

    Cite this

    Extending the effective throughput of NoCs with distributed shared-buffer routers. / Ramanujam, Rohit Sunkam; Soteriou, Vassos; Lin, Bill; Peh, Li Shiuan.

    In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 30, No. 4, 5737868, 01.04.2011, p. 548-561.

    Ramanujam, Rohit Sunkam ; Soteriou, Vassos ; Lin, Bill ; Peh, Li Shiuan. / Extending the effective throughput of NoCs with distributed shared-buffer routers. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2011 ; Vol. 30, No. 4. pp. 548-561.
    @article{67e1783197034b0b9a091cd97ecce35f,
    title = "Extending the effective throughput of NoCs with distributed shared-buffer routers",
    keywords = "Network throughput, networks-on-chip, on-chip interconnection networks, router microarchitecture",
    author = "Ramanujam, {Rohit Sunkam} and Soteriou, Vassos and Bill Lin and Peh, {Li Shiuan}",
    year = "2011",
    month = "4",
    day = "1",
    doi = "10.1109/TCAD.2011.2110550",
    language = "English (US)",
    volume = "30",
    pages = "548--561",
    journal = "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems",
    issn = "0278-0070",
    publisher = "Institute of Electrical and Electronics Engineers Inc.",
    number = "4",

    }

    TY - JOUR
    T1 - Extending the effective throughput of NoCs with distributed shared-buffer routers
    AU - Ramanujam, Rohit Sunkam
    AU - Soteriou, Vassos
    AU - Lin, Bill
    AU - Peh, Li Shiuan
    PY - 2011/4/1
    Y1 - 2011/4/1
    KW - Network throughput
    KW - networks-on-chip
    KW - on-chip interconnection networks
    KW - router microarchitecture
    UR - http://www.scopus.com/inward/record.url?scp=79953092220&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=79953092220&partnerID=8YFLogxK
    U2 - 10.1109/TCAD.2011.2110550
    DO - 10.1109/TCAD.2011.2110550
    M3 - Article
    VL - 30
    SP - 548
    EP - 561
    JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
    JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
    SN - 0278-0070
    IS - 4
    M1 - 5737868
    ER -