Scalable lookahead regular expression detection system for deep packet inspection

Masanori Bando, N. Sertac Artan, H. Jonathan Chao

Research output: Contribution to journalArticle

Abstract

Regular expressions (RegExes) are widely used, yet their inherent complexity often limits the total number of RegExes that can be detected using a single chip for a reasonable throughput. This limit on the number of RegExes impairs the scalability of today's RegEx detection systems. The scalability of existing schemes is generally limited by the traditional detection paradigm based on per-character-state processing and state transition detection. The main focus of existing schemes is on optimizing the number of states and the required transitions, but not on optimizing the suboptimal character-based detection method. Furthermore, the potential benefits of allowing out-of-sequence detection, instead of detecting components of a RegEx in the order of appearance, have not been explored. Lastly, the existing schemes do not provide ways to adapt to the evolving RegExes. In this paper, we propose Lookahead Finite Automata (LaFA) to perform scalable RegEx detection. LaFA requires less memory due to these three contributions: 1) providing specialized and optimized detection modules to increase resource utilization; 2) systematically reordering the RegEx detection sequence to reduce the number of concurrent operations; 3) sharing states among automata for different RegExes to reduce resource requirements. Here, we demonstrate that LaFA requires an order of magnitude less memory compared to today's state-of-the-art RegEx detection systems. Using LaFA, a single-commodity field programmable gate array (FPGA) chip can accommodate up to 25000 (25 k) RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimate that a 34-Gb/s throughput can be achieved.

Original languageEnglish (US)
Article number6216497
Pages (from-to)699-714
Number of pages16
JournalIEEE/ACM Transactions on Networking
Volume20
Issue number3
DOIs
StatePublished - Jun 2012

Fingerprint

Finite automata
Inspection
Throughput
Field programmable gate arrays (FPGA)
Scalability
Data storage equipment
Processing

Keywords

  • Deep packet inspection
  • DPI
  • LaFA
  • Lookahead Finite Automata
  • network intrusion detection and prevention system
  • NIDPS
  • regular expressions

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Software
  • Computer Science Applications
  • Computer Networks and Communications

Cite this

Scalable lookahead regular expression detection system for deep packet inspection. / Bando, Masanori; Artan, N. Sertac; Chao, H. Jonathan.

In: IEEE/ACM Transactions on Networking, Vol. 20, No. 3, 6216497, 06.2012, p. 699-714.

Research output: Contribution to journalArticle

@article{0677e14653d44af99956d90a6804be4b,
title = "Scalable lookahead regular expression detection system for deep packet inspection",
abstract = "Regular expressions (RegExes) are widely used, yet their inherent complexity often limits the total number of RegExes that can be detected using a single chip for a reasonable throughput. This limit on the number of RegExes impairs the scalability of today's RegEx detection systems. The scalability of existing schemes is generally limited by the traditional detection paradigm based on per-character-state processing and state transition detection. The main focus of existing schemes is on optimizing the number of states and the required transitions, but not on optimizing the suboptimal character-based detection method. Furthermore, the potential benefits of allowing out-of-sequence detection, instead of detecting components of a RegEx in the order of appearance, have not been explored. Lastly, the existing schemes do not provide ways to adapt to the evolving RegExes. In this paper, we propose Lookahead Finite Automata (LaFA) to perform scalable RegEx detection. LaFA requires less memory due to these three contributions: 1) providing specialized and optimized detection modules to increase resource utilization; 2) systematically reordering the RegEx detection sequence to reduce the number of concurrent operations; 3) sharing states among automata for different RegExes to reduce resource requirements. Here, we demonstrate that LaFA requires an order of magnitude less memory compared to today's state-of-the-art RegEx detection systems. Using LaFA, a single-commodity field programmable gate array (FPGA) chip can accommodate up to 25000 (25 k) RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimate that a 34-Gb/s throughput can be achieved.",
keywords = "Deep packet inspection, DPI, LaFA, Lookahead Finite Automata, network intrusion detection and prevention system, NIDPS, regular expressions",
author = "Masanori Bando and Artan, {N. Sertac} and Chao, {H. Jonathan}",
year = "2012",
month = "6",
doi = "10.1109/TNET.2011.2181411",
language = "English (US)",
volume = "20",
pages = "699--714",
journal = "IEEE/ACM Transactions on Networking",
issn = "1063-6692",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "3",

}

TY - JOUR

T1 - Scalable lookahead regular expression detection system for deep packet inspection

AU - Bando, Masanori

AU - Artan, N. Sertac

AU - Chao, H. Jonathan

PY - 2012/6

Y1 - 2012/6

N2 - Regular expressions (RegExes) are widely used, yet their inherent complexity often limits the total number of RegExes that can be detected using a single chip for a reasonable throughput. This limit on the number of RegExes impairs the scalability of today's RegEx detection systems. The scalability of existing schemes is generally limited by the traditional detection paradigm based on per-character-state processing and state transition detection. The main focus of existing schemes is on optimizing the number of states and the required transitions, but not on optimizing the suboptimal character-based detection method. Furthermore, the potential benefits of allowing out-of-sequence detection, instead of detecting components of a RegEx in the order of appearance, have not been explored. Lastly, the existing schemes do not provide ways to adapt to the evolving RegExes. In this paper, we propose Lookahead Finite Automata (LaFA) to perform scalable RegEx detection. LaFA requires less memory due to these three contributions: 1) providing specialized and optimized detection modules to increase resource utilization; 2) systematically reordering the RegEx detection sequence to reduce the number of concurrent operations; 3) sharing states among automata for different RegExes to reduce resource requirements. Here, we demonstrate that LaFA requires an order of magnitude less memory compared to today's state-of-the-art RegEx detection systems. Using LaFA, a single-commodity field programmable gate array (FPGA) chip can accommodate up to 25000 (25 k) RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimate that a 34-Gb/s throughput can be achieved.

AB - Regular expressions (RegExes) are widely used, yet their inherent complexity often limits the total number of RegExes that can be detected using a single chip for a reasonable throughput. This limit on the number of RegExes impairs the scalability of today's RegEx detection systems. The scalability of existing schemes is generally limited by the traditional detection paradigm based on per-character-state processing and state transition detection. The main focus of existing schemes is on optimizing the number of states and the required transitions, but not on optimizing the suboptimal character-based detection method. Furthermore, the potential benefits of allowing out-of-sequence detection, instead of detecting components of a RegEx in the order of appearance, have not been explored. Lastly, the existing schemes do not provide ways to adapt to the evolving RegExes. In this paper, we propose Lookahead Finite Automata (LaFA) to perform scalable RegEx detection. LaFA requires less memory due to these three contributions: 1) providing specialized and optimized detection modules to increase resource utilization; 2) systematically reordering the RegEx detection sequence to reduce the number of concurrent operations; 3) sharing states among automata for different RegExes to reduce resource requirements. Here, we demonstrate that LaFA requires an order of magnitude less memory compared to today's state-of-the-art RegEx detection systems. Using LaFA, a single-commodity field programmable gate array (FPGA) chip can accommodate up to 25000 (25 k) RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimate that a 34-Gb/s throughput can be achieved.

KW - Deep packet inspection

KW - DPI

KW - LaFA

KW - Lookahead Finite Automata

KW - network intrusion detection and prevention system

KW - NIDPS

KW - regular expressions

UR - http://www.scopus.com/inward/record.url?scp=84862568379&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84862568379&partnerID=8YFLogxK

U2 - 10.1109/TNET.2011.2181411

DO - 10.1109/TNET.2011.2181411

M3 - Article

VL - 20

SP - 699

EP - 714

JO - IEEE/ACM Transactions on Networking

JF - IEEE/ACM Transactions on Networking

SN - 1063-6692

IS - 3

M1 - 6216497

ER -