LaFA: Lookahead finite automata for scalable regular expression detection

Masanori Bando, N. Sertac Artan, H. Jonathan Chao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Although Regular Expressions (RegExes) have been widely used in network security applications, their inherent complexity often limits the total number of RegExes that can be detected using a single chip at a reasonable throughput. This limit on the number of RegExes impairs the scalability of today's RegEx detection systems. The scalability of existing schemes is generally limited by the traditional per-character state processing and state transition detection paradigm. The main focus of existing schemes is on optimizing the number of states and the required transitions, but not the suboptimal character-based detection method. Furthermore, the potential benefits of a reduced number of operations and states using out-of-sequence detection methods have not been explored. In this paper, we propose Lookahead Finite Automata (LaFA) to perform scalable RegEx detection using a very small amount of memory. LaFA's memory requirement is very small due to the following three areas of effort described in this paper: (1) Different parts of a RegEx, namely RegEx components, are detected using different detectors, each of which is specialized and optimized for the detection of a certain RegEx component. (2) We systematically reorder the RegEx component detection sequence, which provides us with new possibilities for memory optimization. (3) Many redundant states in classical finite automata are identified and eliminated in LaFA. Our simulations show that LaFA requires an order of magnitude less memory compared to today's state-of-the-art RegEx detection systems. A single commodity Field Programmable Gate Array (FPGA) chip can accommodate up to twenty-five thousand (25k) RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimate that a 34-Gbps throughput can be achieved.
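To make the contrast in the abstract concrete, the following is a toy software sketch only, not the authors' hardware design. It assumes a hypothetical pattern `abc[a-z]op` and compares (a) per-character state processing, where every input character advances a set of automaton states, with (b) a component-based, out-of-sequence check, where the two simple strings are located first and the character class between them is verified last. The names `STEPS`, `HEAD`, `TAIL`, `per_char_match_ends`, and `component_match_ends` are illustrative assumptions and do not come from the paper.

```python
import string

# Hypothetical pattern abc[a-z]op, written as one allowed-character set per step.
STEPS = ["a", "b", "c", string.ascii_lowercase, "o", "p"]

# The same pattern split into components (assumed split, for illustration).
HEAD = "abc"   # simple string before the character class
TAIL = "op"    # simple string after the character class


def per_char_match_ends(text):
    """Traditional style: update the set of active states on every character."""
    ends = []
    active = set()
    for pos, ch in enumerate(text):
        nxt = set()
        # State 0 is always a candidate, so a match may start at any position.
        for state in active | {0}:
            if ch in STEPS[state]:
                nxt.add(state + 1)
        if len(STEPS) in nxt:          # reached the accepting state
            ends.append(pos)
            nxt.discard(len(STEPS))
        active = nxt
    return ends


def component_match_ends(text):
    """Component style: find the simple strings first, verify the class last."""
    ends = []
    start = text.find(HEAD)
    while start != -1:
        gap = start + len(HEAD)        # index of the character-class position
        tail_at = gap + 1
        # Check the trailing simple string before the class in between.
        if text.startswith(TAIL, tail_at) and text[gap] in string.ascii_lowercase:
            ends.append(tail_at + len(TAIL) - 1)
        start = text.find(HEAD, start + 1)
    return ends


sample = "xxabcdopyyabc1opabcqop"
print(per_char_match_ends(sample))
print(component_match_ends(sample))
```

Both functions report the index of the last character of each match, so their results can be compared directly. The second function keeps no per-character state between occurrences of the leading string, which is the kind of saving the abstract attributes to specialized detectors and reordered detection; how LaFA realizes this in hardware is described in the paper itself.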

Original language: English (US)
Title of host publication: ANCS'09: Symposium on Architecture for Networking and Communications Systems
Pages: 40-49
Number of pages: 10
DOIs: https://doi.org/10.1145/1882486.1882496
State: Published - 2009
Event: 2009 Symposium on Architecture for Networking and Communications Systems, ANCS'09 - Princeton, NJ, United States
Duration: Oct 19 2009 to Oct 20 2009

Other

Other: 2009 Symposium on Architecture for Networking and Communications Systems, ANCS'09
Country: United States
City: Princeton, NJ
Period: 10/19/09 to 10/20/09

Fingerprint

Finite automata
Data storage equipment
Throughput
Field programmable gate arrays (FPGA)
Scalability
Network security
Detectors
Processing

Keywords

  • deep packet inspection
  • finite automata
  • FPGA
  • LaFA
  • network intrusion detection system
  • regular expressions

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Bando, M., Artan, N. S., & Chao, H. J. (2009). LaFA: Lookahead finite automata for scalable regular expression detection. In ANCS'09: Symposium on Architecture for Networking and Communications Systems (pp. 40-49). https://doi.org/10.1145/1882486.1882496

@inproceedings{db0a6d7af1ae4493948b36432f00ad4c,
title = "LaFA: Lookahead finite automata for scalable regular expression detection",
keywords = "deep packet inspection, finite automata, FPGA, LaFA, network intrusion detection system, regular expressions",
author = "Masanori Bando and Artan, {N. Sertac} and Chao, {H. Jonathan}",
year = "2009",
doi = "10.1145/1882486.1882496",
language = "English (US)",
isbn = "9781605586304",
pages = "40--49",
booktitle = "ANCS'09: Symposium on Architecture for Networking and Communications Systems",

}

TY - GEN

T1 - LaFA

T2 - Lookahead finite automata for scalable regular expression detection

AU - Bando, Masanori

AU - Artan, N. Sertac

AU - Chao, H. Jonathan

PY - 2009

Y1 - 2009

KW - deep packet inspection

KW - finite automata

KW - FPGA

KW - LaFA

KW - network intrusion detection system

KW - regular expressions

UR - http://www.scopus.com/inward/record.url?scp=77954055303&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77954055303&partnerID=8YFLogxK

U2 - 10.1145/1882486.1882496

DO - 10.1145/1882486.1882496

M3 - Conference contribution

AN - SCOPUS:77954055303

SN - 9781605586304

SP - 40

EP - 49

BT - ANCS'09: Symposium on Architecture for Networking and Communications Systems

ER -