Hardware implementation for scalable lookahead regular expressions detection

Masanori Bando, N. Sertac Artan, Nishit Mehta, Yi Guan, H. Jonathan Chao

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Regular Expressions (RegExes) are widely used in various applications to identify strings of text. Their flexibility, however, increases the complexity of the detection system and often limits the detection speed as well as the total number of RegExes that can be detected using limited resources. The two classical detection methods, Deterministic Finite Automaton (DFA) and Non-Deterministic Finite Automaton (NFA), have the potential problems of prohibitively large memory requirements and a large number of concurrent operations, respectively. Although recent schemes addressing these problems to improve DFA and NFA are promising, they are inherently limited by their scalability, since they follow the state transition model in DFA and NFA, where the state transitions occur per each character of the input. We recently proposed a scalable RegEx detection system called Lookahead Finite Automata (LaFA) to solve these problems with three novel ideas: 1. Provide specialized and optimized detection modules to increase resource utilizations. 2. Systematically reordering the RegEx detection sequence to reduce number of concurrent operations. 3. Sharing states among automata for different RegExes to reduce resource requirements. In this paper, we propose an efficient hardware architecture and prototype design implementation based on LaFA. Our proof-of-concept prototype design is built on a fraction of a single commodity Field Programmable Gate Array (FPGA) chip and can accommodate up to twenty-five thousand (25k) RegExes. Using only 7% of the logic area and 25% of the memory on a Xilinx Virtex-4 FX100, the prototype design can achieve 2-Gbps (gigabits-per-second) detection throughput with only one detection engine. We estimate that 34-Gbps detection throughput can be achieved if the entire resources of a state-of-the-art FPGA chip are used to implement multiple detection engines.

Original languageEnglish (US)
Title of host publicationProceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010
DOIs
StatePublished - 2010
Event2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010 - Atlanta, GA, United States
Duration: Apr 19 2010Apr 23 2010

Other

Other2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010
CountryUnited States
CityAtlanta, GA
Period4/19/104/23/10

Fingerprint

Regular Expressions
Look-ahead
Hardware Implementation
Finite automata
Hardware
Finite Automata
Deterministic Finite Automata
Resources
Field programmable gate arrays (FPGA)
Prototype
State Transition
Field Programmable Gate Array
Throughput
Concurrent
Engines
Data storage equipment
Engine
Chip
Transition Model
Hardware Architecture

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
  • Theoretical Computer Science

Cite this

Bando, M., Artan, N. S., Mehta, N., Guan, Y., & Chao, H. J. (2010). Hardware implementation for scalable lookahead regular expressions detection. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010 [5470750] https://doi.org/10.1109/IPDPSW.2010.5470750

Hardware implementation for scalable lookahead regular expressions detection. / Bando, Masanori; Artan, N. Sertac; Mehta, Nishit; Guan, Yi; Chao, H. Jonathan.

Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010. 2010. 5470750.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Bando, M, Artan, NS, Mehta, N, Guan, Y & Chao, HJ 2010, Hardware implementation for scalable lookahead regular expressions detection. in Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010., 5470750, 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010, Atlanta, GA, United States, 4/19/10. https://doi.org/10.1109/IPDPSW.2010.5470750
Bando M, Artan NS, Mehta N, Guan Y, Chao HJ. Hardware implementation for scalable lookahead regular expressions detection. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010. 2010. 5470750 https://doi.org/10.1109/IPDPSW.2010.5470750
Bando, Masanori ; Artan, N. Sertac ; Mehta, Nishit ; Guan, Yi ; Chao, H. Jonathan. / Hardware implementation for scalable lookahead regular expressions detection. Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010. 2010.
@inproceedings{2a707e45fbe14b58be83259f2358a521,
title = "Hardware implementation for scalable lookahead regular expressions detection",
abstract = "Regular Expressions (RegExes) are widely used in various applications to identify strings of text. Their flexibility, however, increases the complexity of the detection system and often limits the detection speed as well as the total number of RegExes that can be detected using limited resources. The two classical detection methods, Deterministic Finite Automaton (DFA) and Non-Deterministic Finite Automaton (NFA), have the potential problems of prohibitively large memory requirements and a large number of concurrent operations, respectively. Although recent schemes addressing these problems to improve DFA and NFA are promising, they are inherently limited by their scalability, since they follow the state transition model in DFA and NFA, where the state transitions occur per each character of the input. We recently proposed a scalable RegEx detection system called Lookahead Finite Automata (LaFA) to solve these problems with three novel ideas: 1. Provide specialized and optimized detection modules to increase resource utilizations. 2. Systematically reordering the RegEx detection sequence to reduce number of concurrent operations. 3. Sharing states among automata for different RegExes to reduce resource requirements. In this paper, we propose an efficient hardware architecture and prototype design implementation based on LaFA. Our proof-of-concept prototype design is built on a fraction of a single commodity Field Programmable Gate Array (FPGA) chip and can accommodate up to twenty-five thousand (25k) RegExes. Using only 7{\%} of the logic area and 25{\%} of the memory on a Xilinx Virtex-4 FX100, the prototype design can achieve 2-Gbps (gigabits-per-second) detection throughput with only one detection engine. We estimate that 34-Gbps detection throughput can be achieved if the entire resources of a state-of-the-art FPGA chip are used to implement multiple detection engines.",
author = "Masanori Bando and Artan, {N. Sertac} and Nishit Mehta and Yi Guan and Chao, {H. Jonathan}",
year = "2010",
doi = "10.1109/IPDPSW.2010.5470750",
language = "English (US)",
isbn = "9781424465347",
booktitle = "Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010",

}

TY - GEN

T1 - Hardware implementation for scalable lookahead regular expressions detection

AU - Bando, Masanori

AU - Artan, N. Sertac

AU - Mehta, Nishit

AU - Guan, Yi

AU - Chao, H. Jonathan

PY - 2010

Y1 - 2010

N2 - Regular Expressions (RegExes) are widely used in various applications to identify strings of text. Their flexibility, however, increases the complexity of the detection system and often limits the detection speed as well as the total number of RegExes that can be detected using limited resources. The two classical detection methods, Deterministic Finite Automaton (DFA) and Non-Deterministic Finite Automaton (NFA), have the potential problems of prohibitively large memory requirements and a large number of concurrent operations, respectively. Although recent schemes addressing these problems to improve DFA and NFA are promising, they are inherently limited by their scalability, since they follow the state transition model in DFA and NFA, where the state transitions occur per each character of the input. We recently proposed a scalable RegEx detection system called Lookahead Finite Automata (LaFA) to solve these problems with three novel ideas: 1. Provide specialized and optimized detection modules to increase resource utilizations. 2. Systematically reordering the RegEx detection sequence to reduce number of concurrent operations. 3. Sharing states among automata for different RegExes to reduce resource requirements. In this paper, we propose an efficient hardware architecture and prototype design implementation based on LaFA. Our proof-of-concept prototype design is built on a fraction of a single commodity Field Programmable Gate Array (FPGA) chip and can accommodate up to twenty-five thousand (25k) RegExes. Using only 7% of the logic area and 25% of the memory on a Xilinx Virtex-4 FX100, the prototype design can achieve 2-Gbps (gigabits-per-second) detection throughput with only one detection engine. We estimate that 34-Gbps detection throughput can be achieved if the entire resources of a state-of-the-art FPGA chip are used to implement multiple detection engines.

AB - Regular Expressions (RegExes) are widely used in various applications to identify strings of text. Their flexibility, however, increases the complexity of the detection system and often limits the detection speed as well as the total number of RegExes that can be detected using limited resources. The two classical detection methods, Deterministic Finite Automaton (DFA) and Non-Deterministic Finite Automaton (NFA), have the potential problems of prohibitively large memory requirements and a large number of concurrent operations, respectively. Although recent schemes addressing these problems to improve DFA and NFA are promising, they are inherently limited by their scalability, since they follow the state transition model in DFA and NFA, where the state transitions occur per each character of the input. We recently proposed a scalable RegEx detection system called Lookahead Finite Automata (LaFA) to solve these problems with three novel ideas: 1. Provide specialized and optimized detection modules to increase resource utilizations. 2. Systematically reordering the RegEx detection sequence to reduce number of concurrent operations. 3. Sharing states among automata for different RegExes to reduce resource requirements. In this paper, we propose an efficient hardware architecture and prototype design implementation based on LaFA. Our proof-of-concept prototype design is built on a fraction of a single commodity Field Programmable Gate Array (FPGA) chip and can accommodate up to twenty-five thousand (25k) RegExes. Using only 7% of the logic area and 25% of the memory on a Xilinx Virtex-4 FX100, the prototype design can achieve 2-Gbps (gigabits-per-second) detection throughput with only one detection engine. We estimate that 34-Gbps detection throughput can be achieved if the entire resources of a state-of-the-art FPGA chip are used to implement multiple detection engines.

UR - http://www.scopus.com/inward/record.url?scp=77954054248&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77954054248&partnerID=8YFLogxK

U2 - 10.1109/IPDPSW.2010.5470750

DO - 10.1109/IPDPSW.2010.5470750

M3 - Conference contribution

AN - SCOPUS:77954054248

SN - 9781424465347

BT - Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010

ER -