A multi-dimensional progressive perfect hashing for high-speed string matching

Yang Xu, Lei Ma, Zhaobo Liu, H. Jonathan Chao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Aho-Corasick (AC) automaton is widely used for multi-string matching in today's Network Intrusion Detection Systems (NIDS). With fast-growing rule sets, implementing an AC automaton in a small memory without sacrificing its performance remains challenging in NIDS design. In this paper, we propose a multi-dimensional progressive perfect hashing algorithm named P2-Hashing, which allows the transitions of an AC automaton to be placed in a compact hash table without any collision. P2-Hashing is based on the observation that the hash key of each transition consists of two dimensions, namely a source state ID and an input character. When placing a transition in the hash table causes a collision, we can change the value of one dimension of the hash key to rehash the transition to a new location in the hash table. For a given AC automaton, P2-Hashing first divides all the transitions into many small sets based on the two-dimensional values of the hash keys, and then places the sets of transitions progressively into the hash table until all are placed. A hash collision that occurs during the insertion of a transition affects only the transitions in the same set. P2-Hashing has several useful properties, including fast hash index generation and zero memory overhead, which make it well suited to AC automaton operation. The feasibility and performance of P2-Hashing are investigated through simulations on the full Snort (6.4k rules) and ClamAV (54k rules) rule sets, each of which is first converted to a single AC automaton. Simulation results show that P2-Hashing can successfully construct the perfect hash table even when the load factor of the hash table is as high as 0.91.
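
The placement idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the grouping by source state, the largest-set-first order, the hash function `slot`, and the names `build_table` and `lookup` are all assumptions made for illustration. It only shows how changing one dimension of the key (here, the state ID) lets a whole set of transitions be rehashed without disturbing sets that are already placed.

```python
from collections import defaultdict

def slot(state_id, char, table_size):
    # Hypothetical hash over the two-dimensional key (state ID, input character).
    x = (state_id * 0x9E3779B1 + ord(char)) & 0xFFFFFFFF
    x ^= x >> 15
    x = (x * 0x85EBCA6B) & 0xFFFFFFFF
    x ^= x >> 13
    return x % table_size

def build_table(transitions, table_size, max_ids=10_000):
    """Place (source, char, target) transitions into a collision-free table."""
    # Assumption: one set per source state, so a set shares one key dimension.
    groups = defaultdict(list)
    for src, ch, dst in transitions:
        groups[src].append((ch, dst))

    table = [None] * table_size
    assigned = {}     # original state name -> state ID chosen during placement
    used_ids = set()  # IDs must stay unique so that keys remain unique

    # Assumption: place larger sets first, while the table is still mostly empty.
    for src in sorted(groups, key=lambda s: -len(groups[s])):
        for candidate in range(max_ids):
            if candidate in used_ids:
                continue
            slots = [slot(candidate, ch, table_size) for ch, _ in groups[src]]
            no_self_collision = len(set(slots)) == len(slots)
            if no_self_collision and all(table[i] is None for i in slots):
                for i, (ch, dst) in zip(slots, groups[src]):
                    table[i] = (src, ch, dst)
                assigned[src] = candidate
                used_ids.add(candidate)
                break
            # Collision: try another ID. Only this set is rehashed.
        else:
            raise RuntimeError(f"no collision-free ID found for state {src!r}")
    return table, assigned

def lookup(table, assigned, src, ch):
    """Return the target state for (src, ch), or None if there is no transition."""
    state_id = assigned.get(src)
    if state_id is None:
        return None
    entry = table[slot(state_id, ch, len(table))]
    if entry is not None and entry[0] == src and entry[1] == ch:
        return entry[2]
    return None

# Goto transitions of a toy trie for the patterns "he", "she", "his", "hers".
TRANSITIONS = [
    (0, "h", 1), (0, "s", 3), (1, "e", 2), (1, "i", 6), (2, "r", 8),
    (3, "h", 4), (4, "e", 5), (6, "s", 7), (8, "s", 9),
]
table, ids = build_table(TRANSITIONS, table_size=10)  # 9 entries in 10 slots
```

In this sketch the `assigned` mapping exists only to keep the example readable. In a real automaton the chosen ID would presumably replace the original state name everywhere, so that a lookup needs nothing beyond the table itself.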

Original language: English (US)
Title of host publication: Proceedings - 2011 7th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ANCS 2011
Pages: 167-177
Number of pages: 11
DOIs: https://doi.org/10.1109/ANCS.2011.33
State: Published - 2011
Event: 2011 7th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ANCS 2011 - Brooklyn, NY, United States
Duration: Oct 3 2011 to Oct 4 2011

Other

Other: 2011 7th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ANCS 2011
Country: United States
City: Brooklyn, NY
Period: 10/3/11 to 10/4/11

Fingerprint

Intrusion detection
Data storage equipment
Systems analysis

Keywords

  • Aho-Corasick Automaton
  • Hash Collision
  • Multi-string Matching
  • Perfect Hash Table

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture

Cite this

Xu, Y., Ma, L., Liu, Z., & Chao, H. J. (2011). A multi-dimensional progressive perfect hashing for high-speed string matching. In Proceedings - 2011 7th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ANCS 2011 (pp. 167-177). [6062729] https://doi.org/10.1109/ANCS.2011.33

@inproceedings{01a255dc0c2c42cbb42881bcb01eb984,
title = "A multi-dimensional progressive perfect hashing for high-speed string matching",
keywords = "Aho-Corasick Automaton, Hash Collision, Multi-string Matching, Perfect Hash Table",
author = "Yang Xu and Lei Ma and Zhaobo Liu and Chao, {H. Jonathan}",
year = "2011",
doi = "10.1109/ANCS.2011.33",
language = "English (US)",
isbn = "9780769545219",
pages = "167--177",
booktitle = "Proceedings - 2011 7th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ANCS 2011",

}
