Performance and architectural issues for string matching

Merrill E. Isenman, Dennis Shasha

Research output: Contribution to journal › Article

Abstract

The authors introduce special heuristics to the Knuth-Morris-Pratt algorithm to reduce the time and space required to perform string matching. They compare their hardware-based approach to the software approaches embodied in the Unix system grep and fgrep commands. Simulation results show that the hardware approach can provide a 25-500-fold performance improvement, depending on the complexity of the query, and that it is fast enough, even in the presence of variable-length 'don't cares,' to keep up with a 20-million character/second disk. The approach compares favorably to other hardware designs in speed and space. The proposed hardware implementation requires 10 kB of one-cycle static memory, 28 single-character comparators, four 16-bit adders, and control logic for four finite-state machines with a term-matcher controller. Beyond that configuration, additional hardware produces negligible performance improvements for queries with up to 80 terms, about half of which have variable-length 'don't cares.'
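For readers unfamiliar with the baseline, the following is a minimal software sketch of the textbook Knuth-Morris-Pratt algorithm that the abstract names as its starting point. It does not reproduce the authors' heuristics, their hardware design, or their handling of variable-length 'don't cares'; the function names `build_failure_table` and `kmp_search` are illustrative choices, not taken from the paper.

```python
def build_failure_table(pattern: str) -> list[int]:
    """For each i, length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every (possibly overlapping) occurrence
    of pattern in text. Each text character is read exactly once."""
    if not pattern:
        return []
    fail = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return matches
```

As a usage example, `kmp_search("abababca", "aba")` returns `[0, 2]`. The scan never moves backward in the text, which is the property that makes this family of algorithms a natural fit for matching against a stream of characters arriving at a fixed rate, such as the disk output described in the abstract.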

Original language: English (US)
Pages (from-to): 238-250
Number of pages: 13
Journal: IEEE Transactions on Computers
Volume: 39
Issue number: 2
DOI: 10.1109/12.45209
State: Published - Feb 1990

Fingerprint

String Matching
Hardware
Query
Hardware Design
Hardware Implementation
State Machine
Term
Fold
Heuristics
Logic
Adders
Controller
Cycle
Finite automata
Software
Architecture
Simulation
Data storage equipment
Controllers
Character

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Performance and architectural issues for string matching. / Isenman, Merrill E.; Shasha, Dennis.

In: IEEE Transactions on Computers, Vol. 39, No. 2, 02.1990, p. 238-250.

@article{afa7c5440fb949408a61c97ea9201dc4,
title = "Performance and architectural issues for string matching",
abstract = "The authors introduce special heuristics to the Knuth-Morris-Pratt algorithm to reduce the time and space required to perform string matching. They compare their hardware-based approach to the software approaches embodied in the Unix system grep and fgrep commands. Simulation results show that the hardware approach can provide a 25-500-fold performance improvement, depending on the complexity of the query, and that it is fast enough, even in the presence of variable-length 'don't cares,' to keep up with a 20-million character/second disk. The approach compares favorably to other hardware designs in speed and space. The proposed hardware implementation requires 10 kB of one-cycle static memory, 28 single-character comparators, four 16-bit adders, and control logic for four finite-state machines with a term-matcher controller. Beyond that configuration, additional hardware produces negligible performance improvements for queries with up to 80 terms, about half of which have variable-length 'don't cares.'",
author = "Isenman, Merrill E. and Shasha, Dennis",
year = "1990",
month = "2",
doi = "10.1109/12.45209",
language = "English (US)",
volume = "39",
pages = "238--250",
journal = "IEEE Transactions on Computers",
issn = "0018-9340",
publisher = "IEEE Computer Society",
number = "2",

}

TY - JOUR

T1 - Performance and architectural issues for string matching

AU - Isenman, Merrill E.

AU - Shasha, Dennis

PY - 1990/2

Y1 - 1990/2

AB - The authors introduce special heuristics to the Knuth-Morris-Pratt algorithm to reduce the time and space required to perform string matching. They compare their hardware-based approach to the software approaches embodied in the Unix system grep and fgrep commands. Simulation results show that the hardware approach can provide a 25-500-fold performance improvement, depending on the complexity of the query, and that it is fast enough, even in the presence of variable-length 'don't cares,' to keep up with a 20-million character/second disk. The approach compares favorably to other hardware designs in speed and space. The proposed hardware implementation requires 10 kB of one-cycle static memory, 28 single-character comparators, four 16-bit adders, and control logic for four finite-state machines with a term-matcher controller. Beyond that configuration, additional hardware produces negligible performance improvements for queries with up to 80 terms, about half of which have variable-length 'don't cares.'

UR - http://www.scopus.com/inward/record.url?scp=0025385255&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0025385255&partnerID=8YFLogxK

U2 - 10.1109/12.45209

DO - 10.1109/12.45209

M3 - Article

VL - 39

SP - 238

EP - 250

JO - IEEE Transactions on Computers

JF - IEEE Transactions on Computers

SN - 0018-9340

IS - 2

ER -