This post is my note I kept when I was reading the Introduction to Algorithms book.

String matching problem is formalized as follows.

- $T[1\cdots n]$ : An array of length $n$.
- $P[1\cdots m]$ : An array of the pattern that we are looking for.
- Assumption: Both elements of $P$ and $T$ are characters drawn from a finite alphabet $\Sigma$.

According to problems, $\Sigma$ could be $\{0, 1\}$ or $\{a, b, \ldots, z \}$.

And, to make describing the problem easier, we define the following concepts:

**$P$ occurs with shift s**(starting from 0): Or pattern $P$ occurs beginning at position $s+1$(starting from 1) if $0 \leq s \leq n-m $ and $T[s+1\ldots s+m] = P[1\ldots m]$. And such a shift position is also called a**valid shift**. Otherwise, a shift is an**invalid shift**.

Thus, the string matching is the problem of finding all valid shifts with which a given pattern $P$ occurs in a given text $T$.

# Overview of String Matching Algorithms

Algorithm | Preprocessing Time | Matching Time |
---|---|---|

Naive | 0 | $O((n-m+1)m)$ |

Rabin-Karp | $\Theta(m)$ | $O((n-m+1)m)$ |

Finite Automaton | $O(m\vert \Sigma \vert)$ | $\Theta(n)$ |

Knuth-Morris-Pratt |
$\Theta(m)$ | $\Theta(n)$ |

The first time I saw this table, I was confused completely. The Rabin-Karp method is definitely off somewhere. Its total running time is worse than the Naive method. Yes, that’s true for the worst cases. But the Rabin-Karp algorithm works much better on average and in practice. And it generalizes nicely to other pattern-matching problems.

# Notation and Terminology

- $\Sigma ^ \ast$ : the set of all finite-length strings formed using characters from the alphabet $\Sigma$
**Concatenation**: the concatenation of two strings $x$ and $y$ is denoted $xy$.**Prefix**: String $w$ is a prefix of a string $x$, denoted $w \sqsubset x$, if $x = wy$ for some string $y \in \Sigma^\ast$.**Suffix**: denoted $w \sqsupset x$, if $x = yw$ for some $y \in \Sigma^\ast$.**Empty String**: denoted $\varepsilon$

By definition, empty string is both a suffix and a prefix of all strings.

From the notations we already have, we can move forward to have the following lemma.

**Overlapping-suffix Lemma**: Suppose that $x$, $y$, and $z$ are strings such that $x\sqsupset z$ and $y \sqsupset z$. If $|x| \leq |z|$, then $x \sqsupset y$. If $|x|\geq |y|$, then $y\sqsupset x$. If $|x| = |y|$, then x=y.

## Leave a Comment