Wednesday, May 16, 2007

Formal Definition of Regular expression:

Regular expressions can be expressed in terms of formal Language Theory, Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively. Given a finite alphabet Σ the following constants are defined:

(empty set) ∅ denoting the set ∅
(empty string) ε denoting the set {ε}
(Literal character ) a in Σ denoting the set {a}
and the following operations:
(concatenation) RS denoting the set { αβ | α in R and β in S }. For example {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}.
(alternation) R|S denoting the set union of R and S.
(Kleene Star )R* denoting the smallest superset of R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating zero or more strings in R. For example, {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", ... }.
The above constants and operators form a Kleene Algebra.

Many textbooks use the symbols ∪, +, or ∨ for alternation instead of the vertical bar.To avoid brackets it is assumed that the Kleene star has the highest priority, then concatenation and then set union. If there is no ambiguity then brackets may be omitted. For example, (ab)c is written as abc and a|(b(c*)) can be written as a|bc*.
After this we will discuss formal definition of Regular Grammar.

No comments: