Wednesday, August 1, 2007

unambiguous grammar

Let’s see the solution to the following question:
Problem:

Consider the grammar:
S -> aS | aSbS | epsilon
where S is the only non-terminal, and epsilon is the null string.

a) Show that the grammar is ambiguous, by giving two parse trees for the string aab.
b) Find an unambiguous grammar that generates these strings.

Solution:

Solution-(a)

The ambiguity is easy to show: you can derive the string aab in the following two ways
(at every step we expand the leftmost non-terminal):

S -> aSbS -> aaSbS -> aabS -> aab
S -> aS -> aaSbS -> aabS -> aab

These two parses correspond to associating the b with the first or the
second a.

Solution-(b)

We can disambiguate by using a similar approach to the dangling else, and
decide that each b should be associated with the nearest a. This means that
the expansion within an ab pair should always be balanced. This leads to
the following grammar:

S  -> a S | S1 S | epsilon
S1 -> a T b
T  -> S1 T | epsilon

Here S1 generates a single matched a...b pair whose inside is balanced, and T generates a
(possibly empty) sequence of such matched pairs. (S1 itself must not derive epsilon; if it
could, the production S -> S1 S would reintroduce ambiguity.) It is easy to verify that this
generates the same strings as the original grammar, and that the parse tree is always unique,
because each b is associated with the most recent unmatched a.
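
As a sanity check, here is a small brute-force Python sketch (the helper function and the single-character abbreviations for S1 and T are mine, not part of the problem). It counts the distinct leftmost derivations of aab under the original grammar and under the rewrite above; since each leftmost derivation corresponds to exactly one parse tree, the first count should be 2 and the second should be 1.

AMBIGUOUS   = {'S': ['aS', 'aSbS', '']}
UNAMBIGUOUS = {'S': ['aS', '1S', ''],       # '1' abbreviates S1, 'T' is the helper nonterminal
               '1': ['aTb'],
               'T': ['1T', '']}

def count_parses(rules, target, form='S'):
    """Count the leftmost derivations of `target` from the sentential form `form`."""
    nts = [i for i, c in enumerate(form) if c in rules]
    if not nts:
        return 1 if form == target else 0
    if len(form) - len(nts) > len(target):            # already more terminals than the target
        return 0
    i = nts[0]                                        # always expand the leftmost nonterminal
    return sum(count_parses(rules, target, form[:i] + rhs + form[i + 1:])
               for rhs in rules[form[i]])

print(count_parses(AMBIGUOUS, 'aab'), count_parses(UNAMBIGUOUS, 'aab'))   # prints: 2 1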

Note that the answer is not necessarily unique. If the grammar is ambiguous,
it means that we get to choose between the possible parses, and each choice
gives the same strings a different structure, and in that sense a different meaning.
For example, given the ambiguous grammar for expressions:

E -> E + E | E * E | id

We say that the unambiguous grammar we want is:

E -> E + T | T,   T -> T * id | id

because it gives us the proper precedence between the two operators. But that
choice is in no way mandated by the grammar. We could just as well choose:

E -> E * T | T,   T -> T + id | id

which generates the same strings, but gives the opposite precedence to
operators.

Monday, July 23, 2007

Dangling Else Problem

In this post, we will see dangling else problem. It is the best example of ambiguous grammar.

Dangling Else Problem:
The dangling else is a well-known problem in computer programming in which a seemingly well defined grammar can become ambiguous. In many programming languages you can write code like this:

if a then if b then s1 else s2

which can be understood in two ways. Either as

if a then
    if b then
        s1
    else
        s2

or as

if a then
    if b then
        s1
else
    s2

(In the first reading the else belongs to the inner if; in the second it belongs to the outer one.)

It can be solved either at the implementation level, by telling the parser the right way to resolve the ambiguity (conventionally, each else is attached to the nearest unmatched if), or at the grammar level, by rewriting the grammar into an unambiguous form or by using a parsing expression grammar or equivalent.

In the next post, I will give an example which will cover all the topics related to ambiguous grammars.

Tuesday, July 17, 2007

ambiguous grammar examples

In the last post, we saw 2 possible parse trees for a+a+a. How many are there for a+a+a+a?

There are 5 possible parses:

(a)+((a+a)+a),

(a)+(a+(a+a)),

(a+a)+(a+a),

((a+a)+a)+(a),

(a+(a+a))+(a)


While in general it may be difficult to prove a grammar is ambiguous, the demonstration of two distinct parse trees for the same terminal string is sufficient proof that a grammar is ambiguous.
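
For larger sums the counts are the Catalan numbers; here is a tiny Python sketch of mine (not part of the original post) that computes them directly from the grammar's structure:

from functools import lru_cache

@lru_cache(maxsize=None)
def parse_trees(n):
    """Number of parse trees of a sum of n operands under E -> E '+' E | 'a'."""
    if n == 1:
        return 1                                   # the tree E -> 'a'
    # the topmost '+' splits the sum into k operands on the left and n - k on the right
    return sum(parse_trees(k) * parse_trees(n - k) for k in range(1, n))

print(parse_trees(3), parse_trees(4), parse_trees(5))   # prints: 2 5 14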

Friday, July 13, 2007

Ambiguous Grammar

What does it mean for a grammar to be ambiguous?

A grammar is ambiguous iff there exists a string s in L(G) for which s has more than one parse tree.

·What is an example ambiguous grammar?

E → E '+' E | 'a'

·Prove that the previous grammar is ambiguous.

Given the input a+a+a, we can produce two different parse trees. The notation N(...) means we build the nonterminal N from the symbols (...)

a+a+a →* E(a)+E(a)+E(a) → E(E(a) + E(a)) + E(a) → E(E(E(a) + E(a)) + E(a))

and

a+a+a →* E(a)+E(a)+E(a) → E(a) + E(E(a) + E(a)) → E(E(a) + E(E(a) + E(a)))

Monday, July 9, 2007

For example:
The expression x+y*z can be interpreted as
(x+y)*z

by this grammar (G3), and only as

x+(y*z)

by G1 or G2.

That is, both G1 and G2 handle operator precedence correctly (since * has higher precedence than +), while G3 does not.

In the next post, I will explain the most interesting part of context-free grammars: ambiguous grammars.

Thursday, July 5, 2007

equivalent grammar

Consider the following grammar G1 again:
E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= num | id

Consider now the following grammar G2

E ::= T + E | T - E | T
T ::= F * T | F / T | F
F ::= num | id

Consider now the following grammar G3:
E ::= E + E | E - E | E * E | E / E | num | id

Is this grammar equivalent to our original grammar G1?

Well, it recognizes the same language, but it constructs the wrong parse trees.

Saturday, June 30, 2007

parse tree

Consider now the following grammar G2:

E ::= T + E | T - E | T
T ::= F * T | F / T | F
F ::= num | id

This is similar to our original grammar G1 (which we considered in previous posts), but it makes the operators right-associative. That is, x-y-z is grouped as x-(y-z) under G2, as we can see from its parse tree.

Wednesday, June 27, 2007

Parse tree

Let’s see another example of parse tree. As another example, consider the following grammar:
S ::= ( L )| a

L ::= L , S | S

Under this grammar, the parse tree of the sentence (a,((a, a), a)) is:

     S
   / | \
  (  L  )
   / | \
  L  ,  S
  |   / | \
  S  (  L  )
  |   / | \
  a  L  ,  S
     |     |
     S     a
   / | \
  (  L  )
   / | \
  L  ,  S
  |     |
  S     a
  |
  a

Friday, June 22, 2007

Parse Tree

The parse tree of an input sequence according to a CFG is the tree of derivations. For example, if we used a production A ::= X1 X2 ... Xn (in either top-down or bottom-up parsing), then we construct a tree with node A and children X1 X2 ... Xn.
For example, the parse tree of id(x) + num(2) * id(y) is:
     E
   / | \
  E  +  T
  |   / | \
  T  T  *  F
  |  |     |
  F  F     id
  |  |
  id num

Wednesday, June 20, 2007

Bottom up parsing

Today we will discuss bottom up parsing and parse tree
Bottom up Parsing:
Consider the following grammar G1 again:
E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= num | id
In contrast to top-down parsing, bottom-up parsing starts from the input string and applies derivations in the opposite direction (i.e., it replaces the RHS sequence X1 X2 ... Xn of a production A ::= X1 X2 ... Xn with the non-terminal A). It stops when it derives the start symbol. For example,
id(x) + num(2) * id(y)
=> id(x) + num(2) * F
=> id(x) + F * F
=> id(x) + T * F
=> id(x) + T
=> F + T
=> T + T
=> E + T
=> E

Friday, June 15, 2007

Top Down Parsing

There are 2 types of parsing:

1. Top Down Parsing
2. Bottom Up Parsing

Consider the following grammar G1 again:

E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= num | id

Top-down parsing:

Top-down parsing starts from the start symbol of the grammar and applies derivations until the entire input string is derived (i.e., until we reach a sequence of terminals that matches the input tokens). For example,

E => E + T
=> E + T * F
=> T + T * F
=> T + F * F
=> T + num * F
=> F + num * F
=> id + num * F
=> id + num * id

which matches the input sequence id(x) + num(2) * id(y). Top-down parsing occasionally requires backtracking. For example, suppose we had used the derivation E => E - T instead of the first derivation. Then, later we would have to backtrack because the derived symbols would not match the input tokens. This can happen for any nonterminal that has more than one production, since it means there is a choice of which production to use. We will learn how to construct parsers for many types of CFGs that never backtrack. These parsers are based on a method called predictive parsing. One issue to consider is which nonterminal to replace when there is a choice. For example, in T + F * F we have 3 choices: we can use a derivation for T, for the first F, or for the second F. When we always replace the leftmost nonterminal, it is called a leftmost derivation.
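
As a rough illustration, here is a minimal backtracking recognizer in Python (a sketch of mine, not the predictive parsers mentioned above). It uses the right-recursive grammar G2 from an earlier post rather than G1, because naive top-down parsing loops forever on G1's left-recursive rules such as E ::= E + T.

def parse_F(tokens, i):
    if i < len(tokens) and tokens[i] in ('num', 'id'):
        return i + 1                       # F ::= num | id
    return None

def parse_T(tokens, i):
    for op in ('*', '/'):                  # T ::= F '*' T | F '/' T
        j = parse_F(tokens, i)
        if j is not None and j < len(tokens) and tokens[j] == op:
            k = parse_T(tokens, j + 1)
            if k is not None:
                return k
    return parse_F(tokens, i)              # T ::= F

def parse_E(tokens, i):
    for op in ('+', '-'):                  # E ::= T '+' E | T '-' E
        j = parse_T(tokens, i)
        if j is not None and j < len(tokens) and tokens[j] == op:
            k = parse_E(tokens, j + 1)
            if k is not None:
                return k
    return parse_T(tokens, i)              # E ::= T

tokens = ['id', '+', 'num', '*', 'id']     # id(x) + num(2) * id(y)
print(parse_E(tokens, 0) == len(tokens))   # True: the whole token sequence is derived from E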

In next Post, I will explain Bottom up parsing.

Tuesday, June 12, 2007

Examples Of Context Free Languages (CFL)

So far, we have seen very good examples of regular expressions and regular languages, and we will see more difficult problems in the future. Now we are going to start on examples of context-free languages (CFLs). There are a lot of problems based on CFLs, so we will discuss many of them.

Example 1
Consider the following input string:
x+2*y

When scanned by a scanner, it produces the following stream of tokens:

id(x)
+
num(2)
*
id(y)

The goal is to parse this expression and construct a data structure (a parse tree) to represent it. One possible syntax for expressions is given by the following grammar G1:

E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= num | id

where E, T and F stand for expression, term, and factor respectively. For example, the rule for E indicates that an expression E can take one of the following 3 forms: an expression followed by the token + followed by a term, or an expression followed by the token - followed by a term, or simply a term. The first rule for E is actually a shorthand of 3 productions:

E ::= E + T
E ::= E - T
E ::= T
G1 is an example of a context-free grammar (defined below); the symbols E, T and F are nonterminals and should be defined using production rules, while +, -, *, /, num, and id are terminals (i.e., tokens) produced by the scanner. The nonterminal E is the start symbol of the grammar.

In my next Posts, I will explain types of parsing because knowledge of them is necessary for writing context free grammars.

Sunday, June 10, 2007

Example of Regular Expression

Ex. 6:
Find a regular expression corresponding to the language of all strings over the alphabet { a, b } that contain no more than one occurrence of the string aa.

Solution:
If there is one substring aa in a string of the language, then that aa can be followed by any number of b. If an a comes after that aa, then that a must be preceded by b, because otherwise there are two occurrences of aa. Hence any string that follows aa is represented by ( b + ba )*. On the other hand, if an a precedes the aa, then it must be followed by b. Hence a string preceding the aa can be represented by ( b + ab )*. Hence if a string of the language contains aa then it corresponds to the regular expression ( b + ab )*aa( b + ba )* .

If there is no aa but at least one a exists in a string of the language, then applying the same argument as for aa to a, ( b + ab )*a( b + ba )* is obtained as a regular expression corresponding to such strings.

If there is no a at all in a string of the language, then applying the same argument as for aa to the empty string ε, ( b + ab )*( b + ba )* is obtained as a regular expression corresponding to such strings.
Altogether ( b + ab )*( ε + a + aa )( b + ba )* is a regular expression for the language.
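
We can also check this answer mechanically. The sketch below (written by me, using Python's re module) compares the regular expression against a direct count of possibly overlapping occurrences of aa, for every string over {a, b} up to length 7:

import re
from itertools import product

PATTERN = re.compile(r'(?:b|ab)*(?:|a|aa)(?:b|ba)*\Z')   # ( b + ab )*( ε + a + aa )( b + ba )*

def occurrences_of_aa(s):
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == 'aa')

for n in range(8):                                        # every string over {a, b} of length 0..7
    for t in product('ab', repeat=n):
        s = ''.join(t)
        assert bool(PATTERN.match(s)) == (occurrences_of_aa(s) <= 1), s
print("the regular expression agrees with the definition on all strings up to length 7")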

Monday, June 4, 2007

More examples

Ex. 5:
Describe as simply as possible in English the language corresponding to the regular expression (( a + b )^3)*( ε + a + b ) .

Solution:
( a + b )^3 represents the strings of length 3. Hence (( a + b )^3)* represents the strings whose length is a multiple of 3. Since (( a + b )^3)*( a + b ) represents the strings of length 3n + 1, where n is a natural number, the given regular expression represents the strings of length 3n or 3n + 1, where n is a natural number.

Sunday, June 3, 2007

More examples

Ex. 4:
Describe as simply as possible in English the language corresponding to the regular expression a*b(a*ba*b)*a* .

Solution:
A string in the language can start and end with a or b, it has at least one b, and after the first b all the b's in the string appear in pairs. Any number of a's can appear anywhere in the string. Thus, simply put, it is the set of strings over the alphabet { a, b } that contain an odd number of b's.
We will continue examples in next post..

Saturday, June 2, 2007

Examples of Regular Grammars and Regular Expressions

Ex. 2:
Find a regular expression corresponding to the language of all strings over the alphabet { a, b } that contain exactly two a's.

Solution:
A string in this language must have at least two a's. Since any string of b's can be placed in front of the first a, behind the second a, and between the two a's, and since an arbitrary string of b's can be represented by the regular expression b*, b*a b*a b* is a regular expression for this language.


Ex. 3:
Find a regular expression corresponding to the language of all strings over the alphabet { a, b } that do not end with ab.

Solution:
Any nonempty string over { a , b } must end in a or b. Hence if a string does not end with ab, then either it ends with a, or it ends with b and that last b is preceded by another b. Since any string can come in front of the final a or bb, ( a + b )*( a + bb ) is a regular expression for these strings. (Strictly speaking, the empty string and the one-symbol string b do not end with ab either, so ε + b + ( a + b )*( a + bb ) covers the whole language.)

Thursday, May 31, 2007

Examples of Regular Expressions and Regular Languages.

Ex.:
Let r1 and r2 be arbitrary regular expressions over some alphabet. Find a simple (the shortest and with the smallest nesting of * and +) regular expression which is equal to each of the following regular expressions.

(a) (r1 + r2 + r1r2 + r2r1)*
(b) (r1(r1 + r2)*)+

Solution:
One general strategy to approach this type of question is to try to see whether or not they are equal to simple regular expressions that are familiar to us such as a, a*, a+, (a + b)*, (a + b)+ etc.

(a) Since (r1 + r2)* represents all strings consisting of strings of r1 and/or r2 , r1r2 + r2r1 in the given regular expression is redundant, that is, they do not produce any strings that are not represented by (r1 + r2)*. Thus (r1 + r2 + r1r2 + r2r1)* is reduced to (r1 + r2)*.

(b) (r1(r1 + r2)*)+ means that all the strings represented by it must consist of one or more strings of (r1(r1 + r2)*). However, the strings of (r1(r1 + r2)*) start with a string of r1 followed by any number of strings taken arbitrarily from r1 and/or r2.
Thus anything that comes after the first r1 in (r1(r1 + r2)*)+ is represented by (r1 + r2)*. Hence (r1(r1 + r2)*) also represents the strings of (r1(r1 + r2)*)+, and conversely (r1(r1 + r2)*)+ represents the strings represented by (r1(r1 + r2)*). Hence (r1(r1 + r2)*)+ is reduced to (r1(r1 + r2)*).

In our next Posts, we will see some more good examples of Regular expressions and Regular Languages.

Wednesday, May 30, 2007

Examples of regular expression and regular languages

So far, we have discussed several important formal definitions. Now we will see some good examples of them. We will discuss other important definitions from time to time.

Examples of regular expression and regular languages corresponding to them

•( a + b )^2 corresponds to the language {aa, ab, ba, bb}, that is, the set of strings of length 2 over the alphabet {a, b}.

In general ( a + b )^k corresponds to the set of strings of length k over the alphabet {a, b}. ( a + b )* corresponds to the set of all strings over the alphabet {a, b}.

•a*b* corresponds to the set of strings consisting of zero or more a's followed by zero or more b's.

•a*b+a* corresponds to the set of strings consisting of zero or more a's followed by one or more b's followed by zero or more a's.

•( ab )+ corresponds to the language {ab, abab, ababab, ... }, that is, the set of strings of repeated ab's.


Note:
A regular expression is not unique for a language. That is, a regular language, in general, corresponds to more than one regular expression. For example (a + b)* and ( a*b* )* correspond to the set of all strings over the alphabet {a, b}.


Definition of Equality of Regular Expressions

Regular expressions are equal if and only if they correspond to the same language.

Thus for example (a + b)* = (a*b*)*, because they both represent the language of all strings over the alphabet {a, b}.

In general, it is not easy to see by inspection whether or not two regular expressions are equal.
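
It can, however, be checked empirically for short strings. The small Python sketch below (my own illustration) compares (a + b)* and (a*b*)* on every string over {a, b} up to length 7:

import re
from itertools import product

r1 = re.compile(r'(?:a|b)*\Z')       # (a + b)*
r2 = re.compile(r'(?:a*b*)*\Z')      # (a*b*)*

agree = all(bool(r1.match(s)) == bool(r2.match(s))
            for n in range(8)
            for s in (''.join(t) for t in product('ab', repeat=n)))
print(agree)                         # True: they accept exactly the same strings up to length 7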

Thursday, May 24, 2007

Deterministic

M is deterministic if it satisfies both the following conditions:
• For any q belongs to Q, a belongs to Σ U {Λ} , X belongs to Γ, the set δ(q,a,X) has at most one element.
• For any q belongs to Q, X belongs to Γ, if δ(q, Λ, X) ≠ Ø , then δ(q,a,X) = Ø for every a belongs to Σ

DPDA & NPDA are different only because DPADA is deterministic and it is weaker than NPDA.

Wednesday, May 23, 2007

Deterministic Push down Automata (DPDA)

“Deterministic Push down Automata” (DPDA):

Formal definition:
A PDA M can be defined as a 7-tuple:
M = (Q,Σ,Γ,q0,Z0,A,δ) where
• Q is a finite set of states
• Σ is a finite set, the input alphabet
• Γ is a finite set, the stack alphabet
• q0 is the start state, an element of Q
• Z0 is the initial stack symbol, an element of Γ
• A is the set of final states, a subset of Q
• δ is a finite transition relation: (Q x (Σ U {Λ}) x Γ) ----> the set of finite subsets of (Q x Γ*)

Tuesday, May 22, 2007

Acceptance Criteria for PDA

There are two possible acceptance criteria for PDA:

1. acceptance by empty stack
2. acceptance by final state.

The two are easily shown to be equivalent: a machine accepting by final state can perform a pop loop in its final states to get to an empty stack, and a machine can detect an empty stack and enter a final state by detecting a unique bottom-of-stack symbol pushed at the start.

Some use a 6-tuple, dropping the Ω for the initial stack symbol, instead adding a first transition which writes a start symbol to the stack.

Tomorrow, we will see formal definition of “Deterministic Push down Automata” (DPDA).

Monday, May 21, 2007

formal definition of Push down Automata.

In this post we will concentrate on the formal definition of the pushdown automaton. In its general form it is also called a “Non-deterministic Push down Automaton” (NPDA). But remember that an NPDA is different from a “Deterministic Push down Automaton” (DPDA).
“Non deterministic Push down Automata” (NPDA)
Formal definition:
A NPDA W can be defined as a 7-tuple:
W = (Q,Σ,Φ,σ,s,Ω,F) where
• Q is a finite set of states
• Σ is a finite set, the input alphabet
• Φ is a finite set, the stack alphabet
• σ (or sometimes δ) is a finite transition relation:
(Q x (Σ U {ε}) x Φ) ----> P (Q x Φ*)
• s is an element of Q the start state
• Ω is the initial stack symbol
• F is subset of Q, consisting of the final states.
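
As a concrete illustration (my own sketch, not part of the formal definition), here is a tiny Python recognizer in the spirit of a PDA for the language { a^n b^n : n >= 0 }, using a list as the stack:

def accepts_anbn(s):
    """Accept exactly the strings a^n b^n, n >= 0, using a stack."""
    stack = ['Z0']                     # Z0 plays the role of the initial stack symbol (Ω above)
    reading_bs = False
    for ch in s:
        if ch == 'a' and not reading_bs:
            stack.append('A')          # push one A for every a read
        elif ch == 'b' and stack[-1] == 'A':
            stack.pop()                # pop one A for every b read
            reading_bs = True
        else:
            return False               # symbol out of order, or a b with no matching a
    return stack == ['Z0']             # accept when only the initial symbol is left

print(accepts_anbn('aaabbb'), accepts_anbn('aab'), accepts_anbn(''))   # True False True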

Sunday, May 20, 2007

Context-Free-Language (CFL)

A language L is said to be a Context-Free-Language (CFL) if it is generated by a Context-Free grammar. More precisely, it is a language whose words, sentences and phrases are made of symbols and words from a Context-Free-Grammar. Usually, a CFL is written in the form L = L(G).
Here is one example of a context-free grammar and the language it generates:
Example:
A context-free grammar for the language consisting of all strings over {a,b} which contain a different number of a's than b's is
S → U | V
U → TaU | TaT
V → TbV | TbT
T → aTbT | bTaT | ε
Here, T can generate all strings with the same number of a's as b's, U generates all strings with more a's than b's and V generates all strings with fewer a's than b's.
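
If you want to convince yourself, here is a brute-force Python sketch (my own check, not part of the example) that compares the grammar against the description "different number of a's and b's" on every string up to length 6:

from itertools import product

RULES = {'S': ['U', 'V'],
         'U': ['TaU', 'TaT'],
         'V': ['TbV', 'TbT'],
         'T': ['aTbT', 'bTaT', '']}

def derivable(form, target, seen):
    """Depth-first search over leftmost expansions: can `form` derive `target`?"""
    if form in seen:
        return False
    seen.add(form)
    nts = [i for i, c in enumerate(form) if c in RULES]
    if not nts:
        return form == target
    if len(form) - len(nts) > len(target):       # already more terminals than the target
        return False
    i = nts[0]                                   # expand the leftmost nonterminal
    return any(derivable(form[:i] + rhs + form[i + 1:], target, seen)
               for rhs in RULES[form[i]])

for n in range(7):                               # every string over {a, b} of length 0..6
    for t in product('ab', repeat=n):
        s = ''.join(t)
        assert derivable('S', s, set()) == (s.count('a') != s.count('b')), s
print("the grammar matches its description for all strings up to length 6")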

Saturday, May 19, 2007

Context-Free Grammar (formal definition)

Definition of Context-Free Grammar includes the definition of Context free language. So please read the following definition carefully.

Just as any formal grammar, a context-free grammar G can be defined as a 4-tuple:
G = (Vt,Vn,P,S) where
• Vt is a finite set of terminals
• Vn is a finite set of non-terminals
• P is a finite set of production rules
• S is an element of Vn, the distinguished starting non-terminal.
• Elements of P are rules of the form

A → w, where A is in Vn and w is a string in (Vt U Vn)*

Friday, May 18, 2007

left regular grammar

In a left regular grammar, all rules obey the forms
1. A → a - where A is a non-terminal in N and a is a terminal in Σ
2. A → Ba - where A and B are in N and a is in Σ
3. A → ε - where A is in N and ε is the empty string.
An example of a right regular grammar is G with N = {S, A}, Σ = {a, b, c}, where P consists of the following rules
S → aS
S → bA
A → ε
A → cA
and S is the start symbol. This grammar describes the same language as the regular expression a*bc*.
A regular grammar is a left regular or right regular grammar.

Thursday, May 17, 2007

Formal Definition of Regular Grammar:

Regular Grammar and Regular Expressions are different.

We have discussed Regular expressions. Let’s see Regular Grammar.

Formal Definition of Regular Grammar:

In Computer Science, a right regular grammar is a formal grammar (N, Σ, P, S) such that all the production rules in P are of one of the following forms:
1. A → a - where A is a non-terminal in N and a is a terminal in Σ
2. A → aB - where A and B are in N and a is in Σ
3. A → ε - where A is in N and ε denotes the empty string, i.e. the string of length 0.

Wednesday, May 16, 2007

Formal Definition of Regular expression:

Regular expressions can be described in terms of formal language theory. Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively. Given a finite alphabet Σ, the following constants are defined:

(empty set) ∅ denoting the set ∅
(empty string) ε denoting the set {ε}
(Literal character ) a in Σ denoting the set {a}
and the following operations:
(concatenation) RS denoting the set { αβ | α in R and β in S }. For example {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}.
(alternation) R|S denoting the set union of R and S.
(Kleene Star )R* denoting the smallest superset of R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating zero or more strings in R. For example, {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", ... }.
The above constants and operators form a Kleene Algebra.

Many textbooks use the symbols ∪, +, or ∨ for alternation instead of the vertical bar. To avoid brackets it is assumed that the Kleene star has the highest priority, then concatenation and then set union. If there is no ambiguity then brackets may be omitted. For example, (ab)c is written as abc and a|(b(c*)) can be written as a|bc*.
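
As a small illustration (my own sketch, with hypothetical helper names), the concatenation and Kleene star operations defined above can be written directly as operations on Python sets of strings; the star is shown only as a finite approximation, since R* is usually infinite:

def concat(R, S):
    """RS = { alpha + beta : alpha in R, beta in S }"""
    return {a + b for a in R for b in S}

def star(R, max_len=6):
    """A finite slice of R*: every concatenation of strings from R up to length max_len."""
    result = {''}                                # ε is always in R*
    frontier = {''}
    while frontier:
        frontier = {s + r for s in frontier for r in R if len(s + r) <= max_len} - result
        result |= frontier
    return result

print(concat({'ab', 'c'}, {'d', 'ef'}))          # the set {'abd', 'abef', 'cd', 'cef'}
print(sorted(star({'ab', 'c'}, max_len=4)))      # ['', 'ab', 'abab', 'abc', 'abcc', ...]
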
After this we will discuss formal definition of Regular Grammar.

Tuesday, May 15, 2007

Non Deterministic Finite Automata (NFA)-continued

For every NFA a deterministic finite state machine (DFA) can be found that accepts the same language. Therefore it is possible to convert an existing NFA into a DFA for the purpose of implementing a (perhaps) simpler machine. This can be performed using the power set construction which may lead to an exponential rise in the number of necessary states.
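
Here is a compact sketch of the power set construction in Python (the function and the example NFA are my own, assuming an NFA without epsilon moves for simplicity):

def nfa_to_dfa(nfa_delta, start, accepting, alphabet):
    """nfa_delta maps (state, symbol) to a set of states; returns the equivalent DFA."""
    dfa_start = frozenset([start])
    dfa_delta, todo, seen = {}, [dfa_start], {dfa_start}
    while todo:
        current = todo.pop()
        for sym in alphabet:
            nxt = frozenset(q for s in current for q in nfa_delta.get((s, sym), set()))
            dfa_delta[(current, sym)] = nxt              # each DFA state is a set of NFA states
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    dfa_accepting = {S for S in seen if S & accepting}   # accept if some NFA state accepts
    return dfa_start, dfa_delta, dfa_accepting

# assumed example: an NFA for strings over {0, 1} whose second-to-last symbol is 1
delta = {('p', '0'): {'p'}, ('p', '1'): {'p', 'q'},
         ('q', '0'): {'r'}, ('q', '1'): {'r'}}
start, dfa_delta, accepting = nfa_to_dfa(delta, 'p', {'r'}, '01')
print(len({S for (S, _) in dfa_delta}))                  # 4 reachable DFA states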

The NFA is very important for a deeper knowledge of automata theory. You will see that almost everything in automata theory is related to NFAs.
See you later..

Monday, May 14, 2007

Formal Definition of Non-Deterministic Finite State Machine (NFA)

A nondeterministic finite state automaton (NFA) is a 5-tuple, (S, Σ, T, s0, A), consisting of
• a finite set of states (S)
• a finite set of input symbols (Σ)
• a transition function (T : S × (Σ ∪ {ε}) → P(S))
• an initial (or start) state s0 such that s0 ∈ S
• a set of states A distinguished as accepting (or final) states (A ⊆ S)
where P(S) is the power set of S, ε is the empty string, and Σ is the input symbol alphabet.

Given an NFA M and a string w, there may be any number of computation paths of M on input w.
M accepts w if some computation path ends in a final state.
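
A minimal way to picture this acceptance condition (my own sketch, for an NFA without epsilon moves) is to simulate all computation paths at once by tracking the set of states the NFA could currently be in:

def nfa_accepts(delta, start, accepting, w):
    """delta maps (state, symbol) to a set of states (no epsilon moves)."""
    current = {start}                                    # the set of states reachable so far
    for sym in w:
        current = {q for s in current for q in delta.get((s, sym), set())}
    return bool(current & accepting)                     # some computation path ends in a final state

# assumed example: an NFA for strings over {0, 1} whose second-to-last symbol is 1
delta = {('p', '0'): {'p'}, ('p', '1'): {'p', 'q'},
         ('q', '0'): {'r'}, ('q', '1'): {'r'}}
print(nfa_accepts(delta, 'p', {'r'}, '0110'), nfa_accepts(delta, 'p', {'r'}, '0101'))   # True False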

Sunday, May 13, 2007

Deterministic Finite Automata (DFA)

DFAs are equivalent in computing power to NFAs (nondeterministic finite automata). On the other hand, DFAs are of strictly limited power in the languages they can recognize — many simple languages, including any language whose recognition requires more than constant memory, cannot be recognized by a DFA.
The classical example of a simply described language that no DFA can recognize is the language consisting of strings of the form a^n b^n — some finite number of a's, followed by an equal number of b's. It can be shown that no DFA can have enough states to recognize such a language.
See you later with our next post….

Formal Definition of Deterministic Finite State Machine (DFA):


A DFA is a 5-tuple, (S, Σ, T, s, A), consisting of

• a finite set of states (S)
• a finite set called the alphabet (Σ)
• a transition function (T : S × Σ → S)
• a start state (s ∈ S)
• a set of accept states (A ⊆ S)

Given a DFA M and a string w, there is at most one computation path of M on input w.
M accepts w if that computation path ends in a final state.
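
As a small illustration (my own sketch, using an assumed example DFA for strings with an even number of 1s), the definition translates almost directly into code:

def dfa_accepts(T, s, A, w):
    """T maps (state, symbol) to a state; s is the start state; A is the set of accept states."""
    state = s
    for sym in w:
        state = T[(state, sym)]          # exactly one transition per (state, symbol)
    return state in A

# assumed example: a DFA over {0, 1} accepting strings with an even number of 1s
T = {('even', '0'): 'even', ('even', '1'): 'odd',
     ('odd', '0'): 'odd', ('odd', '1'): 'even'}
print(dfa_accepts(T, 'even', {'even'}, '1010'), dfa_accepts(T, 'even', {'even'}, '111'))   # True False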

Saturday, May 12, 2007

regular language-formal definition continued

A regular language is a formal language (i.e., a possibly infinite set of finite sequences of symbols from a finite alphabet) that satisfies the following equivalent properties:
• it can be accepted by a deterministic finite state machine
• it can be accepted by a nondeterministic finite state machine
• it can be accepted by an alternating finite automaton
• it can be described by a regular expression
• it can be generated by a regular grammar
• it can be accepted by a read-only Turing machine
In the next post, I will discuss the formal definition of the deterministic finite state machine (DFA). The DFA is my favorite topic.
I am sure you will also like it when we discuss it in detail after completing all the formal definitions.

Friday, May 11, 2007

Regular Language-formal definition

As I said in my previous post, we will discuss formal definitions of a single term at a time. Today, we are going to discuss “Regular languages”

Regular languages over an alphabet (Formal Definition):
The collection of regular languages over an alphabet Σ is defined recursively as follows:
• the empty language Ø is a regular language.
• the empty string language { ε } is a regular language.
• For each a ∈ Σ, the singleton language { a } is a regular language.
• If A and B are regular languages, then A ∪ B (union), A B (concatenation), and A* (Kleene star) are regular languages.

Thursday, May 10, 2007

NLIN SPACE AND LIN SPACE

As I said in an earlier post, I am going to continue with context-sensitive languages and grammars.
Computationally, the context-sensitive languages are equivalent to linear-bounded non-deterministic Turing machines, that is, non-deterministic Turing machines with a tape of only kn cells, where n is the size of the input and k is a constant associated with the machine.
This means that every formal language that can be decided by such a machine is a context-sensitive language, and every context-sensitive language can be decided by such a machine.
This set of languages is also known as NLIN-SPACE, because these languages can be accepted using linear space on a non-deterministic Turing machine.
The class LIN-SPACE is defined in the same way, except using a deterministic Turing machine. Clearly LIN-SPACE is a subset of NLIN-SPACE, but it is not known whether LIN-SPACE = NLIN-SPACE. It is widely suspected that they are not equal.

Wednesday, May 9, 2007

context sensitive language continued

Note that the right-hand side of such a rule must be at least as long as the left-hand side (in the example above both sides have the same number of symbols). Rules that are applied to symbols must consider the "context" of the symbol.

For example, in the previous rule, a "B" can be turned into an "X" if (but not necessarily only if) it has a preceding "A" and a following "C". This makes it difficult to modify assumptions since individual symbols cannot be interpreted on their own. Also, the number of rules needed to fully specify the grammar might become very large, since there could be many special cases that need to be considered.

We will continue this topic in the next post..

Monday, May 7, 2007

context sensitive languages

I forgot to discuss context-sensitive grammars and context-sensitive languages. These grammars are not as important as context-free grammars and context-free languages.

Let’s discuss them….

Context-Sensitive Language:

Context-Sensitive Language is a formal language that can be defined by a context-sensitive grammar. That is one of the four types of grammars in the Chomsky hierarchy. Of the four, this is the least often used, in both theory and practice.

Also known as "type-1" grammars, this class restricts rules so that a symbol may be rewritten only by taking into account the symbols immediately before and after it. So, the grammar might include a rule to turn a sequence A B C into A X C (A B C --> A X C).

Sunday, May 6, 2007

Pushdown Automata

Pushdown automata can also manipulate the stack, as part of performing a transition. Normal finite state machines choose a new state, the result of following the transition. The manipulation can be to push a particular symbol to the top of the stack, or to pop off the top of the stack. The automaton can alternatively ignore the stack, and leave it as it is. The choice of manipulation (or no manipulation) is determined by the transition table.

For every context-free grammar, there exists a pushdown automaton such that the language generated by the grammar is identical with the language generated by the automaton. Conversely, for every pushdown automaton there exists a context-free grammar such that the language generated by the automaton is identical with the language generated by the grammar.

Now, we know all important basic definitions. So, what is the next step?

Well, Next step is to concentrate on the formal definitions of all previously discussed terms. In my next posts, we will focus on one term at a time.

So be prepared for the next post….

Saturday, May 5, 2007

Automata Step by Step

In the previous post we discussed “context-free languages” and “context-free grammar”. But without understanding the “pushdown automaton”, you can’t get a deep knowledge of them.

Let’s see definition of “pushdown automaton”

Pushdown automaton:

A pushdown automaton (PDA) is a finite automaton that can make use of a stack containing data.

Pushdown automata choose a transition by indexing a table by input signal, current state, and the top of the stack. Normal finite state machines just look by input signal and current state: they have no stack to work with. Pushdown automata add the stack as a parameter for choice. Given an input signal, current state, and a given symbol at the top of the stack, a transition path is chosen.

Thursday, May 3, 2007

context free language and context free grammar

Today we will discuss “context-free languages” and “context-free grammar”
Context-free language:
A context-free language is a formal language that is a member of the set of languages defined by context-free grammars. The set of context-free languages is identical to the set of languages accepted by pushdown automata.
To understand it better we will need the definition of context-free grammars.
Context-free grammar:
A context-free grammar (CFG) is a formal grammar in which every production rule is of the form
V → w
Where V is a nonterminal symbol and w is a string consisting of terminals and/or non-terminals. The term "context-free" expresses the fact that the non-terminal V can always be replaced by w, regardless of the context in which it occurs. A formal language is context-free if there is a context-free grammar that generates it.
Familiarly known as "CFGs", or "type-2" grammars. Here, there can be only one nonterminal symbol on the left-hand "assumption" side of a rule, while the right-hand "conclusion" side may be any string of terminals and/or nonterminals. Rules can be applied to individual symbols, since there is only one given symbol in the assumption (like the "V" in the previous example). This is one reason CFGs are popular for natural language grammars.
Context-free grammars are powerful enough to describe the syntax of most programming languages; in fact, the syntax of most programming languages is specified using context-free grammars. On the other hand, context-free grammars are simple enough to allow the construction of efficient parsing algorithms which, for a given string, determine whether and how it can be generated from the grammar.

Next post will be based on pushdown automata. So keep in touch….

Wednesday, May 2, 2007

Automata step by step

Deterministic Finite Automaton (DFA):
A deterministic finite state machine or deterministic finite automaton (DFA) is a finite state machine where for each pair of state and input symbol there is one and only one transition to a next state. DFAs recognize the set of regular languages and no other languages.
A DFA will take in a string of input symbols. For each input symbol it will then transition to a state given by following a transition function. When the last input symbol has been received it will either accept or reject the string depending on whether the DFA is in an accepting state or a non-accepting state.
Non-Deterministic Finite Automaton (NFA):
A nondeterministic finite state machine or nondeterministic finite automaton (NFA) is a finite state machine where for each pair of state and input symbol there may be several possible next states. Non-deterministic finite state machines are sometimes studied under the name subshifts of finite type.
So far, we have discussed a lot of definitions, but we still need to know some more. Don't be impatient; once you become familiar with all the definitions and terms, it will be very interesting. Don't forget to read my next post.

Tuesday, May 1, 2007

Automata Step by Step

Now we will discuss regular languages in this post.

Regular language:
A regular language is the set of strings generated by a regular grammar. Regular grammars are also known as Type-3 grammars in the Chomsky hierarchy.
A regular grammar can be represented by a deterministic or non-deterministic finite automaton. Such automata can serve to either generate or accept sentences in a particular regular language. Note that since the set of regular languages is a subset of context-free languages, any deterministic or non-deterministic finite automaton can be simulated by a pushdown automaton.
Now you will be curious to know about the terms “deterministic finite automaton”, “non-deterministic finite automaton”, “pushdown automaton” and “context-free languages”.
In next Post, we will discuss only “deterministic finite automaton” and “nondeterministic finite state machine”.

Monday, April 30, 2007

Automata Step by Step

Now we also need definition of “regular grammar”

Regular Grammars:

Regular grammars describe exactly all regular languages and are in that sense equivalent to finite state automata and regular expressions. Moreover, the right regular grammars by themselves are also equivalent to the regular languages, as are the left regular grammars.

Every regular grammar is a context-free grammar, but not every context-free grammar can be rewritten as a regular one. Note also that grammars which mix left-regular and right-regular rules are not guaranteed to generate regular languages. Regular grammars, which use either left-regular rules only or right-regular rules only, generate exactly the regular languages. In that sense they are equivalent to finite state automata and regular expressions.

What is a regular language? It is important to understand “regular language” for better understanding of above definitions. We will discuss it in our next post..

Sunday, April 29, 2007

Automata Step by Step

Today, we will discuss “Regular expressions”.

Regular expressions:

A regular expression (also "RegEx" or "regex") is a string that is used to describe or match a set of strings, according to certain syntax rules. The specific syntax rules vary depending on the specific implementation, programming language, or library in use. Additionally, the functionality of regex implementations can vary between versions.

In other words, a regular expression, often called a pattern, is an expression that describes a set of strings. Patterns are usually used to give a concise description of a set, without having to list all its elements. For example, the set containing the three strings Handel, Händel, and Haendel can be described by the pattern "H(ä|ae?)ndel" (or alternatively, it is said that the pattern matches each of the three strings). In most formalisms, if there is any regex that matches a particular set then there are infinitely many such expressions.

Saturday, April 28, 2007

Automata Step by Step

Now we will define “Sequence”

Sequence:
A sequence is an ordered list of objects (or events). Like a set, it contains members (also called elements or terms), and the number of terms (possibly infinite) is called the length of the sequence. Unlike a set, order matters, and the exact same elements can appear multiple times at different positions in the sequence.

Now we will define “Language”.

Language(or Formal Language):

A Language is a set of strings over a given input alphabet.
As an example of a formal language, an alphabet might be {a, b} and a string over that alphabet might be “ababba”.

Friday, April 27, 2007

Automata Step by Step

Let’s see the definitions of “String”, “Alphabet”,” Sequence” and “Language”. Remember all above terms are connected. You can’t understand any of them without knowing the definitions of all of them.

String:

In computer programming and formal language theory, (and other branches of mathematics), a string is an ordered sequence of symbols. These symbols are chosen from a predetermined set.

Example: Empty String (contains no symbols)

Length of Empty String is zero.

Now we will define “Alphabet”.

Alphabet:
An alphabet is a finite set of symbols.
Examples:
Binary Alphabet = {0, 1}
Unary Alphabet = {1}
ASCII Alphabet = {…, A, B, C, …, a, b, c, …}
Decimal Alphabet = {0, 1, 2, …, 8, 9}

Thursday, April 26, 2007

Automata Step by Step

Formal Grammar:

A formal grammar, or sometimes simply grammar, is a precise description of a formal language — that is, of a set of strings. The two main categories of formal grammar are that of generative grammars, which are sets of rules for how strings in a language can be generated, and that of analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language. In short, an analytic grammar describes how to recognize when strings are members in the set, whereas a generative grammar describes how to write only those strings in the set.

Now it is necessary to understand the definitions of “String”, “Alphabet”,”Sequence” and “Language”. I will discuss them in the next Post.

Wednesday, April 25, 2007

Automata Step by Step

This post is connected to the previous post, so if you have not read that post, please spend a few minutes reading it first.

Parsing:

Parsing (more formally syntax analysis) is the process of analyzing a sequence of tokens to determine its grammatical structure with respect to a given formal grammar. A parser is the component of a compiler that carries out this task.

Parsing transforms input text into a data structure, usually a tree, which is suitable for later processing and which captures the implied hierarchy of the input. Lexical analysis creates tokens from a sequence of input characters, and it is these tokens that are processed by a parser to build a data structure such as a parse tree or an abstract syntax tree.

Can you guess what the next question is? Yes, you are right. The next question is: what is a formal grammar?

Monday, April 23, 2007

Automata Step by Step

I am writing this blog for the people who are strongly interested in Automata and Formal languages or who want to start from basics of Automata.

First of all, we will discuss some very basic definitions of Automata. In Automata theory, all things are related to each other so you will notice that you need definitions of other words for understanding a particular definition. We will also see formal definitions of all terms after understanding their basic definitions. After basic and formal definitions, we will solve a lot of good “Problems of Automata” and see a lot of good examples.

Basic Definitions:

If you are a beginner, you should know what is Automata and Automata theory.

1. Automaton:
In Simple Words, An automaton (plural: automata) is a self-operating machine. The word is sometimes used to describe a robot, more specifically an autonomous robot. Used colloquially, it refers to a mindless follower.

An automaton is a mathematical model for a finite state machine (FSM). An FSM is a machine that, given an input of symbols, 'jumps' through a series of states according to a transition function (which can be expressed as a table). In the common "Mealy" variety of FSMs, this transition function tells the automaton which state to go to next given a current state and a current symbol.

2. Automata theory:

Automata theory is the study of abstract machines and problems they are able to solve. Automata theory is closely related to formal language theory as the automata are often classified by the class of formal languages they are able to recognize.

Now at this point, it is important to know about Finite state machines.

3. Finite state machine (FSM):

A finite state automaton (plural: automata) is a model of behavior composed of a finite number of states, transitions between those states, and actions.

4. States:

A state is a unique configuration of information in a program or machine. It is a concept that occasionally extends into some forms of systems programming such as lexers and parsers.

We will see the definition of parser in the next post. Above 4 definitions are enough for today. Please don’t forget to review them otherwise you will not be able to understand the next post. Read carefully. Have a good day.