Saturday, June 30, 2007

parse tree

Consider now the following grammar G2:

E ::= T + E | T – E | T
T ::= F * T | F / T | F
F ::= num | id

This is similar to our original grammar G1 (which we considered in our previous posts), but it is right associative when the leftmost derivation rules is used. That is, x-y-z is equivalent to x-(y-z) under G2, as we can see from its parse tree.

Wednesday, June 27, 2007

Parse tree

Let’s see another example of parse tree. As another example, consider the following grammar:
S ::= ( L )| a

L ::= L , S | S

Under this grammar, the parse tree of the sentence (a,((a, a), a)) is:

S
/ | \
( L )
/ | \
L , S
| / | \
S ( L )
| / | \
a L , S
| |
S a
/ | \
( L )
/ | \
L , S
| |
S a
|
a

Friday, June 22, 2007

Parse Tree

The parse tree of an input sequence according to a CFG is the tree of derivations. For example if we used a production A : : = X1X2...Xn (in either top-down or bottom-up parsing) then we construct a tree with node A and children X1X2...Xn.
For example, the parse tree of id(x) + num(2) * id(y) is:
E
/ | \
E + T
| / | \
T T * F
| | |
F F id
| |
id num

Wednesday, June 20, 2007

Bottom up parsing

Today we will discuss bottom up parsing and parse tree
Bottom up Parsing:
Consider the following grammar G1 again:
E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= num | id
In contrast to top-down parsing, bottom-up parsing starts from the input string and uses derivations in the opposite directions (ie, by replacing the RHS sequence X1X2...Xn of a production A : : = X1X2...Xn with the non-terminal A. It stops when it derives the start symbol. For example,
id(x) + num(2) * id(y)
=> id(x) + num(2) * F
=> id(x) + F * F
=> id(x) + T * F
=> id(x) + T
=> F + T
=> T + T
=> E + T
=> E

Friday, June 15, 2007

Top Down Parsing

There are 2 types of parsing:

1.Top Down Parsing
2.Bottom Up Parsing

Consider the following grammar G1 again:

E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= num | id

Top-down parsing:

Top-down parsing starts from the start symbol of the grammar S and applies derivations until the entire input string is derived (ie, a sequence of terminals that matches the input tokens). For example,

E => E + T
=> E + T * F
=> T + T * F
=> T + F * F
=> T + num * F
=> F + num * F
=> id + num * F
=> id + num * id

which matches the input sequence id(x) + num(2) * id(y). Top down parsing occasionally requires backtracking. For example, suppose the we used the derivation E => E - T instead of the first derivation. Then, later we would have to backtrack because the derived symbols will not match the input tokens. This is true for all nonterminals that have more than one production since it indicates that there is a choice of which production to use. We will learn how to construct parsers for many types of CFGs that never backtrack. These parsers are based on a method called predictive parsing. One issue to consider is which nonterminal to replace when there is a choice. For example, in T + F * F we have 3 choices: we can use a derivation for T, for the first F, or for the second F. When we always replace the leftmost nonterminal, it is called leftmost derivation.

In next Post, I will explain Bottom up parsing.

Tuesday, June 12, 2007

Examples Of Context Free Languages (CFL)

So far, we have seen very good examples of Regular expressions and Regular Languages. We will see more difficult problems in future. But at this time, we are going to start examples of context free languages (CFL). There are a lot of problems based on CFL. So we will discuss a lot of problems of CFL.

Example 1
Consider the following input string:
x+2*y

When scanned by a scanner, it produces the following stream of tokens:

id(x)
+
num(2)
*
id(y)

The goal is to parse this expression and construct a data structure (a parse tree) to represent it. One possible syntax for expressions is given by the following grammar G1:

E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= num | id

where E, T and F stand for expression, term, and factor respectively. For example, the rule for E indicates that an expression E can take one of the following 3 forms: an expression followed by the token + followed by a term, or an expression followed by the token - followed by a term, or simply a term. The first rule for E is actually a shorthand of 3 productions:

E ::= E + T
E ::= E - T
E ::= T
G1 is an example of a context-free grammar (defined below); the symbols E, T and F are nonterminals and should be defined using production rules, while +, -, *, /, num, and id are terminals (i.e., tokens) produced by the scanner. The nonterminal E is the start symbol of the grammar.

In my next Posts, I will explain types of parsing because knowledge of them is necessary for writing context free grammars.

Sunday, June 10, 2007

Example of Regular Expression

Ex. 6:
Find a regular expression corresponding to the language of all strings over the alphabet { a, b } that contain no more than one occurence of the string aa.

Solution:
If there is one substring aa in a string of the language, then that aa can be followed by any number of b. If an a comes after that aa, then that a must be preceded by b because otherwise there are two occurences of aa. Hence any string that follows aa is represented by ( b + ba )*. On the other hand if an a precedes the aa, then it must be followed by b. Hence a string preceding the aa can be represented by ( b + ab )*. Hence if a string of the language contains aa then it corresponds to the regular expression ( b + ab )*aa( b + ba )* .

If there is no aa but at least one a exists in a string of the language, then applying the same argument as for aa to a, ( b + ab )*a( b + ba )* is obtained as a regular expression corresponding to such strings.

If there may not be any a in a string of the language, then applying the same argument as for aa to , ( b + ab )*( b + ba )* is obtained as a regular expression corresponding to such strings.
Altogether ( b + ab )*( + a + aa )( b + ba )* is a regular expression for the language.

Monday, June 4, 2007

More examples

Ex. 5:
Describe as simply as possible in English the language corresponding to the regular expression (( a + b )3)*( #+ a + b ) .

Solution:
(( a + b )3) represents the strings of length 3. Hence (( a + b )3)* represents the strings of length a multiple of 3. Since (( a + b )3)*( a + b ) represents the strings of length 3n + 1, where n is a natural number, the given regular expression represents the strings of length 3n and 3n + 1, where n is a natural number.

Sunday, June 3, 2007

More examples

Ex. 4:
Describe as simply as possible in English the language corresponding to the regular expression a*b(a*ba*b)*a* .

Solution:
A string in the language can start and end with a or b, it has at least one b, and after the first b all the b's in the string appear in pairs. Any numbe of a's can appear any place in the string. Thus simply put, it is the set of strings over the alphabet { a, b } that contain an odd number of b's.
We will continue examples in next post..

Saturday, June 2, 2007

Examples of Regular Grammars and Regular Expressions

Ex. 2:
Find a regular expression corresponding to the language of all strings over the alphabet { a, b } that contain exactly two a's.

Solution:
A string in this language must have at least two a's. Since any string of b's can be placed in front of the first a, behind the second a and between the two a's, and since an arbitrasry string of b's can be represented by the regular expression b*, b*a b*a b* is a regular expression for this language.


Ex. 3:
Find a regular expression corresponding to the language of all strings over the alphabet { a, b } that do not end with ab.

Solution:
Any string in a language over { a , b } must end in a or b. Hence if a string does not end with ab then it ends with a or if it ends with b the last b must be preceded by a symbol b. Since it can have any string in front of the last a or bb, ( a + b )*( a + bb ) is a regular expression for the language