Automata theory - deriving strings from a context-free grammar: derivations and parse trees
Normal forms - Chomsky normal form and Greibach normal form
The document discusses compilers and their role in translating high-level programming languages into machine-readable code. It notes that compilers perform several key functions: lexical analysis, syntax analysis, generation of an intermediate representation, optimization of the intermediate code, and finally generation of assembly or machine code. The compiler allows programmers to write code in a high-level language that is easier for humans while still producing efficient low-level code that computers can execute.
This document provides an overview of pushdown automata (PDA). It defines a PDA as a finite automaton with an additional memory stack. This stack allows two operations - push, which adds a new symbol to the top of the stack, and pop, which removes and reads the top symbol. The document then discusses the formal definition of a PDA as a septuple and provides an example of a PDA that accepts the language of strings with an equal number of 0s and 1s. It concludes with an explanation of the state operations of replace, push, pop and no change and conditions for PDA acceptance and rejection.
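The equal-0s-and-1s PDA described above can be sketched as a small Python simulation. The matching discipline below (push a symbol when the stack top agrees with the input, pop when it differs) is one standard construction and an assumption here, not necessarily the exact machine in the slides:

```python
def pda_accepts(s):
    # Sketch of a PDA for L = { w in {0,1}* : #0(w) == #1(w) }.
    # The stack holds a run of whichever symbol is currently in surplus.
    stack = []
    for ch in s:
        if stack and stack[-1] != ch:
            stack.pop()           # opposite symbol on top: cancel it (pop)
        else:
            stack.append(ch)      # same symbol or empty stack: push
    return not stack              # accept by empty stack
```

Acceptance by empty stack and acceptance by final state are interchangeable for PDAs, which is why the empty-stack test suffices here.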
The document discusses the Post Correspondence Problem (PCP) and shows that it is undecidable. It defines PCP as determining if there is a sequence of string pairs from two lists A and B that match up. It then defines the Modified PCP (MPCP) which requires the first pair to match. It shows how to reduce the Universal Language Problem to MPCP by mapping a Turing Machine and input to lists A and B, and then how to reduce MPCP to PCP. Finally, it discusses Rice's Theorem and how properties of recursively enumerable languages are undecidable.
These lecture notes cover the design and analysis of algorithms over four modules. Module I introduces algorithms, their characteristics, expectations and analysis. It discusses asymptotic analysis using big O, Ω and Θ notations to analyze the growth of algorithms like insertion sort, which has a worst-case running time of Θ(n²). Subsequent modules cover dynamic programming, greedy algorithms, graphs, and NP-completeness. The notes provide an overview of key algorithm design and analysis topics.
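The insertion sort analysis mentioned above can be illustrated with a standard implementation (a sketch, not the notes' own code). On a reverse-sorted input the inner while loop runs i times at step i, giving the Θ(n²) worst case:

```python
def insertion_sort(a):
    # Worst case Θ(n^2): a reverse-sorted input forces the inner loop
    # to shift every earlier element on each iteration.
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]   # shift larger elements one slot right
            j -= 1
        a[j + 1] = key        # insert key into its sorted position
    return a
```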
Decision properties of regular languages - SOMNATHMORE2
This document discusses decision properties of regular languages. It defines regular languages as those that can be described by regular expressions and accepted by finite automata. It explains that decision properties are algorithms that take a formal language description and determine properties like emptiness, finiteness, membership in the language, and equivalence to another language. The key decision properties - emptiness, finiteness, membership, and equivalence - are then defined along with the algorithms to determine each. Examples are provided to illustrate the algorithms. Applications of decision properties in areas like data validation and parsing are also mentioned.
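The emptiness test described above reduces to reachability: a DFA's language is nonempty iff some accepting state is reachable from the start state. A sketch, assuming a dict-based transition table (the encoding is an assumption):

```python
def dfa_language_nonempty(delta, start, accepting):
    # Emptiness decision for a DFA: breadth-first search from the start
    # state; L(M) is nonempty iff an accepting state is reachable.
    frontier, seen = [start], {start}
    while frontier:
        q = frontier.pop()
        if q in accepting:
            return True
        for (state, _), nxt in delta.items():
            if state == q and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```

The finiteness decision is similar in spirit: it checks for a cycle on a path from the start state to an accepting state.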
The document discusses Turing machines. It begins by introducing Alan Turing as the father of the Turing machine model. A Turing machine is a general model of a CPU that can manipulate data through a finite set of states and symbols. It consists of a tape divided into cells that can be read from and written to by a tape head. The tape head moves left and right across the cells. The document then provides examples of constructing Turing machines to accept specific languages, such as the language "aba" and checking for palindromes of even length strings. Transition tables are used to represent the state transitions of the Turing machines.
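The transition-table style of construction mentioned above can be sketched with a tiny single-tape simulator. The table below encodes the "aba" example from the summary; the state names, blank symbol, and move convention (+1 right, -1 left) are assumptions, and the simulator presumes the machine halts on its inputs:

```python
def run_tm(tape, delta, start, accept, blank="_"):
    # Minimal single-tape Turing machine simulator.
    # delta[(state, symbol)] = (new_state, written_symbol, move)
    tape = dict(enumerate(tape))          # sparse tape, blank elsewhere
    state, head = start, 0
    while state != accept:
        sym = tape.get(head, blank)
        if (state, sym) not in delta:
            return False                  # no move defined: reject
        state, tape[head], move = delta[(state, sym)]
        head += move
    return True

# Transition table for the language {"aba"}: scan a, b, a, then expect blank.
delta = {("q0", "a"): ("q1", "a", 1),
         ("q1", "b"): ("q2", "b", 1),
         ("q2", "a"): ("q3", "a", 1),
         ("q3", "_"): ("qa", "_", 1)}
```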
The document describes pushdown automata (PDA). A PDA has a tape, stack, finite control, and transition function. It accepts or rejects strings by reading symbols on the tape, pushing/popping symbols on the stack, and changing state according to the transition function. The transition function defines the possible moves of the PDA based on the current state, tape symbol, and stack symbol. If the PDA halts in a final state with an empty stack, the string is accepted. PDAs can recognize any context-free language. Examples are given of PDAs for specific languages.
Context free grammars (CFGs) are formal systems that describe the structure of languages. A CFG consists of variables, terminals, production rules, and a start variable. Production rules take the form of a single variable producing a string of terminals and/or variables. CFGs can capture the recursive structure of natural languages while ignoring agreement and reference. They are used to define context-free languages and generate parse trees. Ambiguous grammars have sentences with multiple parse trees, and disambiguation aims to impose an ordering on derivations. While ambiguity cannot always be eliminated, simplifying and restricting grammars has theoretical and practical benefits.
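Once a grammar is in Chomsky normal form, membership in the language can be decided with the CYK dynamic-programming algorithm. A sketch, where the aⁿbⁿ grammar used below is an illustrative assumption, not one from the document:

```python
def cyk(word, grammar, start="S"):
    # CYK membership test for a grammar in Chomsky normal form.
    # grammar maps a variable to its productions: either a terminal
    # string or a pair (X, Y) of variables.
    n = len(word)
    if n == 0:
        return False
    # table[j][i]: variables deriving the substring word[i : i+j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for var, prods in grammar.items():
            if ch in prods:
                table[0][i].add(var)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for k in range(1, span):
                left, right = table[k - 1][i], table[span - k - 1][i + k]
                for var, prods in grammar.items():
                    for p in prods:
                        if isinstance(p, tuple) and p[0] in left and p[1] in right:
                            table[span - 1][i].add(var)
    return start in table[n - 1][0]

# CNF grammar for { a^n b^n : n >= 1 } (illustrative assumption):
# S -> AB | AC, C -> SB, A -> a, B -> b
anbn = {"S": [("A", "B"), ("A", "C")], "C": [("S", "B")],
        "A": ["a"], "B": ["b"]}
```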
This document discusses compiler design and how compilers work. It begins with prerequisites and definitions of compilers and their origins. It then describes the architecture of compilers, including lexical analysis, parsing, semantic analysis, code optimization, and code generation. It explains how compilers translate high-level code into machine-executable code. In conclusions, it summarizes that compilers translate code without changing meaning and aim to make code efficient. References for further reading on compiler design principles are also provided.
This document provides an introduction to finite automata. It defines key concepts like alphabets, strings, languages, and finite state machines. It also describes the different types of automata, specifically deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs). DFAs have a single transition between states for each input, while NFAs can have multiple transitions. NFAs are generally easier to construct than DFAs. The next class will focus on deterministic finite automata in more detail.
An instruction format specifies an operation code and operands. There are three main types of instruction formats: three address instructions specify memory addresses for two operands and one destination; two address instructions specify two memory locations or registers with the destination assumed to be the first operand; and one address instructions use a single accumulator register for all data manipulation. Addressing modes further specify how the address field of an instruction is interpreted to determine the effective address of an operand. Common addressing modes include immediate, register, register indirect, auto-increment/decrement, direct, indirect, relative, indexed, and base register addressing.
Deadlock detection and recovery - saad symbian
1) Deadlock occurs when there is a cycle of processes where each process is waiting for a resource held by the next process in the cycle.
2) Solutions to deadlock include prevention, avoidance, detection and recovery. Prevention ensures deadlock is impossible through restrictions. Avoidance uses scheduling to steer around deadlock. Detection checks for cycles periodically and recovery kills processes or rolls them back to release resources.
3) Deadlock recovery options include killing all deadlocked processes, killing one at a time to release resources, or rolling processes back to a prior safe state instead of killing them. The process to kill or roll back is chosen based on factors like priority, resources used, and amount of work.
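The cycle condition from point 1 can be sketched as a check over a wait-for graph. This assumes the simplest single-resource case, where each process waits on at most one other process, so cycle detection reduces to pointer chasing:

```python
def has_deadlock(wait_for):
    # wait_for: dict mapping each process to the process it waits on
    # (or None if it is not blocked). A deadlock is a cycle in this graph.
    for start in wait_for:
        seen = set()
        p = start
        while p is not None:
            if p in seen:
                return True       # revisited a process: cycle found
            seen.add(p)
            p = wait_for.get(p)
    return False
```

With multiple resource instances a detector instead runs a Banker's-style reduction over allocation and request matrices, but the cycle test above captures the core idea.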
This document discusses syntax-directed translation and type checking in programming languages. It explains that in syntax-directed translation, attributes are attached to grammar symbols and semantic rules compute attribute values. There are two ways to represent semantic rules: syntax-directed definitions and translation schemes. The document also discusses synthesized and inherited attributes, dependency graphs, and the purpose and components of type checking, including type synthesis, inference, conversions, and static versus dynamic checking.
The document provides information about a course on the theory of automata. It includes details such as the course title, prerequisites, duration, lectures, laboratories, and topics to be covered. The topics include finite automata, deterministic finite automata, non-deterministic finite automata, regular expressions, properties of regular languages, context-free grammars, pushdown automata, and Turing machines. It also lists reference books and textbooks, and the marking scheme for the course.
This document provides an overview of the Turing machine. It describes the Turing machine as an abstract computational model invented by Alan Turing in 1936. A Turing machine consists of an infinite tape divided into cells, a tape head that reads and writes symbols on the tape, and a state table that governs the machine's behavior. The document then explains the formal definition of a Turing machine, provides an example of how it works, discusses properties like decidability and recognizability, and covers modifications like multi-tape and non-deterministic Turing machines. It concludes by discussing the halting problem and explaining how Turing machines demonstrate the power and applications of computational theory.
This document describes a course on Theory of Computation. It provides information on the course objectives, which are to understand language hierarchies, construct automata for patterns, design context-free grammars, understand Turing machines and their capabilities, and understand undecidable and NP problems. It outlines 5 units that will be covered: automata fundamentals, regular expressions and languages, context-free grammar and languages, properties of context-free languages, and undecidability. It also provides the course outcomes and lists reference textbooks. The document then begins describing some key concepts from Unit 1, including formal proofs, additional proof forms, inductive proofs, and an introduction to finite automata.
Run-Time Environments: Storage organization, Stack Allocation of Space, Access to Nonlocal Data on the Stack, Heap Management, Introduction to Garbage Collection, Introduction to Trace-Based Collection. Code Generation: Issues in the Design of a Code Generator, The Target Language, Addresses in the Target Code, Basic Blocks and Flow Graphs, Optimization of Basic Blocks, A Simple Code Generator, Peephole Optimization, Register Allocation and Assignment, Dynamic Programming Code-Generation
The document discusses code generation in compilers. It describes the main tasks of the code generator as instruction selection, register allocation and assignment, and instruction ordering. It then discusses various issues in designing a code generator such as the input and output formats, memory management, different instruction selection and register allocation approaches, and choice of evaluation order. The target machine used is a hypothetical machine with general purpose registers, different addressing modes, and fixed instruction costs. Examples of instruction selection and utilization of addressing modes are provided.
Principal Sources of Optimization in compiler design - LogsAk
This document discusses code optimization techniques used by compilers. It covers the following key points:
Principal sources of optimization include common subexpression elimination, constant folding and propagation, code motion, dead code elimination, and strength reduction. Data flow analysis is used by optimization techniques to gather information about how data flows through a program. The document also describes local and global optimization, peephole optimization, basic blocks, and efficient data flow algorithms used in compiler design.
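Constant folding and propagation, two of the principal sources of optimization listed above, can be sketched as a single pass over three-address code. The tuple encoding of instructions is an assumption for illustration:

```python
def fold(code):
    # code: list of (dest, op, a, b) three-address instructions.
    # Fold any operation whose operands are both integer literals
    # (constant folding) and remember the result so it substitutes
    # into later uses (constant propagation).
    consts, out = {}, []
    ops = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}
    for dest, op, a, b in code:
        a = consts.get(a, a)              # propagate known constants
        b = consts.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            consts[dest] = ops[op](a, b)  # fold: no instruction emitted
        else:
            out.append((dest, op, a, b))
    return out, consts
```

For example, t1 = 2 + 3 and t2 = t1 * 4 fold away entirely, leaving only the instruction that depends on the runtime variable x.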
The document discusses syntax analysis and parsing. It defines context-free grammars and different types of grammars. It also discusses derivation, parse trees, ambiguity in grammars and different parsing techniques like top-down and bottom-up parsing.
Context-free languages can be described using context-free grammars, which are recursive rules that generate strings in a language. An example grammar is presented that generates strings of 1s and 0s separated by # symbols. Context-free grammars consist of variables, terminals, rules that replace a variable on the left-hand side with a string of terminals and variables on the right-hand side, and a starting variable. Context-free languages can be recognized by pushdown automata using an extra stack. Regular languages are a subset of context-free languages. Context-free languages have closure properties including union, concatenation, and homomorphism. Derivation trees can represent grammar derivations, and Backus-Naur form is a notation for compactly representing context-free grammars.
This document discusses automata theory and focuses on grammars, languages, and finite state machines. It defines key terminology like alphabets, strings, languages, and regular expressions. It explains Chomsky's hierarchy of formal languages from type-3 regular languages to type-0 recursively enumerable languages. The document also discusses finite state automata (FSA), deterministic finite automata (DFA), non-deterministic finite automata (NFA), context-free grammars, pushdown automata, and Turing machines. Examples of grammars, languages, and finite state machines are provided to illustrate these concepts.
Syntax analysis involves converting a stream of tokens into a parse tree using grammar production rules. It recognizes the structure of a program using grammar rules. There are three main types of parsers - top-down, bottom-up, and universal. Top-down parsers build parse trees from the top-down while bottom-up parsers work from the leaves up. Bottom-up shift-reduce parsing uses a stack and input buffer, making shift and reduce decisions based on parser states to replace substrings matching productions. The largest class of grammars bottom-up parsers can handle is LR grammars.
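The shift-reduce loop described above can be sketched naively: reduce whenever the top of the stack matches a production's right-hand side, otherwise shift the next token. A real LR parser consults a parse table to resolve shift/reduce conflicts; this greedy version is only a sketch and works for the simple grammar assumed below:

```python
def shift_reduce(tokens, rules, start):
    # Naive shift-reduce recognizer. rules: list of (lhs, rhs_tuple),
    # tried in order. Assumes the grammar has no unit-production cycles.
    stack, tokens = [], list(tokens)
    while True:
        reduced = False
        for lhs, rhs in rules:
            if tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]   # reduce: replace rhs by lhs
                reduced = True
                break
        if reduced:
            continue
        if tokens:
            stack.append(tokens.pop(0))     # shift the next input token
        else:
            return stack == [start]         # accept iff only start remains

# Toy grammar (an assumption): E -> E + T | T, T -> id
rules = [("E", ("E", "+", "T")), ("T", ("id",)), ("E", ("T",))]
```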
Theory of automata and formal language lab manual - Nitesh Dubey
The document describes several experiments related to compiler design including lexical analysis, parsing, and code generation.
Experiment 1 involves writing a program to identify if a given string is an identifier or not using a DFA. Experiment 2 simulates a DFA to check if a string is accepted by the given automaton. Experiment 3 checks if a string belongs to a given grammar using a top-down parsing approach. Experiment 4 implements recursive descent parsing to parse expressions based on a grammar. Experiment 5 computes FIRST and FOLLOW sets and builds a LL(1) parsing table for a given grammar. Experiment 6 implements shift-reduce parsing to parse strings. Experiment 7 generates intermediate code like Polish notation, 3-address code, and quadruples
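Experiment 1's identifier check can be sketched as a two-state DFA: state 0 before any input, state 1 after a valid letter-or-underscore start, looping on letters, digits, and underscores. This is a sketch of the idea, not the lab manual's own program:

```python
def is_identifier(s):
    # DFA: state 0 = start, state 1 = accepting (valid identifier so far).
    state = 0
    for ch in s:
        if state == 0 and (ch.isalpha() or ch == "_"):
            state = 1                 # first char: letter or underscore
        elif state == 1 and (ch.isalnum() or ch == "_"):
            state = 1                 # later chars: also digits allowed
        else:
            return False              # dead state: reject immediately
    return state == 1                 # empty string never reaches state 1
```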
This document introduces the key concepts in the theory of computation, including automata, formal languages, and grammars. It defines automata as abstract models that accept input, process it, and produce output. Formal languages are sets of strings formed from symbols according to rules, and grammars are sets of rules for generating the strings in a language. The document also reviews mathematical concepts needed to study computation and provides examples of operations on strings and languages.
The document discusses different types of parsing techniques:
- Parsing is the process of analyzing a string of tokens based on the rules of a formal grammar. It involves constructing a parse tree that represents the syntactic structure of the string based on the grammar.
- The main types of parsing are top-down parsing and bottom-up parsing. Top-down parsing constructs the parse tree from the root node down, while bottom-up parsing constructs it from the leaf nodes up.
- Predictive and recursive descent parsing are forms of top-down parsing, while shift-reduce parsing is a common bottom-up technique. Each method has advantages and limitations regarding efficiency and the type of grammar they can handle.
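Recursive descent, named above as a top-down method, maps each nonterminal to a function. A sketch for the toy grammar E -> T {+ T}, T -> F {* F}, F -> number | ( E ), which is an assumed example rather than one from the document:

```python
import re

def parse_expr(src):
    # Recursive-descent parser/evaluator: one function per nonterminal.
    toks = re.findall(r"\d+|[+*()]", src)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def expr():                      # E -> T { + T }
        nonlocal pos
        v = term()
        while peek() == "+":
            pos += 1
            v += term()
        return v

    def term():                      # T -> F { * F }
        nonlocal pos
        v = factor()
        while peek() == "*":
            pos += 1
            v *= factor()
        return v

    def factor():                    # F -> number | ( E )
        nonlocal pos
        t = toks[pos]
        pos += 1
        if t == "(":
            v = expr()
            pos += 1                 # consume the closing ")"
            return v
        return int(t)

    return expr()
```

Because the grammar layers term inside expr, operator precedence (* over +) falls out of the structure with no backtracking needed.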
- Lexical analyzer reads source program character by character to produce tokens. It returns tokens to the parser one by one as requested.
- A token represents a set of strings defined by a pattern and has a type and attribute to uniquely identify a lexeme. Regular expressions are used to specify patterns for tokens.
- A finite automaton can be used as a lexical analyzer to recognize tokens. Non-deterministic finite automata (NFA) and deterministic finite automata (DFA) are commonly used, with DFA being more efficient for implementation. Regular expressions for tokens are first converted to NFA and then to DFA.
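The pattern-per-token scheme above can be sketched with Python's re module, whose compiled patterns play the role of the DFA. The toy token set is an assumption for illustration:

```python
import re

# Token patterns for a toy language (an assumption): tried left to right,
# mirroring how a DFA-based lexer classifies each lexeme.
TOKEN_SPEC = [("NUMBER", r"\d+"),
              ("ID",     r"[A-Za-z_]\w*"),
              ("OP",     r"[+\-*/=]"),
              ("SKIP",   r"\s+")]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    # Return (token_type, lexeme) pairs, discarding whitespace.
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(src)
            if m.lastgroup != "SKIP"]
```

Lexer generators such as lex/flex automate exactly this step: they compile the per-token regular expressions into one combined DFA.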
1. Regular expressions are used to compactly represent sets of strings and are the basis for specifying patterns in lexical analysis.
2. Finite state automata are constructed to recognize strings belonging to the language defined by a regular expression. For every regular expression there is a corresponding finite state automaton.
3. The input string is checked for membership in the regular language by tracing a path through the automaton corresponding to the regular expression. If a complete path is found that reads the input string, it is accepted.
New compiler design 101 April 13 2024.pdfeliasabdi2024
This document provides an overview of syntax analysis, also known as parsing. It discusses the functions and responsibilities of a parser, context-free grammars, concepts and terminology related to grammars, writing and designing grammars, resolving grammar problems, top-down and bottom-up parsing approaches, typical parser errors and recovery strategies. The document also reviews lexical analysis and context-free grammars as they relate to parsing during compilation.
The document discusses different types of parsing including:
1) Top-down parsing which starts at the root node and builds the parse tree recursively, requiring backtracking for ambiguous grammars.
2) Bottom-up parsing which starts at the leaf nodes and applies grammar rules in reverse to reach the start symbol using shift-reduce parsing.
3) LL(1) and LR parsing which are predictive parsing techniques using parsing tables constructed from FIRST and FOLLOW sets to avoid backtracking.
The document discusses lexical analysis, which is the first stage of syntax analysis for programming languages. It covers terminology, using finite automata and regular expressions to describe tokens, and how lexical analyzers work. Lexical analyzers extract lexemes from source code and return tokens to the parser. They are often implemented using finite state machines generated from regular grammar descriptions of the lexical patterns in a language.
The document describes a syntax analyzer (also known as a parser) which checks if a given source program satisfies the rules of a context-free grammar. The parser creates a parse tree representing the syntactic structure of the program if it satisfies the grammar. Context-free grammars use productions rules and define the syntax of a programming language. Parsers can be top-down or bottom-up and work on a stream of tokens from a lexical analyzer. Ambiguous grammars require disambiguation to ensure a unique parse tree for each program.
A parser breaks down input into smaller elements for translation into another language. It takes a sequence of tokens as input and builds a parse tree or abstract syntax tree. In the compiler model, the parser verifies that the token string can be generated by the grammar and returns any syntax errors. There are two main types of parsers: top-down parsers start at the root and fill in the tree, while bottom-up parsers start at the leaves and work upwards. Syntax directed definitions associate attributes with grammar symbols and specify attribute values with semantic rules for each production.
This document discusses context free grammars (CFG). It defines the key components of a CFG including terminals, non-terminals, and productions. Terminals are symbols that cannot be replaced, non-terminals must be replaced, and productions are the grammatical rules. A CFG consists of an alphabet of terminals, non-terminals (including a start symbol S), and a finite set of productions that replace non-terminals with strings of terminals and/or non-terminals. Several examples are provided to illustrate how CFGs can define different context free languages.
The document discusses syntax analysis and parsing. It covers context-free grammars, Backus-Naur Form (BNF), Extended BNF, and different parsing techniques like recursive descent parsing and LL parsing. It also discusses Scala's combinator parser, which uses parser combinators to parse input based on a grammar.
Similar to Automata theory - CFG and normal forms (20)
2. Unit 2: Context Free Grammar
• Types, Derivations, Ambiguity
• Simplification of CFG
  ◦ Elimination of useless symbols
  ◦ Elimination of unit productions
  ◦ Elimination of null productions
• Normal Forms: CNF and GNF
4. Grammars: Introduction
• Grammars denote syntactic rules for sentences in natural languages.
• Noam Chomsky gave a mathematical model of grammar in 1956.
• A grammar is a set of production rules which are used to generate the strings of a language.
• A grammar can be represented as a 4-tuple (N, T, P, S)
• Where,
• N: set of non-terminals (variables)
• T: set of terminals (the alphabet ∑)
• S: a special non-terminal called the start symbol of the grammar (S ∈ N)
• P: production rules (of the form α → β, where α and β are strings over N ∪ ∑)
5. Two basic elements of a Grammar
1. Terminal symbols
2. Non-terminal symbols
Terminal Symbols-
• Terminal symbols are denoted by lowercase letters such as a, b, c, etc.
• Terminal symbols are the constituents of the sentences generated by a grammar.
Non-Terminal Symbols-
• Non-terminal symbols are denoted by capital letters such as A, B, C, etc.
• Non-terminal symbols take part in the generation of a sentence but are not part of it.
• Non-terminal symbols are also called variables.
6. Example
• Example: Grammar G1
S → AB
A → a
B → b
• G1 = (N, T, P, S)
Where,
• S, A, and B are non-terminal symbols
• a and b are terminal symbols
• S is the start symbol, S ∈ N
• P1, P2, P3 are the production rules:
P1: S → AB
P2: A → a
P3: B → b
G1 = ({S, A, B}, {a, b}, {P1, P2, P3}, S)
9. Chomsky Hierarchy
• According to Noam Chomsky, there are four types of grammars:
Type 0, Type 1, Type 2, and Type 3.
• Type 0 is known as unrestricted grammar.
• Type 1 is known as context sensitive grammar.
• Type 2 is known as context free grammar.
• Type 3 is known as regular grammar.
10. Type 0: Unrestricted Grammar
• Type-0 grammars include all formal grammars.
• Type 0 grammar languages are recognized by the Turing Machine.
• These languages are also known as the Recursively Enumerable languages.
• Grammar productions are of the form α → β
• where
  α is ( V + T )* V ( V + T )*     V: Variables/NT, T: Terminals
  β is ( V + T )*.
• In type 0 there must be at least one variable on the left side of a production.
Example 1:
Sab → ba
A → S
Here, the variables are S, A and the terminals are a, b.
Example 2:
S → ACaB
Bc → acB
CB → DB
aD → Db
11. Type 1: Context Sensitive Grammar
• Type-1 grammars generate the context-sensitive languages.
• The languages generated by these grammars are recognized by the Linear Bounded Automaton (LBA).
Rules:
1. First of all, a Type 1 grammar should be Type 0.
2. Grammar productions are of the form α → β
Where,
  α, β is ( V + T )+
  | α | <= | β |
  i.e. the count of symbols in α is less than or equal to that in β.
Example 1:
S → AB
AB → abc
B → b
Example 2:
AB → AbBc
A → bcA
B → b
12. Type 2: Context Free Grammar
• Type-2 grammars generate the context-free languages.
• The language generated by the grammar is recognized by a Pushdown Automaton (PDA).
Rules:
1. First of all, it should be Type 1 (except that ε-productions such as A → ε are permitted in a CFG).
2. The left hand side of a production can have only one variable.
3. Grammar productions are of the form α → β
Where,
  α is a single NT
  β is ( V + T )*.
Example
S → AB
A → a | ε
B → b
13. Type 3: Regular Grammar
• Type-3 grammars generate the regular languages.
• These languages can be accepted by a finite state automaton (FA).
• Type 3 is the most restricted form of grammar.
• The productions must be in the form
  X → Aa | a    (NT → NT T | T, left-linear)
  X → aA | a    (NT → T NT | T, right-linear)
where,
  X, A are non-terminals
  a ∈ ∑
Examples
S → aS | b
S → aS | c
S → Sa | b
A → ba | ε
16. Context Free Grammars and Languages
• A context free grammar (CFG) is a formal grammar which is used to generate all possible strings in a given formal language.
• A context free grammar G can be defined by a 4-tuple:
  (N, T, P, S)
• Where,
• N: set of non-terminals (variables)
• T: set of terminals (the alphabet ∑)
• S: a special non-terminal called the start symbol of the grammar (S ∈ N)
• P: production rules (of the form α → β, where α and β are strings over N ∪ ∑)
• In a CFG, the start symbol is used to derive the string.
• We can derive a string by repeatedly replacing a non-terminal by the right hand side of one of its productions, until all non-terminals have been replaced by terminal symbols.
17. Examples
Example 1:
Construct the CFG for the language having any number of a's over the set ∑ = {a}.
R.E. = a*
Grammar: Production rules (P):
S → aS    rule 1
S → ε     rule 2
Derive the string "aaa":
S
=> aS     rule 1
=> aaS    rule 1
=> aaaS   rule 1
=> aaaε   rule 2
=> aaa (required string)
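The derivation above can be replayed mechanically. A minimal Python sketch (the rule numbering follows the slide; `apply_rule` is an illustrative helper name, not part of the original):

```python
# Grammar for R.E. a*:  rule 1: S -> aS,  rule 2: S -> epsilon
RULES = {1: ("S", "aS"), 2: ("S", "")}

def apply_rule(form, rule_no):
    """Replace the leftmost occurrence of the rule's non-terminal."""
    lhs, rhs = RULES[rule_no]
    return form.replace(lhs, rhs, 1)

# Derive "aaa": apply rule 1 three times, then rule 2 to erase S.
form = "S"
for rule_no in (1, 1, 1, 2):
    form = apply_rule(form, rule_no)

print(form)  # aaa
```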
18. Contd…
Example 2:
Construct a CFG for the regular expression (0+1)*
Grammar: Production rules (P):
S → 0S | 1S    rule 1
S → ε          rule 2
Derive the string "1001":
S
=> 1S       rule 1
=> 10S      rule 1
=> 100S     rule 1
=> 1001S    rule 1
=> 1001ε    rule 2
=> 1001 (required string)
19. Contd…
Example 3:
Construct a CFG for the language L = { wcwR | w ∈ {a,b}* } (palindromes with a centre marker c) over ∑ = {a,b,c}.
Grammar: Production rules (P):
S → aSa    rule 1
S → bSb    rule 2
S → c      rule 3
Derive the string "abbcbba":
S => aSa        from rule 1
  => abSba      from rule 2
  => abbSbba    from rule 2
  => abbcbba    from rule 3 (required string)
20. Contd…
Example 4:
Construct a CFG for the set of strings with an equal number of a's and b's over ∑ = {a,b}.
Grammar: Production rules (P):
S → SaSbS    rule 1
S → SbSaS    rule 2
S → ε        rule 3
Derive the string "babaab" (leftmost):
S => SbSaS       from rule 2
  => bSaS        from rule 3
  => bSaSbSaS    from rule 1
  => baSbSaS     from rule 3
  => babSaS      from rule 3
  => babaS       from rule 3
  => babaSaSbS   from rule 1
  => babaaSbS    from rule 3
  => babaabS     from rule 3
  => babaab      from rule 3 (required string)
21. Contd…
Example 5:
Construct a CFG for the language L = { a^n b^(2n) | n >= 1 } over ∑ = {a,b}.
Grammar: Production rules (P):
S → aSbb    rule 1
S → abb     rule 2
Derive the string "aabbbb":
S => aSbb      from rule 1
  => aabbbb    from rule 2 (required string)
23. Derivations
• Starting with the start symbol, non-terminals are rewritten using productions until only terminals remain.
• Any terminal sequence that can be generated in this manner is syntactically valid.
• If a terminal sequence can't be generated using the productions of the grammar, it is invalid (it has syntax errors).
• The set of strings derivable from the start symbol is the language of the grammar (sometimes denoted L(G)).
• A derivation is a sequence of production rule applications.
• It is used to obtain the input string through these production rules.
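The definition of L(G) suggests a brute-force membership test: search over the sentential forms reachable from S until the target string appears. A Python sketch, assuming non-terminals are single uppercase letters and the grammar has no ε-productions (so any form longer than the target can be pruned); the grammar shown is the wcwR palindrome grammar from Example 3:

```python
from collections import deque

def derivable(productions, target):
    """BFS over sentential forms, always expanding the leftmost
    non-terminal (every derivable string has a leftmost derivation).
    Assumes no epsilon productions, so over-long forms are pruned."""
    seen = {"S"}
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for i, sym in enumerate(form):
            if sym.isupper():            # leftmost non-terminal
                for lhs, rhs in productions:
                    if lhs == sym:
                        new = form[:i] + rhs + form[i + 1:]
                        if len(new) <= len(target) and new not in seen:
                            seen.add(new)
                            queue.append(new)
                break
    return False

# Palindrome grammar from Example 3:  S -> aSa | bSb | c
G = [("S", "aSa"), ("S", "bSb"), ("S", "c")]
print(derivable(G, "abbcbba"))   # True
print(derivable(G, "abc"))       # False
```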
24. Contd…
• During parsing, we need to take the following two decisions:
1. Decide which non-terminal is to be replaced.
2. Decide the production rule by which the non-terminal will be replaced.
• Based on where the replaced non-terminal sits, we have two kinds of derivation:
1. Leftmost derivation
2. Rightmost derivation
• To illustrate a derivation, we can draw a derivation tree (also called a parse tree).
25. Leftmost Derivation
• In the leftmost derivation, the input is scanned and replaced with the production rules from left to right.
• So in a leftmost derivation, we read the input string from left to right.
• The leftmost non-terminal is always expanded.
Example:
E → E + E    Rule 1
E → E - E    Rule 2
E → a | b    Rule 3
The leftmost derivation of W = a - b + a:
E => E + E
  => E - E + E
  => a - E + E
  => a - b + E
  => a - b + a
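The leftmost strategy is easy to mechanize: always rewrite the first non-terminal. A small Python sketch (function name illustrative, spaces omitted from the forms) replaying the derivation of a - b + a:

```python
# Rules from the slide:  E -> E + E | E - E | a | b
def expand_leftmost(form, rhs):
    """Replace the leftmost non-terminal E with the chosen right-hand side."""
    i = form.index("E")
    return form[:i] + rhs + form[i + 1:]

# Leftmost derivation of a-b+a; the rule choices follow the slide.
steps = ["E"]
for rhs in ("E+E", "E-E", "a", "b", "a"):
    steps.append(expand_leftmost(steps[-1], rhs))

for s in steps:
    print(s)   # E, E+E, E-E+E, a-E+E, a-b+E, a-b+a
```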
26. Rightmost Derivation
• In the rightmost derivation, the input is scanned and replaced with the production rules from right to left.
• So in a rightmost derivation, we read the input string from right to left.
• The rightmost non-terminal is always expanded.
Example:
E → E + E    Rule 1
E → E - E    Rule 2
E → a | b    Rule 3
The rightmost derivation of W = a - b + a:
E => E - E
  => E - E + E
  => E - E + a
  => E - b + a
  => a - b + a
27. Parse tree
• A parse tree is the graphical representation of a derivation. Its nodes are labelled with symbols, which can be terminal or non-terminal.
• In parsing, the string is derived using the start symbol.
• The root of the parse tree is the start symbol.
• All leaf nodes are terminals.
• All interior nodes are non-terminals.
• An in-order traversal of the leaves gives the original input string.
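These properties fit in a small data structure. A Python sketch (class name illustrative): interior nodes hold non-terminals, leaves hold terminals, and concatenating the leaves left to right recovers the input string.

```python
class Node:
    """A parse-tree node: interior nodes carry non-terminals,
    leaves carry terminals."""
    def __init__(self, symbol, children=None):
        self.symbol = symbol
        self.children = children or []

    def yield_(self):
        """Concatenate the leaves left to right (the tree's yield)."""
        if not self.children:
            return self.symbol
        return "".join(c.yield_() for c in self.children)

# Parse tree for a*b+c with grammar S -> S + S | S * S | a | b | c,
# grouped as (a*b)+c:
tree = Node("S", [
    Node("S", [Node("S", [Node("a")]), Node("*"), Node("S", [Node("b")])]),
    Node("+"),
    Node("S", [Node("c")]),
])
print(tree.yield_())  # a*b+c
```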
28. Example:
Grammar G:
S → S + S | S * S
S → a | b | c
Input string: W = a * b + c
(Parse tree for the leftmost derivation shown on the slide.)
32. Inference and Derivation
• We apply the productions of a CFG to infer that certain strings are in the language of a certain variable. Two inference approaches:
1. Recursive inference, using productions from body to head
2. Derivations, using productions from head to body
• We consider some inferences we can make using G1 for w = a ∗ (a + b00), where G1 is the expression grammar E → I | E + E | E ∗ E | (E), I → a | b | Ia | Ib | I0 | I1.
Derivation of a ∗ (a + b00) by G1:
E ⇒ E ∗ E
  ⇒ I ∗ E
  ⇒ a ∗ E
  ⇒ a ∗ (E)
  ⇒ a ∗ (E + E)
  ⇒ a ∗ (I + E)
  ⇒ a ∗ (a + E)
  ⇒ a ∗ (a + I)
  ⇒ a ∗ (a + I0)
  ⇒ a ∗ (a + I00)
  ⇒ a ∗ (a + b00)
34. From Recursive Inference to Parse Tree
Theorem 1: Let G = (V, T, P, S) be a CFG. If recursive inference tells us that string w ∈ T* is in the language of variable A ∈ V, then there is a parse tree with root A and yield w.
We prove this by induction on the number of steps in the recursive inference.
Basis step: one step. This means there is a production rule A → w, where w = x1x2…xn. The tree is the root A with leaf children x1, x2, …, xn.
35. Inductive step: Assume that the last inference step used the production A → X1X2…Xn, and that previous inference steps verified that xi ∈ L(Xi) for each xi in w = x1x2…xn. The tree is the root A with children X1, X2, …, Xn; the inductive hypothesis lets us assume we already have trees rooted at each Xi yielding the terminal strings xi, and we attach them below A.
36. From Parse Tree to Derivation
Theorem 2: Let G = (V, T, P, S) be a CFG, and suppose there is a parse tree with a root labelled by variable A and with yield w ∈ T*. Then there is a leftmost derivation A ⇒*lm w in G.
We prove this by induction on tree height.
Basis step: the tree's height is one: the root A with leaf children X1, X2, …, Xn. So there must be a production A → X1X2…Xn in G, where w = X1X2…Xn.
37. Inductive step: The tree's height exceeds 1, so the root A has children X1, X2, …, Xn, some of which are terminal strings (like X2 = x2 and Xn = xn) and some of which are variables whose subtrees yield terminal strings (like X1 yielding x1 and Xn−1 yielding xn−1).
38. Inductive step (continued):
▶ By the inductive hypothesis, X1 ⇒*lm x1, …, Xn−1 ⇒*lm xn−1.
▶ Trivially, X2 ⇒*lm x2, Xn ⇒*lm xn, etc., because they are terminals only.
▶ Since A ⇒ X1X2…Xn and w = x1x2…xn−1xn, we conclude that A ⇒*lm w.
39. From Derivation to Recursive Inference
Theorem 3: Let G = (V, T, P, S) be a CFG, w ∈ T*, and A ∈ V. If a derivation A ⇒* w exists in grammar G, then w ∈ L(A) can be inferred via recursive inference.
We prove this by induction on the length of the derivation.
Basis step: the derivation is one step. This means that A → w is a production, so clearly w ∈ L(A) can be inferred.
40. Inductive step: There is more than one step in the derivation. We can write the derivation as
A ⇒ X1X2…Xn ⇒* x1x2…xn = w
By the inductive hypothesis, we can infer that xi ∈ L(Xi) for every i. Next, since A → X1X2…Xn is clearly a production, we can infer that w ∈ L(A).
41. What is the language defined by G?
1) G : S → aCa
       C → aCa | b
S => aCa
  => aaCaa
  => aaaCaaa
  => aaabaaa
L(G) = { a^n b a^n | n >= 1 }
2) G : S → 0S1 | ε
S => 0S1
  => 00S11
  => 000S111
  => 000111
L(G) = { 0^n 1^n | n >= 0 }
43. Contd… What is the language defined by G?
4. G : S → aS | bS | a | b
   L(G) = (a+b)+
5. G : S → XaaX    X → aX | bX | ε
   L(G) = (a+b)* aa (a+b)*
6. G : S → SS
   L(G) = ∅ (no production ever yields a terminal string)
44. Give context-free grammars that generate the following languages.
1. L = { w ∈ {0,1}* | w contains at least three 1s }
S → X1X1X1X
X → 0X | 1X | ε
2. L = { w ∈ {0,1}* | w = wR and |w| is even }
S → 0S0 | 1S1 | ε
3. L = { w ∈ {0,1}* | the length of w is odd and the middle symbol is 0 }
S → 0S0 | 0S1 | 1S0 | 1S1 | 0
4. L = { a^i b^j c^k | i, j, k ≥ 0, and i = j or i = k }
S → XY | W
X → aXb | ε
Y → cY | ε
W → aWc | Z
Z → bZ | ε
5. L = { a^i b^j c^k | i, j, k ≥ 0 and i + j = k }
S → aSc | X
X → bXc | ε
46. Ambiguity
• A grammar is said to be ambiguous if there exists more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree for the same input string.
Example 1: Input string W = a * b + c
(Parse trees for the leftmost and rightmost derivations shown on the slide.)
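Ambiguity of this example can be checked by exhaustively counting leftmost derivations, since each parse tree corresponds to exactly one leftmost derivation. A Python sketch, assuming S is the only non-terminal and the grammar has no ε-rules (so forms longer than the target can be pruned):

```python
def count_leftmost(form, target, rules):
    """Count leftmost derivations of `target` from sentential form `form`.
    Every symbol yields at least one character, so over-long forms
    are pruned; the terminal prefix must match the target."""
    if "S" not in form:
        return 1 if form == target else 0
    i = form.index("S")
    if form[:i] != target[:i]:       # derived prefix must match target
        return 0
    total = 0
    for rhs in rules:
        new = form[:i] + rhs + form[i + 1:]
        if len(new) <= len(target):
            total += count_leftmost(new, target, rules)
    return total

# Grammar G from Example 1:  S -> S + S | S * S | a | b | c
RULES = ["S+S", "S*S", "a", "b", "c"]
n = count_leftmost("S", "a*b+c", RULES)
print(n)  # 2 distinct leftmost derivations, so G is ambiguous
```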
47. Contd…
Example 2:
S → aSb | SS
S → ε
(Two distinct parse trees for the same string are shown on the slide.)
Example 3:
Consider a grammar G with the production rules
E → I
E → E + E
E → E * E
E → (E)
I → 0 | 1 | 2 | ... | 9
The string "3 * 2 + 5" has more than one parse tree under G, so G is ambiguous.
48. Contd…
• If a grammar is ambiguous, it is not good for compiler construction.
• No method can automatically detect and remove the ambiguity (ambiguity of a CFG is undecidable in general), but ambiguity can often be removed by rewriting the whole grammar without ambiguity.
49. Ambiguous grammar to unambiguous grammar
Example 1:
• Show that the given expression grammar is ambiguous. Also, find an equivalent unambiguous grammar.
Input grammar:
E → E * E
E → E + E
E → id
Solution:
• Let us derive the string "id + id * id".
50. Contd…
As there are two different parse trees for deriving the same string "id + id * id", the given grammar is ambiguous.
51. Removing ambiguity by rewriting the grammar
For the expression grammar, use the following steps to get an unambiguous grammar:
1. Take care of precedence (use a different non-terminal for each precedence level, and start with the lowest precedence, here PLUS).
2. Ensure associativity (define the rule as left recursive if the operator is left associative and as right recursive if the operator is right associative).
The equivalent unambiguous grammar:
E → E + T
E → T
T → T * F
T → F
F → id
• It reflects the fact that * has higher precedence than +.
• Also, the operators + and * are left-associative, as these two rules are left recursive.
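A parser built from this unambiguous grammar evaluates expressions with the intended precedence and associativity. A Python sketch (the left-recursive rules are realized as iterative loops, and numbers stand in for `id` so results can be computed; all names are illustrative):

```python
# Evaluator mirroring the unambiguous grammar:
#   E -> E + T | T    (lowest precedence, left-associative)
#   T -> T * F | F
#   F -> number       (numbers in place of `id` for this sketch)
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():                 # E -> T { + T }
        nonlocal pos
        value = term()
        while peek() == "+":
            pos += 1
            value += term()
        return value

    def term():                 # T -> F { * F }
        nonlocal pos
        value = factor()
        while peek() == "*":
            pos += 1
            value *= factor()
        return value

    def factor():               # F -> number
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        return value

    return expr()

print(parse(["2", "+", "3", "*", "4"]))   # 14: * binds tighter than +
```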
52. Contd…
Example 2:
• Check whether the given grammar is ambiguous or not. Also, find an equivalent unambiguous grammar.
S → S + S
S → S * S
S → S ^ S
S → a
Solution:
Let us derive the string "a + a * a".
53. Contd…
The equivalent unambiguous grammar:
S → S + A | A
A → A * B | B
B → C ^ B | C
C → a
• It reflects the fact that ^ has higher precedence than * and +.
• The operators + and * are left-associative, as these two rules are left recursive.
• The operator ^ is right-associative, as its rule is right recursive.
54. 2) Consider a grammar G given as follows:
S → AB | aaB
A → a | Aa
B → b
Determine whether the grammar G is ambiguous or not. If G is ambiguous, construct an unambiguous grammar equivalent to G.
Solution:
Let us derive the string "aab".
As there are two different parse trees for deriving the same string, the given grammar is ambiguous.
The unambiguous grammar will be:
S → AB
A → Aa | a
B → b
55. Inherent Ambiguity
• A context-free language for which all possible CFGs are ambiguous is called inherently ambiguous.
• One example: L = { a^n b^n c^m d^m | m, n ≥ 1 } ∪ { a^n b^m c^m d^n | m, n ≥ 1 }.
• Proving that languages are inherently ambiguous can be quite difficult.
• Such languages are encountered quite rarely, so this has little practical impact.
57. Simplification of CFG
languages can efficiently be represented by a context-free grammar.
All the grammar are not always optimized that means the grammar may consist of some extra symbols(non-
terminal).
Having extra symbols, unnecessary increase the length of grammar.
Simplification of grammar means reduction of grammar by removing useless symbols. The properties of reduced
grammar are given below:
1. Each variable (i.e. non-terminal) and each terminal of G appears in the derivation of some word in L.
2. There should not be any production as X → Y where X and Y are non-terminal.
3. If ε is not in the language L then there need not to be the production X → ε.
57
59. Elimination of Useless Symbols
❖ Useful symbols
❑ A symbol X in a CFG G = (V, T, P, S) is called useful
✔ if there exists a derivation of a terminal string from S in which X appears somewhere;
✔ otherwise it is called useless.
60. Elimination of Useless Symbols
• A CFG has no useless variables if and only if all its variables are reachable and generating.
• Therefore it is possible to eliminate useless variables from a grammar as follows:
❑ Step 1: Find the non-generating variables and delete them, along with all productions involving non-generating variables.
❑ Step 2: Find the non-reachable variables in the resulting grammar and delete them, along with all productions involving non-reachable variables.
61. Elimination of Useless Symbols
• Generating variables
• A variable X is called generating
  - if it derives a string of terminals.
  - Note that the language accepted by a context-free grammar is non-empty if and only if the start symbol is generating.
• Algorithm to find the non-generating variables in a CFG
  ▪ Mark a variable X as "generating" if it has a production X → w, where w is a string of only terminals and/or variables previously marked "generating".
  ▪ Repeat the above step until no further variables get marked "generating".
  ▪ All variables not marked "generating" are non-generating.
62. Elimination of Useless Symbols
• Reachable variables
• A variable X is called reachable
  - if the start symbol derives a string containing the variable X.
• Algorithm to find the non-reachable variables in a CFG
  • Mark the start variable as "reachable".
  • Mark a variable Y as "reachable" if there is a production X → w, where X is a variable previously marked "reachable" and w is a string containing Y.
  • Repeat the above step until no further variables get marked "reachable".
  • All variables not marked "reachable" are non-reachable.
63. 1. Remove the useless symbol from the given context free grammar
S -> abS | abA | abB
A ->cd
B->aB
C->dc
Solution:
❖Step 1: Eliminate non-generating symbols i.e non-terminals which do not produce any terminal
string
❖ In the given productions, B do not produce any terminal
❖ Eliminate all the productions in which B occurs.
• S -> abS | abA | abB
• A ->cd
• B->aB
• C->dc
❖Resulting productions are: S -> abS | abA
A -> cd
C -> dc
Elimination of Useless Symbols-Example
63
64. ❖Step 2: Eliminate non-reachable symbols i.e non-terminals that can never be reached
from the starting symbol
• In the set of productions available after Step 2,
‘C’ is not reachable from starting symbol ‘S’
• Eliminate productions involving non-terminal ‘C’
S -> abS | abA
A ->cd
C->dc
• Final productions after eliminating useless symbols are:
S -> abS | abA
A ->cd
Elimination of Useless Symbols-Example
65. 2. Remove the useless symbols from the given context free grammar
S -> aB / bX
A -> Bad / bSX / a
B -> aSB / bBX
X -> SBD / aBx / ad
❖Step 1: Eliminate non-generating symbols, i.e. non-terminals which do
not derive any terminal string
• A and X directly derive the terminal strings a and ad, hence they are generating. Since X
is generating, S is also generating, as S -> bX.
• But B does not derive any terminals, so clearly B is a non-generating symbol.
• So eliminate the productions with B
S -> aB / bX
A -> Bad / bSX / a
B -> aSB / bBX
X -> SBD / aBx / ad
Elimination of Useless Symbols-
Example
66. • The resulting productions are
S -> bX
A -> bSX / a
X -> ad
❖Step 2: Eliminate non-reachable symbols i.e non-terminals that can never be reached from the starting
symbol
• In the reduced grammar A is a non-reachable symbol
• So remove the production involving A
• Final grammar after elimination of the useless symbols is
S -> bX
X -> ad
Elimination of Useless Symbols-Example
67. • Elimination of useless symbols - order of elimination
• Always eliminate non-generating symbols first and then eliminate non-reachable
symbols
• Reversing the order of elimination does not work:
S -> AB | a
A -> aA
B -> b
• Here A is non-generating, and after deleting A (along with the production S ->
AB) the variable B becomes unreachable, so B is also removed as a useless
variable
• However, if we first tested for reachability, all variables would be marked reachable,
and subsequently eliminating non-generating variables would leave the useless production B -> b behind.
Elimination of Useless Symbols
68. • If a symbol is useful then it is both generating and reachable
• Converse of above statement is not true.
• For e.g. in CFG
S → ABC
B → b
B is both reachable and generating, but still not useful: A and C are non-generating, so no derivation from S through ABC ever yields a terminal string, and B never appears in a useful derivation
Elimination of Useless Symbols
70. Elimination of Null Productions
• Null Productions
A production of the type A → ϵ is called a null production
• In a given CFG, a non-terminal N is called nullable
- if there is a production N -> ϵ, or
- if there is a derivation that starts at N and leads to ϵ
• To eliminate a production A -> ϵ:
- look for all productions whose right side contains A, and
- replace each occurrence of A in each of these productions by ϵ to obtain the
non ϵ-productions
- the resultant non ϵ-productions must be added to the grammar to keep the
language the same.
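The procedure above can be sketched in Python (same hypothetical dict representation as earlier, with ε written as the empty string). For each production we add every variant obtained by dropping some occurrences of nullable symbols:

```python
from itertools import combinations

def nullable_variables(productions):
    """A variable is nullable if it has an epsilon production or a body
    made entirely of nullable variables."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in productions.items():
            if var in nullable:
                continue
            for body in bodies:
                if all(s in nullable for s in body):  # True for body == ""
                    nullable.add(var)
                    changed = True
                    break
    return nullable

def eliminate_null(productions):
    nullable = nullable_variables(productions)
    new = {}
    for var, bodies in productions.items():
        out = set()
        for body in bodies:
            pos = [i for i, s in enumerate(body) if s in nullable]
            for r in range(len(pos) + 1):
                for drop in combinations(pos, r):
                    variant = "".join(s for i, s in enumerate(body)
                                      if i not in drop)
                    if variant:  # the epsilon bodies themselves are dropped
                        out.add(variant)
        new[var] = sorted(out)
    return new

# Example 1 on the next slide: S -> aX | bX, X -> a | b | epsilon
g = {"S": ["aX", "bX"], "X": ["a", "b", ""]}
print(eliminate_null(g))  # S -> a | aX | b | bX, X -> a | b
```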
71. 1. Remove the null productions from the following grammar
S -> aX / bX
X-> a / b / є
Solution:
- There is one null production in the grammar, X -> ϵ.
So nullable symbols = {X}
- To eliminate X -> ϵ, change the productions containing X in the right side.
- The productions with X in the right side are S -> aX and S -> bX
- So replacing each occurrence of X by ϵ, we get two new productions
S-> a and S -> b
- Adding these productions to the grammar and eliminating X -> ϵ, we get
S -> aX / bX / a / b
X-> a / b
Elimination of Null Productions –
Example
72. Elimination of Null Productions –
Example
• 2. Remove the null productions from the following grammar
S -> ABAC
A -> aA / ϵ
B -> bB / ϵ and
C -> c
Solution:
• We have two null productions in the grammar, A -> ϵ and B -> ϵ
• Nullable symbols={A,B}
• To eliminate A -> ϵ we have to change the productions containing A in the right side.
• The productions with A in the right side are S -> ABAC and A -> aA.
• So replacing each occurrence of A by ϵ, we get four new productions
S -> ABC / BAC / BC
A -> a
• Add these productions to the grammar and eliminate A -> ϵ.
S -> ABAC / ABC / BAC / BC
A -> aA / a
B -> bB / ϵ
C -> c
73. • To eliminate B -> ϵ we have to change the productions containing B on the right side.
• The productions with B in the right side are S -> ABAC / ABC / BAC / BC and B -> bB
• Doing that we generate these new productions:
S -> AAC / AC / C
B -> b
Add these productions to the grammar and remove the production B -> ϵ from the grammar. The new grammar
after removal of ϵ – productions is:
S -> ABAC / ABC / BAC / BC / AAC / AC / C
A -> aA / a
B -> bB / b
C -> c
Elimination of Null Productions – Example
75. • Unit Production
▪ A unit production is a production A -> B where both A and B are non-terminals.
▪ Unit productions are redundant and hence should be removed.
• Follow these steps to remove the unit productions
1. Select a unit production A -> B, such that there exists a non-unit production B -> α
(where α is any string other than a single non-terminal)
2. For every such non-unit production B -> α, repeat the following step
▪ Add the production A -> α to the grammar
3. Eliminate A -> B from the grammar
4. Repeat the above steps if there are more unit productions
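The steps above, together with the unit-pair view used in the examples that follow, can be sketched in Python (a hypothetical helper; single uppercase letters are variables, and a body is a unit production exactly when it is one variable):

```python
def eliminate_units(productions):
    """Compute all unit pairs (A, B), meaning A derives B using only unit
    productions, then give A every non-unit body of B."""
    vars_ = set(productions)

    def is_unit(body):
        return len(body) == 1 and body in vars_

    pairs = {(a, a) for a in vars_}   # self pairs
    changed = True
    while changed:                    # close under unit steps
        changed = False
        for (a, b) in list(pairs):
            for body in productions[b]:
                if is_unit(body) and (a, body) not in pairs:
                    pairs.add((a, body))
                    changed = True
    return {a: sorted({body
                       for (x, b) in pairs if x == a
                       for body in productions[b] if not is_unit(body)})
            for a in vars_}

# Example 1 below: S -> aX | bY | Y, X -> S, Y -> bY | b
g = {"S": ["aX", "bY", "Y"], "X": ["S"], "Y": ["bY", "b"]}
print(eliminate_units(g))  # every variable keeps only non-unit bodies
```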
Elimination of Unit Productions
76. 1. Eliminate Unit productions from the given grammar
S-> aX / bY / Y
X-> S
Y -> bY / b
Solution:
Find unit pairs: if A derives B using only unit productions, then (A,B) is a unit pair
• There are two unit productions in the given grammar, S -> Y and X -> S
• Substituting the values of the unit production S -> Y we get,
S-> aX / bY / bY / b → S-> aX / bY / b
• Substituting the values of the unit production X -> S we get,
X-> aX / bY / Y → X-> aX / bY / bY / b → X-> aX / bY / b
• Final set of productions would be,
S-> aX / bY / b
X-> aX / bY / b
Y -> bY / b
Elimination of Unit Productions –
Example
Another way — via unit pairs:
Self pairs: (S,S): S -> aX | bY ; (X,X): (X has no non-unit productions) ; (Y,Y): Y -> bY | b
Direct pairs: (S,Y): S -> bY | b ; (X,S): X -> aX | bY
Indirect pairs: (X,Y): X -> bY | b
Combining the productions:
S -> aX | bY | b
X -> aX | bY | b
Y -> bY | b
77. 2. Eliminate Unit productions from the given grammar
S -> AB
A -> a , B -> C , C -> D and D -> b
Solution:
• There are two unit productions in the given grammar, B -> C and C -> D
• Substituting C -> D into B -> C we get,
B -> D
• Substituting D -> b into B -> D we get,
B -> b
• Substituting D -> b into C -> D we get,
C -> b
• C and D are non-reachable symbols. Hence remove them
• Final set of productions after removing non-reachable symbols would be,
S -> AB
A -> a
B-> b
Elimination of Unit Productions –
Example
78. Exercise Problems
1. Remove the useless symbols from the given grammar
A -> xyz / Xyzz
X -> Xz / xYz
Y -> yYy / Xz
Z -> Zy / z
Sol.. A -> xyz
2. Remove the useless symbols from the given grammar
T → aaB | abA | aaT
A → aA
B → ab | b
C → ad
Sol. T → aaB | aaT
B → ab | b
79. 3. Remove the ε production from the following CFG by preserving the meaning of it.
S → XYX
X → 0X | ε
Y → 1Y
Sol… Nullable symbols={X}
S → XYX |YX |XY|Y
X → 0X | 0
Y → 1Y
4. Remove the ε production from the following CFG by preserving the meaning of it.
S → ASA | aB | b
A → B
B → b | ∈
Sol… Nullable={B,A}
S → ASA | aB | b | AS | SA|a
A → B
B → b
Exercise Problems
80. 5. Identify and remove the unit productions from the following CFG
S -> S + T/ T
T -> T * F/ F
F -> (S)/a
Sol….
unit productions:
S->T, T->F, S->F
unit pairs:
self pairs: (S,S) (T,T) (F,F)
direct pairs: (S,T) (T,F)
indirect pairs: (S,F)
After elimination:
S -> S + T / T * F / (S) / a
T -> T * F / (S) / a
F -> (S) / a
81. 6. Remove the unit productions from the following grammar
S -> AB
A -> a
B -> C / b
C -> D
D -> E
E -> a
Sol:
unit productions:
B->C, C->D, D->E, B->D, B->E, C->E
unit pairs:
self pairs: (S,S) (A,A) (B,B) (C,C) (D,D) (E,E)
direct pairs: (B,C) (C,D) (D,E)
indirect pairs: (B,D) (B,E) (C,E)
After elimination..
S -> AB
A -> a
B -> a / b
C -> a
D -> a
E -> a
After eliminating useless symbols:
S -> AB
A -> a
B -> a / b
83. Normal Form
• Normalization here means restricting every production of a grammar to a
specific standard form, without changing the language it generates.
• A grammar is said to be in normal form when every production of the
grammar has that specific form
• In this course we are going to study 2 types of Normal form
Normal Form
Chomsky normal
form (CNF)
Greibach normal
form (GNF)
85. Chomsky normal form (CNF)
• A context free grammar (CFG) is in Chomsky Normal Form
(CNF) if all production rules satisfy one of the following
conditions:
1. S → ε
2. NT → T (Eg. A → a)
3. NT → NT NT (Eg. A → SE)
Here NT = Non-terminal (Eg. A, S, E, ...) and T = Terminal (Eg. a, b, 0, 1, ...)
• For example, S → a | AB is in CNF, while A → Aab | BBA and B → Aa are not,
since their right-hand sides mix terminals with non-terminals or contain more
than two non-terminals.
86. Steps to convert a CFG to CNF
1. Simplify the grammar - Eliminate null, unit and useless productions
(Kindly refer previous slides).
2. Eliminate terminals from RHS if they exist with other terminals or
non-terminals.
Example:
Consider A → aX
Then we can convert to CNF form such as
Let Z → a
A → ZX
CNF Normal form
NT→ T
NT → NT NT
87. Steps to convert a CFG to CNF
3. Eliminate RHS with more than two non-terminals.
A -> BDX
can be rewritten by grouping a pair of adjacent non-terminals under a new variable, e.g.
E -> DX
A -> BE
Example:
Consider A → BDX
Then we can convert to CNF form such as
Let Z → BD
A → ZX
CNF Normal form
NT→ T
NT → NT NT
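Steps 2 and 3 can be sketched together in Python (a hypothetical sketch, not the slides' algorithm verbatim; it assumes step 1 was already applied, that bodies are strings of single-letter symbols, and it invents fresh variables T1, T2, ... for terminals and Z1, Z2, ... for grouped pairs):

```python
def to_cnf_bodies(productions):
    """Steps 2 and 3 of the CNF conversion: replace terminals that appear
    alongside other symbols, then binarize long right-hand sides.
    Bodies come in as strings and go out as tuples of symbols."""
    new = {}
    term_var = {}      # terminal -> fresh variable standing for it
    counter = [0]

    def fresh(prefix):
        counter[0] += 1
        return f"{prefix}{counter[0]}"

    # Step 2: in bodies with 2+ symbols, replace each terminal t
    # by a variable T with the production T -> t
    for var, bodies in productions.items():
        new[var] = []
        for body in bodies:
            syms = list(body)
            if len(syms) >= 2:
                for i, s in enumerate(syms):
                    if not s.isupper():
                        if s not in term_var:
                            term_var[s] = fresh("T")
                        syms[i] = term_var[s]
            new[var].append(tuple(syms))
    for t, v in term_var.items():
        new[v] = [(t,)]

    # Step 3: binarize, A -> B D X becomes A -> B Z with Z -> D X
    work = list(new)
    while work:
        var = work.pop()
        out = []
        for body in new[var]:
            if len(body) > 2:
                z = fresh("Z")
                new[z] = [body[1:]]
                work.append(z)   # the new body may itself need splitting
                body = (body[0], z)
            out.append(body)
        new[var] = out
    return new

# The two cases from these slides: A -> aX and A -> BDX
print(to_cnf_bodies({"A": ["aX", "BDX"], "X": ["x"], "B": ["b"], "D": ["d"]}))
```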
88. Problem 1 Convert the given CFG to CNF. Consider the given grammar G1:
S → ASA | Bb
A → aaA | ab | λ
B → bbbB | C
C → aA | B
Solution:
Nullable symbols ={A}
After elimination λ production
S → ASA | AS | SA | S | Bb
A → aaA | aa | ab
B → bbbB | C
C → aA | a | B
Unit pairs = { (S,S) (A,A) (B,B) (C,C) (B,C) (C,B) }
After eliminating unit productions
S → ASA | AS | SA | Bb
A → aaA | aa | ab
B → bbbB | aA | a
C → bbbB | aA | a
CNF Normal form
NT→ T
NT → NT NT
89. After eliminating useless symbols (C is unreachable)
S → ASA | AS | SA | Bb
A → aaA | aa | ab
B → bbbB | aA | a
CNF: (replace terminals using new productions)
S → ASA | AS | SA | BY
A → XXA | XX | XY
B → YYYB | XA | a
X → a
Y → b
Resultant grammar in CNF form:
S → TA | AS | SA | BY
A → UA | XX | XY
B → VW | XA | a
X → a
Y → b
T → AS
U → XX, V → YY, W → YB
CNF Normal form
NT→ T
NT → NT NT
90. Problem:2
Convert the given CFG to CNF. Consider the given grammar G1:
S → a | aA | B
A → aBB | ε
B → Aa | b
Solution:
Step 1: The start symbol S does not appear on the right-hand side of any production, so no new start symbol is needed. The grammar is:
S → a | aA | B
A → aBB | ε
B → Aa | b
Step 2: As grammar G1 contains the null production A → ε, its removal from the grammar yields:
S → a | aA | B
A → aBB
B → Aa | b | a
Now, as grammar G1 contains the unit production S → B, its removal yields:
unit pairs = {(S,S) (A,A) (B,B) (S,B)}
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
CNF Normal form
NT→ T
NT → NT NT
91. The grammar after removing the unit production is:
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Step 3: In the production rules S → aA | Aa, A → aBB and B → Aa, the terminal a exists on the RHS together with non-terminals. So we will replace
terminal a with X:
S → a | XA | AX | b
A → XBB
B → AX | b | a
X → a
Step 4: In the production rule A → XBB, the RHS has more than two symbols; replacing XB by a new variable R yields:
S → a | XA | AX | b
A → RB
B → AX | b | a
X → a
R → XB
Hence, for the given grammar, this is the required CNF.
CNF Normal form
NT→ T
NT → NT NT
92. Exercises Problems:
1 ) Convert the following CFG to Chomsky normal form:
S→A /B /C
A→aAa/ B
B→bB/ bb
C→baD/ abD/ aa
D→ aCaa/ D
2) Construct the following grammar in CNF:
S→ ABC/ BaB
A →aA/ BaC/ aaa
B →bBb / a / D
C →CA/ AC
D→ ε
93. CNF Problem
c) Convert the following grammar into CNF
S → cBA
S → A
A → cB | AbbS
B → aaa
d) Construct an equivalent grammar G in CNF for the grammar G1
where
G1=({S,A,B},{a,b},{S →ASB/ ε , A→ aAS/a, B→ SbS/A/bb},S)
95. Greibach Normal Form (GNF)
• GNF stands for Greibach Normal Form. A CFG (context-free
grammar) is in GNF if all the production rules satisfy
one of the following conditions:
1. S → ε
2. NT→ T (Eg. A → a)
3. NT → T (NT)* (Eg. A →aSBBA)
Let consider,
NT = Non terminal (Eg. A,S,E..)
T = Terminal (Eg. a,b,0,1--)
96. Steps to convert a CFG to GNF
1. Eliminate null, unit and useless productions (Kindly refer previous
slides).
2. Convert the given grammar into CNF form (Kindly refer previous
slides).
3. Rename the Non Terminal as (A1,A2,A3,....)
4. Check the productions: every production whose RHS begins with a
non-terminal should have the form Ai → Aj γ with i < j.
5. If a production is not as per step 4, replace it as per
Lemma I (substitution, when j < i) or Lemma II (left-recursion removal, when i = j)
97. Lemma I
If G = (V,T,P,S) is a CFG and, the set of ‘A’ production belong to P are
A → Aα ------ (1)
A → β1 |β2 | β3 | β4 ----- | βn ------ (2)
then Let G’ = (V’,T,P’,S)
Where P’ be
A → β1 α | β2 α | β3 α | β4 α ----- |βn α
By sub. (2) in (1)
98. Lemma II
If G = (V,T,P,S) is a CFG and, the set of ‘A’ production belong to P are
A → Aα1 | Aα2 | Aα3 -----| Aαm | β1 | β2 | β3 ------ | βn
Then introduce a new non-terminal X
So,Let G’ = (V’,T,P’,S) , Where V’ = (V ∪ X)
Where P’ can be formed
A → βi (1 ≤ i ≤ n)
A → βi X
X → αj (1 ≤ j ≤ m)
X → αj X
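Lemma II can be sketched in Python (a hypothetical helper; it assumes each grammar symbol is a single character so that bodies can be plain strings):

```python
def remove_left_recursion(bodies, var, new_var):
    """Lemma II: split var's bodies into left-recursive ones (var alpha)
    and the rest (beta), then rebuild as
    var -> beta | beta X   and   X -> alpha | alpha X."""
    alphas = [b[len(var):] for b in bodies if b.startswith(var)]
    betas = [b for b in bodies if not b.startswith(var)]
    a_bodies = betas + [b + new_var for b in betas]
    x_bodies = alphas + [a + new_var for a in alphas]
    return a_bodies, x_bodies

# Equation (5) of the worked problem below, A3 -> A3 A1 A3 A2 | b A3 A2 | a,
# writing A1, A2, A3 as the single symbols P, Q, R:
a3, x = remove_left_recursion(["RPRQ", "bRQ", "a"], "R", "X")
print(a3)  # -> ['bRQ', 'a', 'bRQX', 'aX']   (cf. equations (6) and (7))
print(x)   # -> ['PRQ', 'PRQX']              (cf. equations (8) and (9))
```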
99. Problem (1)
Convert the following to GNF
S→AB
A →BS|b
B →SA|a
Solution:
Step 1 & 2 : The given grammar is in CNF form
Step 3: Renaming the variables: let S = A1, A = A2, B = A3
A1 → A2 A3 ---- (1)
A2 → A3 A1 |b ---- (2)
A3 → A1 A2 |a ---- (3)
Step 4: Check the condition Ai → Aj γ with i < j
Equation (3) is not in this format (A3 → A1 A2 starts with a lower-numbered variable), so as per Lemma I let us sub. the value of A1 from (1) into (3), so
A3 → A2 A3 A2 |a ---- (4)
CNF form
1. S → ε
2. NT→ T (Eg. A → a)
3. NT → NT NT (Eg. A →SE)
GNF form
1. S → ε
2. NT→ T (Eg. A → a)
3. NT → T (NT)* (Eg. A →aSBBA)
Lemma 1
A → Aα ------ (1)
A → β1 | β2 | β3 | β4 ----- |βn ------ (2)
A → β1 α | β2 α | β3 α | β4 α ----- | βn α
100. Problem (1)
Again as per Lemma I sub. The value of A2 from equ. (2) in (4), we may get
A3 → A3 A1 A3 A2 |b A3 A2 |a ---- (5)
So, Now let solve by Lemma 2,
A3 → A3 A1 A3 A2 | b A3 A2 | a ---- (5)
(in Lemma 2's notation: A = A3, α = A1 A3 A2, and the β's are b A3 A2 and a)
Lemma 2
A → Aα1 | Aα2 | Aα3 -----| Aαm | β1 | β2 ------ | βn
A3 → b A3 A2 |a ---- (6) (GNF)
A3 → b A3 A2X|aX ---- (7) (GNF)
X→ A1 A3 A2 ---- (8)
X→ A1 A3 A2X---- (9)
Now sub (6) & (7) in (2)
A2 → b A3 A2 A1 | aA1 | b A3 A2XA1| aX A1|b ---- (10)(GNF)
Now Sub (10) in (1)
A1 → b A3 A2 A1 A3 | aA1 A3 | b A3 A2XA1 A3 | aX A1 A3 |bA3 ---- (11)(GNF)
Now sub (11) in (8)&(9)
X→ b A3 A2 A1 A3 A3 A2 | aA1 A3 A3 A2 | b A3 A2XA1 A3 A3 A2 | aX A1 A3 A3 A2 |bA3 A3 A2 ---- (12) (GNF)
X→ b A3 A2 A1 A3 A3 A2X| aA1 A3 A3 A2X| b A3 A2XA1 A3 A3 A2X| aX A1 A3 A3 A2 X|bA3 A3 A2X---- (13) (GNF)
Answer:
A1 → b A3 A2 A1 A3 | aA1 A3 | b A3 A2XA1 A3 | aX A1 A3 |bA3
A2 → b A3 A2 A1 | aA1 | b A3 A2XA1| aX A1|b
A3 → b A3 A2 |a
A3 → b A3 A2X|aX
X→ b A3 A2 A1 A3 A3 A2 | aA1 A3 A3 A2 | b A3 A2XA1 A3 A3 A2 | aX A1 A3 A3 A2 |bA3 A3 A2
X→ b A3 A2 A1 A3 A3 A2X| aA1 A3 A3 A2X| b A3 A2XA1 A3 A3 A2X| aX A1 A3 A3 A2 X|bA3 A3 A2X
101. Problem 2
S → XB | AA
A → a | SA
B → b
X → a
Solution:
As the given grammar G is already in CNF and has no left recursion, we can
skip step 1 and step 2 and go directly to step 3.
The production rule A → SA is not in GNF, so we substitute S → XB | AA into the
production rule A → SA as:
S → XB | AA
A → a | XBA | AAA
B → b
X → a
102. The production rules S → XB and A → XBA are not in GNF,
so we substitute X → a in the production rules
S → XB and A → XBA as:
S → aB | AA
A → a | aBA | AAA
B → b
X → a
Now we will remove the left recursion (A → AAA) using Lemma II, and get:
S → aB | AA
A → aC | aBAC
C → AAC | ε
B → b
X → a
103. Problem 3
Convert into GNF form : S -> AA | 0 A -> SS | 1
Solution :
First rename the variables:
A1 -> A2 A2 | 0
A2 -> A1 A1 | 1
Three of the four productions are fine; only A2 -> A1 A1 is a problem, since its RHS starts with a lower-numbered variable.
Apply Lemma I (substitute A1's bodies): A2 -> A2 A2 A1 | 0 A1 | 1
So now we still have one problem production, the left-recursive A2 -> A2 A2 A1.
Apply Lemma II:
A2 -> 1 | 0 A1 | 1 B | 0 A1 B
B -> A2 A1 | A2 A1 B
so now our grammar looks like:
A1 -> A2 A2 | 0
A2 -> 1 | 0 A1 | 1 B | 0 A1 B
B -> A2 A1 | A2 A1 B
104. Now we must fix A1, so that it starts only with terminals:
A1 -> 1 A2 | 0 A1 A2 | 1 B A2 | 0 A1 B A2 | 0
then we must fix B in a similar fashion (replacing initial occurrences of A2):
B -> 1 A1 | 0 A1 A1 | 1 B A1 | 0 A1 B A1 |1 A1 B | 0 A1 A1 B | 1 B A1 B | 0 A1 B A1 B
and now we have the following grammar:
A1 -> 1 A2 | 0 A1 A2 | 1 B A2 | 0 A1 B A2 | 0
A2 -> 1 | 0 A1 | 1 B | 0 A1 B
B -> 1 A1 | 0 A1 A1 | 1 B A1 | 0 A1 B A1 |1 A1 B | 0 A1 A1 B | 1 B A1 B | 0 A1 B A1 B
Which is in GNF.
105. Continuing Problem 2: now we remove the null production C → ε and get:
S → aB | AA
A → aC | aBAC | a | aBA
C → AAC | AA
B → b
X → a
The production rule S → AA is not in GNF, so we substitute A → aC | aBAC | a | aBA in production rule S → AA
as:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → AAC
C → aCA | aBACA | aA | aBAA
B → b
X → a
106. The production rule C → AAC is not in GNF, so we substitute A → aC | aBAC | a | aBA in
production rule C → AAC as:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → aCAC | aBACAC | aAC | aBAAC
C → aCA | aBACA | aA | aBAA
B → b
X → a
Hence, this is the GNF form for the grammar G
108. Pumping Lemma for Context-free Languages (CFL)
Pumping Lemma for CFLs states that for any context-free language L, every sufficiently long string of L can be broken
into five parts such that the second and fourth parts can be 'pumped' any number of times with the result still in the language.
Thus, if L is a CFL, there exists an integer n such that for all z ∈ L with |z| ≥ n, there exist u, v, w, x, y ∈ Σ∗,
such that z = uvwxy, and
(1) |vwx| ≤ n
(2) |vx| ≥ 1
(3) for all i ≥ 0: u v^i w x^i y ∈ L
Application:
The Pumping Lemma is used as a tool to prove that a language is not a CFL: if some sufficiently long string of the
language has no decomposition satisfying the conditions, then the language is not a CFL.
109. Problem 1:
L012 = {0^n 1^n 2^n | n ≥ 0} is not context-free.
Let us assume that L012 is context-free; then by the Pumping Lemma the above rules hold for some constant n.
Now, let z = 0^n 1^n 2^n, so z ∈ L012 and |z| ≥ n.
So, by the Pumping Lemma, there exist u, v, w, x, y such that (1) – (3) hold.
We show that for all u, v, w, x, y, (1) – (3) cannot all hold.
If (1) and (2) hold then z = 0^n 1^n 2^n = uvwxy with |vwx| ≤ n and |vx| ≥ 1.
(1) tells us that vwx cannot contain both 0's and 2's, since the first 0 and the last 2 are more than n positions apart. Thus, either vwx has no 0's, or vwx has no 2's.
Thus, we have two cases to consider.
Suppose vwx has no 0's. By (2), vx contains a 1 or a 2. Thus uwy has n 0's and uwy either has fewer
than n 1's or fewer than n 2's.
But (3) tells us that uwy = u v^0 w x^0 y ∈ L012.
So, uwy must have an equal number of 0's, 1's and 2's, which gives us a contradiction. The case where vwx has no
2's is similar and also gives us a contradiction. Thus L012 is not context-free.
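The case analysis above can be cross-checked by brute force for a small n: enumerate every decomposition z = uvwxy satisfying (1) and (2) and confirm that each one already fails condition (3) for i = 0 or i = 2 (a hypothetical checker for intuition, not part of the proof):

```python
def in_L012(s):
    """Membership in L012 = { 0^m 1^m 2^m }."""
    m = len(s) // 3
    return s == "0" * m + "1" * m + "2" * m

def no_valid_decomposition(z, n):
    """Return True if no split z = u v w x y with |vwx| <= n and
    |vx| >= 1 survives pumping with both i = 0 and i = 2."""
    L = len(z)
    for j in range(L + 1):                        # vwx starts at index j
        for k in range(j, min(j + n, L) + 1):     # vwx ends at index k
            vwx = z[j:k]
            for a in range(len(vwx) + 1):
                for b in range(a, len(vwx) + 1):
                    v, w, x = vwx[:a], vwx[a:b], vwx[b:]
                    if not v and not x:
                        continue                  # violates |vx| >= 1
                    u, y = z[:j], z[k:]
                    if all(in_L012(u + v * i + w + x * i + y)
                           for i in (0, 2)):
                        return False              # this split pumps fine
    return True

n = 4
z = "0" * n + "1" * n + "2" * n
print(no_valid_decomposition(z, n))  # -> True: no decomposition survives
```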
110. Problem 2:
The language L = {a^i b^j c^k | i < j and i < k} is not context-free.
Proof (by contradiction)
Suppose this language is context-free; then it has a context-free grammar.
Let K be the constant associated with this grammar by the Pumping Lemma.
Consider the string a^K b^(K+1) c^(K+1), which is in L and has length greater than K.
By the Pumping Lemma this must be representable as uvxyz, such that all u v^i x y^i z are also in L.
By the same argument as for the previous problem, neither v nor y may contain a mixture of symbols.
Suppose v consists entirely of a's. Then pumping up increases the number of a's, and there is no way y, which cannot have both b's and c's, can generate enough of these letters to
keep their numbers greater than that of the a's (it can do it for one or the other of them, not both).
Similarly, y cannot consist of just a's.
So suppose then that v and y contain only b's or only c's.
Consider the string u v^0 x y^0 z, which must be in L. Since we have dropped both v and y, we must have at least one b or one c fewer than we
had in uvxyz, which was a^K b^(K+1) c^(K+1). Consequently, this string no longer has enough of either b's or c's to be a member of L.