UNIT-1
PART-A
Overview of Compilation
COMPILER
A compiler is a program that reads a program in one language (the
source language) and translates it into an equivalent program in
another language (the target language).
The translation process should also report the presence of errors
in the source program.

source program → Compiler → target program
                    ↓
              error messages
COMPILER
There are two parts of compilation:
The analysis part breaks up the source program into constituent
pieces and creates an intermediate representation of the source
program.
The synthesis part constructs the desired target program from the
intermediate representation.
Phases of Compiler
Lexical analyzer
• The Lexical Analyzer reads the source program character by
  character and returns the tokens of the source program.
• A token describes a pattern of characters having the same meaning
  in the source program (such as identifiers, operators, keywords,
  numbers, delimiters and so on).
• Ex: newval := oldval + 12  =>  tokens:
      newval   identifier
      :=       assignment operator
      oldval   identifier
      +        add operator
      12       a number
• Puts information about identifiers into the symbol table.
• Regular expressions are used to describe tokens (lexical
  constructs).
Syntax analyzer
• The syntax of a language is specified by a context free grammar
  (CFG).
• The rules in a CFG are mostly recursive.
• A syntax analyzer checks whether a given program satisfies the
  rules implied by a CFG or not.
• If it satisfies, the syntax analyzer creates a parse tree for the
  given program.
Syntax analyzer
Ex:
We use BNF (Backus Naur Form) to specify a CFG
assgstmt -> identifier := expression
expression -> identifier
expression -> number
expression -> expression + expression
Semantic analyzer
• A semantic analyzer checks the source program for semantic errors
  and collects the type information for the code generation.
• Type-checking is an important part of the semantic analyzer.
• Normally, semantic information cannot be represented by a
  context-free language used in syntax analyzers.
• Context-free grammars used in the syntax analysis are integrated
  with attributes (semantic rules); the result is a syntax-directed
  translation (attribute grammars).
Semantic analyzer
Ex:
newval := oldval + 12
The type of the identifier newval must match with the type of
the expression (oldval + 12).
Intermediate code generation
A compiler may produce explicit intermediate codes representing
the source program.
These intermediate codes are generally machine-independent
(architecture independent), but the level of intermediate codes
is close to the level of machine codes.
Intermediate code generation
Ex:
newval := oldval * fact + 1
id1 := id2 * id3 + 1
MULT id2,id3,temp1      Intermediate Codes (Quadruples)
ADD  temp1,#1,temp2
MOV  temp2,,id1
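One way to hold such quadruples in a compiler's data structures is
as simple records. A minimal sketch in C (the field names
op/arg1/arg2/result are an assumption of this text version, not
from these slides):

struct quad { const char *op, *arg1, *arg2, *result; };

struct quad code[] = {
    /* the three quadruples for newval := oldval * fact + 1 */
    { "MULT", "id2",   "id3", "temp1" },
    { "ADD",  "temp1", "#1",  "temp2" },
    { "MOV",  "temp2", "",    "id1"   },  /* second argument unused */
};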
Code optimizer
The code optimizer improves the code produced by the intermediate
code generator in terms of time and space.
Ex:
MULT id2,id3,temp1
ADD  temp1,#1,id1
Code generator
Produces the target language for a specific architecture.
The target program is normally a relocatable object file containing
the machine codes.
Ex:
(assume that we have an architecture with instructions where at
least one of the operands is a machine register)
MOVE id2,R1
MULT id3,R1
ADD  #1,R1
MOVE R1,id1
Compiler construction tools
• A number of tools have been developed, variously called
  compiler-compilers, compiler generators or translator writing
  systems.
• The input for these systems may contain
  1. a description of the source language.
  2. a description of what output is to be generated.
  3. a description of the target machine.
Compiler construction tools
The principal aids provided by a compiler-compiler are:
1. Scanner Generator
2. Parser generator
3. Facilities for code generation
Lexical Analyzer
• The Role of the Lexical Analyzer
• The 1st phase of the compiler.
• Reads input characters and produces as output the sequence of
  tokens that the parser uses for syntax analysis.
• It can work either as a separate module or as a submodule.
Tokens, Patterns and Lexemes
• Lexeme: a sequence of characters in the source program that is
  matched against the pattern for a token.
• Pattern: the rule associated with each set of strings is called a
  pattern.
• A lexeme is matched against a pattern to generate a token.
• Token: a token is a word which describes the lexeme in the source
  program. It is generated when a lexeme is matched against a
  pattern.
3 Methods of constructing Lexical Analyzer
1. Using a Lexical Code Generator
• Tools are available that take regular expressions as input and
  generate code for them. These tools can be used to generate
  lexical analyzer code from R.E.s.
• An example of such a tool is LEX for Unix.
3 Methods of constructing Lexical Analyzer
R.E LEX.YY.C
LEX.YY.C a.out(lexical)
tokens
source
program
Lex generator
compiler
a.out
Compiler and Translator
• A compiler is a form of translator that translates a program
  written in one language, "The Source Language", to an equivalent
  program in a second language, "The Target Language" or "The
  Object Language".
Compiler and Translator
prog in source language → compiler → prog in target language
                             ↓
                     errors & diagnostics
• Assemblers, Compilers and Interpreters are all specific
  translators.
• Assemblers translate assembly language (opcodes or mnemonics)
  into machine code.
• Interpreters
  – interpret the statements of the high-level language program as
    they are encountered.
  – produce the output of a statement as it is interpreted.
Languages involved in Compiler
3 languages are involved in a compiler:
1. Source language: read by the compiler.
2. Target or object language: the language into which the
   compiler/translator translates the source.
3. Host language: the language in which the compiler is written.
Advantages of Compiler
• Conciseness, which improves programmer productivity.
• Semantic restriction.
• Translate and analyze the H.L.L. (source program) only once, then
  run the equivalent machine code produced by the compiler.
Disadvantages of Compiler
• Slower speed (of the translation step).
• Size of the compiler and the compiled code.
• Debugging involves all source code.
Interpreter versus Compiler
• A compiler translates to machine code.
• An interpreter executes the source program directly.
• Variant: interpretation of intermediate code. The compiler
  generates code for a "virtual machine" (VM), and the VM is
  simulated by software.
Single-Pass Compilers
• Phases work in an interleaved way.
• (Static structure of a single-pass compiler.)
Multi-Pass Compilers
• Phases are separate "programs", which run sequentially.
• Why multi-pass?
  – If memory is scarce (irrelevant today)
  – If the language is complex
  – If portability is important
• Often two-pass compilers.
Loader/Link-Editor:
Performs 2 functions:
i.  Loading: relocatable code is converted to absolute code, i.e.
    placed at its specified position.
ii. Link-editing: makes a single program from several files of
    relocatable machine code.
    – The files may be the output of different compilations.
    – They may be library files or routines provided by the system.
    – The result is an executable file.
Preprocessor
Produces input to the compiler. A preprocessor may perform:
i.  Macro processing: shorthand for longer constructs.
ii. File inclusion: separate modules can be used by including their
    files, e.g. #include <iostream>.
iii. Rational preprocessors: give support for additional facilities
     which are not included in the compiler itself.
iv. Language extension: may include extra capabilities.
Major Types of Compiler
1. Self-resident compiler: generates target code for the same
   machine (the host).
2. Cross compiler: generates target code for a machine other than
   the host.
Phases and passes
• In logical terms a compiler is thought of as consisting of stages
  and phases.
• Physically it is made up of passes.
• The compiler has one pass for each time the source code, or a
  representation of it, is read.
• Many compilers have just a single pass, so that the complete
  compilation process is performed while the code is read once.
• The various phases described will therefore be executed in
  parallel.
• Earlier compilers had a large number of passes, typically due to
  the limited memory space available.
• Modern compilers are single pass, since memory space is not
  usually a problem.
Use of tools
• The 2 main types of tools used in compiler production are:
1. A lexical analyzer generator
   – Takes as input the lexical structure of a language, which
     defines how its tokens are made up from characters.
   – Produces as output a lexical analyzer (a program in C, for
     example) for the language.
   – Example: the Unix lexical analyzer generator Lex.
2. A syntax analyzer generator
   – Takes as input the syntactical definition of a language.
   – Produces as output a syntax analyzer (a program in C, for
     example) for the language.
   – The most widely known is the Unix-based YACC (Yet Another
     Compiler-Compiler), used in conjunction with Lex.
   – Bison: a public domain version.
Applications of compiler techniques
• Compiler technology is useful for a more general class of
  applications.
• Many programs share the basic properties of compilers: they read
  textual input, organize it into a hierarchical structure and then
  process the structure.
• An understanding of how programming language compilers are
  designed and organized can make it easier to implement these
  compiler-like applications as well.
• More importantly, tools designed for compiler writing, such as
  lexical analyzer generators and parser generators, can make it
  vastly easier to implement such applications.
• Thus, compiler techniques are an important piece of knowledge for
  computer science engineers.
• Examples:
  – Document processing: LaTeX, HTML, XML
  – User interfaces: interactive applications, file systems,
    databases
  – Natural language treatment
  – Automata theory, language theory
Interpreters
• Instead of producing a target program as a translation, an
  interpreter performs the operations implied by the source
  program.
Differences between compiler and Interpreter

No.  COMPILER                            INTERPRETER
1    It takes the entire program as      It takes a single instruction
     input                               as input
2    Intermediate object code is         No intermediate object code
     generated                           is generated
3    Conditional control statements      Conditional control statements
     execute faster                      execute slower
4    Memory requirement is more          Memory requirement is less
5    Program need not be compiled        Every time, the higher level
     every time                          program is converted into a
                                         lower level program
6    Errors are displayed after the      Errors are displayed for every
     entire program is checked           instruction interpreted (if any)
7    Ex: C compiler                      Ex: BASIC
Types of compilers
1. Incremental compiler:
It is a compiler that performs recompilation of only the modified
source rather than compiling the whole source program.
Features:
1. It tracks the dependencies between the output and the source
   program.
2. It produces the same result as a full recompilation.
3. It performs less work than a full recompilation.
4. The process of incremental compilation is effective for
   maintenance.
Types of compilers
2. Cross compiler:
Basically there exist 3 languages:
1. the source language, i.e. the application program.
2. the target language, in which the machine code is written.
3. the implementation language, in which the compiler is written.
All these 3 languages may be different. In other words, there may
be a compiler which runs on one machine and produces the target
code for another machine. Such a compiler is called a cross
compiler.
Types of compilers
To represent a cross compiler, a T-diagram is drawn as follows:
(Figure: a T shape labelled with the source language S on the left,
the target language T on the right, and the implementation language
I at the bottom.)
• CROSS COMPILER: for source language L the target language N is
  generated, which runs on machine M.
(Figure: a compiler from L to N written in S, fed through a compiler
from S to M running on machine M, yields a compiler from L to N
that runs on M.)
Bootstrapping Compilers and T-diagrams
• The rules for T-diagrams are very simple. A compiler written in
  some language "C" (which could be anything, from machine code on
  up) that translates programs in language A to language B is drawn
  as a T with A on the left, B on the right, and C at the bottom.
Bootstrapping Compilers and T-diagrams
• Now suppose you have a machine that can directly run HP
machine code, and a compiler from ML to HP machine code,
and you want to get an ML compiler running on different
machine code P. You can start by writing an ML-to-P
compiler in ML, and compile that to get an ML-to-P compiler
in HP:
Bootstrapping Compilers and T-diagrams
From there, feed the new ML-to-P compiler to itself, running on
the HP machine, and you end up with an ML-to-P compiler
that runs on a P machine!
Lexical Analysis
• A lexical analyser or scanner is a program that groups sequences
  of characters into lexemes, and outputs (to the syntax analyser)
  a sequence of tokens. Here:
• (a) Tokens are symbolic names for the entities that make up the
  text of the program; e.g. IF for the keyword if, and ID for any
  identifier. These make up the output of the lexical analyser.
The role of lexical analyzer
source program → Lexical Analyzer → token → Parser → to semantic
analysis
(The parser requests each token via getNextToken; both components
consult the symbol table.)
Tokens, Patterns and Lexemes
• A token is a pair of a token name and an optional token value.
• A pattern is a description of the form that the lexemes of a
  token may take.
• A lexeme is a sequence of characters in the source program that
  matches the pattern for a token.
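As a concrete data-structure sketch in C (an assumption of this
text version; the field and type names below are not from the
slides), a token can be represented as exactly such a pair:

/* a token: symbolic name plus an optional attribute value */
struct token {
    int name;                 /* e.g. ID, NUMBER, RELOP, ... */
    union {
        int   num;            /* value of a NUMBER token */
        void *sym;            /* symbol-table entry of an ID */
    } value;
};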
Example

Token       Informal description              Sample lexemes
if          characters i, f                   if
else        characters e, l, s, e             else
comparison  < or > or <= or >= or == or !=    <=, !=
id          letter followed by letters        pi, score, D2
            and digits
number      any numeric constant              3.14159, 0, 6.02e23
literal     anything but ", surrounded by "s  "core dumped"

Ex: printf("total = %d\n", score);
Attributes for tokens
• E = M * C ** 2
– <id, pointer to symbol table entry for E>
– <assign-op>
– <id, pointer to symbol table entry for M>
– <mult-op>
– <id, pointer to symbol table entry for C>
– <exp-op>
– <number, integer value 2>
Lexical errors
• Some errors are beyond the power of the lexical analyzer to
  recognize:
  – fi (a == f(x)) …
• However, it may be able to recognize errors like:
  – d = 2r
• Such errors are recognized when no pattern for tokens matches a
  character sequence.
Specification of tokens
• In the theory of compilation, regular expressions are used to
  formalize the specification of tokens.
• Regular expressions are a means for specifying regular languages.
• Example:
  letter_ (letter_ | digit)*
• Each regular expression is a pattern specifying the form of
  strings.
Regular expressions
• ε is a regular expression, L(ε) = {ε}
• If a is a symbol in Σ, then a is a regular expression, L(a) = {a}
• (r) | (s) is a regular expression denoting the language
  L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the language L(r)L(s)
• (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
Regular definitions
d1 -> r1
d2 -> r2
…
dn -> rn
• Example:
  letter_ -> A | B | … | Z | a | b | … | z | _
  digit   -> 0 | 1 | … | 9
  id      -> letter_ (letter_ | digit)*
Extensions
• One or more instances: (r)+
• Zero or one instance: r?
• Character classes: [abc]
• Example:
  letter_ -> [A-Za-z_]
  digit   -> [0-9]
  id      -> letter_ (letter_ | digit)*
Recognition of tokens
• The starting point is the language grammar, to understand the
  tokens:
stmt -> if expr then stmt
      | if expr then stmt else stmt
      | ε
expr -> term relop term
      | term
term -> id
      | number
Recognition of tokens (cont.)
• The next step is to formalize the patterns:
digit  -> [0-9]
digits -> digit+
number -> digits (\.digits)? (E[+-]? digits)?
letter -> [A-Za-z_]
id     -> letter (letter | digit)*
if     -> if
then   -> then
else   -> else
relop  -> < | > | <= | >= | = | <>
• We also need to handle whitespace:
ws -> (blank | tab | newline)+
Transition diagrams
• Transition diagram for relop
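Hand-coding that diagram is straightforward. A minimal C sketch (an
assumption of this text version, not code from the slides), using
token names like LT/LE that match the later Lex example:

#include <stddef.h>

enum relop_tok { LT, LE, NE, EQ, GT, GE, RELOP_ERR };

/* scan a relational operator at s; store its length in *len */
enum relop_tok scan_relop(const char *s, size_t *len) {
    switch (s[0]) {
    case '<':
        if (s[1] == '=') { *len = 2; return LE; }   /* <= */
        if (s[1] == '>') { *len = 2; return NE; }   /* <> */
        *len = 1; return LT;                        /* <  */
    case '=':
        *len = 1; return EQ;                        /* =  */
    case '>':
        if (s[1] == '=') { *len = 2; return GE; }   /* >= */
        *len = 1; return GT;                        /* >  */
    }
    return RELOP_ERR;
}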
Transition diagrams (cont.)
• Transition diagram for reserved words and identifiers
Transition diagrams (cont.)
• Transition diagram for unsigned numbers
Transition diagrams (cont.)
• Transition diagram for whitespace
Lexical Analyzer Generator - Lex
Lex source program lex.l → Lex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens
LEX Example
%{
/* definitions of manifest constants
LT,LE, EQ, NE, GT,GE,
IF, THEN, ELSE, ID, NUMBER, RELOP */
%}
/* regular definitions
[ tn]
delim
ws
letter
digit
id
{delim}+
[A-Za-z]
[0-9]
{letter}({letter}|{digit})*
number {digit}+(.{digit}+)?(E[+-]?{digit}+)?
LEX Example
%%
{ws}      {/* no action and no return */}
if        {return(IF);}
then      {return(THEN);}
else      {return(ELSE);}
{id}      {yylval = (int) installID(); return(ID);}
{number}  {yylval = (int) installNum(); return(NUMBER);}
…
LEX Example
int installID() {/* function to install the lexeme, whose first
   character is pointed to by yytext and whose length is yyleng,
   into the symbol table, and return a pointer thereto */
}

int installNum() {/* similar to installID, but puts numerical
   constants into a separate table */
}
Finite Automata
• Regular expressions = specification
• Finite automata = implementation
• A finite automaton consists of
  – An input alphabet Σ
  – A set of states S
  – A start state n
  – A set of accepting states F ⊆ S
  – A set of transitions: state →(input) state
Finite Automata
• The transition
    s1 →(a) s2
  is read: in state s1, on input "a", go to state s2.
• If at end of input:
  – if in an accepting state => accept, otherwise => reject
• If no transition is possible => reject
Finite Automata State Graphs
(Figure: the graphical conventions. A circle is a state; an arrow
into a circle marks the start state; a double circle is an
accepting state; a labelled arrow a is a transition.)
Example
• Alphabet: {0, 1}
• What language does this recognize?
(Figure: a DFA over {0,1}; its transitions are not reproduced in
this text version.)
And Another Example
• Alphabet still {0, 1}
(Figure: an automaton with two transitions labelled 1 leaving the
same state.)
• The operation of the automaton is not completely defined by the
  input
  – On input "11" the automaton could be in either state
Epsilon Moves
• Another kind of transition: ε-moves
    A →(ε) B
• The machine can move from state A to state B without reading
  input.
Deterministic and Nondeterministic Automata
• Deterministic Finite Automata (DFA)
  – One transition per input per state
  – No ε-moves
• Nondeterministic Finite Automata (NFA)
  – Can have multiple transitions for one input in a given state
  – Can have ε-moves
• Finite automata have finite memory
  – Need only to encode the current state
Execution of Finite Automata
• A DFA can take only one path through the state graph
  – Completely determined by the input
• NFAs can choose
  – Whether to make ε-moves
  – Which of multiple transitions for a single input to take
Acceptance of NFAs
• Input: 1 0 1
• An NFA can get into multiple states
(Figure: an NFA over {0,1} processing the input 1 0 1, occupying
several states at once.)
• Rule: an NFA accepts if it can get into a final state
NFA vs. DFA (1)
• NFAs and DFAs recognize the same set of languages (the regular
  languages)
• DFAs are easier to implement
  – There are no choices to consider
NFA vs. DFA (2)
• For a given language the NFA can be simpler than the DFA
(Figure: an NFA and the corresponding, larger DFA for the same
language over {0,1}.)
• The DFA can be exponentially larger than the NFA
Regular Expressions to Finite Automata
• High-level sketch:
  Lexical Specification → Regular expressions → NFA → DFA →
  Table-driven Implementation of DFA
Regular Expressions to NFA (1)
• For each kind of rexp, define an NFA
  – Notation: NFA for rexp A
• For ε: a start state with a single ε-transition to an accepting
  state.
• For input a: a start state with a single a-transition to an
  accepting state.
Regular Expressions to NFA (2)
• For AB: connect the accepting state of A's NFA by an ε-transition
  to the start state of B's NFA.
• For A | B: add a new start state with ε-transitions to the NFAs
  for A and B, and ε-transitions from both of their accepting
  states to a new accepting state.
Regular Expressions to NFA (3)
• For A*: add a new start state and a new accepting state, with
  ε-transitions that allow skipping A's NFA entirely or looping
  back through it.
Example of RegExp -> NFA conversion
• Consider the regular expression
    (1 | 0)*1
• The NFA is:
(Figure: a Thompson-style NFA with states A through J. States B
through G form the (1|0)* loop, with C --1--> E and D --0--> F; a
final 1-transition leads through I to the accepting state J.)
Next
  Lexical Specification → Regular expressions → NFA → DFA →
  Table-driven Implementation of DFA
NFA to DFA. The Trick
• Simulate the NFA
• Each state of the resulting DFA
  = a non-empty subset of states of the NFA
• Start state
  = the set of NFA states reachable through ε-moves from the NFA
    start state
• Add a transition S →(a) S' to the DFA iff
  – S' is the set of NFA states reachable from the states in S
    after seeing the input a
    • considering ε-moves as well
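The core of this simulation is an ε-closure plus a move step over a
set of states. A minimal C sketch (an assumption of this text
version; the state numbering of the (1|0)*1 NFA is hypothetical,
loosely following the A..J figure):

#include <stdio.h>

enum { A, B, C, D, E, F, G, H, I, J, N };   /* NFA states as bits */

unsigned eps[N];     /* eps[s] = states reachable from s by one ε-move */
unsigned mv[N][2];   /* mv[s][c] = states reachable from s on input c  */

/* add ε-reachable states until nothing changes (fixed point) */
unsigned closure(unsigned set) {
    unsigned prev;
    do {
        prev = set;
        for (int s = 0; s < N; s++)
            if (set & (1u << s)) set |= eps[s];
    } while (set != prev);
    return set;
}

int accepts(const char *w) {
    unsigned cur = closure(1u << A);
    for (; *w; w++) {
        unsigned next = 0;
        for (int s = 0; s < N; s++)
            if (cur & (1u << s)) next |= mv[s][*w - '0'];
        cur = closure(next);
    }
    return (cur >> J) & 1u;         /* accept iff J is reachable */
}

int main(void) {
    /* Thompson-style NFA for (1|0)*1, hypothetical numbering */
    eps[A] = 1u<<B | 1u<<H;  eps[B] = 1u<<C | 1u<<D;
    eps[E] = 1u<<G;          eps[F] = 1u<<G;
    eps[G] = 1u<<B | 1u<<H;  eps[I] = 1u<<J;
    mv[C][1] = 1u << E;      /* C --1--> E */
    mv[D][0] = 1u << F;      /* D --0--> F */
    mv[H][1] = 1u << I;      /* H --1--> I */
    printf("%d %d\n", accepts("0101"), accepts("0110"));  /* 1 0 */
    return 0;
}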
NFA -> DFA Example
(Figure: applying the construction to the NFA for (1|0)*1 gives a
DFA whose states are the NFA-state sets {A,B,C,D,H,I},
{F,G,A,B,C,D,H,I} and {E,J,G,A,B,C,D,H,I}, with transitions on
0 and 1 between them.)
NFA to DFA. Remark
• An NFA may be in many states at any time
• How many different states?
• If there are N states, the NFA must be in some subset of those N
  states
• How many non-empty subsets are there?
  – 2^N - 1, i.e. finitely many, but exponentially many
Implementation
• A DFA can be implemented by a 2D table T
  – One dimension is "states"
  – The other dimension is "input symbols"
  – For every transition Si →(a) Sk define T[i,a] = k
• DFA "execution"
  – If in state Si on input a, read T[i,a] = k and go to state Sk
  – Very efficient
Table Implementation of a DFA
(Figure: a three-state DFA with states S, T, U.)

        0   1
   S    T   U
   T    T   U
   U    T   U
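In C, executing that table takes only a few lines. A minimal sketch
(an assumption of this text version; the slide does not say which
state accepts, so U is chosen arbitrarily here):

enum { S_, T_, U_ };                 /* states of the example DFA */

/* delta[state][input - '0'] = next state, per the table above */
static const int delta[3][2] = {
    { T_, U_ },   /* S */
    { T_, U_ },   /* T */
    { T_, U_ },   /* U */
};

int run_dfa(const char *input) {
    int state = S_;
    for (int i = 0; input[i]; i++)
        state = delta[state][input[i] - '0'];
    return state == U_;              /* accept iff we end in U (assumed) */
}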
Implementation (Cont.)
• NFA -> DFA conversion is at the heart of tools such as flex or
  jflex
• But DFAs can be huge
• In practice, flex-like tools trade off speed for space in the
  choice of NFA and DFA representations
UNIT-1
PART-B
Top Down Parsing
Top-Down Parsing
• The parse tree is created top to bottom.
• Top-down parser
– Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule
does not work, we backtrack to try other alternatives.)
• It is a general parsing technique, but not widely used.
• Not efficient
Top-Down Parsing
– Predictive Parsing
• no backtracking
• efficient
• needs a special form of grammars (LL(1) grammars).
• Recursive Predictive Parsing is a special form of
Recursive Descent parsing without backtracking.
• Non-Recursive (Table Driven) Predictive Parser is also
known as LL(1) parser.
Recursive-Descent Parsing (uses Backtracking)
• Backtracking is needed.
• It tries to find the left-most derivation.
    S -> aBc
    B -> bc | b
  input: abc
• Trying B -> bc first: after matching a, bc consumes the rest of
  the input, so the final c of S -> aBc fails; backtrack, and
  B -> b then succeeds.
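A minimal C sketch of this backtracking (an assumption of this text
version, not code from the slides). Note that B's choice can only
be judged after S tries its final c, so S drives the retry:

#include <stdio.h>

const char *inp;          /* input string */
int pos;                  /* current position */

int term(char c) { return inp[pos] == c ? (pos++, 1) : 0; }

int B(int alt) {          /* try B's alternatives by index */
    if (alt == 0) return term('b') && term('c');  /* B -> bc */
    return term('b');                             /* B -> b  */
}

int S(void) {             /* S -> aBc, retrying B on failure */
    for (int alt = 0; alt < 2; alt++) {
        pos = 0;                                  /* backtrack */
        if (term('a') && B(alt) && term('c') && inp[pos] == '\0')
            return 1;
    }
    return 0;
}

int main(void) {
    inp = "abc";
    printf("%d\n", S());  /* prints 1: alt 0 fails at the final c,
                             alt 1 accepts */
    return 0;
}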
Predictive Parser
  a grammar --(eliminate left recursion)--> --(left factor)-->
  a grammar suitable for predictive parsing (an LL(1) grammar)
  (no 100% guarantee)
• When re-writing a non-terminal in a derivation step, a
  predictive parser can uniquely choose a production rule by just
  looking at the current symbol in the input string.
    A -> α1 | ... | αn     input: ... a .......
                                      ↑ current token
Predictive Parser (example)
stmt -> if ...... |
        while ...... |
        begin ...... |
        for .....
• When we are trying to write the non-terminal stmt, if the current
  token is if, we have to choose the first production rule.
• When we are trying to write the non-terminal stmt, we can
  uniquely choose the production rule by just looking at the
  current token.
• We eliminate the left recursion in the grammar and left factor
  it. But it still may not be suitable for predictive parsing (not
  an LL(1) grammar).
Recursive Predictive Parsing
• Each non-terminal corresponds to a procedure.
Ex: A -> aBb (this is the only production rule for A)
proc A {
  - match the current token with a, and move to the next token;
  - call B;
  - match the current token with b, and move to the next token;
}
Recursive Predictive Parsing (cont.)
A -> aBb | bAB
proc A {
  case of the current token {
    'a': - match the current token with a, and move to the next
           token;
         - call B;
         - match the current token with b, and move to the next
           token;
    'b': - match the current token with b, and move to the next
           token;
         - call A;
         - call B;
  }
}
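The same procedure in actual C might look like the sketch below (an
assumption of this text version; current_token, next_token(),
error() and B() are hypothetical helpers supplied elsewhere):

int current_token;              /* assumed global lookahead token     */

int next_token(void);           /* hypothetical scanner interface     */
void error(const char *msg);    /* hypothetical error reporter        */
void B(void);                   /* parsing procedure for B, elsewhere */

void match(int t) {
    if (current_token == t) current_token = next_token();
    else error("unexpected token");
}

void A(void) {                  /* A -> aBb | bAB */
    switch (current_token) {
    case 'a': match('a'); B(); match('b'); break;
    case 'b': match('b'); A(); B(); break;
    default:  error("A: no applicable production");
    }
}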
Recursive Predictive Parsing (cont.)
• When to apply ε-productions?
    A -> aA | bB | ε
• If all other productions fail, we should apply an ε-production.
  For example, if the current token is not a or b, we may apply the
  ε-production.
• Most correct choice: we should apply an ε-production for a
  non-terminal A when the current token is in the follow set of A
  (the terminals that can follow A in sentential forms).
Non-Recursive Predictive Parsing -- LL(1) Parser
• Non-recursive predictive parsing is table-driven parsing.
• It is a top-down parser.
• It is also known as LL(1) parser.
(Figure: the parser reads from an input buffer, maintains a stack,
consults a parsing table, and emits its output.)
LL(1) Parser
input buffer
– our string to be parsed. We will assume that its end is marked
  with a special symbol $.
output
– a production rule representing a step of the derivation sequence
  (left-most derivation) of the string in the input buffer.
stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker
  symbol $.
– initially the stack contains only the symbol $ and the starting
  symbol S  ($S <- initial stack)
– when the stack is emptied (i.e. only $ is left in the stack),
  parsing is completed.
LL(1) Parser
parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.
LL(1) Parser – Parser Actions
• The symbol at the top of the stack (say X) and the current symbol
  in the input string (say a) determine the parser action.
• There are four possible parser actions.
1. If X and a are $ => parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
   => parser pops X from the stack, and moves to the next symbol in
      the input buffer.
LL(1) Parser – Parser Actions
3. If X is a non-terminal
   => parser looks at the parsing table entry M[X,a]. If M[X,a]
      holds a production rule X -> Y1Y2...Yk, it pops X from the
      stack and pushes Yk, Yk-1, ..., Y1 onto the stack. The parser
      also outputs the production rule X -> Y1Y2...Yk to represent
      a step of the derivation.
4. None of the above => error
   – all empty entries in the parsing table are errors.
   – if X is a terminal symbol different from a, this is also an
     error case.
LL(1) Parser – Example1
S -> aBa
B -> bB | ε

LL(1) parsing table:
        a          b          $
  S   S -> aBa
  B   B -> ε     B -> bB

LL(1) Parser – Example1
stack    input    output
$S       abba$    S -> aBa
$aBa     abba$
$aB      bba$     B -> bB
$aBb     bba$
$aB      ba$      B -> bB
$aBb     ba$
$aB      a$       B -> ε
$a       a$
$        $        accept, successful completion
LL(1) Parser – Example1 (cont.)
Outputs: S -> aBa   B -> bB   B -> bB   B -> ε
Derivation (left-most): S => aBa => abBa => abbBa => abba
(Figure: the parse tree, with S at the root, children a B a, and
B expanding to b B twice before ending in ε.)
LL(1) Parser – Example2
E  -> TE'
E' -> +TE' | ε
T  -> FT'
T' -> *FT' | ε
F  -> (E) | id

       id         +            *            (          )          $
  E    E -> TE'                             E -> TE'
  E'              E' -> +TE'                           E' -> ε    E' -> ε
  T    T -> FT'                             T -> FT'
  T'              T' -> ε     T' -> *FT'               T' -> ε    T' -> ε
  F    F -> id                              F -> (E)
LL(1) Parser – Example2
stack      input     output
$E         id+id$    E -> TE'
$E'T       id+id$    T -> FT'
$E'T'F     id+id$    F -> id
$E'T'id    id+id$
$E'T'      +id$      T' -> ε
$E'        +id$      E' -> +TE'
$E'T+      +id$
$E'T       id$       T -> FT'
$E'T'F     id$       F -> id
$E'T'id    id$
$E'T'      $         T' -> ε
$E'        $         E' -> ε
$          $         accept
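The whole algorithm fits in a short, table-driven loop. Below is a
minimal C sketch for the Example 1 grammar (an assumption of this
text version; the table is hard-coded as a function, and ε is an
empty right-hand side):

#include <stdio.h>
#include <string.h>

/* M[A,a] for S -> aBa, B -> bB | ε; NULL marks an error entry */
const char *table(char X, char a) {
    if (X == 'S' && a == 'a') return "aBa";
    if (X == 'B' && a == 'b') return "bB";
    if (X == 'B' && a == 'a') return "";     /* B -> ε */
    return NULL;
}

int parse(const char *input) {
    char stack[64] = "$S";                   /* $ marker + start symbol */
    int top = 1, i = 0;
    for (;;) {
        char X = stack[top], a = input[i];
        if (X == '$' && a == '$') return 1;  /* accept */
        if (X == a) { top--; i++; continue; }/* match a terminal */
        if (X == 'S' || X == 'B') {          /* expand a non-terminal */
            const char *rhs = table(X, a);
            if (rhs == NULL) return 0;       /* empty table entry */
            top--;
            for (int k = (int)strlen(rhs) - 1; k >= 0; k--)
                stack[++top] = rhs[k];       /* push RHS in reverse */
        } else return 0;                     /* terminal mismatch */
    }
}

int main(void) {
    printf("%d %d\n", parse("abba$"), parse("abb$"));  /* 1 0 */
    return 0;
}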
Constructing LL(1) Parsing Tables
• Two functions are used in the construction of LL(1) parsing
  tables: FIRST and FOLLOW.
• FIRST(α) is the set of the terminal symbols which occur as first
  symbols in strings derived from α, where α is any string of
  grammar symbols.
• If α derives ε, then ε is also in FIRST(α).
• FOLLOW(A) is the set of the terminals which occur immediately
  after (follow) the non-terminal A in strings derived from the
  starting symbol.
  – a terminal a is in FOLLOW(A) if S =>* αAaβ
  – $ is in FOLLOW(A) if S =>* αA
Compute FIRST for Any String X
• If X is a terminal symbol: FIRST(X) = {X}.
• If X is ε: FIRST(X) = {ε}.
• If X is a non-terminal symbol and X -> ε is a production rule:
  ε is in FIRST(X).
• If X is a non-terminal symbol with a production rule
  X -> Y1Y2..Yn, or X is the string Y1Y2..Yn:
  – if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for
    j = 1,...,i-1, then a is in FIRST(X).
  – if ε is in all FIRST(Yj) for j = 1,...,n, then ε is in
    FIRST(X).
FIRST Example
E  -> TE'
E' -> +TE' | ε
T  -> FT'
T' -> *FT' | ε
F  -> (E) | id

FIRST(F)  = {(, id}      FIRST(TE')  = {(, id}
FIRST(T') = {*, ε}       FIRST(+TE') = {+}
FIRST(T)  = {(, id}      FIRST(ε)    = {ε}
FIRST(E') = {+, ε}       FIRST(FT')  = {(, id}
FIRST(E)  = {(, id}      FIRST(*FT') = {*}
                         FIRST((E))  = {(}
                         FIRST(id)   = {id}
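These sets can be computed mechanically by iterating the rules
above to a fixed point. A minimal C sketch for this grammar (an
assumption of this text version; terminals are bit positions,
non-terminals are tagged with an offset):

#include <stdio.h>

enum { ID, PLUS, STAR, LPAR, RPAR, EPS };        /* terminal bits  */
enum { E, EP, T, TP, F, NNT };                   /* non-terminals  */
#define NT(x) ((x) + 100)                        /* tag non-terminals */

typedef struct { int lhs, len, rhs[3]; } Prod;
static const Prod g[] = {
    { E,  2, { NT(T), NT(EP) } },                /* E  -> T E'   */
    { EP, 3, { PLUS, NT(T), NT(EP) } },          /* E' -> + T E' */
    { EP, 0, { 0 } },                            /* E' -> ε      */
    { T,  2, { NT(F), NT(TP) } },                /* T  -> F T'   */
    { TP, 3, { STAR, NT(F), NT(TP) } },          /* T' -> * F T' */
    { TP, 0, { 0 } },                            /* T' -> ε      */
    { F,  3, { LPAR, NT(E), RPAR } },            /* F  -> ( E )  */
    { F,  1, { ID } },                           /* F  -> id     */
};
static unsigned first[NNT];

static unsigned sym_first(int s) {               /* FIRST of one symbol */
    return s >= 100 ? first[s - 100] : 1u << s;
}

int main(void) {
    for (int changed = 1; changed; ) {           /* iterate to fixpoint */
        changed = 0;
        for (unsigned p = 0; p < sizeof g / sizeof *g; p++) {
            unsigned add = 0;
            int i;
            for (i = 0; i < g[p].len; i++) {
                unsigned f = sym_first(g[p].rhs[i]);
                add |= f & ~(1u << EPS);         /* FIRST(Yi) minus ε */
                if (!(f & (1u << EPS))) break;   /* Yi not nullable   */
            }
            if (i == g[p].len) add |= 1u << EPS; /* all Yj nullable   */
            if ((first[g[p].lhs] | add) != first[g[p].lhs]) {
                first[g[p].lhs] |= add;
                changed = 1;
            }
        }
    }
    /* prints 0x9: bits ID and LPAR, i.e. FIRST(E) = {(, id} */
    printf("FIRST(E) = %#x\n", first[E]);
    return 0;
}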
Compute FOLLOW (for non-terminals)
• If S is the start symbol: $ is in FOLLOW(S).
• If A -> αBβ is a production rule:
  everything in FIRST(β) except ε is in FOLLOW(B).
• If ( A -> αB is a production rule ) or
  ( A -> αBβ is a production rule and ε is in FIRST(β) ):
  everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any follow
set.
FOLLOW Example
E  -> TE'
E' -> +TE' | ε
T  -> FT'
T' -> *FT' | ε
F  -> (E) | id

FOLLOW(E)  = { $, ) }
FOLLOW(E') = { $, ) }
FOLLOW(T)  = { +, ), $ }
FOLLOW(T') = { +, ), $ }
FOLLOW(F)  = { +, *, ), $ }
Constructing LL(1) Parsing Table -- Algorithm
• for each production rule A -> α of a grammar G:
  – for each terminal a in FIRST(α),
    add A -> α to M[A,a]
  – if ε is in FIRST(α),
    for each terminal a in FOLLOW(A) add A -> α to M[A,a]
  – if ε is in FIRST(α) and $ is in FOLLOW(A),
    add A -> α to M[A,$]
• All other undefined entries of the parsing table are error
  entries.
Constructing LL(1) Parsing Table -- Example
E  -> TE'   FIRST(TE') = {(, id}
            => E -> TE' into M[E,(] and M[E,id]
E' -> +TE'  FIRST(+TE') = {+}
            => E' -> +TE' into M[E',+]
E' -> ε     FIRST(ε) = {ε} => none,
            but since ε is in FIRST(ε) and FOLLOW(E') = {$, )}
            => E' -> ε into M[E',$] and M[E',)]
T  -> FT'   FIRST(FT') = {(, id}
            => T -> FT' into M[T,(] and M[T,id]
T' -> *FT'  FIRST(*FT') = {*}
            => T' -> *FT' into M[T',*]
T' -> ε     FIRST(ε) = {ε} => none,
            but since ε is in FIRST(ε) and FOLLOW(T') = {$, ), +}
            => T' -> ε into M[T',$], M[T',)] and M[T',+]
F  -> (E)   FIRST((E)) = {(}
            => F -> (E) into M[F,(]
F  -> id    FIRST(id) = {id}
            => F -> id into M[F,id]
LL(1) Grammars
• A grammar whose parsing table has no multiply-defined entries is
  said to be an LL(1) grammar.
    L   : input scanned from left to right
    L   : left-most derivation
    (1) : one input symbol used as a lookahead symbol to determine
          the parser action
• The parsing table of a grammar may contain more than one
  production rule in an entry. In this case, we say that it is not
  an LL(1) grammar.
A Grammar which is not LL(1)
S -> iCtSE | a
E -> eS | ε
C -> b

FIRST(iCtSE) = {i}     FOLLOW(S) = { $, e }
FIRST(a)     = {a}     FOLLOW(E) = { $, e }
FIRST(eS)    = {e}     FOLLOW(C) = { t }
FIRST(ε)     = {ε}
FIRST(b)     = {b}

       a         b        e          i             t    $
  S    S -> a                        S -> iCtSE
  E                       E -> eS                       E -> ε
                          E -> ε
  C              C -> b

Two production rules for M[E,e].  Problem => ambiguity.
A Grammar which is not LL(1) (cont.)
• What do we have to do if the resulting parsing table contains
  multiply defined entries?
  – If we didn't eliminate left recursion, eliminate the left
    recursion in the grammar.
  – If the grammar is not left factored, we have to left factor the
    grammar.
  – If its (the new grammar's) parsing table still contains
    multiply defined entries, that grammar is ambiguous or it is
    inherently not an LL(1) grammar.
A Grammar which is not LL(1) (cont.)
• A left recursive grammar cannot be an LL(1) grammar.
  – A -> Aα | β
    => any terminal that appears in FIRST(β) also appears in
       FIRST(Aα), because Aα => βα.
    => if β is ε, any terminal that appears in FIRST(α) also
       appears in FIRST(Aα) and FOLLOW(A).
• If a grammar is not left factored, it cannot be an LL(1)
  grammar.
  – A -> αβ1 | αβ2
    => any terminal that appears in FIRST(αβ1) also appears in
       FIRST(αβ2).
• An ambiguous grammar cannot be an LL(1) grammar.
Properties of LL(1) Grammars
• A grammar G is LL(1) if and only if the following conditions hold
  for every two distinct production rules A -> α and A -> β:
  1. Both α and β cannot derive strings starting with the same
     terminals.
  2. At most one of α and β can derive ε.
  3. If β can derive ε, then α cannot derive any string starting
     with a terminal in FOLLOW(A).
Error Recovery in Predictive Parsing
• An error may occur in predictive parsing (LL(1) parsing)
  – if the terminal symbol on the top of the stack does not match
    the current input symbol.
  – if the top of the stack is a non-terminal A, the current input
    symbol is a, and the parsing table entry M[A,a] is empty.
• What should the parser do in an error case?
  – The parser should be able to give an error message (as
    meaningful an error message as possible).
  – It should recover from that error case, and it should be able
    to continue parsing the rest of the input.
Error Recovery Techniques
• Panic-Mode Error Recovery
  – Skipping the input symbols until a synchronizing token is
    found.
• Phrase-Level Error Recovery
  – Each empty entry in the parsing table is filled with a pointer
    to a specific error routine to handle that error case.
• Error-Productions
  – If we have a good idea of the common errors that might be
    encountered, we can augment the grammar with productions that
    generate erroneous constructs.
  – When an error production is used by the parser, we can generate
    appropriate error diagnostics.
  – Since it is almost impossible to know all the errors that can
    be made by the programmers, this method is not practical.
Error Recovery Techniques
• Global-Correction
  – Ideally, we would like a compiler to make as few changes as
    possible in processing incorrect inputs.
  – We have to globally analyze the input to find the error.
  – This is an expensive method, and it is not used in practice.
Panic-Mode Error Recovery in LL(1) Parsing
• In panic-mode error recovery, we skip all the input symbols until
  a synchronizing token is found.
• What is the synchronizing token?
  – All the terminal symbols in the follow set of a non-terminal
    can be used as a synchronizing token set for that non-terminal.
• So, a simple panic-mode error recovery for LL(1) parsing:
  – All the empty entries are marked as sync to indicate that the
    parser will skip all the input symbols until a symbol in the
    follow set of the non-terminal A which is on top of the stack.
    Then the parser pops that non-terminal A from the stack, and
    parsing continues from that state.
  – To handle unmatched terminal symbols, the parser pops the
    unmatched terminal symbol from the stack and issues an error
    message saying that the unmatched terminal was inserted.
Panic-Mode Error Recovery - Example
S -> AbS | e | ε
A -> a | cAd

FOLLOW(S) = {$}
FOLLOW(A) = {b, d}

       a          b      c           d      e         $
  S    S -> AbS   sync   S -> AbS    sync   S -> e    S -> ε
  A    A -> a     sync   A -> cAd    sync   sync      sync

stack    input    output
$S       aab$     S -> AbS
$SbA     aab$     A -> a
$Sba     aab$
$Sb      ab$      Error: missing b, inserted
$S       ab$      S -> AbS
$SbA     ab$      A -> a
$Sba     ab$
$Sb      b$
$S       $        S -> ε
$        $        accept

stack    input    output
$S       ceadb$   S -> AbS
$SbA     ceadb$   A -> cAd
$SbdAc   ceadb$
$SbdA    eadb$    Error: unexpected e (illegal A)
                  (Remove all input tokens until the first b or d,
                   then pop A)
$Sbd     db$
$Sb      b$
$S       $        S -> ε
$        $        accept
Phrase-Level Error Recovery
• Each empty entry in the parsing table is filled with a pointer to
  a special error routine which will handle that error case.
• These error routines may:
  – change, insert, or delete input symbols.
  – issue appropriate error messages.
  – pop items from the stack.
• We should be careful when we design these error routines, because
  we may put the parser into an infinite loop.
Syntax Analyzer
• The Syntax Analyzer creates the syntactic structure of the given
  source program.
• This syntactic structure is mostly a parse tree.
• The Syntax Analyzer is also known as the parser.
• The syntax of a programming language is described by a
  context-free grammar (CFG). We will use BNF (Backus-Naur Form)
  notation in the description of CFGs.
• The syntax analyzer (parser) checks whether a given source
  program satisfies the rules implied by a context-free grammar or
  not.
  – If it satisfies, the parser creates the parse tree of that
    program.
  – Otherwise the parser gives the error messages.
Syntax Analyzer
• A context-free grammar
– gives a precise syntactic specification of a programming
language.
– the design of the grammar is an initial phase of the
design of a compiler.
– a grammar can be directly converted into a parser by
some tools.
Parser
source program → Lexical Analyzer → token → Parser → parse tree
(the parser requests tokens with getNextToken)
• The parser works on a stream of tokens.
• The smallest item is a token.
Parsers (cont.)
• We categorize the parsers into two groups:
1. Top-Down Parser
– the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser
– the parse tree is created bottom to top, starting from the leaves
• Both top-down and bottom-up parsers scan the input from left to
right (one symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented
only for sub-classes of context-free grammars.
– LL for top-down parsing
– LR for bottom-up parsing
Context-Free Grammars
• Inherently recursive structures of a programming language are
  defined by a context-free grammar.
• In a context-free grammar, we have:
  – A finite set of terminals (in our case, this will be the set of
    tokens)
  – A finite set of non-terminals (syntactic variables)
  – A finite set of production rules of the following form:
      A -> α   where A is a non-terminal and α is a string of
               terminals and non-terminals (including the empty
               string)
  – A start symbol (one of the non-terminal symbols)
Context-Free Grammars
| E – E | E * E | E / E | - E
• Example:
E  E + E
E  ( E )
E  id
Derivations
E => E+E
• E+E derives from E
  – we can replace E by E+E
  – to be able to do this, we have to have a production rule
    E -> E+E in our grammar.
E => E+E => id+E => id+id
• A sequence of replacements of non-terminal symbols is called a
  derivation of id+id from E.
Derivations
• In general, a derivation step is
    αAβ => αγβ   if there is a production rule A -> γ in our
                 grammar, where α and β are arbitrary strings of
                 terminal and non-terminal symbols
  α1 => α2 => ... => αn   (αn derives from α1, or α1 derives αn)
• =>  : derives in one step
• =>* : derives in zero or more steps
• =>+ : derives in one or more steps
CFG - Terminology
• L(G) is the language of G (the language generated by G), which is
  a set of sentences.
• A sentence of L(G) is a string of terminal symbols of G.
• If S is the start symbol of G, then ω is a sentence of L(G) iff
  S =>+ ω, where ω is a string of terminals of G.
• If G is a context-free grammar, L(G) is a context-free language.
• Two grammars are equivalent if they produce the same language.
• S =>* α
  – If α contains non-terminals, it is called a sentential form
    of G.
  – If α does not contain non-terminals, it is called a sentence
    of G.
Derivation Example
E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
OR
E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
• At each derivation step, we can choose any of the non-terminals
  in the sentential form of G for the replacement.
• If we always choose the left-most non-terminal in each derivation
  step, this derivation is called a left-most derivation.
• If we always choose the right-most non-terminal in each
  derivation step, this derivation is called a right-most
  derivation.
Left-Most and Right-Most Derivations
Left-Most Derivation:
E =>lm -E =>lm -(E) =>lm -(E+E) =>lm -(id+E) =>lm -(id+id)
Right-Most Derivation:
E =>rm -E =>rm -(E) =>rm -(E+E) =>rm -(E+id) =>rm -(id+id)
• We will see that top-down parsers try to find the left-most
  derivation of the given source program.
• We will see that bottom-up parsers try to find the right-most
  derivation of the given source program in the reverse order.
Parse Tree
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
• A parse tree can be seen as a graphical representation of a
  derivation.
(Figure: the parse tree for
E => -E => -(E) => -(E+E) => -(id+E) => -(id+id),
grown one derivation step at a time: the root E gets children
- and E; that E gets children ( E ); the inner E gets children
E + E; and the two inner Es finally become id.)
Ambiguity
E => E+E => id+E => id+E*E => id+id*E => id+id*id
E => E*E => E+E*E => id+E*E => id+id*E => id+id*id
(Figure: the two distinct parse trees for id+id*id, one grouping
id+(id*id), the other grouping (id+id)*id.)
• A grammar that produces more than one parse tree for a sentence
  is called an ambiguous grammar.
Ambiguity (cont.)
• For most parsers, the grammar must be unambiguous.
• Unambiguous grammar
  => unique selection of the parse tree for a sentence
• We should eliminate the ambiguity in the grammar during the
  design phase of the compiler.
• An unambiguous grammar should be written to eliminate the
  ambiguity.
• We have to prefer one of the parse trees of a sentence (generated
  by an ambiguous grammar) and disambiguate the grammar to restrict
  parsing to this choice.
Ambiguity (cont.)
stmt -> if expr then stmt |
        if expr then stmt else stmt | otherstmts

if E1 then if E2 then S1 else S2

(Figure: parse tree 1 attaches the else to the outer if:
if E1 then (if E2 then S1) else S2.
Parse tree 2 attaches the else to the inner if:
if E1 then (if E2 then S1 else S2).)
Ambiguity (cont.)
• We prefer the second parse tree (else matches with the closest
  if).
• So, we have to disambiguate our grammar to reflect this choice.
• The unambiguous grammar will be:
stmt -> matchedstmt | unmatchedstmt
matchedstmt -> if expr then matchedstmt else matchedstmt
             | otherstmts
unmatchedstmt -> if expr then stmt |
                 if expr then matchedstmt else unmatchedstmt
Ambiguity – Operator Precedence
• Ambiguous grammars (because of ambiguous operators) can be
  disambiguated according to the precedence and associativity
  rules.
E -> E+E | E*E | E^E | id | (E)
   ⇓ disambiguate the grammar, with precedence:
       ^ (right to left)
       * (left to right)
       + (left to right)
E -> E+T | T
T -> T*F | F
F -> G^F | G
G -> id | (E)
Left Recursion
• A grammar is left recursive if it has a non-terminal A such that
  there is a derivation
    A =>+ Aα    for some string α.
• Top-down parsing techniques cannot handle left-recursive
  grammars.
• So, we have to convert our left-recursive grammar into an
  equivalent grammar which is not left-recursive.
• The left-recursion may appear in a single step of the derivation
  (immediate left-recursion), or may appear in more than one step
  of the derivation.
Immediate Left-Recursion
A -> Aα | β    where β does not start with A
   ⇓ eliminate immediate left recursion
A  -> βA'
A' -> αA' | ε    (an equivalent grammar)

In general,
A -> Aα1 | ... | Aαm | β1 | ... | βn
     where β1 ... βn do not start with A
   ⇓ eliminate immediate left recursion
A  -> β1A' | ... | βnA'
A' -> α1A' | ... | αmA' | ε    (an equivalent grammar)
Immediate Left-Recursion -- Example
E -> E+T | T
T -> T*F | F
F -> id | (E)
   ⇓ eliminate immediate left recursion
E  -> T E'
E' -> +T E' | ε
T  -> F T'
T' -> *F T' | ε
F  -> id | (E)
Left-Recursion -- Problem
• A grammar may not be immediately left-recursive, but it still can
  be left-recursive.
• By just eliminating the immediate left-recursion, we may not get
  a grammar which is not left-recursive.
S -> Aa | b
A -> Sc | d
This grammar is not immediately left-recursive, but it is still
left-recursive:
    S => Aa => Sca    or    A => Sc => Aac
cause left-recursion.
• So, we have to eliminate all left-recursions from our grammar.
Eliminate Left-Recursion -- Algorithm
- Arrange non-terminals in some order: A1 ... An
- for i from 1 to n do {
    - for j from 1 to i-1 do {
        replace each production
          Ai -> Aj γ
        by
          Ai -> α1 γ | ... | αk γ
        where Aj -> α1 | ... | αk
      }
    - eliminate immediate left-recursions among the Ai productions
  }
Eliminate Left-Recursion -- Example
S -> Aa | b
A -> Ac | Sd | f
- Order of non-terminals: S, A
for S:
  - we do not enter the inner loop.
  - there is no immediate left recursion in S.
for A:
  - Replace A -> Sd with A -> Aad | bd
    So, we will have A -> Ac | Aad | bd | f
  - Eliminate the immediate left-recursion in A:
      A  -> bdA' | fA'
      A' -> cA' | adA' | ε
Eliminate Left-Recursion -- Example
So, the resulting equivalent grammar which is not left-recursive
is:
S  -> Aa | b
A  -> bdA' | fA'
A' -> cA' | adA' | ε
Eliminate Left-Recursion – Example2
S -> Aa | b
A -> Ac | Sd | f
- Order of non-terminals: A, S
for A:
  - we do not enter the inner loop.
  - Eliminate the immediate left-recursion in A:
      A  -> SdA' | fA'
      A' -> cA' | ε
for S:
  - Replace S -> Aa with S -> SdA'a | fA'a
    So, we will have S -> SdA'a | fA'a | b
  - Eliminate the immediate left-recursion in S:
      S  -> fA'aS' | bS'
      S' -> dA'aS' | ε
So, the resulting equivalent grammar which is not left-recursive
is:
S  -> fA'aS' | bS'
S' -> dA'aS' | ε
A  -> SdA' | fA'
A' -> cA' | ε
Left-Factoring
• A predictive parser (a top-down parser without backtracking)
  insists that the grammar must be left-factored.
  grammar -> a new equivalent grammar suitable for predictive
  parsing
stmt -> if expr then stmt else stmt |
        if expr then stmt
• When we see if, we cannot know which production rule to choose to
  re-write stmt in the derivation.
Left-Factoring (cont.)
• In general,
    A -> αβ1 | αβ2
  where α is non-empty and the first symbols of β1 and β2 (if they
  have one) are different.
• When processing α we cannot know whether to expand
    A to αβ1 or
    A to αβ2
• But, if we re-write the grammar as follows
    A  -> αA'
    A' -> β1 | β2
  we can immediately expand A to αA'.
Left-Factoring -- Algorithm
• For each non-terminal A with two or more alternatives (production
  rules) with a common non-empty prefix, let us say
    A -> αβ1 | ... | αβn | γ1 | ... | γm
  convert it into
    A  -> αA' | γ1 | ... | γm
    A' -> β1 | ... | βn
Left-Factoring – Example1
A -> abB | aB | cdg | cdeB | cdfB
   ⇓
A  -> aA' | cdg | cdeB | cdfB
A' -> bB | B
   ⇓
A   -> aA' | cdA''
A'  -> bB | B
A'' -> g | eB | fB
Left-Factoring – Example2
A -> ad | a | ab | abc | b
   ⇓
A  -> aA' | b
A' -> d | ε | b | bc
   ⇓
A   -> aA' | b
A'  -> d | ε | bA''
A'' -> ε | c
Unit-2
Bottom-Up Parsing
Bottom-Up Parsing
• A bottom-up parser creates the parse tree of the
given input starting from leaves towards the root.
• A bottom-up parser tries to find the right-most
derivation of the given input in the reverse order.
S  ...   (the right-most derivation of )
 (the bottom-up parser finds the right-most derivation in
the reverse order)
• Bottom-up parsing is also known as shift-reduce
parsing because its two main actions are shift and
reduce.
– At each shift action, the current symbol in the input string is pushed
to a stack.
Bottom-Up Parsing
  – At each reduction step, the symbols at the top of the stack
    (this symbol sequence is the right side of a production) will
    be replaced by the non-terminal at the left side of that
    production.
  – There are also two more actions: accept and error.
Shift-Reduce Parsing
• A shift-reduce parser tries to reduce the given input string to
  the starting symbol.
    a string  --(reduced to)-->  the starting symbol
• At each reduction step, a substring of the input matching the
  right side of a production rule is replaced by the non-terminal
  at the left side of that production rule.
• If the substring is chosen correctly, the right-most derivation
  of that string is created in the reverse order.
    Rightmost derivation:        S =>*rm ω
    Shift-reduce parser finds:   ω <=rm ... <=rm S
Shift-Reduce Parsing -- Example
S -> aABb
A -> aA | a
B -> bB | b

input string:  aaabb
   reduction   aaAbb   (A -> a)
               aAbb    (A -> aA)
               aABb    (B -> b)
               S       (S -> aABb)

S =>rm aABb =>rm aAbb =>rm aaAbb =>rm aaabb
These are the right sentential forms.
Handle
• How do we know which substring is to be replaced at each
  reduction step?
• Informally, a handle of a string is a substring that matches the
  right side of a production rule.
  – But not every substring that matches the right side of a
    production rule is a handle.
• A handle of a right sentential form γ (= αβω) is a production
  rule A -> β and a position of γ where the string β may be found
  and replaced by A to produce the previous right-sentential form
  in a rightmost derivation of γ:
    S =>*rm αAω =>rm αβω
• If the grammar is unambiguous, then every right-sentential form
  of the grammar has exactly one handle.
• We will see that ω is a string of terminals.
Handle Pruning
• Then find a handle An-1n-1 inn-1,
and replace n-1 in by An-1 to getn-2.
• Repeat this, until we reach S.
• A right-most derivation in reverse can be obtained by
handle-pruning.
rm rm rm rm rm
S=0  1  2  ...  n-1  n=
input string
• Start from n, find a handle Ann in n,
and replace n in by An to getn-1.
A Shift-Reduce Parser
E  E+T |T
T  T*F |F
F  (E) | id
Right-Most Derivation of id+id*id
E  E+T  E+T*F  E+T*id  E+F*id
 E+id*id  T+id*id  F+id*id  id+id*id
Right-Most Sentential Form
id+id*id
F+id*id
T+id*id
E+id*id
Reducing Production
F  id
T  F
E  T
F  id
T  F
F  id
T  T*F
E  E+T
Handles are red and underlined in the right-sentential
E+F*id
E+T*id
E+T*F
E+T
E
forms.
A Stack Implementation of A Shift-Reduce Parser
• There are four possible actions of a shift-reduce parser:
  1. Shift: the next input symbol is shifted onto the top of the stack.
  2. Reduce: replace the handle on the top of the stack by the non-
     terminal on the left side of the production.
  3. Accept: successful completion of parsing.
  4. Error: the parser discovers a syntax error and calls an error recovery
     routine.
• Initially the stack contains only the end-marker $.
• The end of the input string is marked by the end-
  marker $.
A Stack Implementation of A Shift-Reduce Parser
Stack      Input        Action
$          id+id*id$    shift
$id        +id*id$      reduce by F → id
$F         +id*id$      reduce by T → F
$T         +id*id$      reduce by E → T
$E         +id*id$      shift
$E+        id*id$       shift
$E+id      *id$         reduce by F → id
$E+F       *id$         reduce by T → F
$E+T       *id$         shift
$E+T*      id$          shift
$E+T*id    $            reduce by F → id
$E+T*F     $            reduce by T → T*F
$E+T       $            reduce by E → E+T
$E         $            accept

Parse tree (nodes numbered in order of creation):
  E8 → E3 + T7      E3 → T2,  T2 → F1,  F1 → id
  T7 → T5 * F6      T5 → F4,  F4 → id,  F6 → id
Conflicts During Shift-Reduce Parsing
• There are context-free grammars for which shift-reduce
  parsers cannot be used.
• The stack contents and the next input symbol may not be
  enough to decide the action:
  – shift/reduce conflict: the parser cannot decide whether to shift or to reduce.
  – reduce/reduce conflict: the parser cannot decide which of several
    reductions to make.
• If a shift-reduce parser cannot be used for a grammar,
  that grammar is called a non-LR(k) grammar.

  LR(k):  Left-to-right scanning, Right-most derivation (in reverse),
          k symbols of lookahead.

• An ambiguous grammar can never be an LR grammar.
Shift-Reduce Parsers
• There are two main categories of shift-reduce
  parsers:
  1. Operator-Precedence Parser
     – simple, but handles only a small class of grammars.
  2. LR-Parsers
     – cover a wide range of grammars.
     • SLR – simple LR parser
     • LR – most general LR parser
     • LALR – intermediate LR parser (lookahead LR parser)

  (The grammar classes nest: SLR ⊂ LALR ⊂ LR ⊂ CFG.)
Operator-Precedence Parser
• Operator grammars form a small but important class of grammars.
  – We can have an efficient operator-precedence parser (a shift-
    reduce parser) for an operator grammar.
• In an operator grammar, no production rule can
  have:
  – ε on the right side, or
  – two adjacent non-terminals on the right side.
• Ex:
  E → AB            E → EOE           E → E+E | E*E |
  A → a             E → id                E/E | id
  B → b             O → + | * | /
  not an operator   not an operator   operator grammar
  grammar           grammar
Precedence Relations
• In operator-precedence parsing, we define three disjoint
  precedence relations between certain pairs of terminals:

  a <. b    b has higher precedence than a
  a =· b    b has the same precedence as a
  a .> b    b has lower precedence than a

• The determination of the correct precedence relations
  between terminals is based on the traditional notions of
  associativity and precedence of operators. (Unary minus
  causes a problem.)
Using Operator-Precedence Relations
• The intention of the precedence relations is to
  find the handle of a right-sentential form, with
  <. marking the left end,
  =· appearing in the interior of the handle, and
  .> marking the right end.
• In the input string $a1a2...an$, we insert the
  precedence relations between the pairs of
  terminals (wherever the precedence relation holds
  between the terminals in that pair).
Using Operator-Precedence Relations
E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id

The partial operator-precedence table for this grammar:

        id     +      *      $
  id           .>     .>     .>
  +     <.     .>     <.     .>
  *     <.     .>     .>     .>
  $     <.     <.     <.

• Then the input string id+id*id with the precedence
  relations inserted will be:
      $ <. id .> + <. id .> * <. id .> $
To Find the Handles
1. Scan the string from the left end until the first .> is
   encountered.
2. Then scan backwards (to the left) over any =· until a
   <. is encountered.
3. The handle contains everything to the left of the first .>
   and to the right of the <. encountered in step 2.
E  id
E  id
E  id
$ id + id * id $
$ E + id * id $
$ E + E * id $
$ E + E * .E $
E  E+E
E  E*E
$ E + E $
$ <. id .> + <. id .> * <. id .> $
$ <. + <. id .> * <. id .> $
$ <. + <. * <. id .> $
$ <. + <. * .>$
$ <. + .> $
$ $ $ E $
Operator-Precedence Parsing Algorithm
• The input string is w$, the initial stack contains only $, and a table
  holds the precedence relations between certain terminals.
  (A runnable sketch of this loop follows.)

Algorithm:
  set p to point to the first symbol of w$ ;
  repeat forever
    if ( $ is on top of the stack and p points to $ ) then
      return
    else {
      let a be the topmost terminal symbol on the stack and
      let b be the symbol pointed to by p;
      if ( a <. b or a =· b ) then {        /* SHIFT */
        push b onto the stack;
        advance p to the next input symbol;
      }
      else if ( a .> b ) then               /* REDUCE */
        repeat pop stack
        until ( the top-of-stack terminal is related
                by <. to the terminal most recently popped );
      else error();
    }
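A minimal Python sketch of the loop above, driven by the partial table for
id, +, * and $. The stack here holds terminals only (non-terminals never
influence the decisions), which is a simplification of the full parser:

  # PREC[a][b] is the relation between stack terminal a and input terminal b
  PREC = {
      'id': {'+': '.>', '*': '.>', '$': '.>'},
      '+':  {'id': '<.', '+': '.>', '*': '<.', '$': '.>'},
      '*':  {'id': '<.', '+': '.>', '*': '.>', '$': '.>'},
      '$':  {'id': '<.', '+': '<.', '*': '<.'},
  }

  def parse(tokens):
      stack, p = ['$'], 0
      while True:
          a, b = stack[-1], tokens[p]
          if a == '$' and b == '$':
              print('accept'); return
          rel = PREC[a].get(b)
          if rel in ('<.', '=·'):              # SHIFT
              stack.append(b); p += 1
              print(f'shift {b}')
          elif rel == '.>':                    # REDUCE
              top = stack.pop()                # pop until top <. last popped
              while PREC[stack[-1]].get(top) != '<.':
                  top = stack.pop()
              print(f'reduce (handle ends with {top})')
          else:
              print(f'error between {a} and {b}'); return

  parse(['id', '+', 'id', '*', 'id', '$'])

Running it on id+id*id produces exactly the shift/reduce sequence shown in
the example that follows.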
Operator-Precedence Parsing Algorithm -- Example
stack    input        relation     action
$        id+id*id$    $ <. id      shift
$id      +id*id$      id .> +      reduce by E → id
$        +id*id$                   shift
$+       id*id$                    shift
$+id     *id$         id .> *      reduce by E → id
$+       *id$                      shift
$+*      id$                       shift
$+*id    $            id .> $      reduce by E → id
$+*      $            * .> $       reduce by E → E*E
$+       $            + .> $       reduce by E → E+E
$        $                         accept
How to Create Operator-Precedence Relations
• We use the associativity and precedence relations among operators.
1. If operator O1 has higher precedence than operator O2:
     O1 .> O2  and  O2 <. O1
2. If operator O1 and operator O2 have equal precedence:
     they are left-associative   ⇒  O1 .> O2  and  O2 .> O1
     they are right-associative  ⇒  O1 <. O2  and  O2 <. O1
3. For all operators O:
     O <. id,   id .> O,   O <. (,   ( <. O,
     O .> ),    ) .> O,    O .> $,   $ <. O
4. Also:
     ( =· )     $ <. (     id .> )    ) .> $
     ( <. (     $ <. id    id .> $    ) .> )
     ( <. id
Operator-Precedence Relations
        +     -     *     /     ^     id    (     )     $
  +     .>    .>    <.    <.    <.    <.    <.    .>    .>
  -     .>    .>    <.    <.    <.    <.    <.    .>    .>
  *     .>    .>    .>    .>    <.    <.    <.    .>    .>
  /     .>    .>    .>    .>    <.    <.    <.    .>    .>
  ^     .>    .>    .>    .>    <.    <.    <.    .>    .>
  id    .>    .>    .>    .>    .>                .>    .>
  (     <.    <.    <.    <.    <.    <.    <.    =·
  )     .>    .>    .>    .>    .>                .>    .>
  $     <.    <.    <.    <.    <.    <.    <.

(Blank entries are errors. A sketch that derives the operator-vs-operator
entries of this table from rules 1 and 2 follows.)
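A small Python sketch that mechanically generates the operator-vs-operator
part of the table from precedence levels and associativity (rules 1 and 2
above). The level numbers are an arbitrary but conventional choice:

  PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2, '^': 3}
  ASSOC = {'+': 'left', '-': 'left', '*': 'left', '/': 'left', '^': 'right'}

  def relation(o1, o2):
      """Relation o1 ? o2 when o1 is on the stack and o2 is the next input."""
      if PRECEDENCE[o1] > PRECEDENCE[o2]:
          return '.>'
      if PRECEDENCE[o1] < PRECEDENCE[o2]:
          return '<.'
      # equal precedence: decided by associativity (rule 2)
      return '.>' if ASSOC[o1] == 'left' else '<.'

  for o1 in PRECEDENCE:
      print(o1, {o2: relation(o1, o2) for o2 in PRECEDENCE})
  # e.g. relation('+', '*') == '<.' and relation('*', '+') == '.>',
  # and relation('^', '^') == '<.' because ^ is right-associative,
  # matching the corresponding rows of the table above.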
Handling Unary Minus
• Operator-precedence parsing cannot handle the unary
  minus when we also have the binary minus in our
  grammar.
• The best approach to this problem is to let the lexical
  analyzer handle it:
  – The lexical analyzer will return two different operators for the unary
    minus and the binary minus.
  – The lexical analyzer will need lookahead to distinguish the binary minus
    from the unary minus.
• Then, we make
     O <. unary-minus     for any operator O
     unary-minus .> O     if unary-minus has higher precedence than O
     unary-minus <. O     if unary-minus has lower (or equal)
                          precedence than O
Precedence Functions
• Compilers using operator-precedence parsers do
  not need to store the table of precedence
  relations.
• The table can be encoded by two precedence
  functions f and g that map terminal symbols to
  integers: for symbols a and b,
    f(a) < g(b) whenever a <. b
    f(a) = g(b) whenever a =· b
    f(a) > g(b) whenever a .> b
  (An example encoding follows.)
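A Python sketch of precedence functions for the id/+/*/$ sub-table above.
These particular integer values are the classical textbook choice; they are
one consistent encoding among many, and the check below verifies them:

  f = {'id': 4, '+': 2, '*': 4, '$': 0}
  g = {'id': 5, '+': 1, '*': 3, '$': 0}

  RELATIONS = {  # the partial table from the earlier slides
      ('id', '+'): '.>', ('id', '*'): '.>', ('id', '$'): '.>',
      ('+', 'id'): '<.', ('+', '+'): '.>', ('+', '*'): '<.', ('+', '$'): '.>',
      ('*', 'id'): '<.', ('*', '+'): '.>', ('*', '*'): '.>', ('*', '$'): '.>',
      ('$', 'id'): '<.', ('$', '+'): '<.', ('$', '*'): '<.',
  }

  for (a, b), rel in RELATIONS.items():
      ok = {'<.': f[a] < g[b], '=·': f[a] == g[b], '.>': f[a] > g[b]}[rel]
      assert ok, (a, b, rel)
  print("f and g encode every relation in the table")

The parser then compares f(a) with g(b) instead of looking up the table,
trading a little generality (error entries are lost) for space.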
Disadvantages of Operator Precedence Parsing
• Disadvantages:
– It cannot handle the unary minus (the lexical analyzer should
handle the unary minus).
– Small class of grammars.
– Difficult to decide which language is recognized by the
grammar.
• Advantages:
– simple
– powerful enough for expressions in programming languages
Error Recovery in Operator-Precedence Parsing
Error Cases:
1. No relation holds between the terminal on the top of
stack and the next input symbol.
2. A handle is found (reduction step), but there is no
production with this handle as a right side
Error Recovery:
1. Each empty entry is filled with a pointer to an error
   routine.
2. Decide which right-hand side the popped handle "looks
   like", and try to recover from that situation.
LR Parsers
• The most powerful (yet efficient) shift-reduce
  parsing method is LR(k) parsing:
    L – left-to-right scanning
    R – right-most derivation (in reverse)
    k – number of lookahead symbols (when k is omitted, it is 1)
LR Parsers
• LR parsing is attractive because:
  – LR parsing is the most general non-backtracking shift-reduce
    parsing method, yet it is still efficient.
  – The class of grammars that can be parsed using LR
    methods is a proper superset of the class of grammars
    that can be parsed with predictive parsers:
        LL(1) grammars ⊂ LR(1) grammars
  – An LR parser can detect a syntactic error as soon as it is
    possible to do so in a left-to-right scan of the input.
LR Parsers
• LR-Parsers
  – cover a wide range of grammars.
  – SLR – simple LR parser
  – CLR – most general LR parser
  – LALR – intermediate LR parser (lookahead LR parser)
  – SLR, CLR and LALR parsers work the same way (they use the same
    algorithm); only their parsing tables are different.
LR Parsing Algorithm
An LR parser has:
  – a stack holding alternating states and grammar symbols,
        S0 X1 S1 ... Xm Sm    (Sm on top),
  – an input buffer a1 ... ai ... an $, and
  – a parsing table with two parts:
      • an Action table, indexed by states and terminals (including $);
        each entry is one of four different actions, and
      • a Goto table, indexed by states and non-terminals; each entry is
        a state number.
The parser also produces an output: the sequence of reductions performed.
A Configuration of the LR Parsing Algorithm
• A configuration of an LR parser is:

      ( S0 X1 S1 ... Xm Sm,  ai ai+1 ... an $ )
            Stack               Rest of Input

• Sm and ai decide the parser action by consulting the
  parsing action table. (The initial stack contains just S0.)
• A configuration of an LR parser represents the right-
  sentential form:
      X1 ... Xm ai ai+1 ... an $
Actions of an LR-Parser
1. shift s -- shifts the next input symbol and the state s onto
   the stack:
   ( S0 X1 S1 ... Xm Sm, ai ai+1 ... an $ ) → ( S0 X1 S1 ... Xm Sm ai s, ai+1 ... an $ )
2. reduce A → β (or rn, where n is a production number)
   – pop 2r items from the stack, where r = |β|;
   – then push A and s, where s = goto[sm-r, A]:
   ( S0 X1 S1 ... Xm Sm, ai ai+1 ... an $ ) → ( S0 X1 S1 ... Xm-r Sm-r A s, ai ... an $ )
   – The output is the reducing production: reduce A → β
3. Accept -- parsing successfully completed.
4. Error -- the parser detected an error (an empty entry in the
   action table).
Reduce Action
• Pop 2r items from the stack, where r = |β| and
  β = Y1Y2...Yr;
• then push A and s, where s = goto[sm-r, A]:

  ( S0 X1 S1 ... Xm-r Sm-r Y1 Sm-r+1 ... Yr Sm, ai ai+1 ... an $ )
      → ( S0 X1 S1 ... Xm-r Sm-r A s, ai ai+1 ... an $ )

• In fact, Y1Y2...Yr is a handle:
  X1 ... Xm-r A ai ... an $ ⇒ X1 ... Xm-r Y1...Yr ai ai+1 ... an $
state id + * ( ) $ E T F
0 s5 s4 1 2 3
1 s6 acc
2 r2 s7 r2 r2
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
(SLR) Parsing Tables for Expression Grammar
Action Table Goto Table
1) E  E+T
2) E  T
3) T  T*F
4)T  F 5)
F  (E)
6) F  id
Actions of an (S)LR-Parser -- Example
stack        input        action                output
0            id*id+id$    shift 5
0id5         *id+id$      reduce by F → id      F → id
0F3          *id+id$      reduce by T → F       T → F
0T2          *id+id$      shift 7
0T2*7        id+id$       shift 5
0T2*7id5     +id$         reduce by F → id      F → id
0T2*7F10     +id$         reduce by T → T*F     T → T*F
0T2          +id$         reduce by E → T       E → T
0E1          +id$         shift 6
0E1+6        id$          shift 5
0E1+6id5     $            reduce by F → id      F → id
0E1+6F3      $            reduce by T → F       T → F
0E1+6T9      $            reduce by E → E+T     E → E+T
0E1          $            accept
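A Python sketch of the table-driven LR loop, with the SLR Action/Goto tables
above encoded as dictionaries. The stack holds states only (grammar symbols
are redundant for the decisions), which is a common simplification:

  # Productions 1-6: E->E+T, E->T, T->T*F, T->F, F->(E), F->id
  PRODS = {1: ('E', 3), 2: ('E', 1), 3: ('T', 3),
           4: ('T', 1), 5: ('F', 3), 6: ('F', 1)}   # (head, |right side|)
  ACTION = {
      (0,'id'):('s',5), (0,'('):('s',4), (1,'+'):('s',6), (1,'$'):('acc',0),
      (2,'+'):('r',2), (2,'*'):('s',7), (2,')'):('r',2), (2,'$'):('r',2),
      (3,'+'):('r',4), (3,'*'):('r',4), (3,')'):('r',4), (3,'$'):('r',4),
      (4,'id'):('s',5), (4,'('):('s',4),
      (5,'+'):('r',6), (5,'*'):('r',6), (5,')'):('r',6), (5,'$'):('r',6),
      (6,'id'):('s',5), (6,'('):('s',4), (7,'id'):('s',5), (7,'('):('s',4),
      (8,'+'):('s',6), (8,')'):('s',11),
      (9,'+'):('r',1), (9,'*'):('s',7), (9,')'):('r',1), (9,'$'):('r',1),
      (10,'+'):('r',3), (10,'*'):('r',3), (10,')'):('r',3), (10,'$'):('r',3),
      (11,'+'):('r',5), (11,'*'):('r',5), (11,')'):('r',5), (11,'$'):('r',5),
  }
  GOTO = {(0,'E'):1,(0,'T'):2,(0,'F'):3,(4,'E'):8,(4,'T'):2,(4,'F'):3,
          (6,'T'):9,(6,'F'):3,(7,'F'):10}

  def parse(tokens):
      stack, i = [0], 0                     # stack of states
      while True:
          kind, n = ACTION.get((stack[-1], tokens[i]), ('err', 0))
          if kind == 's':
              stack.append(n); i += 1       # shift: push state n
          elif kind == 'r':
              head, size = PRODS[n]
              del stack[len(stack) - size:] # pop one state per symbol of beta
              stack.append(GOTO[(stack[-1], head)])
              print(f'reduce by production {n} ({head})')
          elif kind == 'acc':
              print('accept'); return
          else:
              print('error'); return

  parse(['id', '*', 'id', '+', 'id', '$'])

Running it prints the same reduction sequence as the trace above
(F → id, T → F, ..., E → E+T, accept).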
Constructing SLR Parsing Tables – LR(0) Item
• An LR(0) item of a grammar G is a production of G with a dot at
  some position of the right side.
• Ex: A → aBb    Possible LR(0) items:  A → .aBb
     (four different possibilities)     A → a.Bb
                                        A → aB.b
                                        A → aBb.
• Sets of LR(0) items will be the states of the action and goto tables of
  the SLR parser.
• A collection of sets of LR(0) items (the canonical LR(0) collection)
  is the basis for constructing SLR parsers.
• Augmented Grammar:
  G' is G with a new production rule S' → S, where S' is the new
  starting symbol.
The Closure Operation
• If I is a set of LR(0) items for a grammar G,
  then closure(I) is the set of LR(0) items
  constructed from I by two rules (a sketch follows):
  1. Initially, every LR(0) item in I is added to closure(I).
  2. If A → α.Bβ is in closure(I) and B → γ is a production rule
     of G, then B → .γ will be in closure(I).
  We apply rule 2 until no more new LR(0) items can
  be added to closure(I).
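A Python sketch of these two rules. An item is represented as a tuple
(head, body, dot position); the grammar encoding is an illustrative choice:

  GRAMMAR = {               # the expression grammar, augmented
      "E'": [('E',)],
      'E': [('E', '+', 'T'), ('T',)],
      'T': [('T', '*', 'F'), ('F',)],
      'F': [('(', 'E', ')'), ('id',)],
  }

  def closure(items):
      result = set(items)
      changed = True
      while changed:                 # apply rule 2 until a fixpoint
          changed = False
          for head, body, dot in list(result):
              if dot < len(body) and body[dot] in GRAMMAR:  # dot before B
                  for prod in GRAMMAR[body[dot]]:
                      item = (body[dot], prod, 0)           # B -> .gamma
                      if item not in result:
                          result.add(item); changed = True
      return frozenset(result)

  I = closure({("E'", ('E',), 0)})
  print(len(I), "items")    # 7, the items of the example below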
The Closure Operation -- Example
E’  E closure({E’  .E}) =
{ E’  .E
E  .E+T
E  .T
T  .T*F
T  .F
E  E+T
kernel items
E  T
T  T*F
T  F
F  (E)
F  id F  .(E)
F  .id }
Goto Operation
• If I is a set of LR(0) items and X is a grammar symbol (terminal or
  non-terminal), then goto(I,X) is defined as follows:
  – If A → α.Xβ is in I, then every item in closure({A → αX.β}) will be
    in goto(I,X).
Example:
  I = { E' → .E, E → .E+T, E → .T,
        T → .T*F, T → .F,
        F → .(E), F → .id }
  goto(I,E) = { E' → E., E → E.+T }
  goto(I,T) = { E → T., T → T.*F }
  goto(I,F) = { T → F. }
  goto(I,() = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F,
                F → .(E), F → .id }
  goto(I,id) = { F → id. }
Construction of The Canonical LR(0) Collection
• To create the SLR parsing tables for a grammar G, we
  create the canonical LR(0) collection of the augmented grammar G'.
• Algorithm (a runnable sketch follows):
    C is { closure({S' → .S}) }
    repeat the following until no more sets of LR(0) items can be added to C:
      for each I in C and each grammar symbol X
        if goto(I,X) is not empty and not in C
          add goto(I,X) to C
• The goto function is a DFA on the sets in C.
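A self-contained Python sketch of goto and the canonical collection; it
repeats the closure from the earlier sketch so that it runs standalone:

  GRAMMAR = {"E'": [('E',)], 'E': [('E','+','T'), ('T',)],
             'T': [('T','*','F'), ('F',)], 'F': [('(','E',')'), ('id',)]}

  def closure(items):                      # as in the earlier sketch
      result, changed = set(items), True
      while changed:
          changed = False
          for head, body, dot in list(result):
              if dot < len(body) and body[dot] in GRAMMAR:
                  for prod in GRAMMAR[body[dot]]:
                      item = (body[dot], prod, 0)
                      if item not in result:
                          result.add(item); changed = True
      return frozenset(result)

  def goto(I, X):
      moved = {(h, b, d + 1) for (h, b, d) in I if d < len(b) and b[d] == X}
      return closure(moved) if moved else frozenset()

  def canonical_collection():
      symbols = {'E', 'T', 'F', '+', '*', '(', ')', 'id'}
      C = [closure({("E'", ('E',), 0)})]   # I0
      changed = True
      while changed:
          changed = False
          for I in list(C):
              for X in symbols:
                  J = goto(I, X)
                  if J and J not in C:
                      C.append(J); changed = True
      return C

  print(len(canonical_collection()), "sets of items")  # 12: I0 ... I11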
The Canonical LR(0) Collection -- Example
Transition diagram (DFA) of the goto function for the expression grammar;
the states are I0 ... I11 and the edges are:

  goto(I0,E)=I1   goto(I0,T)=I2   goto(I0,F)=I3   goto(I0,()=I4   goto(I0,id)=I5
  goto(I1,+)=I6   goto(I2,*)=I7
  goto(I4,E)=I8   goto(I4,T)=I2   goto(I4,F)=I3   goto(I4,()=I4   goto(I4,id)=I5
  goto(I6,T)=I9   goto(I6,F)=I3   goto(I6,()=I4   goto(I6,id)=I5
  goto(I7,F)=I10  goto(I7,()=I4   goto(I7,id)=I5
  goto(I8,+)=I6   goto(I8,))=I11  goto(I9,*)=I7
Constructing the SLR Parsing Table
(for an augmented grammar G')
1. Construct the canonical collection of sets of
   LR(0) items for G':  C = {I0,...,In}
2. Create the parsing action table as follows:
   • If a is a terminal, A → α.aβ is in Ii and goto(Ii,a) = Ij, then
     action[i,a] is shift j.
   • If A → α. is in Ii, then action[i,a] is reduce A → α for all a
     in FOLLOW(A), where A ≠ S'.
   • If S' → S. is in Ii, then action[i,$] is accept.
   • If any conflicting actions are generated by these rules, the
     grammar is not SLR(1).
3. Create the parsing goto table:
   • for all non-terminals A, if goto(Ii,A) = Ij then goto[i,A] = j
4. All entries not defined by (2) and (3) are
   errors.
5. The initial state of the parser is the one containing S' → .S.
state id + * ( ) $ E T F
0 s5 s4 1 2 3
1 s6 acc
2 r2 s7 r2 r2
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
Parsing Tables of Expression Grammar
Action Table Goto Table
SLR(1) Grammar
• An LR parser using SLR(1) parsing tables for a
  grammar G is called the SLR(1) parser for
  G.
• If a grammar G has an SLR(1) parsing table, it
  is called an SLR(1) grammar (or SLR grammar
  for short).
• Every SLR grammar is unambiguous, but not
  every unambiguous grammar is an SLR
  grammar.
shift/reduce and reduce/reduce conflicts
• If a state does not know whether it will make a shift
operation or reduction for a terminal, we say that
there is a shift/reduce conflict.
• If a state does not know whether it will make a
reduction operation using the production rule i or j
for a terminal, we say that there is a reduce/reduce
conflict.
• If the SLR parsing table of a grammar G has a conflict,
we say that the grammar is not an SLR grammar.
Conflict Example
S' → S      S → L=R | R      L → *R | id      R → L

I0: S' → .S      I1: S' → S.       I6: S → L=.R      I9: S → L=R.
    S → .L=R     I2: S → L.=R          R → .L
    S → .R           R → L.            L → .*R
    L → .*R      I3: S → R.            L → .id
    L → .id      I4: L → *.R       I7: L → *R.
    R → .L           R → .L         I8: R → L.
                     L → .*R
                     L → .id
                 I5: L → id.

Problem: in state I2, on input '=':
    shift 6            (from the item S → L.=R)
    reduce by R → L    (since FOLLOW(R) = {=, $} contains '=')
⇒ shift/reduce conflict
Conflict Example 2
S → AaAb        I0: S' → .S
S → BbBa            S → .AaAb
A → ε               S → .BbBa
B → ε               A → .
                    B → .

Problem: FOLLOW(A) = {a, b} and FOLLOW(B) = {a, b}

  on input a:  reduce by A → ε or reduce by B → ε  ⇒ reduce/reduce conflict
  on input b:  reduce by A → ε or reduce by B → ε  ⇒ reduce/reduce conflict
Constructing Canonical LR(1) Parsing Tables
• In the SLR method, state i makes a reduction by A → α
  when the current token is a:
  – if A → α. is in Ii and a is in FOLLOW(A).
• In some situations, A cannot be followed by the
  terminal a in a right-sentential form when α
  and state i are on the top of the stack. This means
  that making the reduction in this case is not correct.

  S → AaAb        S ⇒ AaAb ⇒ Aab ⇒ ab
  S → BbBa        S ⇒ BbBa ⇒ Bba ⇒ ba
  A → ε
  B → ε           (e.g., at the beginning of the input, reducing by A → ε
                   is valid only on lookahead a, and by B → ε only on b,
                   even though FOLLOW(A) = FOLLOW(B) = {a, b})
LR(1) Item
• To avoid some of these invalid reductions, the states need
  to carry more information.
• Extra information is put into a state by including a
  terminal symbol as a second component of each item.
• An LR(1) item is:
      A → α.β, a
  where a is the lookahead of the LR(1) item
  (a is a terminal or the end-marker $).
LR(1) Item (cont.)
• When β (in the LR(1) item A → α.β, a) is not
  empty, the lookahead does not have any effect.
• When β is empty (A → α., a), we do the
  reduction by A → α only if the next input symbol
  is a (not for every terminal in FOLLOW(A)).
• A state will contain   A → α., a1      where {a1,...,an} ⊆ FOLLOW(A)
                         ...
                         A → α., an
Canonical Collection of Sets of LR(1) Items
• The construction of the canonical collection of
  sets of LR(1) items is similar to the
  construction of the canonical collection of
  sets of LR(0) items, except that the closure and goto
  operations work a little differently.

closure(I) is (where I is a set of LR(1) items):
  – every LR(1) item in I is in closure(I);
  – if A → α.Bβ, a is in closure(I) and B → γ is a production rule of G,
    then B → .γ, b will be in closure(I) for each terminal b in
    FIRST(βa).
goto operation
• If I is a set of LR(1) items and X is a grammar
  symbol (terminal or non-terminal), then
  goto(I,X) is defined as follows:
  – If A → α.Xβ, a is in I,
    then every item in closure({A → αX.β, a}) will be in
    goto(I,X).
Construction of the Canonical LR(1) Collection
• Algorithm:
    C is { closure({S' → .S, $}) }
    repeat the following until no more sets of LR(1) items can be added to C:
      for each I in C and each grammar symbol X
        if goto(I,X) is not empty and not in C
          add goto(I,X) to C
• The goto function is a DFA on the sets in C.
A Short Notation for Sets of LR(1) Items
• A set of LR(1) items containing the items
      A → α.β, a1
      ...
      A → α.β, an
  can be written as
      A → α.β, a1/a2/.../an
Canonical LR(1) Collection -- Example
S → AaAb      I0: S' → .S, $        I1: S' → S., $
S → BbBa          S → .AaAb, $      I2: S → A.aAb, $
A → ε             S → .BbBa, $      I3: S → B.bBa, $
B → ε             A → ., a
                  B → ., b

I4: S → Aa.Ab, $      I6: S → AaA.b, $      I8: S → AaAb., $
    A → ., b
I5: S → Bb.Ba, $      I7: S → BbB.a, $      I9: S → BbBa., $
    B → ., a

(transitions: I0 --S--> I1, I0 --A--> I2, I0 --B--> I3, I2 --a--> I4,
 I3 --b--> I5, I4 --A--> I6, I5 --B--> I7, I6 --b--> I8, I7 --a--> I9)
Canonical LR(1) Collection -- Example 2
S' → S
1) S → L=R     I0: S' → .S, $         I4: L → *.R, $/=      I9:  S → L=R., $
2) S → R           S → .L=R, $            R → .L, $/=       I10: R → L., $
3) L → *R          S → .R, $              L → .*R, $/=      I11: L → *.R, $
4) L → id          L → .*R, $/=           L → .id, $/=           R → .L, $
5) R → L           L → .id, $/=       I5: L → id., $/=           L → .*R, $
                   R → .L, $          I6: S → L=.R, $            L → .id, $
               I1: S' → S., $             R → .L, $         I12: L → id., $
               I2: S → L.=R, $            L → .*R, $        I13: L → *R., $
                   R → L., $              L → .id, $
               I3: S → R., $          I7: L → *R., $/=
                                      I8: R → L., $/=

(transitions: I0: S→I1, L→I2, R→I3, *→I4, id→I5;   I2: =→I6;
 I4: R→I7, L→I8, *→I4, id→I5;   I6: R→I9, L→I10, *→I11, id→I12;
 I11: R→I13, L→I10, *→I11, id→I12)

States with the same core:  I4 and I11,  I5 and I12,  I7 and I13,  I8 and I10.
Construction of LR(1) Parsing Tables
1. Construct the canonical collection of sets of
   LR(1) items for G':  C = {I0,...,In}
2. Create the parsing action table as follows:
   • If a is a terminal, A → α.aβ, b is in Ii and goto(Ii,a) = Ij, then
     action[i,a] is shift j.
   • If A → α., a is in Ii, then action[i,a] is reduce A → α, where
     A ≠ S'.
   • If S' → S., $ is in Ii, then action[i,$] is accept.
   • If any conflicting actions are generated by these rules, the
     grammar is not LR(1).
3. Create the parsing goto table:
   • for all non-terminals A, if goto(Ii,A) = Ij then goto[i,A] = j
4. All entries not defined by (2) and (3) are
   errors.
5. The initial state of the parser is the one containing S' → .S, $.
LR(1) Parsing Tables -- (for Example 2)

state   id    *     =     $     S   L   R
0       s5    s4                1   2   3
1                   acc
2                   s6    r5
3                         r2
4       s5    s4                    8   7
5                   r4    r4
6       s12   s11                   10  9
7                   r3    r3
8                   r5    r5
9                         r1
10                        r5
11      s12   s11                   10  13
12                        r4
13                        r3

no shift/reduce or reduce/reduce conflict
⇒ so, it is an LR(1) grammar
LALR Parsing Tables
• LALR stands for LookAhead LR.
• LALR parsers are often used in practice because LALR
parsing tables are smaller than LR(1) parsing tables.
• The number of states in the SLR and LALR parsing tables for
  a grammar G is the same.
• But LALR parsers recognize more grammars than SLR
  parsers.
• yacc creates a LALR parser for the given grammar.
• A state of LALR parser will be again a set of LR(1) items.
Creating LALR Parsing Tables
                            shrink the number of states
  Canonical LR(1) Parser  ----------------------------->  LALR Parser

• This shrinking process may introduce a
  reduce/reduce conflict in the resulting LALR
  parser (in which case the grammar is NOT LALR).
• But this shrinking process does not produce a
  shift/reduce conflict.
The Core of A Set of LR(1) Items
• The core of a set of LR(1) items is the set of the first components
  of its items (i.e., the items with their lookaheads dropped).
  Ex:  S → L.=R, $               S → L.=R
       R → L., $     ⇒  core:    R → L.
• We find the states (sets of LR(1) items) in a canonical LR(1)
  parser with the same core, and then merge them into a single
  state.
  I1: L → id., =     A new state:  I12: L → id., =/$
  I2: L → id., $
  ⇒ they have the same core, so merge them
The Core of A Set of LR(1) Items
• We will do this for all states of a canonical
LR(1) parser to get the states of the LALR
parser.
• In fact, the number of the states of the LALR
parser for a grammar will be equal to the
number of states of the SLR parser for that
grammar.
Creation of LALR Parsing Tables
• Create the canonical LR(1) collection of the sets of LR(1)
  items for the given grammar.
• Find each core; find all sets having that same core; replace
  those sets having the same core with a single set which is
  their union:
      C = {I0,...,In}  →  C' = {J1,...,Jm}  where m ≤ n
• Create the parsing tables (action and goto tables) in the same way as
  for an LR(1) parser (a merging sketch follows):
  – Note that if J = I1 ∪ ... ∪ Ik, then since I1,...,Ik have the same core,
    the cores of goto(I1,X),...,goto(Ik,X) must also be the same.
  – So goto(J,X) = K, where K is the union of all sets of items having the
    same core as goto(I1,X).
• If no conflict is introduced, the grammar is an LALR(1)
  grammar. (We may only introduce reduce/reduce
  conflicts; we cannot introduce a shift/reduce conflict.)
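A small Python sketch of the core-merging step. Here an LR(1) item is a
(head, body, dot, lookahead) tuple, a state is a frozenset of items, and the
core simply drops the lookahead; the two toy states below are reduced
fragments of I4 and I11 from Example 2, not the full states:

  from collections import defaultdict

  def core(state):
      return frozenset((h, b, d) for (h, b, d, _) in state)

  def merge_by_core(states):
      groups = defaultdict(set)
      for state in states:            # union all states sharing a core
          groups[core(state)] |= state
      return [frozenset(s) for s in groups.values()]

  I4  = frozenset({('L', ('*', 'R'), 1, '$'), ('L', ('*', 'R'), 1, '=')})
  I11 = frozenset({('L', ('*', 'R'), 1, '$')})
  print(len(merge_by_core([I4, I11])))   # 1: they collapse into I411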
Shift/Reduce Conflict
• We claimed that we cannot introduce a shift/reduce conflict during the
  shrinking process that creates the states of an LALR parser.
• Assume that we could introduce a shift/reduce conflict. In that case, a
  state of the LALR parser would have to contain:
      A → α., a     and     B → β.aγ, b
• This means that some state of the canonical LR(1) parser must contain:
      A → α., a     and     B → β.aγ, c
  But this state already has a shift/reduce conflict, i.e. the original
  canonical LR(1) parser had a conflict.
  (The reason: the shift operation does not depend on lookaheads.)
Reduce/Reduce Conflict
• But we may introduce a reduce/reduce conflict during the shrinking
  process that creates the states of an LALR parser:

  I1: A → α., a        I2: A → α., b
      B → β., b            B → β., c

  merged:  I12: A → α., a/b
                B → β., b/c    ⇒ reduce/reduce conflict (on lookahead b)
Canonical LALR(1) Collection -- Example 2
S' → S
1) S → L=R     I0: S' → .S, $           I411: L → *.R, $/=     I9: S → L=R., $
2) S → R           S → .L=R, $                R → .L, $/=
3) L → *R          S → .R, $                  L → .*R, $/=
4) L → id          L → .*R, $/=               L → .id, $/=
5) R → L           L → .id, $/=         I512: L → id., $/=
                   R → .L, $            I6:   S → L=.R, $
               I1: S' → S., $                 R → .L, $
               I2: S → L.=R, $                L → .*R, $
                   R → L., $                  L → .id, $
               I3: S → R., $            I713: L → *R., $/=
                                        I810: R → L., $/=

(transitions: I0: S→I1, L→I2, R→I3, *→I411, id→I512;   I2: =→I6;
 I411: R→I713, L→I810, *→I411, id→I512;
 I6: R→I9, L→I810, *→I411, id→I512)

Same cores merged:  I4/I11 → I411,  I5/I12 → I512,  I7/I13 → I713,
I8/I10 → I810.
LALR(1) Parsing Tables -- (for Example 2)

state   id    *     =     $     S   L   R
0       s5    s4                1   2   3
1                   acc
2                   s6    r5
3                         r2
4       s5    s4                    8   7
5                   r4    r4
6       s12   s11                   10  9
7                   r3    r3
8                   r5    r5
9                         r1

no shift/reduce or reduce/reduce conflict
⇒ so, it is an LALR(1) grammar

(States 4, 5, 7 and 8 stand for the merged states I411, I512, I713 and
I810, so the shifts s11 and s12 also lead into merged states.)
Using Ambiguous Grammars
• All grammars used in the construction of LR parsing tables must be
  unambiguous.
• Can we create LR parsing tables for ambiguous grammars?
  – Yes, but they will have conflicts.
  – We can resolve these conflicts in favor of one of the actions to
    disambiguate the grammar.
  – In the end, we will again have an unambiguous grammar.
• Why do we want to use an ambiguous grammar?
  – Some ambiguous grammars are much more natural, and a
    corresponding unambiguous grammar can be very complex.
  – Usage of an ambiguous grammar may eliminate unnecessary reductions.
• Ex.
                                     E → E+T | T
  E → E+E | E*E | (E) | id    ⇔     T → T*F | F
                                     F → (E) | id
Sets of LR(0) Items for the Ambiguous Grammar

I0: E' → .E       I1: E' → E.       I2: E → (.E)      I3: E → id.
    E → .E+E          E → E.+E          E → .E+E
    E → .E*E          E → E.*E          E → .E*E
    E → .(E)                            E → .(E)
    E → .id                             E → .id

I4: E → E+.E      I5: E → E*.E      I6: E → (E.)
    E → .E+E          E → .E+E          E → E.+E
    E → .E*E          E → .E*E          E → E.*E
    E → .(E)          E → .(E)
    E → .id           E → .id

I7: E → E+E.      I8: E → E*E.      I9: E → (E).
    E → E.+E          E → E.+E
    E → E.*E          E → E.*E

(transitions: I0: E→I1, (→I2, id→I3;   I1: +→I4, *→I5;
 I2: E→I6, (→I2, id→I3;   I4: E→I7, (→I2, id→I3;
 I5: E→I8, (→I2, id→I3;   I6: )→I9, +→I4, *→I5;
 I7: +→I4, *→I5;   I8: +→I4, *→I5)
SLR Parsing Tables for the Ambiguous Grammar
FOLLOW(E) = { $, +, *, ) }
State I7 has shift/reduce conflicts on the symbols + and *
(I7 is reached via I0 --E--> I1 --+--> I4 --E--> I7):

  when the current token is +:
      shift   ⇒ + is right-associative
      reduce  ⇒ + is left-associative
  when the current token is *:
      shift   ⇒ * has higher precedence than +
      reduce  ⇒ + has higher precedence than *

State I8 likewise has shift/reduce conflicts on + and *
(I8 is reached via I0 --E--> I1 --*--> I5 --E--> I8):

  when the current token is *:
      shift   ⇒ * is right-associative
      reduce  ⇒ * is left-associative
  when the current token is +:
      shift   ⇒ + has higher precedence than *
      reduce  ⇒ * has higher precedence than +
id + * ( ) $ E
0 s3 s2 1
1 s4 s5 acc
2 s3 s2 6
3 r4 r4 r4 r4
4 s3 s2 7
5 s3 s2 8
6 s4 s5 s9
7 r1 s5 r1 r1
8 r2 r2 r2 r2
9 r3 r3 r3 r3
SLR-Parsing Tables for Ambiguous Grammar
Action Goto
Error Recovery in LR Parsing
• An LR parser will detect an error when it consults the
parsing action table and finds an error entry. All empty
entries in the action table are error entries.
• Errors are never detected by consulting the goto table.
• An LR parser will announce error as soon as there is no
valid continuation for the scanned portion of the input.
• A canonical LR parser (LR(1) parser) will never make
even a single reduction before announcing an error.
• The SLR and LALR parsers may make several reductions
before announcing an error.
• But, all LR parsers (LR(1), LALR and SLR parsers) will
never shift an erroneous input symbol onto the stack.
Panic Mode Error Recovery in LR Parsing
• Scan down the stack until a state s with a goto on a
particular nonterminal A is found. (Get rid of
everything from the stack before this state s).
• Discard zero or more input symbols until a symbol a is
  found that can legitimately follow A.
  – The symbol a is simply in FOLLOW(A), but this may not work for all
    situations.
• The parser stacks the nonterminal A and the state
  goto[s,A], and it resumes normal parsing.
• This nonterminal A normally represents a major program
  construct (there can be more than one choice for A):
  – stmt, expr, block, ...
Phrase-Level Error Recovery in LR Parsing
• Each empty entry in the action table is marked
with a specific error routine.
• An error routine reflects the error that the user
most likely will make in that case.
• An error routine inserts the symbols into the
stack or the input (or it deletes the symbols
from the stack and the input, or it can do both
insertion and deletion).
– missing operand
– unbalanced right parenthesis
UNIT-3
PART-A
Semantic analysis
Syntax Analysis Example
a := b + c * 100
The seven tokens are grouped into a parse tree:

  assignment stmt
  ├─ identifier: a
  ├─ :=
  └─ expression
     ├─ expression: identifier b
     ├─ +
     └─ expression: c * 100
Example of Parse Tree
Given the grammar:
  list  → list + digit                            (2.2)
  list  → list - digit                            (2.3)
  list  → digit                                   (2.4)
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9   (2.5)

What is the parse tree for 9-5+2?

           list
         /   |   \
      list   +   digit
     /  |  \        |
   list -  digit    2
     |       |
   digit     5
     |
     9
Abstract Syntax Tree (AST)
The AST is a condensed/simplified/abstract form of
the parse tree in which:
1. Operators are directly associated with interior nodes
(non-terminals)
2. Chains of single productions are collapsed.
3. Terminals that have no attributes are ignored, i.e.,
the corresponding leaf nodes are discarded.
Abstract and Concrete Trees
  Parse (concrete) tree:           Abstract syntax tree:

           list                            +
         /   |   \                        / \
      list   +   digit                   -   2
     /  |  \        |                   / \
   list -  digit    2                  9   5
     |       |
   digit     5
     |
     9
Advantages of the AST Representation
• Convenient representation for semantic analysis and
  intermediate-language (IL) generation
• Useful for building other programming-language tools, e.g., a
  syntax-directed editor
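A minimal Python sketch of the AST idea: two node classes and the tree for
9-5+2 from the previous slide, plus an evaluator as one example of the
"semantic analysis over the AST" the bullet points describe. The class and
function names are illustrative:

  class Num:
      def __init__(self, value): self.value = value

  class BinOp:
      def __init__(self, op, left, right):
          self.op, self.left, self.right = op, left, right

  # AST for 9-5+2: chains of single productions and attribute-free
  # terminals collapse into just two operator nodes and three leaves.
  ast = BinOp('+', BinOp('-', Num(9), Num(5)), Num(2))

  def evaluate(node):
      if isinstance(node, Num):
          return node.value
      ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b}
      return ops[node.op](evaluate(node.left), evaluate(node.right))

  print(evaluate(ast))   # 6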
Syntax-Directed Translation (SDT)
Syntax-directed translation is a method of translating a string into a
sequence of actions by attaching one such action to each rule of a grammar.
A syntax-directed translation is defined by augmenting the CFG: a translation
rule is defined for each production. A translation rule defines the translation
of the left-hand-side non-terminal.
Syntax-Directed Definitions and Translation Schemes
A. Syntax-Directed Definitions:
   • give high-level specifications for translations;
   • hide many implementation details, such as the order of evaluation of
     semantic actions;
   • we associate a production rule with a set of semantic actions, and we
     do not say when they will be evaluated.
B. Translation Schemes:
   • indicate the order of evaluation of the semantic actions associated
     with a production rule;
   • in other words, translation schemes give some information about
     implementation details.
Example Syntax-Directed Definition
term ::= ID
  { term.place := ID.place ; term.code := "" }
term1 ::= term2 * ID
  { term1.place := newtemp( );
    term1.code := term2.code || ID.code ||
                  gen(term1.place ':=' term2.place '*' ID.place) }
expr ::= term
  { expr.place := term.place ; expr.code := term.code }
expr1 ::= expr2 + term
  { expr1.place := newtemp( );
    expr1.code := expr2.code || term.code ||
                  gen(expr1.place ':=' expr2.place '+' term.place) }
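A Python sketch in the spirit of the definition above, run over a hand-built
tuple AST instead of during parsing. newtemp and gen are simplified
stand-ins for the slide's helper routines, and the place/code pair mirrors
the two attributes of each non-terminal:

  class SDD:
      def __init__(self): self.count = 0

      def newtemp(self):
          self.count += 1
          return f't{self.count}'

      def gen(self, place, op, left, right):
          return [f'{place} := {left} {op} {right}']

      def expr(self, node):
          """Returns (place, code); leaves play the role of term ::= ID."""
          if isinstance(node, str):          # an identifier leaf
              return node, []
          op, l, r = node
          lp, lc = self.expr(l)
          rp, rc = self.expr(r)
          place = self.newtemp()
          return place, lc + rc + self.gen(place, op, lp, rp)

  place, code = SDD().expr(('+', 'a', ('*', 'b', 'c')))
  print('\n'.join(code))   # t1 := b * c   then   t2 := a + t1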
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx

More Related Content

Similar to COMPILER DESIGN PPTS.pptx

Pros and cons of c as a compiler language
  Pros and cons of c as a compiler language  Pros and cons of c as a compiler language
Pros and cons of c as a compiler language
Ashok Raj
 
Compiler construction tools
Compiler construction toolsCompiler construction tools
Compiler construction tools
Akhil Kaushik
 
Chapter1pdf__2021_11_23_10_53_20.pdf
Chapter1pdf__2021_11_23_10_53_20.pdfChapter1pdf__2021_11_23_10_53_20.pdf
Chapter1pdf__2021_11_23_10_53_20.pdf
DrIsikoIsaac
 
Concept of compiler in details
Concept of compiler in detailsConcept of compiler in details
Concept of compiler in details
kazi_aihtesham
 
4_5802928814682016556.pptx
4_5802928814682016556.pptx4_5802928814682016556.pptx
4_5802928814682016556.pptx
AshenafiGirma5
 
Introduction to compiler
Introduction to compilerIntroduction to compiler
Introduction to compiler
Abha Damani
 
CD - CH1 - Introduction to compiler design.pptx
CD - CH1 - Introduction to compiler design.pptxCD - CH1 - Introduction to compiler design.pptx
CD - CH1 - Introduction to compiler design.pptx
ZiyadMohammed17
 
Cd ch1 - introduction
Cd   ch1 - introductionCd   ch1 - introduction
Cd ch1 - introduction
mengistu23
 
System software module 4 presentation file
System software module 4 presentation fileSystem software module 4 presentation file
System software module 4 presentation file
jithujithin657
 
Presentation1
Presentation1Presentation1
Presentation1
Zarin Tasnim
 
Presentation1
Presentation1Presentation1
Presentation1
Zarin Tasnim
 
COMPILER CONSTRUCTION KU 1.pptx
COMPILER CONSTRUCTION KU 1.pptxCOMPILER CONSTRUCTION KU 1.pptx
COMPILER CONSTRUCTION KU 1.pptx
Rossy719186
 
1._Introduction_.pptx
1._Introduction_.pptx1._Introduction_.pptx
1._Introduction_.pptx
Anbarasan Radhakrishnan R
 
Chapter 2 Program language translation.pptx
Chapter 2 Program language translation.pptxChapter 2 Program language translation.pptx
Chapter 2 Program language translation.pptx
dawod yimer
 
Unit 1.pptx
Unit 1.pptxUnit 1.pptx
Unit 1.pptx
NISHASOMSCS113
 
Lecture 01 introduction to compiler
Lecture 01 introduction to compilerLecture 01 introduction to compiler
Lecture 01 introduction to compiler
Iffat Anjum
 
unit1pdf__2021_12_14_12_37_34.pdf
unit1pdf__2021_12_14_12_37_34.pdfunit1pdf__2021_12_14_12_37_34.pdf
unit1pdf__2021_12_14_12_37_34.pdf
DrIsikoIsaac
 
Chapter-1.pptx compiler Design Course Material
Chapter-1.pptx compiler Design Course MaterialChapter-1.pptx compiler Design Course Material
Chapter-1.pptx compiler Design Course Material
gadisaAdamu
 
Introduction to Compiler Construction
Introduction to Compiler Construction Introduction to Compiler Construction
Introduction to Compiler Construction
Sarmad Ali
 
Compilers.pptx
Compilers.pptxCompilers.pptx
Compilers.pptx
MohammedMohammed578197
 

Similar to COMPILER DESIGN PPTS.pptx (20)

Pros and cons of c as a compiler language
  Pros and cons of c as a compiler language  Pros and cons of c as a compiler language
Pros and cons of c as a compiler language
 
Compiler construction tools
Compiler construction toolsCompiler construction tools
Compiler construction tools
 
Chapter1pdf__2021_11_23_10_53_20.pdf
Chapter1pdf__2021_11_23_10_53_20.pdfChapter1pdf__2021_11_23_10_53_20.pdf
Chapter1pdf__2021_11_23_10_53_20.pdf
 
Concept of compiler in details
Concept of compiler in detailsConcept of compiler in details
Concept of compiler in details
 
4_5802928814682016556.pptx
4_5802928814682016556.pptx4_5802928814682016556.pptx
4_5802928814682016556.pptx
 
Introduction to compiler
Introduction to compilerIntroduction to compiler
Introduction to compiler
 
CD - CH1 - Introduction to compiler design.pptx
CD - CH1 - Introduction to compiler design.pptxCD - CH1 - Introduction to compiler design.pptx
CD - CH1 - Introduction to compiler design.pptx
 
Cd ch1 - introduction
Cd   ch1 - introductionCd   ch1 - introduction
Cd ch1 - introduction
 
System software module 4 presentation file
System software module 4 presentation fileSystem software module 4 presentation file
System software module 4 presentation file
 
Presentation1
Presentation1Presentation1
Presentation1
 
Presentation1
Presentation1Presentation1
Presentation1
 
COMPILER CONSTRUCTION KU 1.pptx
COMPILER CONSTRUCTION KU 1.pptxCOMPILER CONSTRUCTION KU 1.pptx
COMPILER CONSTRUCTION KU 1.pptx
 
1._Introduction_.pptx
1._Introduction_.pptx1._Introduction_.pptx
1._Introduction_.pptx
 
Chapter 2 Program language translation.pptx
Chapter 2 Program language translation.pptxChapter 2 Program language translation.pptx
Chapter 2 Program language translation.pptx
 
Unit 1.pptx
Unit 1.pptxUnit 1.pptx
Unit 1.pptx
 
Lecture 01 introduction to compiler
Lecture 01 introduction to compilerLecture 01 introduction to compiler
Lecture 01 introduction to compiler
 
unit1pdf__2021_12_14_12_37_34.pdf
unit1pdf__2021_12_14_12_37_34.pdfunit1pdf__2021_12_14_12_37_34.pdf
unit1pdf__2021_12_14_12_37_34.pdf
 
Chapter-1.pptx compiler Design Course Material
Chapter-1.pptx compiler Design Course MaterialChapter-1.pptx compiler Design Course Material
Chapter-1.pptx compiler Design Course Material
 
Introduction to Compiler Construction
Introduction to Compiler Construction Introduction to Compiler Construction
Introduction to Compiler Construction
 
Compilers.pptx
Compilers.pptxCompilers.pptx
Compilers.pptx
 

More from MUSHAMHARIKIRAN6737

184544seminarneuralink-210514063641 (5).pdf
184544seminarneuralink-210514063641 (5).pdf184544seminarneuralink-210514063641 (5).pdf
184544seminarneuralink-210514063641 (5).pdf
MUSHAMHARIKIRAN6737
 
Basic Introduction Part 1 (1).pdf
Basic Introduction Part 1 (1).pdfBasic Introduction Part 1 (1).pdf
Basic Introduction Part 1 (1).pdf
MUSHAMHARIKIRAN6737
 
FSD JavaScript Part 11.pdf
FSD JavaScript Part 11.pdfFSD JavaScript Part 11.pdf
FSD JavaScript Part 11.pdf
MUSHAMHARIKIRAN6737
 
kiran chandu ppt.pptx
kiran chandu ppt.pptxkiran chandu ppt.pptx
kiran chandu ppt.pptx
MUSHAMHARIKIRAN6737
 
Design is a blueprint for the development and implementation.docx
Design is a blueprint for the development and implementation.docxDesign is a blueprint for the development and implementation.docx
Design is a blueprint for the development and implementation.docx
MUSHAMHARIKIRAN6737
 
A_Research_Paper_on_College_Management_S.pdf
A_Research_Paper_on_College_Management_S.pdfA_Research_Paper_on_College_Management_S.pdf
A_Research_Paper_on_College_Management_S.pdf
MUSHAMHARIKIRAN6737
 
Mini Project Abstract Review Sample PPT Template (2).pptx
Mini Project Abstract Review Sample PPT Template (2).pptxMini Project Abstract Review Sample PPT Template (2).pptx
Mini Project Abstract Review Sample PPT Template (2).pptx
MUSHAMHARIKIRAN6737
 

More from MUSHAMHARIKIRAN6737 (7)

184544seminarneuralink-210514063641 (5).pdf
184544seminarneuralink-210514063641 (5).pdf184544seminarneuralink-210514063641 (5).pdf
184544seminarneuralink-210514063641 (5).pdf
 
Basic Introduction Part 1 (1).pdf
Basic Introduction Part 1 (1).pdfBasic Introduction Part 1 (1).pdf
Basic Introduction Part 1 (1).pdf
 
FSD JavaScript Part 11.pdf
FSD JavaScript Part 11.pdfFSD JavaScript Part 11.pdf
FSD JavaScript Part 11.pdf
 
kiran chandu ppt.pptx
kiran chandu ppt.pptxkiran chandu ppt.pptx
kiran chandu ppt.pptx
 
Design is a blueprint for the development and implementation.docx
Design is a blueprint for the development and implementation.docxDesign is a blueprint for the development and implementation.docx
Design is a blueprint for the development and implementation.docx
 
A_Research_Paper_on_College_Management_S.pdf
A_Research_Paper_on_College_Management_S.pdfA_Research_Paper_on_College_Management_S.pdf
A_Research_Paper_on_College_Management_S.pdf
 
Mini Project Abstract Review Sample PPT Template (2).pptx
Mini Project Abstract Review Sample PPT Template (2).pptxMini Project Abstract Review Sample PPT Template (2).pptx
Mini Project Abstract Review Sample PPT Template (2).pptx
 

Recently uploaded

bryophytes.pptx bsc botany honours second semester
bryophytes.pptx bsc botany honours  second semesterbryophytes.pptx bsc botany honours  second semester
bryophytes.pptx bsc botany honours second semester
Sarojini38
 
How to Create a Stage or a Pipeline in Odoo 17 CRM
How to Create a Stage or a Pipeline in Odoo 17 CRMHow to Create a Stage or a Pipeline in Odoo 17 CRM
How to Create a Stage or a Pipeline in Odoo 17 CRM
Celine George
 
Observational Learning
Observational Learning Observational Learning
Observational Learning
sanamushtaq922
 
How to Create User Notification in Odoo 17
How to Create User Notification in Odoo 17How to Create User Notification in Odoo 17
How to Create User Notification in Odoo 17
Celine George
 
Opportunity scholarships and the schools that receive them
Opportunity scholarships and the schools that receive themOpportunity scholarships and the schools that receive them
Opportunity scholarships and the schools that receive them
EducationNC
 
The basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptxThe basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptx
heathfieldcps1
 
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptxCapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapitolTechU
 
Non-Verbal Communication for Tech Professionals
Non-Verbal Communication for Tech ProfessionalsNon-Verbal Communication for Tech Professionals
Non-Verbal Communication for Tech Professionals
MattVassar1
 
Library news letter Kitengesa Uganda June 2024
Library news letter Kitengesa Uganda June 2024Library news letter Kitengesa Uganda June 2024
Library news letter Kitengesa Uganda June 2024
Friends of African Village Libraries
 
How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...
Infosec
 
Erasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES CroatiaErasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES Croatia
whatchangedhowreflec
 
Creativity for Innovation and Speechmaking
Creativity for Innovation and SpeechmakingCreativity for Innovation and Speechmaking
Creativity for Innovation and Speechmaking
MattVassar1
 
Science-9-Lesson-1-The Bohr Model-NLC.pptx pptx
Science-9-Lesson-1-The Bohr Model-NLC.pptx pptxScience-9-Lesson-1-The Bohr Model-NLC.pptx pptx
Science-9-Lesson-1-The Bohr Model-NLC.pptx pptx
Catherine Dela Cruz
 
220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology
Kalna College
 
The Science of Learning: implications for modern teaching
The Science of Learning: implications for modern teachingThe Science of Learning: implications for modern teaching
The Science of Learning: implications for modern teaching
Derek Wenmoth
 
Creating Images and Videos through AI.pptx
Creating Images and Videos through AI.pptxCreating Images and Videos through AI.pptx
Creating Images and Videos through AI.pptx
Forum of Blended Learning
 
8+8+8 Rule Of Time Management For Better Productivity
8+8+8 Rule Of Time Management For Better Productivity8+8+8 Rule Of Time Management For Better Productivity
8+8+8 Rule Of Time Management For Better Productivity
RuchiRathor2
 
IoT (Internet of Things) introduction Notes.pdf
IoT (Internet of Things) introduction Notes.pdfIoT (Internet of Things) introduction Notes.pdf
IoT (Internet of Things) introduction Notes.pdf
roshanranjit222
 
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
Nguyen Thanh Tu Collection
 
What are the new features in the Fleet Odoo 17
What are the new features in the Fleet Odoo 17What are the new features in the Fleet Odoo 17
What are the new features in the Fleet Odoo 17
Celine George
 

Recently uploaded (20)

bryophytes.pptx bsc botany honours second semester
bryophytes.pptx bsc botany honours  second semesterbryophytes.pptx bsc botany honours  second semester
bryophytes.pptx bsc botany honours second semester
 
How to Create a Stage or a Pipeline in Odoo 17 CRM
How to Create a Stage or a Pipeline in Odoo 17 CRMHow to Create a Stage or a Pipeline in Odoo 17 CRM
How to Create a Stage or a Pipeline in Odoo 17 CRM
 
Observational Learning
Observational Learning Observational Learning
Observational Learning
 
How to Create User Notification in Odoo 17
How to Create User Notification in Odoo 17How to Create User Notification in Odoo 17
How to Create User Notification in Odoo 17
 
Opportunity scholarships and the schools that receive them
Opportunity scholarships and the schools that receive themOpportunity scholarships and the schools that receive them
Opportunity scholarships and the schools that receive them
 
The basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptxThe basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptx
 
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptxCapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
 
Non-Verbal Communication for Tech Professionals
Non-Verbal Communication for Tech ProfessionalsNon-Verbal Communication for Tech Professionals
Non-Verbal Communication for Tech Professionals
 
Library news letter Kitengesa Uganda June 2024
Library news letter Kitengesa Uganda June 2024Library news letter Kitengesa Uganda June 2024
Library news letter Kitengesa Uganda June 2024
 
How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...
 
Erasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES CroatiaErasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES Croatia
 
Creativity for Innovation and Speechmaking
Creativity for Innovation and SpeechmakingCreativity for Innovation and Speechmaking
Creativity for Innovation and Speechmaking
 
Science-9-Lesson-1-The Bohr Model-NLC.pptx pptx
Science-9-Lesson-1-The Bohr Model-NLC.pptx pptxScience-9-Lesson-1-The Bohr Model-NLC.pptx pptx
Science-9-Lesson-1-The Bohr Model-NLC.pptx pptx
 
220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology
 
The Science of Learning: implications for modern teaching
The Science of Learning: implications for modern teachingThe Science of Learning: implications for modern teaching
The Science of Learning: implications for modern teaching
 
Creating Images and Videos through AI.pptx
Creating Images and Videos through AI.pptxCreating Images and Videos through AI.pptx
Creating Images and Videos through AI.pptx
 
8+8+8 Rule Of Time Management For Better Productivity
8+8+8 Rule Of Time Management For Better Productivity8+8+8 Rule Of Time Management For Better Productivity
8+8+8 Rule Of Time Management For Better Productivity
 
IoT (Internet of Things) introduction Notes.pdf
IoT (Internet of Things) introduction Notes.pdfIoT (Internet of Things) introduction Notes.pdf
IoT (Internet of Things) introduction Notes.pdf
 
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
 
What are the new features in the Fleet Odoo 17
What are the new features in the Fleet Odoo 17What are the new features in the Fleet Odoo 17
What are the new features in the Fleet Odoo 17
 

COMPILER DESIGN PPTS.pptx

  • 19. Compiler construction tools: the input for these systems may contain 1. a description of the source language, 2. a description of the output to be generated, and 3. a description of the target machine.
  • 20. Compiler construction tools The principal aids provided by a compiler-compiler are: 1. a scanner generator, 2. a parser generator, and 3. facilities for code generation.
  • 21. Lexical Analyzer • The Role of the Lexical Analyzer: the first phase of the compiler. It reads input characters and produces as output the sequence of tokens that the parser uses for syntax analysis. It can work either as a separate module or as a sub-module of the parser.
  • 22. Tokens, Patterns and Lexemes • Lexeme: a sequence of characters in the source program that is matched against the pattern for a token. • Pattern: the rule associated with each set of strings. A lexeme is matched against a pattern to generate a token. • Token: a word that describes the lexeme in the source program; it is generated when a lexeme matches a pattern.
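To make the token/lexeme/pattern distinction concrete, here is a minimal C sketch of the token record a scanner might return; the type and field names (TokenKind, symtab_index, and so on) are illustrative assumptions, not taken from the slides.

    /* Illustrative sketch: a token pairs a token name (kind) with an
       optional attribute; the lexeme is the matched source text. */
    enum TokenKind { TK_ID, TK_NUMBER, TK_ASSIGN, TK_PLUS, TK_EOF };

    struct Token {
        enum TokenKind kind;   /* token name, e.g. TK_ID            */
        char lexeme[64];       /* matched characters, e.g. "newval" */
        int  symtab_index;     /* attribute: symbol-table entry for identifiers */
    };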
  • 23. 3 Methods of constructing a Lexical Analyzer • 1. Using a lexical code generator • Tools are available that take regular expressions as input and generate code for them; these tools can be used to generate lexical-analyzer code from regular expressions. • An example of such a tool is LEX for Unix.
  • 24. 3 Methods of constructing a Lexical Analyzer: R.E. → Lex generator → lex.yy.c; lex.yy.c → C compiler → a.out; source program → a.out (lexical analyzer) → tokens
  • 25. Compiler and Translator • A compiler is a form of translator that translates a program written in one language, the "Source Language", to an equivalent program in a second language, the "Target Language" or the "Object Language".
  • 26. Compiler and Translator: program in source language → compiler → program in target language, with errors & diagnostics reported along the way. Assemblers, compilers and interpreters are all specific translators.
  • 27. Assemblers: assembly language (opcodes or mnemonics) → assembler → machine code. • Interpreters – interpret the statements of a high-level-language program as they are encountered, and produce the output of each statement as it is interpreted.
  • 28. Languages involved in a Compiler Three languages are involved in a compiler: • 1. Source language: read by the compiler • 2. Target or object language: the language into which the compiler/translator translates • 3. Host language: the language in which the compiler is written
  • 29. Advantages of a Compiler • Conciseness, which improves programmer productivity • Semantic restriction (checking) • The high-level source program is translated and analyzed only once; the equivalent machine code produced by the compiler is then simply run.
  • 30. Disadvantages of a Compiler • Slower turnaround (translation must finish before the program runs) • Size of the compiler and of the compiled code • Debugging involves all of the source code. Interpreter versus compiler: the compiler translates to machine code, the interpreter executes the source directly.
  • 32. Variant: Interpretation of intermediate code Compiler generates code for a "virtual machine" (VM) VM is simulated by software
  • 33. Single-Pass Compilers Phases work in an interleaved way
  • 34. Static Structure of a (Single-Pass) Compiler
  • 35. Multi-Pass Compilers Phases are separate "Programs", which run sequentially
  • 36. Why multi-pass? • If memory is scarce (irrelevant today) • If the language is complex • If portability is important
  • 38. Loader/Link-Editor: performs two functions. • i. Loading: relocatable code is converted to absolute code, i.e. placed at its specified position. • ii. Link-editing: makes a single program from several files of relocatable machine code. The files may be the output of different compilations, or library files or routines provided by the system. The result is an executable file.
  • 39. Preprocessor Produces the input to the compiler. It may perform: i. Macro processing: shorthand for longer constructs. ii. File inclusion: separate modules can be used by including their files, e.g. #include <iostream>. iii. Rational preprocessing: support for additional facilities that are not included in the compiler itself. iv. Language extension: may add extra capabilities.
  • 40. Major Types of Compiler • 1. Self-resident compiler: generates target code for the same machine (the host). • 2. Cross compiler: generates target code for a machine other than the host.
  • 41. Phases and passes • In logical terms a compiler is thought of as consisting of stages and phases • Physically it is made up of passes • The compiler has one pass for each time the source code, or a representation of it, is read • Many compilers have just a single pass, so that the complete compilation process is performed while the code is read once
  • 42. Phases and passes • The various phases described will therefore be executed in parallel • Earlier compilers had a large number of passes, typically due to the limited memory space available • Modern compilers are single pass, since memory space is not usually a problem
  • 43. Use of tools • The two main types of tools used in compiler production are: • 1. a lexical analyzer generator • Takes as input the lexical structure of a language, which defines how its tokens are made up from characters • Produces as output a lexical analyzer (a program in C, for example) for the language • The Unix lexical analyzer generator Lex is an example • 2. a syntax analyzer generator
  • 44. Use of tools • Takes as input the syntactical definition of a language • Produces as output a syntax analyzer (a program in C, for example) for the language • The most widely known is the Unix-based YACC (Yet Another Compiler-Compiler), used in conjunction with Lex. • Bison: a public-domain version
  • 45. Applications of compiler techniques • Compiler technology is useful for a more general class of applications • Many programs share the basic properties of compilers: they read textual input, organize it into a hierarchical structure and then process the structure • An understanding of how programming-language compilers are designed and organized can make it easier to implement these compiler-like applications as well • More importantly, tools designed for compiler writing, such as lexical analyzer
  • 46. Applications of compiler techniques • generators and parser generators, can make it vastly easier to implement such applications • Thus compiler techniques are important knowledge for computer science engineers • Examples: • Document processing: LaTeX, HTML, XML • User interfaces: interactive applications, file systems, databases • Natural language treatment • Automata theory, la…
  • 47. Interpreters • Interpreters: Instead of producing a target program as a translation, an interpreter • performs the operations implied by the source program
  • 48. Differences between compiler and interpreter
COMPILER: 1. takes the entire program as input; 2. intermediate object code is generated; 3. conditional control statements execute faster; 4. memory requirement is more; 5. the program need not be compiled every time; 6. errors are displayed after the entire program is checked; 7. Ex: C compiler.
INTERPRETER: 1. takes a single instruction as input; 2. no intermediate object code is generated; 3. conditional control statements execute slower; 4. memory requirement is less; 5. the higher-level program is converted into a lower-level form every time it runs; 6. errors are displayed for every instruction interpreted (if any); 7. Ex: BASIC.
  • 49. Types of compilers 1. Incremental compiler: a compiler that performs recompilation of only the modified source rather than compiling the whole source program. Features: 1. It tracks the dependencies between the output and the source program. 2. It produces the same result as a full recompilation. 3. It performs less work than a full recompilation. 4. Incremental compilation is effective for maintenance.
  • 50. Types of compilers 2. Cross compiler: basically there exist 3 languages: 1. the source language, i.e. the application program; 2. the target language, in which the machine code is written; 3. the implementation language, in which the compiler is written. All these 3 languages may be different. In other words, there may be a compiler which runs on one machine and produces target code for another machine. Such a compiler is called a cross compiler.
  • 51. Types of compilers To represent a cross compiler, a T diagram is drawn: the source language S and the target language T form the arms of the T, and the implementation language I its base. • CROSS COMPILER: for source language L, target code in language N is generated, which runs on machine M. (The T-diagram figure is lost in this transcript.)
  • 52. Bootstrapping Compilers and T-diagrams • The rules for T-diagrams are very simple. A compiler written in some language "C" (which could be anything from machine code on up) that translates programs in language A to language B looks like this:
  • 53. Bootstrapping Compilers and T-diagrams • Now suppose you have a machine that can directly run HP machine code, and a compiler from ML to HP machine code, and you want to get an ML compiler running on a different machine code P. You can start by writing an ML-to-P compiler in ML, and compile that to get an ML-to-P compiler in HP:
  • 54. Bootstrapping Compilers and T-diagrams From there, feed the new ML-to-P compiler to itself, running on the HP machine, and you end up with an ML-to-P compiler that runs on a P machine!
  • 55. Lexical Analysis • A lexical analyser or scanner is a program that groups sequences of characters into lexemes, and outputs (to the syntax analyser) a sequence of tokens. Here: • (a) Tokens are symbolic names for the entities that make up the text of the program; e.g. if for the keyword if, and id for any identifier. These make up the output of the lexical analyser.
  • 56. The role of the lexical analyzer: source program → Lexical Analyzer —token→ Parser, with the parser requesting tokens via getNextToken; both consult the symbol table, and the parser's output goes on to semantic analysis.
  • 57. Tokens, Patterns and Lexemes • A token is a pair of a token name and an optional token value • A pattern is a description of the form that the lexemes of a token may take • A lexeme is a sequence of characters in the source program that matches the pattern for a token
  • 58. Example
Token — informal description — sample lexemes:
if — characters i, f — if
else — characters e, l, s, e — else
comparison — < or > or <= or >= or == or != — <=, !=
id — letter followed by letters and digits — pi, score, D2
number — any numeric constant — 3.14159, 0, 6.02e23
literal — anything but " surrounded by " — "core dumped"
e.g. printf("total = %d\n", score);
  • 59. Attributes for tokens • E = M * C ** 2 – <id, pointer to symbol table entry for E> – <assign-op> – <id, pointer to symbol table entry for M> – <mult-op> – <id, pointer to symbol table entry for C> – <exp-op> – <number, integer value 2>
  • 60. Lexical errors • Some errors are beyond the power of the lexical analyzer to recognize: – fi (a == f(x)) … • However it may be able to recognize errors like: – d = 2r • Such errors are recognized when no pattern for tokens matches a character sequence
  • 61. Specification of tokens • In the theory of compilation, regular expressions are used to formalize the specification of tokens • Regular expressions are a means for specifying regular languages • Example: letter_(letter_ | digit)* • Each regular expression is a pattern specifying the form of strings
  • 62. Regular expressions • ε is a regular expression, L(ε) = {ε} • If a is a symbol in ∑ then a is a regular expression, L(a) = {a} • (r) | (s) is a regular expression denoting the language L(r) ∪ L(s) • (r)(s) is a regular expression denoting the language L(r)L(s) • (r)* is a regular expression denoting (L(r))* • (r) is a regular expression denoting L(r)
  • 63. Regular definitions d1 -> r1 d2 -> r2 … dn -> rn • Example: letter_ -> A | B | … | Z | a | b | … | z | _ digit -> 0 | 1 | … | 9 id -> letter_ (letter_ | digit)*
  • 64. Extensions • One or more instances: (r)+ • Zero or one instance: r? • Character classes: [abc] • Example: – letter_ -> [A-Za-z_] – digit -> [0-9] – id -> letter_(letter_|digit)*
  • 65. Recognition of tokens • Starting point is the language grammar to understand the tokens: stmt -> if expr then stmt | if expr then stmt else stmt | Ɛ expr -> term relop term | term term -> id | number
  • 66. Recognition of tokens (cont.) • The next step is to formalize the patterns: digit -> [0-9] digits -> digit+ number -> digits(.digits)?(E[+-]?digits)? letter -> [A-Za-z_] id -> letter(letter|digit)* if -> if then -> then else -> else relop -> < | > | <= | >= | = | <> • We also need to handle whitespace: ws -> (blank | tab | newline)+
  • 68. Transition diagrams (cont.) • Transition diagram for reserved words and identifiers
  • 69. Transition diagrams (cont.) • Transition diagram for unsigned numbers
  • 70. Transition diagrams (cont.) • Transition diagram for whitespace
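A hand-coded scanner corresponding to these transition diagrams can be sketched in C as below. This is an illustration under simplifying assumptions (numbers are plain digit strings, the token codes and helper names are invented here), not the deck's own code; there are no buffer-overflow checks in this sketch.

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical token codes; real codes would come from the parser. */
    enum { IF = 256, THEN, ELSE, ID, NUMBER, RELOP, END };

    static FILE *src;

    /* getNextToken walks the transition diagrams: skip whitespace, then
       try reserved words / identifiers, numbers, and relational operators. */
    int getNextToken(char *lexeme)
    {
        int c = fgetc(src);

        /* ws -> (blank | tab | newline)+ : no token is returned */
        while (c == ' ' || c == '\t' || c == '\n')
            c = fgetc(src);
        if (c == EOF) return END;

        if (isalpha(c) || c == '_') {        /* id -> letter (letter|digit)* */
            int n = 0;
            do { lexeme[n++] = c; c = fgetc(src); } while (isalnum(c) || c == '_');
            ungetc(c, src);                  /* retract the lookahead character */
            lexeme[n] = '\0';
            if (strcmp(lexeme, "if") == 0)   return IF;   /* reserved words are */
            if (strcmp(lexeme, "then") == 0) return THEN; /* checked after the  */
            if (strcmp(lexeme, "else") == 0) return ELSE; /* diagram accepts    */
            return ID;
        }
        if (isdigit(c)) {                    /* digits -> digit+ (simplified) */
            int n = 0;
            do { lexeme[n++] = c; c = fgetc(src); } while (isdigit(c));
            ungetc(c, src);
            lexeme[n] = '\0';
            return NUMBER;
        }
        if (c == '<' || c == '>' || c == '=') {  /* relop -> < | <= | <> | > | >= | = */
            int d = fgetc(src);
            lexeme[0] = c; lexeme[1] = '\0';
            if (c == '<' && (d == '=' || d == '>')) { lexeme[1] = d; lexeme[2] = '\0'; }
            else if (c == '>' && d == '=')          { lexeme[1] = d; lexeme[2] = '\0'; }
            else ungetc(d, src);             /* retract: single-character relop */
            return RELOP;
        }
        lexeme[0] = c; lexeme[1] = '\0';
        return c;                            /* any other single-character token */
    }

    int main(void)
    {
        char lexeme[128];
        src = stdin;
        for (int t = getNextToken(lexeme); t != END; t = getNextToken(lexeme))
            printf("token %d  lexeme '%s'\n", t, lexeme);
        return 0;
    }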
  • 71. Lexical Analyzer Generator - Lex: Lex source program lex.l → Lex compiler → lex.yy.c; lex.yy.c → C compiler → a.out; input stream → a.out → sequence of tokens
  • 72. LEX Example %{ /* definitions of manifest constants LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP */ %} /* regular definitions */ delim [ \t\n] ws {delim}+ letter [A-Za-z] digit [0-9] id {letter}({letter}|{digit})* number {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
  • 73. LEX Example %% {ws} {/* no action and no return */} if {return(IF);} then {return(THEN);} else {return(ELSE);} {id} {yylval = (int) installID(); return(ID);} {number} {yylval = (int) installNum(); return(NUMBER);} …
  • 74. LEX Example int installID() {/* function to install the lexeme, whose first character is pointed to by yytext and whose length is yyleng, into the symbol table, and return a pointer thereto */} int installNum() {/* similar to installID, but puts numerical constants into a separate table */}
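A minimal driver for the specification above might look as follows; it assumes only the standard lex interface (yylex returns the next token code, 0 at end of input, and yytext holds the current lexeme), and would be compiled together with the generated lex.yy.c.

    #include <stdio.h>

    /* Provided by the scanner that lex generates from the spec above. */
    extern int yylex(void);
    extern char *yytext;

    int main(void)
    {
        int tok;
        while ((tok = yylex()) != 0)      /* yylex returns 0 at end of input */
            printf("token code %d, lexeme \"%s\"\n", tok, yytext);
        return 0;
    }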
  • 75. Finite Automata • Regular expressions = specification • Finite automata = implementation • A finite automaton consists of – an input alphabet Σ – a set of states S – a start state n – a set of accepting states F ⊆ S – a set of transitions state →input state
  • 76. Finite Automata • Transition s1 →a s2 • is read: in state s1 on input "a" go to state s2 • At end of input – if in an accepting state => accept, otherwise => reject • If no transition is possible => reject
  • 77. Finite Automata State Graphs • A state • The start state • An accepting state • A transition on input a (the corresponding node-and-arrow drawings are lost in this transcript)
  • 78. Example • Alphabet {0,1} • What language does this recognize? (the state diagram is lost in this transcript)
  • 79. And Another Example • Alphabet still {0, 1} • The operation of the automaton is not completely defined by the input – on input "11" the automaton could be in either state (diagram lost in this transcript)
  • 80. Epsilon Moves • Another kind of transition: ε-moves • The machine can move from state A to state B without reading input
  • 81. Deterministic and Nondeterministic Automata • Deterministic Finite Automata (DFA) – one transition per input per state – no ε-moves • Nondeterministic Finite Automata (NFA) – can have multiple transitions for one input in a given state – can have ε-moves • Finite automata have finite memory – need only to encode the current state
  • 82. Execution of Finite Automata • A DFA can take only one path through the state graph – completely determined by the input • NFAs can choose – whether to make ε-moves – which of multiple transitions for a single input to take
  • 83. Acceptance of NFAs • An NFA can get into multiple states on the same input • Rule: an NFA accepts if it can get into a final state (the example diagram is lost in this transcript)
  • 84. NFA vs. DFA (1) • NFAs and DFAs recognize the same set of languages (regular languages) • DFAs are easier to implement – There are no choices to consider 84
  • 85. NFA vs. DFA (2) • For a given language the NFA can be simpler than the DFA (the example NFA and DFA over {0,1} are lost in this transcript) • A DFA can be exponentially larger than an NFA
  • 86. Regular Expressions to Finite Automata • High-level sketch: Lexical Specification → Regular expressions → NFA → DFA → Table-driven implementation of DFA
  • 87. Regular Expressions to NFA (1) • For each kind of rexp, define an NFA – notation: NFA for rexp A • For ε: a single ε-transition from the start state to an accepting state • For input a: a single transition on a from the start state to an accepting state
  • 88. Regular Expressions to NFA (2) • For AB: connect the accepting state of A to the start state of B with an ε-transition • For A | B: a new start state with ε-transitions into A and B, and ε-transitions from their accepting states into a new accepting state
  • 89. Regular Expressions to NFA (3) • For A*: a new start state with an ε-transition into A and an ε-transition to a new accepting state, plus ε-transitions from A's accepting state back to A's start and on to the new accepting state
  • 90. Example of RegExp -> NFA conversion • Consider the regular expression (1 | 0)*1 • The resulting NFA has states A through J (the diagram is lost in this transcript)
  • 92. NFA to DFA. The Trick • Simulate the NFA • Each state of the resulting DFA = a non-empty subset of states of the NFA • Start state = the set of NFA states reachable through ε-moves from the NFA start state • Add a transition S →a S' to the DFA iff S' is the set of NFA states reachable from the states in S after seeing the input a, considering ε-moves as well
  • 93. NFA -> DFA Example • Applying the subset construction to the NFA for (1|0)*1 with states A–J yields DFA states ABCDHI, FGABCDHI and EJGABCDHI, with transitions on 0 and 1 between these subsets (the diagram is lost in this transcript)
  • 94. NFA to DFA. Remark • An NFA may be in many states at any time • How many different states? • If there are N states, the NFA must be in some subset of those N states • How many non-empty subsets are there? – 2^N − 1, i.e. finitely many, but exponentially many
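The subset idea can be seen directly by simulating an NFA with a bit mask of active states. The sketch below uses a small hand-built NFA for (1|0)*1 (two states and no ε-moves, so ε-closure is trivial); the state numbering and code are our own illustration, not from the slides.

    #include <stdio.h>

    /* NFA for (1|0)*1:  state 0 on '0' -> {0}, on '1' -> {0,1};
       state 1 is accepting.  A set of NFA states is a bit mask. */
    #define NSTATES 2
    #define ACCEPT_MASK (1u << 1)

    /* delta[state][symbol] = bit mask of successor states */
    static const unsigned delta[NSTATES][2] = {
        /* state 0 */ { 1u << 0, (1u << 0) | (1u << 1) },
        /* state 1 */ { 0, 0 },
    };

    int nfa_accepts(const char *input)
    {
        unsigned cur = 1u << 0;                  /* start in {0} */
        for (const char *p = input; *p; p++) {
            unsigned next = 0;
            for (int s = 0; s < NSTATES; s++)
                if (cur & (1u << s))
                    next |= delta[s][*p - '0'];  /* union over all active states */
            cur = next;
        }
        return (cur & ACCEPT_MASK) != 0;         /* accept if a final state is active */
    }

    int main(void)
    {
        printf("%d %d %d\n", nfa_accepts("11"), nfa_accepts("10"), nfa_accepts("0101"));
        return 0;  /* prints 1 0 1 */
    }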
  • 95. Implementation • A DFA can be implemented by a 2D table T – one dimension is "states" – the other dimension is "input symbols" – for every transition Si →a Sk define T[i,a] = k • DFA "execution" – if in state Si on input a, read T[i,a] = k and go to state Sk – very efficient
  • 96. Table Implementation of a DFA — rows are states, columns are inputs 0 and 1: S: 0→T, 1→U; T: 0→T, 1→U; U: 0→T, 1→U
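Here is a C rendering of this table-driven execution, using the S/T/U table above with the states encoded as 0, 1, 2; a sketch for illustration only.

    #include <stdio.h>

    /* trans[state][symbol] is the table above: every state moves
       to T (1) on input 0 and to U (2) on input 1. */
    static const int trans[3][2] = {
        /* S */ { 1, 2 },
        /* T */ { 1, 2 },
        /* U */ { 1, 2 },
    };

    /* run returns the state reached from start state S on the given input */
    int run(const char *input)
    {
        int state = 0;                        /* start in S */
        for (const char *p = input; *p; p++)
            state = trans[state][*p - '0'];   /* one table lookup per symbol */
        return state;
    }

    int main(void)
    {
        printf("final state: %d\n", run("0101"));  /* S->T->U->T->U, prints 2 */
        return 0;
    }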
  • 97. Implementation (Cont.) • NFA -> DFA conversion is at the heart of tools such as flex or jflex • But, DFAs can be huge • In practice, flex-like tools trade off speed for space in the choice of NFA and DFA representations 97
  • 99. Top-Down Parsing • The parse tree is created top to bottom. • Top-down parser – Recursive-Descent Parsing • Backtracking is needed (If a choice of a production rule does not work, we backtrack to try other alternatives.) • It is a general parsing technique, but not widely used. • Not efficient
  • 100. Top-Down Parsing – Predictive Parsing • no backtracking • efficient • needs a special form of grammars (LL(1) grammars). • Recursive Predictive Parsing is a special form of Recursive Descent parsing without backtracking. • Non-Recursive (Table Driven) Predictive Parser is also known as LL(1) parser.
  • 101. Recursive-Descent Parsing (uses Backtracking) • Backtracking is needed. • It tries to find the left-most derivation. S → aBc B → bc | b input: abc — trying B → bc consumes b and c and leaves nothing for the final c of S → aBc; this fails, so the parser backtracks and tries B → b, which succeeds.
  • 102. Predictive Parser a grammar → (eliminate left recursion, left factor) → a grammar suitable for predictive parsing (an LL(1) grammar); there is no 100% guarantee. • When re-writing a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string. A → α1 | ... | αn input: ... a ....... (current token)
  • 103. Predictive Parser (example) stmt → if ...... | while ...... | begin ...... | for ..... • When we are trying to expand the non-terminal stmt, if the current token is if we have to choose the first production rule. • We can uniquely choose the production rule by just looking at the current token. • We eliminate the left recursion in the grammar, and left factor it. But it may still not be suitable for predictive parsing (not an LL(1) grammar).
  • 104. Recursive Predictive Parsing • Each non-terminal corresponds to a procedure. Ex: A → aBb (this is the only production rule for A) proc A { - match the current token with a, and move to the next token; - call B; - match the current token with b, and move to the next token; }
  • 105. Recursive Predictive Parsing (cont.) A → aBb | bAB proc A { case of the current token { 'a': - match the current token with a, and move to the next token; - call B; - match the current token with b, and move to the next token; 'b': - match the current token with b, and move to the next token; - call A; - call B; } }
  • 106. Recursive Predictive Parsing (cont.) • When to apply ε-productions: A → aA | bB | ε • If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production. • Most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the follow set of A (the terminals that can follow A in sentential forms).
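As a concrete sketch, here is a recursive predictive parser in C for the small grammar S → aBa, B → bB | ε used in the LL(1) examples below; B applies its ε-production exactly when the lookahead is in FOLLOW(B) = {a}. The helper names (match, error) are our own.

    #include <stdio.h>
    #include <stdlib.h>

    static const char *input;   /* current position in the token stream */

    static void error(void) { printf("syntax error at '%c'\n", *input); exit(1); }

    static void match(char t)
    {
        if (*input == t) input++;   /* consume the expected token */
        else error();
    }

    static void B(void)
    {
        if (*input == 'b') {        /* B -> b B */
            match('b');
            B();
        }
        /* else: B -> epsilon, taken when lookahead is in FOLLOW(B) = {a} */
    }

    static void S(void)
    {
        match('a');                 /* S -> a B a */
        B();
        match('a');
    }

    int main(void)
    {
        input = "abba";
        S();
        if (*input == '\0') printf("accepted\n");
        else error();
        return 0;
    }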
  • 107. Non-Recursive Predictive Parsing -- LL(1) Parser • Non-recursive predictive parsing is a table-driven parser. • It is a top-down parser. • It is also known as an LL(1) parser. Structure: the input buffer and a stack feed the non-recursive predictive parser, which consults the parsing table and emits the output.
  • 108. LL(1) Parser input buffer – our string to be parsed. We will assume that its end is marked with a special symbol $. output – a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer. stack – contains the grammar symbols – at the bottom of the stack, there is a special end marker symbol $. – initially the stack contains only the symbol $ and the starting symbol S ($S is the initial stack). – when the stack is emptied (i.e. only $ is left in the stack), parsing is completed.
  • 109. LL(1) Parser parsing table – a two-dimensional array M[A,a] – each row is a non-terminal symbol – each column is a terminal symbol or the special symbol $ – each entry holds a production rule.
  • 110. LL(1) Parser – Parser Actions • The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action. • There are four possible parser actions. 1. If X and a are $ → parser halts (successful completion) 2. If X and a are the same terminal symbol (different from $) → parser pops X from the stack, and moves to the next symbol in the input buffer.
  • 111. LL(1) Parser – Parser Actions 3. If X is a non-terminal → parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X→Y1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 onto the stack. The parser also outputs the production rule X→Y1Y2...Yk to represent a step of the derivation. 4. none of the above → error – all empty entries in the parsing table are errors. – If X is a terminal symbol different from a, this is also an error case.
  • 112. LL(1) Parser – Example1 S → aBa B → bB | ε LL(1) parsing table: M[S,a] = S → aBa; M[B,a] = B → ε; M[B,b] = B → bB (all other entries are errors)
  • 113. LL(1) Parser – Example1
stack / input / output:
$S / abba$ / S → aBa
$aBa / abba$ / –
$aB / bba$ / B → bB
$aBb / bba$ / –
$aB / ba$ / B → bB
$aBb / ba$ / –
$aB / a$ / B → ε
$a / a$ / –
$ / $ / accept, successful completion
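The same trace can be reproduced by a small table-driven implementation; the following C sketch hard-codes the parsing table above as a function and represents grammar symbols as single characters (an illustrative simplification, not the deck's own code).

    #include <stdio.h>
    #include <string.h>

    /* Non-recursive LL(1) parser sketch for  S -> aBa,  B -> bB | epsilon.
       'S','B' are non-terminals, 'a','b' terminals, '$' the end marker.
       The table entries are the right-hand sides to push ("" = epsilon). */
    static const char *table(char X, char a)
    {
        if (X == 'S' && a == 'a') return "aBa";  /* M[S,a] = S -> aBa */
        if (X == 'B' && a == 'a') return "";     /* M[B,a] = B -> eps */
        if (X == 'B' && a == 'b') return "bB";   /* M[B,b] = B -> bB  */
        return NULL;                             /* empty entry: error */
    }

    int parse(const char *w)                     /* w must end with '$' */
    {
        char stack[100] = "$S";                  /* $ at bottom, start symbol on top */
        int top = 1;
        const char *ip = w;

        while (1) {
            char X = stack[top], a = *ip;
            if (X == '$' && a == '$') return 1;          /* accept */
            if (X == 'a' || X == 'b' || X == '$') {      /* X is terminal */
                if (X != a) return 0;                    /* mismatch: error */
                top--; ip++;                             /* pop and advance */
            } else {                                     /* X is non-terminal */
                const char *rhs = table(X, a);
                if (!rhs) return 0;                      /* empty entry: error */
                printf("%c -> %s\n", X, *rhs ? rhs : "eps");
                top--;                                   /* pop X ...            */
                for (int i = strlen(rhs) - 1; i >= 0; i--)
                    stack[++top] = rhs[i];               /* ... push RHS reversed */
            }
        }
    }

    int main(void)
    {
        printf(parse("abba$") ? "accept\n" : "reject\n");
        return 0;
    }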
  • 114. LL(1) Parser – Example1 (cont.) Outputs: S → aBa, B → bB, B → bB, B → ε. Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba. Parse tree: root S with children a, B, a; each B expands to b and another B, and the last B derives ε.
  • 115. LL(1) Parser – Example2 E → TE' E' → +TE' | ε T → FT' T' → *FT' | ε F → (E) | id
Parsing table (rows: non-terminals; columns: id + * ( ) $):
E: M[E,id] = E → TE'; M[E,(] = E → TE'
E': M[E',+] = E' → +TE'; M[E',)] = E' → ε; M[E',$] = E' → ε
T: M[T,id] = T → FT'; M[T,(] = T → FT'
T': M[T',+] = T' → ε; M[T',*] = T' → *FT'; M[T',)] = T' → ε; M[T',$] = T' → ε
F: M[F,id] = F → id; M[F,(] = F → (E)
  • 116. LL(1) Parser – Example2 (input id+id$)
stack / input / output:
$E / id+id$ / E → TE'
$E'T / id+id$ / T → FT'
$E'T'F / id+id$ / F → id
$E'T'id / id+id$ / –
$E'T' / +id$ / T' → ε
$E' / +id$ / E' → +TE'
$E'T+ / +id$ / –
$E'T / id$ / T → FT'
$E'T'F / id$ / F → id
$E'T'id / id$ / –
$E'T' / $ / T' → ε
$E' / $ / E' → ε
$ / $ / accept
  • 117. Constructing LL(1) Parsing Tables • Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW. • FIRST(α) is the set of the terminal symbols which occur as first symbols in strings derived from α, where α is any string of grammar symbols. • If α derives ε, then ε is also in FIRST(α). • FOLLOW(A) is the set of the terminals which occur immediately after (follow) the non-terminal A in strings derived from the starting symbol: – a terminal a is in FOLLOW(A) if S ⇒* αAaβ – $ is in FOLLOW(A) if S ⇒* αA
  • 118. Compute FIRST for Any String X • If X is a terminal symbol → FIRST(X) = {X} • If X is ε → FIRST(X) = {ε} • If X is a non-terminal symbol and X → ε is a production rule → ε is in FIRST(X) • If X is a non-terminal symbol and X → Y1Y2..Yn is a production rule (or X is the string Y1Y2..Yn): – if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j=1,...,i-1, then a is in FIRST(X) – if ε is in all FIRST(Yj) for j=1,...,n, then ε is in FIRST(X)
  • 119. FIRST Example E → TE' E' → +TE' | ε T → FT' T' → *FT' | ε F → (E) | id
FIRST(F) = {(, id} FIRST(TE') = {(, id}
FIRST(T') = {*, ε} FIRST(+TE') = {+}
FIRST(T) = {(, id} FIRST(ε) = {ε}
FIRST(FT') = {(, id} FIRST(E') = {+, ε}
FIRST(E) = {(, id} FIRST(*FT') = {*}
FIRST((E)) = {(} FIRST(id) = {id}
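These sets can be computed mechanically by iterating the rules of slide 118 to a fixed point. The C sketch below does this for the expression grammar, tracking nullability separately from the FIRST bit masks; the encodings and names are our own, and the printed sets can be checked against the slide.

    #include <stdio.h>

    enum { E, Ep, T, Tp, F, NT = 5 };                 /* non-terminals */
    enum { t_id, t_plus, t_star, t_lp, t_rp, TERM = 5 };

    /* productions: lhs, length, rhs symbols (terminals offset by NT) */
    static const int prods[][6] = {
        { E,  2, T, Ep },
        { Ep, 3, NT + t_plus, T, Ep },
        { Ep, 0 },                                    /* E' -> epsilon */
        { T,  2, F, Tp },
        { Tp, 3, NT + t_star, F, Tp },
        { Tp, 0 },                                    /* T' -> epsilon */
        { F,  3, NT + t_lp, E, NT + t_rp },
        { F,  1, NT + t_id },
    };
    #define NPRODS (int)(sizeof prods / sizeof prods[0])

    int main(void)
    {
        unsigned first[NT] = { 0 };
        int nullable[NT] = { 0 };
        int changed = 1;

        while (changed) {                             /* iterate to a fixed point */
            changed = 0;
            for (int p = 0; p < NPRODS; p++) {
                int A = prods[p][0], len = prods[p][1];
                unsigned before = first[A];
                int allNullable = 1, wasNullable = nullable[A];
                for (int i = 0; i < len && allNullable; i++) {
                    int X = prods[p][2 + i];
                    if (X >= NT) {                    /* terminal: FIRST(X) = {X} */
                        first[A] |= 1u << (X - NT);
                        allNullable = 0;
                    } else {                          /* non-terminal */
                        first[A] |= first[X];
                        allNullable = nullable[X];
                    }
                }
                if (allNullable) nullable[A] = 1;
                if (first[A] != before || nullable[A] != wasNullable) changed = 1;
            }
        }

        static const char *nt[] = { "E", "E'", "T", "T'", "F" };
        static const char *tm[] = { "id", "+", "*", "(", ")" };
        for (int A = 0; A < NT; A++) {
            printf("FIRST(%s) = {", nt[A]);
            for (int a = 0; a < TERM; a++)
                if (first[A] & (1u << a)) printf(" %s", tm[a]);
            printf(nullable[A] ? " eps }\n" : " }\n");
        }
        return 0;
    }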
  • 120. Compute FOLLOW (for non-terminals) • If S is the start symbol → $ is in FOLLOW(S) • If A → αBβ is a production rule → everything in FIRST(β) except ε is in FOLLOW(B) • If (A → αB is a production rule) or (A → αBβ is a production rule and ε is in FIRST(β)) → everything in FOLLOW(A) is in FOLLOW(B). We apply these rules until nothing more can be added to any follow set.
  • 121. FOLLOW Example E → TE' E' → +TE' | ε T → FT' T' → *FT' | ε F → (E) | id FOLLOW(E) = {$, )} FOLLOW(E') = {$, )} FOLLOW(T) = {+, ), $} FOLLOW(T') = {+, ), $} FOLLOW(F) = {+, *, ), $}
  • 122. Constructing LL(1) Parsing Table -- Algorithm • for each production rule A → α of a grammar G: – for each terminal a in FIRST(α), add A → α to M[A,a] – if ε is in FIRST(α), for each terminal a in FOLLOW(A) add A → α to M[A,a] – if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$] • All other undefined entries of the parsing table are error entries.
  • 123. Constructing LL(1) Parsing Table -- Example
E → TE': FIRST(TE') = {(, id} → E → TE' into M[E,(] and M[E,id]
E' → +TE': FIRST(+TE') = {+} → E' → +TE' into M[E',+]
E' → ε: FIRST(ε) = {ε}, and FOLLOW(E') = {$, )} → E' → ε into M[E',$] and M[E',)]
T → FT': FIRST(FT') = {(, id} → T → FT' into M[T,(] and M[T,id]
T' → *FT': FIRST(*FT') = {*} → T' → *FT' into M[T',*]
T' → ε: FIRST(ε) = {ε}, and FOLLOW(T') = {$, ), +} → T' → ε into M[T',$], M[T',)] and M[T',+]
F → (E): FIRST((E)) = {(} → F → (E) into M[F,(]
F → id: FIRST(id) = {id} → F → id into M[F,id]
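The construction algorithm itself is a short loop once FIRST and FOLLOW are available. The sketch below hard-codes the FIRST-of-right-hand-side and FOLLOW sets from the worked example above and fills M; since the only ε-productions here have right-hand side ε itself, the FOLLOW rule is the only one that fires for them (a simplification we rely on).

    #include <stdio.h>

    #define B(x) (1u << (x))
    enum { id, pl, st, lp, rp, dollar, NTOK };        /* terminals      */
    enum { E, Ep, T, Tp, F, NNT };                    /* non-terminals  */

    struct Prod { int lhs; const char *text; unsigned firstAlpha; int eps; };
    static const struct Prod prods[] = {
        { E,  "E->TE'",   B(lp)|B(id), 0 },
        { Ep, "E'->+TE'", B(pl),       0 },
        { Ep, "E'->eps",  0,           1 },
        { T,  "T->FT'",   B(lp)|B(id), 0 },
        { Tp, "T'->*FT'", B(st),       0 },
        { Tp, "T'->eps",  0,           1 },
        { F,  "F->(E)",   B(lp),       0 },
        { F,  "F->id",    B(id),       0 },
    };
    static const unsigned follow[NNT] = {
        B(rp)|B(dollar),             /* FOLLOW(E)  */
        B(rp)|B(dollar),             /* FOLLOW(E') */
        B(pl)|B(rp)|B(dollar),       /* FOLLOW(T)  */
        B(pl)|B(rp)|B(dollar),       /* FOLLOW(T') */
        B(pl)|B(st)|B(rp)|B(dollar), /* FOLLOW(F)  */
    };

    int main(void)
    {
        const char *M[NNT][NTOK] = { { 0 } };
        for (unsigned p = 0; p < sizeof prods / sizeof prods[0]; p++) {
            /* add A->alpha to M[A,a] for a in FIRST(alpha), or for
               a in FOLLOW(A) when eps is in FIRST(alpha)            */
            unsigned where = prods[p].eps ? follow[prods[p].lhs]
                                          : prods[p].firstAlpha;
            for (int a = 0; a < NTOK; a++)
                if (where & B(a))
                    M[prods[p].lhs][a] = prods[p].text;
        }
        static const char *tok[] = { "id", "+", "*", "(", ")", "$" };
        static const char *nt[]  = { "E", "E'", "T", "T'", "F" };
        for (int A = 0; A < NNT; A++)
            for (int a = 0; a < NTOK; a++)
                if (M[A][a]) printf("M[%s,%s] = %s\n", nt[A], tok[a], M[A][a]);
        return 0;
    }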
  • 124. LL(1) Grammars • A grammar whose parsing table has no multiply-defined entries is said to be an LL(1) grammar. LL(1): the input is scanned from left to right, a left-most derivation is produced, and one input symbol is used as a lookahead symbol to determine the parser action. • The parsing table of a grammar may contain more than one production rule in an entry. In this case, we say that it is not an LL(1) grammar.
  • 125. A Grammar which is not LL(1) S → iCtSE | a E → eS | ε C → b FOLLOW(S) = {$, e} FOLLOW(E) = {$, e} FOLLOW(C) = {t} FIRST(iCtSE) = {i} FIRST(a) = {a} FIRST(eS) = {e} FIRST(ε) = {ε} FIRST(b) = {b}
Parsing table: M[S,a] = S → a; M[S,i] = S → iCtSE; M[E,e] = { E → eS, E → ε } — two production rules for M[E,e]; M[E,$] = E → ε; M[C,b] = C → b
Problem → ambiguity
  • 126. A Grammar which is not LL(1) (cont.) • What do we have to do if the resulting parsing table contains multiply defined entries? – If we didn't eliminate left recursion, eliminate the left recursion in the grammar. – If the grammar is not left factored, we have to left factor the grammar. – If its (new grammar's) parsing table still contains multiply defined entries, that grammar is ambiguous or it is inherently not an LL(1) grammar.
  • 127. A Grammar which is not LL(1) (cont.) • A left recursive grammar cannot be an LL(1) grammar. – A → Aα | β → any terminal that appears in FIRST(β) also appears in FIRST(Aα), because Aα ⇒ βα. → If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(Aα) and FOLLOW(A). • If a grammar is not left factored, it cannot be an LL(1) grammar: • A → αβ1 | αβ2 → any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2). • An ambiguous grammar cannot be an LL(1) grammar.
  • 128. Properties of LL(1) Grammars • A grammar G is LL(1) if and only if the following conditions hold for every two distinct production rules A → α and A → β: 1. Both α and β cannot derive strings starting with the same terminals. 2. At most one of α and β can derive ε. 3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).
  • 129. Error Recovery in Predictive Parsing • An error may occur in predictive parsing (LL(1) parsing) – if the terminal symbol on the top of the stack does not match the current input symbol, or – if the top of the stack is a non-terminal A, the current input symbol is a, and the parsing table entry M[A,a] is empty. • What should the parser do in an error case? – The parser should be able to give an error message (as meaningful an error message as possible). – It should recover from that error case, and it should be able to continue parsing the rest of the input.
  • 130. Error Recovery Techniques • Panic-Mode Error Recovery – skipping the input symbols until a synchronizing token is found. • Phrase-Level Error Recovery – each empty entry in the parsing table is filled with a pointer to a specific error routine to take care of that error case. • Error-Productions – If we have a good idea of the common errors that might be encountered, we can augment the grammar with productions that generate erroneous constructs. – When an error production is used by the parser, we can generate appropriate error diagnostics. – Since it is almost impossible to know all the errors that can be made by programmers, this method is not practical.
  • 131. Error Recovery Techniques • Global-Correction – Ideally, we would like a compiler to make as few changes as possible in processing incorrect inputs. – We have to globally analyze the input to find the error. – This is an expensive method, and it is not used in practice.
  • 132. Panic-Mode Error Recovery in LL(1) Parsing • In panic-mode error recovery, we skip all the input symbols until a synchronizing token is found. • What is the synchronizing token? – All the terminal symbols in the follow set of a non-terminal can be used as a synchronizing token set for that non-terminal. • So, a simple panic-mode error recovery for LL(1) parsing: – All the empty entries are marked as sync to indicate that the parser will skip all the input symbols until a symbol in the follow set of the non-terminal A which is on the top of the stack is seen. Then the parser pops that non-terminal A from the stack. Parsing continues from that state. – To handle unmatched terminal symbols, the parser pops the unmatched terminal symbol from the stack and issues an error message saying that the unmatched terminal was inserted.
  • 133. Panic-Mode Error Recovery - Example S → AbS | e | ε A → a | cAd FOLLOW(S) = {$} FOLLOW(A) = {b, d}
Parsing table: M[S,a] = M[S,c] = S → AbS; M[S,e] = S → e; M[S,$] = S → ε; M[S,b] = M[S,d] = sync; M[A,a] = A → a; M[A,c] = A → cAd; M[A,b] = M[A,d] = sync.
Trace for input aab$: $S / aab$ / S → AbS; $SbA / aab$ / A → a; $Sba / aab$; $Sb / ab$ / Error: missing b, inserted (pop b); $S / ab$ / S → AbS; $SbA / ab$ / A → a; $Sba / ab$; $Sb / b$; $S / $ / S → ε; $ / $ / accept.
Trace for input ceadb$: $S / ceadb$ / S → AbS; $SbA / ceadb$ / A → cAd; $SbdAc / ceadb$; $SbdA / eadb$ / Error: unexpected e (illegal A) — remove all input tokens until the first b or d, pop A; $Sbd / db$; $Sb / b$; $S / $ / S → ε; $ / $ / accept.
  • 134. Phrase-Level Error Recovery • Each empty entry in the parsing table is filled with a pointer to a special error routine which will take care of that error case. • These error routines may: – change, insert, or delete input symbols – issue appropriate error messages – pop items from the stack. • We should be careful when we design these error routines, because we may put the parser into an infinite loop.
  • 135. Syntax Analyzer • Syntax Analyzer creates the syntactic structure of the given source program. • This syntactic structure is mostly a parse tree. • Syntax Analyzer is also known as parser. • The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation in the description of CFGs. • The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar or not. – If it satisfies, the parser creates the parse tree of that program. – Otherwise the parser gives the error messages.
  • 136. Syntax Analyzer • A context-free grammar – gives a precise syntactic specification of a programming language. – the design of the grammar is an initial phase of the design of a compiler. – a grammar can be directly converted into a parser by some tools.
  • 137. Parser: source program → Lexical Analyzer —token→ Parser —parse tree→ (rest of the front end); the parser requests tokens with getNextToken. • The parser works on a stream of tokens. • The smallest item is a token.
  • 138. Parsers (cont.) • We categorize the parsers into two groups: 1. Top-Down Parser – the parse tree is created top to bottom, starting from the root. 2. Bottom-Up Parser – the parse tree is created bottom to top, starting from the leaves. • Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time). • Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars. – LL for top-down parsing – LR for bottom-up parsing
  • 139. Context-Free Grammars • Inherently recursive structures of a programming language are defined by a context-free grammar. • In a context-free grammar, we have: – A finite set of terminals (in our case, this will be the set of tokens) – A finite set of non-terminals (syntactic variables) – A finite set of production rules of the form • A → α where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string) – A start symbol (one of the non-terminal symbols)
  • 140. Context-Free Grammars • Example: E → E + E | E – E | E * E | E / E | - E E → ( E ) E → id
  • 141. Derivations E ⇒ E+E • E+E derives from E – we can replace E by E+E – to be able to do this, we have to have a production rule E → E+E in our grammar. E ⇒ E+E ⇒ id+E ⇒ id+id • A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.
  • 142. Derivations • In general a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols. α1 ⇒ α2 ⇒ ... ⇒ αn (αn derives from α1, or α1 derives αn) • ⇒ : derives in one step • ⇒* : derives in zero or more steps • ⇒+ : derives in one or more steps
  • 143. CFG - Terminology • L(G) is the language of G (the language generated by G), which is a set of sentences. • A sentence of L(G) is a string of terminal symbols of G. • If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒+ ω where ω is a string of terminals of G. • If G is a context-free grammar, L(G) is a context-free language. • Two grammars are equivalent if they produce the same language. • S ⇒* α – If α contains non-terminals, it is called a sentential form of G. – If α does not contain non-terminals, it is called a sentence of G.
  • 144. Derivation Example E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id) OR E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id) • At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement. • If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation. • If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.
  • 145. Left-Most and Right-Most Derivations Left-Most Derivation: E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id) Right-Most Derivation: E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id) • We will see that top-down parsers try to find the left-most derivation of the given source program. • We will see that bottom-up parsers try to find the right-most derivation of the given source program in the reverse order.
  • 146. Parse Tree • Inner nodes of a parse tree are non-terminal symbols. • The leaves of a parse tree are terminal symbols. • A parse tree can be seen as a graphical representation of a derivation. E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id) (the corresponding sequence of growing parse trees is lost in this transcript)
  • 147. Ambiguity E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id • A grammar that produces more than one parse tree for a sentence is called an ambiguous grammar (the two parse trees for id+id*id are lost in this transcript).
  • 148. Ambiguity (cont.) • For most parsers, the grammar must be unambiguous. • unambiguous grammar → unique selection of the parse tree for a sentence • We should eliminate the ambiguity in the grammar during the design phase of the compiler. • An unambiguous grammar should be written to eliminate the ambiguity. • We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) to disambiguate that grammar, and restrict the grammar to this choice.
  • 149. Ambiguity (cont.) stmt → if expr then stmt | if expr then stmt else stmt | otherstmts if E1 then if E2 then S1 else S2 • Parse tree 1: the else matches the outer if — if E1 then (if E2 then S1) else S2 • Parse tree 2: the else matches the inner if — if E1 then (if E2 then S1 else S2)
  • 150. Ambiguity • We prefer the second parse tree (else matches the closest if). • So, we have to disambiguate our grammar to reflect this choice. • The unambiguous grammar will be: stmt → matchedstmt | unmatchedstmt matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts unmatchedstmt → if expr then stmt | if expr then matchedstmt else unmatchedstmt
  • 151. Ambiguity – Operator Precedence • Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associativity rules. E → E+E | E*E | E^E | id | (E) → disambiguate the grammar with precedence: ^ (right to left), * (left to right), + (left to right) → E → E+T | T T → T*F | F F → G^F | G G → id | (E)
  • 152. Left Recursion • A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α. • Top-down parsing techniques cannot handle left-recursive grammars. • So, we have to convert our left-recursive grammar into an equivalent grammar which is not left-recursive. • The left-recursion may appear in a single step of the derivation (immediate left-recursion), or may appear in more than one step of the derivation.
  • 153. Immediate Left-Recursion A → Aα | β (where β does not start with A) → eliminate immediate left recursion: A → βA' A' → αA' | ε (an equivalent grammar) In general, A → Aα1 | ... | Aαm | β1 | ... | βn (where β1 ... βn do not start with A) → eliminate immediate left recursion: A → β1A' | ... | βnA' A' → α1A' | ... | αmA' | ε (an equivalent grammar)
  • 154. Immediate Left-Recursion -- Example E → E+T | T T → T*F | F F → id | (E) → eliminate immediate left recursion → E → TE' E' → +TE' | ε T → FT' T' → *FT' | ε F → id | (E)
  • 155. Left-Recursion -- Problem • A grammar may not be immediately left-recursive, but it still can be left-recursive. • By just eliminating the immediate left-recursion, we may not get a grammar which is not left-recursive. S → Aa | b A → Sc | d This grammar is not immediately left-recursive, but it is still left-recursive: S ⇒ Aa ⇒ Sca, or A ⇒ Sc ⇒ Aac, causes a left-recursion. • So, we have to eliminate all left-recursions from our grammar.
  • 156. Eliminate Left-Recursion -- Algorithm - Arrange non-terminals in some order: A1 ... An - for i from 1 to n do { - for j from 1 to i-1 do { replace each production Ai → Ajγ by Ai → α1γ | ... | αkγ, where Aj → α1 | ... | αk are the current Aj productions } - eliminate immediate left-recursions among the Ai productions }
  • 157. Eliminate Left-Recursion -- Example S → Aa | b A → Ac | Sd | f - Order of non-terminals: S, A for S: - we do not enter the inner loop. - there is no immediate left recursion in S. for A: - Replace A → Sd with A → Aad | bd So, we will have A → Ac | Aad | bd | f - Eliminate the immediate left-recursion in A: A → bdA' | fA' A' → cA' | adA' | ε
  • 158. Eliminate Left-Recursion -- Example So, the resulting equivalent grammar which is not left-recursive is: S → Aa | b A → bdA' | fA' A' → cA' | adA' | ε
  • 159. Eliminate Left-Recursion – Example2 S → Aa | b A → Ac | Sd | f - Order of non-terminals: A, S for A: - we do not enter the inner loop. - Eliminate the immediate left-recursion in A: A → SdA' | fA' A' → cA' | ε
  • 160. Eliminate Left-Recursion – Example2 for S: - Replace S → Aa with S → SdA'a | fA'a So, we will have S → SdA'a | fA'a | b - Eliminate the immediate left-recursion in S: S → fA'aS' | bS' S' → dA'aS' | ε So, the resulting equivalent grammar which is not left-recursive is: S → fA'aS' | bS' S' → dA'aS' | ε A → SdA' | fA' A' → cA' | ε
  • 161. Left-Factoring • A predictive parser (a top-down parser without backtracking) insists that the grammar must be left-factored. grammar → a new equivalent grammar suitable for predictive parsing stmt → if expr then stmt else stmt | if expr then stmt • When we see if, we cannot know which production rule to choose to re-write stmt in the derivation.
  • 162. Left-Factoring (cont.) • In general, A → αβ1 | αβ2 where α is non-empty and the first symbols of β1 and β2 (if they have one) are different. • When processing α we cannot know whether to expand A to αβ1 or to αβ2. • But, if we re-write the grammar as follows: A → αA' A' → β1 | β2 then we can immediately expand A to αA'.
  • 163. Left-Factoring -- Algorithm • For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, let us say A → αβ1 | ... | αβn | γ1 | ... | γm convert it into A → αA' | γ1 | ... | γm A' → β1 | ... | βn
  • 164. Left-Factoring – Example1 A → abB | aB | cdg | cdeB | cdfB → A → aA' | cdg | cdeB | cdfB A' → bB | B → A → aA' | cdA'' A' → bB | B A'' → g | eB | fB
  • 165. Left-Factoring – Example2 A → ad | a | ab | abc | b → A → aA' | b A' → d | ε | b | bc → A → aA' | b A' → d | ε | bA'' A'' → ε | c
  • 197. Unit-2 Bottom-Up Parsing
  • 198. Bottom-Up Parsing • A bottom-up parser creates the parse tree of the given input starting from the leaves towards the root. • A bottom-up parser tries to find the right-most derivation of the given input in the reverse order. S ⇒ ... ⇒ ω (the right-most derivation of ω; the bottom-up parser finds this derivation in the reverse order) • Bottom-up parsing is also known as shift-reduce parsing because its two main actions are shift and reduce. – At each shift action, the current symbol in the input string is pushed onto a stack.
  • 199. Bottom-Up Parsing – At each reduction step, the symbols at the top of the stack (this symbol sequence is the right side of a production) will be replaced by the non-terminal at the left side of that production. – There are also two more actions: accept and error.
  • 200. Shift-Reduce Parsing • A shift-reduce parser tries to reduce the given input string to the starting symbol: a string → (reduced to) → the starting symbol. • At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal at the left side of that production rule. • If the substring is chosen correctly, the right-most derivation of that string is created in the reverse order. Rightmost derivation: S ⇒*rm ω. The shift-reduce parser finds: ω ⇐rm ... ⇐rm S (the same derivation, in reverse).
  • 201. Shift-Reduce Parsing -- Example S → aABb A → aA | a B → bB | b input string: aaabb → aaAbb → aAbb → aABb → S (reduction) Right-most derivation: S ⇒rm aABb ⇒rm aAbb ⇒rm aaAbb ⇒rm aaabb (right sentential forms) • How do we know which substring to replace at each reduction step?
  • 202. Handle • Informally, a handle of a string is a substring that matches the right side of a production rule. – But not every substring that matches the right side of a production rule is a handle. • A handle of a right sentential form γ (= αβω) is a production rule A → β and a position in γ where the string β may be found and replaced by A to produce the previous right-sentential form in a right-most derivation of γ: S ⇒*rm αAω ⇒rm αβω • If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle. • We will see that ω is a string of terminals.
  • 203. Handle Pruning • A right-most derivation in reverse can be obtained by handle pruning: S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω (the input string) • Start from γn; find a handle An → βn in γn, and replace βn by An to get γn-1. • Then find a handle An-1 → βn-1 in γn-1, and replace βn-1 by An-1 to get γn-2. • Repeat this until we reach S.
  • 204. A Shift-Reduce Parser
Grammar: E → E+T | T    T → T*F | F    F → (E) | id
Right-most derivation of id+id*id:
E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

Right-Most Sentential Form    Reducing Production
id+id*id                      F → id
F+id*id                       T → F
T+id*id                       E → T
E+id*id                       F → id
E+F*id                        T → F
E+T*id                        F → id
E+T*F                         T → T*F
E+T                           E → E+T
E
(At each step the handle is the substring matching the right side of the reducing production; the handles were shown in red and underlined on the original slide.)
  • 205. A Stack Implementation of A Shift-Reduce Parser • There are four possible actions of a shift-reduce parser: 1. Shift: the next input symbol is shifted onto the top of the stack. 2. Reduce: replace the handle on the top of the stack by the non-terminal. 3. Accept: successful completion of parsing. 4. Error: the parser discovers a syntax error and calls an error recovery routine. • The initial stack contains only the end-marker $. • The end of the input string is also marked by the end-marker $.
  • 206. A Stack Implementation of A Shift-Reduce Parser
Stack      Input       Action
$          id+id*id$   shift
$id        +id*id$     reduce by F → id
$F         +id*id$     reduce by T → F
$T         +id*id$     reduce by E → T
$E         +id*id$     shift
$E+        id*id$      shift
$E+id      *id$        reduce by F → id
$E+F       *id$        reduce by T → F
$E+T       *id$        shift
$E+T*      id$         shift
$E+T*id    $           reduce by F → id
$E+T*F     $           reduce by T → T*F
$E+T       $           reduce by E → E+T
$E         $           accept
(The slide also shows the resulting parse tree; the node numbers 1–8 on it give the order in which the tree is built.)
  • 207. Conflicts During Shift-Reduce Parsing • There are context-free grammars for which shift-reduce parsers cannot be used. • The stack contents and the next input symbol may not be enough to decide the action: – shift/reduce conflict: the parser cannot decide whether to shift or to reduce. – reduce/reduce conflict: the parser cannot decide which of several reductions to make. • If a shift-reduce parser cannot be used for a grammar, that grammar is called a non-LR(k) grammar (LR(k): Left-to-right scanning, Right-most derivation, k-symbol lookahead). • An ambiguous grammar can never be an LR grammar.
  • 208. Shift-Reduce Parsers • There are two main categories of shift-reduce parsers: 1. Operator-Precedence Parser – simple, but works only for a small class of grammars. 2. LR-Parsers – cover a wide range of grammars. • SLR – simple LR parser • LR – the most general LR parser • LALR – intermediate LR parser (lookahead LR parser) The grammar classes nest as SLR ⊂ LALR ⊂ LR ⊂ CFG.
  • 209. Operator-Precedence Parser • Operator grammar – a small but important class of grammars – we may have an efficient operator-precedence parser (a shift-reduce parser) for an operator grammar. • In an operator grammar, no production rule can have: – ε at the right side – two adjacent non-terminals at the right side. • Ex: E→AB, A→a, B→b (not an operator grammar); E→EOE, E→id, O→+|*|/ (not an operator grammar); E→E+E | E*E | E/E | id (operator grammar).
  • 210. Precedence Relations • In operator-precedence parsing, we define three disjoint precedence relations between certain pairs of terminals: a <. b (b has higher precedence than a), a =· b (b has the same precedence as a), a .> b (b has lower precedence than a). • The determination of the correct precedence relations between terminals is based on the traditional notions of associativity and precedence of operators. (Unary minus causes a problem.)
  • 211. Using Operator-Precedence Relations • The intention of the precedence relations is to find the handle of a right-sentential form, with <. marking the left end, =· appearing in the interior of the handle, and .> marking the right end. • In an input string $a1a2...an$, we insert the precedence relation between each pair of adjacent terminals for which the relation holds.
  • 212. Using Operator-Precedence Relations
E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id
The partial operator-precedence table for this grammar:
        id     +     *     $
id            .>    .>    .>
+      <.     .>    <.    .>
*      <.     .>    .>    .>
$      <.     <.    <.
Then the input string id+id*id with the precedence relations inserted is: $ <. id .> + <. id .> * <. id .> $
  • 213. To Find The Handles 1. Scan the string from the left end until the first .> is encountered. 2. Then scan backwards (to the left) over any =· until a <. is encountered. 3. The handle is everything to the left of the first .> and to the right of the <. encountered.
Form with relations                     Reduce by   Result
$ <. id .> + <. id .> * <. id .> $      E → id      $ E + id * id $
$ <. + <. id .> * <. id .> $            E → id      $ E + E * id $
$ <. + <. * <. id .> $                  E → id      $ E + E * E $
$ <. + <. * .> $                        E → E*E     $ E + E $
$ <. + .> $                             E → E+E     $ E $
$ $                                                 accept
  • 214. Operator-Precedence Parsing Algorithm • The input string is w$, the initial stack contains only $, and a table holds the precedence relations between certain terminals. Algorithm: set p to point to the first symbol of w$; repeat forever: if ( $ is on top of the stack and p points to $ ) then return; else { let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by p; if ( a <. b or a =· b ) then { /* SHIFT */
  • 215. Operator-Precedence Parsing Algorithm push b onto the stack; advance p to the next input symbol; } else if ( a .> b ) then /* REDUCE */ repeat pop the stack until ( the topmost terminal on the stack is related by <. to the terminal most recently popped ); else error(); }
  • 216. Operator-Precedence Parsing Algorithm -- Example
stack    input        relation     action
$        id+id*id$    $ <. id      shift
$id      +id*id$      id .> +      reduce E → id
$        +id*id$                   shift
$+       id*id$                    shift
$+id     *id$         id .> *      reduce E → id
$+       *id$                      shift
$+*      id$                       shift
$+*id    $            id .> $      reduce E → id
$+*      $            * .> $       reduce E → E*E
$+       $            + .> $       reduce E → E+E
$        $                         accept
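The algorithm above fits in a few lines of code. The sketch below implements it for a cut-down relation table covering only id, + and *; the PREC encoding and the function name are illustrative, not from the slides, and the reduce step only locates the handle's terminals rather than also matching them against production right sides.

PREC = {  # PREC[a][b] is the relation between stack terminal a and input b
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}

def op_precedence_parse(tokens):
    stack = ["$"]
    p = 0
    while True:
        a, b = stack[-1], tokens[p]
        if a == "$" and b == "$":
            return True                      # accept
        rel = PREC.get(a, {}).get(b)
        if rel in ("<", "="):                # a <. b or a =. b: shift
            stack.append(b)
            p += 1
        elif rel == ">":                     # a .> b: reduce
            popped = stack.pop()             # pop until top <. most recently popped
            while PREC[stack[-1]].get(popped) != "<":
                popped = stack.pop()
            print("reduced a handle ending with", popped)
        else:
            return False                     # empty entry: error

print(op_precedence_parse(["id", "+", "id", "*", "id", "$"]))  # True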
  • 217. How to Create Operator-Precedence Relations • We use the associativity and precedence relations among operators. 1. If operator O1 has higher precedence than operator O2 → O1 .> O2 and O2 <. O1. 2. If operator O1 and operator O2 have equal precedence: they are left-associative → O1 .> O2 and O2 .> O1; they are right-associative → O1 <. O2 and O2 <. O1. 3. For all operators O: O <. id, id .> O, O <. (, ( <. O, O .> ), ) .> O, O .> $, and $ <. O. 4. Also: ( =· )   $ <. (   $ <. id   ( <. (   ( <. id   id .> $   id .> )   ) .> $   ) .> )
  • 218. Operator-Precedence Relations
      +    -    *    /    ^    id   (    )    $
+    .>   .>   <.   <.   <.   <.   <.   .>   .>
-    .>   .>   <.   <.   <.   <.   <.   .>   .>
*    .>   .>   .>   .>   <.   <.   <.   .>   .>
/    .>   .>   .>   .>   <.   <.   <.   .>   .>
^    .>   .>   .>   .>   <.   <.   <.   .>   .>
id   .>   .>   .>   .>   .>             .>   .>
(    <.   <.   <.   <.   <.   <.   <.   =·
)    .>   .>   .>   .>   .>             .>   .>
$    <.   <.   <.   <.   <.   <.   <.
  • 219. Handling Unary Minus • Operator-precedence parsing cannot handle the unary minus when we also have the binary minus in our grammar. • The best approach to solve this problem is to let the lexical analyzer handle it: – The lexical analyzer will return two different operators for the unary minus and the binary minus. – The lexical analyzer will need a lookahead to distinguish the binary minus from the unary minus. • Then, for any operator O we make: O <. unary-minus and unary-minus .> O if unary-minus has higher precedence than O; unary-minus <. O if unary-minus has lower (or equal) precedence than O.
  • 220. Precedence Functions • Compilers using operator precedence parsers do not need to store the table of precedence relations. • The table can be encoded by two precedence functions f and g that map terminal symbols to integers. • For symbols a and b. f(a) < g(b) whenever a <. b f(a) = g(b) whenever a =· b f(a) > g(b) whenever a .> b
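For instance, one consistent assignment of f and g for the +, *, (, ), id, $ portion of the earlier table is the classic one below (other assignments also work); relation() simply recovers the relation from the two functions. The encoding is illustrative.

f = {"+": 2, "*": 4, "id": 4, "(": 0, ")": 4, "$": 0}
g = {"+": 1, "*": 3, "id": 5, "(": 5, ")": 0, "$": 0}

def relation(a, b):
    """Recover the precedence relation between terminals a and b."""
    if f[a] < g[b]: return "<."
    if f[a] > g[b]: return ".>"
    return "=·"

print(relation("+", "*"))   # <.  (* has higher precedence than +)
print(relation("+", "+"))   # .>  (+ is left-associative)
print(relation("(", ")"))   # =·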
  • 221. Disadvantages of Operator Precedence Parsing • Disadvantages: – It cannot handle the unary minus (the lexical analyzer should handle the unary minus). – Small class of grammars. – Difficult to decide which language is recognized by the grammar. • Advantages: – simple – powerful enough for expressions in programming languages
  • 222. Error Recovery in Operator-Precedence Parsing Error cases: 1. No relation holds between the terminal on the top of the stack and the next input symbol. 2. A handle is found (reduction step), but there is no production with this handle as a right side. Error recovery: 1. Each empty entry is filled with a pointer to an error routine. 2. The parser decides which right-hand side the popped handle "looks like", and tries to recover from that situation.
  • 223. LR Parsers • The most powerful (yet efficient) shift-reduce parsing method is LR(k) parsing: Left-to-right scanning, Right-most derivation, k-symbol lookahead (when k is omitted, it is 1).
  • 224. LR Parsers • LR parsing is attractive because: – LR parsing is the most general non-backtracking shift-reduce parsing method, yet it is still efficient. – The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers: LL(1)-Grammars ⊂ LR(1)-Grammars. – An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
  • 225. LR Parsers • LR parsers cover a wide range of grammars: – SLR – simple LR parser – CLR – the most general LR parser – LALR – intermediate LR parser (lookahead LR parser) – SLR, CLR and LALR work the same way (they use the same parsing algorithm); only their parsing tables are different.
  • 226. LR Parsing Algorithm (figure: the LR parsing model — a stack holding S0 X1 S1 ... Xm-1 Sm-1 Xm Sm, the input a1 ... ai ... an $, and a driver that consults an Action table (rows: states; columns: terminals and $; entries: one of four actions) and a Goto table (rows: states; columns: non-terminals; each entry is a state number) while producing the output.)
  • 227. A Configuration of LR Parsing Algorithm • A configuration of an LR parser is: ( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) — the stack and the rest of the input. • Sm and ai decide the parser action by consulting the parsing action table. (The initial stack contains just So.) • A configuration of an LR parser represents the right-sentential form: X1 ... Xm ai ai+1 ... an $
  • 228. Actions of An LR-Parser 1. shift s -- shifts the next input symbol and the state s onto the stack: ( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) → ( So X1 S1 ... Xm Sm ai s, ai+1 ... an $ ) 2. reduce A→β (or rn where n is a production number) – pop 2|β| items from the stack (where r = |β|); – then push A and s where s = goto[sm-r, A]: ( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) → ( So X1 S1 ... Xm-r Sm-r A s, ai ai+1 ... an $ ) – the output is the reducing production reduce A→β 3. Accept – parsing successfully completed. 4. Error -- the parser detected an error (an empty entry in the action table).
  • 229. Reduce Action • Pop 2|β| items from the stack; let us assume that β = Y1Y2...Yr. • Then push A and s where s = goto[sm-r, A]: ( So X1 S1 ... Xm-r Sm-r Y1 Sm-r+1 ... Yr Sm, ai ai+1 ... an $ ) → ( So X1 S1 ... Xm-r Sm-r A s, ai ai+1 ... an $ ) • In fact, Y1Y2...Yr is a handle: X1 ... Xm-r A ai ... an $ ⇐rm X1 ... Xm-r Y1...Yr ai ai+1 ... an $
  • 230. (SLR) Parsing Tables for Expression Grammar
1) E → E+T   2) E → T   3) T → T*F   4) T → F   5) F → (E)   6) F → id

             Action Table                     Goto Table
state    id     +     *     (     )     $     E    T    F
0        s5                s4                 1    2    3
1               s6                     acc
2               r2    s7         r2    r2
3               r4    r4         r4    r4
4        s5                s4                 8    2    3
5               r6    r6         r6    r6
6        s5                s4                      9    3
7        s5                s4                           10
8               s6               s11
9               r1    s7         r1    r1
10              r3    r3         r3    r3
11              r5    r5         r5    r5
  • 231. Actions of A (S)LR-Parser -- Example
stack         input        action              output
0             id*id+id$    shift 5
0id5          *id+id$      reduce by F→id      F→id
0F3           *id+id$      reduce by T→F       T→F
0T2           *id+id$      shift 7
0T2*7         id+id$       shift 5
0T2*7id5      +id$         reduce by F→id      F→id
0T2*7F10      +id$         reduce by T→T*F     T→T*F
0T2           +id$         reduce by E→T       E→T
  • 232. Actions of A (S)LR-Parser -- Example
0E1           +id$         shift 6
0E1+6         id$          shift 5
0E1+6id5      $            reduce by F→id      F→id
0E1+6F3       $            reduce by T→F       T→F
0E1+6T9       $            reduce by E→E+T     E→E+T
0E1           $            accept
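The generic LR driver that replays this trace is short; below is a runnable sketch using the SLR table above. The dict encodings (ACTION, GOTO, PRODS) and the state-only stack are illustrative choices, not the slides' notation.

PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),   # production number -> (lhs, |rhs|)
         4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    (0, "id"): "s5", (0, "("): "s4",
    (1, "+"): "s6", (1, "$"): "acc",
    (2, "+"): "r2", (2, "*"): "s7", (2, ")"): "r2", (2, "$"): "r2",
    (3, "+"): "r4", (3, "*"): "r4", (3, ")"): "r4", (3, "$"): "r4",
    (4, "id"): "s5", (4, "("): "s4",
    (5, "+"): "r6", (5, "*"): "r6", (5, ")"): "r6", (5, "$"): "r6",
    (6, "id"): "s5", (6, "("): "s4",
    (7, "id"): "s5", (7, "("): "s4",
    (8, "+"): "s6", (8, ")"): "s11",
    (9, "+"): "r1", (9, "*"): "s7", (9, ")"): "r1", (9, "$"): "r1",
    (10, "+"): "r3", (10, "*"): "r3", (10, ")"): "r3", (10, "$"): "r3",
    (11, "+"): "r5", (11, "*"): "r5", (11, ")"): "r5", (11, "$"): "r5",
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    stack = [0]                                  # state stack only
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False                         # empty entry: error
        if act == "acc":
            return True
        if act[0] == "s":                        # shift: push the new state
            stack.append(int(act[1:]))
            i += 1
        else:                                    # reduce by production n
            lhs, size = PRODS[int(act[1:])]
            del stack[len(stack) - size:]        # pop |rhs| states
            stack.append(GOTO[(stack[-1], lhs)])
            print("reduce by production", act[1:])

print(lr_parse(["id", "*", "id", "+", "id", "$"]))  # True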
  • 233. Constructing SLR Parsing Tables – LR(0) Item • An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. • Ex: A → aBb gives four possible LR(0) items: A → .aBb, A → a.Bb, A → aB.b, A → aBb. • Sets of LR(0) items will be the states of the action and goto tables of the SLR parser. • A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for constructing SLR parsers. • Augmented grammar: G' is G with a new production rule S' → S, where S' is the new starting symbol.
  • 234. The Closure Operation • If I is a set of LR(0) items for a grammar G, then closure(I) is the set of LR(0) items constructed from I by two rules: 1. Initially, every LR(0) item in I is added to closure(I). 2. If A → α.Bβ is in closure(I) and B → γ is a production rule of G, then B → .γ will be in closure(I). We apply this rule until no more new LR(0) items can be added to closure(I).
  • 235. The Closure Operation -- Example
Grammar: E' → E, E → E+T, E → T, T → T*F, T → F, F → (E), F → id
closure({E' → .E}) = { E' → .E (the kernel item), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }
  • 236. Goto Operation • If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows: – If A → α.Xβ is in I, then every item in closure({A → αX.β}) will be in goto(I,X). Example: I = { E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id } goto(I,E) = { E' → E., E → E.+T } goto(I,T) = { E → T., T → T.*F } goto(I,F) = { T → F. } goto(I,() = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id } goto(I,id) = { F → id. }
  • 237. Construction of The Canonical LR(0) Collection • To create the SLR parsing tables for a grammar G, we create the canonical LR(0) collection of the augmented grammar G'. • Algorithm: C is { closure({S' → .S}) }; repeat the following until no more sets of LR(0) items can be added to C: for each I in C and each grammar symbol X, if goto(I,X) is not empty and not in C, add goto(I,X) to C. • The goto function is a DFA on the sets in C.
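A runnable sketch of closure, goto, and this worklist algorithm for the augmented expression grammar. Items are (lhs, rhs, dot) triples; the GRAMMAR list and the frozenset encoding are illustrative choices.

GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)),
           ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                for l2, r2 in GRAMMAR:                   # add B -> .gamma
                    if l2 == rhs[dot] and (l2, r2, 0) not in items:
                        items.add((l2, r2, 0))
                        changed = True
    return frozenset(items)

def goto(items, x):
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x}
    return closure(moved)

def canonical_collection():
    start = closure({("E'", ("E",), 0)})
    states, worklist = {start}, [start]
    while worklist:
        i = worklist.pop()
        for x in {r[d] for _, r, d in i if d < len(r)}:
            j = goto(i, x)
            if j and j not in states:
                states.add(j); worklist.append(j)
    return states

print(len(canonical_collection()))  # 12 item sets: I0..I11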
  • 238. The Canonical LR(0) Collection -- Example (figure: the canonical LR(0) item sets I0–I11 for the augmented expression grammar; see the transition diagram on the next slide.)
  • 239. Transition Diagram (DFA) of Goto Function (figure: the goto DFA over the item sets I0–I11 of the expression grammar, with transitions labeled by the grammar symbols E, T, F, id, +, *, ( and ); for example I0 →E I1, I1 →+ I6, I6 →T I9, I9 →* I7, I7 →F I10, I0 →( I4, I4 →E I8, I8 →) I11.)
  • 240. Constructing SLR Parsing Table (of an augmented grammar G') 1. Construct the canonical collection of sets of LR(0) items for G': C = {I0,...,In} 2. Create the parsing action table as follows: • If a is a terminal, A → α.aβ is in Ii and goto(Ii,a) = Ij, then action[i,a] is shift j. • If A → α. is in Ii, then action[i,a] is reduce A → α for all a in FOLLOW(A), where A ≠ S'. • If S' → S. is in Ii, then action[i,$] is accept. • If any conflicting actions are generated by these rules, the grammar is not SLR(1).
  • 241. Constructing SLR Parsing Table 3. Create the parsing goto table: • for all non-terminals A, if goto(Ii,A) = Ij, then goto[i,A] = j 4. All entries not defined by (2) and (3) are errors. 5. The initial state of the parser is the one containing S' → .S.
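A sketch of these table-filling rules, building on the closure/goto sketch earlier (it reuses that goto function and NONTERMS). The FOLLOW sets are assumed to be given as a dict, and _set raises on any conflict, signalling that the grammar is not SLR(1); the encoding is illustrative.

def _set(table, key, val):
    # Two different actions for the same (state, symbol) entry: not SLR(1).
    if key in table and table[key] != val:
        raise ValueError(f"conflict at {key}: grammar is not SLR(1)")
    table[key] = val

def build_slr(states, follow, start="E'"):
    index = {s: i for i, s in enumerate(states)}   # number the item sets I0..In
    action, goto_tab = {}, {}
    for s in states:
        i = index[s]
        for lhs, rhs, dot in s:
            if dot < len(rhs):                     # A -> alpha . X beta
                x = rhs[dot]
                j = index[goto(s, x)]
                if x in NONTERMS:
                    goto_tab[i, x] = j                      # rule 3
                else:
                    _set(action, (i, x), ("shift", j))      # rule 2, shift
            elif lhs == start:
                _set(action, (i, "$"), ("accept",))         # rule 2, accept
            else:
                for a in follow[lhs]:                       # rule 2, reduce
                    _set(action, (i, a), ("reduce", lhs, rhs))
    return action, goto_tab

# usage sketch, assuming a FOLLOW dict for the grammar:
# action, goto_tab = build_slr(list(canonical_collection()), FOLLOW)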
  • 242. Parsing Tables of Expression Grammar
             Action Table                     Goto Table
state    id     +     *     (     )     $     E    T    F
0        s5                s4                 1    2    3
1               s6                     acc
2               r2    s7         r2    r2
3               r4    r4         r4    r4
4        s5                s4                 8    2    3
5               r6    r6         r6    r6
6        s5                s4                      9    3
7        s5                s4                           10
8               s6               s11
9               r1    s7         r1    r1
10              r3    r3         r3    r3
11              r5    r5         r5    r5
  • 243. SLR(1) Grammar • An LR parser using SLR(1) parsing tables for a grammar G is called the SLR(1) parser for G. • If a grammar G has an SLR(1) parsing table, it is called an SLR(1) grammar (or SLR grammar in short). • Every SLR grammar is unambiguous, but not every unambiguous grammar is an SLR grammar.
  • 244. shift/reduce and reduce/reduce conflicts • If a state does not know whether it should make a shift operation or a reduction for a terminal, we say that there is a shift/reduce conflict. • If a state does not know whether it should make a reduction using production rule i or j for a terminal, we say that there is a reduce/reduce conflict. • If the SLR parsing table of a grammar G has a conflict, we say that the grammar is not an SLR grammar.
  • 245. Conflict Example
Grammar: S → L=R | R    L → *R | id    R → L
I0: S' → .S, S → .L=R, S → .R, L → .*R, L → .id, R → .L
I1: S' → S.
I2: S → L.=R, R → L.
I3: S → R.
I4: L → *.R, R → .L, L → .*R, L → .id
I5: L → id.
I6: S → L=.R, R → .L, L → .*R, L → .id
I7: L → *R.
I8: R → L.
I9: S → L=R.
Problem: FOLLOW(R) = {=, $}. In I2 on the token =, the table gets both shift 6 (from S → L.=R) and reduce by R → L (since = is in FOLLOW(R)) — a shift/reduce conflict.
  • 246. Conflict Example 2
Grammar: S → AaAb | BbBa    A → ε    B → ε
I0: S' → .S, S → .AaAb, S → .BbBa, A → ., B → .
Problem: FOLLOW(A) = {a,b} and FOLLOW(B) = {a,b}. In I0, on the token a: reduce by A → ε or reduce by B → ε — a reduce/reduce conflict; on the token b: reduce by A → ε or reduce by B → ε — a reduce/reduce conflict.
  • 247. Constructing Canonical LR(1) Parsing Tables • In the SLR method, state i makes a reduction by A → α when the current token is a if A → α. is in Ii and a is in FOLLOW(A). • In some situations, however, A cannot actually be followed by the terminal a in a right-sentential form when α and the state i are on the top of the stack; making the reduction in such a case is not correct. • Ex: for S → AaAb | BbBa, A → ε, B → ε we have S ⇒ AaAb ⇒ Aab ⇒ ab and S ⇒ BbBa ⇒ Bba ⇒ ba; in these right-sentential forms the first A is followed only by a and the first B only by b, even though FOLLOW(A) = FOLLOW(B) = {a,b}.
  • 248. LR(1) Item • To avoid some of these invalid reductions, the states need to carry more information. • Extra information is put into a state by including a terminal symbol as a second component in an item. • An LR(1) item is: A → α.β, a where a is the lookahead of the LR(1) item (a is a terminal or the end-marker $).
  • 249. LR(1) Item (cont.) • When β (in the LR(1) item A → α.β, a) is not empty, the lookahead does not have any effect. • When β is empty (A → α., a), we do the reduction by A → α only if the next input symbol is a (not for every terminal in FOLLOW(A)). • A state will contain A → α., a1 ... A → α., an where {a1,...,an} ⊆ FOLLOW(A).
  • 250. Canonical Collection of Sets of LR(1) Items • The construction of the canonical collection of the sets of LR(1) items is similar to the construction of the canonical collection of the sets of LR(0) items, except that the closure and goto operations work a little differently. • closure(I) is (where I is a set of LR(1) items): – every LR(1) item in I is in closure(I) – if A → α.Bβ, a is in closure(I) and B → γ is a production rule of G, then B → .γ, b will be in closure(I) for each terminal b in FIRST(βa).
  • 251. goto operation • If I is a set of LR(1) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows: – If A → α.Xβ, a is in I, then every item in closure({A → αX.β, a}) will be in goto(I,X).
  • 252. Construction of The Canonical LR(1) Collection • Algorithm: C is { closure({S' → .S, $}) }; repeat the following until no more sets of LR(1) items can be added to C: for each I in C and each grammar symbol X, if goto(I,X) is not empty and not in C, add goto(I,X) to C. • The goto function is a DFA on the sets in C.
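The only change from the LR(0) closure sketched earlier is the lookahead that gets attached: for A → α.Bβ, a we add B → .γ, b for every b in FIRST(βa). A sketch, assuming a first() helper over symbol sequences (not shown) and the same (lhs, rhs) grammar encoding as before; items are now 4-tuples ending with the lookahead.

def closure_lr1(items, grammar, nonterms, first):
    items = set(items)                    # items: (lhs, rhs, dot, lookahead)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot, a in list(items):
            if dot < len(rhs) and rhs[dot] in nonterms:
                beta = rhs[dot + 1:]
                for b in first(beta + (a,)):          # FIRST(beta a)
                    for l2, r2 in grammar:
                        if l2 == rhs[dot] and (l2, r2, 0, b) not in items:
                            items.add((l2, r2, 0, b))
                            changed = True
    return frozenset(items)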
  • 253. A Short Notation for The Sets of LR(1) Items • A set of LR(1) items containing the items A → α.β, a1 ... A → α.β, an can be written as A → α.β, a1/a2/.../an
  • 254. Canonical LR(1) Collection -- Example
Grammar: S → AaAb | BbBa    A → ε    B → ε
I0: S' → .S, $ ; S → .AaAb, $ ; S → .BbBa, $ ; A → ., a ; B → ., b
I1: S' → S., $
I2: S → A.aAb, $
I3: S → B.bBa, $
I4: S → Aa.Ab, $ ; A → ., b
I5: S → Bb.Ba, $ ; B → ., a
I6: S → AaA.b, $
I7: S → BbB.a, $
I8: S → AaAb., $
I9: S → BbBa., $
Now the ε-reductions carry distinct lookaheads (A → ., a vs. B → ., b in I0), so the reduce/reduce conflicts of the SLR construction disappear.
  • 255. Canonical LR(1) Collection – Example 2
Grammar: S' → S   1) S → L=R   2) S → R   3) L → *R   4) L → id   5) R → L
I0: S' → .S, $ ; S → .L=R, $ ; S → .R, $ ; L → .*R, $/= ; L → .id, $/= ; R → .L, $
I1: S' → S., $
I2: S → L.=R, $ ; R → L., $
I3: S → R., $
I4: L → *.R, $/= ; R → .L, $/= ; L → .*R, $/= ; L → .id, $/=
I5: L → id., $/=
I6: S → L=.R, $ ; R → .L, $ ; L → .*R, $ ; L → .id, $
I7: L → *R., $/=
I8: R → L., $/=
I9: S → L=R., $
I10: R → L., $
I11: L → *.R, $ ; R → .L, $ ; L → .*R, $ ; L → .id, $
I12: L → id., $
I13: L → *R., $
(I4 and I11, I5 and I12, I7 and I13, I8 and I10 have the same cores.)
  • 256. Construction of LR(1) Parsing Tables 1. Construct the canonical collection of sets of LR(1) items for G': C = {I0,...,In} 2. Create the parsing action table as follows: • If a is a terminal, A → α.aβ, b is in Ii and goto(Ii,a) = Ij, then action[i,a] is shift j. • If A → α., a is in Ii, then action[i,a] is reduce A → α, where A ≠ S'. • If S' → S., $ is in Ii, then action[i,$] is accept. • If any conflicting actions are generated by these rules, the grammar is not LR(1).
  • 257. Construction of LR(1) Parsing Tables 3. Create the parsing goto table: • for all non-terminals A, if goto(Ii,A) = Ij, then goto[i,A] = j 4. All entries not defined by (2) and (3) are errors. 5. The initial state of the parser is the one containing S' → .S, $.
  • 258. LR(1) Parsing Tables – (for Example 2)
state    id     *      =      $       S    L    R
0        s5     s4                    1    2    3
1                             acc
2                      s6     r5
3                             r2
4        s5     s4                         8    7
5                      r4     r4
6        s12    s11                        10   9
7                      r3     r3
8                      r5     r5
9                             r1
10                            r5
11       s12    s11                        10   13
12                            r4
13                            r3
No shift/reduce or reduce/reduce conflict → so it is an LR(1) grammar.
  • 259. LALR Parsing Tables • LALR stands for LookAhead LR. • LALR parsers are often used in practice because LALR parsing tables are smaller than LR(1) parsing tables. • The number of states in the SLR and LALR parsing tables for a grammar G is the same. • But LALR parsers recognize more grammars than SLR parsers. • yacc creates an LALR parser for the given grammar. • A state of an LALR parser will again be a set of LR(1) items.
  • 260. Creating LALR Parsing Tables • Canonical LR(1) Parser → (shrink the number of states) → LALR Parser • This shrink process may introduce a reduce/reduce conflict in the resulting LALR parser (in which case the grammar is NOT LALR). • But this shrink process cannot produce a shift/reduce conflict.
  • 261. The Core of A Set of LR(1) Items • The core of a set of LR(1) items is the set of its first components. Ex: the items S → L.=R, $ and R → L., $ have the core { S → L.=R, R → L. } • We find the states (sets of LR(1) items) in a canonical LR(1) parser with the same cores, and merge them into a single state. I1: L → id., = and I2: L → id., $ have the same core; merge them into the new state I12: L → id., =/$
  • 262. The Core of A Set of LR(1) Items • We will do this for all states of a canonical LR(1) parser to get the states of the LALR parser. • In fact, the number of the states of the LALR parser for a grammar will be equal to the number of states of the SLR parser for that grammar.
  • 263. Creation of LALR Parsing Tables • Create the canonical LR(1) collection of the sets of LR(1) items for the given grammar. • Find each core; find all sets having that same core; replace those sets with a single set which is their union: C = {I0,...,In} → C' = {J1,...,Jm} where m ≤ n • Create the parsing tables (action and goto tables) in the same way as for an LR(1) parser. – Note that if J = I1 ∪ ... ∪ Ik, then since I1,...,Ik have the same cores, the cores of goto(I1,X),...,goto(Ik,X) must also be the same. – So goto(J,X) = K, where K is the union of all sets of items having the same core as goto(I1,X). • If no conflict is introduced, the grammar is an LALR(1) grammar. (We may only introduce reduce/reduce conflicts; we cannot introduce a shift/reduce conflict.)
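The shrink step itself is mechanical: group LR(1) states by their core and union each group. A sketch over the 4-tuple items of the LR(1) sketch above; detecting any freshly introduced reduce/reduce conflict is left to the table construction that follows.

from collections import defaultdict

def merge_by_core(lr1_states):
    groups = defaultdict(set)
    for state in lr1_states:                 # state: frozenset of (lhs, rhs, dot, la)
        core = frozenset((l, r, d) for l, r, d, _ in state)
        groups[core] |= state                # union the items of same-core states
    return [frozenset(s) for s in groups.values()]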
  • 264. Shift/Reduce Conflict • We claimed that we cannot introduce a shift/reduce conflict during the shrink process for the creation of the states of an LALR parser. • Assume that we could. Then a state of the LALR parser must have: A → α., a and B → β.aγ, b • This means that some state of the canonical LR(1) parser must have: A → α., a and B → β.aγ, c But that state already has a shift/reduce conflict, i.e. the original canonical LR(1) parser had the conflict. (The reason: the shift operation does not depend on lookaheads.)
  • 265. Reduce/Reduce Conflict • But we may introduce a reduce/reduce conflict during the shrink process for the creation of the states of an LALR parser: I1: A → α., a ; B → β., b and I2: A → α., b ; B → β., c merge into I12: A → α., a/b ; B → β., b/c — a reduce/reduce conflict (on the token b).
  • 266. Canonical LALR(1) Collection – Example 2
Grammar: S' → S   1) S → L=R   2) S → R   3) L → *R   4) L → id   5) R → L
I0: S' → .S, $ ; S → .L=R, $ ; S → .R, $ ; L → .*R, $/= ; L → .id, $/= ; R → .L, $
I1: S' → S., $
I2: S → L.=R, $ ; R → L., $
I3: S → R., $
I411: L → *.R, $/= ; R → .L, $/= ; L → .*R, $/= ; L → .id, $/=
I512: L → id., $/=
I6: S → L=.R, $ ; R → .L, $ ; L → .*R, $ ; L → .id, $
I713: L → *R., $/=
I810: R → L., $/=
I9: S → L=R., $
(The same-core pairs I4/I11, I5/I12, I7/I13 and I8/I10 of the canonical LR(1) collection have been merged.)
  • 267. LALR(1) Parsing Tables – (for Example 2)
state    id     *      =      $       S    L    R
0        s5     s4                    1    2    3
1                             acc
2                      s6     r5
3                             r2
4        s5     s4                         8    7
5                      r4     r4
6        s12    s11                        10   9
7                      r3     r3
8                      r5     r5
9                             r1
No shift/reduce or reduce/reduce conflict → so it is an LALR(1) grammar.
  • 268. Using Ambiguous Grammars • All grammars used in the construction of LR parsing tables must be unambiguous. • Can we create LR parsing tables for ambiguous grammars? – Yes, but they will have conflicts. – We can resolve these conflicts in favor of one of the actions to disambiguate the grammar. – At the end, we will again have an unambiguous grammar. • Why do we want to use an ambiguous grammar? – Some ambiguous grammars are more natural, and the corresponding unambiguous grammar can be much more complex. – Usage of an ambiguous grammar may also eliminate unnecessary reductions. • Ex: E → E+E | E*E | (E) | id instead of E → E+T | T, T → T*F | F, F → (E) | id
  • 269. Sets of LR(0) Items for the Ambiguous Grammar
I0: E' → .E ; E → .E+E ; E → .E*E ; E → .(E) ; E → .id
I1: E' → E. ; E → E.+E ; E → E.*E
I2: E → (.E) ; E → .E+E ; E → .E*E ; E → .(E) ; E → .id
I3: E → id.
I4: E → E+.E ; E → .E+E ; E → .E*E ; E → .(E) ; E → .id
I5: E → E*.E ; E → .E+E ; E → .E*E ; E → .(E) ; E → .id
I6: E → (E.) ; E → E.+E ; E → E.*E
I7: E → E+E. ; E → E.+E ; E → E.*E
I8: E → E*E. ; E → E.+E ; E → E.*E
I9: E → (E).
  • 270. SLR-Parsing Tables for the Ambiguous Grammar FOLLOW(E) = { $, +, *, ) } State I7 (reached by I0 →E I1 →+ I4 →E I7) has shift/reduce conflicts for the symbols + and *: when the current token is +, choosing shift makes + right-associative, choosing reduce makes + left-associative; when the current token is *, choosing shift gives * higher precedence than +, choosing reduce gives + higher precedence than *.
  • 271. SLR-Parsing Tables for the Ambiguous Grammar FOLLOW(E) = { $, +, *, ) } State I8 (reached by I0 →E I1 →* I5 →E I8) has shift/reduce conflicts for the symbols + and *: when the current token is *, choosing shift makes * right-associative, choosing reduce makes * left-associative; when the current token is +, choosing shift gives + higher precedence than *, choosing reduce gives * higher precedence than +.
  • 272. SLR-Parsing Tables for the Ambiguous Grammar (conflicts resolved: + and * left-associative, * with higher precedence than +)
             Action                         Goto
state    id     +     *     (     )     $      E
0        s3                s2                  1
1               s4    s5               acc
2        s3                s2                  6
3               r4    r4         r4    r4
4        s3                s2                  7
5        s3                s2                  8
6               s4    s5         s9
7               r1    s5         r1    r1
8               r2    r2         r2    r2
9               r3    r3         r3    r3
  • 273. Error Recovery in LR Parsing • An LR parser will detect an error when it consults the parsing action table and finds an error entry. All empty entries in the action table are error entries. • Errors are never detected by consulting the goto table. • An LR parser will announce error as soon as there is no valid continuation for the scanned portion of the input. • A canonical LR parser (LR(1) parser) will never make even a single reduction before announcing an error. • The SLR and LALR parsers may make several reductions before announcing an error. • But, all LR parsers (LR(1), LALR and SLR parsers) will never shift an erroneous input symbol onto the stack.
  • 274. Panic Mode Error Recovery in LR Parsing • Scan down the stack until a state s with a goto on a particular nonterminal A is found (get rid of everything from the stack before this state s). • Discard zero or more input symbols until a symbol a is found that can legitimately follow A. – The simplest choice is a symbol in FOLLOW(A), but this may not work for all situations. • The parser pushes the nonterminal A and the state goto[s,A] onto the stack, and resumes normal parsing. • This nonterminal A is normally a major program piece (there can be more than one choice for A): stmt, expr, block, ...
  • 275. Phrase-Level Error Recovery in LR Parsing • Each empty entry in the action table is marked with a specific error routine. • An error routine reflects the error that the user most likely will make in that case. • An error routine inserts the symbols into the stack or the input (or it deletes the symbols from the stack and the input, or it can do both insertion and deletion). – missing operand – unbalanced right parenthesis
  • 277. Syntax Analysis Example a := b + c * 100 • The seven tokens are grouped into a parse tree. (figure: a parse tree with an assignment-stmt node at the root, whose children are the identifier a, the token :=, and an expression node; the expression expands to expression + expression, covering b and c*100.)
  • 278. Example of Parse Tree Given the grammar: list → list + digit (2.2) list → list - digit (2.3) list → digit (2.4) digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 (2.5) What is the parse tree for 9-5+2?
          list
        /   |   \
     list   +   digit
    /  |  \        |
 list  -  digit    2
   |        |
 digit      5
   |
   9
  • 279. Abstract Syntax Tree (AST) The AST is a condensed/simplified/abstract form of the parse tree in which: 1. Operators are directly associated with interior nodes (non-terminals). 2. Chains of single productions are collapsed. 3. Terminals that have no attributes are ignored, i.e., the corresponding leaf nodes are discarded.
  • 280. Abstract and Concrete Trees (figure: the concrete parse tree for 9-5+2, as on the previous slide, next to its abstract syntax tree: a + node at the root with children - and 2, where the - node has children 9 and 5.)
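A minimal sketch of such an AST for 9-5+2, with operators at interior nodes and the single-production chains collapsed away; the Node shape is an illustrative choice.

class Node:
    """An AST node: interior nodes hold an operator, leaves hold a value."""
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right

    def __repr__(self):
        if self.left is None:
            return str(self.op)              # leaf
        return f"({self.left} {self.op} {self.right})"

# built bottom-up: first 9-5, then (9-5)+2, mirroring the reductions
ast = Node("+", Node("-", Node(9), Node(5)), Node(2))
print(ast)   # ((9 - 5) + 2)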
  • 281. Advantages of the AST Representation • Convenient representation for semantic analysis and intermediate-language (IL) generation • Useful for building other programming language tools, e.g., a syntax-directed editor
  • 282. Syntax Directed Translation (SDT) Syntax-directed translation is a method of translating a string into a sequence of actions by attaching one such action to each rule of a grammar. A syntax-directed translation is defined by augmenting the CFG: a translation rule is defined for each production. A translation rule defines the translation of the left-hand side non-terminal.
  • 283. Syntax-Directed Definitions and Translation Schemes A. Syntax-Directed Definitions: • give high-level specifications for translations • hide many implementation details, such as the order of evaluation of semantic actions • we associate a production rule with a set of semantic actions, and we do not say when they will be evaluated. B. Translation Schemes: • indicate the order of evaluation of the semantic actions associated with a production rule • in other words, translation schemes give some information about implementation details.
  • 284. Example Syntax-Directed Definition
term ::= ID             { term.place := ID.place ; term.code := "" }
term1 ::= term2 * ID    { term1.place := newtemp( ) ;
                          term1.code := term2.code || gen(term1.place ':=' term2.place '*' ID.place) }
expr ::= term           { expr.place := term.place ; expr.code := term.code }
expr1 ::= expr2 + term  { expr1.place := newtemp( ) ;
                          expr1.code := expr2.code || term.code || gen(expr1.place ':=' expr2.place '+' term.place) }
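A sketch of how this definition evaluates, with each grammar rule turned into a function that returns the (place, code) attribute pair. newtemp() and gen() are the helpers the definition assumes, implemented here in an illustrative way.

_counter = 0

def newtemp():
    # Returns a fresh temporary name t1, t2, ...
    global _counter
    _counter += 1
    return f"t{_counter}"

def gen(*parts):
    # Emits one three-address instruction as a single-element code list.
    return [" ".join(parts)]

def term_id(name):                       # term ::= ID
    return name, []                      # term.place := ID.place ; term.code := ""

def term_mul(term, id_name):             # term1 ::= term2 * ID
    place, code = term
    t = newtemp()                        # term1.place := newtemp()
    return t, code + gen(t, ":=", place, "*", id_name)

def expr_add(expr, term):                # expr1 ::= expr2 + term
    (p1, c1), (p2, c2) = expr, term
    t = newtemp()                        # expr1.place := newtemp()
    return t, c1 + c2 + gen(t, ":=", p1, "+", p2)

# a + b * c  -->  t1 := b * c ; t2 := a + t1
place, code = expr_add(term_id("a"), term_mul(term_id("b"), "c"))
print("\n".join(code))                   # the result is in `place`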