Chapter 4
Syntax Analysis
Introduction
• The syntax of programming language constructs can be specified by context-free
grammars or BNF (Backus-Naur Form) notation. Grammars offer significant benefits
for both language designers and compiler writers.
• A grammar gives a precise, yet easy-to-understand, syntactic specification of a
programming language.
• From certain classes of grammars, we can construct automatically an efficient
parser that determines the syntactic structure of a source program.
• The structure imparted to a language by a properly designed grammar is useful for
translating source programs into correct object code and for detecting errors.
• A grammar allows a language to be evolved or developed iteratively, by adding new
constructs to perform new tasks.
4.1.1 The Role of the Parser
• In our compiler model, the parser obtains a string of tokens from the lexical analyzer, as
shown in Fig. 4.1, and verifies that the string of token names can be generated by the
grammar for the source language.
4.1.1 The Role of the Parser
• There are three general types of parsers for grammars: universal, top-down, and
bottom-up.
• Universal parsing methods such as the Cocke-Younger-Kasami algorithm and
Earley's algorithm can parse any grammar. These general methods are, however,
too inefficient to use in production compilers.
• The methods commonly used in compilers can be classified as being either top-
down or bottom-up. As implied by their names, top-down methods build parse trees
from the top (root) to the bottom (leaves), while bottom-up methods start from the
leaves and work their way up to the root. In either case, the input to the parser is
scanned from left to right, one symbol at a time.
• Parsers implemented by hand often use LL grammars. Parsers for the larger class of
LR grammars are usually constructed using automated tools.
4.1.2 Representative Grammars
• Constructs that begin with keywords like while or int are relatively easy to parse,
because the keyword guides the choice of the grammar production that must be
applied to match the input. We therefore concentrate on expressions, which present
more of a challenge, because of the associativity and precedence of operators.
• Associativity and precedence are captured in the following grammar. E represents
expressions consisting of terms separated by + signs, T represents terms consisting
of factors separated by * signs, and F represents factors that can be either
parenthesized expressions or identifiers:
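The grammar described here, expression grammar (4.1), is:

E → E + T | T
T → T * F | F
F → ( E ) | id        (4.1)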
4.1.2 Representative Grammars
• Expression grammar (4.1) belongs to the class of LR grammars that are suitable for
bottom-up parsing. This grammar can be adapted to handle additional operators and
additional levels of precedence. However, it cannot be used for top-down parsing
because it is left recursive. The following non-left-recursive variant of the expression
grammar (4.1) will be used for top-down parsing:
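E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id        (4.2)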
4.1.3 Syntax Error Handling
The remainder of this section considers the nature of syntactic errors and general
strategies for error recovery. Two of these strategies, panic-mode and phrase-level
recovery, are discussed in more detail below.
• Most programming language specifications do not describe how a compiler should
respond to errors; error handling is left to the compiler designer. Planning the error
handling right from the start can both simplify the structure of a compiler and improve
its handling of errors.
• Common programming errors can occur at many different levels.
• Lexical errors include misspellings of identifiers, keywords, or operators (e.g., the use
of an identifier elipsesize instead of ellipsesize) and missing quotes around text
intended as a string.
4.1.3 Syntax Error Handling
• Syntactic errors include misplaced semicolons or extra or missing braces; that is, "{"
or "}". As another example, in C or Java, the appearance of a case statement without
an enclosing switch is a syntactic error (however, this situation is usually allowed by
the parser and caught later in the processing, as the compiler attempts to generate
code).
• Semantic errors include type mismatches between operators and operands. An
example is a return statement in a Java method with result type void.
• Logical errors can be anything from incorrect reasoning on the part of the
programmer to the use in a C program of the assignment operator = instead of the
comparison operator ==. The program containing = may be well formed; however, it
may not reflect the programmer's intent.
4.1.3 Syntax Error Handling
• The precision of parsing methods allows syntactic errors to be detected very
efficiently. Several parsing methods, such as the LL and LR methods, detect an
error as soon as possible; that is, when the stream of tokens from the lexical
analyzer cannot be parsed further according to the grammar for the language.
• Another reason for emphasizing error recovery during parsing is that many errors
appear syntactic, whatever their cause, and are exposed when parsing cannot
continue. A few semantic errors, such as type mismatches, can also be detected
efficiently; however, accurate detection of semantic and logical errors at compile
time is in general a difficult task.
4.1.3 Syntax Error Handling
The error handler in a parser has goals that are simple to state but challenging to
realize:
1. Report the presence of errors clearly and accurately.
2. Recover from each error quickly enough to detect subsequent errors.
3. Add minimal overhead to the processing of correct programs.
Fortunately, common errors are simple ones, and a relatively straightforward error-
handling mechanism often suffices.
How should an error handler report the presence of an error?
At the very least, it must report the place in the source program where an error is
detected, because there is a good chance that the actual error occurred within the
previous
few tokens. A common strategy is to print the offending line with a pointer to the
position at which an error is detected.
4.1.4 Error-Recovery Strategies
• Once an error is detected, how should the parser recover? Although no strategy
has proven itself universally acceptable, a few methods have broad applicability. The
simplest approach is for the parser to quit with an informative error message when it
detects the first error.
• The balance of this section is devoted to the following recovery strategies:
panic-mode, phrase-level, error-productions, and global-correction.
1. Panic-Mode Recovery
- the parser discards input symbols one at a time until one of a designated set of
synchronizing tokens (e.g., delimiters such as ; , or }) is found,
- often skips a considerable amount of input without checking it for additional errors,
- has the advantage of simplicity,
- is guaranteed not to go into an infinite loop (see the sketch below).
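A minimal sketch of panic-mode recovery in Python, assuming tokens arrive as a list of
strings and the synchronizing set is chosen by the compiler designer (the function name
and its defaults are illustrative, not from the slides):

def panic_mode_recover(tokens, pos, sync_set=frozenset({';', ',', '}'})):
    """Discard input symbols one at a time until a synchronizing token is found.

    Returns the position at which the parser should resume. Since pos only
    ever increases, the loop is guaranteed not to run forever."""
    while pos < len(tokens) and tokens[pos] not in sync_set:
        pos += 1  # skip one symbol without checking it for further errors
    return pos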
4.1.4 Error-Recovery Strategies
2. Phrase-Level Recovery
- a parser may perform local correction on the remaining input
- it may replace a prefix of the remaining input by some string that allows the parser to
continue.
- A typical local correction is to replace a comma by a semicolon, delete an
extraneous semicolon, or insert a missing semicolon.
- The choice of the local correction is left to the compiler designer.
- used in several error-repairing compilers, as it can correct any input string.
- Its major drawback is the difficulty it has in coping with situations in which the actual
error has occurred before the point of detection.
3. Error Productions
- the grammar is augmented with error productions that anticipate common errors.
- A parser constructed from a grammar augmented by these error productions detects
the anticipated errors when an error production is used during parsing.
- The parser can then generate appropriate error diagnostics about the erroneous
construct that has been recognized in the input.
4.1.4 Error-Recovery Strategies
4. Global Correction
- There are algorithms for choosing a minimal sequence of changes
to obtain a globally least-cost correction.
- Given an incorrect input string x and grammar G, these algorithms
will find a parse tree for a related string y
- the number of insertions, deletions, and changes of tokens
required to transform x into y is as small as possible.
- too costly to implement in terms of time and space, so these
techniques are currently only of theoretical interest.
4.2 Context-Free Grammars
• Grammars were introduced in Section 2.2 to systematically describe the syntax
of programming language constructs like expressions and statements. Using
a syntactic variable stmt to denote statements and variable expr to denote
expressions, the production
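stmt → if ( expr ) stmt else stmt        (4.4)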
• specifies the structure of this form of conditional statement. Other productions
then define precisely what an expr is and what else a stmt can be.
4.2.1 The Formal Definition of a Context-Free
Grammar
From Section 2.2, a context-free grammar (grammar for short) consists of terminals,
nonterminals, a start symbol, and productions.
1. Terminals are the basic symbols from which strings are formed. The term "token
name" is a synonym for "terminal", and frequently we will use the word "token" for
terminal when it is clear that we are talking about just the token name. We assume that
the terminals are the first components of the tokens output by the lexical analyzer. In
(4.4), the terminals are
the keywords if and else and the symbols "(" and ")".
2. Nonterminals are syntactic variables that denote sets of strings. In (4.4), stmt and
expr are nonterminals. The sets of strings denoted by nonterminals help define the
language generated by the grammar. Nonterminals impose a hierarchical structure on
the language that is key to syntax analysis and translation.
3. In a grammar, one nonterminal is distinguished as the start symbol, and the set
of strings it denotes is the language generated by the grammar. Conventionally, the
productions for the start symbol are listed first.
4. The productions of a grammar specify the manner in which the terminals and
nonterminals can be combined to form strings. Each production consists of:
(a) A nonterminal called the head or left side of the production; this production
defines some of the strings denoted by the head.
(b) The symbol → (sometimes ::= has been used in place of the arrow).
(c) A body or right side consisting of zero or more terminals and nonterminals. The
components of the body describe one way in which strings of the nonterminal at the
head can be constructed.
4.2.1 The Formal Definition of a Context-Free Grammar
4.2.2 Notational Conventions
• To avoid always having to state that "these are the terminals," "these are the nonterminals,"
and so on, the following notational conventions for grammars will be used throughout the
remainder of this book.
1. These symbols are terminals:
(a) Lowercase letters early in the alphabet, such as a, b, c.
(b) Operator symbols such as +, *, and so on.
(c) Punctuation symbols such as parentheses, comma, and so on.
(d) The digits 0, 1, ..., 9.
(e) Boldface strings such as id or if, each of which represents a single terminal symbol.
2. These symbols are nonterminals:
(a) Uppercase letters early in the alphabet, such as A, B, C.
(b) The letter S, which, when it appears, is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
(d) When discussing programming constructs, uppercase letters may be used to
represent nonterminals for the constructs. For example, nonterminals for expressions,
terms, and factors are often represented by E, T, and F, respectively.
4.2.2 Notational Conventions
• 3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar
symbols; that is, either nonterminals or terminals.
• 4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly
empty) strings of terminals.
• 5. Lowercase Greek letters, α, β, γ for example, represent (possibly empty)
strings of grammar symbols. Thus, a generic production can be written
as A → α, where A is the head and α the body.
• 6. A set of productions A → α1, A → α2, ..., A → αk with a common head
A (call them A-productions) may be written A → α1 | α2 | ... | αk. Call
α1, α2, ..., αk the alternatives for A.
• 7. Unless stated otherwise, the head of the first production is the start symbol.
4.2.2 Notational Conventions
4.2.3 Derivations
• The construction of a parse tree can be made precise by taking a derivational view,
in which productions are treated as rewriting rules.
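For instance, with the ambiguous expression grammar E → E + E | E * E | - E | ( E ) | id,
the sentence -(id + id) has the leftmost derivation

E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)

and the rightmost derivation

E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(E + id) ⇒ -(id + id)

(these are the derivations (4.8) and (4.9) referred to in the next subsection).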
4.2.4 Parse Trees and Derivations
• A parse tree is a graphical representation of a derivation that filters out the order in which productions
are applied to replace nonterminals.
• For example, the parse tree for -(id + id) in Fig. 4.3 results from derivation (4.8) as well as
derivation (4.9).
The leaves of a parse tree are labeled by nonterminals or terminals and, read from left to right,
constitute a sentential form, called the yield or frontier of the tree.
4.2.5 Ambiguity
• From Section 2.2.4, a grammar that produces more than one parse tree for some sentence is said to
be ambiguous. Put another way, an ambiguous grammar is one that produces more than one leftmost
derivation or more than one rightmost derivation for the same sentence.
4.2.5 Ambiguity
• For most parsers, it is desirable that the grammar be made unambiguous,
for if it is not, we cannot uniquely determine which parse tree to select for a
sentence. In other cases, it is convenient to use carefully chosen
ambiguous grammars, together with disambiguating rules that "throw
away" undesirable parse trees, leaving only one tree for each sentence.
4.2.7 Context-Free Grammars Versus Regular Expressions
4.2.7 Context-Free Grammars Versus Regular
Expressions
• We can mechanically construct a grammar that recognizes the same
language as a nondeterministic finite automaton (NFA). The grammar
in question was constructed from the NFA in Fig. 3.24 using the
following construction:
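1. For each state i of the NFA, create a nonterminal Ai.
2. If state i has a transition to state j on input a, add the production Ai → a Aj.
If state i goes to state j on input ε, add the production Ai → Aj.
3. If i is an accepting state, add Ai → ε.
4. If i is the start state, make Ai the start symbol of the grammar.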
4.2.7 Context-Free Grammars Versus Regular
Expressions
• Colloquially, we say that "finite automata cannot count,"
meaning that a finite automaton cannot accept a language like
{a^n b^n | n ≥ 1} that would require it to keep count of the
number of a's before it sees the b's. Likewise, "a grammar can
count two items but not three," as we shall see when we consider
non-context-free language constructs in Section 4.3.5.
4.3.1 Lexical Versus Syntactic Analysis
• Everything that can be described by a regular expression can also be described by a grammar. We
may therefore reasonably ask: "Why use regular expressions to define the lexical syntax of a
language?"
There are several reasons.
1. Separating the syntactic structure of a language into lexical and nonlexical parts provides a
convenient way of modularizing the front end of a compiler into two manageable-sized
components.
2. The lexical rules of a language are frequently quite simple, and to describe them we do not
need a notation as powerful as grammars.
3. Regular expressions generally provide a more concise and easier-to-understand notation
for tokens than grammars.
4. More efficient lexical analyzers can be constructed automatically from regular expressions
than from arbitrary grammars.
• There are no firm guidelines as to what to put into the lexical rules, as opposed to the syntactic
rules. Regular expressions are most useful for describing the structure of constructs such as
identifiers, constants, keywords, and white space. Grammars, on the other hand, are most useful for
describing nested structures such as balanced parentheses, matching begin-end's, and
corresponding if-then-else's.
4.3.2 Eliminating Ambiguity
• Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity. As an example, we shall
eliminate the ambiguity from the following "dangling else" grammar:
• In all programming languages with conditional statements of this form, the first parse tree is preferred.
The general rule is, "Match each else with the closest unmatched then." This disambiguating rule can
theoretically be incorporated directly into a grammar, but in practice it is rarely built into the productions.
4.3.3 Elimination of Left Recursion
• The nonterminal A generates the same strings as before but is no longer left recursive. This
procedure eliminates all left recursion from the A and A' productions, but it does not eliminate left
recursion involving derivations of two or more steps.
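The transformation in question: immediate left recursion of the form A → Aα | β is
replaced by

A → βA'
A' → αA' | ε

For example, E → E + T | T becomes E → T E' together with E' → + T E' | ε, which is
exactly how grammar (4.2) was obtained from grammar (4.1).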
4.3.4 Left Factoring
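• When two or more alternatives for a nonterminal share a common prefix, the choice
between them can be deferred by left factoring: A → αβ1 | αβ2 is rewritten as

A → αA'
A' → β1 | β2

so that a predictive parser need not decide between the alternatives until it has seen
past α.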
Exercises for Section 4.3
Top-Down Parsing
• Top-down parsing processes the input string provided by the lexical analyzer.
• Top-down parsing first creates the root node of the parse tree and then continues
by creating its child nodes, down to the leaves.
• The process of constructing the parse tree starting from the root and going down to the
leaves is top-down parsing.
• A top-down parser is constructed from a grammar that is free from ambiguity and left
recursion.
• Top-down parsers use leftmost derivations to construct a parse tree.
• The grammar must not contain common prefixes (it must be left-factored).
• Top-down parsers that need no backtracking are also called predictive parsers.
Example
Recursive-Descent Parsing
• Recursive descent is a top-down parsing technique that constructs the
parse tree from the top while the input is read from left to right. It uses
a procedure for every nonterminal. This parsing
technique recursively parses the input to build a parse tree, which may or
may not require back-tracking.
• A form of recursive-descent parsing that does not require any back-
tracking is known as predictive parsing (see the sketch below).
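A minimal predictive recursive-descent sketch in Python for the non-left-recursive
expression grammar (4.2), assuming the input is a list of tokens ending in '$' (the
class and token names are illustrative, not from the slides):

# One procedure per nonterminal of:
#   E -> T E'     E' -> + T E' | ε
#   T -> F T'     T' -> * F T' | ε
#   F -> ( E ) | id
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens   # e.g. ['id', '+', 'id', '*', 'id', '$']
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):        # consume one terminal or report an error
        if self.lookahead() != t:
            raise SyntaxError(f"expected {t}, found {self.lookahead()}")
        self.pos += 1

    def E(self):
        self.T(); self.Eprime()

    def Eprime(self):
        if self.lookahead() == '+':        # E' -> + T E'
            self.match('+'); self.T(); self.Eprime()
        # otherwise E' -> ε: consume nothing

    def T(self):
        self.F(); self.Tprime()

    def Tprime(self):
        if self.lookahead() == '*':        # T' -> * F T'
            self.match('*'); self.F(); self.Tprime()

    def F(self):
        if self.lookahead() == '(':        # F -> ( E )
            self.match('('); self.E(); self.match(')')
        else:                              # F -> id
            self.match('id')

    def parse(self):
        self.E(); self.match('$')          # accept only if all input is consumed

Parser(['id', '+', 'id', '*', 'id', '$']).parse() returns normally for a valid
sentence. Because each alternative is selected by a single lookahead token, no
backtracking is ever needed.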
Back-tracking
• General recursive-descent may require backtracking; that is, it may
require repeated scans over the input.
• Top- down parsers start from the root node (start symbol) and match the
input string against the production rules to replace them (if matched).
Example of back-tracking
FIRST and FOLLOW
• The construction of both top-down and bottom-up parsers is supported by two
functions, FIRST and FOLLOW, associated with a grammar G.
• FIRST and FOLLOW are two functions related to a grammar that help us fill in the
entries of a predictive parsing table (the M-table).
Computation of FIRST
• FIRST(α) is defined as the set of
terminal symbols that can appear
first in some string derived from α;
if α can derive the empty string,
ε is also in FIRST(α).
Computation of FOLLOW
• FOLLOW(A) is defined as the set of
terminal symbols that can occur
immediately to the right of A in
some sentential form.
Rules of FOLLOW
• If S is the start symbol, then $ is
in FOLLOW(S).
• If there is a production A → αBβ,
then everything in FIRST(β) except
ε is in FOLLOW(B).
• If there is a production A → αB, or
a production A → αBβ where
FIRST(β) contains ε, then everything
in FOLLOW(A) is in FOLLOW(B).
Rules of FIRST
• A terminal c is in FIRST(α) if and
only if α ⇒* cβ for some string β
of grammar symbols.
• If X → Y1 Y2 ... Yk is a production and
Y1 is a nonterminal that does not
derive ε (i.e., FIRST(Y1) does not
contain ε), then FIRST(X) =
FIRST(Y1 Y2 ... Yk) = FIRST(Y1).
• If X → ε is a production, then add
ε to FIRST(X).
Example
• Calculate the FIRST and FOLLOW functions
for the given grammar:
S → (L) | a
L → SL'
L' → ,SL' | ε
• FIRST functions:
FIRST(S) = { ( , a }
FIRST(L) = FIRST(S) = { ( , a }
FIRST(L') = { ',' , ε }
• FOLLOW functions:
FOLLOW(S) = { $ } ∪ ( FIRST(L') – {ε} ) ∪
FOLLOW(L) ∪ FOLLOW(L') = { $ , ',' , ) }
FOLLOW(L) = { ) }
FOLLOW(L') = FOLLOW(L) = { ) }
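A compact Python sketch of the fixed-point computation of FIRST and FOLLOW, with
the grammar given as a dict from nonterminals to lists of productions and 'eps'
standing for ε (the representation and names are illustrative, not from the slides):

from collections import defaultdict

def first_follow(grammar, start):
    """Terminals are any symbols that are not keys of `grammar`."""
    first, follow = defaultdict(set), defaultdict(set)
    follow[start].add('$')                       # rule: $ is in FOLLOW(start)

    def first_of(seq):                           # FIRST of a string of symbols
        out = set()
        for sym in seq:
            s = first[sym] if sym in grammar else {sym}
            out |= s - {'eps'}
            if 'eps' not in s:
                return out
        out.add('eps')                           # every symbol can derive ε
        return out

    changed = True
    while changed:                               # iterate to a fixed point
        changed = False
        for A, prods in grammar.items():
            for prod in prods:
                f = first_of(prod)               # FIRST rule for A -> prod
                if not f <= first[A]:
                    first[A] |= f; changed = True
                for i, B in enumerate(prod):     # FOLLOW rules
                    if B not in grammar:
                        continue
                    rest = first_of(prod[i + 1:]) if i + 1 < len(prod) else {'eps'}
                    if not rest - {'eps'} <= follow[B]:
                        follow[B] |= rest - {'eps'}; changed = True
                    if 'eps' in rest and not follow[A] <= follow[B]:
                        follow[B] |= follow[A]; changed = True
    return first, follow

For the grammar above, g = {'S': [('(', 'L', ')'), ('a',)], 'L': [('S', "L'")],
"L'": [(',', 'S', "L'"), ('eps',)]} yields first['S'] == {'(', 'a'} and
follow["L'"] == {')'}, matching the hand computation.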
LL(1) Grammars
• A context-free grammar G = (VT, VN, S, P) whose parsing table has no multiple entries is
said to be LL(1).
• The first L stands for scanning the input from left to right,
• The second L stands for producing a leftmost derivation,
• The 1 stands for using one input symbol of lookahead at each step to make parsing
action decision.
• A language is said to be LL(1) if it can be generated by an LL(1) grammar.
The conditions to check first are as follows:
1. The grammar must be free from left recursion.
2. The grammar must not be ambiguous.
Algorithm to construct LL(1) Parsing Table:
Step 1: First check all the important conditions mentioned above and go to
step 2.
Step 2: Calculate First() and Follow() for all non-terminals.
Step 3: For each production A → α (read: A derives α):
Find FIRST(α) and, for each terminal in FIRST(α), make entry A → α in the table.
If FIRST(α) contains ε, then
find FOLLOW(A) and, for each terminal in FOLLOW(A),
make entry A → ε in the table.
If FIRST(α) contains ε and FOLLOW(A) contains $,
then make entry A → ε in the table for $.
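The table-filling step is a direct transcription of Step 3. This Python sketch reuses
the first_follow function from the earlier FIRST/FOLLOW example and the same
illustrative grammar representation:

def ll1_table(grammar, start):
    first, follow = first_follow(grammar, start)

    def first_of(seq):                      # FIRST of a production body
        out = set()
        for sym in seq:
            s = first[sym] if sym in grammar else {sym}
            out |= s - {'eps'}
            if 'eps' not in s:
                return out
        out.add('eps')
        return out

    table = {}
    for A, prods in grammar.items():
        for prod in prods:
            f = first_of(prod)
            for a in f - {'eps'}:           # entry for each terminal in FIRST(α)
                assert (A, a) not in table, "multiple entries: not LL(1)"
                table[(A, a)] = prod
            if 'eps' in f:                  # entries for FOLLOW(A), including $
                for b in follow[A]:
                    assert (A, b) not in table, "multiple entries: not LL(1)"
                    table[(A, b)] = prod
    return table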
Example: Consider the Grammar:
E --> TE'
E' --> +TE' | ε
T --> FT'
T' --> *FT' | ε
F --> id | (E)
Step 1: The grammar satisfies all
properties in step 1.
Step 2: Calculate FIRST() and FOLLOW():

Production       FIRST        FOLLOW
E → TE'          { id, ( }    { $, ) }
E' → +TE' | ε    { +, ε }     { $, ) }
T → FT'          { id, ( }    { +, $, ) }
T' → *FT' | ε    { *, ε }     { +, $, ) }
F → id | (E)     { id, ( }    { *, +, $, ) }
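Step 3 then fills in the LL(1) parsing table (blank entries signal errors):

         id         +            *            (          )         $
E      E → TE'                             E → TE'
E'               E' → +TE'                            E' → ε    E' → ε
T      T → FT'                             T → FT'
T'               T' → ε       T' → *FT'               T' → ε    T' → ε
F      F → id                              F → (E)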
Non recursive Predictive Parsing
• Predictive parsing can be performed using a pushdown stack, avoiding
recursive calls.
• Initially the stack holds just the start symbol of the grammar.
• At each step a symbol X is popped from the stack:
• if X is a terminal symbol, then it is matched with the lookahead and the lookahead
is advanced;
• if X is a nonterminal, then using the lookahead and the parsing table, a production is
chosen and its right-hand side is pushed onto the stack (in reverse order).
• This process goes on until the stack and the input string become empty. It
is useful to have an end-of-stack marker and an end-of-input marker; we denote
them both by $.
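A sketch of the driver loop in Python, using a table of the shape produced by the
ll1_table sketch above (again illustrative, not from the slides):

def predictive_parse(tokens, table, grammar, start):
    """tokens must end with '$'; table maps (nonterminal, terminal) -> body."""
    stack = ['$', start]                 # top of stack is the end of the list
    pos = 0
    while stack:
        X = stack.pop()
        a = tokens[pos]
        if X == 'eps':
            continue                     # ε contributes nothing to the input
        if X not in grammar:             # X is a terminal or the $ marker
            if X != a:
                raise SyntaxError(f"expected {X}, found {a}")
            pos += 1                     # match X and advance the lookahead
        else:
            body = table.get((X, a))
            if body is None:
                raise SyntaxError(f"no production for {X} on lookahead {a}")
            stack.extend(reversed(body)) # push body, leftmost symbol on top
    # reaching here means stack and input were consumed together: accept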
Example : Consider the grammar G
given by:
S → aAa | BAa | ε
A → cA | bA | ε
B → b
       a          b          c          $
S    S → aAa    S → BAa                S → ε
A    A → ε      A → bA     A → cA
B               B → b
Stack     Remaining input    Action
$S        bcba$              choose S → BAa
$aAB      bcba$              choose B → b
$aAb      bcba$              match b
$aA       cba$               choose A → cA
$aAc      cba$               match c
$aA       ba$                choose A → bA
$aAb      ba$                match b
$aA       a$                 choose A → ε
$a        a$                 match a
$         $                  accept
Error Recovery in Predictive Parsing
• The following discussion of error recovery refers to the stack of a table-driven
predictive parser, since the stack makes explicit the terminals and nonterminals
that the parser hopes to match with the remainder of the input.
Panic Mode
• Panic-mode error recovery is based on the idea of skipping symbols
on the input until a token in a selected set of synchronizing
tokens appears. Its effectiveness depends on the choice of the
synchronizing set: the sets should be chosen so that the parser
recovers quickly from errors.
Some heuristics are as follows:
• As a starting point, place all symbols in FOLLOW(A) into the synchronizing set
for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen
and pop A from the stack, it is likely that parsing can continue.
• It is not enough to use FOLLOW(A) as the synchronizing set for A.
• If we add symbols in FIRST(A) to the synchronizing set for nonterminal A,
then it may be possible to resume parsing according to A if a symbol in
FIRST(A) appears in the input.
• If a nonterminal can generate the empty string, then the production deriving
ε can be used as a default.
• If a terminal on top of the stack cannot be matched, a simple idea is to pop
the terminal, issue a message saying that the terminal was inserted, and
continue parsing. In effect, this approach takes the synchronizing set of a
token to consist of all other tokens.
Phrase-level Recovery:
Phrase-level error recovery is implemented by filling in the blank entries in the
predictive parsing table with pointers to error routines.
These routines may change, insert, or delete symbols on the input and issue
appropriate error messages.
They may also pop from the stack.
Alteration of stack symbols or the pushing of new symbols onto the stack is
questionable for several reasons.
First, the steps carried out by the parser might then not correspond to the
derivation of any word in the language at all.
Second, we must ensure that there is no possibility of an infinite loop.
Checking that any recovery action eventually results in an input symbol being
consumed (or the stack being shortened if the end of the input has been reached) is
a good way to protect against such loops.
4.7 More Powerful LR Parsers
• In this section, we shall extend the previous LR parsing techniques to use one symbol of lookahead on the input.
There are two different methods:
1. The "canonical-LR" or just "LR" method, which makes full use of the lookahead symbol(s). This method
uses a large set of items, called the LR(1) items.
2. The "lookahead-LR" or "LALR" method, which is based on the LR(0) sets of items, and has many fewer
states than typical parsers based on the LR(1) items. By carefully introducing lookaheads into the LR(0)
items, we can handle many more grammars with the LALR method than with the SLR method, and build
parsing tables that are no bigger than the
SLR tables. LALR is the method of choice in most situations.
• After introducing both these methods, we conclude with a discussion of how to compact LR parsing tables for
environments with limited memory.
• 4.7.1 Canonical LR(1) Items
4.7.2 Constructing LR(1) Sets of Items
Assignment
• 4.9 Parser Generators
• Bottom Up Parser
More Related Content

Similar to chapter4 end.pptx

Principles of Compiler Design - Introduction
Principles of Compiler Design - IntroductionPrinciples of Compiler Design - Introduction
Principles of Compiler Design - Introduction
sheelarajasekar205
 
Compiler Design.pptx
Compiler Design.pptxCompiler Design.pptx
Compiler Design.pptx
SouvikRoy149
 
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATORPSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
ijistjournal
 
Compiler design important questions
Compiler design   important questionsCompiler design   important questions
Compiler design important questions
akila viji
 
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGESOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
IJCI JOURNAL
 
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGESOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
IJCI JOURNAL
 
Compiler Design Material
Compiler Design MaterialCompiler Design Material
Compiler Design Material
Dr. C.V. Suresh Babu
 
3.2
3.23.2
Error Detection & Recovery.pptx
Error Detection & Recovery.pptxError Detection & Recovery.pptx
Error Detection & Recovery.pptx
MohibKhan79
 
11700220036.pdf
11700220036.pdf11700220036.pdf
11700220036.pdf
SouvikRoy149
 
12000121056_Priyankamanna-3.pdf
12000121056_Priyankamanna-3.pdf12000121056_Priyankamanna-3.pdf
12000121056_Priyankamanna-3.pdf
PriyankaManna20
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.ppt
sivaganesh293
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.ppt
sivaganesh293
 
Pcd question bank
Pcd question bank Pcd question bank
Pcd question bank
Sumathi Gnanasekaran
 
Error detection recovery
Error detection recoveryError detection recovery
Error detection recovery
Tech_MX
 
Compiler an overview
Compiler  an overviewCompiler  an overview
Compiler an overview
amudha arul
 
Phases of Compiler.pptx
Phases of Compiler.pptxPhases of Compiler.pptx
Phases of Compiler.pptx
ssuser3b4934
 
SS UII Lecture 1
SS UII Lecture 1SS UII Lecture 1
SS UII Lecture 1
Avinash Kapse
 
The role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler designThe role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler design
Sadia Akter
 
Phases of Compiler.pptx
Phases of Compiler.pptxPhases of Compiler.pptx
Phases of Compiler.pptx
SuchandaBanerjee6
 

Similar to chapter4 end.pptx (20)

Principles of Compiler Design - Introduction
Principles of Compiler Design - IntroductionPrinciples of Compiler Design - Introduction
Principles of Compiler Design - Introduction
 
Compiler Design.pptx
Compiler Design.pptxCompiler Design.pptx
Compiler Design.pptx
 
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATORPSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
 
Compiler design important questions
Compiler design   important questionsCompiler design   important questions
Compiler design important questions
 
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGESOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
 
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGESOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
 
Compiler Design Material
Compiler Design MaterialCompiler Design Material
Compiler Design Material
 
3.2
3.23.2
3.2
 
Error Detection & Recovery.pptx
Error Detection & Recovery.pptxError Detection & Recovery.pptx
Error Detection & Recovery.pptx
 
11700220036.pdf
11700220036.pdf11700220036.pdf
11700220036.pdf
 
12000121056_Priyankamanna-3.pdf
12000121056_Priyankamanna-3.pdf12000121056_Priyankamanna-3.pdf
12000121056_Priyankamanna-3.pdf
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.ppt
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.ppt
 
Pcd question bank
Pcd question bank Pcd question bank
Pcd question bank
 
Error detection recovery
Error detection recoveryError detection recovery
Error detection recovery
 
Compiler an overview
Compiler  an overviewCompiler  an overview
Compiler an overview
 
Phases of Compiler.pptx
Phases of Compiler.pptxPhases of Compiler.pptx
Phases of Compiler.pptx
 
SS UII Lecture 1
SS UII Lecture 1SS UII Lecture 1
SS UII Lecture 1
 
The role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler designThe role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler design
 
Phases of Compiler.pptx
Phases of Compiler.pptxPhases of Compiler.pptx
Phases of Compiler.pptx
 

Recently uploaded

Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
anilsa9823
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
manji sharman06
 
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to SuccessMongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
ScyllaDB
 
Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
ThousandEyes
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
Day 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data ManipulationDay 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data Manipulation
UiPathCommunity
 
Cyber Recovery Wargame
Cyber Recovery WargameCyber Recovery Wargame
Cyber Recovery Wargame
Databarracks
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My IdentityCNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
Cynthia Thomas
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
FilipTomaszewski5
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB
 
Fuxnet [EN] .pdf
Fuxnet [EN]                                   .pdfFuxnet [EN]                                   .pdf
Fuxnet [EN] .pdf
Overkill Security
 
CTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database MigrationCTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database Migration
ScyllaDB
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
AlexanderRichford
 
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
dipikamodels1
 
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - MydbopsMySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
Mydbops
 
Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!
Tobias Schneck
 

Recently uploaded (20)

Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
 
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to SuccessMongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
 
Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
Day 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data ManipulationDay 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data Manipulation
 
Cyber Recovery Wargame
Cyber Recovery WargameCyber Recovery Wargame
Cyber Recovery Wargame
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My IdentityCNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
 
Fuxnet [EN] .pdf
Fuxnet [EN]                                   .pdfFuxnet [EN]                                   .pdf
Fuxnet [EN] .pdf
 
CTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database MigrationCTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database Migration
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
 
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
 
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - MydbopsMySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
 
Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!
 

chapter4 end.pptx

  • 2. Introduction • The syntax of programming language constructs can be specified by context-free grammars or BNF (Backus-Naur Form) notation, Grammars offer significant benefits for both language designers and compiler writers. • A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming language. • From certain classes of grammars, we can construct automatically an efficient parser that determines the syntactic structure of a source program. • The structure imparted to a language by a properly designed grammar is useful for translating source programs into correct object code and for detecting errors. • A grammar allows a language to be evolved or developed iteratively, by adding new constructs to perform new tasks.
  • 3. 4.1.1 The Role of the Parser • In our compiler model, the parser obtains a string of tokens from the lexical analyzer, as shown in Fig. 4.1, and verifies that the string of token names can be generated by the grammar for the source language.
  • 4. 4.1.1 The Role of the Parser • There are three general types of parsers for grammars: universal, top-down, and bottom-up. • Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any grammar. These general methods are, however, too inefficient to use in production compilers. • The methods commonly used in compilers can be classified as being either top- down or bottom-up. As implied by their names, top-down methods build parse trees from the top (root) to the bottom (leaves), while bottom-up methods start from the leaves and work their way up to the root. In either case, the input to the parser is scanned from left to right, one symbol at a time. • Parsers implemented by hand often use LL grammars. Parsers for the larger class of LR grammars are usually constructed using automated tools.
  • 5. 4.1.2 Representative Grammars • Constructs that begin with keywords like while or int, are relatively easy to parse, because the keyword guides the choice of the grammar production that must be applied to match the input. We therefore concentrate on expressions, which present more of challenge, because of the associativity and precedence of operators. • Associativity and precedence are captured in the following grammar. E represents expressions consisting of terms separated by + signs, T represents terms consisting of factors separated by * signs, and F represents factors that can be either parenthesized expressions or identifiers:
  • 6. 4.1.2 Representative Grammars • Expression grammar (4.1)belongs to the class of LR grammars that are suitable for bottom-up parsing. This grammar can be adapted to handle additional operators and additional levels of precedence. However, it cannot be used for top-down parsing because it is left recursive. The following non-left-recursive variant of the expression grammar (4.1) will be used for top-down parsing:
  • 7. 4.1.3 Syntax Error Handling The remainder of this section considers the nature of syntactic errors and general strategies for error recovery. Two of these strategies, called panic-mode and phrase- level recovery. • Most programming language specifications do not describe how a compiler should respond to errors; error handling is left to the compiler designer. Planning the error handling right from the start can both simplify the structure of a compiler and improve its handling of errors. • Common programming errors can occur at many different levels. • Lexical errors include misspellings of identifiers, keywords, or operators e.g., the use of an identifier elipsesize instead of e l l i p s e s i z e - and missing quotes around text intended as a string.
  • 8. 4.1.3 Syntax Error Handling • Syntactic errors include misplaced semicolons or extra or missing braces; that is, '((" or ")." As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error (however,this situation is usually allowed by the parser and caught later in the processing, as the compiler attempts to generate code). • Semantic errors include type mismatches between operators and operands. An example is a return statement in a Java method with result type void. • Logical errors can be anything from incorrect reasoning on the part of the programmer to the use in a C program of the assignment operator = instead of the comparison operator ==. The program containing = may be well formed; however, it may not reflect the programmer's intent.
  • 9. 4.1.3 Syntax Error Handling • The precision of parsing methods allows syntactic errors to be detected very efficiently. Several parsing methods, such as the LL and LR methods, detect an error as soon as possible; that is, when the stream of tokens from the lexical analyzer cannot be parsed further according to the grammar for the language. • Another reason for emphasizing error recovery during parsing is that many errors appear syntactic, whatever their cause, and are exposed when parsing cannot continue. A few semantic errors, such as type mismatches, can also be detected efficiently; however, accurate detection of semantic and logical errors at compile time is in general a difficult task.
  • 10. 4.1.3 Syntax Error Handling The error handler in a parser has goals that are simple to state but challenging to realize: 1. Report the presence of errors clearly and accurately. 2. Recover from each error quickly enough to detect subsequent errors. 3. Add minimal overhead to the processing of correct programs. Fortunately, common errors are simple ones, and a relatively straightforward error- handling mechanism often suffices. How should an error handler report the presence of an error? At the very least, it must report the place in the source program where an error is detected, because there is a good chance that the actual error occurred within the previous few tokens. A common strategy is to print the offending line with a pointer to the position at which an error is detected.
  • 11. 4.1.4 Error-Recovery Strategies • Once an error is detected, how should the parser recover? Although no strategy has proven itself universally acceptable, a few methods have broad applicability. The simplest approach is for the parser to quit with an informative error message when it detects the first error. • The balance of this section is devoted to the following recovery strategies: panic-mode, phrase-level, error-productions, and global-correction. 1. Panic-Mode Recovery - the parser discards input symbols one at a time until one of a designated set of synchronizing tokens (i.e, delimiters: ; , }) is found. - often skips a considerable amount of input without checking it for additional errors, - advantage of simplicity - guaranteed not to go into an infinite loop.
  • 12. 4.1.4 Error-Recovery Strategies 2. Phrase-Level Recovery - a parser may perform local correction on the remaining input - it may replace a prefix of the remaining input by some string that allows the parser to continue. - A typical local correction is to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon. - The choice of the local correction is left to the compiler designer. - used in several error-repairing compilers, as it can correct any input string. - Its major drawback is the difficulty it has in coping with situations in which the actual error has occurred before the point of detection. 3. Error Productions - Anticipating common errors that might be encountered - A parser constructed from a grammar augmented by these error productions detects the anticipated errors when an error production is used during parsing. - The parser can then generate appropriate error diagnostics about the erroneous construct that has been recognized in the input.
  • 13. 4.1.4 Error-Recovery Strategies 4. Global Correction - There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction. - Given an incorrect input string x and grammar G, these algorithms will find a parse tree for a related string y - the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible. - too costly to implement in terms of time and space, so these techniques are currently only of theoretical interest.
  • 14. 4.2 Context-Free Grammars • Grammars were introduced in Section 2.2 to systematically describe the syntax of programming language constructs like expressions and statements. Using a syntactic variable stmt to denote statements and variable expr to denote expressions, the production • specifies the structure of this form of conditional statement. Other productions then define precisely what an expr is and what else a stmt can be.
  • 15. 4.2.1 The Formal Definition of a Context-Free Grammar From Section 2.2, a context-free grammar (grammar for short) consists of terminals, nonterminals, a start symbol, and productions. 1. Terminals are the basic symbols from which strings are formed. The term "token name" is a synonym for “terminal" and frequently we will use the word "token" for terminal when it is clear that we are talking about just the token name. We assume that the terminals are the first components of the tokens output by the lexical analyzer. In (4.4), the terminals are the keywords if and else and the symbols "(" and ") ." 2. Nonterminals are syntactic variables that denote sets of strings. In (4.4), stmt and expr are nonterminals. The sets of strings denoted by nonterminals help define the language generated by the grammar. Nonterminals impose a hierarchical structure on the language that is key to syntax analysis and translation.
  • 16. 3. In a grammar, one nonterminal is distinguished as the start symbol, and the set of strings it denotes is the language generated by the grammar. Conventionally, the productions for the start symbol are listed first. 4. The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings. Each production consists of: (a) A nonterminal called the head or left side of the production; this production defines some of the strings denoted by the head. (b) The symbol has been used in place of the arrow. (c) A body or right side consisting of zero or more terminals and nonterminals. The components of the body describe one way in which strings of the nonterminal at the head can be constructed.
  • 17. 4.2.1 The Formal Definition of a Context-Free Grammar
  • 18. 4.2.2 Notational Conventions • To avoid always having to state that "these are the terminals," "these are the nontermiaals," and so on, the following notational conventions for grammars will be used throughout the remainder of this book. 1. These symbols are terminals: (a) Lowercase letters early in the alphabet, such as a, b, c. (b) Operator symbols such as +, *, and so on. (c) Punctuation symbols such as parentheses, comma, and so on. (d) The digits 0,1,. .. ,9. (e) Boldface strings such as id or if, each of which represents a single terminal symbol. 2. These symbols are nonterminals: (a) Uppercase letters early in the alphabet, such as A, B, C. (b) The letter S, which, when it appears, is usually the start symbol. (c) Lowercase, italic names such as expr or stmt. (d) When discussing programming constructs, uppercase letters may be used to represent nonterminals for the constructs. For example, nonterminals for expressions, terms, and factors are often represented by E, T, and F, respectively
  • 19. 4.2.2 Notational Conventions • 3. Uppercase letters late in the alphabet, such as X, Y, Z represent grammar symbols; that is, either nonterminals or terminals. • 4. Lowercase letters late in the alphabet, chiefly u, v, ...,z , represent (possibly empty) strings of terminals. • 5. Lowercase Greek letters, a, ,O, y for example, represent (possibly empty) strings of grammar symbols. Thus, a generic production can be written as A + a, where A is the head and a the body. • 6. A set of productions A -+al, A + a2,... , A -+ a k with a common head A (call them A-productions), may be written A + a1 / a s I .. I ak. Call a l , a2,. .. ,a k the alternatives for A. • 7. Unless stated otherwise, the head of the first production is the start symbol.
  • 21. 4.2.3 Derivations • The construction of a parse tree can be made precise by taking a derivational view, in which productions are treated as rewriting rules.
  • 22. 4.2.4 Parse Trees and Derivations • A parse tree is a graphical representation of a derivation that filters out the order in which productions are applied to replace nonterminals. • For example, the parse tree for -(id + id) in Fig. 4.3, results from the derivation (4.8) as well as derivation (4.9). The leaves of a parse tree are labeled by nonterminals or terminals and, read from left to right, constitute a sentential form, called the yield or frontier of the tree.
  • 23.
  • 24. 4.2.5 Ambiguity • From Section 2.2.4, a grammar that produces more than one parse tree for some sentence is said to be ambiguous. Put another way, an ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence.
  • 25. 4.2.5 Ambiguity • For most parsers, it is desirable that the grammar be made unambiguous, for if it is not, we cannot uniquely determine which parse tree to select for a sentence. In other cases, it is convenient to use carefully chosen ambiguous grammars, together with disambiguating rules that "throw away" undesirable parse trees, leaving only one tree for each sentence.
  • 26. 4.2.7 Context-Free Grammars Versus Regular Expressions
  • 27. 4.2.7 Context-Free Grammars Versus Regular Expressions • We can construct mechanically a grammar to recognize the same language as a nondeterministic finite automaton (NFA). The grammar above was constructed from the NFA in Fig. 3.24 using the following construction:
  • 28.
  • 29. 4.2.7 Context-Free Grammars Versus Regular Expressions • Colloquially, we say that "finite automata cannot count," meaning that a finite automaton cannot accept a language like {a^n b^n I n > 1) that would require it to keep count of the number of a's before it sees the b's. Likewise, "a grammar can count two items but not three," as we shall see when we consider non- context-free language constructs in Section 4.3.5.
  • 30.
  • 31. 4.3.1 Lexical Versus Syntactic Analysis • Everything that can be described by a regular expression can also be described by a grammar. We may therefore reasonably ask: "Why use regular expressions to define the lexical syntax of a language?" There are several reasons. 1. Separating the syntactic structure of a language into lexical and nonlexical parts provides a convenient way of modularizing the front end of a compiler into two manageable-sized components. 2. The lexical rules of a language are frequently quite simple, and to describe them we do not need a notation as powerful as grammars. 3. Regular expressions generally provide a more concise and easier-to-understand notation for tokens than grammars. 4. More efficient lexical analyzers can be constructed automatically from regular expressions than from arbitrary grammars. • There are no firm guidelines as to what to put into the lexical rules, as opposed to the syntactic rules. Regular expressions are most useful for describing the structure of constructs such as identifiers, constants, keywords, and white space. Grammars, on the other hand, are most useful for describing nested structures such as balanced parentheses, matching begin-end's, corresponding if-
  • 32. 4.3.2 Eliminating Ambiguity • Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity. As an example, we shall eliminate the ambiguity from the following "dangling else" grammar:
  • 33. • In all programming languages with conditional statements of this form, the first parse tree is preferred. The general rule is, "Match each else with the closest unmatched then." This disambiguating rule can theoretically be incorporated directly into a grammar, but in practice it is rarely built into the productions.
  • 34. 4.3.3 Elimination of Left Recursion
  • 35. • The nonterminal A generates the same strings as before but is no longer left recursive. This procedure eliminates all left recursion from the A and A' productions, but it does not eliminate left recursion involving derivations of two or more steps.
  • 38. Top down parsing • Top-down parsing processes the input string provided by a lexical analyzer. • The top-down parsing first creates the root node of the parse tree. And it continues creating its leaf nodes • The process of constructing the parse tree which starts from the root and goes down to the leaf is Top-Down Parsing. • Top-Down Parsers constructs from the Grammar which is free from ambiguity and left recursion. • Top-Down Parsers uses leftmost derivation to construct a parse tree. • It does not allow Grammar With Common Prefixes. • Top-down parsers are also called predictive parsers.
Recursive-Descent Parsing
• Recursive descent is a top-down parsing technique that constructs the parse tree from the top while the input is read from left to right. It uses a procedure for each nonterminal, matching terminals against the input directly. The technique recursively parses the input to build a parse tree, and it may or may not require backtracking.
• A form of recursive-descent parsing that does not require any backtracking is known as predictive parsing.
Backtracking
• General recursive descent may require backtracking; that is, it may require repeated scans over the input.
• Top-down parsers start from the root node (start symbol) and match the input string against the production rules, replacing nonterminals as the rules match.
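• To make this concrete, the following is a minimal Python sketch of a backtrack-free (predictive) recursive-descent recognizer for the expression grammar used later in these slides (E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → id | (E)); the token encoding, a list of strings, is an assumption of this example:

def parse(tokens):
    """Predictive recursive-descent recognizer for:
    E -> T E'    E' -> + T E' | eps
    T -> F T'    T' -> * F T' | eps
    F -> id | ( E )"""
    pos = 0

    def lookahead():
        return tokens[pos] if pos < len(tokens) else "$"

    def match(t):
        nonlocal pos
        if lookahead() != t:
            raise SyntaxError(f"expected {t!r}, found {lookahead()!r}")
        pos += 1

    def E():
        T(); E_prime()

    def E_prime():
        if lookahead() == "+":        # E' -> + T E'
            match("+"); T(); E_prime()
        # otherwise E' -> eps: consume nothing

    def T():
        F(); T_prime()

    def T_prime():
        if lookahead() == "*":        # T' -> * F T'
            match("*"); F(); T_prime()

    def F():
        if lookahead() == "id":       # F -> id
            match("id")
        else:                         # F -> ( E )
            match("("); E(); match(")")

    E()
    if lookahead() != "$":            # all input must be consumed
        raise SyntaxError("trailing input")
    return True

print(parse(["id", "+", "id", "*", "id"]))   # True

• Note that each procedure decides which production to apply by inspecting only the single lookahead token, which is exactly why no backtracking is needed for this grammar.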
FIRST and FOLLOW
• The construction of both top-down and bottom-up parsers is supported by two functions, FIRST and FOLLOW, associated with a grammar G. FIRST and FOLLOW help us fill in the entries of the predictive parsing table M.
Computation of FIRST
• FIRST(α) is defined as the set of terminal symbols that begin the strings derivable from α; if α can derive the empty string, then ε is also in FIRST(α).
Computation of FOLLOW
• FOLLOW(A) is defined as the set of terminal symbols that can occur immediately to the right of the nonterminal A in some sentential form.
Rules of FOLLOW
• If S is the start symbol, then $ is in FOLLOW(S).
• If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
• If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
Rules of FIRST
• A terminal c is in FIRST(α) if and only if α ⇒* cβ for some sequence β of grammar symbols.
• For a production X → Y1 Y2 ... Yk, FIRST(X) contains FIRST(Y1) − {ε}; if Y1 can derive ε, it also contains FIRST(Y2) − {ε}, and so on; if all of the Yi can derive ε, then ε is in FIRST(X).
• If X → ε is a production, then add ε to FIRST(X).
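• These rules translate directly into a fixed-point computation. The following Python sketch is one possible implementation; the grammar encoding (a dict of productions, with [] for an ε-production and "eps" as the ε marker) is an assumption of this example:

def compute_first(grammar):
    """FIRST for every nonterminal, computed to a fixed point.
    grammar maps a nonterminal to a list of right-hand sides;
    each right-hand side is a list of symbols, [] meaning eps."""
    first = {A: set() for A in grammar}

    def first_of(symbols):
        """FIRST of an arbitrary string of grammar symbols."""
        result = set()
        for s in symbols:
            if s not in grammar:            # terminal
                result.add(s)
                return result
            result |= first[s] - {"eps"}
            if "eps" not in first[s]:       # s cannot vanish: stop here
                return result
        result.add("eps")                   # every symbol can derive eps
        return result

    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                new = first_of(rhs)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first, first_of

def compute_follow(grammar, start, first_of):
    """FOLLOW for every nonterminal, computed to a fixed point."""
    follow = {A: set() for A in grammar}
    follow[start].add("$")                  # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                for i, B in enumerate(rhs):
                    if B not in grammar:    # FOLLOW is for nonterminals only
                        continue
                    fb = first_of(rhs[i + 1:])
                    new = fb - {"eps"}      # rule 2: FIRST(beta) minus eps
                    if "eps" in fb:         # rule 3: beta can vanish
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow

grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}
first, first_of = compute_first(grammar)
follow = compute_follow(grammar, "E", first_of)
print(first["E"], follow["T'"])   # {'id', '('} and {'+', ')', '$'}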
Example
• Calculate the FIRST and FOLLOW functions for the given grammar:
S → (L) | a
L → SL'
L' → ,SL' | ε
• FIRST functions:
FIRST(S) = { ( , a }
FIRST(L) = FIRST(S) = { ( , a }
FIRST(L') = { , , ε }   (the comma terminal and ε)
• FOLLOW functions:
FOLLOW(S) = { $ } ∪ ( FIRST(L') − {ε} ) ∪ FOLLOW(L) ∪ FOLLOW(L') = { $ , , , ) }
FOLLOW(L) = { ) }
FOLLOW(L') = FOLLOW(L) = { ) }
LL(1) Grammars
• A context-free grammar G = (VT, VN, S, P) whose predictive parsing table has no multiply defined entries is said to be LL(1).
• The first L stands for scanning the input from left to right.
• The second L stands for producing a leftmost derivation.
• The 1 stands for using one input symbol of lookahead at each step to make a parsing-action decision.
• A language is said to be LL(1) if it can be generated by an LL(1) grammar.
• Conditions to check first are as follows:
1. The grammar is free from left recursion.
2. The grammar is not ambiguous.
Algorithm to construct the LL(1) parsing table
Step 1: Check the conditions mentioned above; if they hold, go to step 2.
Step 2: Calculate FIRST() and FOLLOW() for all nonterminals.
Step 3: For each production A → α:
• Find FIRST(α), and for each terminal in FIRST(α), make the entry A → α in the table.
• If FIRST(α) contains ε, then find FOLLOW(A), and for each terminal in FOLLOW(A), make the entry A → ε in the table.
• If FIRST(α) contains ε and FOLLOW(A) contains $, then make the entry A → ε in the table for $.
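• Step 3 is mechanical once FIRST and FOLLOW are known. The following Python sketch builds the table and detects multiply defined entries; the FIRST/FOLLOW values are hard-coded here from the worked example below, and the data layout is an assumption of this sketch:

# FIRST of each production's right-hand side and FOLLOW of each
# nonterminal, for E -> TE', E' -> +TE' | eps, T -> FT',
# T' -> *FT' | eps, F -> id | (E).
FIRST_RHS = {
    ("E",  ("T", "E'")):      {"id", "("},
    ("E'", ("+", "T", "E'")): {"+"},
    ("E'", ()):               {"eps"},
    ("T",  ("F", "T'")):      {"id", "("},
    ("T'", ("*", "F", "T'")): {"*"},
    ("T'", ()):               {"eps"},
    ("F",  ("id",)):          {"id"},
    ("F",  ("(", "E", ")")):  {"("},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"*", "+", ")", "$"},
}

def build_ll1_table(first_rhs, follow):
    table, conflicts = {}, []
    for (A, rhs), first_set in first_rhs.items():
        targets = first_set - {"eps"}          # step 3, first rule
        if "eps" in first_set:                 # step 3, eps rules
            targets = targets | follow[A]      # FOLLOW(A) already holds $ if needed
        for a in targets:
            if (A, a) in table:                # multiply defined entry: not LL(1)
                conflicts.append((A, a))
            table[(A, a)] = rhs
    return table, conflicts

table, conflicts = build_ll1_table(FIRST_RHS, FOLLOW)
print(table[("E'", ")")])    # () -- that is, E' -> eps
print(conflicts)             # [] since this grammar is LL(1)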
Example: Consider the grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → id | (E)
Step 1: The grammar satisfies all the properties in step 1.
Step 2: Calculate FIRST() and FOLLOW():
Production        FIRST         FOLLOW
E → TE'           { id, ( }     { $, ) }
E' → +TE' | ε     { +, ε }      { $, ) }
T → FT'           { id, ( }     { +, $, ) }
T' → *FT' | ε     { *, ε }      { +, $, ) }
F → id | (E)      { id, ( }     { *, +, $, ) }
Step 3: The resulting LL(1) parsing table (blank entries are errors):
          id          +            *            (           )           $
E         E → TE'                               E → TE'
E'                    E' → +TE'                             E' → ε      E' → ε
T         T → FT'                               T → FT'
T'                    T' → ε       T' → *FT'                T' → ε      T' → ε
F         F → id                                F → (E)
Nonrecursive Predictive Parsing
• Predictive parsing can be performed using an explicit pushdown stack, avoiding recursive calls.
• Initially the stack holds just the start symbol of the grammar (above the end-of-stack marker).
• At each step a symbol X is popped from the stack:
• if X is a terminal symbol, it is matched against the lookahead, and the lookahead is advanced;
• if X is a nonterminal, then using the lookahead and the parsing table a production is chosen, and its right-hand side is pushed onto the stack (rightmost symbol first, so the leftmost symbol ends up on top).
• This process continues until the stack and the input string are both empty. It is useful to have an end-of-stack marker and an end-of-input marker; we denote both by $.
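• A minimal Python sketch of this driver loop, using the LL(1) table for the expression grammar above; the table and token encodings are assumptions of this example:

# Entries map (nonterminal, lookahead) to the production's right-hand
# side; [] represents an eps-production, absence represents an error.
TABLE = {
    ("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],       ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    stack = ["$", start]               # $ marks the bottom of the stack
    tokens = tokens + ["$"]            # $ marks the end of the input
    pos = 0
    while stack:
        X = stack.pop()
        a = tokens[pos]
        if X == "$":                   # end-of-stack marker reached
            return a == "$"            # accept iff input is also exhausted
        if X not in NONTERMINALS:      # terminal: must match the lookahead
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            pos += 1
        else:                          # nonterminal: consult the table
            rhs = TABLE.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no rule for ({X}, {a})")
            stack.extend(reversed(rhs))  # push rightmost first; eps pushes nothing
    return False

print(predictive_parse(["id", "+", "id", "*", "id"]))  # True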
Example: Consider the grammar G given by:
S → aAa | BAa | ε
A → cA | bA | ε
B → b
The LL(1) parsing table:
        a           b           c           $
S       S → aAa     S → BAa                 S → ε
A       A → ε       A → bA      A → cA
B                   B → b
Parsing the input bcba:
Stack       Remaining input     Action
$S          bcba$               choose S → BAa
$aAB        bcba$               choose B → b
$aAb        bcba$               match b
$aA         cba$                choose A → cA
$aAc        cba$                match c
$aA         ba$                 choose A → bA
$aAb        ba$                 match b
$aA         a$                  choose A → ε
$a          a$                  match a
$           $                   accept
Error Recovery in Predictive Parsing
• This discussion of error recovery refers to the stack of a table-driven predictive parser, since the stack makes explicit the terminals and nonterminals that the parser hopes to match with the remainder of the input.
Panic Mode
• Panic-mode error recovery is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears. Its effectiveness depends on the choice of the synchronizing set; the sets should be chosen so that the parser recovers quickly from errors.
Some heuristics are as follows:
• As a starting point, place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue.
• It is not enough to use FOLLOW(A) as the synchronizing set for A alone; for example, symbols that begin higher-level constructs, such as keywords that begin statements, may also need to be added to the synchronizing sets of lower-level constructs such as expressions.
• If we add symbols in FIRST(A) to the synchronizing set for nonterminal A, then it may be possible to resume parsing according to A if a symbol in FIRST(A) appears in the input.
• If a nonterminal can generate the empty string, then the production deriving ε can be used as a default.
• If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, issue a message saying that the terminal was inserted, and continue parsing. In effect, this approach takes the synchronizing set of a token to consist of all other tokens.
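• As a minimal illustration of the first heuristic, a panic-mode step for a table-driven parser might look like the following Python sketch; the function and variable names are assumptions of this example:

def panic_mode_skip(tokens, pos, synchronizing):
    """Skip input tokens until one in the synchronizing set appears.
    Returns the new input position; the caller then pops the offending
    nonterminal from the stack and resumes parsing."""
    while pos < len(tokens) and tokens[pos] not in synchronizing:
        print(f"error: skipping unexpected token {tokens[pos]!r}")
        pos += 1
    return pos

# Example: recovering for a nonterminal whose FOLLOW set is { ')', '$' }
tokens = ["id", "+", "+", ")", "$"]
print(panic_mode_skip(tokens, 1, {")", "$"}))   # skips the two '+', returns 3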
Phrase-Level Recovery
• Phrase-level error recovery is implemented by filling in the blank entries in the predictive parsing table with pointers to error routines. These routines may change, insert, or delete symbols on the input and issue appropriate error messages. They may also pop from the stack.
• Alteration of stack symbols or the pushing of new symbols onto the stack is questionable for several reasons. First, the steps carried out by the parser might then not correspond to the derivation of any word in the language at all. Second, we must ensure that there is no possibility of an infinite loop. Checking that any recovery action eventually results in an input symbol being consumed (or the stack being shortened if the end of the input has been reached) is a good way to protect against such loops.
4.7 More Powerful LR Parsers
• In this section, we shall extend the previous LR parsing techniques to use one symbol of lookahead on the input. There are two different methods:
1. The "canonical-LR" or just "LR" method, which makes full use of the lookahead symbol(s). This method uses a large set of items, called the LR(1) items.
2. The "lookahead-LR" or "LALR" method, which is based on the LR(0) sets of items and has many fewer states than typical parsers based on the LR(1) items. By carefully introducing lookaheads into the LR(0) items, we can handle many more grammars with the LALR method than with the SLR method, and build parsing tables that are no bigger than the SLR tables. LALR is the method of choice in most situations.
• After introducing both of these methods, we conclude with a discussion of how to compact LR parsing tables for environments with limited memory.
4.7.1 Canonical LR(1) Items
• An LR(1) item has the form [A → α·β, a], where A → αβ is a production and a is a terminal or the end-marker $; the second component a is called the lookahead of the item.
4.7.2 Constructing LR(1) Sets of Items
• The method for building the collection of sets of valid LR(1) items is essentially the same as for LR(0) items, with modified CLOSURE and GOTO functions:
• CLOSURE(I): for each item [A → α·Bβ, a] in I and each production B → γ, add the item [B → ·γ, b] for every terminal b in FIRST(βa); repeat until no more items can be added.
• GOTO(I, X): take all items [A → α·Xβ, a] in I, move the dot past X to obtain [A → αX·β, a], and return the CLOSURE of the resulting set.
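• As a concrete illustration of CLOSURE for LR(1) items, here is a minimal Python sketch using the classic grammar S' → S, S → CC, C → cC | d; the item encoding and the simplified FIRST computation (valid only because this grammar has no ε-productions) are assumptions of this sketch, not part of the original algorithm statement:

GRAMMAR = {
    "S'": [("S",)],
    "S":  [("C", "C")],
    "C":  [("c", "C"), ("d",)],
}
# FIRST sets for this grammar; with no eps-productions they are simple.
FIRST = {"S'": {"c", "d"}, "S": {"c", "d"}, "C": {"c", "d"}}

def first_of(symbols):
    """FIRST of a symbol string (valid for grammars without eps-productions)."""
    for s in symbols:
        return FIRST[s] if s in GRAMMAR else {s}
    return set()                              # empty string: defer to the lookahead

def closure(items):
    """LR(1) CLOSURE over items encoded as (head, rhs, dot, lookahead)."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (head, rhs, dot, a) in list(items):
            if dot == len(rhs) or rhs[dot] not in GRAMMAR:
                continue                      # dot at the end or before a terminal
            B = rhs[dot]
            lookaheads = first_of(rhs[dot + 1:]) or {a}   # FIRST(beta a)
            for gamma in GRAMMAR[B]:
                for b in lookaheads:
                    item = (B, gamma, 0, b)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

# Initial state of the canonical LR(1) collection: CLOSURE({[S' -> .S, $]})
for head, rhs, dot, a in sorted(closure({("S'", ("S",), 0, "$")})):
    print(f"[{head} -> {' '.join(rhs[:dot])}.{' '.join(rhs[dot:])}, {a}]")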
Assignment
• 4.9 Parser Generators
• Bottom-Up Parser