尊敬的 微信汇率:1円 ≈ 0.046166 元 支付宝汇率:1円 ≈ 0.046257元 [退出登录]
SlideShare a Scribd company logo
CompilerDesign
Module 2
Module 3
Chapter 1: Introduction
Chapter 2: Lexical Analysis
CompilerDesign
Module 3
Chapter 1: Introduction
Compilers
• “Compilation”
Translation of a program written in a source language into a semantically
equivalent program written in a target language
Compiler
Error messages
Source
Program
Target
Program
Input
Output
Interpreters
• “Interpretation”
• Performing the operations implied by the source program
Interpreter
Source
Program
Input
Output
Error messages
Preprocessors, Compilers, Assemblers,
and Linkers
Preprocessor
Compiler
Assembler
Linker
Skeletal Source Program
Source Program
Target Assembly Program
Relocatable Object Code
Absolute Machine Code
Libraries and
Relocatable Object Files
Phases
of
Compiler lexical analyzer
syntax analyzer
semantic analyzer
source program
tokens
parse trees
parse trees
intermediate code generator
code optimizer
code generator
intermediate code
optimized intermediate code
target program
Lexical Analysis
• The first phase of compiler is called as lexical analysis or
scanning
• The lexical analyser news live stream of characters making of
the source program and groups the character into meaningful
sequence of lexemes.
• The lexical analyser produces has an output of a token in the
form
<token name, attribute value>
• In the token the first component token name is the abstract
sample that is used during the syntax analysis and the
component attribute value points to the entry in the simple
table for this token
• Ex: Source input is
position =initial + rate * 60
• Position : the lexeme will match this as <id, 1>
• = : the lexeme will match this as <=>, because = is an abstract
symbol.
• Initial : the lexeme will match this as <id, 2>
• + : the lexeme will match this as <+>
• Rate: the lexeme will match this as <id, 3>
• * : the lexeme will match this as <*>
• 60: the lexeme will match this as <60>
<id,1> <=> <id, 2> <+> <id, 3> <*> <60> => lexmes
Syntax Analysis
parse tree
<id,1> <=> <id, 2> <+> <id, 3> <*> <60>
Syntax Analysis
<id, 1>
=
*
+
<id, 2>
60
<id, 3>
Semantic Analysis
<id, 1>
=
*
+
<id, 2>
60
<id, 3>
Semantic Analysis
<id, 1>
=
*
+
<id, 2>
60.0
<id, 3> int to float
Symbol Table
• There is a record for each identifier
• The attributes include name, type, location, etc.
Intermediate Code Generation
<id, 1>
=
*
+
<id, 2>
60.0
<id, 3> int to float
Intermediate Code Generation
t1= inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Code Optimizer
Code Optimizer
t1= inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
t1= id3*60.0
id1 = id2+t1
Code Generation
Code Generation
t1 = id3*60.0
id1 = id2+t1
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1,R1,R2
STF id1, R1
Qualities of a Good Compiler
What qualities would you want in a compiler?
• generates correct code (first and foremost!)
• generates fast code
• conforms to the specifications of the input language
• copes with essentially arbitrary input size, variables, etc.
• compilation time (linearly)proportional to size of source
• good diagnostics
• consistent optimisations
• works well with the debugger
The Evolution of Programing
Language
• The Move to High Level Language
1. A major step towards higher-level languages was nade in the
lat the 1930's with the development of Fortran for scientiic co
for business data processing, and Lisp for symbollc
computation.
2. Classification based on generation
a) First generation : machine languages
b) Second Generation : assembly languages
c) Third generation : high level languages like fortran, Cobol,C,C++
and Java
d) Fourth Generation : languages designed for specific applications
like SQL for database, NOMAD for report generation etc
e) Fifth generation : languages applied to logic and constraints based
like prolog andOPS5
3. Classification of languages uses the term imperative and
declarative
a) imperative in which a program specifies how a computation is to
be done
b) declarative for languages in which a program specifies what
computation is to be done
c) Languages such as C. C++, C#, and Java are imperative languages
d) Functional languages such as ML and Haskell and constrain
languages such as Prolog are often considered to be declarative
languages
4. The term on Neumann language is applied to programming a whose
computational mode is based on the von Neumann computer
architecture , Many of today's languages, such as Fortran and C are von
Neumann languages
5. Based object oriented languages and scripting languages
CompilerDesign
Module 2
Module 3
Chapter 2: Lexical Analysis
Lexical Analysis
The first phase of the compiler, the main task of the lexical analyser is
to read the input characters of the source program,
group them into lexemes
produce the output as a sequence of tokens for each lexeme in the
source program
• As shown in the figure the call suggested by get next token command
causes the lexical analyser to read the characters from its input until it can
identify the next legs and produce for it the next token which in returns to
the parser.
Lexical
analyzer
symbol
table
parser
Source
program
token
getNexttoken()
Some Terminology
• A token is a pair consisting of a token name and optional
attribute value
• A pattern is a description of the form that the lexemes of a
token may take
• A Lexeme is a sequence of characters in the source program that
matches the pattern for a token and is identified by the lexical
analyser has a instance of that token
Lexical analyser is divided into cascade of two processor scanning
and lexical analysis
• Scanning: consists of a multiple processes that do not require
tokenization of the input such as deletion of comments
compaction of consecutive whitespace characters into one
• Lexical analysis: is a proper is the more complex portion which
produces the tokens from the output of the scanner
• Token syntax is
<token name, attribute value>
Ex: E = M * C ** 2
• For the above source program, the tokens are generated as by
using attribute value itself.
• <id, E>
• < assign_op>
• <id, M>
• < multi_op>
• <id, C>
• <exp_op>
• <number, 2>
Or
• <2>
Lexical Errors
• It is hard for a lexical analyzer to tell, without the aid of other
components, that there is a source-code error. For instance, if
the string fi is encountered for the first time in a C program in
the context:
Ex: fi (a=s f(x))
• a lexical analyzer cannot tell whether fi is a misspelling of the
keyword if or an undeclared function identifier.
• Since fi is a valid lexeme for the token id, the lexical analyzer
must return the token id to the parser and let some other
phase of the compiler—probably the parser in this case handle
an error due to transposition of the letters.
• The simplest recovery strategy is "panic mode recovery”.
• We delete successive characters from the remaining input,
until the lexical analyzer can find a well-formed token at the
beginning of what input is left.
• This recover technique may confuse the parser, but in an
interactive computing environment it may be quite adequate.
• Other possible error-recovery actions are:
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.
Input Buffering
• For instance we cannot be sure we've seen the end of an
identifier until we see a character that is not a letter or digit, and
therefore is not part of the lexeme for id.
• In C. single-character operators like -, , or < could also be the
beginning of a two-character operator like ->, s, or <#.
• Thus, we shall introduce a two-buffer scheme that handles large
lookhead safely and sentinels that saves the time checking for
the end of buffers.
Buffer Pair
• Specialized buffering techniques have been developed to reduce
the amount of overhead required to process a single input
character.
• An important scheme involves two buffers that are alternately
reloaded, as suggested in Fig.
forward
lexemeBegin
Fig: using a pair of input buffer
E = M * C * * 2 eof
• Two pointers to the input are maintained:
1. Pointer lexemeBegin, marks the beginning of the current lexeme,
whose extent we are attempting to determine.
2. Pointer forward scans ahead until a pattern match is found.
• Once the next lexeme is determined, forward is set to the
character at its right end. Then, after the lexeme is recorded as an
attribute value of a token returned to the parser, lexemebegin is
set to character immediately after the lexeme just found
Sentinels
• We must check, each time we advance forward, that we have not
moved off one of the buffers; if we do, then we must also reload the
other buffer.
• Thus for each character read, we make two tests:
one for the end of the buffer, &
one to determine what character is read.
• We can combine the buffer-end test with the test for the current
character if we extend each buffer to hold a sentinel character at the
end.
• The sentinel is a special character that cannot be part of the source
program, and a natural choice is the character eof.
E = M eof * C * * 2 eof eof
lexemeBegin forward
Specification of Tokens
Strings and Languages
1. {0, 1} is an binary alphabet.
2. A string over alphabet is a finite sequence of symbols drawn from that
alphabet.
3. The empty string is denoted by , the sting length is zero.
4. A language is any countable set of strings over some fixed alphabets
5. The language L containing an empty string is represented by { }
Specification of Tokens
Regular Expression
• Given an alphabet ,
1. is a regular expression and L { } is { }, the language whose member is
an empty string.
2. For each a , a is a regular expression denote {a}, the set containing
the string a.
3. r and s are regular expressions denoting the language L(r ) and L(s ). Then
 ( r ) | ( s ) is a regular expression denoting L( r ) U L( s )
 ( r ) ( s ) is a regular expression denoting L( r ) L ( s )
 ( r )* is a regular expression denoting (L ( r )) *
• Let = {a, b}
• a *
• a +
• a | b
• (a | b) (a | b)
• (a | b)*
• a | a*b
• The Grammar a|b identifies the language {a,b}
• The Grammar (a|b) (a|b) identifies the language {aa,ab,ba,bb}
• The Grammar a* identifies the language consisting string of zero or
more occurrences of a {, a, aa,aaaa,aaaa,aaaaa,}
• The Grammar a+ identifies the language consisting string of one or
more occurrences of a { a,aa,aaaa,aaaa,aaaaa,}
• The Grammar (a|b)* identifies the language {, a,b, aa, ab, ba, bb….}
• The Grammar a|a*b identifies the language {a, ab,aab,aaaab,….}
• Ex 1: To identify letters, digits, underscore
letters A|B|……|Z|a|b|…..|z|_
digit 0|1|2|…|9
id letter (letter|digit)*
• Ex 2: To identify unsigned numbers (integers or floating point)
such as 48618, 516.14,166.2-4e13, 0.15456E9
digit 0|1|2|….|9
digits digit digit*
optionalFraction . digits |
optinalExponent (e|E(+|-| )) digits |
number digits optionalFraction optinalExponent
Ex 1:
letters A|B|……|Z|a|b|…..|z|_
digit 0|1|2|…|9
id letter (letter|digit)
Updated to
• letter [A-Za-z_]
• digit [0-9]
• id letter (letter|digit)*
Ex 2:
digit 0|1|2|….|9
digits digit digit*
optionalFraction . digit |
optinalExponent (e|E(+|-| )digits) |
number digits optionalFraction optinalExponent
Updated to
• digit [0-9]
• digits digit+
• number digits(.digits)? (e|E[+-]? digits )?
• For the given regular expression grammar , describe the language
1. a(a|b)*a
2. (a|b)*a(a|b)(a|b)
3. a*ba*ba*ba*
Recognition of Token
• In our discussion we will make use of Dandling if loop statement.
smt if expr then stmt
| if expr then stmt else stmt
|
expr term relop term
| term
term id
| number
A grammar for branching statement
• The grammar fragment describes a simple form of branching
statements and conditional expressions.
• For relop, we use the comparison operators of languages like
Pascal or SQL where = is "equals" and <> is "not equals," because
it presents an interesting structure of lexemes.
• The terminals of the grammar, which are if, then, else, relop, id,
and number, are the names of tokens as far as the lexical
analyzer is concerned.
• The patterns for these tokens are described using regular
definitions, as shown below.
digit [0-9]
digits digit+
number digits(.digits)? (e[+-]? digits )?
letter [A-Za-z_]
id letter (letter|digit)*
if if
then then
else else
relop < | > | <= | >= | = | < >
Patterns for token for if-else-then statement
For this language, the lexical analyzer will recognize the keywords
if, then, and else, as well as lexemes that match the
patterns for relop, id, and number
Lexemes Token name Attribute Value
if if -
then then -
else else -
any id id Pointer to table entry
any number number Pointer to table entry
< relop LT
<= relop LE
= relop EQ
< > relop NE
> relop GT
>= relop GE
Transition Diagrams
1. We always indicate an accepting state by a double circle.
2. If it is necessary to retract the forward pointer one position,
then we shall additionally place a * near that accepting state.
3. One state is designated the start state, or initial state it is
indicated by an edge labeled "start " entering from nowhere.
0
6
5
4
3
2
1
8
7
return ( relop, LE)
return ( relop, NE)
return ( relop, LT)
return ( relop, EQ)
return ( relop, GE)
return ( relop, GT)
others
others
start < =
=
=
>
>
*
*
Transition diagram for relop
• We begin in state 0, the start state.
• If we see < as the first input symbol, then among the lexemes that
match the pattern for relop, we can be looking at <,<>, or <=.
• We therefore go to state 1, and look at the next character.
• If it is =, then we recognize lexeme <=, enter state 2, and return the
token relop with attribute LE, the symbolic constant representing
this particular comparison operator.
• If in state 1 the next character is >, then instead we have lexeme < >,
and enter state 3 to return an indication that the not-equals
operator has been found.
• On any other character, the lexeme is < and we enter state 4 to
return that information.
• Note, however, that state 4 has a* to indicate that we must retract
the input one position.
• Transition diagram for id’s and keywords
9 11
10 return (getToken(),
installID)
start letter other
letter or digit
*
• Transition diagram for unsigned numbers
1
2
14
1
3
start digit
other
digit
*
15
21
20
19
18
17
16
digit
digit
digit
digit
digit
other
other
*
*
.
E
E + or -
Module 2

More Related Content

What's hot

Parallel programming model, language and compiler in ACA.
Parallel programming model, language and compiler in ACA.Parallel programming model, language and compiler in ACA.
Parallel programming model, language and compiler in ACA.
MITS Gwalior
 
Lecture 2 process
Lecture 2   processLecture 2   process
Lecture 2 process
Kumbirai Junior Muzavazi
 
Pipeline hazard
Pipeline hazardPipeline hazard
Pipeline hazard
AJAL A J
 
PCI BUS
PCI BUSPCI BUS
PCI BUS
mobasith7
 
An Overview of Border Gateway Protocol (BGP)
An Overview of Border Gateway Protocol (BGP)An Overview of Border Gateway Protocol (BGP)
An Overview of Border Gateway Protocol (BGP)
Jasim Alam
 
System Software module 1
System Software module 1System Software module 1
System Software module 1
ShwetaNirmanik
 
Chap7 2 Ecc Intro
Chap7 2 Ecc IntroChap7 2 Ecc Intro
Chap7 2 Ecc Intro
Edora Aziz
 
20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)
Kentaro Ebisawa
 
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118
Wei Fu
 
Kernels and its types
Kernels and its typesKernels and its types
Kernels and its types
ARAVIND18MCS1004
 
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Linaro
 
Ccna introduction
Ccna introductionCcna introduction
Ccna introduction
Mukesh Gautam
 
Snort
SnortSnort
Snort
Rahul Jain
 
token bus
 token bus token bus
token bus
iamvisakh
 
IBTA Releases Updated Specification for RoCEv2
IBTA Releases Updated Specification for RoCEv2IBTA Releases Updated Specification for RoCEv2
IBTA Releases Updated Specification for RoCEv2
inside-BigData.com
 
Lecture 2 more about parallel computing
Lecture 2   more about parallel computingLecture 2   more about parallel computing
Lecture 2 more about parallel computing
Vajira Thambawita
 
RISC-V Boot Process: One Step at a Time
RISC-V Boot Process: One Step at a TimeRISC-V Boot Process: One Step at a Time
RISC-V Boot Process: One Step at a Time
Atish Patra
 
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
AMD
 
CPU SCHEDULING AND DEADLOCK
CPU SCHEDULING AND	DEADLOCKCPU SCHEDULING AND	DEADLOCK
CPU SCHEDULING AND DEADLOCK
Vicky Kumar
 
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMIKernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Anne Nicolas
 

What's hot (20)

Parallel programming model, language and compiler in ACA.
Parallel programming model, language and compiler in ACA.Parallel programming model, language and compiler in ACA.
Parallel programming model, language and compiler in ACA.
 
Lecture 2 process
Lecture 2   processLecture 2   process
Lecture 2 process
 
Pipeline hazard
Pipeline hazardPipeline hazard
Pipeline hazard
 
PCI BUS
PCI BUSPCI BUS
PCI BUS
 
An Overview of Border Gateway Protocol (BGP)
An Overview of Border Gateway Protocol (BGP)An Overview of Border Gateway Protocol (BGP)
An Overview of Border Gateway Protocol (BGP)
 
System Software module 1
System Software module 1System Software module 1
System Software module 1
 
Chap7 2 Ecc Intro
Chap7 2 Ecc IntroChap7 2 Ecc Intro
Chap7 2 Ecc Intro
 
20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)
 
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118
 
Kernels and its types
Kernels and its typesKernels and its types
Kernels and its types
 
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
 
Ccna introduction
Ccna introductionCcna introduction
Ccna introduction
 
Snort
SnortSnort
Snort
 
token bus
 token bus token bus
token bus
 
IBTA Releases Updated Specification for RoCEv2
IBTA Releases Updated Specification for RoCEv2IBTA Releases Updated Specification for RoCEv2
IBTA Releases Updated Specification for RoCEv2
 
Lecture 2 more about parallel computing
Lecture 2   more about parallel computingLecture 2   more about parallel computing
Lecture 2 more about parallel computing
 
RISC-V Boot Process: One Step at a Time
RISC-V Boot Process: One Step at a TimeRISC-V Boot Process: One Step at a Time
RISC-V Boot Process: One Step at a Time
 
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
 
CPU SCHEDULING AND DEADLOCK
CPU SCHEDULING AND	DEADLOCKCPU SCHEDULING AND	DEADLOCK
CPU SCHEDULING AND DEADLOCK
 
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMIKernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
 

Similar to Module 2

An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
Assignment4
Assignment4Assignment4
Assignment4
Sunita Milind Dol
 
Compiler Construction
Compiler ConstructionCompiler Construction
Compiler Construction
Sarmad Ali
 
Cd ch2 - lexical analysis
Cd   ch2 - lexical analysisCd   ch2 - lexical analysis
Cd ch2 - lexical analysis
mengistu23
 
11700220036.pdf
11700220036.pdf11700220036.pdf
11700220036.pdf
SouvikRoy149
 
The Phases of a Compiler
The Phases of a CompilerThe Phases of a Compiler
The Phases of a Compiler
Radhika Talaviya
 
what is compiler and five phases of compiler
what is compiler and five phases of compilerwhat is compiler and five phases of compiler
what is compiler and five phases of compiler
adilmehmood93
 
Concept of compiler in details
Concept of compiler in detailsConcept of compiler in details
Concept of compiler in details
kazi_aihtesham
 
atc 3rd module compiler and automata.ppt
atc 3rd module compiler and automata.pptatc 3rd module compiler and automata.ppt
atc 3rd module compiler and automata.ppt
ranjan317165
 
Compiler Design Material
Compiler Design MaterialCompiler Design Material
Compiler Design Material
Dr. C.V. Suresh Babu
 
role of lexical parser compiler design1-181124035217.pdf
role of lexical parser compiler design1-181124035217.pdfrole of lexical parser compiler design1-181124035217.pdf
role of lexical parser compiler design1-181124035217.pdf
ranjan317165
 
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx
venkatapranaykumarGa
 
Dineshmaterial1 091225091539-phpapp02
Dineshmaterial1 091225091539-phpapp02Dineshmaterial1 091225091539-phpapp02
Dineshmaterial1 091225091539-phpapp02
Tirumala Rao
 
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATORPSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
ijistjournal
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
Dr. Jaydeep Patil
 
1.Role lexical Analyzer
1.Role lexical Analyzer1.Role lexical Analyzer
1.Role lexical Analyzer
Radhakrishnan Chinnusamy
 
Chapter 1.pptx
Chapter 1.pptxChapter 1.pptx
Chapter 1.pptx
NesredinTeshome1
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
Anujashejwal
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
Munni28
 
System software module 4 presentation file
System software module 4 presentation fileSystem software module 4 presentation file
System software module 4 presentation file
jithujithin657
 

Similar to Module 2 (20)

An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
Assignment4
Assignment4Assignment4
Assignment4
 
Compiler Construction
Compiler ConstructionCompiler Construction
Compiler Construction
 
Cd ch2 - lexical analysis
Cd   ch2 - lexical analysisCd   ch2 - lexical analysis
Cd ch2 - lexical analysis
 
11700220036.pdf
11700220036.pdf11700220036.pdf
11700220036.pdf
 
The Phases of a Compiler
The Phases of a CompilerThe Phases of a Compiler
The Phases of a Compiler
 
what is compiler and five phases of compiler
what is compiler and five phases of compilerwhat is compiler and five phases of compiler
what is compiler and five phases of compiler
 
Concept of compiler in details
Concept of compiler in detailsConcept of compiler in details
Concept of compiler in details
 
atc 3rd module compiler and automata.ppt
atc 3rd module compiler and automata.pptatc 3rd module compiler and automata.ppt
atc 3rd module compiler and automata.ppt
 
Compiler Design Material
Compiler Design MaterialCompiler Design Material
Compiler Design Material
 
role of lexical parser compiler design1-181124035217.pdf
role of lexical parser compiler design1-181124035217.pdfrole of lexical parser compiler design1-181124035217.pdf
role of lexical parser compiler design1-181124035217.pdf
 
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx
 
Dineshmaterial1 091225091539-phpapp02
Dineshmaterial1 091225091539-phpapp02Dineshmaterial1 091225091539-phpapp02
Dineshmaterial1 091225091539-phpapp02
 
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATORPSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
1.Role lexical Analyzer
1.Role lexical Analyzer1.Role lexical Analyzer
1.Role lexical Analyzer
 
Chapter 1.pptx
Chapter 1.pptxChapter 1.pptx
Chapter 1.pptx
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
 
System software module 4 presentation file
System software module 4 presentation fileSystem software module 4 presentation file
System software module 4 presentation file
 

Recently uploaded

My Airframe Metallic Design Capability Studies..pdf
My Airframe Metallic Design Capability Studies..pdfMy Airframe Metallic Design Capability Studies..pdf
My Airframe Metallic Design Capability Studies..pdf
Geoffrey Wardle. MSc. MSc. Snr.MAIAA
 
❣Unsatisfied Bhabhi Call Girls Surat 💯Call Us 🔝 7014168258 🔝💃Independent Sura...
❣Unsatisfied Bhabhi Call Girls Surat 💯Call Us 🔝 7014168258 🔝💃Independent Sura...❣Unsatisfied Bhabhi Call Girls Surat 💯Call Us 🔝 7014168258 🔝💃Independent Sura...
❣Unsatisfied Bhabhi Call Girls Surat 💯Call Us 🔝 7014168258 🔝💃Independent Sura...
hotchicksescort
 
🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...
🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...
🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...
dulbh kashyap
 
CSP_Study - Notes (Paul McNeill) 2017.pdf
CSP_Study - Notes (Paul McNeill) 2017.pdfCSP_Study - Notes (Paul McNeill) 2017.pdf
CSP_Study - Notes (Paul McNeill) 2017.pdf
Ismail Sultan
 
Technological Innovation Management And Entrepreneurship-1.pdf
Technological Innovation Management And Entrepreneurship-1.pdfTechnological Innovation Management And Entrepreneurship-1.pdf
Technological Innovation Management And Entrepreneurship-1.pdf
tanujaharish2
 
FUNDAMENTALS OF MECHANICAL ENGINEERING.pdf
FUNDAMENTALS OF MECHANICAL ENGINEERING.pdfFUNDAMENTALS OF MECHANICAL ENGINEERING.pdf
FUNDAMENTALS OF MECHANICAL ENGINEERING.pdf
EMERSON EDUARDO RODRIGUES
 
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdfSri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
Balvir Singh
 
Hot Call Girls In Bangalore ✔ 9079923931 ✔ Hi I Am Divya Vip Call Girl Servic...
Hot Call Girls In Bangalore ✔ 9079923931 ✔ Hi I Am Divya Vip Call Girl Servic...Hot Call Girls In Bangalore ✔ 9079923931 ✔ Hi I Am Divya Vip Call Girl Servic...
Hot Call Girls In Bangalore ✔ 9079923931 ✔ Hi I Am Divya Vip Call Girl Servic...
Banerescorts
 
Online train ticket booking system project.pdf
Online train ticket booking system project.pdfOnline train ticket booking system project.pdf
Online train ticket booking system project.pdf
Kamal Acharya
 
Call Girls Madurai 8824825030 Escort In Madurai service 24X7
Call Girls Madurai 8824825030 Escort In Madurai service 24X7Call Girls Madurai 8824825030 Escort In Madurai service 24X7
Call Girls Madurai 8824825030 Escort In Madurai service 24X7
Poonam Singh
 
MODULE 5 BIOLOGY FOR ENGINEERS TRENDS IN BIO ENGINEERING.pptx
MODULE 5 BIOLOGY FOR ENGINEERS TRENDS IN BIO ENGINEERING.pptxMODULE 5 BIOLOGY FOR ENGINEERS TRENDS IN BIO ENGINEERING.pptx
MODULE 5 BIOLOGY FOR ENGINEERS TRENDS IN BIO ENGINEERING.pptx
NaveenNaveen726446
 
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call GirlCall Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
sapna sharmap11
 
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdfSELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
Pallavi Sharma
 
Butterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdfButterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdf
Lubi Valves
 
The Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC Conduit
The Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC ConduitThe Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC Conduit
The Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC Conduit
Guangdong Ctube Industry Co., Ltd.
 
College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...
College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...
College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...
Ak47
 
🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...
🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...
🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...
AK47
 
Microsoft Azure AD architecture and features
Microsoft Azure AD architecture and featuresMicrosoft Azure AD architecture and features
Microsoft Azure AD architecture and features
ssuser381403
 
Better Builder Magazine, Issue 49 / Spring 2024
Better Builder Magazine, Issue 49 / Spring 2024Better Builder Magazine, Issue 49 / Spring 2024
Better Builder Magazine, Issue 49 / Spring 2024
Better Builder Magazine
 
一比一原版(psu学位证书)美国匹兹堡州立大学毕业证如何办理
一比一原版(psu学位证书)美国匹兹堡州立大学毕业证如何办理一比一原版(psu学位证书)美国匹兹堡州立大学毕业证如何办理
一比一原版(psu学位证书)美国匹兹堡州立大学毕业证如何办理
nonods
 

Recently uploaded (20)

My Airframe Metallic Design Capability Studies..pdf
My Airframe Metallic Design Capability Studies..pdfMy Airframe Metallic Design Capability Studies..pdf
My Airframe Metallic Design Capability Studies..pdf
 
❣Unsatisfied Bhabhi Call Girls Surat 💯Call Us 🔝 7014168258 🔝💃Independent Sura...
❣Unsatisfied Bhabhi Call Girls Surat 💯Call Us 🔝 7014168258 🔝💃Independent Sura...❣Unsatisfied Bhabhi Call Girls Surat 💯Call Us 🔝 7014168258 🔝💃Independent Sura...
❣Unsatisfied Bhabhi Call Girls Surat 💯Call Us 🔝 7014168258 🔝💃Independent Sura...
 
🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...
🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...
🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...
 
CSP_Study - Notes (Paul McNeill) 2017.pdf
CSP_Study - Notes (Paul McNeill) 2017.pdfCSP_Study - Notes (Paul McNeill) 2017.pdf
CSP_Study - Notes (Paul McNeill) 2017.pdf
 
Technological Innovation Management And Entrepreneurship-1.pdf
Technological Innovation Management And Entrepreneurship-1.pdfTechnological Innovation Management And Entrepreneurship-1.pdf
Technological Innovation Management And Entrepreneurship-1.pdf
 
FUNDAMENTALS OF MECHANICAL ENGINEERING.pdf
FUNDAMENTALS OF MECHANICAL ENGINEERING.pdfFUNDAMENTALS OF MECHANICAL ENGINEERING.pdf
FUNDAMENTALS OF MECHANICAL ENGINEERING.pdf
 
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdfSri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
 
Hot Call Girls In Bangalore ✔ 9079923931 ✔ Hi I Am Divya Vip Call Girl Servic...
Hot Call Girls In Bangalore ✔ 9079923931 ✔ Hi I Am Divya Vip Call Girl Servic...Hot Call Girls In Bangalore ✔ 9079923931 ✔ Hi I Am Divya Vip Call Girl Servic...
Hot Call Girls In Bangalore ✔ 9079923931 ✔ Hi I Am Divya Vip Call Girl Servic...
 
Online train ticket booking system project.pdf
Online train ticket booking system project.pdfOnline train ticket booking system project.pdf
Online train ticket booking system project.pdf
 
Call Girls Madurai 8824825030 Escort In Madurai service 24X7
Call Girls Madurai 8824825030 Escort In Madurai service 24X7Call Girls Madurai 8824825030 Escort In Madurai service 24X7
Call Girls Madurai 8824825030 Escort In Madurai service 24X7
 
MODULE 5 BIOLOGY FOR ENGINEERS TRENDS IN BIO ENGINEERING.pptx
MODULE 5 BIOLOGY FOR ENGINEERS TRENDS IN BIO ENGINEERING.pptxMODULE 5 BIOLOGY FOR ENGINEERS TRENDS IN BIO ENGINEERING.pptx
MODULE 5 BIOLOGY FOR ENGINEERS TRENDS IN BIO ENGINEERING.pptx
 
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call GirlCall Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
 
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdfSELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
 
Butterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdfButterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdf
 
The Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC Conduit
The Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC ConduitThe Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC Conduit
The Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC Conduit
 
College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...
College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...
College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...
 
🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...
🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...
🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...
 
Microsoft Azure AD architecture and features
Microsoft Azure AD architecture and featuresMicrosoft Azure AD architecture and features
Microsoft Azure AD architecture and features
 
Better Builder Magazine, Issue 49 / Spring 2024
Better Builder Magazine, Issue 49 / Spring 2024Better Builder Magazine, Issue 49 / Spring 2024
Better Builder Magazine, Issue 49 / Spring 2024
 
一比一原版(psu学位证书)美国匹兹堡州立大学毕业证如何办理
一比一原版(psu学位证书)美国匹兹堡州立大学毕业证如何办理一比一原版(psu学位证书)美国匹兹堡州立大学毕业证如何办理
一比一原版(psu学位证书)美国匹兹堡州立大学毕业证如何办理
 

Module 2

  • 1. CompilerDesign Module 2 Module 3 Chapter 1: Introduction Chapter 2: Lexical Analysis
  • 3. Compilers • “Compilation” Translation of a program written in a source language into a semantically equivalent program written in a target language Compiler Error messages Source Program Target Program Input Output
  • 4. Interpreters • “Interpretation” • Performing the operations implied by the source program Interpreter Source Program Input Output Error messages
  • 5. Preprocessors, Compilers, Assemblers, and Linkers Preprocessor Compiler Assembler Linker Skeletal Source Program Source Program Target Assembly Program Relocatable Object Code Absolute Machine Code Libraries and Relocatable Object Files
  • 6. Phases of Compiler lexical analyzer syntax analyzer semantic analyzer source program tokens parse trees parse trees intermediate code generator code optimizer code generator intermediate code optimized intermediate code target program
  • 7. Lexical Analysis • The first phase of compiler is called as lexical analysis or scanning • The lexical analyser news live stream of characters making of the source program and groups the character into meaningful sequence of lexemes. • The lexical analyser produces has an output of a token in the form <token name, attribute value> • In the token the first component token name is the abstract sample that is used during the syntax analysis and the component attribute value points to the entry in the simple table for this token • Ex: Source input is position =initial + rate * 60
  • 8. • Position : the lexeme will match this as <id, 1> • = : the lexeme will match this as <=>, because = is an abstract symbol. • Initial : the lexeme will match this as <id, 2> • + : the lexeme will match this as <+> • Rate: the lexeme will match this as <id, 3> • * : the lexeme will match this as <*> • 60: the lexeme will match this as <60> <id,1> <=> <id, 2> <+> <id, 3> <*> <60> => lexmes
  • 9. Syntax Analysis parse tree <id,1> <=> <id, 2> <+> <id, 3> <*> <60> Syntax Analysis <id, 1> = * + <id, 2> 60 <id, 3>
  • 10. Semantic Analysis <id, 1> = * + <id, 2> 60 <id, 3> Semantic Analysis <id, 1> = * + <id, 2> 60.0 <id, 3> int to float
  • 11. Symbol Table • There is a record for each identifier • The attributes include name, type, location, etc.
  • 12. Intermediate Code Generation <id, 1> = * + <id, 2> 60.0 <id, 3> int to float Intermediate Code Generation t1= inttofloat(60) t2 = id3 * t1 t3 = id2 + t2 id1 = t3
  • 13. Code Optimizer Code Optimizer t1= inttofloat(60) t2 = id3 * t1 t3 = id2 + t2 id1 = t3 t1= id3*60.0 id1 = id2+t1
  • 14. Code Generation Code Generation t1 = id3*60.0 id1 = id2+t1 LDF R2, id3 MULF R2, R2, #60.0 LDF R1, id2 ADDF R1,R1,R2 STF id1, R1
  • 15.
  • 16. Qualities of a Good Compiler What qualities would you want in a compiler? • generates correct code (first and foremost!) • generates fast code • conforms to the specifications of the input language • copes with essentially arbitrary input size, variables, etc. • compilation time (linearly)proportional to size of source • good diagnostics • consistent optimisations • works well with the debugger
  • 17. The Evolution of Programing Language • The Move to High Level Language 1. A major step towards higher-level languages was nade in the lat the 1930's with the development of Fortran for scientiic co for business data processing, and Lisp for symbollc computation. 2. Classification based on generation a) First generation : machine languages b) Second Generation : assembly languages c) Third generation : high level languages like fortran, Cobol,C,C++ and Java d) Fourth Generation : languages designed for specific applications like SQL for database, NOMAD for report generation etc e) Fifth generation : languages applied to logic and constraints based like prolog andOPS5
  • 18. 3. Classification of languages uses the term imperative and declarative a) imperative in which a program specifies how a computation is to be done b) declarative for languages in which a program specifies what computation is to be done c) Languages such as C. C++, C#, and Java are imperative languages d) Functional languages such as ML and Haskell and constrain languages such as Prolog are often considered to be declarative languages 4. The term on Neumann language is applied to programming a whose computational mode is based on the von Neumann computer architecture , Many of today's languages, such as Fortran and C are von Neumann languages 5. Based object oriented languages and scripting languages
  • 20. Lexical Analysis The first phase of the compiler, the main task of the lexical analyser is to read the input characters of the source program, group them into lexemes produce the output as a sequence of tokens for each lexeme in the source program • As shown in the figure the call suggested by get next token command causes the lexical analyser to read the characters from its input until it can identify the next legs and produce for it the next token which in returns to the parser. Lexical analyzer symbol table parser Source program token getNexttoken()
  • 21. Some Terminology • A token is a pair consisting of a token name and optional attribute value • A pattern is a description of the form that the lexemes of a token may take • A Lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyser has a instance of that token
  • 22. Lexical analyser is divided into cascade of two processor scanning and lexical analysis • Scanning: consists of a multiple processes that do not require tokenization of the input such as deletion of comments compaction of consecutive whitespace characters into one • Lexical analysis: is a proper is the more complex portion which produces the tokens from the output of the scanner • Token syntax is <token name, attribute value>
  • 23. Ex: E = M * C ** 2 • For the above source program, the tokens are generated as by using attribute value itself. • <id, E> • < assign_op> • <id, M> • < multi_op> • <id, C> • <exp_op> • <number, 2> Or • <2>
  • 24. Lexical Errors • It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error. For instance, if the string fi is encountered for the first time in a C program in the context: Ex: fi (a=s f(x)) • a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. • Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler—probably the parser in this case handle an error due to transposition of the letters.
  • 25. • The simplest recovery strategy is "panic mode recovery”. • We delete successive characters from the remaining input, until the lexical analyzer can find a well-formed token at the beginning of what input is left. • This recover technique may confuse the parser, but in an interactive computing environment it may be quite adequate. • Other possible error-recovery actions are: 1. Delete one character from the remaining input. 2. Insert a missing character into the remaining input. 3. Replace a character by another character. 4. Transpose two adjacent characters.
  • 26. Input Buffering • For instance we cannot be sure we've seen the end of an identifier until we see a character that is not a letter or digit, and therefore is not part of the lexeme for id. • In C. single-character operators like -, , or < could also be the beginning of a two-character operator like ->, s, or <#. • Thus, we shall introduce a two-buffer scheme that handles large lookhead safely and sentinels that saves the time checking for the end of buffers.
  • 27. Buffer Pair • Specialized buffering techniques have been developed to reduce the amount of overhead required to process a single input character. • An important scheme involves two buffers that are alternately reloaded, as suggested in Fig. forward lexemeBegin Fig: using a pair of input buffer E = M * C * * 2 eof
  • 28. • Two pointers to the input are maintained: 1. Pointer lexemeBegin, marks the beginning of the current lexeme, whose extent we are attempting to determine. 2. Pointer forward scans ahead until a pattern match is found. • Once the next lexeme is determined, forward is set to the character at its right end. Then, after the lexeme is recorded as an attribute value of a token returned to the parser, lexemebegin is set to character immediately after the lexeme just found
  • 29. Sentinels • We must check, each time we advance forward, that we have not moved off one of the buffers; if we do, then we must also reload the other buffer. • Thus for each character read, we make two tests: one for the end of the buffer, & one to determine what character is read. • We can combine the buffer-end test with the test for the current character if we extend each buffer to hold a sentinel character at the end. • The sentinel is a special character that cannot be part of the source program, and a natural choice is the character eof. E = M eof * C * * 2 eof eof lexemeBegin forward
  • 30. Specification of Tokens Strings and Languages 1. {0, 1} is an binary alphabet. 2. A string over alphabet is a finite sequence of symbols drawn from that alphabet. 3. The empty string is denoted by , the sting length is zero. 4. A language is any countable set of strings over some fixed alphabets 5. The language L containing an empty string is represented by { }
  • 31. Specification of Tokens Regular Expression • Given an alphabet , 1. is a regular expression and L { } is { }, the language whose member is an empty string. 2. For each a , a is a regular expression denote {a}, the set containing the string a. 3. r and s are regular expressions denoting the language L(r ) and L(s ). Then  ( r ) | ( s ) is a regular expression denoting L( r ) U L( s )  ( r ) ( s ) is a regular expression denoting L( r ) L ( s )  ( r )* is a regular expression denoting (L ( r )) *
  • 32. • Let = {a, b} • a * • a + • a | b • (a | b) (a | b) • (a | b)* • a | a*b • The Grammar a|b identifies the language {a,b} • The Grammar (a|b) (a|b) identifies the language {aa,ab,ba,bb} • The Grammar a* identifies the language consisting string of zero or more occurrences of a {, a, aa,aaaa,aaaa,aaaaa,} • The Grammar a+ identifies the language consisting string of one or more occurrences of a { a,aa,aaaa,aaaa,aaaaa,} • The Grammar (a|b)* identifies the language {, a,b, aa, ab, ba, bb….} • The Grammar a|a*b identifies the language {a, ab,aab,aaaab,….}
  • 33. • Ex 1: To identify letters, digits, underscore letters A|B|……|Z|a|b|…..|z|_ digit 0|1|2|…|9 id letter (letter|digit)*
  • 34. • Ex 2: To identify unsigned numbers (integers or floating point) such as 48618, 516.14,166.2-4e13, 0.15456E9 digit 0|1|2|….|9 digits digit digit* optionalFraction . digits | optinalExponent (e|E(+|-| )) digits | number digits optionalFraction optinalExponent
  • 35. Ex 1: letters A|B|……|Z|a|b|…..|z|_ digit 0|1|2|…|9 id letter (letter|digit) Updated to • letter [A-Za-z_] • digit [0-9] • id letter (letter|digit)*
  • 36. Ex 2: digit 0|1|2|….|9 digits digit digit* optionalFraction . digit | optinalExponent (e|E(+|-| )digits) | number digits optionalFraction optinalExponent Updated to • digit [0-9] • digits digit+ • number digits(.digits)? (e|E[+-]? digits )?
  • 37.
  • 38. • For the given regular expression grammar , describe the language 1. a(a|b)*a 2. (a|b)*a(a|b)(a|b) 3. a*ba*ba*ba*
  • 39. Recognition of Token • In our discussion we will make use of Dandling if loop statement. smt if expr then stmt | if expr then stmt else stmt | expr term relop term | term term id | number A grammar for branching statement
  • 40. • The grammar fragment describes a simple form of branching statements and conditional expressions. • For relop, we use the comparison operators of languages like Pascal or SQL where = is "equals" and <> is "not equals," because it presents an interesting structure of lexemes. • The terminals of the grammar, which are if, then, else, relop, id, and number, are the names of tokens as far as the lexical analyzer is concerned. • The patterns for these tokens are described using regular definitions, as shown below.
  • 41. digit [0-9] digits digit+ number digits(.digits)? (e[+-]? digits )? letter [A-Za-z_] id letter (letter|digit)* if if then then else else relop < | > | <= | >= | = | < > Patterns for token for if-else-then statement For this language, the lexical analyzer will recognize the keywords if, then, and else, as well as lexemes that match the patterns for relop, id, and number
  • 42. Lexemes Token name Attribute Value if if - then then - else else - any id id Pointer to table entry any number number Pointer to table entry < relop LT <= relop LE = relop EQ < > relop NE > relop GT >= relop GE
  • 43. Transition Diagrams 1. We always indicate an accepting state by a double circle. 2. If it is necessary to retract the forward pointer one position, then we shall additionally place a * near that accepting state. 3. One state is designated the start state, or initial state it is indicated by an edge labeled "start " entering from nowhere.
  • 44. 0 6 5 4 3 2 1 8 7 return ( relop, LE) return ( relop, NE) return ( relop, LT) return ( relop, EQ) return ( relop, GE) return ( relop, GT) others others start < = = = > > * * Transition diagram for relop
  • 45. • We begin in state 0, the start state. • If we see < as the first input symbol, then among the lexemes that match the pattern for relop, we can be looking at <,<>, or <=. • We therefore go to state 1, and look at the next character. • If it is =, then we recognize lexeme <=, enter state 2, and return the token relop with attribute LE, the symbolic constant representing this particular comparison operator. • If in state 1 the next character is >, then instead we have lexeme < >, and enter state 3 to return an indication that the not-equals operator has been found. • On any other character, the lexeme is < and we enter state 4 to return that information. • Note, however, that state 4 has a* to indicate that we must retract the input one position.
  • 46. • Transition diagram for id’s and keywords 9 11 10 return (getToken(), installID) start letter other letter or digit *
  • 47. • Transition diagram for unsigned numbers 1 2 14 1 3 start digit other digit * 15 21 20 19 18 17 16 digit digit digit digit digit other other * * . E E + or -
  翻译: