尊敬的 微信汇率:1円 ≈ 0.046166 元 支付宝汇率:1円 ≈ 0.046257元 [退出登录]
SlideShare a Scribd company logo
Context Free Grammar
1
Course Name: Compiler Design
Course Code: CSE331
Level:3, Term:3
Department of Computer Science and Engineering
Daffodil International University
Syntax Analysis
Parsing or syntactic analysis is the process of analyzing a string of symbols,
either in natural language or in computer languages, conforming to the
rules of a formal grammar.
The second phase of compiler and the sometimes also being called as
Hierarchical Analysis.
2
Why Syntax Analysis?
• We have seen that a lexical analyzer can identify tokens with the help of
regular expressions and pattern rules.
• But a lexical analyzer cannot check the syntax of a given sentence due to the
limitations of the regular expressions.
• Regular expressions cannot check balancing tokens, such as parenthesis.
• Therefore, this phase uses context-free grammar (CFG), which is recognized by
push-down automata.
3
Syntax Analysis
4
Detail Diagram
5
• A syntax analyzer or parser takes the input from a lexical analyzer in the
form of token streams.
• The parser analyzes the source code (token stream) against the
production rules to detect any errors in the code. The output of this
phase is a parse tree.
• This way, the parser accomplishes two tasks, i.e., parsing the code,
looking for errors and generating a parse tree as the output of the
phase.
6
CFG
• CFG, on the other hand, is a superset of Regular Grammar, as depicted below:
A context-free grammar (CFG) consisting of a finite set of grammar
rules is a quadruple (N, T, P, S)
where,
• N is a set of non-terminal symbols.
• T is a set of terminals where N ∩ T = NULL.
• P is a set of rules, P: N → (N ∪ T)*, i.e., the left-hand side of the
production rule P does have any right context or left context.
• S is the start symbol.
7
• A context-free grammar has four components: G = ( V, Σ, P, S )
A set of non-terminals (V). Non-terminals are syntactic variables that denote sets of strings.
The non-terminals define sets of strings that help define the language generated by the
grammar.
A set of tokens, known as terminal symbols (Σ). Terminals are the basic symbols from which
strings are formed.
A set of productions (P). The productions of a grammar specify the manner in which the
terminals and non-terminals can be combined to form strings. Each production consists of
a non-terminal called the left side of the production, an arrow, and a sequence of tokens
and/or on- terminals, called the right side of the production.
One of the non-terminals is designated as the start symbol (S); from where the production
begins.
CFG: Context Free Grammar
8
Example of CFG:
• G = ( V, Σ, P, S ) Where:
• V = { Q, Z, N }
• Σ = { 0, 1 }
• P = { Q → Z | Q → N | Q → ℇ | Z → 0Q0 |N → 1Q1 }
• S = { Q }
This grammar describes palindrome language,
such as: 1001, 11100111, 00100, 1010101, 11111, etc.
9
The Role of the Lexical Analyzer
• Roles:
• Primary role: Scan a source program (a string) and break it up into small, meaningful units,
called tokens.
• Example: position := initial + rate * 60;
• Transform into meaningful units: identifiers, constants, operators, and punctuation.
• Other roles:
• Removal of comments
• Case conversion
• Removal of white spaces
• Interpretation of compiler directives or pragmas.
• Communication with symbol table: Store information regarding an identifier in the symbol
table. Not advisable in cases where scopes can be nested.
• Preparation of output listing: Keep track of source program, line numbers, and
correspondences between error messages and line numbers.
• Why Lexical Analyzer is separate from parser?
• Simpler design of both LA and parser.
• More efficient & portable compiler.
10
Tokens
A lexical token is a sequence of characters that can be treated as a unit in the grammar of the
programming languages.
Example of tokens:
Type token (id, number, real, . . . )
Punctuation tokens (IF, void, return, . . . )
Alphabetic tokens (keywords)
Keywords; Examples-for, while, if etc.
Identifier; Examples-Variable name, function name etc.
Operators; Examples '+', '++', '-' etc.
Separators; Examples ',' ';' etc.
Example of Non-Tokens:
Comments, preprocessor directive, macros, blanks, tabs, newline etc.
Lexeme: The sequence of characters matched by a pattern to form the corresponding token or a
sequence of input characters that comprises a single token is called a lexeme.
eg- “float”, “abs_zero_Kelvin”, “=”, “-”, “273”, “;” .
11
Tokens
Operators = + − > ( { := == <>
Keywords if while for int double
Numeric literals 43 6.035 -3.6e10 0x13F3A
Character literals ‘a’ ‘~’ ‘’’
String literals “3.142” “aBcDe” “”
White space space(‘ ’) tab(‘t’) newline(‘n’)
Comments /*this is not a token*/
• Examples of non-tokens
• Examples of Tokens
12
• Type of tokens in C++:
• Constants:
• char constants: ‘a’
• string constants: “i=%d”
• int constants: 50
• float point constants
• Identifiers: i, j, counter, ……
• Reserved words: main, int, for, …
• Operators: +, =, ++, /, …
• Misc. symbols: (, ), {, }, …
• Tokens are specified by regular expressions.
main() {
int i, j;
for (i=0; i<50; i++) {
printf(“i = %d”, i);
}
}
13
Lexical Analysis vs Parsing
• There are a number of reasons why the analysis portion of a compiler is normally separated into
lexical analysis and parsing (syntax analysis) phases.
• Simplicity of design is the most important consideration.
The separation of lexical and syntactic analysis often allows us to simplify at least one of these
tasks. For example, a parser that had to deal with comments and whitespace as syntactic units
would be considerably more complex than one that can assume comments and whitespace have
already been removed by the lexical analyzer.
• Compiler efficiency is improved.
a separate lexical analyzer allows us to apply specialized techniques that serve only the lexical
task, not the job of parsing. In addition, specialized buffering techniques for reading input
characters can speed up the compiler significantly.
• Compiler portability is enhanced.
Input-device-specific peculiarities can be restricted to the lexical analyzer
14
More about Tokens, Patterns and Lexemes
• Token: a certain classification of entities of a program.
• four kinds of tokens in previous example: identifiers, operators, constraints, and punctuation.
• Lexeme: A specific instance of a token. Used to differentiate tokens. For instance, both position
and initial belong to the identifier class, however each a different lexeme.
• Lexical analyzer may return a token type to the Parser, but must also keep track of
“attributes” that distinguish one lexeme from another.
• Examples of attributes: Identifiers: string, Numbers: value
• Attributes are used during semantic checking and code generation. They are not needed
during parsing.
• Patterns: Rule describing how tokens are specified in a program. Needed because a language can
contain infinite possible strings. They all cannot be enumerated (calculated/specified)
• Formal mechanisms used to represent these patterns. Formalism helps in describing precisely
(i) which strings belong to the language, and (ii) which do not.
• Also, form basis for developing tools that can automatically determine if a string belongs to a
language.
15
Lexical errors
fi (a==f(x)) - fi is misspelled or keyword? Or undeclared function identifier?
• If fi is a valid lexeme for the token id, the lexical analyzer must return the token
id to the parser and let some other phase of the compiler - handle the error
How?
i. Delete one character from the remaining input.
ii. Insert a missing character into the remaining input.
iii. Replace a character by another character.
iv. Transpose two adjacent characters.
16
Lexical Errors and Recovery
• Panic mode error recovery
• Deleting an extraneous character
• Inserting a missing character
• Replacing an incorrect character by another
• Transposing two adjacent character
17
THANK YOU
18

More Related Content

Similar to 3a. Context Free Grammar.pdf

1.Role lexical Analyzer
1.Role lexical Analyzer1.Role lexical Analyzer
1.Role lexical Analyzer
Radhakrishnan Chinnusamy
 
Compiler Designs
Compiler DesignsCompiler Designs
Compiler Designs
wasim liam
 
Lecture 02 lexical analysis
Lecture 02 lexical analysisLecture 02 lexical analysis
Lecture 02 lexical analysis
Iffat Anjum
 
New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdf
eliasabdi2024
 
Lexical Analysis.pdf
Lexical Analysis.pdfLexical Analysis.pdf
Lexical Analysis.pdf
BiswanathSethi2
 
role of lexical anaysis
role of lexical anaysisrole of lexical anaysis
role of lexical anaysis
Sudhaa Ravi
 
Chap 1 and 2
Chap 1 and 2Chap 1 and 2
Plc part 2
Plc  part 2Plc  part 2
Plc part 2
Taymoor Nazmy
 
Structure of the compiler
Structure of the compilerStructure of the compiler
Structure of the compiler
Sudhaa Ravi
 
Compiler Construction
Compiler ConstructionCompiler Construction
Compiler Construction
Sarmad Ali
 
11700220036.pdf
11700220036.pdf11700220036.pdf
11700220036.pdf
SouvikRoy149
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
Anujashejwal
 
match the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdfmatch the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdf
arpitaeron555
 
Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01
riddhi viradiya
 
1._Introduction_.pptx
1._Introduction_.pptx1._Introduction_.pptx
1._Introduction_.pptx
Anbarasan Radhakrishnan R
 
Syntax
SyntaxSyntax
Lecture 04 syntax analysis
Lecture 04 syntax analysisLecture 04 syntax analysis
Lecture 04 syntax analysis
Iffat Anjum
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design
MAHASREEM
 
4 compiler lab - Syntax Ana
4 compiler lab - Syntax Ana4 compiler lab - Syntax Ana
4 compiler lab - Syntax Ana
MashaelQ
 
Syntax analysis
Syntax analysisSyntax analysis

Similar to 3a. Context Free Grammar.pdf (20)

1.Role lexical Analyzer
1.Role lexical Analyzer1.Role lexical Analyzer
1.Role lexical Analyzer
 
Compiler Designs
Compiler DesignsCompiler Designs
Compiler Designs
 
Lecture 02 lexical analysis
Lecture 02 lexical analysisLecture 02 lexical analysis
Lecture 02 lexical analysis
 
New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdf
 
Lexical Analysis.pdf
Lexical Analysis.pdfLexical Analysis.pdf
Lexical Analysis.pdf
 
role of lexical anaysis
role of lexical anaysisrole of lexical anaysis
role of lexical anaysis
 
Chap 1 and 2
Chap 1 and 2Chap 1 and 2
Chap 1 and 2
 
Plc part 2
Plc  part 2Plc  part 2
Plc part 2
 
Structure of the compiler
Structure of the compilerStructure of the compiler
Structure of the compiler
 
Compiler Construction
Compiler ConstructionCompiler Construction
Compiler Construction
 
11700220036.pdf
11700220036.pdf11700220036.pdf
11700220036.pdf
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
match the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdfmatch the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdf
 
Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01
 
1._Introduction_.pptx
1._Introduction_.pptx1._Introduction_.pptx
1._Introduction_.pptx
 
Syntax
SyntaxSyntax
Syntax
 
Lecture 04 syntax analysis
Lecture 04 syntax analysisLecture 04 syntax analysis
Lecture 04 syntax analysis
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design
 
4 compiler lab - Syntax Ana
4 compiler lab - Syntax Ana4 compiler lab - Syntax Ana
4 compiler lab - Syntax Ana
 
Syntax analysis
Syntax analysisSyntax analysis
Syntax analysis
 

Recently uploaded

Erasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES CroatiaErasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES Croatia
whatchangedhowreflec
 
Accounting for Restricted Grants When and How To Record Properly
Accounting for Restricted Grants  When and How To Record ProperlyAccounting for Restricted Grants  When and How To Record Properly
Accounting for Restricted Grants When and How To Record Properly
TechSoup
 
220711130088 Sumi Basak Virtual University EPC 3.pptx
220711130088 Sumi Basak Virtual University EPC 3.pptx220711130088 Sumi Basak Virtual University EPC 3.pptx
220711130088 Sumi Basak Virtual University EPC 3.pptx
Kalna College
 
bryophytes.pptx bsc botany honours second semester
bryophytes.pptx bsc botany honours  second semesterbryophytes.pptx bsc botany honours  second semester
bryophytes.pptx bsc botany honours second semester
Sarojini38
 
Observational Learning
Observational Learning Observational Learning
Observational Learning
sanamushtaq922
 
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
Nguyen Thanh Tu Collection
 
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT KanpurDiversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
Quiz Club IIT Kanpur
 
220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science
Kalna College
 
220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology
Kalna College
 
Non-Verbal Communication for Tech Professionals
Non-Verbal Communication for Tech ProfessionalsNon-Verbal Communication for Tech Professionals
Non-Verbal Communication for Tech Professionals
MattVassar1
 
nutrition in plants chapter 1 class 7...
nutrition in plants chapter 1 class 7...nutrition in plants chapter 1 class 7...
nutrition in plants chapter 1 class 7...
chaudharyreet2244
 
Ethiopia and Eritrea Eritrea's journey has been marked by resilience and dete...
Ethiopia and Eritrea Eritrea's journey has been marked by resilience and dete...Ethiopia and Eritrea Eritrea's journey has been marked by resilience and dete...
Ethiopia and Eritrea Eritrea's journey has been marked by resilience and dete...
biruktesfaye27
 
The basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptxThe basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptx
heathfieldcps1
 
pol sci Election and Representation Class 11 Notes.pdf
pol sci Election and Representation Class 11 Notes.pdfpol sci Election and Representation Class 11 Notes.pdf
pol sci Election and Representation Class 11 Notes.pdf
BiplabHalder13
 
Information and Communication Technology in Education
Information and Communication Technology in EducationInformation and Communication Technology in Education
Information and Communication Technology in Education
MJDuyan
 
Cross-Cultural Leadership and Communication
Cross-Cultural Leadership and CommunicationCross-Cultural Leadership and Communication
Cross-Cultural Leadership and Communication
MattVassar1
 
Decolonizing Universal Design for Learning
Decolonizing Universal Design for LearningDecolonizing Universal Design for Learning
Decolonizing Universal Design for Learning
Frederic Fovet
 
Slides Peluncuran Amalan Pemakanan Sihat.pptx
Slides Peluncuran Amalan Pemakanan Sihat.pptxSlides Peluncuran Amalan Pemakanan Sihat.pptx
Slides Peluncuran Amalan Pemakanan Sihat.pptx
shabeluno
 
How to Create User Notification in Odoo 17
How to Create User Notification in Odoo 17How to Create User Notification in Odoo 17
How to Create User Notification in Odoo 17
Celine George
 
How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...
Infosec
 

Recently uploaded (20)

Erasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES CroatiaErasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES Croatia
 
Accounting for Restricted Grants When and How To Record Properly
Accounting for Restricted Grants  When and How To Record ProperlyAccounting for Restricted Grants  When and How To Record Properly
Accounting for Restricted Grants When and How To Record Properly
 
220711130088 Sumi Basak Virtual University EPC 3.pptx
220711130088 Sumi Basak Virtual University EPC 3.pptx220711130088 Sumi Basak Virtual University EPC 3.pptx
220711130088 Sumi Basak Virtual University EPC 3.pptx
 
bryophytes.pptx bsc botany honours second semester
bryophytes.pptx bsc botany honours  second semesterbryophytes.pptx bsc botany honours  second semester
bryophytes.pptx bsc botany honours second semester
 
Observational Learning
Observational Learning Observational Learning
Observational Learning
 
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
BỘ BÀI TẬP TEST THEO UNIT - FORM 2025 - TIẾNG ANH 12 GLOBAL SUCCESS - KÌ 1 (B...
 
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT KanpurDiversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
 
220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science
 
220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology
 
Non-Verbal Communication for Tech Professionals
Non-Verbal Communication for Tech ProfessionalsNon-Verbal Communication for Tech Professionals
Non-Verbal Communication for Tech Professionals
 
nutrition in plants chapter 1 class 7...
nutrition in plants chapter 1 class 7...nutrition in plants chapter 1 class 7...
nutrition in plants chapter 1 class 7...
 
Ethiopia and Eritrea Eritrea's journey has been marked by resilience and dete...
Ethiopia and Eritrea Eritrea's journey has been marked by resilience and dete...Ethiopia and Eritrea Eritrea's journey has been marked by resilience and dete...
Ethiopia and Eritrea Eritrea's journey has been marked by resilience and dete...
 
The basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptxThe basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptx
 
pol sci Election and Representation Class 11 Notes.pdf
pol sci Election and Representation Class 11 Notes.pdfpol sci Election and Representation Class 11 Notes.pdf
pol sci Election and Representation Class 11 Notes.pdf
 
Information and Communication Technology in Education
Information and Communication Technology in EducationInformation and Communication Technology in Education
Information and Communication Technology in Education
 
Cross-Cultural Leadership and Communication
Cross-Cultural Leadership and CommunicationCross-Cultural Leadership and Communication
Cross-Cultural Leadership and Communication
 
Decolonizing Universal Design for Learning
Decolonizing Universal Design for LearningDecolonizing Universal Design for Learning
Decolonizing Universal Design for Learning
 
Slides Peluncuran Amalan Pemakanan Sihat.pptx
Slides Peluncuran Amalan Pemakanan Sihat.pptxSlides Peluncuran Amalan Pemakanan Sihat.pptx
Slides Peluncuran Amalan Pemakanan Sihat.pptx
 
How to Create User Notification in Odoo 17
How to Create User Notification in Odoo 17How to Create User Notification in Odoo 17
How to Create User Notification in Odoo 17
 
How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...
 

3a. Context Free Grammar.pdf

  • 1. Context Free Grammar 1 Course Name: Compiler Design Course Code: CSE331 Level:3, Term:3 Department of Computer Science and Engineering Daffodil International University
  • 2. Syntax Analysis Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. The second phase of compiler and the sometimes also being called as Hierarchical Analysis. 2
  • 3. Why Syntax Analysis? • We have seen that a lexical analyzer can identify tokens with the help of regular expressions and pattern rules. • But a lexical analyzer cannot check the syntax of a given sentence due to the limitations of the regular expressions. • Regular expressions cannot check balancing tokens, such as parenthesis. • Therefore, this phase uses context-free grammar (CFG), which is recognized by push-down automata. 3
  • 6. • A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams. • The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree. • This way, the parser accomplishes two tasks, i.e., parsing the code, looking for errors and generating a parse tree as the output of the phase. 6
  • 7. CFG • CFG, on the other hand, is a superset of Regular Grammar, as depicted below: A context-free grammar (CFG) consisting of a finite set of grammar rules is a quadruple (N, T, P, S) where, • N is a set of non-terminal symbols. • T is a set of terminals where N ∩ T = NULL. • P is a set of rules, P: N → (N ∪ T)*, i.e., the left-hand side of the production rule P does have any right context or left context. • S is the start symbol. 7
  • 8. • A context-free grammar has four components: G = ( V, Σ, P, S ) A set of non-terminals (V). Non-terminals are syntactic variables that denote sets of strings. The non-terminals define sets of strings that help define the language generated by the grammar. A set of tokens, known as terminal symbols (Σ). Terminals are the basic symbols from which strings are formed. A set of productions (P). The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal called the left side of the production, an arrow, and a sequence of tokens and/or on- terminals, called the right side of the production. One of the non-terminals is designated as the start symbol (S); from where the production begins. CFG: Context Free Grammar 8
  • 9. Example of CFG: • G = ( V, Σ, P, S ) Where: • V = { Q, Z, N } • Σ = { 0, 1 } • P = { Q → Z | Q → N | Q → ℇ | Z → 0Q0 |N → 1Q1 } • S = { Q } This grammar describes palindrome language, such as: 1001, 11100111, 00100, 1010101, 11111, etc. 9
  • 10. The Role of the Lexical Analyzer • Roles: • Primary role: Scan a source program (a string) and break it up into small, meaningful units, called tokens. • Example: position := initial + rate * 60; • Transform into meaningful units: identifiers, constants, operators, and punctuation. • Other roles: • Removal of comments • Case conversion • Removal of white spaces • Interpretation of compiler directives or pragmas. • Communication with symbol table: Store information regarding an identifier in the symbol table. Not advisable in cases where scopes can be nested. • Preparation of output listing: Keep track of source program, line numbers, and correspondences between error messages and line numbers. • Why Lexical Analyzer is separate from parser? • Simpler design of both LA and parser. • More efficient & portable compiler. 10
  • 11. Tokens A lexical token is a sequence of characters that can be treated as a unit in the grammar of the programming languages. Example of tokens: Type token (id, number, real, . . . ) Punctuation tokens (IF, void, return, . . . ) Alphabetic tokens (keywords) Keywords; Examples-for, while, if etc. Identifier; Examples-Variable name, function name etc. Operators; Examples '+', '++', '-' etc. Separators; Examples ',' ';' etc. Example of Non-Tokens: Comments, preprocessor directive, macros, blanks, tabs, newline etc. Lexeme: The sequence of characters matched by a pattern to form the corresponding token or a sequence of input characters that comprises a single token is called a lexeme. eg- “float”, “abs_zero_Kelvin”, “=”, “-”, “273”, “;” . 11
  • 12. Tokens Operators = + − > ( { := == <> Keywords if while for int double Numeric literals 43 6.035 -3.6e10 0x13F3A Character literals ‘a’ ‘~’ ‘’’ String literals “3.142” “aBcDe” “” White space space(‘ ’) tab(‘t’) newline(‘n’) Comments /*this is not a token*/ • Examples of non-tokens • Examples of Tokens 12
  • 13. • Type of tokens in C++: • Constants: • char constants: ‘a’ • string constants: “i=%d” • int constants: 50 • float point constants • Identifiers: i, j, counter, …… • Reserved words: main, int, for, … • Operators: +, =, ++, /, … • Misc. symbols: (, ), {, }, … • Tokens are specified by regular expressions. main() { int i, j; for (i=0; i<50; i++) { printf(“i = %d”, i); } } 13
  • 14. Lexical Analysis vs Parsing • There are a number of reasons why the analysis portion of a compiler is normally separated into lexical analysis and parsing (syntax analysis) phases. • Simplicity of design is the most important consideration. The separation of lexical and syntactic analysis often allows us to simplify at least one of these tasks. For example, a parser that had to deal with comments and whitespace as syntactic units would be considerably more complex than one that can assume comments and whitespace have already been removed by the lexical analyzer. • Compiler efficiency is improved. a separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing. In addition, specialized buffering techniques for reading input characters can speed up the compiler significantly. • Compiler portability is enhanced. Input-device-specific peculiarities can be restricted to the lexical analyzer 14
  • 15. More about Tokens, Patterns and Lexemes • Token: a certain classification of entities of a program. • four kinds of tokens in previous example: identifiers, operators, constraints, and punctuation. • Lexeme: A specific instance of a token. Used to differentiate tokens. For instance, both position and initial belong to the identifier class, however each a different lexeme. • Lexical analyzer may return a token type to the Parser, but must also keep track of “attributes” that distinguish one lexeme from another. • Examples of attributes: Identifiers: string, Numbers: value • Attributes are used during semantic checking and code generation. They are not needed during parsing. • Patterns: Rule describing how tokens are specified in a program. Needed because a language can contain infinite possible strings. They all cannot be enumerated (calculated/specified) • Formal mechanisms used to represent these patterns. Formalism helps in describing precisely (i) which strings belong to the language, and (ii) which do not. • Also, form basis for developing tools that can automatically determine if a string belongs to a language. 15
  • 16. Lexical errors fi (a==f(x)) - fi is misspelled or keyword? Or undeclared function identifier? • If fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler - handle the error How? i. Delete one character from the remaining input. ii. Insert a missing character into the remaining input. iii. Replace a character by another character. iv. Transpose two adjacent characters. 16
  • 17. Lexical Errors and Recovery • Panic mode error recovery • Deleting an extraneous character • Inserting a missing character • Replacing an incorrect character by another • Transposing two adjacent character 17
  翻译: