Compiler Design and Construction - Unit Wise Questions

Questions Organized by Units
Unit 1
27 Questions

1.  Explain briefly about different phases involved in compiler, with a block diagram.

6 marks
Official Answer

A compiler operates in phases. A phase is a logically interrelated operation that takes the source program in one representation and produces output in another representation. The phases of a compiler are shown below:

1. The analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of it. Analysis of the source program includes lexical analysis, syntax analysis, and semantic analysis.

2. The synthesis phase constructs the desired target program from the intermediate representation. The synthesis part of the compiler consists of the following phases: intermediate code generation, code optimization, and target code generation.

[Block diagram: phases of a compiler]

1. Lexical Analysis:

In this phase, the lexical analyzer reads the source program and returns its tokens. A token is a sequence of characters that can be treated as a single logical entity (such as an identifier, operator, keyword, or constant).

Example:

    Input String: c = a + b * 3

    Tokens: id1 = id2 + id3 * 3
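The scanning step can be sketched in a few lines of Python (a toy illustration; the token names and regular expressions here are assumptions for the example, not part of the original answer):

```python
import re

# Illustrative token specification: order matters, longest patterns first
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),   # integer or floating-point constant
    ("ID",     r"[A-Za-z_]\w*"),    # identifier
    ("ASSIGN", r"="),               # assignment operator
    ("OP",     r"[+\-*/]"),         # arithmetic operators
    ("SKIP",   r"\s+"),             # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(src):
    """Yield (token_kind, lexeme) pairs for the source string."""
    for m in MASTER.finditer(src):
        if m.lastgroup != "SKIP":   # drop whitespace
            yield (m.lastgroup, m.group())

print(list(tokenize("c = a + b * 3")))
# [('ID', 'c'), ('ASSIGN', '='), ('ID', 'a'), ('OP', '+'),
#  ('ID', 'b'), ('OP', '*'), ('NUMBER', '3')]
```

A real lexer would also enter identifiers into the symbol table and attach line/column positions for error reporting.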

2. Syntax Analysis:

In this phase, the syntax analyzer takes the tokens produced by the lexical analyzer as input and generates a parse tree as output. The parser checks whether the expressions formed by the tokens are syntactically correct according to the rules that define the syntax of the source language.

Example: (parse tree diagram omitted)
3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects type information for code generation. It checks whether the statements form a sensible set of instructions in the programming language. Type checking is an important part of semantic analysis.

Example: (diagram omitted)
4. Intermediate Code Generation:

If the program is syntactically and semantically correct, the intermediate code generator produces simple, machine-independent intermediate code. The intermediate code should be generated in such a way that it can be easily translated into the target machine code.

Example:

    t1 = 3.0;

    t2 = id3 * t1;

    t3 = id2 + t2;

    id1 = t3;
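Intermediate code like this can be emitted by a post-order walk of the syntax tree. A Python sketch under assumed conventions (tuples for interior nodes, strings for leaves; real generators emit richer instruction forms):

```python
import itertools

_tmp = itertools.count(1)   # fresh temporary names t1, t2, ...

def gen(node, code):
    """Post-order walk: return the 'address' holding node's value,
    appending three-address instructions to code."""
    if isinstance(node, str):          # leaf: identifier or constant
        return node
    op, left, right = node             # interior node: (operator, l, r)
    l = gen(left, code)
    r = gen(right, code)
    t = f"t{next(_tmp)}"
    code.append(f"{t} = {l} {op} {r}")
    return t

code = []
# id1 = id2 + id3 * 3.0
result = gen(("+", "id2", ("*", "id3", "3.0")), code)
code.append(f"id1 = {result}")
print("\n".join(code))
# t1 = id3 * 3.0
# t2 = id2 + t1
# id1 = t2
```

This sketch folds the constant into its use directly, so it emits one temporary fewer than the hand-written sequence above; both are valid three-address renderings of the same expression.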

5. Code Optimization:

Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources.

Example:

    t2 = id3 * 3.0;

    id1 = id2 + t2;
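The first simplification above (eliminating t1 = 3.0) is constant propagation. A toy Python pass over three-address strings might look like this (format assumptions: one space-separated statement per string, temporaries named t1, t2, ...):

```python
def propagate_constants(code):
    """Toy pass: inline temporaries bound directly to a single value."""
    consts, out = {}, []
    for line in code:
        lhs, rhs = [s.strip() for s in line.split("=", 1)]
        # substitute already-known temporaries into the right-hand side
        words = [consts.get(w, w) for w in rhs.split()]
        rhs = " ".join(words)
        if len(words) == 1 and lhs.startswith("t"):
            consts[lhs] = words[0]     # remember t_i = value, emit nothing
        else:
            out.append(f"{lhs} = {rhs}")
    return out

print(propagate_constants(
    ["t1 = 3.0", "t2 = id3 * t1", "t3 = id2 + t2", "id1 = t3"]))
# ['t2 = id3 * 3.0', 't3 = id2 + t2', 'id1 = t3']
```

Note this sketch performs only the first step; collapsing t3 into the assignment to id1, as in the two-line result above, additionally requires copy propagation of single-use temporaries.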

6. Code Generation:

Code generation is the final stage of the compilation process. It takes the optimized intermediate code as input and maps it to the target machine language.

Example:

    MOV  R1, id3

    MUL  R1, #3.0

    MOV  R2, id2

    ADD  R1, R2

    MOV  id1, R1

Symbol Table

Symbol tables are data structures used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases of the compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier, such as its type, its position in storage, and any other relevant information.

Error Handling

Whenever an error is encountered during compilation of the source program, an error handler is invoked. The error handler generates a suitable message describing the error encountered.


1.  Draw block diagram to represent different phases of compiler. Explain different steps in analysis phase.

6 marks
Official Answer

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. An important part of a compiler is reporting errors to the programmer.

Block diagram for phases of compiler:

[Block diagram: phases of a compiler]

The analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of it. Analysis of the source program includes lexical analysis, syntax analysis, and semantic analysis.

1. Lexical Analysis:

In this phase, the lexical analyzer reads the source program and returns its tokens. A token is a sequence of characters that can be treated as a single logical entity (such as an identifier, operator, keyword, or constant).

2. Syntax Analysis:

In this phase, the syntax analyzer takes the tokens produced by the lexical analyzer as input and generates a parse tree as output. The parser checks whether the expressions formed by the tokens are syntactically correct according to the rules that define the syntax of the source language.

3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects type information for code generation. It checks whether the statements form a sensible set of instructions in the programming language. Type checking is an important part of semantic analysis.

Example of analysis phase: (diagram omitted)

1.  Explain the analysis phase of the compiling a source program with example.

6 marks
Official Answer

The analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of it. Analysis of the source program includes lexical analysis, syntax analysis, and semantic analysis.

Block diagram for phases of compiler:

[Block diagram: phases of a compiler]

1. Lexical Analysis:

In this phase, the lexical analyzer reads the source program and returns its tokens. A token is a sequence of characters that can be treated as a single logical entity (such as an identifier, operator, keyword, or constant).

2. Syntax Analysis:

In this phase, the syntax analyzer takes the tokens produced by the lexical analyzer as input and generates a parse tree as output. The parser checks whether the expressions formed by the tokens are syntactically correct according to the rules that define the syntax of the source language.

3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects type information for code generation. It checks whether the statements form a sensible set of instructions in the programming language. Type checking is an important part of semantic analysis.

Example of analysis phase: (diagram omitted)

1.  Explain the various phases of compiler in detail with practical example.

6 marks
Official Answer

A compiler operates in phases. A phase is a logically interrelated operation that takes the source program in one representation and produces output in another representation. The phases of a compiler are shown below:

1. The analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of it. Analysis of the source program includes lexical analysis, syntax analysis, and semantic analysis.

2. The synthesis phase constructs the desired target program from the intermediate representation. The synthesis part of the compiler consists of the following phases: intermediate code generation, code optimization, and target code generation.

[Block diagram: phases of a compiler]

1. Lexical Analysis:

In this phase, the lexical analyzer reads the source program and returns its tokens. A token is a sequence of characters that can be treated as a single logical entity (such as an identifier, operator, keyword, or constant).

Example:

    Input String: c = a + b * 3

    Tokens: id1 = id2 + id3 * 3

2. Syntax Analysis:

In this phase, the syntax analyzer takes the tokens produced by the lexical analyzer as input and generates a parse tree as output. The parser checks whether the expressions formed by the tokens are syntactically correct according to the rules that define the syntax of the source language.

Example: (parse tree diagram omitted)
3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects type information for code generation. It checks whether the statements form a sensible set of instructions in the programming language. Type checking is an important part of semantic analysis.

Example: (diagram omitted)
4. Intermediate Code Generation:

If the program is syntactically and semantically correct, the intermediate code generator produces simple, machine-independent intermediate code. The intermediate code should be generated in such a way that it can be easily translated into the target machine code.

Example:

    t1 = 3.0;

    t2 = id3 * t1;

    t3 = id2 + t2;

    id1 = t3;

5. Code Optimization:

Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources.

Example:

    t2 = id3 * 3.0;

    id1 = id2 + t2;

6. Code Generation:

Code generation is the final stage of the compilation process. It takes the optimized intermediate code as input and maps it to the target machine language.

Example:

    MOV  R1, id3

    MUL  R1, #3.0

    MOV  R2, id2

    ADD  R1, R2

    MOV  id1, R1

Symbol Table

Symbol tables are data structures used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases of the compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier, such as its type, its position in storage, and any other relevant information.

Error Handling

Whenever an error is encountered during compilation of the source program, an error handler is invoked. The error handler generates a suitable message describing the error encountered.


1.  Define the compiler. Explain the phases of compiler.

6 marks
Official Answer

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. An important part of a compiler is reporting errors to the programmer.

A compiler operates in phases. A phase is a logically interrelated operation that takes the source program in one representation and produces output in another representation. The phases of a compiler are shown below:

1. The analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of it. Analysis of the source program includes lexical analysis, syntax analysis, and semantic analysis.

2. The synthesis phase constructs the desired target program from the intermediate representation. The synthesis part of the compiler consists of the following phases: intermediate code generation, code optimization, and target code generation.

[Block diagram: phases of a compiler]

1. Lexical Analysis:

In this phase, the lexical analyzer reads the source program and returns its tokens. A token is a sequence of characters that can be treated as a single logical entity (such as an identifier, operator, keyword, or constant).

Example:

    Input String: c = a + b * 3

    Tokens: id1 = id2 + id3 * 3

2. Syntax Analysis:

In this phase, the syntax analyzer takes the tokens produced by the lexical analyzer as input and generates a parse tree as output. The parser checks whether the expressions formed by the tokens are syntactically correct according to the rules that define the syntax of the source language.

Example: (parse tree diagram omitted)
3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects type information for code generation. It checks whether the statements form a sensible set of instructions in the programming language. Type checking is an important part of semantic analysis.

Example: (diagram omitted)
4. Intermediate Code Generation:

If the program is syntactically and semantically correct, the intermediate code generator produces simple, machine-independent intermediate code. The intermediate code should be generated in such a way that it can be easily translated into the target machine code.

Example:

    t1 = 3.0;

    t2 = id3 * t1;

    t3 = id2 + t2;

    id1 = t3;

5. Code Optimization:

Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources.

Example:

    t2 = id3 * 3.0;

    id1 = id2 + t2;

6. Code Generation:

Code generation is the final stage of the compilation process. It takes the optimized intermediate code as input and maps it to the target machine language.

Example:

    MOV  R1, id3

    MUL  R1, #3.0

    MOV  R2, id2

    ADD  R1, R2

    MOV  id1, R1

Symbol Table

Symbol tables are data structures used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases of the compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier, such as its type, its position in storage, and any other relevant information.

Error Handling

Whenever an error is encountered during compilation of the source program, an error handler is invoked. The error handler generates a suitable message describing the error encountered.


1.  Explain the phase of a compiler with block diagram.
6 marks
Official Answer

A compiler operates in phases. A phase is a logically interrelated operation that takes the source program in one representation and produces output in another representation. The phases of a compiler are shown below:

1. The analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of it. Analysis of the source program includes lexical analysis, syntax analysis, and semantic analysis.

2. The synthesis phase constructs the desired target program from the intermediate representation. The synthesis part of the compiler consists of the following phases: intermediate code generation, code optimization, and target code generation.

[Block diagram: phases of a compiler]

1. Lexical Analysis:

In this phase, the lexical analyzer reads the source program and returns its tokens. A token is a sequence of characters that can be treated as a single logical entity (such as an identifier, operator, keyword, or constant).

Example:

    Input String: c = a + b * 3

    Tokens: id1 = id2 + id3 * 3

2. Syntax Analysis:

In this phase, the syntax analyzer takes the tokens produced by the lexical analyzer as input and generates a parse tree as output. The parser checks whether the expressions formed by the tokens are syntactically correct according to the rules that define the syntax of the source language.

Example: (parse tree diagram omitted)
3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects type information for code generation. It checks whether the statements form a sensible set of instructions in the programming language. Type checking is an important part of semantic analysis.

Example: (diagram omitted)
4. Intermediate Code Generation:

If the program is syntactically and semantically correct, the intermediate code generator produces simple, machine-independent intermediate code. The intermediate code should be generated in such a way that it can be easily translated into the target machine code.

Example:

    t1 = 3.0;

    t2 = id3 * t1;

    t3 = id2 + t2;

    id1 = t3;

5. Code Optimization:

Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources.

Example:

    t2 = id3 * 3.0;

    id1 = id2 + t2;

6. Code Generation:

Code generation is the final stage of the compilation process. It takes the optimized intermediate code as input and maps it to the target machine language.

Example:

    MOV  R1, id3

    MUL  R1, #3.0

    MOV  R2, id2

    ADD  R1, R2

    MOV  id1, R1

Symbol Table

Symbol tables are data structures used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases of the compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier, such as its type, its position in storage, and any other relevant information.

Error Handling

Whenever an error is encountered during compilation of the source program, an error handler is invoked. The error handler generates a suitable message describing the error encountered.


1.  Differentiate between compiler and interpreter. "Symbol table is a necessary component of compiler", justify this statement with examples.

6 marks
Official Answer

Difference between Interpreter and Compiler

(comparison table image omitted)


Symbol tables are data structures that are used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phase of a compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier such as its type, its position in storage, and any other relevant information.

Symbol tables typically need to support multiple declarations of the same identifier within a program. The lexical analyzer can create a symbol table entry and return a token, say id, to the parser along with a pointer to the lexeme. The parser can then decide whether to use a previously created symbol table entry or to create a new one for the identifier.

The basic operations defined on a symbol table include

  • allocate – to allocate a new empty symbol table
  • free – to remove all entries and free the storage of a symbol table
  • insert – to insert a name in a symbol table and return a pointer to its entry
  • lookup – to search for a name and return a pointer to its entry
  • set_attribute – to associate an attribute with a given entry
  • get_attribute – to get an attribute associated with a given entry
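These operations can be illustrated with a minimal Python class (a dictionary-backed sketch for this answer; production symbol tables typically add hashing and scope chains):

```python
class SymbolTable:
    """Toy symbol table supporting the basic operations listed above."""

    def __init__(self):                      # allocate: new empty table
        self.entries = {}

    def insert(self, name):
        """Insert a name and return its entry (a dict of attributes)."""
        return self.entries.setdefault(name, {})

    def lookup(self, name):
        """Return the entry for name, or None if it is absent."""
        return self.entries.get(name)

    def set_attribute(self, name, key, value):
        self.insert(name)[key] = value       # associate attribute with entry

    def get_attribute(self, name, key):
        return self.entries[name][key]       # fetch attribute of an entry

    def free(self):                          # remove all entries
        self.entries.clear()

st = SymbolTable()
st.insert("position")
st.set_attribute("position", "type", "float")
print(st.get_attribute("position", "type"))  # float
```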

1.  Draw block diagram of compiler. Explain different steps in synthesis phase.

6 marks
Official Answer

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. An important part of a compiler is reporting errors to the programmer.

Block diagram for phases of compiler:

[Block diagram: phases of a compiler]

The synthesis phase constructs the desired target program from the intermediate representation. The synthesis part of the compiler consists of the following phases: intermediate code generation, code optimization, and target code generation.

1. Intermediate Code Generation:

If the program is syntactically and semantically correct, the intermediate code generator produces simple, machine-independent intermediate code. The intermediate code should be generated in such a way that it can be easily translated into the target machine code.

Example:

Intermediate code for: id1 = id2 + id3 * 3.0

    t1 = 3.0;

    t2 = id3 * t1;

    t3 = id2 + t2;

    id1 = t3;

2. Code Optimization:

Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources.

Example:

    t2 = id3 * 3.0;

    id1 = id2 + t2;

3. Code Generation:

Code generation is the final stage of the compilation process. It takes the optimized intermediate code as input and maps it to the target machine language.

Example:

    MOV  R1, id3

    MUL  R1, #3.0

    MOV  R2, id2

    ADD  R1, R2

    MOV  id1, R1


1. List out the tasks performed by Lexical Analyser. Define DFA. Convert the Regular Expression (a+b)*a(a+b) to DFA directly. (3+1+6)

10 marks

3. Define type checking. Differentiate between type casting and coercion? Write SDD to carry out type checking for the following expression.

            E->n|E*E| E==|E[E] | E

10 marks

2. Differentiate between LR(0) and LR(1) algorithm. Construct LR(1) parse table of the following grammar.(3+7)

                    S->AA

                    A->aA | b

10 marks

4. Define compiler and differentiate it with an interpreter.

5 marks

5. What are the typical entries made in symbol table? Explain.

5 marks

4. Define compiler. Explain analysis phase of compiler.

5 marks
Official Answer

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. An important part of a compiler is reporting errors to the programmer.

The analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of it. Analysis of the source program includes lexical analysis, syntax analysis, and semantic analysis.

1. Lexical Analysis:

In this phase, the lexical analyzer reads the source program and returns its tokens. A token is a sequence of characters that can be treated as a single logical entity (such as an identifier, operator, keyword, or constant).

2. Syntax Analysis:

In this phase, the syntax analyzer takes the tokens produced by the lexical analyzer as input and generates a parse tree as output. The parser checks whether the expressions formed by the tokens are syntactically correct according to the rules that define the syntax of the source language.

3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects type information for code generation. It checks whether the statements form a sensible set of instructions in the programming language. Type checking is an important part of semantic analysis.

Example of analysis phase: (diagram omitted)

8.  What are the compiler construction tools? Explain.

6 marks
Official Answer

For the construction of a compiler, the compiler writer uses different types of software tools known as compiler construction tools. These tools make use of specialized languages for specifying and implementing specific components, and most of them use sophisticated algorithms. The tools should hide the details of the algorithms used and produce components in such a way that they can be easily integrated into the rest of the compiler. Some of the most commonly used compiler construction tools are:

1. Parser generators: They automatically produce syntax analyzers from a grammatical description of a programming language.

2. Scanner generators: They produce lexical analyzers from a regular-expression description of the tokens of a language.

3. Syntax-directed translation engines: They produce collections of routines for walking a parse tree and generating intermediate code.

4. Code-generator generators: They produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine.

5. Data-flow analysis engines: They facilitate the gathering of information about how the data is transmitted from one part of the program to another. For code optimization, data-flow analysis is a key part.

6. Compiler-construction toolkits: They provide an integrated set of routines for constructing various phases of a compiler.


6. Define Left recursive grammar. Remove left recursion from the following grammar.

                S→SB | Ca

                B→Bb | c

                C→aB | a

5 marks

7. What are the disadvantages of shift reduce parsing. Perform shift reduce parsing of string

                    w= (x-x) -(x/x) for given grammar.

                    E→ E-E | E/E | (E) | x

5 marks

10.  Discuss the importance of error handler in compiler. How is it manipulated in the different phases of compilation?

6 marks
Official Answer

Error detection and reporting are important functions of the compiler. Whenever an error is encountered during compilation of the source program, an error handler is invoked. The error handler generates a suitable message regarding the error, which allows the programmer to find its exact location. Errors can be encountered in any phase of the compiler for several reasons, such as:

  • In the lexical analysis phase, errors can occur due to misspelled tokens, unrecognized characters, etc. These are mostly typing errors.
  • In the syntax analysis phase, errors can occur due to syntactic violations of the language.
  • In the intermediate code generation phase, errors can occur due to incompatible operand types for an operator.
  • In the code optimization phase, errors can occur during control flow analysis due to unreachable statements.
  • In the code generation phase, errors can occur due to incompatibility with the computer architecture during generation of the machine code. For example, a constant created by the compiler may be too large to fit in a word of the target machine.
  • In the symbol table, errors can occur during bookkeeping routines due to multiple declarations of an identifier with ambiguous attributes.

After detecting an error, a phase must somehow deal with that error, so that compilation can proceed, allowing further errors in the source program to be detected.

The error handler contains a set of routines for handling errors encountered in any phase of the compiler. Each phase of the compiler calls the appropriate error-handler routine as per its specification.


8. Define attribute grammar with example of inherited and synthesized attributes.

5 marks

10.  Differentiate between Pascal compiler and C++ compiler.

6 marks
Official Answer

Pascal Compiler vs. C/C++ Compiler:

  • Pascal uses a one-pass compiler, where all the phases are combined into a single pass; C/C++ uses a multi-pass compiler, where the phases are grouped into multiple passes.
  • A Pascal compiler creates no intermediate representation of the source program; a C/C++ compiler does.
  • A Pascal compiler is faster; a C/C++ compiler is slightly slower.
  • A Pascal compiler does not look back at code it has previously processed; in a C/C++ compiler, each pass takes the result of the previous pass as input and creates an intermediate output.
  • A Pascal compiler cannot generate very efficient programs because of the limited scope available; the wider scope available to a C/C++ compiler allows better code generation.
  • The one-pass design simplifies writing the compiler but is limiting; writing a multi-pass compiler is a complex job, and the code is improved pass by pass until the final code.
  • Memory consumption is lower for a Pascal compiler and higher for a C/C++ compiler.


10.  Differentiate between C compiler and Pascal compiler.

6 marks
Official Answer

Pascal Compiler vs. C/C++ Compiler:

  • Pascal uses a one-pass compiler, where all the phases are combined into a single pass; C/C++ uses a multi-pass compiler, where the phases are grouped into multiple passes.
  • A Pascal compiler creates no intermediate representation of the source program; a C/C++ compiler does.
  • A Pascal compiler is faster; a C/C++ compiler is slightly slower.
  • A Pascal compiler does not look back at code it has previously processed; in a C/C++ compiler, each pass takes the result of the previous pass as input and creates an intermediate output.
  • A Pascal compiler cannot generate very efficient programs because of the limited scope available; the wider scope available to a C/C++ compiler allows better code generation.
  • The one-pass design simplifies writing the compiler but is limiting; writing a multi-pass compiler is a complex job, and the code is improved pass by pass until the final code.
  • Memory consumption is lower for a Pascal compiler and higher for a C/C++ compiler.


10.  Discuss the importance of error handler in compiler. How is it manipulated in the different phases of compilation.

6 marks
Official Answer

Error detection and reporting are important functions of the compiler. Whenever an error is encountered during compilation of the source program, an error handler is invoked. The error handler generates a suitable message regarding the error, which allows the programmer to find its exact location. Errors can be encountered in any phase of the compiler for several reasons, such as:

  • In the lexical analysis phase, errors can occur due to misspelled tokens, unrecognized characters, etc. These are mostly typing errors.
  • In the syntax analysis phase, errors can occur due to syntactic violations of the language.
  • In the intermediate code generation phase, errors can occur due to incompatible operand types for an operator.
  • In the code optimization phase, errors can occur during control flow analysis due to unreachable statements.
  • In the code generation phase, errors can occur due to incompatibility with the computer architecture during generation of the machine code. For example, a constant created by the compiler may be too large to fit in a word of the target machine.
  • In the symbol table, errors can occur during bookkeeping routines due to multiple declarations of an identifier with ambiguous attributes.

After detecting an error, a phase must somehow deal with that error, so that compilation can proceed, allowing further errors in the source program to be detected.

The error handler contains a set of routines for handling errors encountered in any phase of the compiler. Each phase of the compiler calls the appropriate error-handler routine as per its specification.


9. Define three  address codes. Write down Quadruples for,

                    a = -b*(c+d)/e

5 marks

1.  What do you mean by compiler? Explain the semantic analysis phase of compiler construction.

6 marks
Official Answer

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. An important part of a compiler is reporting errors to the programmer.

Block diagram for phases of compiler:

COMPILER DESIGN: Phases of a compiler

In Semantic Analysis phase, the semantic analyzer checks the source program for semantic errors and collects the type information for the code generation. Semantic analyzer checks whether they form a sensible set of instructions in the programming language or not. Type-checking is an important part of semantic analyzer. 

Semantic analyzer is required when the compiler may require performing some additional checks such as determining the type of expressions and checking that all statements are correct with respect to the typing rules, that variables have been properly declared before they are used, that functions are called with the proper number of parameters etc. This semantic analyzer phase is carried out using information from the parse tree and the symbol table.

Example:



10. List out different types of run time storage management techniques. Explain any one of them.

5 marks
Details
Official Answer

11. What is the advantages of code optimization? Explain about dead - code elimination.

5 marks
Details
Official Answer

12. Explain about the factors affecting target code generation.

5 marks
Details
Official Answer

Unit 2: Unit 2
59 Questions

2.  Given a regular expression (ε + 0)*10. Construct the DFA recognizing the pattern described by this regular expression using syntax tree based reduction.

6 marks
Details
Official Answer

Given RE;

(ε + 0)*10

The augmented RE of the given RE is:

(ε + 0)*10 #

Syntax tree for augmented R.E. is


Calculating followpos:

followpos(1) = {1, 2}

followpos(2) = {3}

followpos(3) = {4}

Now,

1) Start state of DFA = firstpos(root) = {1, 2} = S0

Use followpos of symbol representing position in RE to obtain the next state of DFA.

2) From S0 = {1, 2}

    1  represents '0'

    2 represents '1'

followpos(1) = {1, 2} = S0

     δ(s0, 0) = S0

followpos(2) = {3} = S1

    δ(S0, 1) = S1

3) From S1 = {3}

    Here 3 represents '0'

followpos(3) = {4} = S2

     δ(S1, 0) = S2

4) From S2 = {4}

    Here 4 represents '#'

No new states are generated.

Therefore, Final state = {S2}

Now the resulting DFA of given regular expression is:



2.  Design a lexical analyzer generator and explain it.

6 marks
Details
Official Answer

Lexical Analyzer Generator (Lex/Flex) systematically translates regular definitions into C source code for efficient scanning. The generated code is easy to integrate into C applications. Flex is a later version of lex.

  • Firstly, a specification of the lexical analyzer is prepared by creating a program lex.l in Lex/Flex.
  • lex.l is run through the lex compiler to produce a C program lex.yy.c.
  • lex.yy.c consists of a tabular representation of lex.l, together with a standard routine that uses the table to recognize lexemes.
  • lex.yy.c is run through the C compiler to produce the object program a.out, which is a lexical analyzer that transforms its input into tokens.

A flex specification consists of three parts:

1. regular definitions, C declarations in %{ %}

        %%

2. translation rules

        %%

3. user-defined auxiliary procedures

The translation rules are of the form:

    p1     { action1 }

    p2     { action2 }

    …

    pn     { actionn }

Lex/flex regular definitions are of the form:

       name definition

E.g. Digit   [0-9]

Example of lex specification

%{

    #include <stdio.h>

%}

%%

[0-9]+   { printf("%s\n", yytext); }

.|\n     { }

%%

main()

{ yylex();

}


1. Differentiate between top-down and bottom-up parsing methods. Construct SLR parse table for the following grammar.

    

10 marks
Details
Official Answer

Difference between top-down and bottom-up parsing methods

  • A top-down parser first looks at the highest level of the parse tree and works down the parse tree using the rules of the grammar, whereas a bottom-up parser first looks at the lowest level and works up the parse tree.
  • Top-down parsing attempts to find the leftmost derivation for an input string, whereas bottom-up parsing attempts to reduce the input string to the start symbol of the grammar.
  • Top-down parsing proceeds from the top (the start symbol) down to the leaf nodes of the parse tree, whereas bottom-up parsing proceeds from the leaf nodes up to the start symbol.
  • Top-down parsing uses leftmost derivation, whereas bottom-up parsing uses rightmost derivation (in reverse).
  • The main decision in top-down parsing is which production rule to use to construct the string, whereas in bottom-up parsing it is when to use a production rule to reduce the string to the start symbol.
  • A top-down parser can be easily structured and formed, whereas a bottom-up parser is difficult to produce by hand.
  • Top-down parsing is less powerful; bottom-up parsing is more powerful.
  • Error detection is easy in top-down parsing but difficult in bottom-up parsing.



Now, 

Given grammar:

    1. S → aETe

    2. E → Ebc

    3. E → b

    4. T → d

The augmented grammar of the given grammar is:

S' → S

S → aETe

E → Ebc

E → b

T → d

Next, we obtain the canonical collection of sets of LR(0) items, as follows,

I0 = closure({S' → •S}) = {S' → •S, S → •aETe}

goto(I0, S) = {S' → S•} = I1

goto(I0, a) = closure({S → a•ETe}) = {S → a•ETe, E → •Ebc, E → •b} = I2

goto(I2, E) = closure({S → aE•Te, E → E•bc}) = {S → aE•Te, E → E•bc, T → •d} = I3

goto(I2, b) = closure({E → b•}) = {E → b•} = I4

goto(I3, d) = closure({T → d•}) = {T → d•} = I5

goto(I3, b) = closure({E → Eb•c}) = {E → Eb•c} = I6

goto(I3, T) = closure({S → aET•e}) = {S → aET•e} = I7

goto(I6, c) = closure({E → Ebc•}) = {E → Ebc•} = I8

goto(I7, e) = closure({S → aETe•}) = {S → aETe•} = I9

FOLLOW set for non-terminals are,

FOLLOW(S) = {$}

FOLLOW(E) = {b, d}

FOLLOW(T) = {e}

Now, the SLR parsing table is:

State |   a      b      c      d      e      $    |   S     E     T
  0   |  s2                                       |   1
  1   |                               accept      |
  2   |         s4                                |         3
  3   |         s6            s5                  |               7
  4   |         r3            r3                  |
  5   |                             r4            |
  6   |                s8                         |
  7   |                             s9            |
  8   |         r2            r2                  |
  9   |                               r1          |


2.  What is the role of regular expression in lexical analysis? Also give example.

6 marks
Details
Official Answer

The lexical analyzer needs to scan and identify only a finite set of valid string/token/lexeme that belong to the language in hand. It searches for the pattern defined by the language rules.

Regular expressions have the capability to express finite languages by defining a pattern for finite strings of symbols. The grammar defined by regular expressions is known as regular grammar. The language defined by regular grammar is known as regular language.

An important notation for specifying patterns is the regular expression. Each pattern matches a set of strings, and regular expressions serve as names for these sets. Regular languages describe the programming language tokens.

The various operations on languages are:

  • Union of two languages L and M is written as

            L U M = {s | s is in L or s is in M}

  • Concatenation of two languages L and M is written as

            LM = {st | s is in L and t is in M}

  • The Kleene Closure of a language L is written as

            L* = Zero or more occurrence of language L.

Representing valid tokens of a language in regular expression:

If x is a regular expression, then:

  • x* means zero or more occurrence of x.
  • x+ means one or more occurrence of x.
  • x? means at most one occurrence of x
  • [a-z] is all lower-case alphabets of English language.
  • [A-Z] is all upper-case alphabets of English language.
  • [0-9] is all natural digits used in mathematics.

Representing occurrence of symbols using regular expressions:

  • letter = [a – z] or [A – Z]
  • digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]
  • sign = [ + | - ]

Representing language tokens using regular expressions:

  • Decimal = (sign)?(digit)+
  • Identifier = (letter)(letter | digit)*


2.  What are the functional responsibilities of a lexical analyzer? Write a lexical analyzer that recognizes the identifier of C language.

6 marks
Details
Official Answer

Lexical Analyzer reads the input characters of the source program, groups them into lexemes, and produces a sequence of tokens for each lexeme. The tokens are then sent to the parser for syntax analysis.

Responsibilities of a lexical analyzer are:

  • It reads the input character and produces output sequence of tokens that the Parser uses for syntax analysis.
  • Lexical analyzer helps to identify token into the symbol table.
  • Lexical Analyzer is also responsible for eliminating comments and white spaces from the source program.
  • It also reports lexical errors.
  • Lexical analyzers are also used by web browsers to format and display web pages with the help of parsed data from JavaScript, HTML, and CSS.

Second part:

/* Lex program to recognize the identifier of C language */

%{
#include <stdio.h>
%}
%%
^[a-zA-Z_][a-zA-Z0-9_]*    printf("Valid Identifier");
^[^a-zA-Z_]                printf("Invalid Identifier");
.                          ;
%%
main()
{
    yylex();
}

2.  Explain about design of lexical analyzer generator with its suitable diagram.

6 marks
Details
Official Answer

Lexical Analyzer Generator (Lex/Flex) systematically translates regular definitions into C source code for efficient scanning. The generated code is easy to integrate into C applications. Flex is a later version of lex.

  • Firstly, a specification of the lexical analyzer is prepared by creating a program lex.l in Lex/Flex.
  • lex.l is run through the lex compiler to produce a C program lex.yy.c.
  • lex.yy.c consists of a tabular representation of lex.l, together with a standard routine that uses the table to recognize lexemes.
  • lex.yy.c is run through the C compiler to produce the object program a.out, which is a lexical analyzer that transforms its input into tokens.

A flex specification consists of three parts:

1. regular definitions, C declarations in %{ %}

        %%

2. translation rules

        %%

3. user-defined auxiliary procedures

The translation rules are of the form:

    p1     { action1 }

    p2     { action2 }

    …

    pn     { actionn }

Lex/flex regular definitions are of the form:

       name definition

E.g. Digit   [0-9]

Example of lex specification

%{

    #include <stdio.h>

%}

%%

[0-9]+   { printf("%s\n", yytext); }

.|\n     { }

%%

main()

{ yylex();

}


2.  Define token, pattern and lexeme with suitable example. How input buffering can be implemented for scanner, Explain.

6 marks
Details
Official Answer

Token

Token is a sequence of characters that can be treated as a single logical entity. For example, Identifiers, Keywords, Operators, Constants etc.

Pattern

A set of string in the input for which the same token is produced as output. This set of string is described by a rule called a pattern associated with the token. For example, “a letter followed by zero or more letters, digits, or underscores.”

Lexeme

A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. 


Input Buffering

Lexical analysis needs to look ahead several characters before a match can be announced. We have two buffer input scheme that is useful when look ahead is necessary.

  • Buffer Pair
  • Sentinels

Buffer Pair (2N Buffering):

In this technique the buffer is divided into two halves of N characters each, where N is the number of characters in a disk block, such as 1024 or 4096. Rather than reading character by character from the file, we read N input characters at once. If fewer than N characters remain in the input, an EOF marker is placed at the end. There are two pointers: the lexeme pointer and the forward pointer.

The lexeme pointer points to the start of the lexeme while the forward pointer scans the input buffer for the end of the lexeme. When the forward pointer reaches the end of one half, the second half is loaded and the forward pointer points to the beginning of that half.

Sentinels:

While using the buffer-pair technique, every time the forward pointer is moved we have to check that it doesn't cross the buffer half, and when it reaches the end of the buffer, the other half needs to be loaded. We can solve this problem by introducing a sentinel character at the end of both halves of the buffer. The sentinel can be any character which is not part of the source program; EOF is usually preferred, as it also indicates the end of the source program. It signals the need for some special action (fill the other buffer half, or terminate processing).



2.  Why are regular expressions used in token specification? Write the regular expression to specify the identifier like in C.

6 marks
Details
Official Answer

The lexical analyzer needs to scan and identify only a finite set of valid string/token/lexeme that belong to the language in hand. It searches for the pattern defined by the language rules. Regular expressions have the capability to express finite languages by defining a pattern for finite strings of symbols. So regular expressions are used in token specification.

Regular expressions are the algebraic expressions that are used to describe tokens of a programming language. It uses the three regular operations. These are called union/or, concatenation and star. Brackets ( and ) are used for grouping.

  • Union of two languages L and M is written as

                L U M = {s | s is in L or s is in M}

  • Concatenation of two languages L and M is written as

                LM = {st | s is in L and t is in M}

  • The Kleene Closure of a language L is written as

                L* = Zero or more occurrence of language L.


Now,

In C the regular expression for identifiers can be written using the regular definition as;

    letter → A | B | .... | Z | a | b | .... | z | _

    digit → 0 | 1 | 2 | ..... |9

    identifier → letter(letter | digit)*

The regular expression representing identifiers without using regular definition:

(A | B | C |.........| Z | a | b | c |.........| z | _) ((A | B | C |.........| Z | a | b | c |.........| z | _) | (0 | 1 |................| 9))*


2.  Convert the following RE to DFA directly.

                (a+b)*ab

6 marks
Details
Official Answer

Given RE;

(a+b)*ab

The augmented RE of the given RE is:

(a+b)*ab #

Syntax tree for augmented R.E. is


Calculating followpos:

followpos(1) = {1, 2, 3}

followpos(2) = {1, 2, 3}

followpos(3) = {4}

followpos(4) = {5}

Now,

1) Start state of DFA = firstpos(root) = {1, 2, 3} = S0

Use followpos of symbol representing position in RE to obtain the next state of DFA.

2) From S0 = {1, 2, 3}

    1 and 3 represents 'a'

    2 represents 'b'

followpos({1, 3}) = {1, 2, 3, 4} = S1

    δ(s0, a) = S1

followpos({2}) = {1, 2, 3} = S0

    δ(S0, b) = S0

3) From S1 = {1, 2, 3, 4}

    1, 3 → a

    2, 4 → b

followpos({1, 3}) = {1, 2, 3, 4} = S1

     δ(S1, a) = S1

followpos({2, 4}) = {1, 2, 3, 5} = S2

     δ(S1, b) = S2

4) From S2 = {1, 2, 3, 5}

    1, 3 → a

    2 → b

    5 → #

followpos({1, 3}) = {1, 2, 3, 4}= S1

    δ(S2, a) = S1

followpos({2}) = {1, 2, 3} = S0

     δ(S2, b) = S0

No new states are generated.

Therefore, Final state = {S2}

Now the resulting DFA of given regular expression is:



2.  List out the major tasks carried out in Lexical Analysis Phase. Convert the following NFA to DFA.


6 marks
Details
Official Answer

In lexical analysis phase, Lexical Analyzer reads the input characters of the source program, groups them into lexemes, and produces a sequence of tokens for each lexeme. The tokens are then sent to the parser for syntax analysis.

Task carried out in lexical analysis phase are:

  • Lexical analyzer reads the input character and produces output sequence of tokens that the Parser uses for syntax analysis.
  • Lexical analyzer helps to identify token into the symbol table.
  • Lexical Analyzer is also responsible for eliminating comments and white spaces from the source program.
  • It also reports lexical errors.
  • Lexical analyzers are also used by web browsers to format and display web pages with the help of parsed data from JavaScript, HTML, and CSS.

Second part:

Using the subset construction we have the DFA as:

States         Next state on 0      Next state on 1
→ *{p}         {r, q}               Φ
*{r, q}        {r, q}               {r}
*{r}           {r}                  Φ
Φ              Φ                    Φ

Now Transition diagram for above table is:


3.  What is shift reduce parsing technique? Show shift reduce parsing action for the string a*(a+a), given the grammar

        

6 marks
Details
Official Answer

The process of reducing the given input string to the start symbol of the grammar is called shift-reduce parsing.

A shift-reduce parser uses a stack to hold grammar symbols and an input buffer to hold the input string w.

Shift-reduce parsing performs two actions: shift and reduce.

  • On a shift action, the current symbol in the input string is pushed onto the stack.
  • On a reduce action, symbols at the top of the stack are replaced by a non-terminal; the symbols form the right side of a production and the non-terminal is its left side.

Now,

Given grammar;

    

Input String: a*(a+a)

Shift reduce parsing action for given input string is shown below:




2. What are static and dynamic type checking? Write SDD to carry out type checking for the following expression.

E->id |E1 op E2 | E1 relop E2 | E1[E2] | E1↑

10 marks
Details
Official Answer

In static type checking, the type of variable is known at compile time (it checks the type of variable before running). Typical examples of static checking are:

    Type checks: Report an error if an operator is applied to an incompatible operand.

    Flow-of-control checks: Statements that results in a branch need to be terminated correctly.

    Uniqueness checks: There are situations where an object must be defined exactly once.

    Name-related checks: Sometimes the same names may be appeared two or more times.

In dynamic type checking, the type of a variable is known at runtime (it checks the type of a variable while executing). The compiler generates verification code to enforce the programming language's dynamic semantics. If the compiler can guarantee that the programs it accepts will execute without type errors, the language is said to be strongly typed, i.e., there will be no type errors during run-time.


Now,

SDD to carry out type checking for the given expression is given below:

E → id     { E.type = lookup(id.entry) }


E → E1 op E2     { if (E1.type = integer and E2.type = integer) then E.type = integer

        else if (E1.type = integer and E2.type = real) then E.type = real

        else if (E1.type = real and E2.type = integer) then E.type = real

        else if (E1.type = real and E2.type = real) then E.type = real

        else E.type = type-error }


E → E1 relop E2    { if (E1.type = integer and E2.type = integer) then E.type = boolean

                    else E.type = type-error }


E → E1 [E2]     { if (E2.type = int and E1.type = array(s, t)) then E.type = t

                   else E.type = type-error }


E → E1↑     { if E1.type = pointer(t) then E.type = t

                else E.type = type-error }


3.  What is shift reduce parsing techniques? Show shift reduce parsing action for the string (x+x)*a, given the grammar

        

6 marks
Details
Official Answer

The process of reducing the given input string to the start symbol of the grammar is called shift-reduce parsing.

A shift-reduce parser uses a stack to hold grammar symbols and an input buffer to hold the input string w.

Shift-reduce parsing performs two actions: shift and reduce.

  • On a shift action, the current symbol in the input string is pushed onto the stack.
  • On a reduce action, symbols at the top of the stack are replaced by a non-terminal; the symbols form the right side of a production and the non-terminal is its left side.

Now,

Given grammar;

  

Input String: (x+x)*a

Shift reduce parsing action for given input string is shown below:

At $S*a, the parser can perform neither a shift action nor a reduce action; hence, an error is reported.


3.  Discuss the specification of lexical analyzer generator Lex.

6 marks
Details
Official Answer

Lexical Analyzer Generator (Lex/Flex) systematically translates regular definitions into C source code for efficient scanning. The generated code is easy to integrate into C applications. Flex is a later version of lex.

  • Firstly, a specification of the lexical analyzer is prepared by creating a program lex.l in Lex/Flex.
  • lex.l is run through the lex compiler to produce a C program lex.yy.c.
  • lex.yy.c consists of a tabular representation of lex.l, together with a standard routine that uses the table to recognize lexemes.
  • lex.yy.c is run through the C compiler to produce the object program a.out, which is a lexical analyzer that transforms its input into tokens.

A  lex/flex specification consists of three parts:

1. regular definitions, C declarations in %{ %}

        %%

2. translation rules

        %%

3. user-defined auxiliary procedures

The translation rules are of the form:

    p1     { action1 }

    p2     { action2 }

    …

    pn     { actionn }

Lex/flex regular definitions are of the form:

       name definition

E.g. Digit   [0-9]

Example of lex specification

%{

    #include <stdio.h>

%}

%%

[0-9]+   { printf("%s\n", yytext); }

.|\n     { }

%%

main()

{ yylex();

}


3.  What are regular definitions? Given the following CFG,


Write the regular definitions for the terminals used. Also, construct the transition diagram for id and num.

6 marks
Details
Official Answer

If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form

d1 → r1

d2 → r2

.....

dn → rn

where each di is a distinct name and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, ...., di-1}.


Now,

Regular definitions for the terminals used in given grammar are:

digit → [0-9]
num → digit+(. digit+)?(E[+|-]? digit+)?
letter → [A-Za-z]
id → letter ( letter | digit )*
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>

Transition diagram for id:

trans dia id

Here 9, 10 & 11 are the name of states.

Transition diagram for num:

trans dia num


3.  Differentiate between top-down parsing and bottom-up parsing.

6 marks
Details
Official Answer

  • A top-down parser first looks at the highest level of the parse tree and works down the parse tree using the rules of the grammar, whereas a bottom-up parser first looks at the lowest level and works up the parse tree.
  • Top-down parsing attempts to find the leftmost derivation for an input string, whereas bottom-up parsing attempts to reduce the input string to the start symbol of the grammar.
  • Top-down parsing proceeds from the top (the start symbol) down to the leaf nodes of the parse tree, whereas bottom-up parsing proceeds from the leaf nodes up to the start symbol.
  • Top-down parsing uses leftmost derivation, whereas bottom-up parsing uses rightmost derivation (in reverse).
  • The main decision in top-down parsing is which production rule to use to construct the string, whereas in bottom-up parsing it is when to use a production rule to reduce the string to the start symbol.
  • A top-down parser can be easily structured and formed, whereas a bottom-up parser is difficult to produce by hand.
  • Top-down parsing is less powerful; bottom-up parsing is more powerful.
  • Error detection is easy in top-down parsing but difficult in bottom-up parsing.


3.  Differentiate between recursive descent and non-recursive predictive parsing method. Find first and follow of all the non-terminals in the following grammar.

     

6 marks
Details
Official Answer

Difference between recursive descent and non-recursive predictive parsing

  • Recursive descent parsing may or may not require backtracking, whereas non-recursive predictive parsing does not require any backtracking.
  • Recursive descent parsing uses a procedure for every non-terminal to parse strings, whereas non-recursive predictive parsing finds the productions to use by consulting a parsing table and replacing symbols on a stack.
  • Recursive descent parsing is a top-down parser built from a set of mutually recursive procedures, where each procedure implements one non-terminal of the grammar; non-recursive predictive parsing is a top-down, table-driven technique that does not use backtracking.
  • A recursive descent parser contains several small functions, one for each non-terminal in the grammar, whereas a non-recursive predictive parser uses a lookahead pointer which points to the next input symbol and, to stay backtracking-free, puts some constraints on the grammar.
  • Recursive descent parsing accepts all kinds of grammars, whereas non-recursive predictive parsing accepts only the class of grammars known as LL(k) grammars.



Now,

Given grammar;

Now, the FIRST and FOLLOW for the non-terminals are:

Non-terminal     FIRST           FOLLOW
E                {(, id}         {), $}
A                {+, ε}          {), $}
T                {(, id}         {), +, $}
B                {*, ε}          {), +, $}
F                {(, id}         {*, +, ), $}


3.  Find first and follow of all the non terminals in the following grammar.

        

6 marks
Details
Official Answer

The given grammar is;

Now, the FIRST and FOLLOW of given non-terminals are:

FIRST(A) = FIRST(T) = FIRST(X) = {(, a}

FIRST(E) = {+, ε}

FIRST(Y) = {*, ε}

FOLLOW(A) = {), $}

FOLLOW(E) = {), $}

FOLLOW(T) = {), +, $}

FOLLOW(Y) = {), +, $}

FOLLOW(X) = {*, +, ), $}

Non-terminal     FIRST           FOLLOW
A                {(, a}          {), $}
E                {+, ε}          {), $}
T                {(, a}          {), +, $}
Y                {*, ε}          {), +, $}
X                {(, a}          {*, +, ), $}


3.  Given the regular expression (0 + 1)* 011 , construct a DFA equivalent to this regular expression computing follow pos ().

6 marks
Details
Official Answer

Given RE;

(0 + 1)* 011

The augmented RE of the given RE is:

(0 + 1)* 011 #

Syntax tree for augmented R.E. is


Calculating followpos:

followpos(1) = {1, 2, 3}

followpos(2) = {1, 2, 3}

followpos(3) = {4}

followpos(4) = {5}

followpos(5) = {6}

Now,

1) Start state of DFA = firstpos(root) = {1, 2, 3} = S0

Use followpos of symbol representing position in RE to obtain the next state of DFA.

2) From S0 = {1, 2, 3}

    1 and 3 represents '0'

    2 represents '1'

followpos({1, 3}) = {1, 2, 3, 4} = S1

    δ(s0, 0) = S1

followpos({2}) = {1, 2, 3} = S0

    δ(S0, 1) = S0

3) From S1 = {1, 2, 3, 4}

    1, 3 → 0

    2, 4 → 1

followpos({1, 3}) = {1, 2, 3, 4} = S1

     δ(S1, 0) = S1

followpos({2, 4}) = {1, 2, 3, 5} = S2

     δ(S1, 1) = S2

4) From S2 = {1, 2, 3, 5}

    1, 3 → 0

    2, 5 → 1

followpos({1, 3}) = {1, 2, 3, 4}= S1

    δ(S2, 0) = S1

followpos({2, 5}) = {1, 2, 3, 6} = S3

     δ(S2, 1) = S3

5) From S3 = {1, 2, 3, 6}

    1, 3 → 0

    2 → 1

    6 → #

followpos({1, 3}) = {1, 2, 3, 4}= S1

    δ(S3, 0) = S1

followpos({2}) = {1, 2, 3} = S0

    δ(S3, 1) = S0

No new states are generated.

Therefore, Final state = {S3}

Now the resulting DFA of given regular expression is:



3.  What are the problem with top down parsers? Explain the LR parsing algorithm.

6 marks
Details
Official Answer

A top-down parser generates the parse tree for the given input string with the help of grammar productions by expanding the non-terminals, i.e., it starts from the start symbol and ends at the terminals.

Problems with top-down parsers are:

  • Only judges grammatically
  • Stops when it finds a single derivation.
  • No semantic knowledge employed.
  • No way to rank the derivations.
  • Problems with left-recursive rules.
  • Problems with ungrammatical sentences.

LR Parsing algorithm:

An LR parser comprises a stack, an input buffer, and a parsing table that has two parts: action and goto. The stack holds entries of the form s0 X1 s1 X2 s2 ... Xm sm, where every si is a state and every Xi is a grammar symbol (terminal or non-terminal).

If the top of the stack is sm and the input symbol is a, the LR parser consults action[sm, a], which can be one of four actions:

1. Shift s, where s is a state.

2. Reduce using production A → β.

3. Accept.

4. Error.

The function goto takes a state and a grammar symbol as arguments and produces a state.

The configuration of the LR parser is a tuple comprising the stack contents and the contents of the unconsumed input buffer. It is of the form

(s0 X1 s1 X2 s2 ... Xm sm, ai ai+1 ... an $)

1. If action[sm, ai] = shift s, the parser executes a shift move, entering the configuration

    (s0 X1 s1 X2 s2 ... Xm sm ai s, ai+1 ... an $), shifting both ai and the state s onto the stack.

2. If action[sm, ai] = reduce A → β, the parser executes a reduce move, entering the configuration

    (s0 X1 s1 X2 s2 ... Xm-r sm-r A s, ai ai+1 ... an $). Here s = goto[sm-r, A] and r is the length of the handle β.

Note: if r is the length of β, then the parser pops 2r symbols from the stack and pushes A and the state s = goto[sm-r, A]. After the reduction, the parser outputs the production A → β used in the reduce operation.

3. If action[sm, ai] = accept, parsing completes successfully and the parser stops.

4. If action[sm, ai] = error, the parser calls the error-recovery routine.


4.  Translate the arithmetic expression a*-(b+c) into syntax tree. Explain the ambiguous grammar.

6 marks
Details
Official Answer

Syntax tree for given arithmetic expression is shown below:

a*-(b+c) 


If a same terminal string can be derived from the grammar using two or more distinct left-most derivation (or right most) then the grammar is said to be ambiguous i.e. from an ambiguous grammar, we can get two or more distinct parse tree for the same terminal string.

For example:

E → E+E | E*E | id

String: id+id*id

Derivation 1:

Derivation 2:

For the string id + id * id, the above grammar generates two parse trees (two distinct derivations), so the grammar is ambiguous.


4.  Given a regular expression (a + ε)bc*. Construct the DFA recognizing the pattern described by the regular expression using syntax tree based reduction.

6 marks
Details
Official Answer

Given RE;

(a + ε)bc*

The augmented RE of the given RE is:

(a + ε)bc* #

Syntax tree for augmented R.E. is

Calculating followpos:

followpos(1) = {2}

followpos(2) = {3, 4}

followpos(3) = {3, 4}

Now,

1) Start state of DFA = firstpos(root) = {1, 2} = S0

Use followpos of symbol representing position in RE to obtain the next state of DFA.

2) From S0 = {1, 2}

    1  represents 'a'

    2 represents 'b'

followpos(1) = {2} = S1

     δ(s0, a) = S1

followpos(2) = {3, 4} = S2

    δ(S0, b) = S2

3) From S1 = {2}

    Here 2 represents 'b'

followpos(2) = {3, 4} = S2

     δ(S1, b) = S2

4) From S2 = {3, 4}

    Here 3 represents 'c'

    4 represents '#'

followpos(3) = {3, 4} = S2

    δ(S2, c) =  S2

No new states are generated.

Therefore, Final state = {S2}

Now the resulting DFA of given regular expression is:



4.  Differentiate between LR(0) and LR(1) algorithm.

6 marks
Details
Official Answer

Difference between LR(0) and LR(1) algorithm

  • LR(0) requires just the canonical items for the construction of a parsing table, but LR(1) requires the lookahead symbol as well.
  • LR(1) allows us to choose the correct reductions and allows the use of grammar that are not uniquely invertible while LR(0) does not.
  • LR(0) reduces at all items whereas LR(1) reduces only at lookahead symbols.
  • LR(0) = canonical items and LR(1) = LR(0) + lookahead symbol.
  • In canonical LR(0) collection, state/item is in the form:

        E.g. I2 = { C → AB•}

        whereas In cannonical LR(1) collection, state/item is in the form: 

        E.g. I2 = { C →d•, e/d} where e & d are lookahead symbol.

  • Construction of canonical LR(0) collection starts with C = {closure({S‘→.S})} whereas Construction of canonical LR(1) collection starts with C = {closure({S‘→.S, $})} where S is the start symbol.
  • LR(0) algorithm is simple than LR(1) algorithm.
  • In LR(0), there may be conflict in parsing table while LL(1) parsing uses lookahead to avoid unnecessary conflicts in parsing table.

4.  Construct SLR parsing table for the following grammar.

      S -> aAa | bAb | ba

6 marks
Details
Official Answer

4.  Construct SLR parse table for the following grammar.

        

6 marks
Details
Official Answer

The augmented grammar of the given grammar is:

S' → S

S → E

E → E+T | T

T → T*F | F

F → id

Next, we obtain the canonical collection of sets of LR(0) items, as follows,




4.  Explain the role of the parser. Write an algorithm for non-recursive predictive parsing.

6 marks
Details
Official Answer

A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.

The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language. It has to report any syntax errors that occur.


The tasks of parser can be expressed as

  • Analyzes the context free syntax
  • Generates the parse tree
  • Provides the mechanism for context sensitive analysis
  • Determine the errors and tries to handle them

Algorithm for non-recursive predictive parsing:

1. Set ip to the first symbol of the input string.

2. Set the stack to $S where S is the start symbol of the grammar G.

3. Let X be the top stack symbol and ‘a’ be the symbol pointed by ip then

  Repeat

    a. If X is a terminal or ‘$’ then

        i. If X = a then pop X from the stack and advance ip

        ii. else error()

    b. Else

        If M[X,a] = Y1Y2Y3....... Yk then

            1. Pop X from Stack

            2. Push Yk , Yk-1, .....Y2, Y1 on the stack with Y1 on top.

            3. Output the production X → Y1Y2Y3....... Yk

Until X = ‘$’.
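The algorithm above can be sketched as a small table-driven parser. The grammar and hand-written table M below are illustrative assumptions (a tiny LL(1) grammar E → T E', E' → + T E' | ε, T → id), not part of the question:

```python
# A sketch of the non-recursive (table-driven) predictive parser above,
# for the LL(1) grammar  E -> T E' ,  E' -> + T E' | ε ,  T -> id .
# The parsing table M is written out by hand for this small grammar.

M = {
    ('E',  'id'): ['T', "E'"],
    ("E'", '+'):  ['+', 'T', "E'"],
    ("E'", '$'):  [],            # E' -> ε
    ('T',  'id'): ['id'],
}
NONTERMINALS = {'E', "E'", 'T'}

def parse(tokens):
    tokens = tokens + ['$']
    stack = ['$', 'E']           # start symbol on top of $
    ip = 0
    while stack[-1] != '$':
        X, a = stack[-1], tokens[ip]
        if X not in NONTERMINALS:        # terminal on top of stack
            if X != a:
                return False             # error()
            stack.pop()
            ip += 1                      # advance ip
        elif (X, a) in M:                # expand by the table entry
            stack.pop()
            stack.extend(reversed(M[(X, a)]))
        else:
            return False                 # no table entry: error()
    return tokens[ip] == '$'

print(parse(['id', '+', 'id']))   # True
print(parse(['id', '+']))         # False
```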


4.  Consider the following grammar:

            

a.       Eliminate Left recursion.

b.      Compute FIRST & FOLLOW for the symbol in the grammar.

6 marks
Details
Official Answer

Given grammar,

a) Removing left recursion of the given grammar;

    S → (L) | a

    L → SL'

    L' → ,SL' | ε

b) Now, the FIRST and FOLLOW for the given symbols are:

Non-terminals       FIRST()             FOLLOW()

S                   {(, a}              {$, ",", )}

L                   {(, a}              {)}

L'                  {",", ε}            {)}


4.  Define finite automata. Construct a finite automaton that will accept strings of zeros and ones that contain an odd number of zeros and an even number of ones.

6 marks
Details
Official Answer

A finite automaton is an abstract mathematical machine (model) that has a set of "states" and whose "control" moves from state to state in response to external "inputs". The control may be either "deterministic", meaning that the automaton cannot be in more than one state at any one time, or "non-deterministic", meaning that it may be in several states at once. This distinction separates the class of automata into DFA and NFA.

  • The DFA, i.e. Deterministic Finite Automaton, cannot be in more than one state at any time.
  • The NFA, i.e. Non-Deterministic Finite Automaton, can be in more than one state at a time.

Now,

A DFA that will accept strings of zeros and ones that contain an odd number of zeros and an even number of ones is shown below:

Here,

Q = {q0, q1, q2, q3}

∑ = {0, 1}

δ = Above transition diagram

q0 = q0

F = {q2}
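The parity tracking performed by the four states can be simulated directly — a minimal Python sketch in which the pair (zero-parity, one-parity) plays the role of the state:

```python
# Simulation of the DFA above: track the parity of zeros and ones,
# and accept when the count of 0s is odd and the count of 1s is even.
# The start state (0, 0) corresponds to (even zeros, even ones); the
# single accepting state is (1, 0), i.e. (odd zeros, even ones).

def accepts(s):
    zeros, ones = 0, 0
    for ch in s:
        if ch == '0':
            zeros ^= 1        # flip zero-parity
        elif ch == '1':
            ones ^= 1         # flip one-parity
        else:
            return False      # not in the alphabet {0, 1}
    return (zeros, ones) == (1, 0)

print(accepts('011'))    # True  (one 0, two 1s)
print(accepts('01'))     # False (one 0, one 1)
print(accepts('0'))      # True
```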


4.  Find first and follow for the non-terminals in the following grammar.

    

6 marks
Details
Official Answer

Given grammar;

Now, the FIRST and FOLLOW for the non-terminal symbols are:

Non-terminals       FIRST()             FOLLOW()

S                   {a, b, d}           {$}

A                   {d}                 {a, c}

B                   {a}                 {a, c}


4.  Consider the grammar:

        

    a. Show that this grammar is ambiguous by constructing two different leftmost derivations for sentence abab.

    b. Construct the corresponding rightmost derivations for abab.

    c. Construct the corresponding parse trees for abab.

6 marks
Details
Official Answer

Given grammar;

 

a) Leftmost derivation:

Since the above grammar produces two different leftmost derivations for the sentence abab, the given grammar is ambiguous.

b) Rightmost derivation:

c) The corresponding parse tree for abab is



5.  Explain in detail about the recursive-descent parsing with example.

6 marks
Details
Official Answer

Recursive descent parsing is a top-down parsing technique that uses a set of recursive procedures to scan its input from left to right. It may involve backtracking: it starts from the root node (start symbol), applies the first production, matches the input string against the production rules, and if there is no match, backtracks and applies another rule.

Rules:

1. Use two pointers iptr for pointing the input symbol to be read and optr for pointing the symbol of output string that is initially start symbol S.

2. If the symbol pointed by optr is non-terminal use the first production rule for expansion.

3. While the symbols pointed to by iptr and optr are the same, increment both pointers.

4. The loop at the above step terminates when

    a. A non-terminal is encountered in output 

    b. The end of input is reached. 

    c. Symbol pointed by iptr and optr is unmatched.

5. If 'a' is true, expand the non-terminal using the first rule and goto step 2

6. If 'b' is true, terminate with success

7. If 'c' is true, decrement both the pointers to the place of last non-terminal expansion and use the next production rule for non-terminal

8. If there is no more production and 'b' is not true report error.

Example:

Consider a grammar

S → cAd

A → ab | a

Input string is “cad”

Initially, iptr = optr = 0

Input                   Output                  Rules Fired

iptr(cad)               optr(S)                 Try S → cAd

iptr(cad)               optr(cAd)               Match c

c iptr(ad)              c optr(Ad)              Try A → ab

c iptr(ad)              c optr(abd)             Match a

ca iptr(d)              ca optr(bd)             Dead end, backtrack

c iptr(ad)              c optr(Ad)              Try A → a

c iptr(ad)              c optr(ad)              Match a

ca iptr(d)              ca optr(d)              Match d

cad iptr(eof)           cad optr(eof)

Parsing completed.
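The trace above can be reproduced with a small backtracking recognizer — a minimal Python sketch for this grammar (the position-returning style is an implementation choice, not part of the original answer):

```python
# A backtracking recursive-descent recognizer for the grammar above:
#   S -> c A d
#   A -> ab | a
# Each procedure returns the input position after a successful match,
# or None to signal failure (triggering a retry of the next rule).

def parse_A(s, i):
    if s[i:i+2] == 'ab':      # try A -> ab first
        return i + 2
    if s[i:i+1] == 'a':       # backtrack: try A -> a
        return i + 1
    return None

def parse_S(s, i=0):
    if s[i:i+1] != 'c':
        return None
    j = parse_A(s, i + 1)
    if j is None or s[j:j+1] != 'd':
        return None
    return j + 1

def accepts(s):
    return parse_S(s) == len(s)

print(accepts('cad'))     # True
print(accepts('cabd'))    # True
print(accepts('cd'))      # False
```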

 


5.  Consider the grammar


Calculate the canonical LR(0) items.

6 marks
Details
Official Answer

The augmented grammar of given grammar is,

    C‘→ C

    C → AB

    A → a

    B → b

Next, we obtain the canonical collection of sets of LR(0) items, as follows,

I0 = closure ({C‘ → •C}) = {C‘ → •C, C → •AB, A → •a}

goto(I0, C) = closure(C‘ → C•) = { C‘ → C•} = I1

goto(I0, A) = closure(C → A•B) = { C → A•B, B → •b} = I2

goto(I0, a) = closure(A → a•) = { A → a• } = I3

goto(I2, B) = closure(C → AB•) = { C → AB•} = I4

goto(I2, b) = closure(B → b•) = { B → b•} = I5

Now, the canonical collection of sets of LR(0) items for given grammar are:
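The closure and goto computations above can be mechanized — a minimal Python sketch for this grammar (the item and production representations are illustrative choices):

```python
# closure() and goto() over LR(0) items for the augmented grammar
#   C' -> C ,  C -> AB ,  A -> a ,  B -> b .
# An item is a pair (production, dot position); a production is a
# (head, body) tuple with the body written as a string.

GRAMMAR = [("C'", 'C'), ('C', 'AB'), ('A', 'a'), ('B', 'b')]
NONTERMINALS = {"C'", 'C', 'A', 'B'}

def closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for (head, body), dot in list(result):
            # if the dot is before a non-terminal, add its productions
            if dot < len(body) and body[dot] in NONTERMINALS:
                for prod in GRAMMAR:
                    if prod[0] == body[dot]:
                        item = (prod, 0)
                        if item not in result:
                            result.add(item)
                            changed = True
    return result

def goto(items, symbol):
    moved = {((h, b), d + 1) for (h, b), d in items
             if d < len(b) and b[d] == symbol}
    return closure(moved)

I0 = closure({(("C'", 'C'), 0)})
print(len(I0))              # 3 items: C'->.C, C->.AB, A->.a
print(len(goto(I0, 'A')))   # 2 items: C->A.B, B->.b
```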



5.  Construct the grammar:


Compute the complete LR(0) collection of item set from above grammar.

6 marks
Details
Official Answer

The augmented grammar of given grammar is;

    E’→E

    E→E+T | T

    T→T*F | F

    F→ (E) | id

Next, we obtain the canonical collection of sets of LR(0) items, as follows,

I0 = closure ({E‘ → •E})

    = {E‘ → •E, E → •E + T, E → •T, T → •T * F, T → •F, F → • (E), F → •id}

goto(I0, E) = closure({E‘ → E•, E → E •+ T}) = {E‘ → E•, E → E •+ T} = I1

goto(I0, T) = closure({E → T•, T → T •* F}) = { E → T•, T → T •* F } = I2

goto(I0, F) = closure({T → F•}) = {T → F•} = I3

goto(I0, ( ) = closure({F → (•E)}) = {F → (•E), E → •E + T, E → •T, T → •T * F, T → •F, F → • (E), F → •id} = I4

goto(I0, id ) = closure({F → id•}) = {F → id•} = I5

goto(I1, + ) = closure({E → E + •T}) = {E → E +• T, T → •T * F, T → •F, F → • (E), F → •id} = I6

goto(I2, * ) = closure({T → T * •F}) = {T → T * •F, F → • (E), F → •id} = I7

goto(I4, E) = closure({F → (E•), E → E•+ T}) = {F → (E•), E → E•+ T } = I8

goto(I4, T) = closure({E → T•, T → T• * F}) = {E → T• , T → T•*F}= I2

goto(I4, F) = closure({T → F•}) = {T → F•} = I3

goto(I4, () = closure({F → (•E)}) = I4

goto(I4, id) = closure({F → id•}) = I5

goto(I6, T) = closure({E → E +T•, T → T•* F}) = {E → E +T•, T → T•* F} = I9

goto(I6, F) = closure ({T→ F•}) = I3

goto (I6, () = closure ({F → (•E)}) = I4

goto(I6, id) = closure({F → id•}) = I5

goto(I7, F ) = closure({T → T * F•})= {T → T * F•} = I10

goto(I7, () = closure({F → (•E)}) = I4

goto(I7, id) = closure({F → id•}) = I5

goto(I8, ) ) = closure({F → (E) •}) = {F → (E) •} = I11

goto(I8, +) = closure({ E → E+•T} ) = I6

goto(I9, *) = closure({T → T*•F}) = { T → T*•F, F→ •(E), F→ •id } = I7

Now,

The canonical collection of set of LR(0) items for given grammar are:



5.  What is Syntax Directed Definition? Define synthesized and inherited attributes with example.

6 marks
Details
Official Answer

A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules. Attributes are associated with grammar symbols and rules are associated with productions. For example,

Production                  Semantic Rules

E → E1 + T                  E.val = E1.val + T.val

E → T                       E.val = T.val


Synthesized Attributes

The attribute of node that are derived from its children nodes are called synthesized attributes. Assume the following production:

    S → ABC

If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of ABC are synthesized to S.

Example for synthesized attribute:

Production                  Semantic Rules

E → E1 * E2                 {E.val = E1.val * E2.val}

E → E1 + E2                 {E.val = E1.val + E2.val}

E → int                     {E.val = int.val}

The Syntax Directed Definition associates to each non terminal a synthesized attribute called val.

In synthesized attributes, values flow from child to parent in the parse tree. For example, for the string 3*2+5:


Inherited Attributes

The attributes of node that are derived from its parent or sibling nodes are called inherited attributes. If the value of the attribute depends upon its parent or siblings then it is inherited attribute. As in the following production,

S → ABC

"A‟ can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take values from S, A, and B.

Example for Inherited Attributes:

Production                  Semantic Rules

D → TL                      L.in = T.type

T → int                     T.type = integer

T → real                    T.type = real

L → L1, id                  L1.in = L.in;
                            addtype(id.entry, L.in)

L → id                      addtype(id.entry, L.in)

Symbol T is associated with a synthesized attribute type and symbol L is associated with an inherited attribute in.

In inherited attributes, values flow into a node in the parse tree from its parent and/or siblings. For example, for the input: int id, id
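The flow for int id, id can be imitated in code — a minimal Python sketch of the D → TL rules above (the symbol table and type names are illustrative assumptions):

```python
# A small sketch of the inherited attribute L.in from the rules above,
# for a declaration like "int id, id": T synthesizes its type, and
# that type is passed down (inherited) to every id in the list L.

symbol_table = {}

def addtype(name, typ):
    symbol_table[name] = typ

def parse_declaration(tokens):
    # D -> T L : T.type is synthesized from the keyword,
    # then L.in = T.type is inherited by the id list.
    t_type = {'int': 'integer', 'real': 'real'}[tokens[0]]
    for name in tokens[1:]:          # L -> L1 , id | id
        addtype(name, t_type)        # addtype(id.entry, L.in)

parse_declaration(['int', 'x', 'y'])
print(symbol_table)    # {'x': 'integer', 'y': 'integer'}
```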


5.  Define Syntax directed definition. Construct annotated parse tree for the input expression (5*3+2)*5 according to the following syntax directed definition.

Production                  Semantic Rule

L → E n                     print(E.val)

E → E1 + T                  E.val = E1.val + T.val

E → T                       E.val = T.val

T → T1 * F                  T.val = T1.val * F.val

T → F                       T.val = F.val

F → (E)                     F.val = E.val

F → digit                   F.val = digit.lexval

6 marks
Details
Official Answer

A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules. Attributes are associated with grammar symbols and rules are associated with productions. For example,

Production                  Semantic Rules

E → E1 + T                  E.val = E1.val + T.val

E → T                       E.val = T.val


The annotated parse tree for the input expression (5*3+2)*5 using given syntax directed definition is shown below:
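The bottom-up evaluation the annotated tree performs can be checked with a small evaluator implementing the same semantic rules — a minimal Python sketch (the recursive-descent structure and return values standing in for the attribute val are implementation choices):

```python
# Evaluates an expression exactly as the annotated parse tree would:
# each parsing function's return value plays the role of the
# synthesized attribute `val` of the corresponding non-terminal.

def evaluate(s):
    pos = 0
    def peek():
        return s[pos] if pos < len(s) else None
    def expr():                 # E -> E1 + T | T
        nonlocal pos
        val = term()
        while peek() == '+':
            pos += 1
            val += term()       # E.val = E1.val + T.val
        return val
    def term():                 # T -> T1 * F | F
        nonlocal pos
        val = factor()
        while peek() == '*':
            pos += 1
            val *= factor()     # T.val = T1.val * F.val
        return val
    def factor():               # F -> (E) | digit
        nonlocal pos
        if peek() == '(':
            pos += 1
            val = expr()
            pos += 1            # skip ')'
            return val
        val = int(peek())       # F.val = digit.lexval
        pos += 1
        return val
    return expr()

print(evaluate('(5*3+2)*5'))    # 85
```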



5.  Construct LR(1) parse table for

             

6 marks
Details
Official Answer

The augmented grammar of given grammar is:

S' → S

S → XX

X → pX | q

Now, Canonical LR(1) item sets are:

I0 = closure({[S' → •S, $]}) = {[S' → •S, $], [S → •XX, $], [X → •pX, p/q], [X → •q, p/q]}

goto(I0, S) = closure({[S' → S•, $]}) = {[S' → S•, $]} = I1

goto(I0, X) = closure({[S → X•X, $]}) = {[S → X•X, $], [X → •pX, $], [X → •q, $]} = I2

goto(I0, p) = closure({[X → p•X, p/q]}) = {[X → p•X, p/q], [X → •pX, p/q], [X → •q, p/q]} = I3

goto(I0, q) = closure({[X → q•, p/q]}) = {[X → q•, p/q]} = I4

goto(I2, X) = closure({[S → XX•, $]}) = {[S → XX•, $]} = I5

goto(I2, p) = closure({[X → p•X, $]}) = {[X → p•X, $], [X → •pX, $], [X → •q, $]} = I6

goto(I2, q) = closure({[X → q•, $]}) = {[X → q•, $]} = I7

goto(I3, X) = closure({[X → pX•, p/q]}) = {[X → pX•, p/q]} = I8

goto(I3, p) = closure({[X → p•X, p/q]}) = I3

goto(I3, q) = closure({[X → q•, p/q]}) = I4

goto(I6, X) = closure({[X → pX•, $]}) = {[X → pX•, $]} = I9

goto(I6, p) = closure({[X → p•X, $]}) = I6

goto(I6, q) = closure({[X → q•, $]}) = I7

Now, the LR(1) parsing table is,

1. S → XX

2. X → pX

3. X → q

                    Action Table                    Goto Table
State           p           q           $           S           X

0               S3          S4                      1           2

1                                       accept

2               S6          S7                                  5

3               S3          S4                                  8

4               R3          R3

5                                       R1

6               S6          S7                                  9

7                                       R3

8               R2          R2

9                                       R2


5.  Consider the following grammar:

        

    a.  Show steps of shift-reduce parsing for the input string id+id*id.

    b.  Identify conflicts during the parsing.

6 marks
Details
Official Answer

Given grammar;

a)

Input string: id + id * id


b)

Here, a shift-reduce conflict is encountered during the parsing. At $E+T, the conflict arises over whether to reduce or to shift.


5.  What is semantic analysis? Why is it necessary to perform semantic analysis? Explain.

6 marks
Details
Official Answer

Semantic Analysis is the third phase of the compiler. It makes sure that the declarations and statements of the program are semantically correct. It is a collection of procedures called by the parser as and when required by the grammar. Both the syntax tree from the previous phase and the symbol table are used to check the consistency of the given code.

Semantic analysis is necessary due to the following reasons:

1. In this phase, the semantic analyzer checks the source program for semantic errors and collects the type information for the code generation. Semantic analyzer checks whether they form a sensible set of instructions in the programming language or not.

2. Semantic analyzer recognizes following types of errors:

  • Type mismatch
  • Undeclared variables
  • Reserved identifier misuse
  • Multiple declaration of variable in a scope.
  • Accessing an out of scope variable.
  • Actual and formal parameter mismatch.

3. Semantic analysis performs the following function:

  • Type Checking: Ensures that data types are used in a way consistent with their definition.
  • Label Checking: Ensures that every label referenced in the program is defined.
  • Flow Control Check: Keeps a check that control structures are used in a proper manner (example: no break statement outside a loop).

Example:

float x = 10.1;

float y = x*30;

In this example, the integer 30 will be type-cast to the float 30.0 before multiplication by the semantic analyzer.


6.  Write Syntax Directed Definition to carry out type checking for the following expression.

        E -> id | E1 op E2 | E1 relop E2 | E1[E2] | E1↑

6 marks
Details
Official Answer

E → id     { E.type = lookup(id.entry) }


E → E1 op E2     { if (E1.type = integer and E2.type = integer) then E.type = integer

        else if (E1.type = integer and E2.type = real) then E.type = real

        else if (E1.type = real and E2.type = integer) then E.type = real

        else if (E1.type = real and E2.type = real) then E.type = real

        else E.type = type-error }


E → E1 relop E2    { if E1.type = boolean and E2.type = boolean then E.type = boolean

                    else E.type = error}


E → E1 [E2]     { if (E2.type = int and E1.type = array(s, t)) then E.type = t

                   else E.type = type-error }


E → E1↑     { if E1.type = pointer(t) then E.type = t

                else type-error}
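The E → E1 op E2 rule above can be transcribed directly — a minimal Python sketch (the function and type names are illustrative assumptions):

```python
# A direct transcription of the E -> E1 op E2 typing rule above:
# two integers yield integer, any mix of integer and real yields
# real, and anything else is a type error.

def op_type(t1, t2):
    numeric = {'integer', 'real'}
    if t1 not in numeric or t2 not in numeric:
        return 'type-error'
    if t1 == 'integer' and t2 == 'integer':
        return 'integer'
    return 'real'            # at least one operand is real

print(op_type('integer', 'integer'))   # integer
print(op_type('integer', 'real'))      # real
print(op_type('real', 'boolean'))      # type-error
```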


6.  Show that the following grammar is not in a LL(1) grammar.

        

6 marks
Details
Official Answer

Given Grammar,

        

The FIRST and FOLLOW of the grammar symbols are:

FIRST(S) = {c}

FIRST(A) = {a}

FOLLOW(S) = {$}

FOLLOW(A) = {b, d}

Then, constructing the LL(1) parsing table:

Non-terminals       Terminal Symbols

                    a               b       c           d       $

S                                           S → cAd

A                   A → Ab
                    A → a

Since there are two entries for A on input a in the LL(1) parsing table, the given grammar is not an LL(1) grammar.


6.  Given the following grammar:

      

Construct the parsing table for this grammar for non-recursive predictive parser.
6 marks
Details
Official Answer

Given Grammar;

  

Now, Calculating First and Follow:

FIRST

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }

FIRST(E') = { +, ε }

FIRST(T') = { *, ε }

FOLLOW

FOLLOW(E) = { ), $ }

FOLLOW(E') = { ), $ }

FOLLOW(T) = { +, ), $ }

FOLLOW(T') = { +, ), $ }

FOLLOW(F) = { *, +, ), $ }

Now, the Parsing table for given grammar is as below:



6.  Describe the inherited and synthesized attributes of grammar using an example.

6 marks
Details
Official Answer

Synthesized Attributes

The attribute of node that are derived from its children nodes are called synthesized attributes. Assume the following production:

    S → ABC

If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of ABC are synthesized to S.

Example for synthesized attribute:

Production                  Semantic Rules

E → E1 * E2                 {E.val = E1.val * E2.val}

E → E1 + E2                 {E.val = E1.val + E2.val}

E → int                     {E.val = int.val}

The Syntax Directed Definition associates to each non terminal a synthesized attribute called val.

In synthesized attributes, values flow from child to parent in the parse tree. For example, for the string 3*2+5:


Inherited Attributes

The attributes of node that are derived from its parent or sibling nodes are called inherited attributes. If the value of the attribute depends upon its parent or siblings then it is inherited attribute. As in the following production,

S → ABC

"A‟ can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take values from S, A, and B.

Example for Inherited Attributes:

Production                  Semantic Rules

D → TL                      L.in = T.type

T → int                     T.type = integer

T → real                    T.type = real

L → L1, id                  L1.in = L.in;
                            addtype(id.entry, L.in)

L → id                      addtype(id.entry, L.in)

Symbol T is associated with a synthesized attribute type and symbol L is associated with an inherited attribute in.

In inherited attributes, values flow into a node in the parse tree from its parent and/or siblings. For example, for the input: int id, id


6.  Describe the L- attributed definitions. How L-attributed definitions are evaluated?

6 marks
Details
Official Answer

A syntax-directed definition that uses both synthesized and inherited attributes, with the restriction that an inherited attribute cannot take values from right siblings, is called an L-attributed definition.

Mathematically,

A syntax-directed definition is L-attributed if each inherited attribute of Xj on the right side of A → X1 X2 … Xn depends only on

  • The attributes of the symbols X1, X2, …, Xj-1
  • The inherited attributes of A

Consider a production: S → ABC

Here, S can take values from A, B, and C (synthesized). 'A' can take values from S only. B can take values from S and A. C can get values from S, A, and B.

L-attributed definitions allow a natural order of evaluating attributes: depth-first and left to right. Every S-attributed syntax-directed definition is also L-attributed, since the restriction above applies only to inherited attributes.

Procedure for Depth First Evaluation of L – Attributed Definitions:

Procedure dfvisit(n: node);     //depth-first evaluation

    for each child m of n, from left to right do

          evaluate inherited attributes of m;

          dfvisit(m);

    evaluate synthesized attributes of n

Example:

A → XY

X.i = A.i

Y.i = X.s

A.s = Y.s
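The depth-first evaluation of this example can be sketched in code — a minimal Python version of dfvisit for A → XY, where the leaf rule s = i + 1 is a stand-in assumption for real semantic actions:

```python
# A sketch of dfvisit for the production A -> X Y above: inherited
# attributes are computed before visiting a child, synthesized
# attributes after, in left-to-right (depth-first) order.

class Leaf:
    def __init__(self):
        self.i = None      # inherited attribute
        self.s = None      # synthesized attribute

def dfvisit_A(a_inherited):
    X, Y = Leaf(), Leaf()
    X.i = a_inherited      # X.i = A.i   (inherited from parent)
    X.s = X.i + 1          # evaluate X's synthesized attribute
    Y.i = X.s              # Y.i = X.s   (from the left sibling)
    Y.s = Y.i + 1
    return Y.s             # A.s = Y.s   (synthesized up to A)

print(dfvisit_A(0))        # 2
```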


6.  What are the main issues involved in designing lexical analyzer? Mention the various error recovery strategies for a lexical analyzer.

6 marks
Details
Official Answer

Lexical analysis is the process of producing tokens from the source program. It has the following issues:

    • Lookahead

    • Ambiguities

1. Lookahead:

Lookahead is required to decide where one token ends and the next token begins. Simple examples of lookahead issues are i vs. if and = vs. ==. Therefore a way to describe the lexemes of each token is required.

A way is needed to resolve ambiguities:

    • Is if the two variables i and f, or the keyword if?

    • Is == two separate = signs, or the single operator ==?

    • arr(5, 4) vs. fn(5, 4) in Ada (array reference syntax and function call syntax are similar).

Hence, the number of lookahead to be considered and a way to describe the lexemes of each token is also needed.

2. Ambiguities:

The lexical analysis programs written with lex accept ambiguous specifications and choose the longest match possible at each input point. Lex can handle ambiguous specifications. When more than one expression can match the current input, lex chooses as follows:

• The longest match is preferred.

• Among rules which matched the same number of characters, the rule given first is preferred.


Error recovery strategies for a lexical analyzer

A character sequence that cannot be scanned into any valid token is a lexical error. Misspelling of identifiers, keyword, or operators are considered as lexical errors. Usually, a lexical error is caused by the appearance of some illegal character, mostly at the beginning of a token.

The following are the error-recovery strategies in lexical analysis:
1) Panic Mode Recovery: Once an error is found, the successive characters are ignored until we reach a well-formed token like end or semicolon.

2) Deleting an extraneous character.

3) Inserting a missing character.

4) Replacing an incorrect character by a correct character.

5) Transposing two adjacent characters.


6.  How can syntax directed definition be used in type checking?

6 marks
Details
Official Answer

6.  Write Syntax Directed Definition to carry out type checking for the following expression.

                E -> id | E1 op E2 | E1 relop E2 | E1[E2] | E1↑

6 marks
Details
Official Answer

E → id     { E.type = lookup(id.entry) }


E → E1 op E2     { if (E1.type = integer and E2.type = integer) then E.type = integer

        else if (E1.type = integer and E2.type = real) then E.type = real

        else if (E1.type = real and E2.type = integer) then E.type = real

        else if (E1.type = real and E2.type = real) then E.type = real

        else E.type = type-error }


E → E1 relop E2    { if E1.type = boolean and E2.type = boolean then E.type = boolean

                    else E.type = error}


E → E1 [E2]     { if (E2.type = int and E1.type = array(s, t)) then E.type = t

                   else E.type = type-error }


E → E1↑     { if E1.type = pointer(t) then E.type = t

                else type-error}


6.  Differentiate between static and dynamic type checking. How can we carry out type checking for the following expression using syntax-directed definition?

        

6 marks
Details
Official Answer

The key difference between the static and dynamic type checking is that with static type checking, the type of variable is known at compile time (it checks the type of variable before running) while with dynamic type checking, the type of variable is known at runtime (it checks the type of variable while executing).

Type checking for the given expression using syntax-directed definition:


S → id = E     { if id.type = E.type then S.type = void else S.type = type_error }


S → if E then S1     { if E.type = boolean then S.type = S1.type else S.type = type_error }


S → while E do S1     { if E.type = boolean then S.type = S1.type else S.type = type_error }


S → S1 ; S2     { if S1.type = void and S2.type = void then S.type = void else S.type = type_error }


7.  What do you mean by kernel and non-kernel items? Compute the kernel items for LR(0) for the following grammar.

        

6 marks
Details
Official Answer

Kernel Items

The initial item S‘ → •S together with all other items whose dot (•) is not at the beginning of the R.H.S. are called kernel items. For example, for a production A → αBβ, the kernel items are

    A → α•Bβ, A → αB•β, A → αBβ•

Non-Kernel Items

All items with the dot (•) at the beginning of the R.H.S., except the first item S‘ → •S, are called non-kernel items. For example, A → •αBβ.


Now,

The augmented grammar of the given grammar is,

    S‘ → S

    S → CC

    C → bC | d

Next, we obtain the canonical collection of sets of LR(0) items, as follows,

I0 = closure ({S‘ → •S}) = {S‘ → •S, S → •CC, C → •bC,  C → •d}

goto (I0, S) = closure ({S‘ → S•}) = {S‘ → S•} = I1

goto (I0, C) = closure ({S → C•C}) = {S → C•C, C → •bC, C→ •d} = I2

goto (I0, b) = closure ({C→ b•C}) = {C → b•C, C → •bC, C → •d} = I3

goto (I0, d) = closure ({C → d•}) = {C → d•} = I4

goto (I2, C) = closure ({S → CC•}) = {S → CC•} = I5

goto (I2, b) = closure ({C → b•C}) = {C → b•C, C → •bC, C → •d} = I3

goto (I2, d) = closure ({C → d•}) = {C → d•} = I4

goto (I3, C) = closure ({C → bC•}) = {C → bC•} = I6

goto (I3, b) = closure ({C → b•C}) = {C→ b•C, C → •bC, C → •d} = I3

goto (I3, d) = closure ({C → d•}) = {C → d•} = I4

Now the Kernel items for LR(0) for the given grammar are;

States          Kernel Items

I0              S‘ → •S

I1              S‘ → S•

I2              S → C•C

I3              C → b•C

I4              C → d•

I5              S → CC•

I6              C → bC•


7.  Write the type expressions for the following types:

a.       An array of pointers to real’s, where the array index range from 1 to 100.

b.       Function whose domains are function from two characters and whose range is a pointer of integer.

6 marks
Details
Official Answer

a) The array index ranges from 1 to 100 and each element is a pointer to a real, so the type expression is:

    array(1..100, pointer(real))

b) The domain is a pair of characters and the range is a pointer to an integer, so the type expression is:

    (char × char) → pointer(integer)


7.  Write the type expressions for the following types:

               a. An array of pointers to real’s, where the array index range from 1 to 100.

   b. Function whose domains are function from two characters and whose range is a pointer of integer.

6 marks
Details
Official Answer

a) The array index ranges from 1 to 100 and each element is a pointer to a real, so the type expression is:

    array(1..100, pointer(real))

b) The domain is a pair of characters and the range is a pointer to an integer, so the type expression is:

    (char × char) → pointer(integer)


7.  Define a context free grammar. What are the component of context free grammar? Explain.

6 marks
Details
Official Answer

A Context Free Grammar (CFG) is defined by 4-tuples as G = (V, T, P, S) where

  • V is a finite set of variable symbols (Non-terminals)
  • T is a finite set of terminal symbols.
  • P is a set of production rules.
  • S is a start symbol, S ∈ V

For e.g.

    E → EAE | (E) | -E | id

    A → + | - | * | /

Here, E and A are non-terminals with E as start symbol and other symbols are terminals.


The components of CFG are:

  • A set of terminal symbols which are the characters that appear in the language/strings generated by the grammar. Terminal symbols never appear on the left-hand side of the production rule and are always on the right-hand side.
  • A set of non-terminal symbols (or variables) which are placeholders for patterns of terminal symbols that can be generated by the non-terminal symbols. These are the symbols that will always appear on the left-hand side of the production rules, though they can be included on the right-hand side. The strings that a CFG produces will contain only symbols from the set of terminal symbols.
  • A set of production rules which are the rules for replacing non-terminal symbols. Production rules have the following form: variable → string of variables and terminals.
  • A start symbol which is a special non-terminal symbol that appears in the initial string generated by the grammar.

7.  Define the term ‘item’, ‘closure’ and ‘goto’ used in LR parsing. Compute the SLR DFA for the following grammar.

        

6 marks
Details
Official Answer

Item

An “item” is a production rule that contains a dot (•) somewhere in the right side of the production. For example, A → •αAβ, A → α•Aβ, A → αA•β, and A → αAβ• are items if there is a production A → αAβ in the grammar.

Closure

If I is a set of items for a grammar G, then closure(I) is the set of LR(0) items constructed from I using the following rules:

 1. Initially, every item in I is added to closure(I).

 2. If A→ α.Bβ is in closure(I) and B →γ is a production, then add the item B → .γ to I if it is not already there.

Apply this rule until no more new items can be added to closure(I).

goto

If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I, X) is defined as follows:

    If A → α•Xβ in I then every item in closure({A → αX•β}) will be in goto(I, X).


Now,

The augmented grammar of the given grammar is,

    S‘ → S

    S → CC

    C → aC | b

Next, we obtain the canonical collection of sets of LR(0) items, as follows,

I0 = closure ({S‘ → •S}) = {S‘ → •S, S → •CC, C → •aC,  C → •b}

goto (I0, S) = closure ({S‘ → S•}) = {S‘ → S•} = I1

goto (I0, C) = closure ({S → C•C}) = {S → C•C, C → •aC, C→ •b} = I2

goto (I0, a) = closure ({C→ a•C}) = {C → a•C, C → •aC, C → •b} = I3

goto (I0, b) = closure ({C → b•}) = {C → b•} = I4

goto (I2, C) = closure ({S → CC•}) = {S → CC•} = I5

goto (I2, a) = closure ({C → a•C}) = {C → a•C, C → •aC, C → •b} = I3

goto (I2, b) = closure ({C → b•}) = {C → b•} = I4

goto (I3, C) = closure ({C → aC•}) = {C → aC•} = I6

goto (I3, a) = closure ({C → a•C}) = {C→ a•C, C → •aC, C → •b} = I3

goto (I3, b) = closure ({C → b•}) = {C → b•} = I4

Now, the SLR DFA of the given grammar is:



8.  What do you mean by S-attributed definition and how they are evaluated? Explain with example.

6 marks
Details
Official Answer

A syntax directed definition that uses synthesized attributes exclusively is said to be an S-attributed definition. A parse tree for S-attributed definition can always be annotated by evaluating the semantic rules for attributes at each node in bottom up manner. The evaluation of s-attributed definition is based on the Depth first Traversal (DFT) of the annotated tree.

Example for S-attributed definitions:

Let us consider the Grammar for arithmetic expressions:

Attributes in S-attributed definitions are evaluated in bottom-up parsing, as the values of the parent nodes depend upon the values of the child nodes.

Consider input string: 5+3*4 return

Depth first Traversal (DFT) of annotated graph:


8.  For the following grammar,

    E → E + E | E * E | (E) | id

    Annotate the grammar with syntax directed definitions using synthesized attributes. Remove left recursion from the grammar and re-write the annotation correspondingly.

6 marks
Details
Official Answer

First part:

E → E1 + E2  {E.val = E1.val + E2.val}

E → E1 * E2  {E.val = E1.val * E2.val}

E → (E1)  {E.val = E1.val}

E → id  {E.val = id.lexval}

Second part:

Removing left recursion as,

E → (E)R | id R

R → +ER | *ER | ε

Now add the attributes within this non-left recursive grammar as,

E → (E1) {R.in = E1.val} R {E.val = R.s}

E → id {R.in = id.lexval} R {E.val = R.s}

R → +E {R1.in = E.val+R.in}R1 {R.s = R1.s}

R → *E { R1.in = E.val*R.in }R1 { R.s = R1.s }

R → ε { R.s = R.in}
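As a sketch (not part of the original answer), the annotation above can be run directly as a recursive-descent evaluator: the inherited attribute R.in becomes a parameter and the synthesized attribute R.s becomes the return value. Single digits stand in for id.lexval. Note that the rule R1.in = E.val op R.in groups operands from the right, so 3+4*5 evaluates as 3+(4*5) = 23:

```python
# E -> (E)R | id R ;  R -> +ER | *ER | epsilon.
def parse_E(s, i):
    """Returns (E.val, next input position)."""
    if s[i] == "(":
        val, i = parse_E(s, i + 1)
        i += 1                        # consume ')'
    else:
        val, i = int(s[i]), i + 1     # id: R.in = id.lexval
    return parse_R(s, i, val)         # E.val = R.s

def parse_R(s, i, inh):
    """`inh` is the inherited attribute R.in; returns (R.s, position)."""
    if i < len(s) and s[i] in "+*":
        op = s[i]
        val, i = parse_E(s, i + 1)    # the E of R -> +E R1 / *E R1
        nxt = val + inh if op == "+" else val * inh  # R1.in = E.val op R.in
        return parse_R(s, i, nxt)     # R.s = R1.s
    return inh, i                     # R -> epsilon: R.s = R.in
```

For example, parse_E("(2+3)*4", 0) yields 20 in its first component.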


9.  Consider the following grammar for arithmetic expression using an operator ‘op’ to integer or real numbers.

    E → E1 op E2 | num.num | num | id

Give the syntax directed definitions to determine the type of expression as when two integers are used in expression, resulting type is integer otherwise real.
6 marks
Details
Official Answer

E → id { E.type = lookup(id.entry)}

E → num {E.type = integer }

E → num.num { E.type = real}

E → E1 op E2 { E.type = if(E1.type = integer and E2.type = integer)

                        then integer

                        else if (E1.type = integer and E2.type = real)

                        then real

                        else if (E1.type = real and E2.type = integer)

                        then real

                        else if (E1.type = real and E2.type = real)

                        then real

                        else type_error()

            }
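The rules above can be sketched as a recursive type checker (a sketch only: the tuple representation of expressions and the sample symbol table are assumptions, not part of the answer):

```python
# Expressions: ("id", name), ("num", n), ("num.num", x), ("op", e1, e2).
SYMTAB = {"x": "integer", "y": "real"}   # assumed sample entries

def expr_type(e):
    kind = e[0]
    if kind == "id":
        return SYMTAB[e[1]]              # lookup(id.entry)
    if kind == "num":
        return "integer"
    if kind == "num.num":
        return "real"
    t1, t2 = expr_type(e[1]), expr_type(e[2])   # E -> E1 op E2
    if t1 == "integer" and t2 == "integer":
        return "integer"                 # integer op integer -> integer
    if t1 in ("integer", "real") and t2 in ("integer", "real"):
        return "real"                    # any mix involving real -> real
    raise TypeError("type_error()")
```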


6. Given a regular expression (ε + 0)*10. Construct the DFA recognizing the pattern described by this regular expression using syntax tree based reduction.

5 marks
Details
Official Answer

Given RE;

(ε + 0)*10

The augmented RE of the given RE is:

(ε + 0)*10 #

Syntax tree for augmented R.E. is


Calculating followpos:

followpos(1) = {1, 2}

followpos(2) = {3}

followpos(3) = {4}

Now,

1) Start state of DFA = firstpos(root) = {1, 2} = S0

Use followpos of symbol representing position in RE to obtain the next state of DFA.

2) From S0 = {1, 2}

    1  represents '0'

    2 represents '1'

followpos(1) = {1, 2} = S0

     δ(s0, 0) = S0

followpos(2) = {3} = S1

    δ(S0, 1) = S1

3) From S1 = {3}

    Here 3 represents '0'

followpos(3) = {4} = S2

     δ(S1, 0) = S2

4) From S2 = {4}

    Here 4 represents '#'

No new states occur, so the construction is complete.

Therefore, Final state = {S2}

Now the resulting DFA of given regular expression is:
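As a cross-check (not part of the original answer), the transitions derived above can be written out as a table and simulated on sample strings of the language 0*10:

```python
# States: S0 = {1,2} (start), S1 = {3}, S2 = {4} (accepting).
DELTA = {("S0", "0"): "S0", ("S0", "1"): "S1", ("S1", "0"): "S2"}
START, ACCEPT = "S0", {"S2"}

def accepts(s):
    state = START
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:           # no transition: dead configuration
            return False
    return state in ACCEPT
```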


7. Define the terms token, pattern and lexeme. How input buffer can be used for scanner. Explain.

5 marks
Details
Official Answer

Token

Token is a sequence of characters that can be treated as a single logical entity. For example, Identifiers, Keywords, Operators, Constants etc.

Pattern

A set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token. For example, “a letter followed by zero or more letters, digits, or underscores.”

Lexeme

A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. 


Input Buffering

Lexical analysis may need to look ahead several characters before a match can be announced. Two input-buffering schemes are useful when lookahead is necessary:

  • Buffer Pair
  • Sentinels

Buffer Pair (2N Buffering):

In this technique the buffer is divided into two halves of N characters each, where N is the number of characters in a disk block, such as 1024 or 4096. Rather than reading character by character from the file, we read N input characters at once. If fewer than N characters remain in the input, an EOF marker is placed after them. There are two pointers: the lexeme pointer and the forward pointer.

The lexeme pointer points to the start of the lexeme while the forward pointer scans the input buffer for the end of the lexeme. When the forward pointer reaches the end of one half, the other half is loaded and the forward pointer moves to the beginning of that half.

Sentinels:

With the buffer-pair technique alone, each time the forward pointer is moved we must check whether it has crossed the end of a buffer half, and when it reaches the end of the buffer the other half must be loaded. This check can be reduced by placing a sentinel character at the end of each buffer half. The sentinel can be any character that is not part of the source program; EOF is usually preferred, as it also indicates the end of the source program. It signals the need for some special action (fill the other buffer half, or terminate processing).
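A minimal sketch of the scheme (assumptions: a toy half-size N = 4 for readability, where a real scanner uses a disk-block-sized half, and the NUL character standing in for the EOF sentinel, so the source must not contain NUL):

```python
EOF = "\0"
N = 4

def load_half(source, buf, start):
    # Fill one half from the source iterator; the sentinel slot at
    # buf[start + N] always holds EOF.
    for k in range(N):
        buf[start + k] = next(source, EOF)
    buf[start + N] = EOF

def scan(text):
    """Yield source characters, reloading halves at the sentinels."""
    source = iter(text)
    buf = [EOF] * (2 * N + 2)        # half 0: 0..N, half 1: N+1..2N+1
    load_half(source, buf, 0)
    fwd = 0                          # the forward pointer
    while True:
        ch = buf[fwd]
        if ch == EOF:
            if fwd == N:             # sentinel ending the first half
                load_half(source, buf, N + 1)
                fwd = N + 1
            elif fwd == 2 * N + 1:   # sentinel ending the second half
                load_half(source, buf, 0)
                fwd = 0
            else:
                return               # genuine end of input
        else:
            yield ch
            fwd += 1
```

The point of the sentinel is visible in the loop: the fast path tests only one condition (ch == EOF), and the half-boundary checks run only when a sentinel is actually hit.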


8. Find first and follow of all the non terminals in the following grammar.

 E → TA ; A → +TA | ε ; T → FB ; B → *FB | ε ; F → (E) | id

5 marks
Details
Official Answer

Given grammar;

Now, the FIRST and FOLLOW for the non-terminals are:

Non-terminal        FIRST           FOLLOW

E                   {(, id}         {), $}
A                   {+, ε}          {), $}
T                   {(, id}         {), +, $}
B                   {*, ε}          {), +, $}
F                   {(, id}         {*, +, ), $}
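The FIRST column above can be reproduced by fixed-point iteration (a sketch, not part of the original answer; "eps" marks the empty string, and FOLLOW is computed in a similar pass using these FIRST sets):

```python
EPS = "eps"
GRAMMAR = {
    "E": [["T", "A"]],
    "A": [["+", "T", "A"], [EPS]],
    "T": [["F", "B"]],
    "B": [["*", "F", "B"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                    # iterate until nothing new is added
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                for sym in prod:
                    if sym not in grammar:           # terminal or eps
                        add, nullable = {sym}, sym == EPS
                    else:
                        add = first[sym] - {EPS}
                        nullable = EPS in first[sym]
                    if not add <= first[nt]:
                        first[nt] |= add
                        changed = True
                    if not nullable:
                        break         # later symbols cannot contribute
                else:                 # every symbol in the body is nullable
                    if EPS not in first[nt]:
                        first[nt].add(EPS)
                        changed = True
    return first

FIRST = first_sets(GRAMMAR)
```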


9. What is Syntax Directed Definition? Define synthesized and inherited attributes with example.

5 marks
Details
Official Answer

A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules. Attributes are associated with grammar symbols and rules are associated with productions. For example,

Production          Semantic Rules

E → E1 + T          E.val = E1.val + T.val
E → T               E.val = T.val


Synthesized Attributes

The attributes of a node that are derived from its child nodes are called synthesized attributes. Assume the following production:

    S → ABC

If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of ABC are synthesized to S.

Example for synthesized attribute:

Production          Semantic Rules

E → E1 * E2         E.val = E1.val * E2.val
E → E1 + E2         E.val = E1.val + E2.val
E → int             E.val = int.val

The syntax-directed definition associates with each nonterminal a synthesized attribute called val.

In synthesized attributes, values flow from children to parent in the parse tree. For example, for the string 3*2+5:


Inherited Attributes

The attributes of a node that are derived from its parent or sibling nodes are called inherited attributes. If the value of an attribute depends upon its parent or siblings, it is an inherited attribute, as in the following production:

S → ABC

A can get values from S, B, and C; B can take values from S, A, and C; likewise, C can take values from S, A, and B.

Example for Inherited Attributes:

Production          Semantic Rules

D → TL              L.in = T.type
T → int             T.type = integer
T → real            T.type = real
L → L1, id          L1.in = L.in; addtype(id.entry, L.in)
L → id              addtype(id.entry, L.in)

Symbol T is associated with a synthesized attribute type and symbol L is associated with an inherited attribute in.

In inherited attributes, values flow into a node in the parse tree from its parent and/or siblings. For example, for input: int id, id


Unit 3: Unit 3
4 Questions

5. “Symbol table is a necessary component of compiler”, justify this statement with examples.

5 marks
Details
Official Answer

Symbol tables are data structures that are used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phase of a compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier such as its type, its position in storage, and any other relevant information.

Symbol tables typically need to support multiple declarations of the same identifier within a program. The lexical analyzer can create a symbol-table entry and return a token to the parser, say id, along with a pointer to the lexeme. The parser can then decide whether to use a previously created symbol-table entry or create a new one for the identifier.

The basic operations defined on a symbol table include

  • allocate – to allocate a new empty symbol table
  • free – to remove all entries and free the storage of a symbol table
  • insert – to insert a name in a symbol table and return a pointer to its entry
  • lookup – to search for a name and return a pointer to its entry
  • set_attribute – to associate an attribute with a given entry
  • get_attribute – to get an attribute associated with a given entry
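These operations can be sketched on a minimal hash-based table (the entry layout and attribute names here are illustrative assumptions; allocate and free correspond to constructing and discarding the object):

```python
class SymbolTable:
    def __init__(self):                    # allocate
        self.entries = {}

    def insert(self, name):
        # Create (or reuse) an entry and return a reference to it.
        return self.entries.setdefault(name, {"name": name})

    def lookup(self, name):
        return self.entries.get(name)      # None if not declared

    def set_attribute(self, name, key, value):
        self.entries[name][key] = value

    def get_attribute(self, name, key):
        return self.entries[name].get(key)

table = SymbolTable()
table.insert("count")
table.set_attribute("count", "type", "int")
```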

9.  What is activation record? Discuss the different activities performed by caller and callee during procedure call and return.

6 marks
Details
Official Answer

A program is a sequence of instructions combined into a number of procedures. Instructions in a procedure are executed sequentially. A procedure has a start and an end delimiter; the procedure identifier and the finite sequence of instructions between them make up the body of the procedure.

The execution of a procedure is called its activation. An activation record contains all the necessary information required to call a procedure. An activation record may contain the following units (depending upon the source language used).

Activation Record

In any function call, the routine that initiates the call is called caller and the routine that is being called is known as callee

Activities performed by caller and callee during procedure call and return are given below:

On the caller's side:

  • save registers for later use
    • stack push
    • caller-save registers
  • arrange for the procedure arguments to be found by the procedure
    • copy some arguments to registers
    • push additional arguments onto stack
  • call the procedure
    • save return address in register or on stack
    • jump to procedure starting address
  • later, the return value (if any) is in a pre-determined place, and control is transferred back to the return address
  • copy the return value
    • move to another register or store to memory (pop the stack if necessary)
  • throw away the procedure arguments
    • adjust stack pointer
  • restore saved registers
    • stack pop
  • continue

On the callee's side:

  • control is transferred from the caller to the starting address
    • return address is in register or on stack
    • arguments are in registers and on stack
  • save registers for later use
    • stack push
    • arguments and return address
    • caller's frame pointer
    • other callee-save registers
  • allocate storage for automatic local variables
    • set the frame pointer, adjust the stack pointer
    • assign initial values
  • allocate storage for temporaries
    • adjust the stack pointer
  • work
    • stack may be used for additional temporaries
    • push, use, pop
  • put return value in appropriate place
    • register or stack
  • throw away the local variables and temporaries
    • adjust the stack pointer
  • restore saved registers
    • stack pop
    • arguments and return address
    • caller's frame pointer
    • other callee-saved registers
  • jump through the return address register

10.  Discuss the importance of symbol table in compiler. How is it manipulated in the different phases of compilation?

6 marks
Details
Official Answer

Symbol table is an important data structure created and maintained by compilers in order to store information about the occurrence of various entities such as variable names, function names, objects, classes, interfaces, etc. Symbol table is used by both the analysis and the synthesis parts of a compiler.

A symbol table may serve the following purposes depending upon the language in hand:

  • To store the names of all entities in a structured form at one place.
  • To verify if a variable has been declared.
  • To implement type checking, by verifying assignments and expressions in the source code are semantically correct.
  • To determine the scope of a name (scope resolution).

It is used by various phases of compiler as follows:

  • Lexical Analysis: Creates new table entries in the table, example like entries about token.
  • Syntax Analysis: Adds information regarding attribute type, scope, dimension, line of reference, use, etc. in the table.
  • Semantic Analysis: Uses available information in the table to check for semantics i.e. to verify that expressions and assignments are semantically correct (type checking) and update it accordingly.
  • Intermediate Code generation: Refers symbol table for knowing how much and what type of run-time is allocated and table helps in adding temporary variable information.
  • Code Optimization: Uses information present in symbol table for machine dependent optimization.
  • Target Code generation: Generates code by using address information of identifier present in the table.

10. What do you mean by runtime storage allocation? Differentiate static and dynamic allocation.

5 marks
Details
Official Answer

Runtime storage allocation

  • The compiler allocates space only for global variables at compile time.
  • Space for the variables of procedures is allocated at run time:

  • Stack/heap allocation
  • Ex: C, C++, Java, Fortran 8/9
  • Variable access is slow (compared to static allocation) since addresses are accessed through the stack/heap pointer
  • Recursion can be implemented

Difference between static and dynamic allocation:

Static allocation: Compiler makes the decision regarding storage allocation by looking only at the program text

Dynamic allocation: Storage allocation decisions are made only while the program is running.


Unit 4: Unit 4
31 Questions

3. What is the role of intermediate code generation in the entire compilation process? Convert the following into three address code.

        a+(b-c)*d

10 marks
Details
Official Answer

Intermediate code is the stepping stone in translating source code into machine code: it lies between the high-level language and the machine language. The given program in a source language is converted into an equivalent program in an intermediate language by the intermediate code generator. Intermediate codes are machine-independent.


Roles of Intermediate code are :

  • It acts as a glue between front-end and backend (or source and machine codes).
  • If the compiler directly translates source code into the machine code without generating intermediate code then a full native compiler is required for each new machine.
  • The intermediate code keeps the analysis portion same for all the compilers that's why it doesn't need a full compiler for every unique machine.
  • Intermediate code generator receives input from its predecessor phase and semantic analyzer phase. It takes input in the form of an annotated syntax tree.
  • Using the intermediate code, the second phase of the compiler synthesis phase is changed according to the target machine.
  • Intermediate code generator lowers abstraction from source level.
  • It completes some syntactic checks and performs further semantic checks, e.g. that break appears only inside a loop or switch.

The intermediate code representation are

  • Graphical representation e.g. Abstract Syntax Tree(AST) , DAGS
  • Postfix notations
  • Three Address codes

Given expression:

a+(b-c)*d

Converting it into three address code:

    t1 = b-c

    t2 = t1*d

    t3 = a+t2

For the given expression, the quadruples are:

        op      arg1    arg2    result
(0)     -       b       c       t1
(1)     *       t1      d       t2
(2)     +       a       t2      t3


For the given expression, the triples are:

        op      arg1    arg2
(0)     -       b       c
(1)     *       (0)     d
(2)     +       a       (1)
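The three temporaries above arise from a post-order walk of the expression tree. A sketch (not part of the original answer; the tuple representation of the tree is an assumption) matching the answer's t1, t2, t3:

```python
def gen_tac(node, code, counter):
    """Returns the place holding `node`'s value; appends quadruples
    (op, arg1, arg2, result) to `code`."""
    if isinstance(node, str):
        return node                       # a name is its own place
    op, left, right = node
    l = gen_tac(left, code, counter)      # evaluate operands first
    r = gen_tac(right, code, counter)
    counter[0] += 1
    temp = "t%d" % counter[0]             # fresh temporary
    code.append((op, l, r, temp))
    return temp

code = []
gen_tac(("+", "a", ("*", ("-", "b", "c"), "d")), code, [0])
```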


5.  What are the different issues in the design of code generator? Explain with example about the optimization of basic blocks.

6 marks
Details
Official Answer

The issues to code generator design includes:

1. Input to the code generator: The input to the code generator is intermediate representation together with the information in the symbol table. Intermediate representation has the several choices: 

  • Postfix notation, 
  • Syntax tree or DAG,
  • Three address code

The code generation phase requires complete, error-free intermediate code as input.

2. Target Program: The target program is the output of the code generator. The output can be:

  • Assembly language: It allows subprogram to be separately compiled.
  • Relocatable machine language: It makes the process of code generation easier.
  • Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.

3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set. 

4. Instruction Selection: Efficient and low cost instruction selection is important to obtain efficient code. When we consider the efficiency of target machine then the instruction speed and machine idioms are important factors. The quality of the generated code can be determined by its speed and size.

5. Register Allocation: Proper utilization of registers improve code efficiency. Use of registers make the computations faster in comparison to that of memory, so efficient utilization of registers is important. The use of registers are subdivided into two subproblems:

  • During Register allocation, we select only those set of variables that will reside in the registers at each point in the program.
  • During a subsequent Register assignment phase, the specific register is picked to access the variable.

6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed. 


Basic block optimization techniques are:

1. Common-Subexpression Elimination: A common subexpression need not be computed over and over again. Instead, it is computed once and stored, and the stored value is reused whenever the expression is encountered again. For e.g.


2. Dead Code Elimination: Dead code is a variable or the result of some expression computed by the programmer that has no further use. Eliminating such useless code optimizes the block. For e.g.


3. Renaming Temporary Variables: Temporary variables that are dead at the end of a block can be safely renamed. The basic block is transformed into an equivalent block in which each statement that defines a temporary defines a new temporary. Such a basic block is called a normal-form block or simple block. For e.g.


4. Interchange of Statements: Independent statements can be reordered without affecting the value of the block. For e.g.


5. Algebraic Transformations: In an algebraic transformation, the set of expressions is changed into an algebraically equivalent set: expressions are simplified, or expensive operations are replaced by cheaper ones. For e.g.
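Technique 1 can be sketched over the quadruples of one basic block (a sketch, not part of the original answer; it assumes no operand is redefined between the first computation and the reuse):

```python
def eliminate_cse(block):
    """Local common-subexpression elimination on (op, arg1, arg2, result)
    quadruples: a repeated expression becomes a copy of the earlier result."""
    seen, out = {}, []                    # (op, arg1, arg2) -> result
    for op, a1, a2, res in block:
        key = (op, a1, a2)
        if key in seen:
            out.append(("copy", seen[key], None, res))   # reuse earlier value
        else:
            seen[key] = res
            out.append((op, a1, a2, res))
    return out

block = [("+", "b", "c", "t1"),
         ("*", "t1", "d", "t2"),
         ("+", "b", "c", "t3")]    # b+c is recomputed
optimized = eliminate_cse(block)
```

A production implementation would invalidate `seen` entries whenever one of their operands is redefined; this sketch keeps only the core replacement idea.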


5.  Explain the dynamic programming code generation algorithm with example.

6 marks
Details
Official Answer

The dynamic programming algorithm can be used to generate code for any machine with r interchangeable registers R0,R1, .. . , Rr-1 and load, store, and add instructions. For simplicity, we assume every instruction costs one unit, although the dynamic programming algorithm can easily be modified to work even if each instruction has its own cost.

The dynamic programming algorithm partitions the problem of generating optimal code for an expression into the subproblems of generating optimal code for the subexpressions of the given expression.


  • Contiguous evaluation: Complete the evaluation of T1, T2, then evaluate root.
  • Non-contiguous evaluation: First evaluate part of T1 leaving the value in a register, next evaluate T2, then return to evaluate the rest of T1.

Dynamic programming algorithm uses contiguous evaluation.

The dynamic programming algorithm proceeds in three phases (suppose the target machine has r registers):

1. Compute bottom-up for each node n of the expression tree T an array C of costs, in which the ith component C[i] is the optimal cost of computing the subtree S rooted at n into a register, assuming i registers are available for the computation, for 1 <= i <= r.

2. Traverse T, using the cost vectors to determine which subtrees of T must be computed into memory.

3. Traverse each tree using the cost vectors and associated instructions to generate the final target code. The code for the subtrees computed into memory locations is generated first.

Example: 

Consider a machine having two registers R0 and R1, and the following instructions, each of unit cost:

In these instructions, Ri is either R0 or R1, and Mi is a memory location. The operator op corresponds to an arithmetic operator.

Now, Computing cost vector at each node:

Final Output:



6.  What do you mean by code optimization? Explain the basic blocks and their optimization.

6 marks
Details
Official Answer

Code optimization is used to improve the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements in order to speed up program execution without wasting resources. It tries to improve the code by making it consume fewer resources (i.e. CPU, memory) and deliver higher speed.

A basic block is a sequence of consecutive instructions in which control enters at one entry point and exits at another point without any halt or branching except at the end.

Basic block optimization techniques are:

1. Common-Subexpression Elimination: A common subexpression need not be computed over and over again. Instead, it is computed once and stored, and the stored value is reused whenever the expression is encountered again. For e.g.


2. Dead Code Elimination: Dead code is a variable or the result of some expression computed by the programmer that has no further use. Eliminating such useless code optimizes the block. For e.g.


3. Renaming Temporary Variables: Temporary variables that are dead at the end of a block can be safely renamed. The basic block is transformed into an equivalent block in which each statement that defines a temporary defines a new temporary. Such a basic block is called a normal-form block or simple block. For e.g.


4. Interchange of Statements: Independent statements can be reordered without affecting the value of the block. For e.g.


5. Algebraic Transformations: In an algebraic transformation, the set of expressions is changed into an algebraically equivalent set: expressions are simplified, or expensive operations are replaced by cheaper ones. For e.g.


7.  Define three address codes. Write three address codes for

        S --> do m=n-p while a<=b

6 marks
Details
Official Answer

The address code that uses at most three addresses, two for operands and one for result is called three address code. Each instruction in three address code can be described as a 4-tuple: (operator, operand1, operand2, result).

A quadruple (Three address code) is of the form: x = y op z where x, y and z are names, constants or compiler-generated temporaries and op is any operator.

The following are the 3-address code statements for different kinds of statements.

  • Assignment statement:

            x = y op z ( binary operator op)

             x = op y ( unary operator op)

  • Copy statement: x = z 
  • Unconditional jump: goto L ( jump to label L)
  • Conditional jump: if x relop y goto L
  • Procedure call:

            param x1

            param x2

            . .

            ..

            param xn

            call p, n i.e. call procedure P(x1,x2,….. xn)

  • Indexed assignment:

            x = y[i]

            x[i] = y

  • Address and pointer assignments:

            x = &y

            x = *y

            *x = y


Given expression;

   S --> do m=n-p while a<=b

Three address code for given expression:

1. m = n-p

2. if a<b goto (1)

3. if a=b goto (1)

4. stop


7.  What is code optimization? What is its importance?

6 marks
Details
Official Answer

Code optimization is used to improve the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements in order to speed up program execution without wasting resources. It tries to improve the code by making it consume fewer resources (i.e. CPU, memory) and deliver higher speed. For e.g.

Intermediate code                Optimized intermediate code

t1 = 3.0;                       t2 = id3 * 3.0;
t2 = id3 * t1;                  id1 = id2 + t2;
t3 = id2 + t2;
id1 = t3;

A code optimizing process must follow the three rules given below:

  • The output code must not, in any way, change the meaning of the program.
  • Optimization should increase the speed of the program and, if possible, the program should demand fewer resources.
  • Optimization should itself be fast and should not delay the overall compiling process.

Code optimization is important because:

  • Code optimization enhances the portability of the compiler to the target processor.
  • Code optimization allows consumption of fewer resources (i.e. CPU, Memory).
  • Optimized code has faster execution speed.
  • Optimized code utilizes the memory efficiently.
  • Optimized code gives better performance.
  • Optimized code often promotes re-usability.

7.  What is the theme of code optimization? Why is code optimization important in computer?

6 marks
Details
Official Answer

Code optimization is used to improve the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements in order to speed up program execution without wasting resources. It tries to improve the code by making it consume fewer resources (i.e. CPU, memory) and deliver higher speed. For e.g.

Intermediate code                Optimized intermediate code

t1 = 3.0;                       t2 = id3 * 3.0;
t2 = id3 * t1;                  id1 = id2 + t2;
t3 = id2 + t2;
id1 = t3;

A code optimizing process must follow the three rules given below:

  • The output code must not, in any way, change the meaning of the program.
  • Optimization should increase the speed of the program and, if possible, the program should demand fewer resources.
  • Optimization should itself be fast and should not delay the overall compiling process.

Code optimization is important because:

  • Code optimization enhances the portability of the compiler to the target processor.
  • Code optimization allows consumption of fewer resources (i.e. CPU, Memory).
  • Optimized code has faster execution speed.
  • Optimized code utilizes the memory efficiently.
  • Optimized code gives better performance.
  • Optimized code often promotes re-usability.

7.  What are the generic issues in the design of code generators? Explain

6 marks
Details
Official Answer

The final phase in compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. Designing of code generator should be done in such a way so that it can be easily implemented, tested and maintained.

The issues to code generator design includes:

1. Input to the code generator: The input to the code generator is intermediate representation together with the information in the symbol table. Intermediate representation has the several choices: 

  • Postfix notation, 
  • Syntax tree or DAG,
  • Three address code

The code generation phase requires complete, error-free intermediate code as input.

2. Target Program: The target program is the output of the code generator. The output can be:

  • Assembly language: It allows subprogram to be separately compiled.
  • Relocatable machine language: It makes the process of code generation easier.
  • Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.

3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set. 

4. Instruction Selection: Efficient and low cost instruction selection is important to obtain efficient code. When we consider the efficiency of target machine then the instruction speed and machine idioms are important factors. The quality of the generated code can be determined by its speed and size.

5. Register Allocation: Proper utilization of registers improve code efficiency. Use of registers make the computations faster in comparison to that of memory, so efficient utilization of registers is important. The use of registers are subdivided into two subproblems:

  • During Register allocation, we select only those set of variables that will reside in the registers at each point in the program.
  • During a subsequent Register assignment phase, the specific register is picked to access the variable.

6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed. 


7.  Explain with example about different methods of intermediate code representation.

6 marks
Details
Official Answer

The given program in a source language is converted into an equivalent program in an intermediate language by the intermediate code generator. Intermediate codes are machine independent codes. There are three kinds of  intermediate representations:

  • Graphical representation (Syntax tree or DAGs)
  • Postfix notations
  • Three Address codes

Syntax Tree

A syntax tree is a graphical representation of the given source program; it is also called a variant of the parse tree. A tree in which each leaf represents an operand and each interior node represents an operator is called a syntax tree. For e.g. syntax tree for the expression a+b*c-d/(b*c)

Directed Acyclic Graph (DAG)

A DAG for an expression identifies the common subexpressions in the expression. It is similar to a syntax tree; the only difference is that a node in a DAG representing a common subexpression has more than one parent, whereas in a syntax tree the common subexpression would be represented as a duplicate subtree. For e.g. DAG for the expression a+b*c-d/(b*c)

Postfix Notation

The representation of an expression with operands followed by their operators is called the postfix notation of that expression. In general, if x and y are any two postfix expressions and OP is a binary operator, then the result of applying OP to x and y is written in postfix notation as "x y OP". For e.g. (a+b)*c in postfix notation is: a b + c *

Three Address Code

The address code that uses at most three addresses, two for operands and one for result is called three address code. Each instruction in three address code can be described as a 4-tuple: (operator, operand1, operand2, result).

A quadruple (Three address code) is of the form: x = y op z where x, y and z are names, constants or compiler-generated temporaries and op is any operator.

For e.g. the three-address code for a+b*c-d/(b*c) is:

    t1 = b * c
    t2 = a + t1
    t3 = b * c
    t4 = d / t3
    t5 = t2 - t4



8.  Explain different types of loop optimization techniques.

6 marks
Details
Official Answer

Loop Optimization is the process of increasing execution speed and reducing the overheads associated with loops. Loop Optimization is a machine independent optimization.

Loop Optimization techniques are given below:

1. Frequency Reduction (Code Motion): In frequency reduction, the amount of code in loop is decreased. A statement or expression, which can be moved outside the loop body without affecting the semantics of the program, is moved outside the loop. For e.g.

Before optimization:

while (i < 100)
{
    a = sin(x) / cos(x) + i;
    i++;
}

After optimization:

t = sin(x) / cos(x);
while (i < 100)
{
    a = t + i;
    i++;
}


2. Loop Unrolling: In this technique the number of jumps and tests is reduced by writing the loop body two (or more) times per iteration, without changing the meaning of the code. For e.g.

Before optimization:

int i = 1;
while (i < 100)
{
    a[i] = b[i];
    i++;
}

 

After optimization:

int i = 1;
while (i < 100)
{
    a[i] = b[i];
    i++;
    a[i] = b[i];
    i++;
}

The first loop performs the test and jump on every iteration, whereas the second performs them only half as often. Hence, optimization is done.
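The transformation can be checked by running both versions and comparing results. The sketch below uses hypothetical helpers copy_rolled and copy_unrolled, with the index starting at 0 and an even trip count so the unrolled body never overruns the array:

```c
#define N 100

/* Rolled: one assignment per loop test. */
void copy_rolled(int *a, const int *b) {
    int i = 0;
    while (i < N) {
        a[i] = b[i];
        i++;
    }
}

/* Unrolled by two: the loop test and jump execute N/2 times
   instead of N (assumes N is even). */
void copy_unrolled(int *a, const int *b) {
    int i = 0;
    while (i < N) {
        a[i] = b[i];
        i++;
        a[i] = b[i];
        i++;
    }
}
```

Both functions leave the destination array identical; only the number of tests and jumps differs.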

3. Loop Fusion/Loop Jamming: This technique combines the bodies of two loops whenever the same index variable and number of iterations are shared. For e.g.

Before optimization:

for (i = 0; i <= 10; i++)
{
    printf("Jayanta");
}
for (i = 0; i <= 10; i++)
{
    printf("Poudel");
}

After optimization:

for (i = 0; i <= 10; i++)
{
    printf("Jayanta");
    printf("Poudel");
}

 

 


4. Reduction in Strength: Certain operators are costlier (stronger) than others; for example, the strength of * is higher than that of +. The compiler usually takes more time to execute higher-strength operators, so execution speed is lower. Replacing a higher-strength operator by an equivalent lower-strength operator is called strength reduction.

Optimization can be done by applying strength reduction technique where higher strength can be replaced by lower strength operators. For e.g.

Before optimization:

for (i = 1; i <= 10; i++)
{
    sum = i * 7;
    printf("%d", sum);
}

After optimization:

temp = 0;
for (i = 1; i <= 10; i++)
{
    temp = temp + 7;    /* temp now holds i * 7 */
    sum = temp;
    printf("%d", sum);
}
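The equivalence can be verified by accumulating the same values both ways; note that the running temporary must start at 0 so that it equals i * 7 at the top of each iteration. A minimal sketch (function names are illustrative):

```c
/* Original form: the body recomputes sum = i * 7 with a multiplication. */
int sum_multiply(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i * 7;
    return total;
}

/* Strength-reduced form: the multiplication becomes a running addition. */
int sum_reduced(int n) {
    int total = 0;
    int temp = 0;
    for (int i = 1; i <= n; i++) {
        temp = temp + 7;   /* temp == i * 7 */
        total += temp;
    }
    return total;
}
```

Both functions return the same totals; the reduced form trades each multiplication for an addition.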


8.  What do you mean by intermediate code? Explain the role of intermediate code in compiler design.

6 marks
Details
Official Answer

Intermediate code is the representation a compiler uses on the way from source code to machine code; it lies between the high-level language and the machine language. The given program in a source language is converted into an equivalent program in an intermediate language by the intermediate code generator. Intermediate codes are machine independent codes.


Roles of intermediate code are:

  • It acts as glue between the front end and back end (or between source and machine code).
  • If the compiler translated source code directly into machine code without generating intermediate code, a full native compiler would be required for each new machine.
  • The intermediate code keeps the analysis portion the same for all compilers, so a full compiler is not needed for every unique machine.
  • The intermediate code generator receives its input from the semantic analyzer phase, in the form of an annotated syntax tree.
  • Using the intermediate code, the synthesis phase of the compiler is adapted to the target machine.
  • The intermediate code generator lowers abstraction from the source level.
  • It completes some syntactic checks and performs further semantic checks, e.g. that break appears only inside a loop or switch.

The intermediate code representations are:

  • Graphical representation e.g. Abstract Syntax Tree(AST) , DAGS
  • Postfix notations
  • Three Address codes

8.  What do you mean by three address code? Write the syntax directed definition for following grammar to produce the three address codes for assignments

    S -> id = E

    E -> id

6 marks
Details
Official Answer

The address code that uses at most three addresses, two for operands and one for result is called three address code. Each instruction in three address code can be described as a 4-tuple: (operator, operand1, operand2, result).

A quadruple (Three address code) is of the form: x = y op z where x, y and z are names, constants or compiler-generated temporaries and op is any operator.

For e.g. 

The three-address code for a+b*c-d/(b*c) is:

    t1 = b * c
    t2 = a + t1
    t3 = b * c
    t4 = d / t3
    t5 = t2 - t4
Given grammar;

    S -> id = E

    E -> id

Syntax directed definition for given grammar to produce the three address codes is:

Production        Semantic Rules
S → id = E        S.code = E.code || gen(id.place '=' E.place)
E → id            E.place = id.entry; E.code = null


8.  What are the various issues of code generator? Explain the benefits of intermediate code generation.

6 marks
Details
Official Answer

The issues to code generator design includes:

1. Input to the code generator: The input to the code generator is intermediate representation together with the information in the symbol table. Intermediate representation has the several choices: 

  • Postfix notation, 
  • Syntax tree or DAG,
  • Three address code

The code generation phase requires complete, error-free intermediate code as input.

2. Target Program: The target program is the output of the code generator. The output can be:

  • Assembly language: It allows subprogram to be separately compiled.
  • Relocatable machine language: It makes the process of code generation easier.
  • Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.

3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set. 

4. Instruction Selection: Efficient and low cost instruction selection is important to obtain efficient code. When we consider the efficiency of target machine then the instruction speed and machine idioms are important factors. The quality of the generated code can be determined by its speed and size.

5. Register Allocation: Proper utilization of registers improves code efficiency. Computations using registers are faster than those using memory, so efficient utilization of registers is important. The use of registers is subdivided into two subproblems:

  • During register allocation, we select the set of variables that will reside in registers at each point in the program.
  • During a subsequent register assignment phase, the specific register that each variable resides in is picked.

6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed. 


The given program in a source language is converted into an equivalent program in an intermediate language by the intermediate code generator. Intermediate codes are machine independent codes.

Benefits of intermediate code generation:

  • If a compiler translates the source language to its target machine language without having the option for generating intermediate code, then for each new machine, a full native compiler is required.
  • Intermediate code eliminates the need of a new full compiler for every unique machine by keeping the analysis portion same for all the compilers.
  • It becomes easier to apply the source code modifications to improve code performance by applying code optimization techniques on the intermediate code.

8.  Explain about peephole optimization with example.

6 marks
Details
Official Answer

Peephole optimization is a simple and effective technique for locally improving target code. It is applied to improve the performance of the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions with a shorter or faster sequence whenever possible. The peephole is a small, moving window on the target program.

Peephole Optimization Techniques are:

1. Redundant load and store elimination: In this technique the redundancy is eliminated. For e.g.

  Initial code:

    y = x + 5;
    i = y;
    z = i;
    w = z * 3;

  Optimized code:

    y = x + 5;
    i = y;
    w = y * 3;


2. Constant folding: Expressions whose operands are all constants are evaluated at compile time and replaced by their values. For e.g.

   Initial code:

        x = 2 * 3;

   Optimized code:

        x = 6;


3.  Strength Reduction: The operators that consume higher execution time are replaced by the operators consuming less execution time. For e.g.

   Initial code:

        y = x * 2;

   Optimized code:

        y = x + x; 


4. Algebraic Simplification: Peephole optimization is an effective technique for algebraic simplification. Statements such as x = x + 0 or x = x * 1 can be eliminated by peephole optimization.

 

5. Machine idioms: The target machine may have specialized instructions for certain operations; replacing a general instruction sequence by such a machine idiom improves efficiency. For e.g., some machines have auto-increment or auto-decrement addressing modes that perform an add or subtract as part of a memory access.
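The constant-folding step in the list above can be sketched as a small function that, given a binary instruction whose operands are both known constants, returns the folded value. The Instr struct is illustrative, not a real compiler's IR:

```c
/* One binary instruction whose operands are already-known constants. */
typedef struct {
    char op;         /* '+', '-' or '*' */
    int  arg1, arg2; /* constant operands */
} Instr;

/* A constant-folding peephole step: compute the instruction's value at
   compile time so the instruction itself can be replaced by a constant. */
int fold(Instr in) {
    switch (in.op) {
        case '+': return in.arg1 + in.arg2;
        case '-': return in.arg1 - in.arg2;
        case '*': return in.arg1 * in.arg2;
        default:  return 0;  /* unsupported op in this sketch */
    }
}
```

In a real pass the folded value would overwrite the instruction; here the function just returns it.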


8.  Defıne code optimization. Discuss about any three code optimization techniques with example.

6 marks
Details
Official Answer

Code optimization is used to improve the intermediate code so that the output of the program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources. It tries to improve the code by making it consume fewer resources (i.e. CPU, memory) and deliver higher speed.

Some code optimization techniques are:

1. Common-Subexpression Elimination: A common sub-expression need not be computed over and over again. Instead, it is computed once and stored, and the stored value is referenced when the expression is encountered again. For e.g.


2. Dead Code Elimination: The dead code may be a variable or the result of some expression computed by the programmer that may not have any further uses. By eliminating these useless things from a code, the code will get optimized. For e.g.

3. Algebraic Transformations: In the algebraic transformation, we can change the set of expression into an algebraically equivalent set. Simplify expression or replace expensive expressions by cheaper ones. For e.g.


8.  What is the purpose of code optimization? Explain different types of loop optimization techniques with example.

6 marks
Details
Official Answer

Code optimization is used to improve the intermediate code so that the output of the program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources. It tries to improve the code by making it consume fewer resources (i.e. CPU, memory) and deliver higher speed.

Loop Optimization techniques are given below:

1. Frequency Reduction (Code Motion): In frequency reduction, the amount of code in loop is decreased. A statement or expression, which can be moved outside the loop body without affecting the semantics of the program, is moved outside the loop. For e.g.

Before optimization:

while (i < 100)
{
    a = sin(x) / cos(x) + i;
    i++;
}

After optimization:

t = sin(x) / cos(x);
while (i < 100)
{
    a = t + i;
    i++;
}


2. Loop Unrolling: In this technique the number of jumps and tests is reduced by writing the loop body two (or more) times per iteration, without changing the meaning of the code. For e.g.

Before optimization:

int i = 1;
while (i < 100)
{
    a[i] = b[i];
    i++;
}

 

After optimization:

int i = 1;
while (i < 100)
{
    a[i] = b[i];
    i++;
    a[i] = b[i];
    i++;
}

The first loop performs the test and jump on every iteration, whereas the second performs them only half as often. Hence, optimization is done.

3. Loop Fusion/Loop Jamming: This technique combines the bodies of two loops whenever the same index variable and number of iterations are shared. For e.g.

Before optimization:

for (i = 0; i <= 10; i++)
{
    printf("Jayanta");
}
for (i = 0; i <= 10; i++)
{
    printf("Poudel");
}

After optimization:

for (i = 0; i <= 10; i++)
{
    printf("Jayanta");
    printf("Poudel");
}

 

 


4. Reduction in Strength: Certain operators are costlier (stronger) than others; for example, the strength of * is higher than that of +. The compiler usually takes more time to execute higher-strength operators, so execution speed is lower. Replacing a higher-strength operator by an equivalent lower-strength operator is called strength reduction.

Optimization can be done by applying strength reduction technique where higher strength can be replaced by lower strength operators. For e.g.

Before optimization:

for (i = 1; i <= 10; i++)
{
    sum = i * 7;
    printf("%d", sum);
}

After optimization:

temp = 0;
for (i = 1; i <= 10; i++)
{
    temp = temp + 7;    /* temp now holds i * 7 */
    sum = temp;
    printf("%d", sum);
}


9.  Explain the principle sources of code optimization with example.

6 marks
Details
Official Answer

Code optimization is used to improve the intermediate code so that the output of the program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources. It tries to improve the code by making it consume fewer resources (i.e. CPU, memory) and deliver higher speed.

A transformation of a program is called local if it can be performed by looking only at the statements in a basic block; otherwise, it is called global. Many transformations can be performed at both the local and global levels. Local transformations are usually performed first.

There are a number of ways in which a compiler can improve a program without changing the function its computes.

Principal sources of code optimization are:

1. Common-Subexpression Elimination: A common sub-expression need not be computed over and over again. Instead, it is computed once and stored, and the stored value is referenced when the expression is encountered again. For e.g.

2. Copy Propagation: Assignments of the form f := g are called copy statements, or copies for short. The idea behind the copy-propagation transformation is to use g for f wherever possible after the copy statement f := g. Copy propagation means use of one variable instead of another. This may not appear to be an improvement, but as we shall see it gives us an opportunity to eliminate x. For e.g.

    x = Pi;
    A = x * r * r;

The optimization using copy propagation can be done as follows: A = Pi * r * r;

Here the variable x is eliminated.


3. Dead Code Elimination: The dead code may be a variable or the result of some expression computed by the programmer that may not have any further uses. By eliminating these useless things from a code, the code will get optimized. For e.g.

4. Constant folding: Deducing at compile time that the value of an expression is a constant and using the constant instead is known as constant folding. For e.g.

   Initial code:

        x = 2 * 3;

   Optimized code:

        x = 6;

5. Loop Optimizations: In loops, especially in the inner loops, programs tend to spend the bulk of their time. The running time of a program may be improved if the number of instructions in an inner loop is decreased, even if we increase the amount of code outside that loop. Some loop optimization techniques are:

    i) Frequency Reduction (Code Motion): In frequency reduction, the amount of code in loop is decreased. A statement or expression, which can be moved outside the loop body without affecting the semantics of the program, is moved outside the loop. For e.g.

Before optimization:

while (i < 100)
{
    a = sin(x) / cos(x) + i;
    i++;
}

After optimization:

t = sin(x) / cos(x);
while (i < 100)
{
    a = t + i;
    i++;
}



    ii) Induction-variable elimination, which removes induction variables that have become redundant from inner loops.

    iii) Reduction in Strength: Certain operators are costlier (stronger) than others; for example, the strength of * is higher than that of +. The compiler usually takes more time to execute higher-strength operators, so execution speed is lower. Replacing a higher-strength operator by an equivalent lower-strength operator is called strength reduction.

Optimization can be done by applying strength reduction technique where higher strength can be replaced by lower strength operators. For e.g.

Before optimization:

for (i = 1; i <= 10; i++)
{
    sum = i * 7;
    printf("%d", sum);
}

After optimization:

temp = 0;
for (i = 1; i <= 10; i++)
{
    temp = temp + 7;    /* temp now holds i * 7 */
    sum = temp;
    printf("%d", sum);
}


9.  Define three address codes. Write three address code for

                        a = -b*(c+d)

6 marks
Details
Official Answer

The address code that uses at most three addresses, two for operands and one for result is called three address code. Each instruction in three address code can be described as a 4-tuple: (operator, operand1, operand2, result).

A quadruple (Three address code) is of the form: x = y op z where x, y and z are names, constants or compiler-generated temporaries and op is any operator.

Followings are the 3-address code statements for different statements.

  • Assignment statement:

            x = y op z ( binary operator op)

             x = op y ( unary operator op)

  • Copy statement: x = z 
  • Unconditional jump: goto L ( jump to label L)
  • Conditional jump: if x relop y goto L
  • Procedure call:

            param x1
            param x2
            ...
            param xn
            call p, n      (i.e. call procedure P(x1, x2, ..., xn))

  • Indexed assignment:

            x = y[i]

            x[i] = y

  • Address and pointer assignments:

            x = &y

            x = *y

            *x = y


Given expression 

a = -b*(c+d)

The three address code of given expression is given below:

    t1 = -b
    t2 = c + d
    t3 = t1 * t2
    a = t3
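Executing the four statements above directly confirms that the sequence computes -b*(c+d); each C variable plays the role of one temporary (eval_tac is an illustrative name):

```c
/* The three-address sequence for a = -b*(c+d), executed step by step.
   Each temporary holds exactly one intermediate value. */
int eval_tac(int b, int c, int d) {
    int t1 = -b;        /* t1 = -b      */
    int t2 = c + d;     /* t2 = c + d   */
    int t3 = t1 * t2;   /* t3 = t1 * t2 */
    int a  = t3;        /* a  = t3      */
    return a;
}
```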


9.  What do you mean by three-address code representation? Explain with example.

6 marks
Details
Official Answer

Three address code is a type of intermediate code which is easy to generate and can be easily converted to machine code. It makes use of at most three addresses and one operator to represent an expression, and the value computed at each instruction is stored in a temporary variable generated by the compiler.

There are 3 representations of three address code:

1. Quadruples:
It is a structure consisting of four fields: op, arg1, arg2 and result. op denotes the operator, arg1 and arg2 denote the two operands, and result stores the result of the expression. The three-address statement x = y op z is represented by placing y in arg1, z in arg2 and x in result. For e.g. for the statement a = b * -c + b * -c, the quadruples are:

    op      arg1    arg2    result
    uminus  c               t1
    *       b       t1      t2
    uminus  c               t3
    *       b       t3      t4
    +       t2      t4      t5
    =       t5              a

2. Triples:

This representation doesn’t make use of an extra temporary variable to represent a single operation; instead, when a reference to another triple’s value is needed, a pointer to that triple is used. So it consists of only three fields: op, arg1 and arg2. For e.g. for the statement a = b * -c + b * -c, the triples are:

        op      arg1    arg2
    (0) uminus  c
    (1) *       b       (0)
    (2) uminus  c
    (3) *       b       (2)
    (4) +       (1)     (3)
    (5) =       a       (4)

3. Indirect triples:
This representation makes use of pointers to a listing of all the triples, which is made and stored separately. It lists pointers to triples rather than the triples themselves, so statements can be reordered during optimization simply by reordering the pointer list.
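The quadruples for a = b * -c + b * -c can also be written down as plain data. A sketch in C (field and temporary names are illustrative, not a real compiler's IR):

```c
#include <string.h>

/* One quadruple: operator, up to two operands, and a result field.
   Empty strings stand for unused operand slots. */
typedef struct {
    const char *op, *arg1, *arg2, *result;
} Quad;

/* Quadruple sequence for a = b * -c + b * -c (uminus = unary minus). */
static const Quad quads[] = {
    { "uminus", "c",  "",   "t1" },
    { "*",      "b",  "t1", "t2" },
    { "uminus", "c",  "",   "t3" },
    { "*",      "b",  "t3", "t4" },
    { "+",      "t2", "t4", "t5" },
    { "=",      "t5", "",   "a"  },
};

static const int n_quads = sizeof quads / sizeof quads[0];
```

Because every instruction names its own result, reordering or deleting quadruples never invalidates references, unlike triples.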



9.  Discuss about different factors affecting the process of target code generation.

6 marks
Details
Official Answer

The final phase in compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. Designing of code generator should be done in such a way so that it can be easily implemented, tested and maintained.

The different factors affecting target code generation includes:

1. Input to the code generator: The input to the code generator is intermediate representation together with the information in the symbol table. Intermediate representation has the several choices: 

  • Postfix notation, 
  • Syntax tree or DAG,
  • Three address code

The code generation phase requires complete, error-free intermediate code as input.

2. Target Program: The target program is the output of the code generator. The output can be:

  • Assembly language: It allows subprogram to be separately compiled.
  • Relocatable machine language: It makes the process of code generation easier.
  • Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.

3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set. 

4. Instruction Selection: Efficient and low cost instruction selection is important to obtain efficient code. When we consider the efficiency of target machine then the instruction speed and machine idioms are important factors. The quality of the generated code can be determined by its speed and size.

5. Register Allocation: Proper utilization of registers improves code efficiency. Computations using registers are faster than those using memory, so efficient utilization of registers is important. The use of registers is subdivided into two subproblems:

  • During register allocation, we select the set of variables that will reside in registers at each point in the program.
  • During a subsequent register assignment phase, the specific register that each variable resides in is picked.

6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed. 


9.  What is operation of simple code generator? Explain.

6 marks
Details
Official Answer

The final phase in compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program.

        Fig: Position of code generator

A simple code generator generates target code for a sequence of three-address statements and effectively uses registers to store operands of the statements. For example, consider the three-address statement a := b + c. It can have the following sequences of code:

    ADD Rj, Ri     Cost = 1    // if Ri contains b and Rj contains c
(or)
    ADD c, Ri      Cost = 2    // if c is in a memory location
(or)
    MOV c, Rj      Cost = 3    // move c from memory to Rj, then add
    ADD Rj, Ri


9.  Discuss the issues in design of simple code generator.

6 marks
Details
Official Answer

The final phase in compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. Designing of code generator should be done in such a way so that it can be easily implemented, tested and maintained.

The issues to code generator design includes:

1. Input to the code generator: The input to the code generator is intermediate representation together with the information in the symbol table. Intermediate representation has the several choices: 

  • Postfix notation, 
  • Syntax tree or DAG,
  • Three address code

The code generation phase requires complete, error-free intermediate code as input.

2. Target Program: The target program is the output of the code generator. The output can be:

  • Assembly language: It allows subprogram to be separately compiled.
  • Relocatable machine language: It makes the process of code generation easier.
  • Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.

3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set. 

4. Instruction Selection: Efficient and low cost instruction selection is important to obtain efficient code. When we consider the efficiency of target machine then the instruction speed and machine idioms are important factors. The quality of the generated code can be determined by its speed and size.

5. Register Allocation: Proper utilization of registers improves code efficiency. Computations using registers are faster than those using memory, so efficient utilization of registers is important. The use of registers is subdivided into two subproblems:

  • During register allocation, we select the set of variables that will reside in registers at each point in the program.
  • During a subsequent register assignment phase, the specific register that each variable resides in is picked.

6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed. 


9.  What are the advantages of intermediate code? Describe various representation of intermediate code.

6 marks
Details
Official Answer

The given program in a source language is converted into an equivalent program in an intermediate language by the intermediate code generator. Intermediate codes are machine independent codes.

Advantages of intermediate code:

  • If a compiler translates the source language to its target machine language without having the option for generating intermediate code, then for each new machine, a full native compiler is required.
  • Intermediate code eliminates the need of a new full compiler for every unique machine by keeping the analysis portion same for all the compilers.
  • It becomes easier to apply the source code modifications to improve code performance by applying code optimization techniques on the intermediate code.

There are three kinds of  intermediate code representations:

  • Graphical representation (Syntax tree or DAGs)
  • Postfix notations
  • Three Address codes

Syntax Tree

Syntax tree is a graphic representation of given source program and it is also called variant of parse tree. A tree in which each leaf represents an operand and each interior node represents an operator is called syntax tree. For e.g. Syntax tree for the expression a+b*c-d/(b*c)

Directed Acyclic Graph (DAG)

A DAG for an expression identifies the common sub expressions in the expression. It is similar to syntax tree, only difference is that a node in a DAG representing a common sub expression has more than one parent, but in syntax tree the common sub expression would be represented as a duplicate sub tree. For e.g. DAG for the expression a+b*c-d/(b*c)

Postfix Notation

The representation of an expression with operators written after their operands is called the postfix notation of that expression. In general, if x and y are any two postfix expressions and OP is a binary operator, then the result of applying OP to x and y is written in postfix notation as "x y OP". For e.g. (a + b) * c in postfix notation is: a b + c *

Three Address Code

The address code that uses at most three addresses, two for operands and one for result is called three address code. Each instruction in three address code can be described as a 4-tuple: (operator, operand1, operand2, result).

A quadruple (Three address code) is of the form: x = y op z where x, y and z are names, constants or compiler-generated temporaries and op is any operator.

For e.g. the three-address code for a+b*c-d/(b*c) is:

    t1 = b * c
    t2 = a + t1
    t3 = b * c
    t4 = d / t3
    t5 = t2 - t4


9.  Explain the peephole optimization. Write a three address code for the expression r:= 7*3+9.

6 marks
Details
Official Answer

Peephole optimization is a simple and effective technique for locally improving target code. It is applied to improve the performance of the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions with a shorter or faster sequence whenever possible. The peephole is a small, moving window on the target program.

Peephole Optimization Techniques are:

1. Redundant load and store elimination: In this technique the redundancy is eliminated. For e.g.

  Initial code:

    y = x + 5;
    i = y;
    z = i;
    w = z * 3;

  Optimized code:

    y = x + 5;
    i = y;
    w = y * 3;


2. Constant folding: Expressions whose operands are all constants are evaluated at compile time and replaced by their values. For e.g.

   Initial code:

        x = 2 * 3;

   Optimized code:

        x = 6;


3.  Strength Reduction: The operators that consume higher execution time are replaced by the operators consuming less execution time. For e.g.

   Initial code:

        y = x * 2;

   Optimized code:

        y = x + x; 


4. Algebraic Simplification: Peephole optimization is an effective technique for algebraic simplification. Statements such as x = x + 0 or x = x * 1 can be eliminated by peephole optimization.

 

5. Machine idioms: The target machine may have specialized instructions for certain operations; replacing a general instruction sequence by such a machine idiom improves efficiency. For e.g., some machines have auto-increment or auto-decrement addressing modes that perform an add or subtract as part of a memory access.


Given expression

r:= 7*3+9

Three address code for given expression:

    t1 = 7 * 3
    t2 = t1 + 9
    r = t2


10.  Discuss about the different factors affecting target code generation.

6 marks
Details
Official Answer

The final phase in compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. Designing of code generator should be done in such a way so that it can be easily implemented, tested and maintained.

The different factors affecting target code generation includes:

1. Input to the code generator: The input to the code generator is intermediate representation together with the information in the symbol table. Intermediate representation has the several choices: 

  • Postfix notation, 
  • Syntax tree or DAG,
  • Three address code

The code generation phase requires complete, error-free intermediate code as input.

2. Target Program: The target program is the output of the code generator. The output can be:

  • Assembly language: It allows subprogram to be separately compiled.
  • Relocatable machine language: It makes the process of code generation easier.
  • Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.

3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set. 

4. Instruction Selection: Efficient and low cost instruction selection is important to obtain efficient code. When we consider the efficiency of target machine then the instruction speed and machine idioms are important factors. The quality of the generated code can be determined by its speed and size.

5. Register Allocation: Proper utilization of registers improves code efficiency. Computations on values held in registers are faster than those involving memory, so efficient utilization of registers is important. The use of registers is subdivided into two subproblems:

  • During Register allocation, we select only those set of variables that will reside in the registers at each point in the program.
  • During a subsequent Register assignment phase, the specific register is picked to access the variable.

6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed. 
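To make the input/output relationship above concrete, a toy code generator might map three-address statements to two-address assembly as sketched below; the mnemonics and the naive single-register strategy are illustrative assumptions, not a real code generator:

```python
def gen(tac):
    """Map three-address statements (x, op, y, z) to illustrative assembly."""
    ops = {"+": "ADD", "-": "SUB", "*": "MUL"}
    code = []
    for x, op, y, z in tac:
        code.append(f"MOV R0, {y}")        # load the first operand into a register
        code.append(f"{ops[op]} R0, {z}")  # apply the operator to the second operand
        code.append(f"MOV {x}, R0")        # store the result
    return code

# Three-address input: t1 = b * c ; a = t1 + d
for line in gen([("t1", "*", "b", "c"), ("a", "+", "t1", "d")]):
    print(line)
```

Even this sketch shows why register allocation and evaluation order matter: it reloads R0 for every statement, where a smarter generator would notice t1 is already in R0.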


10.  Why optimization is often required in the code generated by simple code generator? Explain the unreachable code optimization.

6 marks
Details
Official Answer

Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources, making the code consume fewer resources (CPU, memory) while delivering high speed.

Optimization is required in the code generated by a simple code generator for the following reasons:

  • Code optimization enhances the portability of the compiler to the target processor.
  • Code optimization allows consumption of fewer resources (i.e. CPU, Memory).
  • Optimized code has faster execution speed.
  • Optimized code utilizes the memory efficiently.
  • Optimized code gives better performance.
  • Optimized code often promotes re-usability.

Unreachable code optimization

Unreachable code is a part of the source program that will never be executed because of the program's control flow (for example, statements placed after an unconditional jump or return). Eliminating such code optimizes the program.

For example, an unlabeled instruction immediately following an unconditional jump can be removed.
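A minimal sketch of this elimination, assuming instructions are represented as (label, text) pairs with label None when the instruction is unlabeled:

```python
def remove_unreachable(block):
    """Drop unlabeled instructions that follow an unconditional jump."""
    out, reachable = [], True
    for label, instr in block:
        if label is not None:          # a label makes the instruction reachable again
            reachable = True
        if reachable:
            out.append((label, instr))
            if instr.startswith("goto"):
                reachable = False      # nothing after an unconditional jump falls through
    return out

code = [(None, "goto L1"), (None, "x = x + 1"), ("L1", "return x")]
print(remove_unreachable(code))
# the unlabeled "x = x + 1" after the goto is removed
```

A full implementation would work on a control-flow graph and also treat return and halt as ending fall-through.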


10.  What is three address code? Translate the expression a = b * - c + b * - c in to three address code statement.

6 marks
Details
Official Answer

Three-address code is an intermediate representation in which each instruction uses at most three addresses: two for operands and one for the result. Each instruction can be described as a 4-tuple: (operator, operand1, operand2, result).

A typical three-address statement (quadruple) has the form x = y op z, where x, y and z are names, constants or compiler-generated temporaries and op is any operator.

The following are the three-address code forms for different kinds of statements.

  • Assignment statement:

            x = y op z (binary operator op)

            x = op y (unary operator op)

  • Copy statement: x = z 
  • Unconditional jump: goto L ( jump to label L)
  • Conditional jump: if x relop y goto L
  • Procedure call:

            param x1

            param x2

            ...

            param xn

            call p, n   (i.e., call procedure p(x1, x2, ..., xn))

  • Indexed assignment:

            x = y[i]

            x[i] = y

  • Address and pointer assignments:

            x = &y

            x = *y

            *x = y


Given expression 

a = b * - c + b * - c

The three-address code for the given statement is:

t1 = -c

t2 = b * t1

t3 = -c

t4 = b * t3

t5 = t2 + t4

a = t5
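The expansion above can be reproduced by a postorder walk over an expression tree, allocating one fresh temporary per operator node; the nested-tuple encoding of the tree is an illustrative assumption:

```python
from itertools import count

def gen_tac(expr):
    """Emit three-address code for a tuple-encoded expression tree."""
    code, temps = [], count(1)

    def walk(node):
        if isinstance(node, str):      # leaf: a name such as b or c
            return node
        op, *kids = node               # interior node: operator plus children
        args = [walk(k) for k in kids]
        t = f"t{next(temps)}"          # fresh temporary for this node
        rhs = (f"{args[0]} {op} {args[1]}" if len(args) == 2
               else f"{op}{args[0]}")  # unary operator such as minus
        code.append(f"{t} = {rhs}")
        return t

    code.append(f"a = {walk(expr)}")   # final copy into the target name
    return code

# a = b * -c + b * -c, with the two b * -c subexpressions not shared
expr = ("+", ("*", "b", ("-", "c")), ("*", "b", ("-", "c")))
print("\n".join(gen_tac(expr)))
```

Because the two occurrences of b * -c are separate subtrees, the walk emits them twice (t1..t2 and t3..t4), exactly as in the listing above; sharing them is the job of common-subexpression elimination or a DAG representation.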


10.  How next-use information is useful in code-generation? Explain the steps involved on computing next-use information.

6 marks
Details
Official Answer

Next-use information is needed for dead-code elimination and register allocation. It is computed by a backward scan of a basic block, and it indicates, for a variable defined at the current position, the statement number at which that variable will be used again.

The following are the steps involved in computing next-use information:

Input: Basic block B of three-address statements.

Output: At each statement i: x = y op z, we attach to i the liveness and next-use information of x, y and z.

Method: We start at the last statement of B and scan backwards.

    1. Attach to statement i the information currently found in the symbol table regarding the next use and liveness of x, y and z.

    2. In the symbol table, set x to "not live" and "no next use".

    3. In the symbol table, set y and z to "live", and the next uses of y and z to i.

  

Example: live and nextuse computation for the block

i: a := b + c

j: t := a + b

Initially (from the symbol table), every name is live with no next use: live(a) = live(b) = live(c) = live(t) = true, nextuse = none for all.

Scanning backward, at statement j (t := a + b):

  • a and b are used here, so they are live and their next use is at j: live(a) = true, nextuse(a) = j; live(b) = true, nextuse(b) = j.
  • t is computed here, so live(t) = false and nextuse(t) = none.

Then at statement i (a := b + c):

  • a is defined here, so live(a) = false and nextuse(a) = none.
  • b and c are used here, so they are live and their next use is at i: live(b) = true, nextuse(b) = i; live(c) = true, nextuse(c) = i.
  • t's liveness and next use remain as computed at statement j.
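The three steps above can be sketched as a backward scan in Python; representing each statement as a triple (x, y, z) standing for x = y op z is an illustrative simplification:

```python
def next_use(block, names):
    """Backward scan: attach (live, next_use) info to each statement."""
    table = {n: (True, None) for n in names}   # at block exit: live, no next use
    info = [None] * len(block)
    for i in reversed(range(len(block))):
        x, y, z = block[i]                     # statement i: x = y op z
        info[i] = {n: table[n] for n in (x, y, z)}  # step 1: attach current info
        table[x] = (False, None)               # step 2: x is defined here
        table[y] = (True, i)                   # step 3: y and z are used here
        table[z] = (True, i)
    return info

block = [("a", "b", "c"), ("t", "a", "b")]     # i = 0: a = b + c, j = 1: t = a + b
for i, entry in enumerate(next_use(block, "abct")):
    print(i, entry)
```

Running this on the two-statement block reproduces the example: at statement 0, b and c are live with next use 0, while a carries next use 1 from the later statement.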


10.  Define the following optimization techniques:

a.       Unreachable code elimination

b.       Flow-of-control optimization

6 marks
Details
Official Answer

a.       Unreachable code elimination

Unreachable code is a part of the source program that will never be executed because of the program's control flow (for example, statements placed after an unconditional jump or return). Eliminating such code optimizes the program.

b.       Flow-of-control optimization

Flow-of-control optimization uses peephole optimization to eliminate unnecessary jumps, in either the intermediate code or the target code. In particular, a jump whose target is itself an unconditional jump can be retargeted. For example, the sequence

goto L1

. . . . . . .

L1: goto L2

can be replaced by the sequence

goto L2

. . . . . . .

L1: goto L2

If there are no other jumps to L1, it may then be possible to eliminate the statement L1: goto L2 as well, provided it is preceded by an unconditional jump (an unlabeled instruction immediately following an unconditional jump can never be reached, so it can be removed).
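A minimal sketch of this jump-to-jump collapsing, assuming labels are kept in a separate map from each label to the instruction it tags:

```python
def collapse_jumps(labels, code):
    """Retarget jumps whose destination is itself an unconditional jump."""
    def final_target(label, seen=()):
        instr = labels.get(label, "")
        # Follow goto chains, guarding against cycles with `seen`.
        if instr.startswith("goto ") and label not in seen:
            return final_target(instr.split()[1], seen + (label,))
        return label

    return [f"goto {final_target(i.split()[1])}" if i.startswith("goto ") else i
            for i in code]

labels = {"L1": "goto L2", "L2": "x = 1"}
print(collapse_jumps(labels, ["goto L1", "y = 2"]))
# the jump through L1 is retargeted straight to L2
```

Removing the now-bypassed "L1: goto L2" itself would be a second pass that first checks no remaining jump targets L1.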


11. Why is it necessary to optimize code? Explain any two code optimization techniques with example.

5 marks
Details
Official Answer

Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources, making the code consume fewer resources (CPU, memory) while delivering high speed.

Optimization is necessary in the code generated by a simple code generator for the following reasons:

  • Code optimization enhances the portability of the compiler to the target processor.
  • Code optimization allows consumption of fewer resources (i.e. CPU, Memory).
  • Optimized code has faster execution speed.
  • Optimized code utilizes the memory efficiently.
  • Optimized code gives better performance.
  • Optimized code often promotes re-usability.

Code optimization techniques are:

1. Common-Subexpression Elimination: A common subexpression need not be computed over and over again. Instead, it is computed once and stored, and the stored value is reused wherever the same expression is encountered again. For example, in a = b * c + b * c the product b * c can be computed once.


2. Dead Code Elimination: Dead code is a variable or the result of some expression computed by the programmer that has no further use. Eliminating it optimizes the code; for example, an assignment to a variable that is never read afterwards can be deleted.
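A minimal sketch of local common-subexpression elimination on three-address quadruples; for brevity it ignores operand redefinition, which a real pass must handle by invalidating the affected expressions:

```python
def cse(block):
    """Reuse the temporary already holding (op, y, z) instead of recomputing."""
    available, out = {}, []
    for x, op, y, z in block:
        key = (op, y, z)
        if key in available:
            # Expression already computed: turn this statement into a copy.
            out.append((x, "copy", available[key], None))
        else:
            available[key] = x         # remember which name holds this value
            out.append((x, op, y, z))
    return out

block = [("t1", "*", "b", "c"), ("t2", "+", "t1", "d"), ("t3", "*", "b", "c")]
print(cse(block))
# t3 recomputes b * c, so it becomes a copy of t1
```

Copy propagation and dead-code elimination would then remove the introduced copy when t3 is replaceable by t1 everywhere.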


12. Explain about the factors affecting code generation.

5 marks
Details
Official Answer

The final phase in the compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. The code generator should be designed so that it is easy to implement, test, and maintain.

The different factors affecting target code generation includes:

1. Input to the code generator: The input to the code generator is the intermediate representation together with the information in the symbol table. The intermediate representation can take several forms:

  • Postfix notation, 
  • Syntax tree or DAG,
  • Three address code

The code generation phase requires complete and error-free intermediate code as its input.

2. Target Program: The target program is the output of the code generator. The output can be:

  • Assembly language: It allows subprogram to be separately compiled.
  • Relocatable machine language: It makes the process of code generation easier.
  • Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.

3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set. 

4. Instruction Selection: Efficient, low-cost instruction selection is important for obtaining efficient code. When considering the efficiency of the target machine, instruction speed and machine idioms are important factors. The quality of the generated code is determined by its speed and size.

5. Register Allocation: Proper utilization of registers improves code efficiency. Computations on values held in registers are faster than those involving memory, so efficient utilization of registers is important. The use of registers is subdivided into two subproblems:

  • During Register allocation, we select only those set of variables that will reside in the registers at each point in the program.
  • During a subsequent Register assignment phase, the specific register is picked to access the variable.

6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed. 
