Compiler Design and Construction Model Question

Tribhuvan University
Institute of Science and Technology
Model Question
Bachelor Level / Sixth Semester / Science
Computer Science and Information Technology ( CSC365 )
( Compiler Design and Construction )
Full Marks: 60
Pass Marks: 24
Time: 3 hours
Candidates are required to give their answers in their own words as far as practicable.
The figures in the margin indicate full marks.

Section A

Attempt any TWO questions. (2 × 10 = 20)

1. Differentiate between top-down and bottom-up parsing methods. Construct SLR parse table for the following grammar.

    S → aETe
    E → Ebc | b
    T → d

10 marks

Difference between top-down and bottom-up parsing methods:

  • Top-down parsing starts at the highest level of the parse tree (the start symbol) and works down to the leaves using the rules of the grammar, whereas bottom-up parsing starts at the lowest level (the leaves) and works up to the start symbol.
  • Top-down parsing attempts to find a leftmost derivation for the input string, whereas bottom-up parsing attempts to reduce the input string to the start symbol of the grammar.
  • Top-down parsing traces out a leftmost derivation; bottom-up parsing traces out a rightmost derivation in reverse.
  • In top-down parsing, the main decision is which production rule to use in order to construct the string; in bottom-up parsing, it is when to apply a production rule to reduce the string toward the start symbol.
  • A top-down parser can be easily structured and built by hand; producing a bottom-up parser by hand is difficult.
  • Top-down parsing is less powerful (it handles a smaller class of grammars); bottom-up parsing is more powerful.
  • Error detection is easier in top-down parsing; it is more difficult in bottom-up parsing.



Now, 

Given grammar:

    1. S → aETe

    2. E → Ebc

    3. E → b

    4. T → d

The augmented grammar of the given grammar is:

0. S' → S

1. S → aETe

2. E → Ebc

3. E → b

4. T → d

Next, we obtain the canonical collection of sets of LR(0) items, as follows:

I0 = closure({S' → •S}) = {S' → •S, S → •aETe}

goto(I0, S) = {S' → S•} = I1

goto(I0, a) = closure({S → a•ETe}) = {S → a•ETe, E → •Ebc, E → •b} = I2

goto(I2, E) = closure({S → aE•Te, E → E•bc}) = {S → aE•Te, E → E•bc, T → •d} = I3

goto(I2, b) = closure({E → b•}) = {E → b•} = I4

goto(I3, d) = closure({T → d•}) = {T → d•} = I5

goto(I3, b) = closure({E → Eb•c}) = {E → Eb•c} = I6

goto(I3, T) = closure({S → aET•e}) = {S → aET•e} = I7

goto(I6, c) = closure({E → Ebc•}) = {E → Ebc•} = I8

goto(I7, e) = closure({S → aETe•}) = {S → aETe•} = I9

FOLLOW set for non-terminals are,

FOLLOW(S) = {$}

FOLLOW(E) = {b, d}

FOLLOW(T) = {e}

Now, the SLR parsing table is:

                      Action Table                    Goto Table
State |   a  |   b  |   c  |   d  |   e  |   $    ||  S |  E |  T
------+------+------+------+------+------+--------++----+----+----
  0   |  S2  |      |      |      |      |        ||  1 |    |
  1   |      |      |      |      |      | Accept ||    |    |
  2   |      |  S4  |      |      |      |        ||    |  3 |
  3   |      |  S6  |      |  S5  |      |        ||    |    |  7
  4   |      |  R3  |      |  R3  |      |        ||    |    |
  5   |      |      |      |      |  R4  |        ||    |    |
  6   |      |      |  S8  |      |      |        ||    |    |
  7   |      |      |      |      |  S9  |        ||    |    |
  8   |      |  R2  |      |  R2  |      |        ||    |    |
  9   |      |      |      |      |      |   R1   ||    |    |

(Blank entries denote errors.)
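As a check on the table, a minimal table-driven LR parser loop can run it. The dictionaries below hard-code the Action and Goto entries from the table above; the driver itself is a generic sketch, not tied to any particular tool.

```python
# Table-driven LR parsing with the SLR table above.
# 's' = shift, 'r' = reduce, 'acc' = accept.

ACTION = {
    (0, 'a'): ('s', 2),
    (1, '$'): ('acc', None),
    (2, 'b'): ('s', 4),
    (3, 'b'): ('s', 6), (3, 'd'): ('s', 5),
    (4, 'b'): ('r', 3), (4, 'd'): ('r', 3),
    (5, 'e'): ('r', 4),
    (6, 'c'): ('s', 8),
    (7, 'e'): ('s', 9),
    (8, 'b'): ('r', 2), (8, 'd'): ('r', 2),
    (9, '$'): ('r', 1),
}
GOTO = {(0, 'S'): 1, (2, 'E'): 3, (3, 'T'): 7}
# Productions numbered as in the question: 1: S -> aETe, 2: E -> Ebc, 3: E -> b, 4: T -> d
PRODS = {1: ('S', 4), 2: ('E', 3), 3: ('E', 1), 4: ('T', 1)}

def slr_parse(tokens):
    stack = [0]                       # the state stack
    tokens = list(tokens) + ['$']
    i = 0
    while True:
        state, tok = stack[-1], tokens[i]
        act = ACTION.get((state, tok))
        if act is None:
            return False              # blank entry: syntax error
        kind, arg = act
        if kind == 's':               # shift: push the next state, advance input
            stack.append(arg)
            i += 1
        elif kind == 'r':             # reduce: pop |rhs| states, then goto on lhs
            lhs, rhs_len = PRODS[arg]
            del stack[-rhs_len:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:                         # accept
            return True

print(slr_parse("abde"))    # a b d e -> True
print(slr_parse("abbcde"))  # a b b c d e -> True (uses E -> Ebc)
print(slr_parse("ade"))     # -> False
```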

2. What are static and dynamic type checking? Write SDD to carry out type checking for the following expression.

E → id | E1 op E2 | E1 relop E2 | E1[E2] | E1↑

10 marks

In static type checking, the type of variable is known at compile time (it checks the type of variable before running). Typical examples of static checking are:

    Type checks: Report an error if an operator is applied to an incompatible operand.

    Flow-of-control checks: Statements that cause flow of control to leave a construct must have somewhere to transfer it; for example, a break statement must be enclosed within a loop or switch.

    Uniqueness checks: There are situations where an object must be defined exactly once.

    Name-related checks: Sometimes the same name must appear two or more times, and the compiler must check that the occurrences match.

In dynamic type checking, the type of a variable is known at runtime (types are checked while the program executes). The compiler generates verification code to enforce the programming language's dynamic semantics. If a language's compiler can guarantee that the programs it accepts will run without type errors, the language is called strongly typed.


Now,

SDD to carry out type checking for the given expression is given below:

E → id     { E.type = lookup(id.entry) }


E → E1 op E2     { if (E1.type = integer and E2.type = integer) then E.type = integer

        else if (E1.type = integer and E2.type = real) then E.type = real

        else if (E1.type = real and E2.type = integer) then E.type = real

        else if (E1.type = real and E2.type = real) then E.type = real

        else E.type = type-error }


E → E1 relop E2    { if (E1.type = integer and E2.type = integer) then E.type = boolean

                    else E.type = type-error }


E → E1 [E2]     { if (E2.type = int and E1.type = array(s, t)) then E.type = t

                   else E.type = type-error }


E → E1↑     { if E1.type = pointer(t) then E.type = t

                else E.type = type-error }
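A sketch of how rules of this flavor drive a recursive type checker over a tuple-shaped AST. The type representations, the node tags, and the sample symbol table below are illustrative assumptions, not part of the question; the relop rule here takes integer operands, one common textbook choice.

```python
# Recursive type checking over ASTs like ('op', e1, e2); identifiers are strings.
# Structured types: ('array', size, elem_type) and ('pointer', target_type).

SYMTAB = {'i': 'integer', 'x': 'real',
          'arr': ('array', 10, 'integer'), 'p': ('pointer', 'real')}

def typeof(e):
    if isinstance(e, str):                       # E -> id
        return SYMTAB.get(e, 'type-error')
    tag = e[0]
    if tag == 'op':                              # E -> E1 op E2
        t1, t2 = typeof(e[1]), typeof(e[2])
        if t1 == t2 == 'integer':
            return 'integer'
        if {t1, t2} <= {'integer', 'real'}:      # mixed int/real widens to real
            return 'real'
        return 'type-error'
    if tag == 'relop':                           # E -> E1 relop E2
        t1, t2 = typeof(e[1]), typeof(e[2])
        return 'boolean' if t1 == t2 == 'integer' else 'type-error'
    if tag == 'index':                           # E -> E1[E2]
        t1, t2 = typeof(e[1]), typeof(e[2])
        if t2 == 'integer' and isinstance(t1, tuple) and t1[0] == 'array':
            return t1[2]                         # element type t of array(s, t)
        return 'type-error'
    if tag == 'deref':                           # E -> E1 ^
        t1 = typeof(e[1])
        if isinstance(t1, tuple) and t1[0] == 'pointer':
            return t1[1]                         # target type t of pointer(t)
        return 'type-error'
    return 'type-error'

print(typeof(('op', 'i', 'x')))          # integer op real -> real
print(typeof(('index', 'arr', 'i')))     # arr[i] -> integer
print(typeof(('deref', 'p')))            # p^ -> real
print(typeof(('relop', 'i', 'x')))       # mixed comparison -> type-error
```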

3. What is the role of intermediate code generation in the entire compilation process? Convert the following into three address code.

        a+(b-c)*d

10 marks

Intermediate code is used to translate the source code into the machine code. Intermediate code lies between the high-level language and the machine language. The given program in a source language is converted into an equivalent program in an intermediate language by the intermediate code generator. Intermediate codes are machine independent codes. 


Roles of Intermediate code are :

  • It acts as glue between the front end and the back end (between source and machine code).
  • If the compiler translated source code directly into machine code without generating intermediate code, a full native compiler would be required for each new machine.
  • The intermediate code keeps the analysis portion the same for all compilers, so a full compiler is not needed for every unique machine.
  • The intermediate code generator receives input from its predecessor, the semantic analyzer, in the form of an annotated syntax tree.
  • Using the intermediate code, only the synthesis phase of the compiler needs to be changed according to the target machine.
  • The intermediate code generator lowers the level of abstraction from the source level.
  • It completes some syntactic checks and performs further semantic checks; for example, that a break appears only inside a loop or switch.

The intermediate code representation are

  • Graphical representation e.g. Abstract Syntax Tree(AST) , DAGS
  • Postfix notations
  • Three Address codes

Given expression:

a+(b-c)*d

Converting it into three address code:

    t1 = b-c

    t2 = t1*d

    t3 = a+t2
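The conversion above is a post-order walk of the expression tree: operands are translated first, then a fresh temporary names each operator's result. A minimal sketch (the tuple-shaped tree and the temporary-naming scheme are illustrative):

```python
def gen_tac(tree):
    """Emit three-address instructions for a binary expression tree."""
    code = []
    n = 0
    def walk(node):
        nonlocal n
        if isinstance(node, str):            # leaves are already addresses
            return node
        op, left, right = node
        l, r = walk(left), walk(right)       # translate operands first (post-order)
        n += 1
        code.append(f"t{n} = {l} {op} {r}")  # fresh temporary holds the result
        return f"t{n}"
    walk(tree)
    return code

# a + (b - c) * d as a nested tuple tree
for line in gen_tac(('+', 'a', ('*', ('-', 'b', 'c'), 'd'))):
    print(line)
# t1 = b - c
# t2 = t1 * d
# t3 = a + t2
```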

For the given expression, the quadruples are:

       |  op  | arg1 | arg2 | result
  (0)  |  -   |  b   |  c   |  t1
  (1)  |  *   |  t1  |  d   |  t2
  (2)  |  +   |  a   |  t2  |  t3

For the given expression, the triples are:

       |  op  | arg1 | arg2
  (0)  |  -   |  b   |  c
  (1)  |  *   | (0)  |  d
  (2)  |  +   |  a   | (1)

Section B

Attempt any EIGHT questions. (8 × 5 = 40)

4. Define compiler. Explain analysis phase of compiler.

5 marks

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. An important part of a compiler's job is reporting errors in the source program to the programmer.


Analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of the source program. Analysis of source program includes: lexical analysis, syntax analysis and semantic analysis.

1. Lexical Analysis:

In this phase, the lexical analyzer reads the source program and returns its tokens. A token is a sequence of characters that can be treated as a single logical entity (such as an identifier, operator, keyword, or constant).

2. Syntax Analysis:

In this phase, the syntax analyzer takes the tokens produced by the lexical analyzer as input and generates a parse tree as output. The parser checks whether the expressions made from the tokens are syntactically correct according to the rules that define the syntax of the source language.

3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects the type information for the code generation. Semantic analyzer checks whether they form a sensible set of instructions in the programming language or not. Type-checking is an important part of semantic analyzer.

Example of Analysis phase:
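For instance, a minimal sketch of the lexical-analysis step using Python's re module; the token names and the sample statement are illustrative.

```python
import re

# Each token class is a named regular-expression group; SKIP drops whitespace.
TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('ID',     r'[A-Za-z_]\w*'),
    ('ASSIGN', r'='),
    ('OP',     r'[+\-*/]'),
    ('SKIP',   r'\s+'),
]
PATTERN = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))

def tokenize(src):
    """Yield (token-name, lexeme) pairs for the source string."""
    for m in PATTERN.finditer(src):
        if m.lastgroup != 'SKIP':
            yield (m.lastgroup, m.group())

print(list(tokenize("position = initial + rate * 60")))
# [('ID', 'position'), ('ASSIGN', '='), ('ID', 'initial'),
#  ('OP', '+'), ('ID', 'rate'), ('OP', '*'), ('NUMBER', '60')]
```

The syntax and semantic analyzers would then build and type-check a tree from this token stream.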


5. “Symbol table is a necessary component of compiler”, justify this statement with examples.

5 marks

Symbol tables are data structures that are used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phase of a compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier such as its type, its position in storage, and any other relevant information.

Symbol tables typically need to support multiple declarations of the same identifier within a program. The lexical analyzer can create a symbol-table entry and return a token, say id, to the parser along with a pointer to the lexeme. The parser can then decide whether to use a previously created symbol-table entry or to create a new one for the identifier.

The basic operations defined on a symbol table include

  • allocate – to allocate a new empty symbol table
  • free – to remove all entries and free the storage of a symbol table
  • insert – to insert a name in a symbol table and return a pointer to its entry
  • lookup – to search for a name and return a pointer to its entry
  • set_attribute – to associate an attribute with a given entry
  • get_attribute – to get an attribute associated with a given entry
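The operations above can be sketched as a small class. The nested-scope parent link and the dict-of-attributes entry shape are illustrative design choices, not prescribed by the question.

```python
class SymbolTable:
    def __init__(self, parent=None):          # allocate a new empty table
        self.entries = {}                     # name -> attribute dict
        self.parent = parent                  # enclosing scope, if any

    def insert(self, name):                   # insert a name, return its entry
        return self.entries.setdefault(name, {})

    def lookup(self, name):                   # search this and enclosing scopes
        if name in self.entries:
            return self.entries[name]
        return self.parent.lookup(name) if self.parent else None

    def set_attribute(self, name, key, value):
        self.insert(name)[key] = value

    def get_attribute(self, name, key):
        entry = self.lookup(name)
        return entry.get(key) if entry else None

globals_ = SymbolTable()
globals_.set_attribute('rate', 'type', 'real')
locals_ = SymbolTable(parent=globals_)        # table for a nested scope
locals_.set_attribute('i', 'type', 'integer')
print(locals_.get_attribute('i', 'type'))     # integer
print(locals_.get_attribute('rate', 'type'))  # real (found in enclosing scope)
```

Freeing a table is just dropping the reference; a real compiler would also record storage positions and other attributes in each entry.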

6. Given a regular expression (ε + 0)*10. Construct the DFA recognizing the pattern described by this regular expression using syntax tree based reduction.

5 marks view

Given RE;

(ε + 0)*10

The augmented RE of the given RE is:

(ε + 0)*10 #

Syntax tree for the augmented R.E., with leaf positions: 1 for the starred '0', 2 for '1', 3 for the second '0', and 4 for '#'.


Calculating followpos:

followpos(1) = {1, 2}

followpos(2) = {3}

followpos(3) = {4}

Now,

1) Start state of DFA = firstpos(root) = {1, 2} = S0

Use followpos of symbol representing position in RE to obtain the next state of DFA.

2) From S0 = {1, 2}

    1  represents '0'

    2 represents '1'

followpos(1) = {1, 2} = S0

     δ(s0, 0) = S0

followpos(2) = {3} = S1

    δ(S0, 1) = S1

3) From S1 = {3}

    Here 3 represents '0'

followpos(3) = {4} = S2

     δ(S1, 0) = S2

4) From S2 = {4}

    Here 4 represents '#', the end marker, so S2 has no outgoing transitions.

No new states occur, so the construction terminates.

Therefore, the final state is S2 (it contains the position of '#').

Now the resulting DFA of the given regular expression is:

δ(S0, 0) = S0,  δ(S0, 1) = S1,  δ(S1, 0) = S2,  with start state S0 and final state S2.
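The same construction can be sketched in code directly from the followpos table; the position-to-symbol map mirrors the syntax-tree leaves above.

```python
# DFA construction from firstpos/followpos (syntax-tree based reduction).
SYMBOL_AT = {1: '0', 2: '1', 3: '0', 4: '#'}          # leaf positions
FOLLOWPOS = {1: {1, 2}, 2: {3}, 3: {4}, 4: set()}

def build_dfa(start):
    start = frozenset(start)
    states, worklist, trans = {start}, [start], {}
    while worklist:
        s = worklist.pop()
        for sym in ('0', '1'):                        # input alphabet (not '#')
            nxt = set()
            for p in s:
                if SYMBOL_AT[p] == sym:               # union of followpos of
                    nxt |= FOLLOWPOS[p]               # positions labeled sym
            nxt = frozenset(nxt)
            if not nxt:
                continue
            trans[(s, sym)] = nxt
            if nxt not in states:                     # unmarked state: process it
                states.add(nxt)
                worklist.append(nxt)
    finals = {s for s in states if 4 in s}            # states holding '#'
    return states, trans, finals

states, trans, finals = build_dfa({1, 2})             # firstpos(root) = {1, 2}
for (s, sym), t in sorted(trans.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(s), '--', sym, '->', sorted(t))
# [1, 2] -- 0 -> [1, 2]
# [1, 2] -- 1 -> [3]
# [3] -- 0 -> [4]
```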

7. Define the terms token, pattern and lexeme. How input buffer can be used for scanner. Explain.

5 marks

Token

Token is a sequence of characters that can be treated as a single logical entity. For example, Identifiers, Keywords, Operators, Constants etc.

Pattern

A set of string in the input for which the same token is produced as output. This set of string is described by a rule called a pattern associated with the token. For example, “a letter followed by zero or more letters, digits, or underscores.”

Lexeme

A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. 


Input Buffering

Lexical analysis needs to look ahead several characters before a match can be announced. Two input-buffering schemes are useful when lookahead is necessary:

  • Buffer Pair
  • Sentinels

Buffer Pair (2N Buffering):

In this technique the buffer is divided into two halves of N characters each, where N is the number of characters per disk block, e.g. 1024 or 4096. Rather than reading character by character from the file, we read N input characters at once. If there are fewer than N characters left in the input, an EOF marker is placed after them. There are two pointers: the lexeme pointer and the forward pointer.

The lexeme pointer points to the start of the lexeme, while the forward pointer scans the input buffer for the end of the lexeme. When the forward pointer reaches the end of one half, the other half is loaded and the forward pointer moves to the beginning of that half.

Sentinels:

While using the buffer-pair technique, every time the forward pointer is moved we must check that it has not crossed the end of its half, and when it reaches the end of the buffer the other half must be loaded. We can reduce these tests to one by placing a sentinel character at the end of each half. The sentinel can be any character that is not part of the source program; EOF is usually preferred, since it also indicates the end of the source program. It signals the need for some special action (fill the other buffer half, or terminate processing).
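A simplified sketch of the buffer-pair scheme with sentinels. N is tiny here purely for illustration (a real scanner uses the disk-block size, e.g. 1024 or 4096, and reads from a file rather than a string).

```python
import io

N = 4                                    # half-buffer size (1024/4096 in practice)
SENTINEL = '\0'                          # stands in for EOF; never in source text

def scan(source):
    """Yield characters of `source` through two N-char halves with sentinels."""
    stream = io.StringIO(source)

    def load():                          # read one half; pad with sentinels
        chunk = stream.read(N)
        return list(chunk) + [SENTINEL] * (N + 1 - len(chunk))

    buf = load() + [SENTINEL] * (N + 1)  # second half is loaded on demand
    fwd = 0                              # the forward pointer
    while True:
        ch = buf[fwd]
        if ch == SENTINEL:               # one test covers both cases:
            if fwd == N:                 # end of first half: load the second
                buf[N + 1:] = load()
                fwd = N + 1
            elif fwd == 2 * N + 1:       # end of second half: reload the first
                buf[:N + 1] = load()
                fwd = 0
            else:
                return                   # sentinel inside a half: true EOF
        else:
            yield ch
            fwd += 1

print(''.join(scan("position = initial")))   # characters come out in order
```

The single sentinel comparison replaces the two bounds checks the plain buffer pair would need on every advance of the forward pointer.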

8. Find first and follow of all the non terminals in the following grammar.

 E → TA ; A → +TA | ɛ ; T → FB ; B → *FB | ɛ ; F → (E) | id

5 marks

Given grammar:

E → TA
A → +TA | ɛ
T → FB
B → *FB | ɛ
F → (E) | id

Now, the FIRST and FOLLOW sets for the non-terminals are:

Non-terminal |  FIRST()  |  FOLLOW()
     E       |  {(, id}  |  {), $}
     A       |  {+, ɛ}   |  {), $}
     T       |  {(, id}  |  {), +, $}
     B       |  {*, ɛ}   |  {), +, $}
     F       |  {(, id}  |  {*, +, ), $}
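The sets above can be verified with a small fixed-point computation over the grammar; 'eps' stands for ɛ and '$' for the end marker.

```python
GRAMMAR = {
    'E': [['T', 'A']],
    'A': [['+', 'T', 'A'], []],          # [] is the epsilon-production
    'T': [['F', 'B']],
    'B': [['*', 'F', 'B'], []],
    'F': [['(', 'E', ')'], ['id']],
}
NTS = set(GRAMMAR)

def first_of_seq(seq, FIRST):
    """FIRST of a sentential-form suffix; includes 'eps' if it can vanish."""
    out = set()
    for x in seq:
        f = FIRST[x] if x in NTS else {x}
        out |= f - {'eps'}
        if 'eps' not in f:
            return out
    return out | {'eps'}

FIRST = {n: set() for n in NTS}
changed = True
while changed:                            # iterate until no set grows
    changed = False
    for n, prods in GRAMMAR.items():
        for rhs in prods:
            f = first_of_seq(rhs, FIRST)
            if not f <= FIRST[n]:
                FIRST[n] |= f
                changed = True

FOLLOW = {n: set() for n in NTS}
FOLLOW['E'].add('$')                      # start symbol gets the end marker
changed = True
while changed:
    changed = False
    for n, prods in GRAMMAR.items():
        for rhs in prods:
            for i, x in enumerate(rhs):
                if x not in NTS:
                    continue
                tail = first_of_seq(rhs[i + 1:], FIRST)
                add = (tail - {'eps'}) | (FOLLOW[n] if 'eps' in tail else set())
                if not add <= FOLLOW[x]:
                    FOLLOW[x] |= add
                    changed = True

for n in 'EATBF':
    print(n, sorted(FIRST[n]), sorted(FOLLOW[n]))
```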

9. What is Syntax Directed Definition? Define synthesized and inherited attributes with example.

5 marks

A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules. Attributes are associated with grammar symbols and rules are associated with productions. For example:

Production   |  Semantic Rules
E → E1 + T   |  E.val = E1.val + T.val
E → T        |  E.val = T.val


Synthesized Attributes

Attributes of a node that are derived from the values of its child nodes are called synthesized attributes. Assume the following production:

    S → ABC

If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of ABC are synthesized to S.

Example for synthesized attribute:

Production    |  Semantic Rules
E → E1 * E2   |  E.val = E1.val * E2.val
E → E1 + E2   |  E.val = E1.val + E2.val
E → int       |  E.val = int.val

This syntax-directed definition associates with each nonterminal a synthesized attribute called val.

In synthesized attributes, values flow from child to parent in the parse tree; for example, for the string 3*2+5 the value 11 is synthesized at the root.
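That bottom-up flow of val can be sketched as a recursive evaluation over the parse tree, shaped here as nested tuples for brevity.

```python
def eval_val(node):
    if isinstance(node, int):                 # E -> int : E.val = int.val
        return node
    op, e1, e2 = node
    v1, v2 = eval_val(e1), eval_val(e2)       # children's val computed first,
    return v1 * v2 if op == '*' else v1 + v2  # then the parent's val is synthesized

tree = ('+', ('*', 3, 2), 5)                  # parse tree of 3 * 2 + 5
print(eval_val(tree))                         # 11
```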


Inherited Attributes

Attributes of a node that are derived from its parent or sibling nodes are called inherited attributes: if the value of an attribute depends on the node's parent or siblings, it is inherited. Consider the following production:

S → ABC

A can take values from S, B, and C; B from S, A, and C; and likewise C from S, A, and B.

Example for Inherited Attributes:

Production

Semantic Rules

® TL

L.in = T.type

® int

 T.type = integer 

® real

 T.type = real 

® L1, id

 L1.in = L.in;

addtype(id.entry, L.in) 

® id

addtype(id.entry, L.in)

Symbol T is associated with a synthesized attribute type and symbol L is associated with an inherited attribute in.

In inherited attributes, values flow into a node in the parse tree from its parent and/or siblings; for example, for the input int id, id the type int is passed down the list of identifiers.

10. What do you mean by runtime storage allocation? Differentiate static and dynamic allocation.

5 marks

Runtime storage allocation

In runtime storage allocation, the compiler allocates space at compile time only for global variables; space for the variables of procedures is allocated at run time, on the stack or the heap.

  • Examples: C, C++, Java, Fortran
  • Variable access is slower than with static allocation, since addresses are computed through the stack/heap pointer.
  • Recursion can be implemented.

Difference between static and dynamic allocation:

Static allocation: The compiler makes storage-allocation decisions by looking only at the program text, at compile time.

Dynamic allocation: Storage allocation decisions are made only while the program is running.

11. Why is it necessary to optimize code? Explain any two code optimization techniques with example.

5 marks

Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources. It tries to make the code consume fewer resources (CPU, memory) while delivering high speed.

Optimization is necessary in the code generated by simple code generator due to the following reasons:

  • Code optimization enhances the portability of the compiler to the target processor.
  • Code optimization allows consumption of fewer resources (i.e. CPU, Memory).
  • Optimized code has faster execution speed.
  • Optimized code utilizes the memory efficiently.
  • Optimized code gives better performance.
  • Optimized code often promotes re-usability.

Code optimization techniques are:

1. Common-Subexpression Elimination: A common subexpression need not be computed over and over again. Instead, we compute it once, keep the result, and reuse it wherever the expression is encountered again. For example, if t1 = a + b has already been computed and a and b are unchanged, a later occurrence of a + b can simply reuse t1.


2. Dead Code Elimination: Dead code is a variable or the result of some expression computed by the programmer that is never used afterwards. Eliminating such useless computations optimizes the code. For example, an assignment t4 = a - c whose result t4 is never referenced can be removed.
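Both techniques can be sketched as passes over three-address code held as (result, op, arg1, arg2) tuples; the sample code and the liveness seed below are illustrative.

```python
def cse(code):
    """Common-subexpression elimination: reuse the first temporary
    that computed each (op, arg1, arg2)."""
    seen, out = {}, []
    for res, op, a1, a2 in code:
        key = (op, a1, a2)
        if key in seen:                              # computed before:
            out.append((res, '=', seen[key], None))  # replace with a copy
        else:
            seen[key] = res
            out.append((res, op, a1, a2))
    return out

def dead_code_elim(code, live):
    """Drop assignments whose result is never used afterwards."""
    out = []
    for res, op, a1, a2 in reversed(code):           # scan backwards
        if res in live:
            out.append((res, op, a1, a2))
            live |= {a1, a2}                         # its operands become live
    return list(reversed(out))

code = [('t1', '+', 'a', 'b'),
        ('t2', '+', 'a', 'b'),                       # common subexpression
        ('t3', '*', 't2', 'c'),
        ('t4', '-', 'a', 'c'),                       # dead: t4 is never used
        ('x',  '=', 't3', None)]
optimized = dead_code_elim(cse(code), live={'x'})
for q in optimized:
    print(q)
```

The t4 instruction disappears, and t2 becomes a plain copy of t1 that a separate copy-propagation pass would then remove.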

12. Explain about the factors affecting code generation.

5 marks

The final phase in compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. Designing of code generator should be done in such a way so that it can be easily implemented, tested and maintained.

The different factors affecting target code generation includes:

1. Input to the code generator: The input to the code generator is the intermediate representation together with the information in the symbol table. The intermediate representation can take several forms: 

  • Postfix notation, 
  • Syntax tree or DAG,
  • Three address code

The code generation phase requires complete, error-free intermediate code as its input.

2. Target Program: The target program is the output of the code generator. The output can be:

  • Assembly language: It allows subprogram to be separately compiled.
  • Relocatable machine language: It makes the process of code generation easier.
  • Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.

3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set. 

4. Instruction Selection: Efficient and low cost instruction selection is important to obtain efficient code. When we consider the efficiency of target machine then the instruction speed and machine idioms are important factors. The quality of the generated code can be determined by its speed and size.

5. Register Allocation: Proper utilization of registers improves code efficiency. Registers make computations faster than memory accesses, so using them well is important. The use of registers is subdivided into two subproblems:

  • During Register allocation, we select only those set of variables that will reside in the registers at each point in the program.
  • During a subsequent Register assignment phase, the specific register is picked to access the variable.

6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed.