Compiler Design and Construction 2069
Attempt all questions. (10x6=60)
A compiler operates in phases. A phase is a logically cohesive operation that takes the source program in one representation and produces output in another representation. The phases of a compiler are shown below:
1. The analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of it. Analysis of the source program includes lexical analysis, syntax analysis and semantic analysis.
2. The synthesis phase constructs the desired target program from the intermediate representation. The synthesis part of the compiler consists of the following phases: intermediate code generation, code optimization and target code generation.
1. Lexical Analysis:
In this phase, the lexical analyzer reads the source program and returns its tokens. A token is a sequence of characters that can be treated as a single logical entity (such as identifiers, operators, keywords, constants, etc.).
Example:
Input String: c = a + b * 3
Tokens: id1 = id2 + id3 * 3
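As an illustration, here is a minimal Python tokenizer sketch for such input; the token class names and regular expressions are illustrative assumptions, not part of the original example:

```python
import re

# Illustrative token classes; a real lexer would also handle keywords, etc.
TOKEN_SPEC = [
    ("NUM", r"\d+(?:\.\d+)?"),             # integer or floating-point constant
    ("ID",  r"[A-Za-z_][A-Za-z0-9_]*"),    # identifier
    ("OP",  r"[+\-*/=]"),                  # operators
    ("WS",  r"\s+"),                       # whitespace (skipped)
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield (token-class, lexeme) pairs for the given source string."""
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup != "WS":
            yield (m.lastgroup, m.group())

print(list(tokenize("c = a + b * 3")))
# [('ID', 'c'), ('OP', '='), ('ID', 'a'), ('OP', '+'), ('ID', 'b'), ('OP', '*'), ('NUM', '3')]
```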
2. Syntax Analysis:
In this phase, the syntax analyzer takes the tokens produced by the lexical analyzer as input and generates a parse tree as output. The parser checks whether the expression formed by the tokens is syntactically correct, according to the rules that define the syntax of the source language.
Example: for the token stream id1 = id2 + id3 * 3, the parser builds a parse tree with = at the root, id1 as its left subtree, and the expression id2 + id3 * 3 as its right subtree, with the * node nested below + because multiplication binds tighter than addition.
3. Semantic Analysis:
In this phase, the semantic analyzer checks the source program for semantic errors and collects type information for code generation. It checks whether the constructs in the parse tree form a sensible set of instructions in the programming language. Type-checking is an important part of the semantic analyzer.
Example: in c = a + b * 3 with a, b and c declared as float, the semantic analyzer marks the integer constant 3 for conversion to float, which is why it appears as 3.0 in the intermediate code below.
4. Intermediate Code Generation:
If the program is syntactically and semantically correct, the intermediate code generator produces a simple, machine-independent intermediate representation. The intermediate code should be generated in such a way that it can easily be translated into the target machine code.
Example:
t1 = 3.0;
t2 = id3 * t1;
t3 = id2 + t2;
id1 = t3;
5. Code Optimization:
Code optimization improves the intermediate code so that the output program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources.
Example:
t2 = id3 * 3.0;
id1 = id2 + t2;
6. Code Generation:
Code generation is the final stage of the compilation process. It takes the optimized intermediate code as input and maps it to the target machine language.
Example:
MOV R1, id3
MUL R1, #3.0
MOV R2, id2
ADD R1, R2
MOV id1, R1
Symbol Table
Symbol tables are data structures that are used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phase of compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier such as its type, its position in storage, and any other relevant information.
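A minimal sketch of such a table in Python (the field names and byte sizes are illustrative assumptions):

```python
class SymbolTable:
    """Maps each identifier to its attributes (type and storage position)."""
    def __init__(self):
        self.entries = {}
        self.next_offset = 0              # next free position in storage

    def insert(self, name, type_, size):
        """Record an identifier's type and its position in storage."""
        self.entries[name] = {"type": type_, "offset": self.next_offset}
        self.next_offset += size

    def lookup(self, name):
        """Return an identifier's attributes, or None if it is undeclared."""
        return self.entries.get(name)

table = SymbolTable()
table.insert("a", "float", 4)
table.insert("b", "float", 4)
print(table.lookup("b"))   # {'type': 'float', 'offset': 4}
```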
Error Handling
Whenever an error is encountered during the compilation of the source program, an error handler is invoked. The error handler generates a suitable message reporting the error encountered.
2. Define token, pattern and lexeme with suitable examples. How can input buffering be implemented for a scanner? Explain.
Token
A token is a sequence of characters that can be treated as a single logical entity, for example identifiers, keywords, operators and constants.
Pattern
A pattern is a rule describing the set of strings in the input for which the same token is produced as output. For example, the pattern for identifiers in many languages is “a letter followed by zero or more letters, digits, or underscores.”
Lexeme
A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. For example, in count = 10, the character sequence count is a lexeme matched by the identifier pattern.
Input Buffering
A lexical analyzer may need to look ahead several characters before a match can be announced. Two input-buffering schemes are useful when such lookahead is necessary:
- Buffer Pair
- Sentinels
Buffer Pair (2N Buffering):
In this technique the buffer is divided into two halves of N characters each, where N is the size of a disk block, such as 1024 or 4096. Rather than reading character by character from the file, we read N input characters at once. If fewer than N characters remain in the input, an EOF marker is placed in the buffer. Two pointers are maintained: the lexeme pointer and the forward pointer.
The lexeme pointer marks the start of the current lexeme, while the forward pointer scans ahead in the buffer. When the forward pointer reaches the end of one half, the other half is loaded and the forward pointer moves to the beginning of that half.
Sentinels:
With the buffer-pair technique alone, each time the forward pointer is moved we must check whether it has crossed the end of a buffer half, and when it reaches the end of the buffer the other half must be loaded. We can reduce these checks by placing a sentinel character at the end of each buffer half. The sentinel can be any character that is not part of the source program; EOF is usually preferred, since it also indicates the end of the source program. Reading a sentinel signals the need for some special action (fill the other buffer half, or terminate processing).
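A runnable Python sketch of the sentinel scheme (the class name, the tiny half-size N, and the use of '\0' as sentinel are illustrative assumptions, not from the original):

```python
N = 8                      # half-buffer size; real lexers use e.g. 1024 or 4096
SENTINEL = "\0"            # assumed never to appear in the source program

class TwoBufferReader:
    def __init__(self, text):
        self.text = text
        self.pos = 0                          # next unread position in the source
        self.buf = [SENTINEL] * (2 * N + 2)   # two halves, each followed by a sentinel
        self._load(0)                         # fill the first half
        self.fwd = -1                         # forward pointer

    def _load(self, start):
        """Fill one half starting at `start`; a sentinel marks its end (or EOF)."""
        chunk = self.text[self.pos:self.pos + N]
        self.pos += len(chunk)
        for i, ch in enumerate(chunk):
            self.buf[start + i] = ch
        self.buf[start + len(chunk)] = SENTINEL

    def next_char(self):
        """Advance the forward pointer; only one comparison in the common case."""
        self.fwd += 1
        if self.buf[self.fwd] != SENTINEL:    # common case: ordinary character
            return self.buf[self.fwd]
        if self.fwd == N:                     # end of first half: reload second
            self._load(N + 1)
            self.fwd += 1
        elif self.fwd == 2 * N + 1:           # end of second half: reload first
            self._load(0)
            self.fwd = 0
        else:
            return None                       # sentinel inside a half: true EOF
        c = self.buf[self.fwd]
        return None if c == SENTINEL else c

r = TwoBufferReader("c = a + b * 3")
s = ""
while (ch := r.next_char()) is not None:
    s += ch
print(s)   # c = a + b * 3
```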
3. Given the regular expression (0 + 1)*011, construct a DFA equivalent to this regular expression by computing followpos().
Given RE:
(0 + 1)* 011
The augmented RE of the given RE is:
(0 + 1)* 011 #
In the syntax tree for the augmented R.E., the leaves are numbered left to right: positions 1 = '0' and 2 = '1' (under the *), then 3 = '0', 4 = '1', 5 = '1', and 6 = '#'.
Calculating followpos:
followpos(1) = {1, 2, 3}
followpos(2) = {1, 2, 3}
followpos(3) = {4}
followpos(4) = {5}
followpos(5) = {6}
Now,
1) Start state of DFA = firstpos(root) = {1, 2, 3} = S0
Use followpos of the positions labeled by each input symbol to obtain the next states of the DFA.
2) From S0 = {1, 2, 3}
1 and 3 represent '0'
2 represents '1'
followpos({1, 3}) = {1, 2, 3, 4} = S1
δ(S0, 0) = S1
followpos({2}) = {1, 2, 3} = S0
δ(S0, 1) = S0
3) From S1 = {1, 2, 3, 4}
1, 3 → 0
2, 4 → 1
followpos({1, 3}) = {1, 2, 3, 4} = S1
δ(S1, 0) = S1
followpos({2, 4}) = {1, 2, 3, 5} = S2
δ(S1, 1) = S2
4) From S2 = {1, 2, 3, 5}
1, 3 → 0
2, 5 → 1
followpos({1, 3}) = {1, 2, 3, 4}= S1
δ(S2, 0) = S1
followpos({2, 5}) = {1, 2, 3, 6} = S3
δ(S2, 1) = S3
5) From S3 = {1, 2, 3, 6}
1, 3 → 0
2 → 1
6 → #
followpos({1, 3}) = {1, 2, 3, 4}= S1
δ(S3, 0) = S1
followpos({2}) = {1, 2, 3} = S0
δ(S3, 1) = S0
Now no new states occur.
Therefore, the final state is S3, since S3 contains position 6, the position of the end marker #.
The resulting DFA for the given regular expression is summarized by the following transition table (start state S0, final state S3):

| State | 0 | 1 |
| --- | --- | --- |
| S0 (start) | S1 | S0 |
| S1 | S1 | S2 |
| S2 | S1 | S3 |
| S3 (final) | S1 | S0 |
4. Explain the role of the parser. Write an algorithm for non-recursive predictive parsing.
A syntax analyzer or parser takes its input from the lexical analyzer in the form of a token stream. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.
The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language. It must report any syntax errors that occur.
The tasks of the parser can be expressed as:
- Analyzes the context-free syntax
- Generates the parse tree
- Provides a mechanism for context-sensitive analysis
- Detects errors and tries to handle them
Algorithm for non-recursive predictive parsing (a runnable sketch follows the steps below):
1. Set ip to the first symbol of the input string.
2. Set the stack to $S where S is the start symbol of the grammar G.
3. Let X be the top stack symbol and ‘a’ be the symbol pointed by ip then
Repeat
a. If X is a terminal or ‘$’ then
i. If X = a then pop X from the stack and advance ip
ii. else error()
b. Else
If M[X,a] = Y1Y2Y3....... Yk then
1. Pop X from Stack
2. Push Yk , Yk-1, .....Y2, Y1 on the stack with Y1 on top.
3. Output the production X → Y1Y2Y3....... Yk
Until X = ‘$’.
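The following is a minimal Python sketch of this algorithm; the example grammar, parsing table M, and token conventions are illustrative assumptions, not part of the question:

```python
# Table-driven (non-recursive) predictive parser.
def predictive_parse(table, nonterminals, start, tokens):
    tokens = tokens + ["$"]           # step 1: ip at first symbol, $ marks the end
    stack = ["$", start]              # step 2: stack holds $S with S on top
    ip = 0
    while stack[-1] != "$":           # step 3: repeat until X = '$'
        X, a = stack[-1], tokens[ip]
        if X not in nonterminals:     # X is a terminal
            if X == a:
                stack.pop()           # match: pop X and advance ip
                ip += 1
            else:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
        elif (X, a) in table:
            body = table[(X, a)]      # M[X, a] = Y1 Y2 ... Yk
            print(f"{X} -> {' '.join(body) or 'ε'}")   # output the production
            stack.pop()
            stack.extend(reversed(body))  # push Yk ... Y1, leaving Y1 on top
        else:
            raise SyntaxError(f"no entry M[{X}, {a}]")
    if tokens[ip] != "$":
        raise SyntaxError("extra input after parse")
    return True

# Illustrative LL(1) grammar: E -> T E' ; E' -> + T E' | ε ; T -> id
NONTERMINALS = {"E", "E'", "T"}
M = {("E", "id"): ["T", "E'"],
     ("T", "id"): ["id"],
     ("E'", "+"): ["+", "T", "E'"],
     ("E'", "$"): []}                 # ε-production
predictive_parse(M, NONTERMINALS, "E", ["id", "+", "id"])
```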
5. Consider the following grammar:
E → E + T | T
T → T * F | F
F → (E) | id
Compute the complete LR(0) collection of item sets from the above grammar.
The augmented grammar of the given grammar is:
E’→E
E→E+T | T
T→T*F | F
F→ (E) | id
Next, we obtain the canonical collection of sets of LR(0) items, as follows,
I0 = closure ({E‘ → •E})
= {E‘ → •E, E → •E + T, E → •T, T → •T * F, T → •F, F → • (E), F → •id}
goto(I0, E) = closure({E‘ → E•, E → E •+ T}) = {E‘ → E•, E → E •+ T} = I1
goto(I0, T) = closure({E → T•, T → T •* F}) = { E → T•, T → T •* F } = I2
goto(I0, F) = closure({T → F•}) = {T → F•} = I3
goto(I0, ( ) = closure({F → (•E)}) = {F → (•E), E → •E + T, E → •T, T → •T * F, T → •F, F → • (E), F → •id} = I4
goto(I0, id ) = closure({F → id•}) = {F → id•} = I5
goto(I1, + ) = closure({E → E + •T}) = {E → E +• T, T → •T * F, T → •F, F → • (E), F → •id} = I6
goto(I2, * ) = closure({T → T * •F}) = {T → T * •F, F → • (E), F → •id} = I7
goto(I4, E) = closure({F → (E•), E → E•+ T}) = {F → (E•), E → E•+ T } = I8
goto(I4, T) = closure({E → T•, T → T• * F}) = {E → T• , T → T•*F}= I2
goto(I4, F) = closure({T → F•}) = {T → F•} = I3
goto(I4, () = closure({F → (•E)}) = I4
goto(I4, id) = closure({F → id•}) = I5
goto(I6, T) = closure({E → E +T•, T → T•* F}) = {E → E +T•, T → T•* F} = I9
goto(I6, F) = closure ({T→ F•}) = I3
goto(I6, () = closure({F → (•E)}) = I4
goto(I6, id) = closure({F → id•}) = I5
goto(I7, F ) = closure({T → T * F•})= {T → T * F•} = I10
goto(I7, () = closure({F → (•E)}) = I4
goto(I7, id) = closure({F → id•}) = I5
goto(I8, ) ) = closure({F → (E) •}) = {F → (E) •} = I11
goto(I8, +) = closure({ E → E+•T} ) = I6
goto(I9, *) = closure({T → T*•F}) = { T → T*•F, F→ •(E), F→ •id } = I7
Now,
The canonical collection of sets of LR(0) items for the given grammar is {I0, I1, …, I11}, as computed above.
6. Show that the following grammar is not an LL(1) grammar.
Given grammar:
S → cAd
A → Ab | a
The FIRST and FOLLOW of the grammar symbols are:
FIRST(S) = {c}
FIRST(A) = {a}
FOLLOW(S) = {$}
FOLLOW(A) = {b, d}
Then, constructing the LL(1) parsing table:

| Non-terminal | a | b | c | d | $ |
| --- | --- | --- | --- | --- | --- |
| S | | | S → cAd | | |
| A | A → Ab, A → a | | | | |
Since M[A, a] contains two productions (A → Ab and A → a), the parsing table has a multiply-defined entry; hence the given grammar is not LL(1). (Indeed, A → Ab is left-recursive, and no left-recursive grammar can be LL(1).)
7. What do you mean by kernel and non-kernel items? Compute the kernel items for LR(0) for the following grammar:
S → CC
C → bC | d
Kernel Items
The initial item S‘ → •S, together with all items whose dot is not at the left end of the right-hand side, are called kernel items. For example, for a production A → αBβ, the kernel items are A → α•Bβ, A → αB•β and A → αBβ•.
Non-Kernel Items
All items with the dot at the left end of the right-hand side, except the initial item S‘ → •S, are called non-kernel items. For example, A → •αBβ.
Now,
The augmented grammar of the given grammar is,
S‘ → S
S → CC
C → bC | d
Next, we obtain the canonical collection of sets of LR(0) items, as follows,
I0 = closure ({S‘ → •S}) = {S‘ → •S, S → •CC, C → •bC, C → •d}
goto (I0, S) = closure ({S‘ → S•}) = {S‘ → S•} = I1
goto (I0, C) = closure ({S → C•C}) = {S → C•C, C → •bC, C→ •d} = I2
goto (I0, b) = closure ({C→ b•C}) = {C → b•C, C → •bC, C → •d} = I3
goto (I0, d) = closure ({C → d•}) = {C → d•} = I4
goto (I2, C) = closure ({S → CC•}) = {S → CC•} = I5
goto (I2, b) = closure ({C → b•C}) = {C → b•C, C → •bC, C → •d} = I3
goto (I2, d) = closure ({C → d•}) = {C → d•} = I4
goto (I3, C) = closure ({C → bC•}) = {C → bC•} = I6
goto (I3, b) = closure ({C → b•C}) = {C→ b•C, C → •bC, C → •d} = I3
goto (I3, d) = closure ({C → d•}) = {C → d•} = I4
Now the kernel items for LR(0) for the given grammar are:

| State | Kernel Items |
| --- | --- |
| I0 | S‘ → •S |
| I1 | S‘ → S• |
| I2 | S → C•C |
| I3 | C → b•C |
| I4 | C → d• |
| I5 | S → CC• |
| I6 | C → bC• |
8. What do you mean by S-attributed definition and how they are evaluated? Explain with example.
A syntax-directed definition that uses synthesized attributes exclusively is said to be an S-attributed definition. A parse tree for an S-attributed definition can always be annotated by evaluating the semantic rules for the attributes at each node in a bottom-up manner. The evaluation of an S-attributed definition is based on a depth-first traversal (DFT) of the annotated parse tree.
Example of an S-attributed definition: consider the usual desk-calculator grammar for arithmetic expressions, in which every attribute val is synthesized (the table below gives the standard definition):
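| Production | Semantic Rule |
| --- | --- |
| L → E return | print(E.val) |
| E → E1 + T | E.val = E1.val + T.val |
| E → T | E.val = T.val |
| T → T1 * F | T.val = T1.val * F.val |
| T → F | T.val = F.val |
| F → (E) | F.val = E.val |
| F → digit | F.val = digit.lexval |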
Attributes in S-attributed definitions are evaluated in bottom-up parsing, as the values of the parent nodes depend upon the values of the child nodes.
Consider input string: 5+3*4 return
Depth-first traversal (DFT) of the annotated parse tree evaluates val at each node bottom-up, yielding 5 + 3 * 4 = 17 at the root.
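The same evaluation can be sketched in Python; the Node class and the condensed tree shape below are illustrative assumptions:

```python
# Bottom-up (post-order) evaluation of the synthesized attribute 'val'.
class Node:
    def __init__(self, label, children=(), lexval=None):
        self.label = label            # 'digit', '+', or '*'
        self.children = list(children)
        self.lexval = lexval          # lexical value for digit leaves
        self.val = None               # synthesized attribute

def evaluate(node):
    for child in node.children:       # children first: bottom-up order
        evaluate(child)
    if node.label == "digit":
        node.val = node.lexval        # F -> digit : F.val = digit.lexval
    elif node.label == "+":
        node.val = node.children[0].val + node.children[1].val
    elif node.label == "*":
        node.val = node.children[0].val * node.children[1].val
    return node.val

# condensed tree for 5 + 3 * 4 (the * subtree is evaluated before +)
tree = Node("+", [Node("digit", lexval=5),
                  Node("*", [Node("digit", lexval=3),
                             Node("digit", lexval=4)])])
print(evaluate(tree))   # 17
```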
9. What do you mean by three-address code representation? Explain with example.
Three-address code is a type of intermediate code which is easy to generate and can be easily converted to machine code. It uses at most three addresses and one operator to represent an expression, and the value computed at each instruction is stored in a temporary variable generated by the compiler.
There are 3 representations of three address code:
1. Quadruples:
It is a structure with four fields: op, arg1, arg2 and result. op denotes the operator, arg1 and arg2 denote the two operands, and result stores the result of the expression. The three-address statement x = y op z is represented by placing y in arg1, z in arg2 and x in result. For example, for the statement a = b * -c + b * -c, the quadruples are shown below.
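The statement first compiles to three-address code (t1–t5 are compiler-generated temporaries; this reconstructs the standard textbook listing):

t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5

| # | op | arg1 | arg2 | result |
| --- | --- | --- | --- | --- |
| (0) | minus | c | | t1 |
| (1) | * | b | t1 | t2 |
| (2) | minus | c | | t3 |
| (3) | * | b | t3 | t4 |
| (4) | + | t2 | t4 | t5 |
| (5) | = | t5 | | a |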
2. Triples:
This representation doesn’t use an extra temporary variable to hold the result of each operation; instead, when a reference to another triple’s value is needed, a pointer to that triple is used. It therefore consists of only three fields: op, arg1 and arg2. For example, for the statement a = b * -c + b * -c, the triples are shown below.
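Using the same three-address code as above, a parenthesized number such as (0) refers to the value computed by that triple:

| # | op | arg1 | arg2 |
| --- | --- | --- | --- |
| (0) | minus | c | |
| (1) | * | b | (0) |
| (2) | minus | c | |
| (3) | * | b | (2) |
| (4) | + | (1) | (3) |
| (5) | = | a | (4) |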
3. Indirect triples:
This representation uses a separately stored list of pointers to triples: the program is a listing of pointers to triples rather than a listing of the triples themselves. For example, the indirect-triple representation of the same statement above is shown below.
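The triples are stored as in the previous table, and the instruction list holds pointers into that table (the list positions 35–40 are arbitrary illustrative numbers):

| instruction | points to |
| --- | --- |
| 35 | (0) |
| 36 | (1) |
| 37 | (2) |
| 38 | (3) |
| 39 | (4) |
| 40 | (5) |

Because the program is a list of pointers, an optimizer can reorder statements without renumbering the references among triples.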
10. How next-use information is useful in code-generation? Explain the steps involved on computing next-use information.
Next-use information is needed for dead-code elimination and register allocation. It is computed by a backward scan of a basic block, and it indicates, for a variable defined at the current statement, the statement number at which that variable will be used again.
The following are the steps involved in generating next-use information:
Input: Basic block B of three-address statements
Output: at each statement i: x = y op z, we attach to i the liveness and next-use information of x, y and z.
Method: We start at the last statement of B and scan backwards.
1. Attach to statement i the information currently found in the symbol table regarding the next use and liveness of x, y and z.
2. In the symbol table, set x to “not live” and “no next use”.
3. In the symbol table, set y and z to “live” and the next uses of y and z to i.
(Steps 2 and 3 must be applied in this order, in case x is also one of the operands y or z.) A sketch of this backward scan follows.
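A minimal Python sketch, assuming each statement is a tuple (x, op, y, z) and the symbol table is a dict of liveness/next-use records (both representations are illustrative assumptions):

```python
def compute_next_use(block, symtab):
    """Backward scan of a basic block, attaching liveness/next-use info
    to each statement and updating the symbol table as it goes."""
    attached = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):      # start at the last statement
        x, op, y, z = block[i]
        # step 1: attach the table's current info for x, y and z to statement i
        attached[i] = {v: dict(symtab[v]) for v in (x, y, z)}
        # step 2: x is defined here, so before this point it is dead
        symtab[x] = {"live": False, "nextuse": None}
        # step 3: y and z are used here, so they are live with next use at i
        symtab[y] = {"live": True, "nextuse": i}
        symtab[z] = {"live": True, "nextuse": i}
    return attached

# block for:  i: a := b + c   j: t := a + b   (statements 0 and 1)
block = [("a", "+", "b", "c"), ("t", "+", "a", "b")]
symtab = {v: {"live": True, "nextuse": None} for v in "abct"}
for i, info in enumerate(compute_next_use(block, symtab)):
    print(i, info)
```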
Example: live and nextuse computation for the two-statement block
i: a := b + c
j: t := a + b
| Statement | Instruction | Initialized liveness and nextuse | After scanning statement j | After scanning statement i | Comments |
| --- | --- | --- | --- | --- | --- |
| i | a := b + c | live(a)=true, live(b)=true, live(t)=true; nextuse(a)=nextuse(b)=nextuse(t)=none | live(a)=true, nextuse(a)=j; live(b)=true, nextuse(b)=j; live(t)=false, nextuse(t)=none | live(a)=false, nextuse(a)=none; live(b)=true, nextuse(b)=i; live(c)=true, nextuse(c)=i; live(t)=false, nextuse(t)=none | a is set to not live with no next use; b and c are live at statement i and their next use is at i; t's liveness and nextuse remain as computed at statement j |
| j | t := a + b | live(a)=true, live(b)=true, live(t)=true; nextuse(a)=nextuse(b)=nextuse(t)=none | live(a)=true, nextuse(a)=j; live(b)=true, nextuse(b)=j; live(t)=false, nextuse(t)=none | unchanged from the previous column | a and b are used here, so their next use is at j and they are live; t is computed here, so nextuse(t)=none and live(t)=false |