Compiler Design and Construction Model Question

Tribhuvan University
Institute of Science and Technology
Model Question
Bachelor Level / Sixth Semester / Science
Computer Science and Information Technology ( CSC365 )
( Compiler Design and Construction )
Full Marks: 60
Pass Marks: 24
Time: 3 hours
Candidates are required to give their answers in their own words as far as practicable.
The figures in the margin indicate full marks.

Section A

Attempt any TWO questions. (2 × 10 = 20)

1. Differentiate between top-down and bottom-up parsing methods. Construct SLR parse table for the following grammar.

    S → aETe
    E → Ebc | b
    T → d

10 marks

Difference between top-down and bottom-up parsing methods:

  • Top-down parsing starts at the highest level of the parse tree (the start symbol) and works down to the leaves using the rules of the grammar, whereas bottom-up parsing starts at the lowest level (the leaves) and works up to the start symbol.
  • Top-down parsing attempts to find a leftmost derivation for the input string, whereas bottom-up parsing attempts to reduce the input string to the start symbol of the grammar.
  • Top-down parsing traces out a leftmost derivation; bottom-up parsing traces out a rightmost derivation in reverse.
  • In top-down parsing, the main decision is which production rule to use in order to construct the string; in bottom-up parsing, it is when to apply a production rule to reduce the string toward the start symbol.
  • A top-down parser can be easily structured and built by hand; producing a bottom-up parser by hand is difficult.
  • Top-down parsing is less powerful (it handles a smaller class of grammars); bottom-up parsing is more powerful.
  • Error detection is easier in top-down parsing; it is more difficult in bottom-up parsing.



Now, 

Given grammar:

    1. S → aETe

    2. E → Ebc

    3. E → b

    4. T → d

The augmented grammar of the given grammar is:

0. S' → S

1. S → aETe

2. E → Ebc

3. E → b

4. T → d

Next, we obtain the canonical collection of sets of LR(0) items, as follows:

I0 = closure({S' → •S}) = {S' → •S, S → •aETe}

goto(I0, S) = {S' → S•} = I1

goto(I0, a) = closure({S → a•ETe}) = {S → a•ETe, E → •Ebc, E → •b} = I2

goto(I2, E) = closure({S → aE•Te, E → E•bc}) = {S → aE•Te, E → E•bc, T → •d} = I3

goto(I2, b) = closure({E → b•}) = {E → b•} = I4

goto(I3, d) = closure({T → d•}) = {T → d•} = I5

goto(I3, b) = closure({E → Eb•c}) = {E → Eb•c} = I6

goto(I3, T) = closure({S → aET•e}) = {S → aET•e} = I7

goto(I6, c) = closure({E → Ebc•}) = {E → Ebc•} = I8

goto(I7, e) = closure({S → aETe•}) = {S → aETe•} = I9

FOLLOW set for non-terminals are,

FOLLOW(S) = {$}

FOLLOW(E) = {b, d}

FOLLOW(T) = {e}

Now, the SLR parsing table is:

                      Action Table                    Goto Table
State |   a  |   b  |   c  |   d  |   e  |   $    ||  S |  E |  T
------+------+------+------+------+------+--------++----+----+----
  0   |  S2  |      |      |      |      |        ||  1 |    |
  1   |      |      |      |      |      | Accept ||    |    |
  2   |      |  S4  |      |      |      |        ||    |  3 |
  3   |      |  S6  |      |  S5  |      |        ||    |    |  7
  4   |      |  R3  |      |  R3  |      |        ||    |    |
  5   |      |      |      |      |  R4  |        ||    |    |
  6   |      |      |  S8  |      |      |        ||    |    |
  7   |      |      |      |      |  S9  |        ||    |    |
  8   |      |  R2  |      |  R2  |      |        ||    |    |
  9   |      |      |      |      |      |   R1   ||    |    |

(Blank entries denote errors.)
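As a check on the table, a minimal table-driven LR parser loop can run it. The dictionaries below hard-code the Action and Goto entries from the table above; the driver itself is a generic sketch, not tied to any particular tool.

```python
# Table-driven LR parsing with the SLR table above.
# 's' = shift, 'r' = reduce, 'acc' = accept.

ACTION = {
    (0, 'a'): ('s', 2),
    (1, '$'): ('acc', None),
    (2, 'b'): ('s', 4),
    (3, 'b'): ('s', 6), (3, 'd'): ('s', 5),
    (4, 'b'): ('r', 3), (4, 'd'): ('r', 3),
    (5, 'e'): ('r', 4),
    (6, 'c'): ('s', 8),
    (7, 'e'): ('s', 9),
    (8, 'b'): ('r', 2), (8, 'd'): ('r', 2),
    (9, '$'): ('r', 1),
}
GOTO = {(0, 'S'): 1, (2, 'E'): 3, (3, 'T'): 7}
# Productions numbered as in the question: 1: S -> aETe, 2: E -> Ebc, 3: E -> b, 4: T -> d
PRODS = {1: ('S', 4), 2: ('E', 3), 3: ('E', 1), 4: ('T', 1)}

def slr_parse(tokens):
    stack = [0]                       # the state stack
    tokens = list(tokens) + ['$']
    i = 0
    while True:
        state, tok = stack[-1], tokens[i]
        act = ACTION.get((state, tok))
        if act is None:
            return False              # blank entry: syntax error
        kind, arg = act
        if kind == 's':               # shift: push the next state, advance input
            stack.append(arg)
            i += 1
        elif kind == 'r':             # reduce: pop |rhs| states, then goto on lhs
            lhs, rhs_len = PRODS[arg]
            del stack[-rhs_len:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:                         # accept
            return True

print(slr_parse("abde"))    # a b d e -> True
print(slr_parse("abbcde"))  # a b b c d e -> True (uses E -> Ebc)
print(slr_parse("ade"))     # -> False
```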

2. What are static and dynamic type checking? Write SDD to carry out type checking for the following expression.

E → id | E1 op E2 | E1 relop E2 | E1[E2] | E1↑

10 marks

In static type checking, the type of variable is known at compile time (it checks the type of variable before running). Typical examples of static checking are:

    Type checks: Report an error if an operator is applied to an incompatible operand.

    Flow-of-control checks: Statements that cause flow of control to leave a construct must have somewhere to transfer it; for example, a break statement must be enclosed within a loop or switch.

    Uniqueness checks: There are situations where an object must be defined exactly once.

    Name-related checks: Sometimes the same name must appear two or more times, and the compiler must check that the occurrences match.

In dynamic type checking, the type of a variable is known at runtime (types are checked while the program executes). The compiler generates verification code to enforce the programming language's dynamic semantics. If a language's compiler can guarantee that the programs it accepts will run without type errors, the language is called strongly typed.


Now,

SDD to carry out type checking for the given expression is given below:

E → id     { E.type = lookup(id.entry) }


E → E1 op E2     { if (E1.type = integer and E2.type = integer) then E.type = integer

        else if (E1.type = integer and E2.type = real) then E.type = real

        else if (E1.type = real and E2.type = integer) then E.type = real

        else if (E1.type = real and E2.type = real) then E.type = real

        else E.type = type-error }


E → E1 relop E2    { if (E1.type = integer and E2.type = integer) then E.type = boolean

                    else E.type = type-error }


E → E1 [E2]     { if (E2.type = int and E1.type = array(s, t)) then E.type = t

                   else E.type = type-error }


E → E1↑     { if E1.type = pointer(t) then E.type = t

                else E.type = type-error }
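A sketch of how rules of this flavor drive a recursive type checker over a tuple-shaped AST. The type representations, the node tags, and the sample symbol table below are illustrative assumptions, not part of the question; the relop rule here takes integer operands, one common textbook choice.

```python
# Recursive type checking over ASTs like ('op', e1, e2); identifiers are strings.
# Structured types: ('array', size, elem_type) and ('pointer', target_type).

SYMTAB = {'i': 'integer', 'x': 'real',
          'arr': ('array', 10, 'integer'), 'p': ('pointer', 'real')}

def typeof(e):
    if isinstance(e, str):                       # E -> id
        return SYMTAB.get(e, 'type-error')
    tag = e[0]
    if tag == 'op':                              # E -> E1 op E2
        t1, t2 = typeof(e[1]), typeof(e[2])
        if t1 == t2 == 'integer':
            return 'integer'
        if {t1, t2} <= {'integer', 'real'}:      # mixed int/real widens to real
            return 'real'
        return 'type-error'
    if tag == 'relop':                           # E -> E1 relop E2
        t1, t2 = typeof(e[1]), typeof(e[2])
        return 'boolean' if t1 == t2 == 'integer' else 'type-error'
    if tag == 'index':                           # E -> E1[E2]
        t1, t2 = typeof(e[1]), typeof(e[2])
        if t2 == 'integer' and isinstance(t1, tuple) and t1[0] == 'array':
            return t1[2]                         # element type t of array(s, t)
        return 'type-error'
    if tag == 'deref':                           # E -> E1 ^
        t1 = typeof(e[1])
        if isinstance(t1, tuple) and t1[0] == 'pointer':
            return t1[1]                         # target type t of pointer(t)
        return 'type-error'
    return 'type-error'

print(typeof(('op', 'i', 'x')))          # integer op real -> real
print(typeof(('index', 'arr', 'i')))     # arr[i] -> integer
print(typeof(('deref', 'p')))            # p^ -> real
print(typeof(('relop', 'i', 'x')))       # mixed comparison -> type-error
```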

3. What is the role of intermediate code generation in the entire compilation process? Convert the following into three address code.

        a+(b-c)*d

10 marks

Intermediate code is used to translate the source code into the machine code. Intermediate code lies between the high-level language and the machine language. The given program in a source language is converted into an equivalent program in an intermediate language by the intermediate code generator. Intermediate codes are machine independent codes. 


Roles of Intermediate code are :

  • It acts as glue between the front end and the back end (between source and machine code).
  • If the compiler translated source code directly into machine code without generating intermediate code, a full native compiler would be required for each new machine.
  • The intermediate code keeps the analysis portion the same for all compilers, so a full compiler is not needed for every unique machine.
  • The intermediate code generator receives input from its predecessor, the semantic analyzer, in the form of an annotated syntax tree.
  • Using the intermediate code, only the synthesis phase of the compiler needs to be changed according to the target machine.
  • The intermediate code generator lowers the level of abstraction from the source level.
  • It completes some syntactic checks and performs further semantic checks; for example, that a break appears only inside a loop or switch.

The intermediate code representation are

  • Graphical representation e.g. Abstract Syntax Tree(AST) , DAGS
  • Postfix notations
  • Three Address codes

Given expression:

a+(b-c)*d

Converting it into three address code:

    t1 = b-c

    t2 = t1*d

    t3 = a+t2
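The conversion above is a post-order walk of the expression tree: operands are translated first, then a fresh temporary names each operator's result. A minimal sketch (the tuple-shaped tree and the temporary-naming scheme are illustrative):

```python
def gen_tac(tree):
    """Emit three-address instructions for a binary expression tree."""
    code = []
    n = 0
    def walk(node):
        nonlocal n
        if isinstance(node, str):            # leaves are already addresses
            return node
        op, left, right = node
        l, r = walk(left), walk(right)       # translate operands first (post-order)
        n += 1
        code.append(f"t{n} = {l} {op} {r}")  # fresh temporary holds the result
        return f"t{n}"
    walk(tree)
    return code

# a + (b - c) * d as a nested tuple tree
for line in gen_tac(('+', 'a', ('*', ('-', 'b', 'c'), 'd'))):
    print(line)
# t1 = b - c
# t2 = t1 * d
# t3 = a + t2
```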

For the given expression, the quadruples are:

       |  op  | arg1 | arg2 | result
  (0)  |  -   |  b   |  c   |  t1
  (1)  |  *   |  t1  |  d   |  t2
  (2)  |  +   |  a   |  t2  |  t3

For the given expression, the triples are:

       |  op  | arg1 | arg2
  (0)  |  -   |  b   |  c
  (1)  |  *   | (0)  |  d
  (2)  |  +   |  a   | (1)

Section B

Attempt any EIGHT questions. (8 × 5 = 40)

4. Define compiler. Explain analysis phase of compiler.

5 marks

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. An important part of a compiler's job is reporting errors in the source program to the programmer.


Analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of the source program. Analysis of source program includes: lexical analysis, syntax analysis and semantic analysis.

1. Lexical Analysis:

In this phase, the lexical analyzer reads the source program and returns its tokens. A token is a sequence of characters that can be treated as a single logical entity (such as an identifier, operator, keyword, or constant).

2. Syntax Analysis:

In this phase, the syntax analyzer takes the tokens produced by the lexical analyzer as input and generates a parse tree as output. The parser checks whether the expressions made from the tokens are syntactically correct according to the rules that define the syntax of the source language.

3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects the type information for the code generation. Semantic analyzer checks whether they form a sensible set of instructions in the programming language or not. Type-checking is an important part of semantic analyzer.

Example of Analysis phase:
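For instance, a minimal sketch of the lexical-analysis step using Python's re module; the token names and the sample statement are illustrative.

```python
import re

# Each token class is a named regular-expression group; SKIP drops whitespace.
TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('ID',     r'[A-Za-z_]\w*'),
    ('ASSIGN', r'='),
    ('OP',     r'[+\-*/]'),
    ('SKIP',   r'\s+'),
]
PATTERN = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))

def tokenize(src):
    """Yield (token-name, lexeme) pairs for the source string."""
    for m in PATTERN.finditer(src):
        if m.lastgroup != 'SKIP':
            yield (m.lastgroup, m.group())

print(list(tokenize("position = initial + rate * 60")))
# [('ID', 'position'), ('ASSIGN', '='), ('ID', 'initial'),
#  ('OP', '+'), ('ID', 'rate'), ('OP', '*'), ('NUMBER', '60')]
```

The syntax and semantic analyzers would then build and type-check a tree from this token stream.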


5. “Symbol table is a necessary component of compiler”, justify this statement with examples.

5 marks

Symbol tables are data structures that are used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phase of a compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier such as its type, its position in storage, and any other relevant information.

Symbol tables typically need to support multiple declarations of the same identifier within a program. The lexical analyzer can create a symbol-table entry and return a token, say id, to the parser along with a pointer to the lexeme. The parser can then decide whether to use a previously created symbol-table entry or to create a new one for the identifier.

The basic operations defined on a symbol table include

  • allocate – to allocate a new empty symbol table
  • free – to remove all entries and free the storage of a symbol table
  • insert – to insert a name in a symbol table and return a pointer to its entry
  • lookup – to search for a name and return a pointer to its entry
  • set_attribute – to associate an attribute with a given entry
  • get_attribute – to get an attribute associated with a given entry
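The operations above can be sketched as a small class. The nested-scope parent link and the dict-of-attributes entry shape are illustrative design choices, not prescribed by the question.

```python
class SymbolTable:
    def __init__(self, parent=None):          # allocate a new empty table
        self.entries = {}                     # name -> attribute dict
        self.parent = parent                  # enclosing scope, if any

    def insert(self, name):                   # insert a name, return its entry
        return self.entries.setdefault(name, {})

    def lookup(self, name):                   # search this and enclosing scopes
        if name in self.entries:
            return self.entries[name]
        return self.parent.lookup(name) if self.parent else None

    def set_attribute(self, name, key, value):
        self.insert(name)[key] = value

    def get_attribute(self, name, key):
        entry = self.lookup(name)
        return entry.get(key) if entry else None

globals_ = SymbolTable()
globals_.set_attribute('rate', 'type', 'real')
locals_ = SymbolTable(parent=globals_)        # table for a nested scope
locals_.set_attribute('i', 'type', 'integer')
print(locals_.get_attribute('i', 'type'))     # integer
print(locals_.get_attribute('rate', 'type'))  # real (found in enclosing scope)
```

Freeing a table is just dropping the reference; a real compiler would also record storage positions and other attributes in each entry.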

6. Given a regular expression (ε + 0)*10. Construct the DFA recognizing the pattern described by this regular expression using syntax tree based reduction.

5 marks view

Given RE;

(ε + 0)*10

The augmented RE of the given RE is:

(ε + 0)*10 #

Syntax tree for the augmented R.E., with leaf positions: 1 for the starred '0', 2 for '1', 3 for the second '0', and 4 for '#'.


Calculating followpos:

followpos(1) = {1, 2}

followpos(2) = {3}

followpos(3) = {4}

Now,

1) Start state of DFA = firstpos(root) = {1, 2} = S0

Use followpos of symbol representing position in RE to obtain the next state of DFA.

2) From S0 = {1, 2}

    1  represents '0'

    2 represents '1'

followpos(1) = {1, 2} = S0

     δ(s0, 0) = S0

followpos(2) = {3} = S1

    δ(S0, 1) = S1

3) From S1 = {3}

    Here 3 represents '0'

followpos(3) = {4} = S2

     δ(S1, 0) = S2

4) From S2 = {4}

    Here 4 represents '#', the end marker, so S2 has no outgoing transitions.

No new states occur, so the construction terminates.

Therefore, the final state is S2 (it contains the position of '#').

Now the resulting DFA of the given regular expression is:

δ(S0, 0) = S0,  δ(S0, 1) = S1,  δ(S1, 0) = S2,  with start state S0 and final state S2.
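The same construction can be sketched in code directly from the followpos table; the position-to-symbol map mirrors the syntax-tree leaves above.

```python
# DFA construction from firstpos/followpos (syntax-tree based reduction).
SYMBOL_AT = {1: '0', 2: '1', 3: '0', 4: '#'}          # leaf positions
FOLLOWPOS = {1: {1, 2}, 2: {3}, 3: {4}, 4: set()}

def build_dfa(start):
    start = frozenset(start)
    states, worklist, trans = {start}, [start], {}
    while worklist:
        s = worklist.pop()
        for sym in ('0', '1'):                        # input alphabet (not '#')
            nxt = set()
            for p in s:
                if SYMBOL_AT[p] == sym:               # union of followpos of
                    nxt |= FOLLOWPOS[p]               # positions labeled sym
            nxt = frozenset(nxt)
            if not nxt:
                continue
            trans[(s, sym)] = nxt
            if nxt not in states:                     # unmarked state: process it
                states.add(nxt)
                worklist.append(nxt)
    finals = {s for s in states if 4 in s}            # states holding '#'
    return states, trans, finals

states, trans, finals = build_dfa({1, 2})             # firstpos(root) = {1, 2}
for (s, sym), t in sorted(trans.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(s), '--', sym, '->', sorted(t))
# [1, 2] -- 0 -> [1, 2]
# [1, 2] -- 1 -> [3]
# [3] -- 0 -> [4]
```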

7. Define the terms token, pattern and lexeme. How input buffer can be used for scanner. Explain.

5 marks

Token

Token is a sequence of characters that can be treated as a single logical entity. For example, Identifiers, Keywords, Operators, Constants etc.

Pattern

A set of string in the input for which the same token is produced as output. This set of string is described by a rule called a pattern associated with the token. For example, “a letter followed by zero or more letters, digits, or underscores.”

Lexeme

A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. 


Input Buffering

Lexical analysis needs to look ahead several characters before a match can be announced. Two input-buffering schemes are useful when lookahead is necessary:

  • Buffer Pair
  • Sentinels

Buffer Pair (2N Buffering):

In this technique the buffer is divided into two halves of N characters each, where N is the number of characters per disk block, e.g. 1024 or 4096. Rather than reading character by character from the file, we read N input characters at once. If there are fewer than N characters left in the input, an EOF marker is placed after them. There are two pointers: the lexeme pointer and the forward pointer.

The lexeme pointer points to the start of the lexeme, while the forward pointer scans the input buffer for the end of the lexeme. When the forward pointer reaches the end of one half, the other half is loaded and the forward pointer moves to the beginning of that half.

Sentinels:

While using the buffer-pair technique, every time the forward pointer is moved we must check that it has not crossed the end of its half, and when it reaches the end of the buffer the other half must be loaded. We can reduce these tests to one by placing a sentinel character at the end of each half. The sentinel can be any character that is not part of the source program; EOF is usually preferred, since it also indicates the end of the source program. It signals the need for some special action (fill the other buffer half, or terminate processing).
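A simplified sketch of the buffer-pair scheme with sentinels. N is tiny here purely for illustration (a real scanner uses the disk-block size, e.g. 1024 or 4096, and reads from a file rather than a string).

```python
import io

N = 4                                    # half-buffer size (1024/4096 in practice)
SENTINEL = '\0'                          # stands in for EOF; never in source text

def scan(source):
    """Yield characters of `source` through two N-char halves with sentinels."""
    stream = io.StringIO(source)

    def load():                          # read one half; pad with sentinels
        chunk = stream.read(N)
        return list(chunk) + [SENTINEL] * (N + 1 - len(chunk))

    buf = load() + [SENTINEL] * (N + 1)  # second half is loaded on demand
    fwd = 0                              # the forward pointer
    while True:
        ch = buf[fwd]
        if ch == SENTINEL:               # one test covers both cases:
            if fwd == N:                 # end of first half: load the second
                buf[N + 1:] = load()
                fwd = N + 1
            elif fwd == 2 * N + 1:       # end of second half: reload the first
                buf[:N + 1] = load()
                fwd = 0
            else:
                return                   # sentinel inside a half: true EOF
        else:
            yield ch
            fwd += 1

print(''.join(scan("position = initial")))   # characters come out in order
```

The single sentinel comparison replaces the two bounds checks the plain buffer pair would need on every advance of the forward pointer.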

8. Find first and follow of all the non terminals in the following grammar.

 E → TA ; A → +TA | ɛ ; T → FB ; B → *FB | ɛ ; F → (E) | id

5 marks

Given grammar:

E → TA
A → +TA | ɛ
T → FB
B → *FB | ɛ
F → (E) | id

Now, the FIRST and FOLLOW sets for the non-terminals are:

Non-terminal |  FIRST()  |  FOLLOW()
     E       |  {(, id}  |  {), $}
     A       |  {+, ɛ}   |  {), $}
     T       |  {(, id}  |  {), +, $}
     B       |  {*, ɛ}   |  {), +, $}
     F       |  {(, id}  |  {*, +, ), $}
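The sets above can be verified with a small fixed-point computation over the grammar; 'eps' stands for ɛ and '$' for the end marker.

```python
GRAMMAR = {
    'E': [['T', 'A']],
    'A': [['+', 'T', 'A'], []],          # [] is the epsilon-production
    'T': [['F', 'B']],
    'B': [['*', 'F', 'B'], []],
    'F': [['(', 'E', ')'], ['id']],
}
NTS = set(GRAMMAR)

def first_of_seq(seq, FIRST):
    """FIRST of a sentential-form suffix; includes 'eps' if it can vanish."""
    out = set()
    for x in seq:
        f = FIRST[x] if x in NTS else {x}
        out |= f - {'eps'}
        if 'eps' not in f:
            return out
    return out | {'eps'}

FIRST = {n: set() for n in NTS}
changed = True
while changed:                            # iterate until no set grows
    changed = False
    for n, prods in GRAMMAR.items():
        for rhs in prods:
            f = first_of_seq(rhs, FIRST)
            if not f <= FIRST[n]:
                FIRST[n] |= f
                changed = True

FOLLOW = {n: set() for n in NTS}
FOLLOW['E'].add('$')                      # start symbol gets the end marker
changed = True
while changed:
    changed = False
    for n, prods in GRAMMAR.items():
        for rhs in prods:
            for i, x in enumerate(rhs):
                if x not in NTS:
                    continue
                tail = first_of_seq(rhs[i + 1:], FIRST)
                add = (tail - {'eps'}) | (FOLLOW[n] if 'eps' in tail else set())
                if not add <= FOLLOW[x]:
                    FOLLOW[x] |= add
                    changed = True

for n in 'EATBF':
    print(n, sorted(FIRST[n]), sorted(FOLLOW[n]))
```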

9. What is Syntax Directed Definition? Define synthesized and inherited attributes with example.

5 marks

A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules. Attributes are associated with grammar symbols and rules are associated with productions. For example:

Production   |  Semantic Rules
E → E1 + T   |  E.val = E1.val + T.val
E → T        |  E.val = T.val


Synthesized Attributes

Attributes of a node that are derived from the values of its child nodes are called synthesized attributes. Assume the following production:

    S → ABC

If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of ABC are synthesized to S.

Example for synthesized attribute:

Production    |  Semantic Rules
E → E1 * E2   |  E.val = E1.val * E2.val
E → E1 + E2   |  E.val = E1.val + E2.val
E → int       |  E.val = int.val

This syntax-directed definition associates with each nonterminal a synthesized attribute called val.

In synthesized attributes, values flow from child to parent in the parse tree; for example, for the string 3*2+5 the value 11 is synthesized at the root.
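That bottom-up flow of val can be sketched as a recursive evaluation over the parse tree, shaped here as nested tuples for brevity.

```python
def eval_val(node):
    if isinstance(node, int):                 # E -> int : E.val = int.val
        return node
    op, e1, e2 = node
    v1, v2 = eval_val(e1), eval_val(e2)       # children's val computed first,
    return v1 * v2 if op == '*' else v1 + v2  # then the parent's val is synthesized

tree = ('+', ('*', 3, 2), 5)                  # parse tree of 3 * 2 + 5
print(eval_val(tree))                         # 11
```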


Inherited Attributes

Attributes of a node that are derived from its parent or sibling nodes are called inherited attributes: if the value of an attribute depends on the node's parent or siblings, it is inherited. Consider the following production:

S → ABC

A can take values from S, B, and C; B from S, A, and C; and likewise C from S, A, and B.

Example for Inherited Attributes:

Production

Semantic Rules

® TL

L.in = T.type

® int

 T.type = integer 

® real

 T.type = real 

® L1, id

 L1.in = L.in;

addtype(id.entry, L.in) 

® id

addtype(id.entry, L.in)

Symbol T is associated with a synthesized attribute type and symbol L is associated with an inherited attribute in.

In inherited attributes, values flow into a node in the parse tree from its parent and/or siblings; for example, for the input int id, id the type int is passed down the list of identifiers.

10. What do you mean by runtime storage allocation? Differentiate static and dynamic allocation.

5 marks

Runtime storage allocation

In runtime storage allocation, the compiler allocates space at compile time only for global variables; space for the variables of procedures is allocated at run time, on the stack or the heap.

  • Examples: C, C++, Java, Fortran
  • Variable access is slower than with static allocation, since addresses are computed through the stack/heap pointer.
  • Recursion can be implemented.

Difference between static and dynamic allocation:

Static allocation: The compiler makes storage-allocation decisions by looking only at the program text, at compile time.

Dynamic allocation: Storage allocation decisions are made only while the program is running.

11. Why is it necessary to optimize code? Explain any two code optimization techniques with example.

5 marks

Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources. It tries to make the code consume fewer resources (CPU, memory) while delivering high speed.

Optimization is necessary in the code generated by simple code generator due to the following reasons:

  • Code optimization enhances the portability of the compiler to the target processor.
  • Code optimization allows consumption of fewer resources (i.e. CPU, Memory).
  • Optimized code has faster execution speed.
  • Optimized code utilizes the memory efficiently.
  • Optimized code gives better performance.
  • Optimized code often promotes re-usability.

Code optimization techniques are:

1. Common-Subexpression Elimination: A common subexpression need not be computed over and over again. Instead, we compute it once, keep the result, and reuse it wherever the expression is encountered again. For example, if t1 = a + b has already been computed and a and b are unchanged, a later occurrence of a + b can simply reuse t1.


2. Dead Code Elimination: Dead code is a variable or the result of some expression computed by the programmer that is never used afterwards. Eliminating such useless computations optimizes the code. For example, an assignment t4 = a - c whose result t4 is never referenced can be removed.
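Both techniques can be sketched as passes over three-address code held as (result, op, arg1, arg2) tuples; the sample code and the liveness seed below are illustrative.

```python
def cse(code):
    """Common-subexpression elimination: reuse the first temporary
    that computed each (op, arg1, arg2)."""
    seen, out = {}, []
    for res, op, a1, a2 in code:
        key = (op, a1, a2)
        if key in seen:                              # computed before:
            out.append((res, '=', seen[key], None))  # replace with a copy
        else:
            seen[key] = res
            out.append((res, op, a1, a2))
    return out

def dead_code_elim(code, live):
    """Drop assignments whose result is never used afterwards."""
    out = []
    for res, op, a1, a2 in reversed(code):           # scan backwards
        if res in live:
            out.append((res, op, a1, a2))
            live |= {a1, a2}                         # its operands become live
    return list(reversed(out))

code = [('t1', '+', 'a', 'b'),
        ('t2', '+', 'a', 'b'),                       # common subexpression
        ('t3', '*', 't2', 'c'),
        ('t4', '-', 'a', 'c'),                       # dead: t4 is never used
        ('x',  '=', 't3', None)]
optimized = dead_code_elim(cse(code), live={'x'})
for q in optimized:
    print(q)
```

The t4 instruction disappears, and t2 becomes a plain copy of t1 that a separate copy-propagation pass would then remove.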

12. Explain about the factors affecting code generation.

5 marks

The final phase in compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. Designing of code generator should be done in such a way so that it can be easily implemented, tested and maintained.

The different factors affecting target code generation includes:

1. Input to the code generator: The input to the code generator is the intermediate representation together with the information in the symbol table. The intermediate representation can take several forms: 

  • Postfix notation, 
  • Syntax tree or DAG,
  • Three address code

The code generation phase requires complete, error-free intermediate code as its input.

2. Target Program: The target program is the output of the code generator. The output can be:

  • Assembly language: It allows subprogram to be separately compiled.
  • Relocatable machine language: It makes the process of code generation easier.
  • Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.

3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set. 

4. Instruction Selection: Efficient and low cost instruction selection is important to obtain efficient code. When we consider the efficiency of target machine then the instruction speed and machine idioms are important factors. The quality of the generated code can be determined by its speed and size.

5. Register Allocation: Proper utilization of registers improves code efficiency. Registers make computations faster than memory accesses, so using them well is important. The use of registers is subdivided into two subproblems:

  • During Register allocation, we select only those set of variables that will reside in the registers at each point in the program.
  • During a subsequent Register assignment phase, the specific register is picked to access the variable.

6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed.