Compiler Design and Construction 2068

Tribhuwan University
Institute of Science and Technology
2068
Bachelor Level / Sixth Semester / Science
Computer Science and Information Technology ( CSC-352 )
( Compiler Design and Construction )
Full Marks: 60
Pass Marks: 24
Time: 3 hours
Candidates are required to give their answers in their own words as far as practicable.
The figures in the margin indicate full marks.

Attempt all questions.                    (10x6=60)

1.  What do you mean by compiler? How source program analyzed? Explain in brief.

6 marks view

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. As an important part of a compiler is error showing to the programmer.


Analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of the source program. Analysis of source program includes: lexical analysis, syntax analysis and semantic analysis.

1. Lexical Analysis:

In this phase, lexical analyzer reads the source program and returns the tokens of the source program. Token is a sequence of characters that can be treated as a single logical entity (such as identifier, operators, keywords, constants etc.).

2. Syntax Analysis:

In this phase, the syntax analyzer takes the token produced by lexical analyzer as input and generates a parse tree as output. In syntax analysis phase, the parser checks that the expression made by the token is syntactically correct or not, according to the rules that define the syntax of the source language.

3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects the type information for the code generation. Semantic analyzer checks whether they form a sensible set of instructions in the programming language or not. Type-checking is an important part of semantic analyzer.

Example of Analysis phase:



2.  Discuss the role of symbol table in compiler design.

6 marks view

Symbol Table is an important data structure created and maintained by the compiler in order to keep track of semantics of variable i.e. it stores information about scope and binding information about names, information about instances of various entities such as variable and function names, classes, objects, etc. It is built in lexical and syntax analysis phases. The information is collected by the analysis phases (front end) of compiler and is used by synthesis phases (back end) of compiler to generate code. It is used by compiler to achieve compile time efficiency.

A symbol table may have the following roles/functions depending upon the language in hand:

  • To store the names of all entities in a structured form at one place.
  • To verify if a variable has been declared or not.
  • To determine the scope of a name (scope resolution).
  • To access information associated with a given name.
  • To add new information with a given name.
  • To delete a name or group of names from the table.
  • To implement type checking, by verifying assignments and expressions in the source code are semantically correct.

3.  Convert the regular expression '0 +(1+ 0)*00' first into NFA and then into DFA using Thomson’s and Subset Construction methods.

6 marks view

Given RE:

+(1+ 0)*00

Converting into NFA using Thomson's method:

For 0,


For 1,


For 1+0


For (1+0)*


For (1+0)*00


For 0+(1+0)*00

Which is the required NFA. Now converting it into a DFA:

The initial state of the NFA is {A}.

Therefore, the initial state of the DFA is:

S0∈-closure({A}) = {A, B, C, D, E, F, J}

Here,

Σ = {0, 1)

Now,

Dtran[S0, 0] = âˆˆ-closure(move(S0, 0)) = âˆˆ-closure({H, K, M}) = {D, E, F, H, I, J, K, M, N} = S1 (say)

Dtran[S0, 1] = âˆˆ-closure(move(S0, 1)) = âˆˆ-closure({G}) = {D, E, F, G, I, J} = S2

Dtran[S1, 0] = âˆˆ-closure(move(S1, 0)) = âˆˆ-closure({H, K, L}) = {D, E, F, H, I, J, L, N} = S3

Dtran[S1, 1] = âˆˆ-closure(move(S1, 1)) = âˆˆ-closure({G}) = {D, E, F, G, I, J} = S2

Dtran[S2, 0] = âˆˆ-closure(move(S2, 0)) = âˆˆ-closure({H, K}) = {D, E, F, H, I, J, K} = S4

Dtran[S2, 1] = âˆˆ-closure(move(S2, 1)) = âˆˆ-closure({G}) = {D, E, F, G, I, J} = S2

Dtran[S3, 0] = âˆˆ-closure(move(S3, 0)) = âˆˆ-closure({H, K}) = {D, E, F, H, I, J, K} = S4

Dtran[S3, 1] = âˆˆ-closure(move(S3, 1)) = âˆˆ-closure({G}) = {D, E, F, G, I, J} = S2

Dtran[S4, 0] = âˆˆ-closure(move(S4, 0)) = âˆˆ-closure({H, K, L}) = {D, E, F, H, I, J, L, N} = S3

Dtran[S4, 1] = âˆˆ-closure(move(S4, 1)) = âˆˆ-closure({G}) = {D, E, F, G, I, J} = S2

Here, S1 & S3 are the accepting state of DFA, since N is a member of both S1 & S3 states.

The equivalent DFA is;

4.  Consider the following grammar:

            

a.       Eliminate Left recursion.

b.      Compute FIRST & FOLLOW for the symbol in the grammar.

6 marks view

Given grammar,

a) Removing left recursion of the given grammar;

    â†’ (L) | a

    â†’ SL'

    L' â†’ ,SL' | âˆˆ

b) Now, the FIRST and FOLLOW for the given symbols are:

Non-terminals

FIRST()

FOLLOW()

S

{(, a}

{$, , , )}

L

{(, a}

{)}

L’

{, , ∈}

{)}

5.  Consider the grammar


Calculate the canonical LR(0) items.

6 marks view

The augmented grammar of given grammar is,

    C‘→ C

    C → AB

    A → a

    B → b

Next, we obtain the canonical collection of sets of LR(0) items, as follows,

I0 = closure ({C‘ → •C}) = {C‘ → •C, C → •AB, A → •a}

goto(I0, C) = closure(C‘ → C•) = { C‘ → C•} = I1

goto(I0, A) = closure(C → A•B) = { C → A•B, B → •b} = I2

goto(I0, a) = closure(A → a•) = { A → a• } = I3

goto(I2, B) = closure(C → AB•) = { C → AB•} = I4

goto(I2, b) = closure(B → b•) = { B → b•} = I5

Now, the canonical collection of sets of LR(0) items for given grammar are:


6.  Describe the inherited and synthesized attributes of grammar using an example.

6 marks view

Synthesized Attributes

The attribute of node that are derived from its children nodes are called synthesized attributes. Assume the following production:

    S → ABC

If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of ABC are synthesized to S.

Example for synthesized attribute:

Production

Semantic Rules

E â†’ E1 * E  

{E.val = E1.val * E2.val}

E â†’ E1 + E2   

{E.val = E1.val +E2.val}

E â†’ int   

{E.val = int.val}

The Syntax Directed Definition associates to each non terminal a synthesized attribute called val.

In synthesized attributes, value owns from child to parent in the parse tree. For example, for string 3*2+5


Inherited Attributes

The attributes of node that are derived from its parent or sibling nodes are called inherited attributes. If the value of the attribute depends upon its parent or siblings then it is inherited attribute. As in the following production,

S → ABC

"A‟ can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take values from S, A, and B.

Example for Inherited Attributes:

Production

Semantic Rules

® TL

L.in = T.type

® int

 T.type = integer 

® real

 T.type = real 

® L1, id

 L1.in = L.in;

addtype(id.entry, L.in) 

® id

addtype(id.entry, L.in)

Symbol T is associated with a synthesized attribute type and symbol L is associated with an inherited attribute in.

In inherited attributes, values flows into a node in the parse tree from parents and/or siblings. For example, for input: int id, id

7.  Write the type expressions for the following types:

               a. An array of pointers to real’s, where the array index range from 1 to 100.

   b. Function whose domains are function from two characters and whose range is a pointer of integer.

6 marks view

a) 

T→ real     { T.type = real}

E → array [100, T]     { if T.Type = real then T.type = array(1..100, T) else type_error() }

E→ Pointer[E1]     { if (E1.type = array[100, real)) then E.type = E1.type else E.type = type-error }


b)

T→ int         { T.type = int }

T →char     { T.type = char}

T→ Pointer[T1]     { T.type = Pointer(T1.type) }

E→ E1[E2]     { if ( E2.type = (char, char) and E1.type = (char, char) → Pointer(int) then E.type = E1.type else type_error }

8.  What do you mean by intermediate code? Explain the role of intermediate code in compiler design.

6 marks view

Intermediate code is used to translate the source code into the machine code. Intermediate code lies between the high-level language and the machine language. The given program in a source language is converted into an equivalent program in an intermediate language by the intermediate code generator. Intermediate codes are machine independent codes. 

Compiler - Intermediate Code Generation - Tutorialspoint

Roles of Intermediate code are :

  • It acts as a glue between front-end and backend (or source and machine codes).
  • If the compiler directly translates source code into the machine code without generating intermediate code then a full native compiler is required for each new machine.
  • The intermediate code keeps the analysis portion same for all the compilers that's why it doesn't need a full compiler for every unique machine.
  • Intermediate code generator receives input from its predecessor phase and semantic analyzer phase. It takes input in the form of an annotated syntax tree.
  • Using the intermediate code, the second phase of the compiler synthesis phase is changed according to the target machine.
  • Intermediate code generator lowers abstraction from source level.
  • Complete some syntactic checks, perform more semantic checks. For e.g. break should be inside loop or switch only.

The intermediate code representation are

  • Graphical representation e.g. Abstract Syntax Tree(AST) , DAGS
  • Postfix notations
  • Three Address codes

9.  What is operation of simple code generator? Explain.

6 marks view

The final phase in compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program.

        Fig: Position of code generator

A simple code generator generates target code for a sequence of three- address statements and effectively uses registers to store operands of the statements. For example: consider the three-address statement a := b+c It can have the following sequence of codes:

ADD Rj, Ri     Cost = 1    // if Ri contains b and Rj contains c

(or)

ADD c, Ri     Cost = 2    // if c is in a memory location

(or)

MOV c, Rj     Cost = 3    // move c from memory to Rj and add

ADD Rj, Ri

10.  Why optimization is often required in the code generated by simple code generator? Explain the unreachable code optimization.

6 marks view

Code optimization is used to improve the intermediate code so that the output of the program could run faster and takes less space. It removes the unnecessary lines of the code and arranges the sequence of statements in order to speed up the program execution without wasting resources. It tries to improve the code by making it consume less resources (i.e. CPU, Memory) and deliver high speed.

Optimization is required in the code generated by simple code generator due to the following reasons:

  • Code optimization enhances the portability of the compiler to the target processor.
  • Code optimization allows consumption of fewer resources (i.e. CPU, Memory).
  • Optimized code has faster execution speed.
  • Optimized code utilizes the memory efficiently.
  • Optimized code gives better performance.
  • Optimized code often promotes re-usability.

Unreachable code optimization

The unreachable code is a part of the source code that will never be executed due to inappropriate exit points/control flow.. By eliminating these useless things from a code, the code will get optimized. For e.g.

An unlabeled instruction immediately following an unconditional jump can be removed. For e.g.