Compiler Design and Construction 2072

Tribhuwan University
Institute of Science and Technology
2072
Bachelor Level / Sixth Semester / Science
Computer Science and Information Technology ( CSC-352 )
( Compiler Design and Construction )
Full Marks: 60
Pass Marks: 24
Time: 3 hours
Candidates are required to give their answers in their own words as far as practicable.
The figures in the margin indicate full marks.

Attempt all questions.                    (10x6=60)

1.  Explain the analysis phase of the compiling a source program with example.

6 marks view

Analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of the source program. Analysis of source program includes: lexical analysis, syntax analysis and semantic analysis.

Block diagram for phases of compiler:

COMPILER DESIGN: Phases of a compiler

1. Lexical Analysis:

In this phase, lexical analyzer reads the source program and returns the tokens of the source program. Token is a sequence of characters that can be treated as a single logical entity (such as identifier, operators, keywords, constants etc.).

2. Syntax Analysis:

In this phase, the syntax analyzer takes the token produced by lexical analyzer as input and generates a parse tree as output. In syntax analysis phase, the parser checks that the expression made by the token is syntactically correct or not, according to the rules that define the syntax of the source language.

3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects the type information for the code generation. Semantic analyzer checks whether they form a sensible set of instructions in the programming language or not. Type-checking is an important part of semantic analyzer.

Example of Analysis phase:


2.  What are the functional responsibilities of a lexical analyzer? Write a lexical analyzer that recognizes the identifier of C language.

6 marks view

Lexical Analyzer reads the input characters of the source program, groups them into lexemes, and produces a sequence of tokens for each lexeme. The tokens are then sent to the parser for syntax analysis.

Responsibilities of a lexical analyzer are:

  • It reads the input character and produces output sequence of tokens that the Parser uses for syntax analysis.
  • Lexical analyzer helps to identify token into the symbol table.
  • Lexical Analyzer is also responsible for eliminating comments and white spaces from the source program.
  • It also generates lexical errors.
  • Lexical analyzer is used by web browsers to format and display a web page with the help of parsed data from JavsScript, HTML, CSS

Second part:

/* Lex program to recognize the identifier of C language */

% {
#include <stdio.h>
%}
% %
^[a - z A - Z _ ][a - z A - Z 0 - 9 _ ]* printf("Valid Identifier");
^[^a - z A - Z _] printf("Invalid Identifier");
. ;
% %
main()
{
 yylex();
}

3.  What are regular definitions? Given the following CFG,


Write the regular definitions for the terminals used. Also, construct the transition diagram for id and num.

6 marks view

If  Σ is an alphabet of basic symbols, then a regular definition is a sequence of definition of the form

d1  r1

d2  r2

.....

dn  rn

where, di is a distinct name and ri is a regular expression over symbols in  Σ ∪ {d1, d2, ...., di-1}


Now,

Regular definitions for the terminals used in given grammar are:

digit → [0-9]
num → digit+(. digit+)?(E[+|-]? digit+)?
letter → [A-Za-z]
id → letter ( letter | digit )*
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>

Transition diagram for id:

trans dia id

Here 9, 10 & 11 are the name of states.

Transition diagram for num:

trans dia num

4.  Given a regular expression (a+) bc*. Construct the DFA recognizing the pattern described by the regular expression using syntax tree based reduction.

6 marks view

Given RE;

(a+bc*

The augmented RE of the given RE is:

(a+bc* #

Syntax tree for augmented R.E. is

Calculating followpos:

followpos(1) = {2}

followpos(2) = {3, 4}

followpos(3) = {3, 4}

Now,

1) Start state of DFA = firstpos(root) = {1, 2} = S0

Use followpos of symbol representing position in RE to obtain the next state of DFA.

2) From S0 = {1, 2}

    1  represents 'a'

    2 represents 'b'

followpos(1) = {2} = S1

     δ(s0, a) = S1

followpos(2) = {3, 4} = S2

    δ(S0, b) = S2

3) From S1 = {2}

    Here 2 represents 'b'

followpos(2) = {3, 4} = S2

     δ(S1, b) = S2

4) From S2 = {3, 4}

    Here 3 represents 'c'

    4 represents '#'

followpos(3) = {3, 4} = S2

    δ(S2, c) =  S2

Now there was no new states occur.

Therefore, Final state = {S2}

Now the resulting DFA of given regular expression is:


5.  Explain in detail about the recursive-descent parsing with example.

6 marks view

Recursive decent parsing is a top-down parsing technique that uses a set of recursive procedure to scan its input from left to right. It may involve backtracking. It starts from the root node (start symbol) and use the first production, match the input string against the production rules and if there is no match backtrack and apply another rule.

Rules:

1. Use two pointers iptr for pointing the input symbol to be read and optr for pointing the symbol of output string that is initially start symbol S.

2. If the symbol pointed by optr is non-terminal use the first production rule for expansion.

3. While the symbol pointed by iptr and optr is same increment the both pointers.

4. The loop at the above step terminates when

    a. A non-terminal is encountered in output 

    b. The end of input is reached. 

    c. Symbol pointed by iptr and optr is unmatched.

5. If 'a' is true, expand the non-terminal using the first rule and goto step 2

6. If 'b' is true, terminate with success

7. If 'c' is true, decrement both the pointers to the place of last non-terminal expansion and use the next production rule for non-terminal

8. If there is no more production and 'b' is not true report error.

Example:

Consider a grammar

S → cAd

A → ab | a

Input string is “cad”

Initially, iptr = optr = 0

Input

Output

Rules Fired

iptr(cad)

optr(S)

Try S → cAd

iptr(cad)

optr(cAd)

Match c

c iptr(ad)

c optr(Ad)

Try A → ab

c iptr(ad)

c optr(abd)

Match a

ca iptr(d)

ca optr(bd)

dead end, backtrack

c iptr(ad)

c optr(Ad)

Try A → a

c iptr(ad)

c optr(ad)

Match a

ca iptr(d)

ca optr(d)

Match d

cad iptr(eof)

cad optr(eof)

 

Parsing completed.

 

6.  Given the following grammar:

      

Construct the parsing table for this grammar for non-recursive predictive parser.
6 marks view

Given Grammar;

  

Now, Calculating First and Follow:

FIRST

First(E) = First(T) = First(F) = { (, id }

First(E) = { +, ∈ }

First(T’) = { *, ∈ }

FOLLOW

Follow(E) = { ), $}

Follow(E) = { ), $}

Follow(F) = { *, +, ), $ }

Follow(T) = { +, ) , $}

Follow(T’) = { +, ) , $}

Now, the Parsing table for given grammar is as below:


7.  Define the term ‘item’, ‘closure’ and ‘goto’ used in LR parsing. Compute the SLR DFA for the following grammar.

        

6 marks view

Item

An “item” is a production rule that contains a dot(.) somewhere in the right side of the production. For example, A→ .αAβ , A→ α.Aβ, A→ αA.β, A→ αAβ. are items if there is a production A→ αAβ in the grammar.

Closure

If I is a set of items for a grammar G, then closure(I) is the set of LR(0) items constructed from I using the following rules:

 1. Initially, every item in I is added to closure(I).

 2. If A→ α.Bβ is in closure(I) and B →γ is a production, then add the item B → .γ to I if it is not already there.

Apply this rule until no more new items can be added to closure(I).

goto

If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I, X) is defined as follows:

    If A → α•Xβ in I then every item in closure({A → αX•β}) will be in goto(I, X).


Now,

The augmented grammar of the given grammar is,

    S‘ → S

    S → CC

    C → aC | b

Next, we obtain the canonical collection of sets of LR(0) items, as follows,

I0 = closure ({S‘ → •S}) = {S‘ → •S, S → •CC, C → •aC,  C → •b}

goto (I0, S) = closure ({S‘ → S•}) = {S‘ → S•} = I1

goto (I0, C) = closure ({S → C•C}) = {S → C•C, C → •aC, C→ •b} = I2

goto (I0, a) = closure ({C→ a•C}) = {C → a•C, C → •aC, C → •b} = I3

goto (I0, b) = closure ({C → b•}) = {C → b•} = I4

goto (I2, C) = closure ({S → CC•}) = {S → CC•} = I5

goto (I2, a) = closure ({C → a•C}) = {C → a•C, C → •aC, C → •b} = I3

goto (I2, b) = closure ({C → b•}) = {C → b•} = I4

goto (I3, C) = closure ({C → aC•}) = {C → aC•} = I6

goto (I3, a) = closure ({C → a•C}) = {C→ a•C, C → •aC, C → •b} = I3

goto (I3, b) = closure ({C → b•}) = {C → b•} = I4

Now, the SLR DFA of the given grammar is:


8.  For the following grammar,

    

    Annotate the grammar with syntax directed definitions using synthesized attributes. Remove left recursion from the grammar and re-write the annotation correspondingly.

6 marks view

First part:

E → E + E  {E.val = E1.val + E2.val}

E → E * E  {E.val = E1.val * E2.val}

E → (E)  {E.val = E1.val}

E →id  {E.val = id.lexval}

Second part:

Removing left recursion as,

E → (E)R | id R

R → +ER1 | *ER1 | ε

Now add the attributes within this non-left recursive grammar as,

E → (E) {R.in = E1.val}R {E.val = R.s}

E →id {R.in = id.lexval} R1 { E.val = R.s}

R → +E {R1.in = E.val+R.in}R1 {R.s = R1.s}

R → *E { R1.in = E.val*R.in }R1 { R.s = R1.s }

R → ε { R.s = R.in}

9.  Consider the following grammar for arithmetic expression using an operator ‘op’ to integer or real numbers.

    E → E1 op E2 | num.num | num | id

Give the syntax directed definitions to determine the type of expression as when two integers are used in expression, resulting type is integer otherwise real.
6 marks view

E → id { E.type = lookup(id.entry)}

E → num {E.type = integer }

E → num.num { E.type = real}

E → E1 op E2 { E.type = if(E1.type = integer and E2.type = integer)

                        then integer

                        else if (E1.type = integer and E2.type = real)

                        then real

                        else if (E1.type = real and E2.type = integer)

                        then real

                        else if (E1.type = real and E2.type = real)

                        then real

                        else type_error()

            }

10.  What is three address code? Translate the expression a = b * - c + b * - c in to three address code statement.

6 marks view

The address code that uses at most three addresses, two for operands and one for result is called three address code. Each instruction in three address code can be described as a 4-tuple: (operator, operand1, operand2, result).

A quadruple (Three address code) is of the form: x = y op z where x, y and z are names, constants or compiler-generated temporaries and op is any operator.

Followings are the 3-address code statements for different statements.

  • Assignment statement:

            x = y op z ( binary operator op)

             x = op y ( unary operator op)

  • Copy statement: x = z 
  • Unconditional jump: goto L ( jump to label L)
  • Conditional jump: if x relop y goto L
  • Procedure call:

            param x1

            param x2

            . .

            ..

            param xn

            call p, n i.e. call procedure P(x1,x2,….. xn)

  • Indexed assignment:

            x = y[i]

            x[i] = y

  • Address and pointer assignments:

            x = &y

            x = *y

            *x = y


Given expression 

a = b * - c + b * - c

The three address code of given statement is given below:

t1 = -c

t2 = b * t1

t3 = -c

t4 = b * t3

t5 = t2 + t4

a = t5