Compiler Design and Construction 2070

Tribhuvan University
Institute of Science and Technology
2070
Bachelor Level / Sixth Semester / Science
Computer Science and Information Technology ( CSC-352 )
( Compiler Design and Construction )
Full Marks: 60
Pass Marks: 24
Time: 3 hours
Candidates are required to give their answers in their own words as far as practicable.
The figures in the margin indicate full marks.

Attempt all questions.                    (10x6=60)

2.  Why are regular expressions used in token specification? Write a regular expression to specify identifiers as in C.

6 marks

The lexical analyzer needs to scan and identify only a finite set of valid strings/tokens/lexemes that belong to the language at hand. It searches for the patterns defined by the language rules. Regular expressions can express such finite languages by defining patterns for finite strings of symbols, so regular expressions are used in token specification.

Regular expressions are algebraic expressions that are used to describe the tokens of a programming language. They are built from three regular operations: union (or), concatenation, and the Kleene star. Parentheses ( and ) are used for grouping.

  • Union of two languages L and M is written as

                L U M = {s | s is in L or s is in M}

  • Concatenation of two languages L and M is written as

                LM = {st | s is in L and t is in M}

  • The Kleene closure of a language L is written as

                L* = {ε} U L U LL U LLL U ...., i.e., zero or more concatenations of L.
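For example, if L = {a, b} and M = {0, 1}, then L U M = {a, b, 0, 1}, LM = {a0, a1, b0, b1}, and L* = {ε, a, b, aa, ab, ba, bb, aaa, ....}.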


Now, in C the regular expression for identifiers can be written using a regular definition as:

    letter → A | B | .... | Z | a | b | .... | z | _

    digit → 0 | 1 | 2 | .... | 9

    identifier → letter(letter | digit)*

The regular expression representing identifiers without using a regular definition:

(A | B | C | .... | Z | a | b | c | .... | z | _) (A | B | C | .... | Z | a | b | c | .... | z | _ | 0 | 1 | .... | 9)*
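As a quick illustration, the following C sketch (the helper name is_c_identifier is my own, not part of the question) checks a string against the pattern letter(letter | digit)*:

    #include <ctype.h>
    #include <stdio.h>

    /* Returns 1 if s matches letter(letter|digit)*, where letter is
       [A-Za-z_] and digit is [0-9], i.e. if s is a valid C identifier. */
    static int is_c_identifier(const char *s)
    {
        if (!(isalpha((unsigned char)*s) || *s == '_'))
            return 0;                 /* must start with a letter or _ */
        for (s++; *s; s++)
            if (!(isalnum((unsigned char)*s) || *s == '_'))
                return 0;             /* later chars: letter, digit or _ */
        return 1;
    }

    int main(void)
    {
        printf("%d %d\n", is_c_identifier("count_1"), is_c_identifier("1count"));
        return 0;                     /* prints: 1 0 */
    }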

3.  Discuss the specification of lexical analyzer generator Lex.

6 marks

A lexical analyzer generator (Lex/Flex) systematically translates regular definitions into C source code for efficient scanning, and the generated code is easy to integrate into C applications. Flex is a later, free implementation of Lex.

  • First, a specification of the lexical analyzer is prepared by creating a program lex.l in the Lex/Flex language.
  • lex.l is run through the Lex compiler to produce a C program lex.yy.c.
  • lex.yy.c consists of a tabular representation of lex.l, together with a standard routine that uses the table to recognize lexemes.
  • lex.yy.c is run through the C compiler to produce an object program a.out, which is a lexical analyzer that transforms an input stream into a sequence of tokens.

A Lex/Flex specification consists of three parts:

1. regular definitions and C declarations enclosed in %{ ... %}

        %%

2. translation rules

        %%

3. user-defined auxiliary procedures

The translation rules are of the form:

    p1     { action1 }

    p2     { action2 }

    …

    pn     { actionn }

Lex/flex regular definitions are of the form:

       name definition

E.g. Digit   [0-9]

Example of a Lex specification:

%{
#include <stdio.h>
%}

%%
[0-9]+    { printf("%s\n", yytext); }
.|\n      { }
%%

int main(void)
{
    yylex();
    return 0;
}
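This scanner prints every run of digits on its own line and silently discards all other characters. It can be built with, for example, flex lex.l && cc lex.yy.c -lfl, where -lfl supplies the default yywrap() routine.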

4.  Consider the grammar:

        

    a. Show that this grammar is ambiguous by constructing two different leftmost derivations for the sentence abab.

    b. Construct the corresponding rightmost derivations for abab.

    c. Construct the corresponding parse trees for abab.

6 marks

Given grammar:
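Assuming the intended grammar is the classic one for this exercise:

    S → aSbS | bSaS | ε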

 

a) Leftmost derivation:
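Two different leftmost derivations of abab, under the assumed grammar:

    Derivation 1:  S ⇒ aSbS ⇒ abS ⇒ abaSbS ⇒ ababS ⇒ abab
    Derivation 2:  S ⇒ aSbS ⇒ abSaSbS ⇒ abaSbS ⇒ ababS ⇒ abab

The two derivations differ in the production applied at the second step (S → ε versus S → bSaS for the first S).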

Since the above grammar produces two different leftmost derivations for the sentence abab, the given grammar is ambiguous.

b) Rightmost derivation:
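The corresponding rightmost derivations, under the assumed grammar:

    Derivation 1:  S ⇒ aSbS ⇒ aSbaSbS ⇒ aSbaSb ⇒ aSbab ⇒ abab
    Derivation 2:  S ⇒ aSbS ⇒ aSb ⇒ abSaSb ⇒ abSab ⇒ abab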

c) The corresponding parse trees for abab are:
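Tree 1 (from the first derivation):

    S
    ├── a
    ├── S → ε
    ├── b
    └── S
        ├── a
        ├── S → ε
        ├── b
        └── S → ε

Tree 2 (from the second derivation):

    S
    ├── a
    ├── S
    │   ├── b
    │   ├── S → ε
    │   ├── a
    │   └── S → ε
    ├── b
    └── S → ε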


5.  Consider the following grammar:

        

    a.  Show steps of shift-reduce parsing for the input string id+id*id.

    b.  Identify conflicts during the parsing.

6 marks

Given grammar:
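Assuming the usual expression grammar:

    E → E + T | T
    T → T * F | F
    F → id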

a)

Input string: id + id * id
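The shift-reduce parse, under the assumed grammar above:

    Stack           Input            Action
    $               id + id * id $   shift
    $ id            + id * id $      reduce F → id
    $ F             + id * id $      reduce T → F
    $ T             + id * id $      reduce E → T
    $ E             + id * id $      shift
    $ E +           id * id $        shift
    $ E + id        * id $           reduce F → id
    $ E + F         * id $           reduce T → F
    $ E + T         * id $           shift (see the conflict in b)
    $ E + T *       id $             shift
    $ E + T * id    $                reduce F → id
    $ E + T * F     $                reduce T → T * F
    $ E + T         $                reduce E → E + T
    $ E             $                accept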


b)

Here a shift-reduce conflict is encountered during the parsing: with $E+T on the stack and * as the next input symbol, the parser can either reduce by E → E + T or shift the *. The conflict is resolved in favour of shifting, which gives * higher precedence than +.

6.  Describe L-attributed definitions. How are L-attributed definitions evaluated?

6 marks

A syntax-directed definition that uses both synthesized and inherited attributes, with the restriction that an attribute may not take values from its right siblings, is called an L-attributed definition.

Mathematically,

A syntax-directed definition is L-attributed if each inherited attribute of Xj on the right side of A → X1 X2 … Xn depends only on

  • The attributes of the symbols X1, X2, …, Xj-1
  • The inherited attributes of A

Consider a production S → ABC. Here, S can take values from A, B, and C (synthesized attributes). A can take values from S only; B can take values from S and A; and C can take values from S, A, and B. No symbol may take values from a sibling to its right.

L-attributed definitions allow a natural order of evaluating attributes: depth-first and left to right. Every S-attributed syntax-directed definition is also L-attributed, since the restriction above applies only to inherited attributes.

Procedure for depth-first evaluation of L-attributed definitions:

    procedure dfvisit(n: node);          /* depth-first evaluation */
    begin
        for each child m of n, from left to right do
        begin
            evaluate inherited attributes of m;
            dfvisit(m)
        end;
        evaluate synthesized attributes of n
    end

Example: for the production A → XY with the semantic rules

    X.i = A.i

    Y.i = X.s

    A.s = Y.s

each inherited attribute depends only on the parent's inherited attribute or on the left sibling's synthesized attribute, so the definition is L-attributed.
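A minimal C sketch of this evaluation (the node layout and attribute names are illustrative, not from the question): the tree hard-codes one production A → XY, each node holds one inherited attribute i and one synthesized attribute s, and dfvisit evaluates them depth-first, left to right:

    #include <stdio.h>

    struct node {
        int i, s;                 /* inherited / synthesized attribute     */
        struct node *child[2];    /* up to two children (X and Y), or NULL */
    };

    static void dfvisit(struct node *a)
    {
        struct node *x = a->child[0], *y = a->child[1];
        if (!x || !y) {           /* leaf: synthesize something from i     */
            a->s = a->i + 1;
            return;
        }
        x->i = a->i;              /* X.i = A.i, then visit X               */
        dfvisit(x);
        y->i = x->s;              /* Y.i = X.s uses the left sibling       */
        dfvisit(y);
        a->s = y->s;              /* finally A.s = Y.s                     */
    }

    int main(void)
    {
        struct node x = {0}, y = {0}, a = {0, 0, {&x, &y}};
        a.i = 10;
        dfvisit(&a);
        printf("A.s = %d\n", a.s);   /* prints: A.s = 12 */
        return 0;
    }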

7.  Write the type expressions for the following types:

a.       An array of pointers to reals, where the array index ranges from 1 to 100.

b.       A function whose domain is two characters and whose range is a pointer to an integer.

6 marks

a) The array elements are pointers to reals, i.e. pointer(real), and the index set is the range 1..100, so the type expression is:

    array(1..100, pointer(real))


b) The domain is a pair of characters, char × char, and the range is a pointer to an integer, pointer(integer), so the type expression is:

    (char × char) → pointer(integer)
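Internally, a compiler represents such type expressions as small trees. The following C sketch (all type and function names are illustrative, not from the question) builds the two expressions above:

    #include <stdlib.h>

    /* Hypothetical representation of type expressions as trees. */
    enum kind { BASIC, POINTER, ARRAY, PAIR, FUNC };

    struct type {
        enum kind k;
        const char *name;        /* for BASIC: "real", "char", "integer" */
        int lo, hi;              /* for ARRAY: the index range           */
        struct type *a, *b;      /* children: element, domain/range      */
    };

    static struct type *mk(enum kind k, const char *name, int lo, int hi,
                           struct type *a, struct type *b)
    {
        struct type *t = malloc(sizeof *t);
        t->k = k; t->name = name; t->lo = lo; t->hi = hi; t->a = a; t->b = b;
        return t;
    }

    int main(void)
    {
        struct type *real = mk(BASIC, "real", 0, 0, NULL, NULL);
        struct type *chr  = mk(BASIC, "char", 0, 0, NULL, NULL);
        struct type *intg = mk(BASIC, "integer", 0, 0, NULL, NULL);

        /* a) array(1..100, pointer(real)) */
        struct type *ta = mk(ARRAY, NULL, 1, 100,
                             mk(POINTER, NULL, 0, 0, real, NULL), NULL);

        /* b) (char x char) -> pointer(integer) */
        struct type *tb = mk(FUNC, NULL, 0, 0,
                             mk(PAIR, NULL, 0, 0, chr, chr),
                             mk(POINTER, NULL, 0, 0, intg, NULL));
        (void)ta; (void)tb;
        return 0;
    }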

8.  What do you mean by three-address code? Write the syntax-directed definition for the following grammar to produce three-address code for assignments

    S -> id = E

    E -> id

6 marks

Three-address code is an intermediate representation in which each instruction uses at most three addresses: two for the operands and one for the result. Each instruction can be described as a 4-tuple: (operator, operand1, operand2, result).

A typical three-address instruction (quadruple) has the form x = y op z, where x, y, and z are names, constants, or compiler-generated temporaries and op is an operator.

For example, the three-address code for a + b*c - d/(b*c) is:

    t1 = b * c
    t2 = a + t1
    t3 = b * c
    t4 = d / t3
    t5 = t2 - t4


Given grammar:

    S -> id = E

    E -> id

The syntax-directed definition for the given grammar to produce three-address code is:

    Production      Semantic Rules
    ------------    ------------------------------------------------
    S → id = E      S.code = E.code || gen(id.place '=' E.place)
    E → id          E.place = id.place; E.code = '' (empty)

Note that S needs no new temporary here; newtemp() would be required only for productions such as E → E1 + E2.
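A minimal C sketch of the two helpers the rules rely on (gen and newtemp are the names used in the SDD; their bodies here are simplified stand-ins that just print the instructions):

    #include <stdio.h>

    static int temp_count = 0;

    /* newtemp(): return a fresh temporary name t1, t2, ...
       (a static buffer is enough for this sketch)          */
    static const char *newtemp(void)
    {
        static char buf[16];
        snprintf(buf, sizeof buf, "t%d", ++temp_count);
        return buf;
    }

    /* gen(lhs, rhs): emit one three-address copy instruction lhs = rhs */
    static void gen(const char *lhs, const char *rhs)
    {
        printf("%s = %s\n", lhs, rhs);
    }

    int main(void)
    {
        /* For the assignment x = y the SDD gives E.place = "y",
           E.code = empty, and S.code = gen("x", E.place):      */
        gen("x", "y");                /* prints: x = y */

        /* newtemp() comes into play for productions such as
           E -> E1 + E2, e.g. for x = y + z:                   */
        const char *t = newtemp();
        printf("%s = y + z\n", t);    /* t1 = y + z */
        gen("x", t);                  /* x = t1     */
        return 0;
    }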

9.  Discuss the issues in the design of a simple code generator.

6 marks

The final phase in the compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. The code generator should be designed so that it can be easily implemented, tested, and maintained.

The issues in code generator design include:

1. Input to the code generator: The input to the code generator is the intermediate representation of the source program, together with the information in the symbol table. There are several choices for the intermediate representation:

  • Postfix notation, 
  • Syntax tree or DAG,
  • Three address code

The code generation phase assumes that its input is complete and free of errors.

2. Target Program: The target program is the output of the code generator. The output can be:

  • Assembly language: it makes the process of code generation somewhat easier.
  • Relocatable machine language: it allows subprograms to be compiled separately.
  • Absolute machine language: it can be placed in a fixed location in memory and executed immediately.

3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set. 

4. Instruction Selection: Efficient, low-cost instruction selection is important for obtaining efficient code. Instruction speeds and machine idioms are important factors in the efficiency of the target code, and the quality of the generated code is determined by its speed and size (see the example after this list).

5. Register Allocation: Proper utilization of registers improves code efficiency, since computations in registers are much faster than computations in memory. Register use is subdivided into two subproblems:

  • During register allocation, we select the set of variables that will reside in registers at each point in the program.
  • During the subsequent register assignment phase, a specific register is picked for each such variable.

6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed. 
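For instruction selection, the classic illustration is the statement a = a + 1. A straightforward generator might emit

    MOV  a, R0
    ADD  #1, R0
    MOV  R0, a

whereas a target machine with an increment idiom can do the same work in the single instruction INC a, which is both smaller and faster.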

10.  Define the following optimization techniques:

a.       Unreachable code elimination

b.       Flow-of-control optimization

6 marks

a.       Unreachable code elimination

Unreachable code is a part of the source program that can never be executed because of the program's control flow, for example code that follows an unconditional return or sits under a condition that is always false. Eliminating it shrinks the program without changing its behaviour. For example, see the C sketch below.
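Both marked statements in this small C sketch (illustrative only) can be removed by the optimizer:

    #include <stdio.h>

    static int f(int debug)
    {
        if (0) {                       /* the condition is always false,  */
            printf("debug output\n");  /* so this block is unreachable    */
        }
        return debug + 1;
        printf("after return\n");      /* unreachable: follows a return   */
    }

    int main(void)
    {
        printf("%d\n", f(1));          /* prints 2 */
        return 0;
    }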

b.       Flow-of-control optimization

Using peephole optimization, unnecessary jumps can be eliminated from either the intermediate code or the target code: an unlabeled instruction immediately following an unconditional jump can be removed, and a jump to a jump can be collapsed into a single jump. For example, the sequence

    goto L1
    . . . . . . .
L1: goto L2

can be replaced by

    goto L2
    . . . . . . .
L1: goto L2

If there are no other jumps to L1, the statement L1: goto L2 can then itself be eliminated, provided it is preceded by an unconditional jump.

1.  What do you mean by compiler? Explain the semantic analysis phase of compiler construction.

6 marks

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. An important part of a compiler is reporting errors in the source program to the programmer.

Block diagram of the phases of a compiler:


In the semantic analysis phase, the semantic analyzer checks the source program for semantic errors and collects type information for code generation. It checks whether the constructs form a sensible set of instructions in the programming language. Type checking is an important part of this phase.

The semantic analyzer performs checks that the parser cannot: determining the types of expressions, checking that all statements are correct with respect to the typing rules, that variables are declared before they are used, that functions are called with the proper number of parameters, and so on. This phase is carried out using information from the parse tree and the symbol table.

Example:
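A few C statements that the scanner and the parser both accept but that semantic analysis rejects (the comments name the error the analyzer reports; the fragment is deliberately erroneous):

    int main(void)
    {
        int x = "hello";   /* type error: a string assigned to an int     */
        y = x + 1;         /* y is used without being declared            */
        return x(3);       /* x is an int, not a function, so no call     */
    }

Each of these lines is syntactically well formed, so only semantic analysis, using the typing rules and the symbol table, can detect the errors.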