Compiler Design and Construction 2071-II

Tribhuvan University
Institute of Science and Technology
2071-II
Bachelor Level / Sixth Semester / Science
Computer Science and Information Technology ( CSC-352 )
( Compiler Design and Construction )
Full Marks: 60
Pass Marks: 24
Time: 3 hours
Candidates are required to give their answers in their own words as far as practicable.
The figures in the margin indicate full marks.

Attempt all questions.                     (10x6=60)

1.  Define the compiler. Explain the phases of compiler.

[6 marks]

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. An important part of a compiler is reporting errors in the source program to the programmer.

A compiler operates in phases. A phase is a logically cohesive operation that takes the source program in one representation and produces output in another representation. The phases of a compiler are described below:

1. The analysis phase breaks up the source program into its constituent pieces and creates an intermediate representation of it. Analysis of the source program includes lexical analysis, syntax analysis, and semantic analysis.

2. The synthesis phase constructs the desired target program from the intermediate representation. The synthesis part of the compiler consists of the following phases: intermediate code generation, code optimization, and target code generation.

(Figure: Phases of a compiler)

1. Lexical Analysis:

In this phase, the lexical analyzer reads the source program and returns its tokens. A token is a sequence of characters that can be treated as a single logical entity (such as an identifier, operator, keyword, or constant).

Example:

    Input String: c = a + b * 3

    Tokens: id1 = id2 + id3 * 3

2. Syntax Analysis:

In this phase, the syntax analyzer takes the tokens produced by the lexical analyzer as input and generates a parse tree as output. The parser checks whether the expression formed by the tokens is syntactically correct according to the rules that define the syntax of the source language.

Example: parse tree for id1 = id2 + id3 * 3

             =
            / \
         id1   +
              / \
           id2   *
                / \
             id3   3
3. Semantic Analysis:

In this phase, the semantic analyzer checks the source program for semantic errors and collects type information for code generation. It checks whether the statements form a sensible set of instructions in the programming language. Type checking is an important part of semantic analysis.

Example: assuming id2 and id3 are of type float, the type checker inserts a conversion for the integer constant 3:

    id1 = id2 + id3 * inttofloat(3)
4. Intermediate Code Generation:

If the program is syntactically and semantically correct, the intermediate code generator generates simple machine-independent intermediate code. The intermediate code should be generated in such a way that it can easily be translated into the target machine code.

Example:

    t1 = 3.0;

    t2 = id3 * t1;

    t3 = id2 + t2;

    id1 = t3;

5. Code Optimization:

This phase improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements in order to speed up program execution without wasting resources.

Example:

    t2 = id3 * 3.0;

    id1 = id2 + t2;

6. Code Generation:

Code generation is the final stage of the compilation process. It takes the optimized intermediate code as input and maps it to the target machine language.

Example:

    MOV  R1, id3

    MUL  R1, #3.0

    MOV  R2, id2

    ADD  R1, R2

    MOV  id1, R1

Symbol Table

Symbol tables are data structures used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases of the compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier such as its type, its position in storage, and any other relevant information.
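As a minimal sketch in C (the field names here are illustrative, not from any particular compiler), a symbol-table entry might look like:

    /* A minimal symbol-table entry; field names are illustrative. */
    struct symbol {
        char *name;    /* the lexeme, e.g. "position"                  */
        char *type;    /* type information collected during analysis   */
        int   offset;  /* position in storage, e.g. stack-frame offset */
    };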

Error Handling

Whenever an error is encountered during the compilation of the source program, an error handler is invoked. The error handler generates a suitable message describing the error encountered.

2.  Design a lexical analyzer generator and explain it.

[6 marks]

A lexical analyzer generator (Lex/Flex) systematically translates regular definitions into C source code for efficient scanning. The generated code is easy to integrate into C applications. Flex is a later version of Lex.

  • First, a specification of the lexical analyzer is prepared by writing a program lex.l in the Lex/Flex language.
  • lex.l is run through the Lex compiler to produce a C program lex.yy.c.
  • lex.yy.c consists of a tabular representation of lex.l, together with a standard routine that uses the table to recognize lexemes.
  • lex.yy.c is run through the C compiler to produce an object program a.out, which is a lexical analyzer that transforms an input stream into a sequence of tokens.

A Lex/Flex specification consists of three parts:

1. regular definitions, C declarations in %{ %}

        %%

2. translation rules

        %%

3. user-defined auxiliary procedures

The translation rules are of the form:

    p1     { action1 }

    p2     { action2 }

…

    pn     { actionn }

Lex/Flex regular definitions are of the form:

       name definition

E.g. Digit   [0-9]

Example of a Lex specification:

%{
#include <stdio.h>
%}
%option noyywrap
%%
[0-9]+    { printf("%s\n", yytext); }
.|\n      { /* ignore everything else */ }
%%
int main()
{
    yylex();
    return 0;
}
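Assuming the specification above is saved as scanner.l (a hypothetical file name), it can be built and run as follows; since %option noyywrap is given, no extra Lex library is needed:

    flex scanner.l
    cc lex.yy.c -o scanner
    echo "foo 123 bar 45" | ./scanner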

3.  Differentiate between top-down parsing and bottom-up parsing.

[6 marks]

| Top-down Parsing | Bottom-up Parsing |
| --- | --- |
| It is a parsing strategy that first looks at the highest level of the parse tree and works down the parse tree using the rules of the grammar. | It is a parsing strategy that first looks at the lowest level of the parse tree and works up the parse tree using the rules of the grammar. |
| It attempts to find the leftmost derivation for an input string. | It attempts to reduce the input string to the start symbol of the grammar. |
| Parsing starts from the top (the start symbol of the parse tree) and proceeds down to the leaf nodes. | Parsing starts from the bottom (the leaf nodes of the parse tree) and proceeds up to the start symbol. |
| It uses leftmost derivation. | It traces out a rightmost derivation in reverse. |
| Its main decision is which production rule to use in order to construct the string. | Its main decision is when to apply a production rule to reduce the string toward the start symbol. |
| A top-down parser can be easily structured and formed. | It is difficult to produce a bottom-up parser by hand. |
| It is less powerful. | It is more powerful. |
| Error detection is easy. | Error detection is difficult. |
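To make the top-down strategy concrete, here is a minimal recursive-descent parser sketch in C for the toy grammar E → id ( '+' id )*; the single-character token handling is an illustrative simplification, not a production design:

    #include <stdio.h>
    #include <stdlib.h>

    static const char *input;   /* e.g. "a+b+c", single-letter identifiers */

    static void error(void) { printf("syntax error\n"); exit(1); }

    static void match(char expected) {
        if (*input == expected) input++;
        else error();
    }

    /* E -> id ('+' id)* : expand from the start symbol downward */
    static void E(void) {
        if (*input >= 'a' && *input <= 'z') match(*input);   /* id */
        else error();
        while (*input == '+') {        /* decide which production applies next */
            match('+');
            if (*input >= 'a' && *input <= 'z') match(*input);
            else error();
        }
    }

    int main(void) {
        input = "a+b+c";
        E();
        if (*input == '\0') printf("accepted\n");
        else error();
        return 0;
    }

A bottom-up (shift-reduce) parser for the same grammar would instead push tokens onto a stack and reduce id and E + id to E, working up toward the start symbol.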

4.  Translate the arithmetic expression a*-(b+c) into syntax tree. Explain the ambiguous grammar.

[6 marks]

The syntax tree for the given arithmetic expression a*-(b+c) is shown below; the unary minus applies to the subtree for (b+c):

        *
       / \
      a   -
          |
          +
         / \
        b   c
If the same terminal string can be derived from a grammar by two or more distinct leftmost (or rightmost) derivations, the grammar is said to be ambiguous, i.e. from an ambiguous grammar we can get two or more distinct parse trees for the same terminal string.

For example:

E → E+E | E*E | id

String: id+id*id

Derivation 1 (leftmost):

    E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

Derivation 2 (leftmost):

    E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

For the string id + id * id, the above grammar generates two parse trees, i.e. two distinct leftmost derivations, so the grammar is ambiguous.
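One standard way to remove this ambiguity (a well-known reformulation, not part of the original question) is to encode precedence and associativity into the grammar:

    E → E + T | T
    T → T * F | F
    F → ( E ) | id

With this grammar, id + id * id has exactly one parse tree, in which * binds tighter than +.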

5.  Explain the dynamic programming code generation algorithm with example.

[6 marks]

The dynamic programming algorithm can be used to generate code for any machine with r interchangeable registers R0, R1, ..., Rr-1 and load, store, and add instructions. For simplicity, we assume every instruction costs one unit, although the algorithm can easily be modified to work even if each instruction has its own cost.

The dynamic programming algorithm partitions the problem of generating optimal code for an expression into the subproblems of generating optimal code for the subexpressions of the given expression.


For a node of the expression tree with subtrees T1 and T2, two evaluation strategies are possible:

  • Contiguous evaluation: complete the evaluation of T1 and T2, then evaluate the root.
  • Non-contiguous evaluation: first evaluate part of T1, leaving the value in a register, next evaluate T2, then return to evaluate the rest of T1.

The dynamic programming algorithm uses contiguous evaluation.

The dynamic programming algorithm proceeds in three phases (suppose the target machine has r registers):

1. Compute bottom-up, for each node n of the expression tree T, an array C of costs in which the i-th component C[i] is the optimal cost of computing the subtree rooted at n into a register, assuming i registers are available for the computation, for 1 <= i <= r.

2. Traverse T, using the cost vectors to determine which subtrees of T must be computed into memory.

3. Traverse each tree using the cost vectors and associated instructions to generate the final target code. The code for the subtrees computed into memory locations is generated first.
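Before the worked example, here is a minimal C sketch of phase 1 (the cost-vector computation), assuming a binary expression tree, the unit-cost model above, and that the right operand of an operation may be taken directly from memory; the Node layout and function names are illustrative, not a definitive implementation:

    #include <stdio.h>

    #define R   2          /* interchangeable registers R0, R1 */
    #define INF 1000000

    typedef struct Node {
        struct Node *left, *right;  /* NULL for a leaf (operand in memory)   */
        int cost[R + 1];            /* cost[i]: best cost using i registers; */
                                    /* cost[0]: cost of leaving result in memory */
    } Node;

    static int min(int a, int b) { return a < b ? a : b; }

    /* Phase 1: compute the cost vector bottom-up at every node. */
    static void compute_costs(Node *n) {
        if (n->left == NULL) {                 /* leaf */
            n->cost[0] = 0;                    /* operand already sits in memory */
            for (int i = 1; i <= R; i++)
                n->cost[i] = 1;                /* one load instruction */
            return;
        }
        compute_costs(n->left);
        compute_costs(n->right);
        for (int i = 1; i <= R; i++) {
            int best = INF;
            if (i >= 2)  /* left into a register with i regs, right with i-1, then op */
                best = min(best, n->left->cost[i] + n->right->cost[i - 1] + 1);
            /* or take the right operand straight from memory: op Ri, Ri, M */
            best = min(best, n->left->cost[i] + n->right->cost[0] + 1);
            n->cost[i] = best;
        }
        n->cost[0] = n->cost[R] + 1;           /* compute into a register, then store */
    }

    int main(void) {
        /* expression (a + b) * c : leaves a, b, c */
        Node a = {0}, b = {0}, c = {0};
        Node plus  = { &a, &b };
        Node times = { &plus, &c };
        compute_costs(&times);
        for (int i = 0; i <= R; i++)
            printf("cost[%d] = %d\n", i, times.cost[i]);
        return 0;
    }

For (a + b) * c this prints cost[0] = 4, cost[1] = 3, cost[2] = 3, matching the hand computation: load a, add b from memory, multiply by c from memory.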

Example: 

Consider a machine having two registers R0 and R1, and instructions of the following forms, each of unit cost (the original figure is omitted; the list below is reconstructed in the standard textbook form):

    LD  Ri, Mj        // load
    op  Ri, Ri, Rj    // arithmetic, register operand
    op  Ri, Ri, Mj    // arithmetic, memory operand
    ST  Mi, Rj        // store

In these instructions, Ri is either R0 or R1, and Mi is a memory location. The operator op stands for an arithmetic operator such as + or *.

The cost vector is then computed bottom-up at each node, and the final target code is emitted from the cost vectors. (The worked figures showing the cost vector at each node and the final output code are omitted.)


6.  What do you mean by code optimization? Explain the basic blocks and their optimization.

[6 marks]

Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements in order to speed up program execution without wasting resources. It tries to make the code consume fewer resources (CPU, memory) while delivering higher speed.

A basic block is a sequence of consecutive instructions in which flow of control enters at one point and exits at another point, without the possibility of halting or branching except at the end, as in the example below.
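For instance, the three-address code below partitions into two basic blocks; a leader (the first statement of a block) is the first statement of the program, any target of a jump, and any statement immediately following a jump:

    sum = 0            // B1 begins (first statement)
    i = 1
L:  sum = sum + i      // B2 begins (target of the jump below)
    i = i + 1
    if i <= 10 goto L  // B2 ends at the jump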

Basic block optimization techniques are:

1. Common-Subexpression Elimination: a common subexpression need not be computed over and over again. Instead, it is computed once and reused wherever it is encountered again. For e.g.

Before optimization:

    a = b + c
    d = b + c

After optimization:

    a = b + c
    d = a

2. Dead Code Elimination: dead code is a variable or the result of some expression computed by the programmer that has no further use. By eliminating these useless computations, the code gets optimized. For e.g.

Before optimization:

    a = x + y
    b = a * 2    // b is never used again: dead
    c = a + 1

After optimization:

    a = x + y
    c = a + 1

3. Renaming Temporary Variables: temporary variables that are dead at the end of a block can be safely renamed. The basic block is transformed into an equivalent block in which each statement that defines a temporary defines a new temporary. Such a basic block is called a normal-form block or simple block. For e.g., the statement t = b + c, where t is a temporary, can be changed to u = b + c, where u is a new temporary, with all uses of this instance of t changed to u; the value of the basic block is not altered.

4. Interchange of Statements: independent statements can be reordered without affecting the value of the block, which may enable better use of registers and other resources. For e.g., the two statements

    t1 = b + c
    t2 = x + y

can be interchanged, since neither statement uses a value computed by the other.

5. Algebraic Transformations: in an algebraic transformation, we change a set of expressions into an algebraically equivalent set, simplifying expressions or replacing expensive expressions by cheaper ones. For e.g., x = x + 0 and x = x * 1 can be eliminated, and x = y ** 2 can be replaced by the cheaper x = y * y.
7.  What are the generic issues in the design of code generators? Explain.

[6 marks]

The final phase in the compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. The code generator should be designed in such a way that it can be easily implemented, tested, and maintained.

The issues to code generator design includes:

1. Input to the code generator: The input to the code generator is the intermediate representation together with the information in the symbol table. The intermediate representation can take one of several forms:

  • Postfix notation, 
  • Syntax tree or DAG,
  • Three address code

The code generation phase requires complete, error-free intermediate code as input.

2. Target Program: The target program is the output of the code generator. The output can be:

  • Assembly language: It makes the process of code generation somewhat easier, since symbolic instructions can be produced.
  • Relocatable machine language: It allows subprograms to be compiled separately and linked together later.
  • Absolute machine language: It can be placed in a fixed location in memory and executed immediately.

3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set. 

4. Instruction Selection: efficient and low-cost instruction selection is important for obtaining efficient code. When we consider the efficiency of the target machine, instruction speed and machine idioms are important factors. The quality of the generated code is determined by its speed and size, as illustrated below.
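For example, assuming a target machine that provides an increment idiom INC (a common but here hypothetical instruction), the three-address statement a = a + 1 could be translated naively as

    MOV  a, R0
    ADD  #1, R0
    MOV  R0, a

or, using the idiom, as the single, cheaper instruction

    INC  a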

5. Register Allocation: proper utilization of registers improves code efficiency. Using registers makes computations faster than using memory, so efficient utilization of registers is important. The use of registers is subdivided into two subproblems:

  • During Register allocation, we select only those set of variables that will reside in the registers at each point in the program.
  • During a subsequent Register assignment phase, the specific register is picked to access the variable.

6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed. 

8.  What are the compiler construction tools? Explain.

[6 marks]

For the construction of a compiler, the compiler writer uses different types of software tools known as compiler construction tools. These tools make use of specialized languages for specifying and implementing specific components, and most of them use sophisticated algorithms. The tools should hide the details of the algorithms used and produce components in such a way that they can be easily integrated into the rest of the compiler. Some of the most commonly used compiler construction tools are:

1. Parser generators: They automatically produce syntax analyzers from a grammatical description of a programming language.

2. Scanner generators: They produce lexical analyzers from a regular-expression description of the tokens of a language.

3. Syntax-directed translation engines: They produce collections of routines for walking a parse tree and generating intermediate code.

4. Code-generator generators: They produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine.

5. Data-flow analysis engines: They facilitate the gathering of information about how the data is transmitted from one part of the program to another. For code optimization, data-flow analysis is a key part.

6. Compiler-construction toolkits: They provide an integrated set of routines for constructing various phases of a compiler.
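Well-known examples of such tools include Lex/Flex (scanner generators) and Yacc/Bison (parser generators).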

9.  Explain the principle sources of code optimization with example.

[6 marks]

Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements in order to speed up program execution without wasting resources. It tries to make the code consume fewer resources (CPU, memory) while delivering higher speed.

A transformation of a program is called local if it can be performed by looking only at the statements in a basic block; otherwise, it is called global. Many transformations can be performed at both the local and global levels. Local transformations are usually performed first.

There are a number of ways in which a compiler can improve a program without changing the function it computes.

Principal sources of code optimization are:

1. Common-Subexpression Elimination: a common subexpression need not be computed over and over again; it is computed once and reused wherever it is encountered again. For e.g.

Before optimization:

    t1 = b * c
    t2 = b * c + d

After optimization:

    t1 = b * c
    t2 = t1 + d
2. Copy Propagation: assignments of the form f := g are called copy statements, or copies for short. The idea behind the copy-propagation transformation is to use g for f wherever possible after the copy statement f := g. This may not appear to be an improvement by itself, but it gives us an opportunity to eliminate the copied variable. For e.g.

    x = Pi;
    A = x * r * r;

Using copy propagation, x is replaced by Pi:

    A = Pi * r * r;

Here the variable x can then be eliminated.
3. Dead Code Elimination: dead code is a variable or the result of some expression computed by the programmer that has no further use. By eliminating these useless computations, the code gets optimized. For e.g., after the copy propagation above, the assignment to x becomes dead and can be removed:

Before optimization:

    x = Pi;
    A = Pi * r * r;

After optimization:

    A = Pi * r * r;
4. Constant Folding: deducing at compile time that the value of an expression is a constant and using the constant instead is known as constant folding. Expressions whose values can be computed at compile time are replaced by their values. For e.g.

   Initial code:

        x = 2 * 3;

   Optimized code:

        x = 6;

5. Loop Optimizations: programs tend to spend the bulk of their time in loops, especially inner loops. The running time of a program may be improved if we decrease the number of instructions in an inner loop, even if we increase the amount of code outside that loop. Some loop optimization techniques are:

    i) Frequency Reduction (Code Motion): in frequency reduction, the amount of code inside the loop is decreased. A statement or expression that can be moved outside the loop body without affecting the semantics of the program is moved outside the loop. For e.g.

Before optimization:

    while (i < 100)
    {
        a = Sin(x)/Cos(x) + i;
        i++;
    }

After optimization:

    t = Sin(x)/Cos(x);
    while (i < 100)
    {
        a = t + i;
        i++;
    }

    ii) Induction-Variable Elimination: an induction variable changes by a constant amount on each loop iteration; when two or more induction variables in a loop track each other, all but one of them can be eliminated from the inner loop.

    iii) Reduction in Strength: the strength of certain operators is higher than that of others; for example, the strength of * is higher than that of +, and the compiler usually takes more time for higher-strength operators, making execution slower. Replacing a higher-strength operator by a lower-strength one is called strength reduction. For e.g.

Before optimization:

    for (i = 1; i <= 10; i++)
    {
        sum = i * 7;
        printf("%d", sum);
    }

After optimization:

    temp = 0;
    for (i = 1; i <= 10; i++)
    {
        temp = temp + 7;
        sum = temp;
        printf("%d", sum);
    }

Here the multiplication i * 7 inside the loop is replaced by the cheaper addition temp = temp + 7 (temp starts at 0 so that its first value is 7).

10.  Differentiate between C compiler and Pascal compiler.

[6 marks]

| Pascal Compiler | C/C++ Compiler |
| --- | --- |
| It is a one-pass compiler, where all the phases are combined into a single pass. | It is a multi-pass compiler, where the phases of compilation are grouped into multiple passes. |
| No intermediate representation of the source program is created. | An intermediate representation of the source program is created. |
| It is faster than a C/C++ compiler. | It is slightly slower than a Pascal compiler. |
| It does not look back at code it previously processed. | Each pass takes the result of the previous pass as input and creates an intermediate output. |
| It is unable to generate efficient programs due to the limited scope available. | Due to the wider scope available, it allows better code generation. |
| It simplifies the job of writing a compiler, but is limited. | Writing a multi-pass compiler is a complex job; the code is improved pass by pass until the final code. |
| Memory consumption is lower. | Memory consumption is higher. |