Compiler Design and Construction 2074
Attempt all questions. (10x6=60)
AI is thinking...
1. Draw block diagram of compiler. Explain different steps in synthesis phase.
A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. As an important part of a compiler is error showing to the programmer.
Block diagram for phases of compiler:
Synthesis phase construct the desired target program from the intermediate representation. The synthesis part of compiler consists of the following phases: Intermediate code generation, Code optimization and Target code generation.
1. Intermediate Code Generation:
If the program syntactically and semantically correct then intermediate code generator generates a simple machine independent intermediate language. The intermediate code should be generated in such a way that it can easily translated into the target machine code.
Example:
Intermediate code for: id1 = id2 + id3 * 3.0
t1 = 3.0;
t2 = id3 * t1;
t3 = id2 + t2;
id1 = t3;
2. Code Optimization:
It is used to improve the intermediate code so that the output of the program could run faster and takes less space. It removes the unnecessary lines of the code and arranges the sequence of statements in order to speed up the program execution without wasting resources.
Example:
t2 = id3 * 3.0;
id1 = id2 + t2;
3. Code Generation:
Code generation is the final stage of the compilation process. It takes the optimized intermediate code as input and maps it to the target machine language.
Example:
MOV R1, id3
MUL R1, #3.0
MOV R2, id2
ADD R1, R2
MOV id1, R1
AI is thinking...
2. What is the role of regular expression in lexical analysis? Also give example.
The lexical analyzer needs to scan and identify only a finite set of valid string/token/lexeme that belong to the language in hand. It searches for the pattern defined by the language rules.
Regular expressions have the capability to express finite languages by defining a pattern for finite strings of symbols. The grammar defined by regular expressions is known as regular grammar. The language defined by regular grammar is known as regular language.
An important notation for specifying the patterns in known as Regular expression. Set of strings is matched by each pattern and the names of the set of strings are served by the regular expressions. Regular languages describe the programming language tokens.
The various operations on languages are:
- Union of two languages L and M is written as
L U M = {s | s is in L or s is in M}
- Concatenation of two languages L and M is written as
LM = {st | s is in L and t is in M}
- The Kleene Closure of a language L is written as
L* = Zero or more occurrence of language L.
Representing valid tokens of a language in regular expression:
If x is a regular expression, then:
- x* means zero or more occurrence of x.
- x+ means one or more occurrence of x.
- x? means at most one occurrence of x
- [a-z] is all lower-case alphabets of English language.
- [A-Z] is all upper-case alphabets of English language.
- [0-9] is all natural digits used in mathematics.
Representing occurrence of symbols using regular expressions:
- letter = [a – z] or [A – Z]
- digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]
- sign = [ + | - ]
Representing language tokens using regular expressions:
- Decimal = (sign)?(digit)+
- Identifier = (letter)(letter | digit)*
AI is thinking...
3. What is shift reduce parsing technique? Show shift reduce parsing action for the string a*(a+a), given the grammar
The process of reducing the given input string into the starting symbol is called shift-reducing parsing.
A shift reduce parser uses a stack to hold the grammar symbol and an input buffer to hold the input string W.
Sift reduce parsing performs the two actions: shift and reduce.
- At the shift action, the current symbol in the input string is pushed to a stack.
- At each reduction, the symbols will replaced by the non-terminals. The symbol is the right side of the production and non-terminal is the left side of the production.
Now,
Given grammar;
Input String: a*(a+a)
Shift reduce parsing action for given input string is shown below:
AI is thinking...
4. Find first and follow for the non-terminals in the following grammar.
Given grammar;
Now, the FIRST and FOLLOW for the non-terminal symbols are:
Non-terminals |
FIRST() |
FOLLOW() |
S |
{a, b, d} |
{$} |
A |
{d} |
{a, c} |
B |
{a} |
{a, c} |
AI is thinking...
5. What is semantic analysis? Why is it necessary to perform semantic analysis? Explain.
Semantic Analysis is the third phase of compiler. Semantic Analysis makes sure that declarations and statements of program are semantically correct. It is a collection of procedures which is called by parser as and when required by grammar. Both syntax tree of previous phase and symbol table are used to check the consistency of the given code.
Semantic analysis is necessary due to the following reasons:
1. In this phase, the semantic analyzer checks the source program for semantic errors and collects the type information for the code generation. Semantic analyzer checks whether they form a sensible set of instructions in the programming language or not.
2. Semantic analyzer recognizes following types of errors:
- Type mismatch
- Undeclared variables
- Reserved identifier misuse
- Multiple declaration of variable in a scope.
- Accessing an out of scope variable.
- Actual and formal parameter mismatch.
3. Semantic analysis performs the following function:
- Type Checking : Ensures that data types are used in a way consistent with their definition.
- Label Checking : A program should contain labels references.
- Flow Control Check : Keeps a check that control structures are used in a proper manner.(example: no break statement outside a loop)
Example:
float x = 10.1;
float y = x*30;
In this example integer 30 will be typecasted to float 30.0 before multiplication, by semantic analyzer.
AI is thinking...
6. Write Syntax Directed Definition to carry out type checking for the following expression.
E -> id | E1 op E2 | E1 relop E2 | E1[E2] | E1 ↑
E → id { E.type = lookup(id.entry) }
E → E1 op E2 { if (E1.type = integer and E2.type = integer) then E.type = integer
else if (E1.type = integer and E2.type = real) then E.type = real
else if (E1.type = real and E2.type = integer) then E.type = real
else if (E1.type = real and E2.type = real) then E.type = real
else E.type = type-error }
E → E1 relop E2 { if E1.type = boolean and E2.type = boolean then E.type = boolean
else E.type = error}
E → E1 [E2] { if (E2.type = int and E1.type = array(s, t)) then E.type = t
else E.type = type-error }
E →E1 ↑ { if E1.type = pointer(t) then E.type = t
else type-error}
AI is thinking...
7. What is code optimization? What is its importance?
Code optimization is used to improve the intermediate code so that the output of the program could run faster and takes less space. It removes the unnecessary lines of the code and arranges the sequence of statements in order to speed up the program execution without wasting resources. It tries to improve the code by making it consume less resources (i.e. CPU, Memory) and deliver high speed. For e.g.
Intermediate code | Optimized Intermediate Code |
t1 = 3.0; t2 = id3 * t1; t3 = id2 + t2; id1 = t3; | t2 = id3 * 3.0; id1 = id2 + t2; |
A code optimizing process must follow the three rules given below:
- The output code must not, in any way, change the meaning of the program.
- Optimization should increase the speed of the program and if possible, the program should demand less number of resources.
- Optimization should itself be fast and should not delay the overall compiling process.
Code optimization is important because:
- Code optimization enhances the portability of the compiler to the target processor.
- Code optimization allows consumption of fewer resources (i.e. CPU, Memory).
- Optimized code has faster execution speed.
- Optimized code utilizes the memory efficiently.
- Optimized code gives better performance.
- Optimized code often promotes re-usability.
AI is thinking...
8. Explain different types of loop optimization techniques.
Loop Optimization is the process of increasing execution speed and reducing the overheads associated with loops. Loop Optimization is a machine independent optimization.
Loop Optimization techniques are given below:
1. Frequency Reduction (Code Motion): In frequency reduction, the amount of code in loop is decreased. A statement or expression, which can be moved outside the loop body without affecting the semantics of the program, is moved outside the loop. For e.g.
Before
optimization: while(i<100) { a = Sin(x)/Cos(x) + i; i++; } |
After optimization: t = Sin(x)/Cos(x); while(i<100) { a = t + i; i++; } |
2. Loop Unrolling: In this technique the number of jumps and tests can be optimized by writing the code to times without changing the meaning of a code. For e.g.
Before
optimization: int i = 1; while(i<100) { a[i] = b[i]; i++; }
|
After
optimization: int i = 1; while(i<100) { a[i] = b[i]; i++; a[i] = b[i]; i++; } |
3. Loop Fusion/Loop Jamming: This technique combines the bodies of two loops whenever the same index variable and number of iterations are shared. For e.g.
Before
optimization: for(i=0;i<=10;i++) { Printf(“Jayanta”); } for(i=0;i<=10;i++) { Printf(“Poudel”); } |
After
optimization: for(i=0;i<=10;i++) { Printf(“Jayanta”); Printf(“Poudel”); }
|
4. Reduction in Strength: The strength of certain operators is higher than other
operators. For example, strength of * is higher than +. Usually, compiler takes
more time for higher strength operators and execution speed is less. Replacement of higher strength operator by lower strength
operator is called a strength reduction technique
Optimization can be done by applying strength reduction
technique where higher strength can be replaced by lower strength operators.
For e.g.
Before
optimization: for (i=1;i<=10;i++) { sum = i * 7; printf(“%d”, sum); }
|
After
optimization: temp = 7; for(i=1;i<=10;i++) { temp = temp + 7; sum = temp; printf(“%d”, sum) } |
AI is thinking...
9. Define three address codes. Write three address code for
a = -b*(c+d)
The address code that uses at most three addresses, two for operands and one for result is called three address code. Each instruction in three address code can be described as a 4-tuple: (operator, operand1, operand2, result).
A quadruple (Three address code) is of the form: x = y op z where x, y and z are names, constants or compiler-generated temporaries and op is any operator.
Followings are the 3-address code statements for different statements.
- Assignment statement:
x = y op z ( binary operator op)
x = op y ( unary operator op)
- Copy statement: x = z
- Unconditional jump: goto L ( jump to label L)
- Conditional jump: if x relop y goto L
- Procedure call:
param x1
param x2
. .
..
param xn
call p, n i.e. call procedure P(x1,x2,….. xn)
- Indexed assignment:
x = y[i]
x[i] = y
- Address and pointer assignments:
x = &y
x = *y
*x = y
Given expression
a = -b*(c+d)
The three address code of given expression is given below:
t1 = -b
t2 = c + d
t3 = t1 * t2
a = t3
AI is thinking...
10. Discuss the importance of error handler in compiler. How is it manipulated in the different phases of compilation.
Error detection and reporting of errors are important functions of the compiler. Whenever an error is encountered during the compilation of the source program, an error handler is invoked. Error handler generates a suitable error reporting message regarding the error encountered. The error reporting message allows the programmer to find out the exact location of the error. Errors can be encountered at any phase of the compiler during compilation of the source program for several reasons such as:
- In lexical analysis phase, errors can occur due to misspelled tokens, unrecognized characters, etc. These errors are mostly the typing errors.
- In syntax analysis phase, errors can occur due to the syntactic violation of the language.
- In intermediate code generation phase, errors can occur due to incompatibility of operands type for an operator.
- In code optimization phase, errors can occur during the control flow analysis due to some unreachable statements.
- In code generation phase, errors can occurs due to the incompatibility with the computer architecture during the generation of machine code. For example, a constant created by compiler may be too large to fit in the word of the target machine.
- In symbol table, errors can occur during the bookkeeping routine, due to the multiple declaration of an identifier with ambiguous attributes.
After detecting an error, a phase must somehow deal with that error, so that compilation can proceed, allowing further errors in the source program to be detected.
Error handler contains a set of routines for
handling error encountered in any phase of
the compiler. Every phase of compiler will call an appropriate
routine of the error handler as per their
specifications.
AI is thinking...