Compiler Design and Construction 2075
Attempt all questions. (10x6=60)
1. Differentiate between compiler and interpreter. "Symbol table is a necessary component of compiler", justify this statement with examples.
Difference between Compiler and Interpreter

- A compiler translates the entire source program into target code before execution; an interpreter analyzes and executes the source program statement by statement.
- A compiler reports the errors found after scanning the whole program; an interpreter stops at the first error it encounters.
- Compiled programs generally run faster; interpreted programs run more slowly but are easier to debug and port.
- For example, C and C++ are typically compiled, while Python and JavaScript are typically interpreted.
Symbol tables are data structures used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases of a compiler and used by the synthesis phases to generate the target code. An entry in the symbol table records information about an identifier such as its type, its position in storage, and any other relevant information.
Symbol tables typically need to support multiple declarations of the same identifier within a program. The lexical analyzer can create a symbol-table entry and return a token to the parser, say id, along with a pointer to the lexeme. The parser can then decide whether to reuse a previously created entry or create a new one for the identifier.
The basic operations defined on a symbol table include
- allocate – to allocate a new empty symbol table
- free – to remove all entries and free the storage of a symbol table
- insert – to insert a name in a symbol table and return a pointer to its entry
- lookup – to search for a name and return a pointer to its entry
- set_attribute – to associate an attribute with a given entry
- get_attribute – to get an attribute associated with a given entry
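The operations above can be sketched in a few lines of Python; this is an illustrative toy, not any particular compiler's implementation (the class and method names are assumptions):

```python
# A minimal symbol table supporting the operations listed above.
class SymbolTable:
    def __init__(self):                          # allocate: a new empty table
        self.entries = {}

    def insert(self, name):                      # insert: add a name, return its entry
        return self.entries.setdefault(name, {"name": name})

    def lookup(self, name):                      # lookup: return the entry or None
        return self.entries.get(name)

    def set_attribute(self, name, key, value):   # set_attribute
        self.entries[name][key] = value

    def get_attribute(self, name, key):          # get_attribute
        return self.entries[name].get(key)

table = SymbolTable()
table.insert("count")
table.set_attribute("count", "type", "int")
print(table.get_attribute("count", "type"))      # int
```

Freeing the table corresponds to simply discarding the object; in a real compiler a stack of such tables is used to handle nested scopes.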
2. List out the major tasks carried out in Lexical Analysis Phase. Convert the following NFA to DFA.
In the lexical analysis phase, the lexical analyzer reads the input characters of the source program, groups them into lexemes, and produces a token for each lexeme. The tokens are then sent to the parser for syntax analysis.
The tasks carried out in the lexical analysis phase are:
- The lexical analyzer reads the input characters and produces the sequence of tokens that the parser uses for syntax analysis.
- It enters identifiers into the symbol table.
- It is responsible for eliminating comments and white space from the source program.
- It detects and reports lexical errors.
- Lexical analyzers are also used by web browsers to format and display a web page from JavaScript, HTML, and CSS.
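As a small illustration of these tasks, the sketch below groups characters into lexemes, discards white space and comments, and raises an error on an illegal character (the token classes and patterns are assumptions for the example):

```python
import re

# Toy tokenizer: skips whitespace/comments, emits (token, lexeme) pairs.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),        # eliminated, like white space
    ("WS",      r"[ \t\n]+"),
    ("NUM",     r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=<>]"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if not m:                                  # lexical error
            raise SyntaxError(f"lexical error at position {pos}")
        if m.lastgroup not in ("WS", "COMMENT"):   # strip blanks and comments
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = y + 42  // sum"))
```

The output is the token stream `[('ID', 'x'), ('OP', '='), ('ID', 'y'), ('OP', '+'), ('NUM', '42')]`, which is what would be handed to the parser.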
Second part:
Using the subset construction we have the DFA as:
| States | Next state on 0 | Next state on 1 |
|---|---|---|
| → *{p} | {r, q} | Φ |
| *{r, q} | {r, q} | {r} |
| *{r} | {r} | Φ |
| Φ | Φ | Φ |
Now Transition diagram for above table is:
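The subset construction itself can be mechanized. Since the original NFA figure is not reproduced here, the transitions below are reconstructed to be consistent with the table above and should be treated as an assumption:

```python
from itertools import chain

# Assumed NFA consistent with the table above:
# p --0--> {q, r}, q --0--> {q}, q --1--> {r}, r --0--> {r};
# start state p, accepting states {p, r}.
nfa = {("p", "0"): {"q", "r"}, ("q", "0"): {"q"},
       ("q", "1"): {"r"}, ("r", "0"): {"r"}}
start, accepting = frozenset({"p"}), {"p", "r"}

def move(states, sym):
    return frozenset(chain.from_iterable(nfa.get((s, sym), ()) for s in states))

# Subset construction: each DFA state is a set of NFA states.
dfa, worklist = {}, [start]
while worklist:
    S = worklist.pop()
    if S in dfa:
        continue
    dfa[S] = {sym: move(S, sym) for sym in "01"}
    worklist.extend(dfa[S].values())

for S, trans in dfa.items():
    mark = "*" if S & accepting else " "
    print(mark, set(S) or "Φ", {k: set(v) or "Φ" for k, v in trans.items()})
```

Running this yields exactly the four DFA states of the table: {p}, {r, q}, {r}, and the dead state Φ.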
3. Differentiate between recursive descent and non-recursive predictive parsing method. Find first and follow of all the non-terminals in the following grammar.
Difference between recursive descent and non-recursive predictive parsing

| Recursive descent parsing | Non-recursive predictive parsing |
|---|---|
| A technique that may or may not require backtracking. | A technique that does not require any backtracking. |
| Uses a procedure for every non-terminal to parse strings. | Uses an explicit stack and a parsing table to decide which production to apply. |
| A top-down parser built from a set of mutually recursive procedures, where each procedure implements one non-terminal of the grammar. | A table-driven top-down parser that avoids both recursion and backtracking. |
| Contains several small functions, one for each non-terminal in the grammar. | Uses a lookahead pointer to the next input symbols to stay backtracking-free, which puts some constraints on the grammar. |
| With backtracking, it can handle a wider class of grammars. | Accepts only the class of grammars known as LL(k) grammars. |
Now, for the given grammar, the FIRST and FOLLOW sets of the non-terminals are:
| Non-terminal | FIRST() | FOLLOW() |
|---|---|---|
| E | {(, id} | {), $} |
| A | {+, ε} | {), $} |
| T | {(, id} | {), +, $} |
| B | {*, ε} | {), +, $} |
| F | {(, id} | {*, +, ), $} |
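These sets can be checked mechanically. The grammar text is not reproduced above, so the sketch below assumes the usual expression grammar consistent with the table (E → TA, A → +TA | ε, T → FB, B → *FB | ε, F → (E) | id):

```python
# Iterative FIRST/FOLLOW computation for the assumed grammar.
EPS = "ε"
grammar = {
    "E": [["T", "A"]],
    "A": [["+", "T", "A"], [EPS]],
    "T": [["F", "B"]],
    "B": [["*", "F", "B"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}
nonterms = set(grammar)
first = {X: set() for X in nonterms}
follow = {X: set() for X in nonterms}
follow["E"].add("$")                      # E is the start symbol

def first_of(seq):
    """FIRST of a symbol string, using the current first sets."""
    out = set()
    for X in seq:
        f = first[X] if X in nonterms else {X}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                          # every symbol can vanish
    return out

changed = True
while changed:                            # iterate to a fixed point
    changed = False
    for X, prods in grammar.items():
        for prod in prods:
            f = first_of(prod)
            if not f <= first[X]:
                first[X] |= f; changed = True
            for i, Y in enumerate(prod):
                if Y in nonterms:
                    rest = first_of(prod[i + 1:])
                    add = (rest - {EPS}) | (follow[X] if EPS in rest else set())
                    if not add <= follow[Y]:
                        follow[Y] |= add; changed = True

print({X: first[X] for X in "EATBF"})
print({X: follow[X] for X in "EATBF"})
```

The computed sets agree with every row of the table above.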
4. Construct SLR parse table for the following grammar.
The augmented grammar of given grammar is:
S' → S
S → E
E → E+T | T
T → T*F | F
F → id
Next, we obtain the canonical collection of sets of LR(0) items, as follows,
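The canonical collection can also be generated programmatically with the standard closure and goto operations; the sketch below is a direct, minimal implementation for the augmented grammar above:

```python
# LR(0) item-set construction for: S'->S, S->E, E->E+T|T, T->T*F|F, F->id.
grammar = {
    "S'": [("S",)],
    "S":  [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("id",)],
}
nonterms = set(grammar)

def closure(items):
    """Add B -> .γ for every item A -> α.Bβ until nothing changes."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (head, body, dot) in list(items):
            if dot < len(body) and body[dot] in nonterms:
                for prod in grammar[body[dot]]:
                    item = (body[dot], prod, 0)
                    if item not in items:
                        items.add(item); changed = True
    return frozenset(items)

def goto(items, sym):
    """Advance the dot over sym and take the closure."""
    return closure({(h, b, d + 1) for (h, b, d) in items
                    if d < len(b) and b[d] == sym})

symbols = {s for prods in grammar.values() for p in prods for s in p}
states = [closure({("S'", ("S",), 0)})]       # I0
for I in states:                              # worklist: list grows as we go
    for X in symbols:
        J = goto(I, X)
        if J and J not in states:
            states.append(J)

print(len(states), "LR(0) item sets")         # 10 sets, I0 through I9
```

For this grammar the construction produces ten item sets I0–I9, from which the SLR table's shift and goto entries are read off directly, with reduce entries placed using the FOLLOW sets.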
5. What is Syntax Directed Definition? Define synthesized and inherited attributes with example.
A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules. Attributes are associated with grammar symbols and rules are associated with productions. For example,
| Production | Semantic Rules |
|---|---|
| E → E1 + T | E.val = E1.val + T.val |
| E → T | E.val = T.val |
Synthesized Attributes
The attribute of node that are derived from its children nodes are called synthesized attributes. Assume the following production:
S → ABC
If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of ABC are synthesized to S.
Example for synthesized attribute:
| Production | Semantic Rules |
|---|---|
| E → E1 * E2 | E.val = E1.val * E2.val |
| E → E1 + E2 | E.val = E1.val + E2.val |
| E → int | E.val = int.val |
The syntax-directed definition associates with each non-terminal a synthesized attribute called val.
In synthesized attributes, values flow from the children to the parent in the parse tree. For example, for the string 3*2+5
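The bottom-up flow can be seen by evaluating val over the parse tree for 3*2+5 by hand; here is a tiny sketch (the tree is written out manually, since no parser is involved):

```python
# Bottom-up evaluation of the synthesized attribute val.
def val(node):
    if isinstance(node, int):          # E -> int : E.val = int.val
        return node
    left, op, right = node
    if op == "*":                      # E -> E1 * E2 : E.val = E1.val * E2.val
        return val(left) * val(right)
    return val(left) + val(right)      # E -> E1 + E2 : E.val = E1.val + E2.val

tree = ((3, "*", 2), "+", 5)           # parse tree for 3*2+5
print(val(tree))                       # 11
```

Each node's val is computed only from its children's val attributes, which is exactly what makes the attribute synthesized.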
Inherited Attributes
The attributes of node that are derived from its parent or sibling nodes are called inherited attributes. If the value of the attribute depends upon its parent or siblings then it is inherited attribute. As in the following production,
S → ABC
'A' can get values from S, B, and C; B can take values from S, A, and C; likewise, C can take values from S, A, and B.
Example for Inherited Attributes:
| Production | Semantic Rules |
|---|---|
| D → T L | L.in = T.type |
| T → int | T.type = integer |
| T → real | T.type = real |
| L → L1 , id | L1.in = L.in; addtype(id.entry, L.in) |
| L → id | addtype(id.entry, L.in) |
Symbol T is associated with a synthesized attribute type and symbol L is associated with an inherited attribute in.
In inherited attributes, values flow into a node in the parse tree from its parent and/or siblings. For example, for input: int id, id
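The downward flow can be sketched directly from the rules above: the declaration D → T L passes T.type into the L subtree as L.in, and each L production calls addtype. This is an illustrative sketch only (the identifier names a and b stand in for the two id tokens):

```python
# Evaluating the inherited attribute `in` for a declaration like: int a, b
symtab = {}

def addtype(entry, t):                 # addtype(id.entry, L.in)
    symtab[entry] = t

def decl(type_token, ids):             # D -> T L
    t = "integer" if type_token == "int" else "real"
    inh = t                            # L.in = T.type (inherited downward)
    for name in ids:                   # L -> L1 , id  and  L -> id
        addtype(name, inh)

decl("int", ["a", "b"])
print(symtab)                          # {'a': 'integer', 'b': 'integer'}
```

Each id's type comes from its parent's in attribute, not from its own subtree, which is what makes the attribute inherited.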
6. Differentiate between static and dynamic type checking. How can we carry out type checking for the following expression using syntax-directed definition?
The key difference between the static and dynamic type checking is that with static type checking, the type of variable is known at compile time (it checks the type of variable before running) while with dynamic type checking, the type of variable is known at runtime (it checks the type of variable while executing).
Type checking for the given expression using syntax-directed definition:
S → id = E { if id.type = E.type then S.type = void else S.type = type_error }
S → if E then S1 { if E.type = boolean then S.type = S1.type else S.type = type_error }
S → while E do S1 { if E.type = boolean then S.type = S1.type else S.type = type_error }
S → S1 ; S2 { if S1.type = void and S2.type = void then S.type = void else S.type = type_error }
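The rules above can be transcribed almost verbatim into code; this sketch represents types as plain strings (the function names are made up for illustration):

```python
# Direct transcription of the type-checking rules above.
ERR = "type_error"

def assign(id_type, e_type):               # S -> id = E
    return "void" if id_type == e_type else ERR

def if_then(e_type, s1_type):              # S -> if E then S1
    return s1_type if e_type == "boolean" else ERR

def while_do(e_type, s1_type):             # S -> while E do S1
    return s1_type if e_type == "boolean" else ERR

def seq(s1_type, s2_type):                 # S -> S1 ; S2
    return "void" if s1_type == s2_type == "void" else ERR

print(assign("int", "int"))                # void
print(while_do("int", "void"))             # type_error: condition not boolean
```

Note how each statement form synthesizes its own type (void or type_error) purely from the types of its parts, so errors propagate upward through the parse tree.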
7. Define three address codes. Write three address codes for
S --> do m=n-p while a<=b
Three-address code is an intermediate code in which each instruction uses at most three addresses: two for operands and one for the result. Each instruction can be described as a 4-tuple: (operator, operand1, operand2, result).
A quadruple (three-address code) is of the form x = y op z, where x, y, and z are names, constants, or compiler-generated temporaries, and op is any operator.
Following are the three-address code forms for different statements.
- Assignment statement:
x = y op z ( binary operator op)
x = op y ( unary operator op)
- Copy statement: x = z
- Unconditional jump: goto L ( jump to label L)
- Conditional jump: if x relop y goto L
- Procedure call:
param x1
param x2
...
param xn
call p, n i.e. call procedure P(x1, x2, …, xn)
- Indexed assignment:
x = y[i]
x[i] = y
- Address and pointer assignments:
x = &y
x = *y
*x = y
Given statement:
S --> do m=n-p while a<=b
Three-address code for the given statement:
1. m = n-p
2. if a<b goto (1)
3. if a=b goto (1)
4. stop
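The same code, written as a list of quadruples in the (operator, operand1, operand2, result) form defined above (the condition a<=b is split into < and = tests, as in the listing; the representation is an illustrative sketch):

```python
# Quadruple representation of the do-while loop's three-address code.
# The result field of a conditional jump holds the target label (1).
code = [
    ("-",    "n", "p", "m"),      # (1) m = n - p
    ("if<",  "a", "b", 1),        # (2) if a < b goto (1)
    ("if=",  "a", "b", 1),        # (3) if a = b goto (1)
    ("stop", None, None, None),   # (4) stop
]
for i, quad in enumerate(code, 1):
    print(i, quad)
```

Each row is one three-address instruction, which is why this representation is convenient for later optimization passes that reorder or delete instructions.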
8. Defıne code optimization. Discuss about any three code optimization techniques with example.
Code optimization improves the intermediate code so that the resulting program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements to speed up program execution without wasting resources; that is, it makes the code consume fewer resources (CPU, memory) while delivering higher speed.
Some code optimization techniques are:
1. Common-Subexpression Elimination: A common sub-expression need not be computed over and over again. Instead, it can be computed once and stored, and the stored value referenced whenever the expression is encountered again. For e.g.
2. Dead Code Elimination: Dead code is a variable or the result of some expression written by the programmer that has no further use in the program. Eliminating such useless code optimizes the program. For e.g.
3. Algebraic Transformations: An algebraic transformation replaces a set of expressions by an algebraically equivalent set, simplifying expressions or replacing expensive operations by cheaper ones. For e.g.
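Since the worked examples are not reproduced above, here are small hand-worked before/after fragments for the three techniques (the variable names are made up for illustration):

```python
# 1. Common-subexpression elimination:
#    before: x = a*b + c ; y = a*b - c
#    after:  t = a*b ; x = t + c ; y = t - c
a, b, c = 2, 3, 4
t = a * b                 # a*b computed once and reused
x, y = t + c, t - c

# 2. Dead-code elimination:
#    d = a + b is never used afterwards, so it can simply be removed:
# d = a + b               # dead code: deleted

# 3. Algebraic transformation (strength reduction and identities):
#    x * 2  ->  x + x     addition is cheaper than multiplication
#    x * 1  ->  x         multiplying by one is a no-op
z = x + x                 # instead of x * 2
print(x, y, z)            # 10 2 20
```

In each case the optimized fragment computes exactly the same values as the original while doing strictly less work.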
9. What is activation record? Discuss the different activities performed by caller and callee during procedure call and return.
A program is a sequence of instructions combined into a number of procedures. Instructions in a procedure are executed sequentially. A procedure has a start and an end delimiter, and everything inside it is called the body of the procedure. The procedure identifier and the finite sequence of instructions inside it make up the procedure.
The execution of a procedure is called its activation. An activation record contains all the information needed to manage a single execution of a procedure. Depending upon the source language used, an activation record may contain the following units:
- return value
- actual parameters
- control link (to the caller's activation record)
- access link (to non-local data)
- saved machine status
- local data
- temporaries
In any function call, the routine that initiates the call is called caller and the routine that is being called is known as callee.
Activities performed by caller and callee during procedure call and return are given below:
On the caller's side:
- save registers for later use
- stack push
- caller-save registers
- arrange for the procedure arguments to be found by the procedure
- copy some arguments to registers
- push additional arguments onto stack
- call the procedure
- save return address in register or on stack
- jump to procedure starting address
- later, the return value (if any) is in a pre-determined place, and control is transferred back to the return address
- copy the return value
- move to another register or store to memory (pop the stack if necessary)
- throw away the procedure arguments
- adjust stack pointer
- restore saved registers
- stack pop
- continue
On the callee's side:
- control is transferred from the caller to the starting address
- return address is in register or on stack
- arguments are in registers and on stack
- save registers for later use
- stack push
- arguments and return address
- caller's frame pointer
- other callee-save registers
- allocate storage for automatic local variables
- set the frame pointer, adjust the stack pointer
- assign initial values
- allocate storage for temporaries
- adjust the stack pointer
- work
- stack may be used for additional temporaries
- push, use, pop
- put return value in appropriate place
- register or stack
- throw away the local variables and temporaries
- adjust the stack pointer
- restore saved registers
- stack pop
- arguments and return address
- caller's frame pointer
- other callee-saved registers
- jump through the return address register
10. Discuss about the different factors affecting target code generation.
The final phase in compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. Designing of code generator should be done in such a way so that it can be easily implemented, tested and maintained.
The different factors affecting target code generation includes:
1. Input to the code generator: The input to the code generator is an intermediate representation of the source program together with the information in the symbol table. The intermediate representation has several choices:
- Postfix notation,
- Syntax tree or DAG,
- Three-address code
The code generation phase assumes that its input is complete and free of errors.
2. Target Program: The target program is the output of the code generator. The output can be:
- Assembly language: It allows subprogram to be separately compiled.
- Relocatable machine language: It makes the process of code generation easier.
- Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.
3. Target Machine: Implementing code generation requires thorough understanding of the target machine architecture and its instruction set.
4. Instruction Selection: Efficient, low-cost instruction selection is important for obtaining efficient code. When considering the efficiency of the target machine, instruction speed and machine idioms are important factors. The quality of the generated code is determined by its speed and size.
5. Register Allocation: Proper utilization of registers improves code efficiency. Computations in registers are faster than those in memory, so efficient use of registers is important. Register use is subdivided into two subproblems:
- During Register allocation, we select only those set of variables that will reside in the registers at each point in the program.
- During a subsequent Register assignment phase, the specific register is picked to access the variable.
6. Choice of Evaluation order: The efficiency of the target code can be affected by the order in which the computations are performed.