Compiler Design and Construction - Old Questions

6.  What are the main issues involved in designing lexical analyzer? Mention the various error recovery strategies for a lexical analyzer.

6 marks | Asked in 2071-I

Lexical analysis is the process of producing tokens from the source program. It has the following issues:

    • Lookahead

    • Ambiguities

1. Lookahead:

Lookahead is required to decide when one token will end and the next token will begin. The simple example which has lookahead issues are i vs. if, = vs. ==. Therefore a way to describe the lexemes of each token is required.

A way needed to resolve ambiguities

    • Is if it is two variables and or if?

    • Is == is two equal signs =, = or ==?

    • arr(5, 4) vs. fn(5, 4) II in Ada (as array reference syntax and function call syntax are similar.

Hence, the number of lookahead to be considered and a way to describe the lexemes of each token is also needed.

2. Ambiguities:

The lexical analysis programs written with lex accept ambiguous specifications and choose the longest match possible at each input point. Lex can handle ambiguous specifications. When more than one expression can match the current input, lex chooses as follows:

• The longest match is preferred.

• Among rules which matched the same number of characters, the rule given first is preferred.


Error recovery strategies for a lexical analyzer

A character sequence that cannot be scanned into any valid token is a lexical error. Misspelling of identifiers, keyword, or operators are considered as lexical errors. Usually, a lexical error is caused by the appearance of some illegal character, mostly at the beginning of a token.

The following are the error-recovery strategies in lexical analysis:
1) Panic Mode Recovery: Once an error is found, the successive characters are always ignored until we reach a well-formed token like end, semicolon. 

2) Deleting an extraneous character.

3) Inserting a missing character.

4)Replacing an incorrect character by a correct character.

5)Transforming two adjacent characters.