Compiler Design and Construction - Old Questions

7. Define the terms token, pattern and lexeme. How input buffer can be used for scanner. Explain.

5 marks | Asked in Model Question

Token

Token is a sequence of characters that can be treated as a single logical entity. For example, Identifiers, Keywords, Operators, Constants etc.

Pattern

A set of string in the input for which the same token is produced as output. This set of string is described by a rule called a pattern associated with the token. For example, “a letter followed by zero or more letters, digits, or underscores.”

Lexeme

A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. 


Input Buffering

Lexical analysis needs to look ahead several characters before a match can be announced. We have two buffer input scheme that is useful when look ahead is necessary.

  • Buffer Pair
  • Sentinels

Buffer Pair (2N Buffering):

In this technique buffer is divided into two halves with N-characters each, where N is number of characters on the disk block like 1024 and 4096. Rather than reading characters by characters from file we read N input character at once. If there are fewer than N character in input EOF marker is placed. There are two pointers: lexeme pointer and forward pointer.

Lexeme pointer points to the start of the lexeme while forward pointer scans the input buffer for lexeme. When forward pointer reaches the end of one half, second half is loaded and forwarded pointer points to the beginning of the next half.

Sentinels:

While using buffer pair technique, we have to check each time fwd is moved that it doesn't crosses the buffer half and when it reaches end of buffer, the other one needs to be loaded. We can solve this problem by introducing a sentinel character at the end of both halves of buffer. The sentinel can be any character which is not a part of source program. EOF is usually preferred as it will also indicate end of source program. It signals the need for some special action (fill other buffer-half, or terminate processing).