Compiler Design and Construction - Old Questions

2.  What is the role of regular expression in lexical analysis? Also give example.

6 marks | Asked in 2074

The lexical analyzer needs to scan and identify only a finite set of valid string/token/lexeme that belong to the language in hand. It searches for the pattern defined by the language rules.

Regular expressions have the capability to express finite languages by defining a pattern for finite strings of symbols. The grammar defined by regular expressions is known as regular grammar. The language defined by regular grammar is known as regular language.

An important notation for specifying the patterns in known as Regular expression. Set of strings is matched by each pattern and the names of the set of strings are served by the regular expressions. Regular languages describe the programming language tokens. 

The various operations on languages are:

  • Union of two languages L and M is written as

            L U M = {s | s is in L or s is in M}

  • Concatenation of two languages L and M is written as

            LM = {st | s is in L and t is in M}

  • The Kleene Closure of a language L is written as

            L* = Zero or more occurrence of language L.

Representing valid tokens of a language in regular expression:

If x is a regular expression, then:

  • x* means zero or more occurrence of x.
  • x+ means one or more occurrence of x.
  • x? means at most one occurrence of x
  • [a-z] is all lower-case alphabets of English language.
  • [A-Z] is all upper-case alphabets of English language.
  • [0-9] is all natural digits used in mathematics.

Representing occurrence of symbols using regular expressions:

  • letter = [a – z] or [A – Z]
  • digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]
  • sign = [ + | - ]

Representing language tokens using regular expressions:

  • Decimal = (sign)?(digit)+
  • Identifier = (letter)(letter | digit)*