SEG-2106 - Software Construction - Winter 2016

Assignment 2 : Lexical Analysis

Given: February 22 - Due date: Monday or Tuesday, March 14 or 15, during the lab session  -  Students may work individually or in groups of two.

Please submit: Report on paper. Lex files in a zip file sent by e-mail to the TA Rui Chen ‎[]‎ who will mark your report.

1. Find regular expressions for languages (15 points)

Find regular expressions that define the following languages:

  1. XML tags where the opening tags may include attribues. The form of such tags is given by the following examples:
  2. All strings over the alphabet {a, b} that do not contain the substring aba.
  3. All strings over the alphabet {a, b} for which the number of "a" is a multiple of 3 (including zero).


2. Find accepting automata (15 points)

For each of the languages defined under point (1) above, find a deterministic accepting automaton that accepts exactly that language.

3. Comparison of regular expressions (20 points)

We consider the following pairs of regular expressions. For each pair, you should indicate whether the two expressions define the same language or not. If they are equivalent, you should give a proof by considering the algebraic properties of the regular expression operators. If they are not equivalent, you should give an example of a string that is included in one of the languages, but not in the other.

  1. (a|b|c)*                  a*|b*|c*
  2. a(b|c)|(aa|bb)c       ac|bbc|ab|aac
  3. a(bca)*bc              ab(cab)*c
  4. a (b*a|cba)*           (ab*|acb)*a

4. Construct a lexical analyser using Lex (50 points)

Your task Construct a lexical analyzer for a subset of the Pascal language   The analyzer must (at least) be able to count the occurences of identifiers (including keywords and reserved words), numbers, comments and lines of code in the program. Note: for this exercise, you should assume that keywords and reserved words count as identifiers.

Method: You should use Lex (or Flex) as you have seen in Lab-5.

Information: In the following, you find the definition of the Pascal syntax, some notes on lexical analysis, and a sample Pascal program. The syntax is defined in a version of BNF notation (you do not have to understand this notation in detail. Here is some explanation of the notation used in the grammar definition:

Syntax definition:





A sample Pascal program