GRAMMARS AND LANGUAGES

Please note that the material on this website is not intended to be exhaustive.
This is intended as a summary and supplementary material to the required textbook.
Alphabet
A is a finite set whose elements are called letters.
Word
on A is a finite string of letters from A.   The set of all words on A is denoted by A *.
Empty String or Null String
is a unique string denoted by l.
Phrase Structured Grammar (PSG)
or simply grammar is denoted by G = { N, T, P, s } and consisting of the following:
Production
( A, B ) Î P where A Î ( N T ) * – T * and B Î ( N T ) * is denoted by A ® B or A ::= B.
Directly Derivable
Let G = ( N, T, P, s ) be a grammar.   If a ® b is a production and xay Î ( N T ) *, then xby is directly derivable from xay and is denoted as xay Þ xby.

If ai Î ( N T )* for i = 1, 2, ¼, n, and ai+1 is directly derivable from ai for i = 1, 2, ¼, n–1, we say that an is derivable from a1 and is denoted by a1 Þ* an = a1 Þ a2 Þ ¼ Þ an
Backus Naur Form (BNF)
is an alternative way to state the productions of a grammar where non–terminals are delimited by the symbols "<" and ">" and ::= are used instead ® in productions.   Furthermore, productions with the same left hand side such as

<S> ::= <P>, <S> ::= <Q>, <S> ::= <R>, ¼, <S> ::= <T>

may be combinely written as follows and in what is referred to as the Extended Backus Naur Form (EBNF):

<S> ::= <P> | <Q> | <R> | ¼ | <T>

Example:   Given a grammar G where <ID>, <D>, <L> Î N and 0, ¼, 9, a, ¼, z Î T with the following productions in EBNF that will derive an identifier:

<D> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<L> ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
<ID> ::= <L> ( <D> | <L> ) *

Determine if temp3 is a valid identifier.

<ID> Þ <L> ( <D> | <L> ) * <L> ® t
Þ t ( <D> | <L> ) * ( <D> | <L> ) * ® ( <D> | <L> ) ( <D> | <L> ) *
Þ t ( <D> | <L> ) ( <D> | <L> ) * ( <D> | <L> ) ® <L>
Þ t <L> ( <D> | <L> ) * <L> ® e
Þ te ( <D> | <L> ) * ( <D> | <L> ) * ® ( <D> | <L> ) ( <D> | <L> ) *
Þ te ( <D> | <L> ) ( <D> | <L> ) * ( <D> | <L> ) ® <L>
Þ te <L> ( <D> | <L> ) * <L> ® m
Þ tem ( <D> | <L> ) * ( <D> | <L> ) * ® ( <D> | <L> ) ( <D> | <L> ) *
Þ tem ( <D> | <L> ) ( <D> | <L> ) * ( <D> | <L> ) ® <L>
Þ tem <L> ( <D> | <L> ) * <L> ® p
Þ temp ( <D> | <L> ) * ( <D> | <L> ) * ® ( <D> | <L> )
Þ temp <D> | <L> ( <D> | <L> ) ® <D>
Þ temp <D> <D> ® 3
Þ temp3

A derivation may also be graphically represented in a tree.   Such tree is called a derivation tree.   In the example below the BNF productions are as follows:

<ID> ::= <L> <LD>
<LD> ::= l
<LD> ::= <L> <LD>
<LD> ::= <D> <LD>

and where <L> may be replaced by any of the letters in the alphabet and <D> may be replaced by any of the digits in the decimal number system.

A grammar G is a

Language
of a grammar G consists of all the words in the terminals which can be derived from the start symbol using the productions and is denoted by L(G).

Regular Language
L(a) of a regular expression a on some alphabet A, is the set of words determined by a.   A language L is regular if L = L(a) for some regular expression a.

Given an alphabet A, a language L over A, denoted by L(A), is a subset of A *, i.e. L(A) Í A *.

Notation:

A set of positive integers S is

A language L is recursively enumerable if there is a Turing machine M such that M accepts L as input.

Two grammars G1 and G2 are equivalent if L(G1) = L(G2).

ASSIGNMENT


© 2000-07-07 cpsm ; last update 2007-08-19 20:41