GRAMMARS AND LANGUAGES
Please note that the material on this website is not intended to be exhaustive.
This is intended as a summary and supplementary material to the required textbook.
- Alphabet
-
A is a finite set whose elements are called letters.
- Word
-
on A is a finite string of letters from A.
The set of all words on A is denoted by A *.
-
Empty String or
Null String
- is a unique string denoted by l.
- Phrase Structured Grammar (PSG)
-
or simply grammar is denoted by G = { N, T, P, s }
and consisting of the following:
- A finite set N of nonterminals.
-
A finite set T of terminals where N
T = Æ.
-
A finite subset P of ( ( N
T ) * T * ) x ( N
T ) * called the set of
productions.
-
An element s Î N called the
start symbol.
- Production
-
( A, B ) Î P where A Î ( N
T ) * T * and B Î
( N
T ) * is denoted by A
® B or A ::= B.
- Directly Derivable
-
Let G = ( N, T, P, s ) be a grammar.
If a ® b is a
production and xay Î ( N
T ) *, then xby is
directly derivable from xay and is denoted as
xay Þ xby.
If ai Î ( N
T )* for i = 1, 2, ¼,
n, and ai+1 is directly derivable from
ai for i = 1, 2, ¼, n1,
we say that an is derivable from
a1 and is denoted by a1
Þ* an =
a1 Þ
a2 Þ ¼
Þ an
- Backus Naur Form (BNF)
-
is an alternative way to state the productions of a grammar where nonterminals are delimited by the symbols
"<" and ">" and ::= are used instead ® in
productions.
Furthermore, productions with the same left hand side such as
<S> ::= <P>, <S> ::= <Q>, <S> ::= <R>,
¼, <S> ::= <T>
may be combinely written as follows and in what is referred to as the
Extended Backus Naur Form (EBNF):
<S> ::= <P> | <Q> | <R> | ¼ | <T>
Example:
Given a grammar G where <ID>, <D>, <L> Î N
and 0, ¼, 9,
a, ¼, z
Î T with the following productions in EBNF that will derive an identifier:
<D> ::= 0 | 1 | 2 |
3 | 4 | 5 |
6 | 7 | 8 |
9
<L> ::= a | b | c |
d | e | f |
g | h | i |
j | k | l |
m | n | o |
p | q | r |
s | t | u |
v | w | x |
y | z
<ID> ::= <L> ( <D> | <L> ) *
|
Determine if temp3 is a valid identifier.
| <ID> Þ |
<L> ( <D> | <L> ) * |
<L> ® t |
| Þ |
t ( <D> | <L> ) * |
( <D> | <L> ) * ®
( <D> | <L> ) ( <D> | <L> ) *
|
| Þ |
t ( <D> | <L> ) ( <D> | <L> ) *
|
( <D> | <L> ) ® <L> |
| Þ |
t <L> ( <D> | <L> ) * |
<L> ® e |
| Þ |
te ( <D> | <L> ) * |
( <D> | <L> ) * ® ( <D> | <L> )
( <D> | <L> ) *
|
| Þ |
te ( <D> | <L> ) ( <D> | <L> ) *
|
( <D> | <L> ) ® <L> |
| Þ |
te <L> ( <D> | <L> ) * |
<L> ® m |
| Þ |
tem ( <D> | <L> ) * |
( <D> | <L> ) * ® ( <D> | <L> )
( <D> | <L> ) *
|
| Þ |
tem
( <D> | <L> ) ( <D> | <L> ) *
|
( <D> | <L> ) ® <L> |
| Þ |
tem <L> ( <D> | <L> ) * |
<L> ® p |
| Þ |
temp ( <D> | <L> ) * |
( <D> | <L> ) * ® ( <D> | <L> )
|
| Þ |
temp <D> | <L> |
( <D> | <L> ) ® <D> |
| Þ |
temp <D> |
<D> ® 3 |
| Þ |
temp3 |
A derivation may also be graphically represented in a tree.
Such tree is called a derivation tree.
In the example below the BNF productions are as follows:
<ID> ::= <L> <LD>
<LD> ::= l
<LD> ::= <L> <LD>
<LD> ::= <D> <LD>
|
and where <L> may be replaced by any of the letters in the alphabet and <D> may be replaced
by any of the digits in the decimal number system.

A grammar G is a
-
Phrasestructured grammar or
type 0 grammar if its productions have no restrictions.
-
Contextsensitive grammar or
type 1 grammar if all its productions not involving
s and which replace a word involving variables by a word in the terminals, are of
the form xAy ® xwy, where x, y, w are words in the
terminals of G.
-
Contextfree grammar or
type 2 grammar if all its productions not involving
s and which replace a word involving variables by a word in the terminals, are of
the form A ® w, where w is a word in the terminals of G.
-
Regular grammar or
type 3 grammar is either rightregular or
leftregular.
A grammar is leftregular if all the productions not involving s are
of the form A ® a or A ® aB.
A grammar is rightregular if all productions not involving s are of
the form A ® a or A ® Ba.
- Language
-
of a grammar G consists of all the words in the terminals which can be derived from the start symbol using
the productions and is denoted by L(G).
-
A contextsensitive language is of the form L(G)
for a contextsensitive grammar G.
-
A contextfree language is of the form L(G) for
a contextfree grammar G.
-
A regular expression over an alphabet A is
recursively defined as follows:
-
The empty set Æ and l are regular
expressions.
- Each a in A is a regular expression.
-
If a and b are regular expressions,
then (a),
ab, a +
b, and a* are also regular expressions.
- Nothing else is a regular expression.
- Regular Language
-
L(a) of a regular expression a on some alphabet
A, is the set of words determined by a.
A language L is regular if L = L(a) for some regular expression
a.
Given an alphabet A, a language L over A, denoted by L(A), is a subset of A *,
i.e. L(A) Í A *.
Notation:
- Sets of Strings
- Æ represents the set with no strings.
-
l represents a set consisting only of the string with length zero.
Note that l is the identity element.
- Lowercase letters such as a represents a set consisting only of the string with length one.
- String Operations
- (L) represents the same set of strings as L.
-
LR represents the set of strings a b
or simply ab obtained by concatenating a
string a of L with a string b of
R.
Note that concatenation is associative.
- L + R represents the union of L and R.
-
L*, called the Kleene star of L, represents the language obtained by concatenating zero or
more strings of L, that is,
L* = l + L + LL + LLL + ¼
L* = l + L + L2 + L3 + ¼
- Hierarchy of Operations
- ( ) takes precedence over *
- * takes precedence over concatenation
- concatenation takes precedence over union.
A set of positive integers S is
-
Recursive if the elements of S are generated by a recursion, that is, if S = {a1,
a2, ¼, an, ¼}, then there
is a recursion which computes an in terms of a1, a2,
¼, an1 provided that n is sufficiently large.
-
Recursively enumerable if there is an algorithm which will determine whether any positive integer a
is a member of S in finitely many ways.
A language L is recursively enumerable if there is a Turing machine M such that M accepts
L as input.
Two grammars G1 and G2 are equivalent if L(G1) =
L(G2).
ASSIGNMENT
© 2000-07-07 cpsm ; last update
2007-08-19 20:41