FOOTNOTES:
LALR(k) Parsing
An LALR(k) parser can handle complex computer languages, such as COBOL and other languages that require
more than one symbol of lookahead to resolve ambiguity in the language definition.
The parser goes beyond LALR(1) parsing, if necessary, by using a nondeterministic parsing
algorithm for states that require it. It will look ahead 'k' symbols to resolve
ambiguities. 'k' can be as large as necessary, however grammars should be
defined to make 'k' as small as possible for better performance and error recovery.
LBNF Regular Expressions
LBNF is a Lexical BNF notation for defining the symbols (or tokens) of a computer language.
LBNF allows regular expressions, but offers better readability. When defining a token, you are
not restricted to one line of notation. You may use several lines.
In fact, you may use BNF grammar notation to describe each token of your language. It also gives
you the freedom to define your own escape symbols (\z) to match Perl regular expressions or some other
notation with which you are familiar. Here is an LBNF specification for the C language:
C lexical grammar.
Matrix Tables
Matrix tables are used to provide very fast lexers and parsers. LRSTAR and DFASTAR use
compressed-matrix tables for their finite-state automata, which offer small tables without losing
much speed. Matrix tables compile faster than direct-code recognizers (recursive descent parsers,
direct-code lexers). So, with matrix tables, you get the best solution.
Direct-Code Lexers
A direct-code lexer generator creates a switch statement for each state in the finite-state machine.
The speed can be 20% faster than table-driven lexers, when line counting is necessary. However, in a
compiler front end, that is only 4% faster at best. The disadvantage is that direct-code lexers have
many more lines of source code and compile much slower, as the number of keywords increases. The limit
on the number of keywords is about 3,000 (a compiler limitation).
Skeleton File
A skeleton file is source code with special code added which tells the parser or lexer generator
how to format the parser or lexer tables. This skeleton-file concept makes it possible to have
the parser or lexer generator output code in many different languages. This way you are not
locked into one language or one style of coding. You can customize the code to fit your preferred
language and style of coding.
TBNF Grammar Notation
TBNF is a grammar notation that can define the complete translation process from input source code to
intermediate code. It goes beyond BNF and EBNF. It makes the task of building compilers and
translators easier, quicker and more reliable. For more information, see the
TBNF Paper.