DFASTAR Lexer Generator 3.1
Description
DFASTAR 3.1 is a DFA lexer generator that creates table-driven lexical analyzers in C/C++ that run at twice the speed
of FLEX lexers. Small lexers built with DFASTAR can read 31,286,000 tokens/second, and medium lexers built
with DFASTAR can read 34,620,000 tokens/second. However, adding a line counter decreases the speed of table-driven lexers
by about 10%.
Table-Driven Lexers
Table-driven lexers use a compressed-matrix data structure, which scales much better and compiles much faster than
direct-code lexers. For example, DFASTAR can create a lexer for a 250,000-word dictionary, whereas direct-code
lexers could not be compiled with 4,000 keywords.
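To make the idea of a table-driven lexer concrete, here is a minimal sketch in C. It is not DFASTAR's generated code: the token names, state names, and the plain (uncompressed) transition matrix are assumptions chosen for illustration, whereas DFASTAR uses a compressed matrix. The point is that the scanning loop stays the same size no matter how many rules there are, because all of the language-specific information lives in the tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not DFASTAR's actual output. */
enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_ERROR };

/* Character classes keep the table narrow: one column per class, not per byte. */
enum { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };

/* DFA states. A table entry of -1 means "no transition". */
enum { S_START, S_IDENT, S_NUMBER, N_STATES };

static const signed char next_state[N_STATES][N_CLASSES] = {
    /*            letter    digit     other */
    /* START  */ { S_IDENT,  S_NUMBER, -1 },
    /* IDENT  */ { S_IDENT,  S_IDENT,  -1 },
    /* NUMBER */ { -1,       S_NUMBER, -1 },
};

/* Token reported when the scan stops in a given state. */
static const int accept_token[N_STATES] = { TOK_ERROR, TOK_IDENT, TOK_NUMBER };

int char_class(unsigned char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') return C_LETTER;
    if (c >= '0' && c <= '9') return C_DIGIT;
    return C_OTHER;
}

/* Skips leading blanks, scans one token starting at *pos, and advances *pos. */
int next_token(const char *text, size_t *pos) {
    assert(text != NULL && pos != NULL);
    while (text[*pos] == ' ' || text[*pos] == '\t' || text[*pos] == '\n') (*pos)++;
    if (text[*pos] == '\0') return TOK_EOF;

    int state = S_START;
    for (;;) {
        int next = next_state[state][char_class((unsigned char)text[*pos])];
        if (next < 0) break;          /* no transition: the token ends here */
        state = next;
        (*pos)++;
    }
    if (state == S_START) {           /* nothing matched: consume one bad character */
        (*pos)++;
        return TOK_ERROR;
    }
    return accept_token[state];
}
```

Calling next_token repeatedly on "abc 42 +" with a position that starts at 0 yields an identifier, a number, an error token for the unrecognized "+", and then end of input. Adding more rules in this style means adding rows and columns to the tables, not more code.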
Very-Fast Lexers
DFASTAR creates table-driven lexers similar to those created by FLEX, but they run at twice the speed of FLEX lexers.
Small Lexers
DFASTAR created a small table-driven lexer for the C language that is almost as small
as the lexer created by FLEX. For this test, we used a C-language lexical specification which included all 36
keywords of the C language. See the chart on the right.
Generation & Build Time
For the lexer-build-time test, we used a lexical specification for the DB2 programming language,
which has 550 keywords. The build time for a DFASTAR lexer was only 5 seconds, whereas the
build time for an RE2C direct-code lexer was 38 seconds.
Keywords and Identifiers
DFASTAR lexers can recognize keywords and identifiers simultaneously. This is faster than classifying
all words as identifiers and doing a symbol-table lookup to discover that an identifier is actually a keyword.
Note that you do not have to be concerned about the order of the rules in your lexical grammar: DFASTAR
distinguishes a keyword from an identifier, provided the keywords are listed in the grammar.
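The following C sketch illustrates how a DFA can separate a keyword from an identifier in a single pass, with no symbol-table lookup afterward. It is a hand-written illustration under stated assumptions, not output from DFASTAR: it knows only one keyword, "if", and every name in it (scan_word, KW_IF, the state names) is hypothetical. The keyword gets its own path through the states, and any extra word character moves the scan off that path into the identifier state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result kinds for this sketch. */
enum { WORD_NONE, WORD_KW_IF, WORD_IDENT };

/* Character classes: 'i' and 'f' get their own columns because the keyword uses them. */
enum { K_I, K_F, K_LETTER, K_DIGIT, K_END, K_CLASSES };

/* States: start, "i" seen, "if" seen, any other identifier. */
enum { W_START, W_I, W_IF, W_IDENT, W_STATES };

static const signed char word_next[W_STATES][K_CLASSES] = {
    /*           'i'      'f'      letter   digit    end */
    /* START */ { W_I,     W_IDENT, W_IDENT, -1,      -1 },
    /* I     */ { W_IDENT, W_IF,    W_IDENT, W_IDENT, -1 },
    /* IF    */ { W_IDENT, W_IDENT, W_IDENT, W_IDENT, -1 },
    /* IDENT */ { W_IDENT, W_IDENT, W_IDENT, W_IDENT, -1 },
};

/* "i" alone is an identifier; only the exact word "if" is the keyword. */
static const int word_accept[W_STATES] = { WORD_NONE, WORD_IDENT, WORD_KW_IF, WORD_IDENT };

int word_class(unsigned char c) {
    if (c == 'i') return K_I;
    if (c == 'f') return K_F;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') return K_LETTER;
    if (c >= '0' && c <= '9') return K_DIGIT;
    return K_END;
}

/* Scans one word at the start of text; stores its length in *len. */
int scan_word(const char *text, size_t *len) {
    assert(text != NULL && len != NULL);
    int state = W_START;
    size_t i = 0;
    for (;;) {
        int next = word_next[state][word_class((unsigned char)text[i])];
        if (next < 0) break;   /* end of the word */
        state = next;
        i++;
    }
    *len = i;
    return word_accept[state];
}
```

With this table, "if" and "if(" are reported as the keyword with length 2, while "iffy" and "i" are reported as identifiers. A generator would build the equivalent states automatically from the keyword list, which is why rule order does not need to matter to the grammar author.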
Advanced Regular Expressions
DFASTAR reads an advanced lexical notation which is a combination of BNF grammar rules and regular expressions.
This permits more readable lexical grammars. DFASTAR originated in the world of compiler construction
tools and does a thorough job of checking for errors. One nice feature is that you will be warned about
ambiguities that other tools may not detect.
Error Messages in Visual Studio
One of the nice features of DFASTAR is that error messages
displayed in Visual Studio provide the file name and line number of the error. A double-click takes you to
the error.

Test Information
All tests were done on a Dell Dimension 3000 desktop computer with a 3 GHz Pentium CPU and 2 GB of RAM.
Visual Studio C/C++ 2008 was used for compiling and linking, with optimizations for speed. Speed was
measured as lexer processing time in memory only and does not include the time required to load the large
input file. The C-language lexical grammar contains the 36 keywords of the C language, which means
that the lexers were recognizing both identifiers and keywords (no symbol table was used).
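The sketch below shows one way to take an in-memory measurement of this kind in C. It is not the harness used for the tests above: count_tokens is a stand-in that only counts whitespace-separated tokens, and the names measure and tokens_per_second are hypothetical. The relevant detail is that the clock starts only after the input is already in a buffer, so file loading is excluded from the timing.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-in "lexer" for this sketch: counts whitespace-separated tokens
   in a NUL-terminated buffer that is already in memory. */
size_t count_tokens(const char *buf) {
    assert(buf != NULL);
    size_t count = 0;
    int in_token = 0;
    for (; *buf != '\0'; buf++) {
        int blank = (*buf == ' ' || *buf == '\t' || *buf == '\n' || *buf == '\r');
        if (!blank && !in_token) count++;   /* first character of a new token */
        in_token = !blank;
    }
    return count;
}

/* Converts a token count and two clock() readings into tokens per second.
   Returns 0.0 if the interval is too short for clock() to resolve. */
double tokens_per_second(size_t tokens, clock_t start, clock_t end) {
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return seconds > 0.0 ? (double)tokens / seconds : 0.0;
}

/* Times only the scanning loop; the caller loads the input before calling this. */
double measure(const char *buf, int repeats, size_t *tokens_out) {
    size_t total = 0;
    clock_t start = clock();
    for (int i = 0; i < repeats; i++) total += count_tokens(buf);
    clock_t end = clock();
    *tokens_out = total;
    return tokens_per_second(total, start, end);
}
```

Because clock() has limited resolution, a short input needs a large repeats value (or a large input file, as in the tests above) before the reported rate is meaningful.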