|
Compilerware
Open Source Fast LALR(k) Parser Generator |
DOWNLOADS AMDG |
INTRODUCTION
Downloads
Feedback From Users
Release Information
PRODUCTS
LRSTAR
DFASTAR
DFASTAR/c
THEORY
Definition Of Terms
Theoretical Information
COMPANY
About Us
Contact and Support
Donate
SourceForge Page
|
Description LRSTAR is an LR parser generator which reads advanced EBNF grammar notation and creates an LR state machine in C/C++ source code. The generated parsers are table-driven, like YACC and Bison parsers, however, LRSTAR creates four-matrix-parser-tables (4MPT) which gives better performance and has the same small size of YACC parsers. Testing of 4MPT shows a speed of about twice the speed of YACC parsers. However, this is only the parser speed, which is typically 20% of a compiler front-end. Compiler front ends, built with LRSTAR and DFASTAR, are currently reading source code at over 1,000,000 lines per second. LALR(k) Nondeterministic Parsing LRSTAR parsers can do LALR(k) parsing wherer k can be as large as necessary to make a parsing decision. Consider this LR(2) grammar taken from the paper IELR(1): Practical LR(1) Parser Tables for Non-LR(1) Grammars with Conflict Resolution by Joel Denny and Brian Malloy (Clemson University). G -> S <eof> S -> a A a -> b A b A -> a -> a a If the parser reads an 'a' and the next token (the lookahead) is 'a', there are two choices: A -> a . reduce A -> a . a shift 'a' This is an LR(2) grammar, therefore another lookahead symbol is needed to make a parsing decision. An LALR(1) parser generator CANNOT handle this grammar. An LR(1) parser generator CANNOT handle this grammar, either. However, an LALR(k) parser generator can handle this grammar. LRSTAR can generate a nondeterministic parser from this grammar and, with k set to 2, it can correctly parse all four sentences of the language defined by this grammar: aaa aaaa bab baab Compressed Matrix Parser Tables The LR state machine makes a transition to a state for each input token provided by the lexer, so the speed of the parser is largely determined by the way it computes the next state. Many parsers are slow because they compute the next state by a linear search or a binary search. LRSTAR parsers use the fastest method, a compressed matrix data structure, in which the next state is given by the following formula: x = Tm [Tr[x] + Tc[t]]; // Compute the next state. Where 'Tm' is the Terminal transition matrix, 'Tr' gives the row (base), 'Tc' gives the column (displacement), 'x' is the state and 't' is the input lookahead symbol. This is a base plus displacement which gives the correct index in 'Tm'. However, this is just one of four computations that are necessary in a compressed parser table structure. It is also necessary to have a Bit (failure) matrix, a Nonterminal-transition matrix and a Reduction matrix. This is what gives LRSTAR parsers a speed that is the same for all languages, no matter how large the grammar is. A parser for a large language would have the same speed as a parser for a small language. According to the tests, the speed is about 1,000,000 lines per second on a 3 GHz desktop computer. Chain-Reduction Elimination LRSTAR can do an optimization called "chain-reduction elimination" on the parser tables, which gives an increase in speed. A chain reduction is a condition caused by a grammar that has rules something like this:
Stmt -> LeftSide '=' Exp ';'
Exp -> Term
-> Exp '+' Term
-> Exp '-' Term
Term -> Factor
-> Term '*' Factor
-> Term '/' Factor
Factor -> <number>
-> <identifier>
-> '(' Exp ')'
If the input line is, "y = 100;", the parser makes 3 reductions: Factor <- <number> Term <- Factor Exp <- Term instead of just one reduction Exp <- <number> So removing unnecessary chain reductions can speed up the parser and that's what LRSTAR does when you select the 'o' option for "optimize". Shift-Reduce Actions LRSTAR generates parsers that use shift-reduce actions. About 40% of the states of a typical LR state macine are reduce-only states and it's a waste of time making the transition to one of these states. A parser can be smaller and faster by removing these read-only states and replacing the goto action with a shift-reduce action. Syntax Analysis For An Error LRSTAR creates parsers that do a syntax analysis of an error in the input. Here is an example of what the parser tells the user, in this case, when a '(' is missing:
LRSTAR parser
Parsing test.inp ...
test.inp(18) : Error if *arg[i] == '(')
test.inp(18) : Error --------------^ at * '*'
Expecting one of the following:
'('
Parsing done.
The parser does the syntax analysis automatically by examining the parser tables. The compiler developer does not have to do anything extra. Expecting List or Statement Completion The expecting list shows the user what tokens are expected to come next in the input. This algorithm is included in the generated parser, if you specify the 'exp' option. This is useful information for someone trying to learn your computer language. It's also useful information for making sure your grammar is correct. LRSTAR and SableCC are the only two parser generators that have this feature (as far as I know). Parser Skeleton Concept LRSTAR reads a parser skeleton file and plugs in the parser tables (mostly numbers) and outputs the parser file in source code. You have complete control over the code that gets generated. In fact, you may re-write the parser skeleton in C#, Java, Python or any language. Then you can generate parsers in the language of your choice. One user told me that he re-wrote the parser skeleton in IBM 390 Assembly language and was generating parsers in that language. TBNF Grammar Notation Here is a Calc grammar written in TBNF for input to LRSTAR. This grammar builds a complete translator that generates intermediate code. What's missing are the node action functions, (goal_, function_, store_, if_, target_, etc.) Those functions are very simple actually. You will find them in the LRSTAR download.
/* Simple Grammar. */
/* Input Tokens. */
<error> => error()
<identifier> => lookup() // Lookup & store in symbol table.
<integer> => lookup() // Lookup & store in symbol table.
/* Operator precedence. */
{ '==' '!=' } << // Lowest priority.
{ '+' '-' } <<
{ '*' '/' } << // Highest priority.
/* Productions. */
Program -> Function... <eof> *> goal_
Function -> function <identifier> '{' Stmt... '}' *> function_(2)
Stmt ~> Target '=' Exp ';' *> store_
-> if RelExp Then endif *> if_
-> if RelExp Else endif *> if_
-> if RelExp Then Else endif *> if_
Target -> <identifier> *> target_(1)
Exp -> Exp '+' Exp *> add_
-> Exp '-' Exp *> sub_
-> Exp '*' Exp *> mul_
-> Exp '/' Exp *> div_
-> <identifier> *> ident_(1)
-> <integer> *> int_(1)
-> '(' Exp ')'
RelExp -> Exp '==' Exp *> eq_
-> Exp '!=' Exp *> ne_
Then -> then Stmt... *> then_
Else -> else Stmt... *> else_
/* End of Grammar. */
|
| © Copyright Compilerware 2012. All rights reserved. |