![]() |
![]() |
|
![]() |
![]() |
Encyclopedia :
L :
LR :
LRP :
LR parser |
|
|
LR parserIn computer science, an LR parser is a type of bottom-up parser for context-free grammars that is very commonly used by computer programming language compilers (and other associated tools). LR parsers read their input from Left to right and produce a Rightmost derivation (hence LR, compare with LL parser). The term "LR(k) parser" is also used; here the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions. Usually k is 1 and is often omitted. A context-free grammar is called LR(k) if there exists a LR(k) parser for it.In typical use when we refer to an LR parser we mean a particular parser capable of recognising a particular language specified by a context free grammar. It is not uncommon, however, to refer to an LR parser meaning a driver program that can be supplied with a suitable table to produce a wide number of different particular LR parsers. LR parsing has many benefits: LR parsers are difficult to produce by hand; they are usually constructed by a parser generator or a compiler-compiler. Depending on how the parsing table is generated these parsers are called Simple LR parser (SLR), Look-ahead LR parser (LALR), and Canonical LR parser. These types of parsers can deal with increasingly large sets of grammars; LALR parsers can deal with more grammars than SLR, Canonical LR parsers work on more grammars than LALR parsers. Yacc produces LALR parsers and is fairly popular. Architecture of LR Parsers A table-based bottom-up parser can be schematically presented as in Figure 1. +---+---+---+---+ Input: | 1 | + | 1 | $ | +---+---+---+---+ ^ | Stack: | +----------+ +---+ | | | 6 |<------| Parser |-----> Output +---+ | | | 3 | +----------+ +---+ | | 0 | +--+--------+ +---+ | | +----------+ +----------+ | Action | | Goto | | table | | table | +----------+ +----------+ Figure 1. Architecture of a table-based bottom-up parser. The parser has an input buffer, a stack on which it keeps a list of states it has been in, an action table and a goto table that tell it to what new state it should move or which grammar rule it should use given the state it is currently in and the terminal or nonterminal it has just read on the input stream. To explain its workings we will use the following small grammar:
The action table is indexed by a state of the parser and a terminal (including a special terminal $ that indicates the end of the input stream) and contains three types of actions: The goto table is indexed by a state of the parser and a nonterminal and simply indicates what the next state of the parser will be if it has recognized a certain nonterminal. The Parsing AlgorithmThe LR parsing algorithm now works as follows: In practice the parser will usually try to recover from a syntax error by ignoring unexpected symbols and/or inserting missing symbols, and continue parsing, but this is outside the scope of this article. An ExampleTo explain why this algorithm works we now proceed with showing how a string like "1 + 1" would be parsed by such a parser. When the parser starts it always starts with the initial state 0 and the following stack:
In state 2 the action table says that whatever terminal we see on the input stream we should do a reduction with grammar rule 5. If the table is correct then this means that the parser has just recognized the right-hand side of rule 5, which is indeed the case.
The next terminal is now '1' and this means that we perform a shift and go to state 2:
The rule numbers that will then have been written to the output stream will be [ 5, 3, 5, 2 ] which is indeed a rightmost derivation of the string "1 + 1" in reverse. Constructing LR(0) parsing tablesItems The construction of these parsing tables is based on the notion of LR(0) item (simply called item here) which are grammar rules with a special dot added somewhere in the right-hand side. For example the rule E → E + B has the following four corresponding items: Item sets It is usually not possible to characterize the state of the parser with a single item because it may not know in advance which rule it is going to use for reduction. For example if there is also a rule E → E * B then the items E → E · + B and E → E · * B will both apply after a string corresponding with E has been read. Therefore we will characterize the state of the parser by a set of items, in this case the set { E → E · + B, E → E · * B }. Closure of item setsAn item with a dot in front of a nonterminal, such as E → E + · B, indicates that the parser expects to parse the nonterminal B next. To ensure the item set contains all possible rules the parser may be in the midst of parsing, it must include all items describing how B itself will be parsed. This means that if there are rules such as B → 1 and B → 0 then the item set must also include the items B → · 1 and B → · 0. In general this can be formulated as follows:
The augmented grammarBefore we start determining the transitions between the different states, the grammar is always augmented with an extra rule
For our example we will take the same grammar as before and augment it:
Finding the reachable item sets and the transitions between themThe first step of constructing the tables consists of determining the transitions between the closed item sets. These transitions will be determined as if we are considering a finite automaton that can read terminals as well as nonterminals. The begin state of this automaton is always the closure of the first item of the added rule: S → · E:
Beginning from the begin state we will now determine all the states that can be reached from this state. The possible transitions for an item set can be found by looking at the symbols (terminals and nonterminals) we find right behind the dots; in the case of item set 0 these are the terminals '0' and '1' and the nonterminal E and B. To find the item set that a symbol x leads to we follow the following procedure: For the terminal '0' this results in:
Constructing the action and goto tableFrom this table and the found item sets we construct the action and goto table as follows: The reader may verify that this results indeed in the action and goto table that were presented earlier on. = A note about LR(0) versus SLR and LALR parsing =Note that only step 4 of the above procedure produces reduce actions, and so all reduce actions must occupy an entire table row, causing the reduction to occur regardless of the next symbol in the input stream. This is why these are LR(0) parse tables: they don't do any lookahead (that is, they look ahead zero symbols) before deciding which reduction to perform. A grammar that needs lookahead to disambiguate reductions would require a parse table row containing different reduce actions in different columns, and the above procedure is not capable of creating such rows. Refinements to the LR(0) table construction procedure (such as SLR and LALR) are capable of constructing reduce actions that do not occupy entire rows. Therefore, they are capable of parsing more grammars than LR(0) parsers. Conflicts in the constructed tablesThe automaton that is constructed is by the way it is constructed always a deterministic automaton. However, when reduce actions are added to the action table it can happen that the same cell is filled with a reduce action and a shift action (a shift-reduce conflict) or with two different reduce actions (a reduce-reduce conflict). However, it can be shown that when this happens the grammar is not an LL(0) grammar. A small example of a non-LL(0) grammar with a shift-reduce conflict is:
A small example of a non-LL(0) grammar with a reduce-reduce conflict is:
Both examples above can be solved by letting the parser use the follow set (see LL parser) of a nonterminal A to decide if it is going to use one of As rules for a reduction; it will only use the rule A → w for a reduction if the next symbol on the input stream is in the follow set of A. This solution results in so-called Simple LR parsers.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License. |
|
| © 2008 Chamas Enterprises Inc. |