LALR parser

LALR parser

ith two-token lookahead, and LALR(k) parsers with k-token lookup, but these are rare in actual use. The LALR parser is based on the LR(0) parser, so it can also be denoted LALR(1) = LA(1)LR(0) (1 token of lookahead, LR(0)) or more generally LALR(k) = LA(k)LR(0) (k tokens of lookahead, LR(0)). There is in fact a two-parameter family of LA(k)LR(j) parsers for all combinations of j and k, which can be derived from the LR(j + k) parser,[9] but these do not see practical use.

As with other types of LR parsers, an LALR parser is quite efficient at finding the single correct bottom-up parse in a single left-to-right scan over the input stream, because it does not need to use backtracking. Being a lookahead parser by definition, it always uses a lookahead, with LALR(1) being the most-common case.

Relation to other parsers

LR parsers

The LALR(1) parser is less powerful than the LR(1) parser, and more powerful than the SLR(1) parser, though they all use the same production rules. The simplification that the LALR parser introduces consists in merging rules that have identical kernel item sets, because during the LR(0) state-construction process the lookaheads are not known. This reduces the power of the parser because not knowing the lookahead symbols can confuse the parser as to which grammar rule to pick next, resulting in reduce/reduce conflicts. All conflicts that arise in applying a LALR(1) parser to an unambiguous LR(1) grammar are reduce/reduce conflicts. The SLR(1) parser performs further merging, which introduces additional conflicts.

The standard example of an LR(1) grammar that cannot be parsed with the LALR(1) parser, exhibiting such a reduce/reduce conflict, is:[10][11]

  S → a E c
    → a F d
    → b F c
    → b E d
  E → e
  F → e

In the LALR table construction, two states will be merged into one state and later the lookaheads will be found to be ambiguous. The one state with lookaheads is:

  E → e. {c,d}
  F → e. {c,d}

An LR(1) parser will create two different states (with non-conflicting lookaheads), neither of which is ambiguous. In an LALR parser this one state has conflicting actions (given lookahead c or d, reduce to E or F), a "reduce/reduce conflict"; the above grammar will be declared ambiguous by a LALR parser generator and conflicts will be reported.

To recover, this ambiguity is resolved by choosing E, because it occurs before F in the grammar. However, the resultant parser will not be able to recognize the valid input sequence b e c, since the ambiguous sequence e c is reduced to (E → e) c, rather than the correct (F → e) c, but b E c is not in the grammar.

LL parsers

The LALR(k) parsers are incomparable with LL(k) parsers: for any j and k both greater than 0, there are LALR(j) grammars that are not LL(k) grammars and conversely. In fact, it is undecidable whether a given LL(1) grammar is LALR(k) for any k > 0.[2]

Depending on the presence of empty derivations, a LL(1) grammar can be equal to a SLR(1) or a LALR(1) grammar. If the LL(1) grammar has no empty derivations it is SLR(1) and if all symbols with empty derivations have non-empty derivations it is LALR(1). If symbols having only an empty derivation exist, the grammar may or may not be LALR(1).[12]

Implementation issues

Because the LALR parser performs a right derivation instead of the more intuitive left derivation, understanding how it works is quite difficult. This makes the process of finding a correct and efficient LALR grammar very demanding and time-consuming. For the same reason, error-reporting can be quite hard because LALR parser errors cannot always be interpreted into messages with high-level terms meaningful for the end user. However, any LR(k > 0) table makes it trivial to at least enumerate the various tokens that would have been valid options when a syntax error occurred, for low-level error messages. For this reason, the recursive descent parser is sometimes preferred over the LALR parser. This parser requires more hand-written code because of its lower language-recognition power. However, it does not have the special difficulties of the LALR parser because it performs left-derivation. Notable examples of this phenomenon are the C-language and C++ parsers of the Gnu Compiler Collection. These started as LALR parsers but were later changed to recursive-descent parsers.[13][14]

See also


  1. ^ "LALR" is pronounced as the initialism "el-ay-el-arr"
  2. ^ "LALR(1)" is pronounced as the initialism "el-ay-el-arr-one"


  1. ^ a b c DeRemer 1969.
  2. ^ a b c LR Parsing: Theory and Practice, Nigel P. Chapman, p. 86–87
  3. ^ "Generate the Parser". Eclipse JDT Project. Retrieved 29 June 2012. 
  4. ^ Anderson, T.; Eve, J.; Horning, J. (1973). "Efficient LR(1) parsers". Acta Informatica (2): 2–39. 
  5. ^ a b DeRemer, Frank; Penello, Thomas (October 1982). "Efficient Computation of LALR(1) Look-Ahead Sets". Tansactions on Programming Languages and Systems (ACM) 4 (4): 615–649. Retrieved July 25, 2014. 
  6. ^
  7. ^ Pager, D. (1977), "A Practical General Method for Constructing LR(k) Parsers", Acta Informatica 7: 249–268 
  8. ^
  9. ^ Parsing Techniques: A Practical Guide, by Dick Grune and Ceriel J. H. Jacobs, "9.7 LALR(1)", p. 302
  10. ^ "7.9 LR(1) but not LALR(1)", CSE 756: Compiler Design and Implementation, Eitan Gurari, Spring 2008
  11. ^ "Why is this LR(1) grammar not LALR(1)?"
  12. ^ (Beatty 1982)
  13. ^ "GCC 3.4 Release Series Changes, New Features, and Fixes",
  14. ^ "GCC 4.1 Release Series Changes, New Features, and Fixes",
  • DeRemer, Franklin L. (1969). Practical Translators for LR(k) languages (Ph.D.). MIT. 
  • Beatty, J. C. (1982). "On the relationship between LL(1) and LR(1) grammars". Journal of the ACM 29 (4 (Oct)): 1007–1022.  

External links

  • Parsing Simulator This simulator is used to generate parsing tables LALR and resolve the exercises of the book.
  • JS/CC JavaScript based implementation of a LALR(1) parser generator, which can be run in a web-browser or from the command-line.
  • LALR(1) tutorial, a flash card-like tutorial on LALR(1) parsing.