Comparison of parser generators

This is a list of notable lexer generators and parser generators for various language classes.

Regular languages

Regular languages are a category of languages (sometimes termed Chomsky Type 3) which can be matched by a state machine (more specifically, by a deterministic finite automaton or a nondeterministic finite automaton) constructed from a regular expression. In particular, a regular language can match constructs like "A follows B", "Either A or B", "A, followed by zero or more instances of B", but cannot match constructs which require consistency between non-adjacent elements, such as "some instances of A followed by the same number of instances of B", and also cannot express the concept of recursive "nesting" ("every A is eventually followed by a matching B"). A classic example of a problem which a regular grammar cannot handle is the question of whether a given string contains correctly-nested parentheses. (This is typically handled by a Chomsky Type 2 grammar, also termed a context-free grammar.)

NameLexer algorithmOutput languagesGrammar, codeDevelopment platformLicense
AlexDFAHaskellMixedAllFree, BSD
AnnoFlexDFAJavaMixedJava virtual machineFree, BSD
AstirDFA table driven, with branchingC++Only grammar (actioned)AllFree, MIT
AustenXDFAJavaSeparateAllFree, BSD
C# FlexDFAC#Mixed.NET CLRFree, GNU GPL
C# LexDFAC#Mixed.NET CLR ?
CarburettaDFAC, C++MixedAllFree, Apache 2.0
CookCCDFAJavaMixedJava virtual machineFree, Apache 2.0
DFADFA compressed matrixC, C++SeparateWindows, Visual StudioBSD
DolphinDFAC++SeparateAllProprietary
FlexDFA table drivenC, C++MixedAllFree, BSD
gelexDFAEiffelMixedEiffelFree, MIT
golexDFAGoMixedGoFree, BSD-style
gplexDFAC#Mixed.NET CLRFree, BSD-like
JFlexDFAJavaMixedJava virtual machineFree, BSD
JLexDFAJavaMixedJava virtual machineFree, BSD-like
lexDFACMixedPOSIXPartial, proprietary, CDDL
lexertlDFAC++ ?AllFree, GNU LGPL
QuexDFA direct codeC, C++MixedAllFree, GNU LGPL
RagelDFAGo, C, C++, assemblyMixedAllFree, GNU GPL, MIT[1][2]
RE/flexDFA direct code, DFA table driven, and NFA regex librariesC++MixedAllFree, BSD
re2cDFA direct codeC, C++, Go, RustMixedAllFree, public domain

Deterministic context-free languages

Context-free languages are a category of languages (sometimes termed Chomsky Type 2) which can be matched by a sequence of replacement rules, each of which essentially maps each non-terminal element to a sequence of terminal elements and/or other nonterminal elements. Grammars of this type can match anything that can be matched by a regular grammar, and furthermore, can handle the concept of recursive "nesting" ("every A is eventually followed by a matching B"), such as the question of whether a given string contains correctly-nested parentheses. The rules of Context-free grammars are purely local, however, and therefore cannot handle questions that require non-local analysis such as "Does a declaration exist for every variable that is used in a function?". To do so technically would require a more sophisticated grammar, like a Chomsky Type 1 grammar, also termed a context-sensitive grammar. However, parser generators for context-free grammars often support the ability for user-written code to introduce limited amounts of context-sensitivity. (For example, upon encountering a variable declaration, user-written code could save the name and type of the variable into an external data structure, so that these could be checked against later variable references detected by the parser.)

The deterministic context-free languages are a proper subset of the context-free languages which can be efficiently parsed by deterministic pushdown automata.

NameParsing algorithmInput grammar notationOutput languagesGrammar, codeLexerDevelopment platformIDELicense
ANTLR4Adaptive LL(*)[3]EBNFC#, Java, Python, JavaScript, C++, Swift, Go, PHPSeparategeneratedJava virtual machineYesFree, BSD
ANTLR3LL(*)EBNFActionScript, Ada95, C, C++, C#, Java, JavaScript, Objective-C, Perl, Python, RubyMixedgeneratedJava virtual machineYesFree, BSD
APG[4]Recursive descent, backtrackingABNFPython, JavaScript, C, JavaSeparatenoneAllNoFree, BSD
Beaver[5][6]LALR(1)EBNFJavaMixedexternalJava virtual machineNoFree, BSD
BisonLALR(1), LR(1), IELR(1), GLRYaccC, C++, JavaMixedexternalAllNoFree, GNU GPL with exception
BtYaccBacktracking Bottom-up ?C++MixedexternalAllNoFree, public domain
byaccLALR(1)YaccCMixedexternalAllNoFree, public domain
CL-Yacc[7][8]LALR(1)LispCommon LispMixedexternalAllNoFree, MIT
Coco/RLL(1)EBNFC, C++, C#, F#, Java, Ada, Object Pascal, Delphi, Modula-2, Oberon, Ruby, Swift, Unicon, Visual Basic .NETMixedgeneratedJava virtual machine, .NET framework, Windows, POSIX (depends on output language)NoFree, GNU GPL
CppCC[9][10]LL(k) ?C++MixedgeneratedPOSIXNoFree, GNU GPL
CUP[11][12]LALR(1) ?JavaMixedexternalJava virtual machineNoFree, BSD-like
Eli[13][14]LALR(1) ?CMixedgeneratedPOSIXNoFree, GNU GPL, GNU LGPL
Essence[15]LR(?) ?Scheme 48MixedexternalAllNoFree, BSD
eyapp[16]LALR(1) ?PerlMixedexternal or generatedAllNoFree, Artistic
FrownLALR(k) ?Haskell 98MixedexternalAllNoFree, GNU GPL
geyaccLALR(1) ?EiffelMixedexternalAllNoFree, MIT
GOLDLALR(1)BNFx86 assembly language, ANSI C, C#, D, Java, Pascal, Object Pascal, Python, Visual Basic 6, Visual Basic .NET, Visual C++SeparategeneratedWindowsYesFree, zlib modified
GPPGLALR(1)YaccC#SeparateexternalWindowsYesFree, BSD
GrammaticaLL(k)BNF dialectC#, JavaSeparategeneratedJava virtual machineNoFree, BSD
HiLexedLL(*)EBNF or JavaJavaSeparateinternalJava virtual machineNoFree, GNU LGPL
Hime Parser GeneratorLALR(1), GLRBNF dialectC#, Java, RustSeparategenerated.NET framework, Java virtual machineNoFree, GNU LGPL
HyaccLR(1), LALR(1), LR(0)YaccCMixedexternalAllNoFree, GNU GPL
iyaccLALR(1)YaccIconMixedexternalAllNoFree, GNU LGPL
jaccLALR(1) ?JavaMixedexternalJava virtual machineNoFree, BSD
JavaCCLL(k)EBNFJava, C++, JavaScript (via GWT compiler)[17]MixedgeneratedJava virtual machineYesFree, BSD
jayLALR(1)YaccC#, JavaMixednoneJava virtual machineNoFree, BSD
JFLAPLL(1), LALR(1) ?Java ? ?Java virtual machineYes ?
JetPAGLL(k) ?C++MixedgeneratedAllNoFree, GNU GPL
JS/CCLALR(1)EBNFJavaScript, JScript, ECMAScriptMixedinternalAllYesFree, BSD
KDevelop-PG-QtLL(1), backtracking, shunting-yard ?C++Mixedgenerated or externalAll, KDENoFree, GNU LGPL
KelbtBacktracking LALR(1) ?C++MixedgeneratedPOSIXNoFree, GNU GPL
kmyaccLALR(1) ?C, Java, Perl, JavaScriptMixedexternalAllNoFree, GNU GPL
LapgLALR(1) ?C, C++, C#, Java, JavaScriptMixedgeneratedJava virtual machineNoFree, GNU GPL
LarkLALR(1)EBNFPython, JavaScriptMixedgeneratedAllYesFree, MIT
LemonLALR(1) ?CMixedexternalAllNoFree, public domain
LimeLALR(1) ?PHPMixedexternalAllNoFree, GNU GPL
LISALR(?), LL(?), LALR(?), SLR(?) ?JavaMixedgeneratedJava virtual machineYesFree, public domain
LLgenLL(1) ?CMixedexternalPOSIXNoFree, BSD
LLnextgenLL(1) ?CMixedexternalAllNoFree, GNU GPL
LLLPGLL(k) + syntactic and semantic predicatesANTLR-likeC#Mixedgenerated (?).NET framework, MonoVisual StudioFree, GNU LGPL
LPGBacktracking LALR(k) ?JavaMixedgeneratedJava virtual machineNoFree, EPL
LRSTARLALR(1), LALR(*)YACC, ANTLR, EBNFC++SeparategeneratedWindowsVisual StudioFree, BSD
MenhirLR(1) ?OCamlMixedgeneratedAllNoFree, QPL
ML-YaccLALR(1) ?MLMixedexternalAllNo ?
MonkeyLR(1) ?JavaSeparategeneratedJava virtual machineNoFree, GNU GPL
MstaLALR(k), LR(k)YACC, EBNFC, C++Mixedexternal or generatedPOSIX, CygwinNoFree, GNU GPL
MTP (More Than Parsing)LL(1) ?JavaSeparategeneratedJava virtual machineNoFree, GNU GPL
MyParserLL(*)MarkdownC++11SeparateinternalAny with standard C++11 compilerNoFree, MIT
NLTGLRC#/BNF-likeC#Mixedmixed.NET frameworkNoFree, MIT
ocamlyaccLALR(1) ?OCamlMixedexternalAllNoFree, QPL
olexLL(1) ?C++MixedgeneratedAllNoFree, GNU GPL
ParsecLL, backtrackingHaskellHaskellMixednoneAllNoFree, BSD
yapp[16]LALR(1) ?PerlMixedexternalAllNoFree, GNU GPL
Parser ObjectsLL(k) ?JavaMixed ?Java virtual machineNoFree, zlib
PCCTSLL ?C, C++ ? ?AllNo ?
PLYLALR(1)BNFPythonMixedgeneratedAllNoFree, MIT
PlyPlusLALR(1)EBNFPythonSeparategeneratedAllNoFree, MIT
PRECCLL(k) ?CSeparategeneratedDOS, POSIXNoFree, GNU GPL
racc[18] LALR(1) BNF-like, yacc-like[19] Ruby Mixed  ? Windows, Linux, macOS, FreeBSD, NetBSD No LGPL
QLALRLALR(1) ?C++MixedexternalAllNoFree, GNU GPL
SableCCLALR(1) ?C, C++, C#, Java, OCaml, PythonSeparategeneratedJava virtual machineNoFree, GNU LGPL
SLK[20]LL(k) LR(k) LALR(k)EBNFC, C++, C#, Java, JavaScriptSeparateexternalAllNoSLK[21]
SLY[22]LALR(1)BNFPythonMixedgeneratedAllNoFree, MIT
SP (Simple Parser)Recursive descentPythonPythonSeparategeneratedAllNoFree, GNU LGPL
SpiritRecursive descent ?C++MixedinternalAllNoFree, Boost
StyxLALR(1) ?C, C++SeparategeneratedAllNoFree, GNU LGPL
Sweet ParserLALR(1) ?C++SeparategeneratedWindowsNoFree, zlib
TapLL(1) ?C++MixedgeneratedAllNoFree, GNU GPL
TextTransformerLL(k) ?C++MixedgeneratedWindowsYesProprietary
TinyPGLL(1) ?C#, Visual Basic ? ?WindowsYesPartial, CPOL 1.0
Toy Parser GeneratorRecursive descent ?PythonMixedgeneratedAllNoFree, GNU LGPL
TP YaccLALR(1) ?Turbo PascalMixedexternalAllYesFree, GNU GPL
Tree-Sitter[23]LR(1), GLRJavaScript DSL, JSONC, bindings (Rust, WebAssembly, JavaScript, Python, many other)Separategenerated + externalAllNeovim, Visual Studio CodeFree, MIT
Tunnel Grammar StudioTunnel ParsingABNFC++SeparategeneratedWindowsYesProprietary
UltraGramLALR(1), LR(1), GLRBNFC++, Java, C#, Visual Basic .NETSeparateexternalWindowsYesFree, public domain
UniCCLALR(1)EBNFC, C++, Python, JavaScript, JSON, XMLMixedgeneratedPOSIXNoFree, BSD
UrchinCCLL(1) ?Java ?generatedJava virtual machineNo ?
Yacc AT&T/SunLALR(1)YaccCMixedexternalPOSIXNoFree, CPL & CDDL
Yacc++LR(1), LALR(1)YaccC++, C#Mixedgenerated or externalAllNoProprietary
YappsLL(1) ?PythonMixedgeneratedAllNoFree, MIT
yeccLALR(1) ?ErlangSeparategeneratedAllNoFree, Apache 2.0
Visual BNFLR(1), LALR(1) ?C#Separategenerated.NET frameworkYesProprietary
YooParseLR(1), LALR(1) ?C++MixedexternalAllNoFree, MIT
Parse[24]LR(1)BNF in C++ types ? ?noneC++11 standard compilerNoFree, MIT
GGLLLL(1)GraphJavaMixedgeneratedWindowsYesFree, MIT
ProductParsing algorithmInput grammar notationOutput languagesGrammar, codeLexerDevelopment platformIDELicense

Parsing expression grammars, deterministic boolean grammars

This table compares parser generators with parsing expression grammars, deterministic boolean grammars.

NameParsing algorithmOutput languagesGrammar, codeDevelopment platformLicense
AustenXPackrat (modified)JavaSeparateAllFree, BSD
AurochsPackratC, OCaml, JavaMixedAllFree, GNU GPL
BNFliteRecursive descentC++MixedAllFree, MIT
CanopyPackratJava, JavaScript, Python, RubySeparateAllFree, GNU GPL
CL-pegPackratCommon LispMixedAllFree, MIT
Drat!PackratDMixedAllFree, GNU GPL
FrisbyPackratHaskellMixedAllFree, BSD
grammar::pegPackratTclMixedAllFree, BSD
GrakoPackrat + Cut + Left RecursionPython, C++ (beta)SeparateAllFree, BSD
IronMetaPackratC#MixedWindowsFree, BSD
Laja2-phase scannerless top-down backtracking + runtime supportJavaSeparateAllFree, GNU GPL
lars::ParserPackrat (supporting left-recursion and grammar ambiguity)C++IdenticalAllFree, BSD
LPegParsing machineLuaMixedAllFree, MIT
lugParsing machineC++17MixedAllFree, MIT
MouseRecursive descentJavaSeparateJava virtual machineFree, Apache 2.0
NarwhalPackratCMixedPOSIX, WindowsFree, BSD
NearleyEarleyJavaScriptMixedAllFree, MIT
Nemerle.PegRecursive descent + PrattNemerleSeparateAllFree, BSD
neotomaPackratErlangSeparateAllFree, MIT
NPEGRecursive descentC#MixedAllFree, MIT
OMetaPackrat (modified, partial memoization)JavaScript, Squeak, PythonMixedAllFree, MIT
PackCCPackrat (modified, left-recursion support)CMixedAllFree, MIT
PackratPackratSchemeMixedAllFree, MIT
PappyPackratHaskellMixedAllFree, BSD
parboiledRecursive descentJava, ScalaMixedJava virtual machineFree, Apache 2.0
Lambda PEGRecursive descentJavaMixedJava virtual machineFree, Apache 2.0
parseppRecursive descentC++MixedAllFree, public domain
ParsnipPackratC++MixedWindowsFree, GNU GPL
PatternsParsing machineSwiftIdenticalAllFree, MIT
pegRecursive descentCMixedAllFree, MIT
PEG.jsPackrat (partial memoization)JavaScriptMixedAllFree, MIT
Peggy[25]Packrat (partial memoization)JavaScriptMixedAllFree, MIT
PegasusRecursive descent, Packrat (selectively)C#MixedWindowsFree, MIT
pegcRecursive descentCMixedAllFree, public domain
pestRecursive descentRustSeparateAllFree, MIT, Apache 2.0
PetitParserPackratSmalltalk, Java, DartMixedAllFree, MIT
PEGTLRecursive descentC++11, C++17MixedAllFree, Boost
Parser Grammar Engine (PGE)Hybrid recursive descent / operator precedence[26]Parrot bytecodeMixedParrot virtual machineFree, Artistic 2.0
PyPy rlibPackratPythonMixedAllFree, MIT
Rats!PackratJavaMixedJava virtual machineFree, GNU LGPL
RekexRecursive descentJavaMixedJava virtual machineFree, Apache 2.0
Spirit2Recursive descentC++MixedAllFree, Boost
TreetopRecursive descentRubyMixedAllFree, MIT
YardRecursive descentC++MixedAllFree, MIT or public domain
WaxeyeParsing machineC, Java, JavaScript, Python, Racket, RubySeparateAllFree, MIT
PHP PEGPEG Parser?PHPMixedAllFree, BSD

General context-free, conjunctive, or boolean languages

This table compares parser generator languages with a general context-free grammar, a conjunctive grammar, or a boolean grammar.

NameParsing algorithmInput grammar notationOutput languagesGrammar, codeLexerDevelopment platformIDELicense
ACCENTEarleyYacc variantCMixedexternalAllNoFree, GNU GPL
APaGeDGLR, LALR(1), LL(k) ?DMixedgeneratedAllNoFree, Artistic
BisonLALR(1), LR(1), IELR(1), GLRYaccC, C++, Java, XMLMixed, except XMLexternalAllNoFree, GNU GPL
DMS Software Reengineering ToolkitGLR ?ParlanseMixedgeneratedWindowsNoProprietary
DParserScannerless GLR ?CMixedscannerlessPOSIXNoFree, BSD
DypgenRuntime-extensible GLR ?OCamlMixedgeneratedAllNoFree, CeCILL-B
E3Earley ?OCamlMixedexternal, or scannerlessAllNo ?
ElkhoundGLR ?C++, OCamlMixedexternalAllNoFree, BSD
GDKLALR(1), GLR ?C, Lex, Haskell, HTML, Java, Object Pascal, YaccMixedgeneratedPOSIXNoFree, MIT
HappyLALR, GLR ?HaskellMixedexternalAllNoFree, BSD
Hime Parser GeneratorGLR ?C#, Java, RustSeparategenerated.NET framework, Java virtual machineNoFree, GNU LGPL
IronText LibraryLALR(1), GLRC#C#Mixedgenerated or external.NET frameworkNoFree, Apache 2.0
JisonLALR(1), LR(0), SLR(1)YaccJavaScript, C#, PHPMixedgeneratedAllNoFree, MIT
SyntaxLALR(1), LR(0), SLR(1) CLR(1) LL(1)JSON/YaccJavaScript, Python, PHP, Ruby, C++, C#, Rust, JavaMixedgeneratedAllNoFree, MIT
LajaScannerless, two phaseLajaJavaSeparatescannerlessAllNoFree, GNU GPL
ModelCCEarleyAnnotated class modelJavaGeneratedgeneratedAllNoFree, BSD
P3Earley–combinatorsBNF-likeOCamlMixedexternal, or scannerlessAllNo ?
P4Earley–combinators, infinitary CFGsBNF-likeOCamlMixedexternal, or scannerlessAllNo ?
Scannerless Boolean ParserScannerless GLR (Boolean grammars) ?Haskell, JavaSeparatescannerlessJava virtual machineNoFree, BSD
SDF/SGLRScannerless GLRSDFC, JavaSeparatescannerlessAllYesFree, BSD
SmaCCGLR(1), LALR(1), LR(1) ?SmalltalkMixedinternalAllYesFree, MIT
SPARKEarley ?PythonMixedexternalAllNoFree, MIT
TomGLR ?CGeneratednoneAllNoFree, "No licensing or copyright restrictions"
UltraGramLALR, LR, GLR ?C++, C#, Java, Visual Basic .NETSeparategeneratedWindowsYesProprietary
WormholePruning, LR, GLR, Scannerless GLR ?C, PythonMixedscannerlessWindowsNoFree, MIT
Whale CalfGeneral tabular, SLL(k), Linear normal form (conjunctive grammars), LR, Binary normal form (Boolean grammars) ?C++SeparateexternalAllNoProprietary
yaepEarleyYacc-likeCMixedexternalAllNoFree, GNU LGPL

Context-sensitive grammars

This table compares parser generators with context-sensitive grammars.

NameParsing algorithmInput grammar notationBoolean grammar abilitiesDevelopment platformLicense
LuZc[27][28]delta chainmodularConjunctive, not complementaryPOSIXProprietary
bnf2xmlRecursive descent (is a text filter output is xml)simple BNF grammar (input matching), output is xml ?Beta, and not a full EBNF parserFree, GNU GPL

See also

Notes

    References

    1. "Ragel State Machine Compiler".
    2. http://www.colm.net/open-source/ragel/
    3. "Adaptive LL(*) Parsing: The Power of Dynamic Analysis" (PDF). Terence Parr. Retrieved 2016-04-03.
    4. "Survey on Various Syntax Analyzer Tools". www.ijraset.com. Retrieved 2023-09-16.
    5. Boyland, John; Spiewak, Daniel (2010-09-17). "Tool Paper: ScalaBison Recursive Ascent-Descent Parser Generator". Electronic Notes in Theoretical Computer Science. Proceedings of the Ninth Workshop on Language Descriptions Tools and Applications (LDTA 2009). 253 (7): 65–74. doi:10.1016/j.entcs.2010.08.032. ISSN 1571-0661.
    6. "Beaver - a LALR Parser Generator". beaver.sourceforge.net. Retrieved 2023-09-16.
    7. Newton, Jim E.; Demaille, Akim; Verna, Didier (2016-05-09). "Type-Checking of Heterogeneous Sequences in Common Lisp" (PDF). Proceedings of the 9th European Lisp Symposium on European Lisp Symposium. ELS2016. Kraków, Poland: European Lisp Scientific Activities Association: 13–20. ISBN 978-2-9557474-0-7.
    8. "CL-Yacc — a LALR(1) parser generator for Common Lisp". www.irif.fr. Retrieved 2023-09-16.
    9. Hosseinpour, Sahereh; Alavi Milani, Mir Mohammad Reza; Pehlivan, Hüseyin (July 2018). "A Step-by-Step Solution Methodology for Mathematical Expressions". Symmetry. 10 (7): 285. doi:10.3390/sym10070285. ISSN 2073-8994.
    10. "CppCC's Home Page". cppcc.sourceforge.net. Retrieved 2023-09-16.
    11. "Java Cup". pages.cs.wisc.edu. Retrieved 2023-09-16.
    12. "CUP". www2.cs.tum.edu. Retrieved 2023-09-16.
    13. Thiemann, Peter; Neubauer, Matthias (2004-12-31). "Parameterized LR Parsing". Electronic Notes in Theoretical Computer Science. Proceedings of the Fourth Workshop on Language Descriptions, Tools, and Applications (LDTA 2004). 110: 115–132. doi:10.1016/j.entcs.2004.06.007. ISSN 1571-0661.
    14. Gray, Robert W.; Levi, Steven P.; Heuring, Vincent P.; Sloane, Anthony M.; Waite, William M. "Eli: a complete, flexible compiler construction system". Communications of the ACM. 35 (2): 121–130. doi:10.1145/129630.129637. ISSN 0001-0782.
    15. Owens, Scott; Flatt, M.; Shivers, O.; McMullan, Benjamin (2004-10-01). "Lexer and Parser Generators in Scheme" (PDF). Scheme 2004: Proceedings of the Fifth Workshop on Scheme and Functional Programming.
    16. Areias, Hugo; Simões, Alberto; Henriques, P.; Cruz, Daniela Carneiro da (2010-09-01). "Parser generation in Perl : an overview and available tools" (PDF). {{cite journal}}: Cite journal requires |journal= (help)
    17. "Building parsers for the web with JavaCC & GWT (Part one)". Chris Ainsley. 14 April 2014. Retrieved 2014-05-04.
    18. "Racc". i.loveruby.net. Retrieved 2021-11-26.
    19. "Racc Grammar File Reference". i.loveruby.net. Retrieved 2021-11-26.
    20. "The SLK Parser Generator supports C, C++, Java, JavaScript, and C#, optional backtracking, free".
    21. http://www.slkpg.site/license.txt
    22. "SLY (Sly Lex Yacc)".
    23. "Tree-Sitter - An incremental parsing system for programming tools".
    24. "Parse - Compile time (LR) type safe parser generator for C++". GitHub. 30 December 2021.
    25. Maintained fork of PEG.js
    26. "Parrot: Grammar Engine". The Parrot Foundation. 2011. PGE rules provide the full power of recursive descent parsing and operator precedence parsing.
    27. "LuZ: A context sensitive parser". 2016-10-17. Archived from the original on 2016-10-17. Retrieved 2018-10-17.
    28. "LuZc – A conjunctive context-sensitive parser". luzc.zohosites.com. Archived from the original on 2018-12-21. Retrieved 2018-10-17.
    This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.