Copyright © Quviq AB, 2007-2023
Version: 1.46.3
Main interface for grammar specifications
This module is used to create generators for grammar files. Note that grammars normally define all valid inputs that a parser must accept. The generators that we create from a grammar represent valid input streams that can be used to feed the parser. Moreover, the result of the parser is a syntax tree. For yecc grammars, we can directly create generators that create such syntax trees.
We support yecc grammars as experimental alpha release. Main limitation in this version is the limited control on the size of the generated token streams or syntax trees. In case of a deeply nested grammar, the generated objects become very large.
Assume you have a grammar specified as arithm.yrl, i.e., an Erlang yecc file.
Nonterminals E uminus. Terminals '*' '-' number. Rootsymbol E. Left 100 '-'. Left 200 '*'. Unary 300 uminus. E -> E '-' E: {minus,'$1','$3'}. E -> E '*' E: {times,'$1','$3'}. E -> uminus: '$1'. E -> number: element(3,'$1'). uminus -> '-' E: {minus,'$2'}.
Assume you want to generate arbitrary lists of tokens that are accepted by the parser of this grammar. In case of yecc grammars the parser is automatically generated and you may find it unnecessary to create streams of tokens. However, the example given shows that running a few tests in that way is cheap and useful. In case you have defined your (optimized) parser by hand, it gets even more useful.
In order to create lists of tokens that are accepted by the parser, we need to create generators like'E'() -> oneof(['E'(),{'-',nat()},'E'()],...]).
Where {'-',nat()} is a generator for the token "-", since we represent those tokens with an atom and a line number (standard for the Erlang scanner).
We know that we cannot specify recursive generators like above and that
we need a bound on the size. We have to find out base cases and analyze
the recursive structure. For example, in this grammar we have a mutual
recursive case in which uminus has no direct base case. In addition,
we have to worry about how we shrink the generated values in order to
find minimal counter examples.
Manually constructing the generators is more work than necessary, since
we do have the grammar as specification.
We define a QuickCheck specification that will contain all non-terminals as generator by specifying the terminals, i.e., the tokens that correspond to the terminals of the grammar.
-module(arithm_eqc). -compile({parse_transform,eqc_grammar}). -eqc_grammar({yecc_tokens,"../examples/arithm.yrl"}). -include_lib("eqc/include/eqc.hrl"). -compile(export_all). number() -> {number,nat(),nat()}. '*'() -> {'*',nat()}. '-'() -> {'-',nat()}.
We add a compiler option for the parse transformation defined by
eqc_grammar or alternatively we can compile it with a special option
viz. c("arithm_eqc",[{parse_transform,eqc_grammar},Options]).
Options may contain all normal compiler options and in addition options to specify which grammar format to use and whether to generate tokens or syntax trees.
Valid options are-eqc_grammar({yecc_tokens,FileName}).
After compilation with this parse transformation, for each non-terminal in the grammar you have a generator that generates arbitrary lists of tokens that the parser for this non-terminal should accept. Alternatively, you created a generator that generates an arbitrary syntax tree.
Note that you can always use the standard Erlang compiler option 'P' if you want to save the generated source code. Only do this for inspection, never as a start of a new module; the link between new code and specification will be lost.
The token generators return a symbolic representation of the parse tree
of these tokens. This helps in analyzing the cause of errors found. One
should not rely upon the internal format of the symbolic representation,
since it may change in the future. One is supposed to use access functions
to this data structure, in particular the eval/1
function that
evaluates a symbolic form and returns a list of tokens.
Given the generators from the grammar, there are some obvious properties that you can check. We give a few examples here.
Token generators are used to test a parser, but also to test whether the grammar is specified in the way you expect (in case you define the grammar instead of a specification someone else provides you with).
Property: Parser does not crash and accepts valid lists of tokensprop_parse() -> ?FORALL(SymbolicExpr,'E'(), begin Tokens = eqc_grammar:eval(SymbolicExpr), case arithm:parse(Tokens) of {ok,SyntaxTree} -> true; {error,_} -> false end end).Property: What you can parse and print, you can also scan.
prop_parse_print() -> ?FORALL(SymbolicExpr,'E'(), begin Tokens = eqc_grammar:eval(SymbolicExpr), {ok,SyntaxTree} = arithm:parse(Tokens), String = print(SyntaxTree), {ok,Scanned,_} = arithm_scan:scan(String), equal(Scanned,Tokens) end).
Note the function 'equal' here instead of writing '='. The reason is that tokens are equivalent up to the line number in which they occur. The property above applied to the given grammar will reveal a few errors.
A tree generator could be seen as a generated set of tokens which is parsed and produces an abstract syntax tree. Here, however, the syntax tree is directly generated from the grammar, without using the parser. The tree generator is used to test the code that normally uses the result of the parser, but it can also be used to test parser and scanner.
Property: Pretty printer can handle all possible syntax trees?FORALL(SyntaxTree,'E'(), begin print(SyntaxTree), true end).Property: Scanner works for all syntactically correct code
?FORALL(SyntaxTree,'E'(), begin Tokens = scan(print(SyntaxTree)), SyntaxTree == parse(Tokens) end).
eval/1 | Given a generated syntax tree in symbolic form, produces a list of tokens. |
eval(SyntaxTree) -> any()
Given a generated syntax tree in symbolic form, produces a list of tokens.
Generated by EDoc