Module eqc_grammar

Main interface for grammar specifications.

Copyright © Quviq AB, 2007-2023

Version: 1.46.3

Description

Main interface for grammar specifications

This module is used to create generators for grammar files. Note that grammars normally define all valid inputs that a parser must accept. The generators that we create from a grammar represent valid input streams that can be used to feed the parser. Moreover, the result of the parser is a syntax tree. For yecc grammars, we can directly create generators that create such syntax trees.

We support yecc grammars as experimental alpha release. Main limitation in this version is the limited control on the size of the generated token streams or syntax trees. In case of a deeply nested grammar, the generated objects become very large.

Assume you have a grammar specified as arithm.yrl, i.e., an Erlang yecc file.

  Nonterminals E uminus.
  Terminals '*' '-' number.
  Rootsymbol E.
 
  Left 100 '-'.
  Left 200 '*'.
  Unary 300 uminus.
 
  E -> E '-' E: {minus,'$1','$3'}.
  E -> E '*' E: {times,'$1','$3'}.
  E -> uminus: '$1'.
  E -> number: element(3,'$1').
 
  uminus -> '-' E: {minus,'$2'}.
  

Assume you want to generate arbitrary lists of tokens that are accepted by the parser of this grammar. In case of yecc grammars the parser is automatically generated and you may find it unnecessary to create streams of tokens. However, the example given shows that running a few tests in that way is cheap and useful. In case you have defined your (optimized) parser by hand, it gets even more useful.

In order to create lists of tokens that are accepted by the parser, we need to create generators like
  'E'() -> oneof(['E'(),{'-',nat()},'E'()],...]).
  

Where {'-',nat()} is a generator for the token "-", since we represent those tokens with an atom and a line number (standard for the Erlang scanner).

We know that we cannot specify recursive generators like above and that we need a bound on the size. We have to find out base cases and analyze the recursive structure. For example, in this grammar we have a mutual recursive case in which uminus has no direct base case. In addition, we have to worry about how we shrink the generated values in order to find minimal counter examples.
Manually constructing the generators is more work than necessary, since we do have the grammar as specification.

We define a QuickCheck specification that will contain all non-terminals as generator by specifying the terminals, i.e., the tokens that correspond to the terminals of the grammar.

 -module(arithm_eqc).
 
 -compile({parse_transform,eqc_grammar}).
 -eqc_grammar({yecc_tokens,"../examples/arithm.yrl"}).
 
 -include_lib("eqc/include/eqc.hrl").
 
 -compile(export_all).
 
 number() ->
    {number,nat(),nat()}.
 
 '*'() ->
    {'*',nat()}.
 
 '-'() ->
    {'-',nat()}.
 
 

We add a compiler option for the parse transformation defined by eqc_grammar or alternatively we can compile it with a special option
viz. c("arithm_eqc",[{parse_transform,eqc_grammar},Options]).

Options may contain all normal compiler options and in addition options to specify which grammar format to use and whether to generate tokens or syntax trees.

Valid options are
{eqc_grammar,yecc_tokens,FileName}
generators for tokens according to grammar specification in FileName, which is a .yrl file
{eqc_grammar,yecc_tree,FileName}
generator for syntax trees that can be generated from a .yrl file
As an alternative to passing compiler options, you can also include the option in the module by the attribute notation:
  -eqc_grammar({yecc_tokens,FileName}).
 

After compilation with this parse transformation, for each non-terminal in the grammar you have a generator that generates arbitrary lists of tokens that the parser for this non-terminal should accept. Alternatively, you created a generator that generates an arbitrary syntax tree.

Note that you can always use the standard Erlang compiler option 'P' if you want to save the generated source code. Only do this for inspection, never as a start of a new module; the link between new code and specification will be lost.

Tokens in Symbolic Form

The token generators return a symbolic representation of the parse tree of these tokens. This helps in analyzing the cause of errors found. One should not rely upon the internal format of the symbolic representation, since it may change in the future. One is supposed to use access functions to this data structure, in particular the eval/1 function that evaluates a symbolic form and returns a list of tokens.

Properties

Given the generators from the grammar, there are some obvious properties that you can check. We give a few examples here.

Tokens generator

Token generators are used to test a parser, but also to test whether the grammar is specified in the way you expect (in case you define the grammar instead of a specification someone else provides you with).

Property: Parser does not crash and accepts valid lists of tokens
  prop_parse() ->
     ?FORALL(SymbolicExpr,'E'(),
           begin
               Tokens = eqc_grammar:eval(SymbolicExpr),
               case arithm:parse(Tokens) of
                    {ok,SyntaxTree} -> true;
                    {error,_} -> false
               end
           end).
  
Property: What you can parse and print, you can also scan.
If necessary do write the function print, since it is useful for testing! In particulat you will find out when you make your parse tree too complex.
  prop_parse_print() ->
     ?FORALL(SymbolicExpr,'E'(),
           begin
               Tokens = eqc_grammar:eval(SymbolicExpr),
               {ok,SyntaxTree} = arithm:parse(Tokens),
               String = print(SyntaxTree),
               {ok,Scanned,_} = arithm_scan:scan(String),
               equal(Scanned,Tokens)
           end).
  

Note the function 'equal' here instead of writing '='. The reason is that tokens are equivalent up to the line number in which they occur. The property above applied to the given grammar will reveal a few errors.

Tree generator

A tree generator could be seen as a generated set of tokens which is parsed and produces an abstract syntax tree. Here, however, the syntax tree is directly generated from the grammar, without using the parser. The tree generator is used to test the code that normally uses the result of the parser, but it can also be used to test parser and scanner.

Property: Pretty printer can handle all possible syntax trees
  ?FORALL(SyntaxTree,'E'(),
          begin
             print(SyntaxTree), true
          end).
  
Property: Scanner works for all syntactically correct code
  ?FORALL(SyntaxTree,'E'(),
          begin
             Tokens = scan(print(SyntaxTree)),
             SyntaxTree == parse(Tokens)
          end).
  

Function Index

eval/1Given a generated syntax tree in symbolic form, produces a list of tokens.

Function Details

eval/1

eval(SyntaxTree) -> any()

Given a generated syntax tree in symbolic form, produces a list of tokens.


Generated by EDoc