[PUP-7093] Parser Combinators Created: 2017/01/11  Updated: 2017/05/18  Resolved: 2017/05/16

Status: Closed
Project: Puppet
Component/s: Language
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Normal
Reporter: Lucas Yamanishi Assignee: Unassigned
Resolution: Won't Do Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
relates to PUP-7033 Consider adding a StringScan Data Typ... Open
QA Risk Assessment: Needs Assessment


I'd like to be able to match context-free languages using the type system. Most of the pieces are there, but it needs a Concat or Combinator type. Here's how I see it working.

The Concat type would take multiple Scalars and all of its sub-types (converted to match strings), other Concat types, Undef and Optional where Undef matches the empty string, plus Variant types containing all these types (or maybe a new type of restricted Variant). The type would concatenate its parameter types to create a recognizer for larger strings. For example the following would match the strings 'foo bar' and 'foobar'.


This could also be used to match complete contex-free languages. Take the textbook example of balanced parentheses.

S → SS
S → (S)
S → ε

Or in BNF

<empty-string> ::= ""
<BP> ::= <empty-string>
    | ( <BP> )
    | <BP> <BP>

This could be expressed using Puppet type aliases as follows.

type BP = Variant[Enum[''], Concat[Enum['('], BP, Enum[')']], Concat[BP, BP]]

This alone could match some really complex things, like a user-defined subset of JSON, however it would be very difficult and ugly without PUP-7033.

Comment by Thomas Hallgren [ 2017/01/12 ]

Although elegant, I don't think this proposal is feasible. A simple example like the Variant above where the Concat starts and ends with an Enum could perhaps work, but the more patterns, enums, and other types come into play and gets concatenated, the more inefficient the type checking would become. Some kind of scanner would need to be involved, attempting to resolve the first type, then the next, if next fails, go back to the first, retry with a different resolution (a pattern may match a partial string in more ways than one, especially if it's not anchored), try the next again, etc. until all types resolve (or not). The number of possible permutations that the resolver would need to go through would very quickly become very high.

Comment by Henrik Lindberg [ 2017/01/13 ]

For this to work I think it needs an outer type that states the the type is a Grammar. The specification of the grammar's rules would then be based on types. That would require that there is a mapping from type to grammar rules and tokens. If the implementation of the lexing/parsing of that grammar is done naively in Ruby it would be horribly slow on anything beyond short input strings. I can imagine using racc to generate a parser, but that generation is not quick and something you would want to do as a build step. If we had access to C++ Boost parser generators it would be more feasible as it can build a grammar quickly at runtime.

Wonder if Peter Huene can be nerd sniped...

Comment by Thomas Hallgren [ 2017/01/13 ]

Then again, how useful is it to have a full-blown parser that isn't capable of building an AST? A type will just validate, it won't produce anything. If we were to seriously consider this, then we should also consider how to use it for more than just validating a text.

Comment by Lucas Yamanishi [ 2017/02/02 ]

Thomas Hallgren That makes sense. I think I would be happy with just Enum and Pattern-- not sure if either of those can take or return a start or finishing index. As far as returning an AST, that would be very useful. Perhaps something similar to the positional parameters of Regexp?

Anyway, just an idea. Thanks for looking!

Comment by Russell Mull [ 2017/05/16 ]

Thank you for filing this issue. However, we believe this change represents a technical direction that we have decided not to follow in Puppet. As such, we are closing this as “Won’t Do”. If any watcher believes this is an error, please add a comment explaining.

Generated at Sat Jul 11 11:56:53 PDT 2020 using Jira 8.5.2#805002-sha1:a66f9354b9e12ac788984e5d84669c903a370049.