[PUP-7033] Consider adding a StringScan Data Type that can Reuse Patterns
Created: 2016/12/19
Updated: 2018/11/30

Status: Open
Project: Puppet
Component/s: Language, Type System
Affects Version/s: PUP 4.8.1
Fix Version/s: None

Type: Improvement
Priority: Normal
Reporter: Trevor Vaughan
Assignee: Unassigned
Resolution: Unresolved
Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
relates to PUP-7093 Parser Combinators Closed
Epic Link: 5.y Type System
Sub-team: Language
Team: Froyo
QA Risk Assessment: Needs Assessment


I'm honestly not sure how you would go about this, but currently you must repeat your Data Type aliases constantly to cover all cases.


Base IPv6 Address: https://github.com/simp/pupmod-simp-simplib/blob/master/types/ip/v6/base.pp

Bracketed IPv6 Address: https://github.com/simp/pupmod-simp-simplib/blob/master/types/ip/v6/bracketed.pp

It would be really nice if the Bracketed case could be something like the following instead:

type Simplib::IP::V6::Bracketed = Composite[String['['], Simplib::IP::V6::Base, String[']']]

Comment by Henrik Lindberg [ 2016/12/20 ]

Trevor Vaughan - so, the Pattern is an OR between a set of patterns. For this case, it looks like maybe a general And[T,...] type would work. Just a concat of regexps is difficult though, since it requires rewriting a pattern that is already anchored (as in your case).

Comment by Trevor Vaughan [ 2016/12/20 ]

Henrik Lindberg An AND would probably work.

String + String is easy. Regex combinations could be done as long as they are not anchored or if the first and last elements are the only ones that are anchored.

String + Regex + String should also be easy as long as it's a bookend.

Comment by Thomas Hallgren [ 2016/12/21 ]

I fail to see how an And would solve this problem. The first regexp is anchored. Saying that it should match together with another regexp will not create a match for something that is bracketed. Building regexps like this will require:

1. A string that matches the regexp without anchors.
2. The anchored 'base' regexp, which amounts to string concatenation at the beginning and end of #1
3. The anchored 'bracketed' regexp, which is a similar concatenation around #1 (it does not involve #2)
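
As a rough illustration of the three pieces above, here is a minimal sketch in plain Ruby rather than Puppet. The constant names are hypothetical, and `HEX_GROUPS` is a deliberately simplified stand-in for the real IPv6 pattern (eight full groups, no `::` shorthand). It also uses Ruby's `\A` and `\z` string anchors where the discussion says `^` and `$`.

```ruby
# 1. The pattern source without anchors, kept as a plain string.
#    Simplified assumption: exactly eight groups of 1-4 hex digits.
HEX_GROUPS = '[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4}){7}'

# 2. Anchored 'base' regexp: anchors concatenated around #1.
BASE = Regexp.new('\A' + HEX_GROUPS + '\z')

# 3. Anchored 'bracketed' regexp: built from #1 again, not from #2.
BRACKETED = Regexp.new('\A\[' + HEX_GROUPS + '\]\z')

addr = '2001:db8:0:0:0:0:0:1'

puts BASE.match?(addr)             # the plain address matches BASE
puts BRACKETED.match?("[#{addr}]") # the bracketed form matches BRACKETED
puts BASE.match?("[#{addr}]")      # the anchored BASE rejects the bracketed form
```

The last line shows why the anchored base pattern cannot simply be combined with another pattern: it already rejects anything that has extra characters around the address.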

I don't think we will ever try to parse a regexp and make intelligent decisions on how to dissect and reassemble it. So, to address pattern reuse as presented here, we need to discuss how to make string concatenation possible in type expressions. One alternative could be:

type BasePattern = String['base pattern without anchors goes here']
type Base = Pattern[String['^'].instance + BasePattern.instance + String['$'].instance]

There's no instance method on a type at present, but there could be. It would distinguish the type from the instance that the type describes in cases where this is applicable (which it might be for the String, Number, Struct, and Tuple types).

The distinction between type and instance is important. We already use binary operators on types, and we may want to use '+' as a way of concatenating the types themselves.

Comment by Henrik Lindberg [ 2016/12/21 ]

Hm, what you seem to be after is more like interpolation into a "composite" regexp. Problem is then that you do not have variables to interpolate, only types. Maybe we can come up with something along those lines.

  • henrik

Comment by Henrik Lindberg [ 2017/05/16 ]

This is actually a string scanner type (like in a lexer). I can imagine that the type describes a sequence of scans (as done by the Ruby StringScanner). The types added as parameters must be string-matching types. If a type describes variants (Variant, Pattern), it is taken as an OR. A type that matches an anchored regexp can be used, but ^ will match at the current position, and $ and \Z will match at the end of input/line. As an example:

Scan[ '(', Pattern['a', 'b', 'c'], ')' ]

would match strings like "(a)", "(b)", and "(c)".
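
To make the scanning idea concrete, here is a minimal sketch using Ruby's `StringScanner`. It is only an assumption about how such a type might behave, not a Puppet API: `scan_match?` and `STEPS` are hypothetical names, a String step stands for a literal, and an Array of regexps stands for a `Pattern` (tried in order, taken as an OR).

```ruby
require 'strscan'

# Hypothetical helper: runs each step as one scan at the current position.
# A String step is matched literally; an Array step is a list of
# alternative regexps, any one of which may match.
def scan_match?(input, steps)
  scanner = StringScanner.new(input)
  steps.each do |step|
    matched =
      if step.is_a?(String)
        scanner.scan(Regexp.new(Regexp.escape(step)))
      else
        step.any? { |re| scanner.scan(re) }
      end
    return false unless matched
  end
  # The whole input must be consumed for the string to match.
  scanner.eos?
end

# Rough equivalent of Scan[ '(', Pattern['a', 'b', 'c'], ')' ]
STEPS = ['(', [/a/, /b/, /c/], ')']

puts scan_match?('(a)', STEPS)
puts scan_match?('(d)', STEPS)
```

Because `StringScanner#scan` only matches at the scan pointer, each step continues exactly where the previous one stopped, which is the sequencing behaviour described above.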

Comment by Moses Mendoza [ 2017/05/18 ]

Henrik Lindberg do you have an epic this issue might go into?
