[PUP-5648] Add Iterable type and runtime object Created: 2016/01/04  Updated: 2016/03/18  Resolved: 2016/02/10

Status: Closed
Project: Puppet
Component/s: None
Affects Version/s: PUP 4.3.1
Fix Version/s: PUP 4.4.0

Type: Bug Priority: Normal
Reporter: Peter Huene Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
blocks PUP-5729 Add a reverse_each function Closed
blocks PUP-5771 Add introduction to Iterable and Iter... Resolved
blocks PUP-5772 Update lang specification with new ty... Resolved
Relates
relates to PUP-5658 Disallow numeric ranges where from > to Closed
relates to PUP-4702 Replace rgen model for the Puppet Typ... Closed
Support
supports PUP-5773 Performance Work in Puppet Designing
Template:
Epic Link: 4.x Type System
Story Points: 2
Sprint: Language 2016-01-13, Language 2016-01-27, Language 2016-02-10
Release Notes: New Feature
Release Notes Summary: Two new types where added to the puppet type system; Iterable[T], a type for values that an iterative function can operate on (i.e. a sequence of type T), and Iterator[T] an abstract Iterable that represents a lazily applied iterative function (that produces a sequence of T). In practice an Iterable that is also an Iterator describes a value that can not be assigned directly to attributes of resources; to assign the values an Iterator must first be iterated over to construct a concrete value like an Array).

Values of type Array, Hash, String, Integer, Iterator, and the types Type[Enum], Type[Integer] are Iterable,

An introduction to Iterables will be available in predocs, and the specification will be updated in separate tickets linked to this ticket)

 Description   

Commit 3a9db247c432d469e625d4abf7bee8e747178af9 regressed support for "descending" ranges, where the "max" is less than the "min":

notice Integer[5, 1]
Integer[5, 1].each |$x| {
  notice $x
}

This is expected to output:

Notice: Scope(Class[main]): Integer[5, 1]
Notice: Scope(Class[main]): 5
Notice: Scope(Class[main]): 4
Notice: Scope(Class[main]): 3
Notice: Scope(Class[main]): 2
Notice: Scope(Class[main]): 1

This is because having a maximum less than a minimum treats the range as being in "descending" order.

Instead, this is the actual output:

Notice: Scope(Class[main]): Integer[1, 5]
Notice: Scope(Class[main]): 1
Notice: Scope(Class[main]): 2
Notice: Scope(Class[main]): 3
Notice: Scope(Class[main]): 4
Notice: Scope(Class[main]): 5

It appears the min and max are silently being swapped when the max < min.

Related: I believe the compiler should error if a type is given a max argument less than a min argument when the type does not support a "descending" semantic, e.g. Callable[1, 0] should error rather than silently swapping the min and max.



 Comments   
Comment by Thomas Hallgren [ 2016/01/05 ]

I'm against having direction as the property of a type. A type describes a constraint and it's purpose is to validate instances of a type. I would be in favor of instead adding a check that asserts that min <= max at all times. This would also resolve the second issue regarding types that doesn't support "descending" semantic (and other weirdness).

Iterating backwards can be resolved in other ways. It's hardly a motivation for allowing "descending" semantic in this context.

Comment by Peter Huene [ 2016/01/05 ]

I am also in agreement that the "descending" property really shouldn't be on a type, for consistency's sake alone. It should really be an argument to each that says "we want to enumerate in reverse order".

Integer[0, 10].each(true) |$x| {}

But perhaps with a construct that makes it clearer than passing a boolean, such as an expected string, e.g. each(reverse), where the default value is undef meaning 'in-order'?

Or better yet, perhaps just have a reverse_each function (naming is hard though)?

The nice thing about doing it this way is that we can now enumerate arrays and hashes in reverse-order too, not just ranges.

Backwards compatibility concerns aside, I'd also like the compiler to error if max < min for all types that support them.

Comment by Henrik Lindberg [ 2016/01/05 ]

It snowballs with reverse_map, reverse_reduce, ... etc. So I don't like that much. Better to use a function to do a reverse instead. The problem is naturally that it requires a copy to be made as we have no enumeration runtime type (which I would really like to have, now that is implemented as a crutch with a special - is_enumerable or something like that). For an integer range it would need to create an array with every number to be able to iterate over it.

Regarding Integer and Type as ranges - I don't think it is odd that the range is specified with two values is min/max or max/min order - not if you think of these types as "type in a range of values" - it is after all the exact same range of values (since it is inclusive at both ends). I can understand the dislike though.

I can imagining supporting a Range runtime object (of a new type Range[T]), that can be specified with min/max, and Integer, a direction, and or a step value (negative for reversed stepping). It could be a subtype of Integer if we like. I can also imagine adding direction to the Integer type even if that is only of value for iteration (as that makes it clear what min/max are, and they must be in order).

What I think would be the best and most flexible though is to support an Iterable[T] type (and corresponding runtime object), we can then write functions like "in_reverse", step_in_reverse(2) and what not. This is the most flexible solution. There is more to implement for that I think, but not much.

We may even want both. They are both good to have.

Comment by Thomas Hallgren [ 2016/01/06 ]

We should design the Iterable.reverse to return a new Iterable and make the Iterable intelligent enough to prevent full copies and also to distinguish chained calls from a block. That would enable:

Iterable.on(Integer[0,10]).reverse() |$x| {}
Iterable.on(Integer[0,10]).reverse.map() |$x| {}
Iterable.on(Integer[0,10]).reverse.step(2) |$x| {}

Now let's assume that Iterable.on(Integer[0,10]).reverse() is equivalent to Integer[0,10].reverse(). Now we can do everything without snowball effect and with no need to create an internal full copy of the range:

Integer[0,10].reverse() |$x| {}
Integer[0,10].reverse.map() |$x| {}
Integer[0,10].reverse.step(2) |$x| {}

We achieve this by either having the Integer type be an instance of Iterable (the ruby implementation is already considered 'enumerable', albeit not by inheritance) or by letting it's reverse method do Iterable.on(self).reverse.

Comment by Henrik Lindberg [ 2016/01/06 ]

Since we want to make these changes, and they are an addition / new feature we cannot change this in 4.3.2. Instead for 4.3.2 I suggest that we error if min > max. We then target better iterable support at 4.4. Suggest we create a new ticket for making min > max an error, and retarget this ticket.

Comment by Henrik Lindberg [ 2016/01/06 ]

Some things to consider on this ticket (extended Iterable support):

  • there is already a 'reverse' function in stdlib, we cannot steal that name as it would shadow the existing function. (using the name in_reverse below to differentiate). We could come up with some standard naming scheme that makes sense - "reversing", "stepping", or "reversed", "stepped" or somesuch.
  • It will be hard to distinguish between chained calls, regular calls, etc. I think it is up to the functions that receive an Iterable what to do (e.g. step(2), and in_reverse).
  • we need to protect attributes from being assigned data types not serializeable in a catalog (we alrady have this problem, but now it is easier to make mistaks that e.g. assigns an intermediate Iterable say the result of [1,2,3].in_reverse() to an attribute; where it should have been [1,2,3].in_reverse.map ....
Comment by Thomas Hallgren [ 2016/01/07 ]

Added PUP-5658 for the min > max error.

Comment by Thomas Hallgren [ 2016/01/07 ]

I agree that it must be up to the function in question to interpret what to do. That doesn't mean we must use other names on the runtime implementation. In fact, if we use the name reverse for the method, then the stdlib function will function correctly if we add the Iterable class as one of the classes that it can handle (currently only String and Array or if we change it to instead check if the argument responds to reverse (which would be an improvement IMO). If we then also enhance that function to understand that it can use a block, well, then we have all we need. Perhaps a good exercise when converting stdlib functions to 4.x?

Comment by Henrik Lindberg [ 2016/01/07 ]

There is a very long fuse for things like that for standard lib as it does not move in step sync with puppet releases and maintains backwards compatibility (so some tricks are needed to have one and the same version support multiple puppet versions), and I would really like it to be part of core. I so wish that all of the stdlib functions were prefixed with std::...

Regarding supporting "any object that responds to" was a strategy used a lot in the past and that led to maintenance and API headaches later. I would rather limit the set of classes that a feature supports, or classes implemented to deliberately be used for that specific purpose.

I kind of still like the idea of calling methods that return an Iterator to spell that out e.g. reverse_iterator as that makes it more explicit what is going on and reduces the risk people misunderstand what they can do with the returned value.

BTW: I suspect we also have to have proper scopes for this to work ( a complex chain with lambdas that end with returning an iterator and we have implemented lazy evaluation), but I am not sure if it is actually needed.

Comment by Thomas Hallgren [ 2016/01/11 ]

What happens if Puppet implements a function using the same name as a function provided by a module? Will the module function get precedence? If so, is it a good thing to give modules the ability to override core functions in puppet? If not, then I'd say the discussion about reverse is moot. Let's just reimplement in puppet in a backward compatible way.

Comment by Henrik Lindberg [ 2016/01/11 ]

For 4.x functions it is well defined: core implementation wins, and 4.x functions wins over 3.x functions. For 3.x functions it is undefined but I believe that core 3.x functions will win over module 3.x functions but I suspect people can do things with gems and load paths to make this not be true.

Comment by Henrik Lindberg [ 2016/01/19 ]

This ticket also needs a PR for the language specification since it adds types.

Comment by Henrik Lindberg [ 2016/01/19 ]

As suggested in comments on the PR, I think we need two types for this:

  • Iterator - i.e. an instance where the specifics of the iteration has been abstracted (except the type of "each").
  • Iterable - matches all instances that for which an iterator can be created - ie. Collection, String, Integer, Type[Integer], Iterator. (I can think of extensions here; Type[Enum], Type[Variant], Type[Tuple] - i.e. all types that have an interesting iterable aspect).

Further, I think they both need to be parameterized, such that an Iterable[Integer] does not match a String (since it is an Iterable[String].

I also think we need to prevent Iterator instances from being serialized. (This makes it impossible for someone to do this:

notify { example: message => [1,2,3].reverse_each() }

That will fail when the catalog is produced when an Iterator is not serializeable.

Likewise, string interpolation of an Iterator must be defined: it should either fail, shrinkwrap (as if join("") was used), or separate entries as if join(", ") was used.

Since the iterative functions (e.g. each) requires that a block is given, they cannot be used to produce an Iterator. The PR contains the reverse_each which can. This is not quite symmetrical. How about having the two functions:

  • iterator - returns a normal "each" iterator
  • reverse_iterator - returns an iterator that yields elements in revere order.

We could allow the functions to directly function as "each" iterators as a convenience/shorthand for "each", but I am not sure that is a good idea. If instead, they just return an iterator, if you want "each" behavior, you would simply call each.
(Alternatively, change each to allow it to return an Iterator) - but I think having separate functions makes it less likely hat a beginner gets into trouble by forgetting the lambda and then ending up with a mysterious error; calling the new functions implies that you know what you are doing).

[1,2,3].iterator.each 
[1,2,3].reverse_iterator.each

Comment by Henrik Lindberg [ 2016/01/19 ]

Also, as commented on the PR, there are too many features being delivered in the same PR. I prefer additional tickets for the new functions, as well as for the modified behavior of existing functions. This ticket can be used for the type(s).
One reason for splitting this up is that they should be released noted separately, they all need (different) documentation work, etc.

At the same time; the implementation is almost there, and when it has been polished up with the comments, split up into tickets it will be a very nice open ended addition to the features of the language.

Comment by Henrik Lindberg [ 2016/01/19 ]

Coming back to the proposed function iterator(). I think that it should not accept a block for now. Later it could accept one, but it would be a block that is called to produce values. For that to work, we must also add a yield, break, next functions, and some way to deal with size of unbound iteration (and how to handle reverse). In the example below, size is capped by giving it to the iterator as an argument.

This mechanism would provide an alternative to writing recursive functions. Here is fibonacci as an iterator (the iterator/lambda is like a reduce operation where the block yields to the block given when each, map, etc. operates on the iterator.

function fib_iterator($n = 10) {
  iterator($n) | $prev, $n | {
    if $prev == 0 { 
      yield(1)
      [1 1]
    } else {
      $next = $prev[0] + $prev[1]
      yield($next)
      [$prev[1], $next]
    }
}
# notice the 5 first fib values
5.fib_iterator.each |$x| { notice $x }

I am only posting this as rationale for not accepting a block to the iterator and reverse_iterator functions for the purpose of turning them into variations of each. To actually implement a proposal like this, there are many things to decide on first.

Comment by Thomas Hallgren [ 2016/01/20 ]

I would consider the Iterator runtime type as an implementation detail that shouldn't be exposed in PL. In fact, I think that would be plain wrong to do that since the general idea of the Iterable[T] is that it should describe any runtime type that is iterable and yields elements of type T. The reason it's great is that we don't care about the implementation.

All iterative functions must instead accept an Iterable argument (today they accept Any) and not care if the actual implementation is an iterator, enumerator, or other type. Instead, each, reverse_each, filter, map, reduce, and step can all be called directly on an Array, String, Iterator or anything else that is an instance of an Iterable.

The important thing, not covered by the PR in it's current shape, is that the Iterable type recognizes the full set of iterable objects which means that the PIterableType#instance? and PIterableType#assignable? must be enhanced.

Example:

[1,2,3].reverse_each

This function, when called without a block, takes an Iterable as an argument and returns a new Iterable that will yield each element of the given argument in reverse order. The internal implementation might be an iterator. Then again, it might not. It all depends on it's argument, an Array responds to reverse_each which returns a ruby Enumerator which in itself is Iterable. The important thing is that we can consider the argument and return type to be Iterable, not that it is an Iterator, i.e. the specific implementation of Iterable is irrelevant.

The each function should of course change so that it can be called without a block to make it symmetrical.

The proposed iterator method will actually make all current iterative function somewhat asymmetric since one could argue that:

4.map |$x| { $x * 10 }

shouldn't work. To be symmetric, it should be:

4.iterator.map |$x| { $x * 10 }

which of course isn't backward compatible. I'm just mentioning this to point out that we've already chosen a route where we consider all enumerables to be allowed as the first argument in iterative functions.

An iterator function would behave exactly like each when called without a block. I don't think we need that. It makes a lot of sense the way the fib_iterator example is written though, and the return type of that function (once it can be declared) must of course be an Iterable[Integer] so that one then can do:

5.fib_iterator.reverse_each |$x| { notice $x }

Regarding serialization/interpolation of an Iterable. I think that's up to the actual implementation. Many Iterable objects already have a defined behavior. For others, we could dictate that the default behavior is to convert the iterable into an array before it's serialized. The given example:

notify { example: message => [1,2,3].reverse_each() }

would then create the message [3,2,1] which makes sense.

Comment by Thomas Hallgren [ 2016/01/20 ]

Another important aspect that to consider is that an Iterable doesn't retain the notion of a current position whereas an Iterator does.

$c = [1,2,3].iterator
$c.next
# Now what does this mean?
$c.each |$x| { notice($x) }

I don't think those two concepts mix in a good way. If we decide to introduce the concept of an Iterator, then it should be for the purpose of supporting the iteration state which in turn is somewhat contradictory to declarative programming.

Comment by Thomas Hallgren [ 2016/01/20 ]

Pushed a W.I.P. commit for review (not intended to be merged) where the Iterable#instance? and Iterable#assignable? has been imlemented, the each can be called without a block, and all functions acting on an Iterable now uses the Iterable type instead of Any in the declaration of its first parameter.

Comment by Henrik Lindberg [ 2016/01/20 ]

So, what is the result of this (assuming type_of function returns the type)?

$a = [1,2,3].each()
notice type_of($a)

I can see this either being Array[Integer] (since it is iterable already), or Iterable[Integer]. In fact, I wonder why you ever need to call each without a block since it is more or less a no-op. Sure, if it returns an instance of Iterable it is different, but then again, it really has no effect.

The reverse_each is different naturally.

The implementation of next should not be a stepping function on an "iterator", it is a function that raises a signal to "go to next" in the current ongoing iteration (like the next method in Ruby) - the same can be achieved with regular conditional logic (but with more if-the-else spaghetti).

I think we are talking past each other. The reason I wanted two types was to be able to tell an actual Iterable apart from something that is iterable (e.g. an Array). For that purpose I invented Iterator. In fact, the implementation as it is would work by having the type system simply calculate that:

Array < Iterable == true
Iterator < Iterable == true
Array < Iterator == false

By using instances of Iterable as (for what I called Iterator instances), there is no way to tell them apart in Puppet. I am not dead certain that is a problem, but I suspect there could be situations where you actually want to know - perhaps more in the ruby code, that in puppet logic (and there it is possible to make a distinction, but you cannot make that distinction using the type system, you have to use the Ruby runtime type to check).

The iterator function must be called with a block; it is that block that defines what that iterator yields. It is different from each. In a way, it is the opposite of reduce (many to one) - as it has the power to yield more than its input. Handling size is an issue, I can see us supporting unbound iterators; you cannot use them to directly iterate, but you can cap them. This is very useful for generating sequences of unique names etc. (I capped the fib function directly, but it could be done with a general function instead). When supporting that, the uncapped iterators can naturally not be serialized.

5.times(fib) |$n, $f| { notice "Fib[$n] = $d" }

Comment by Thomas Hallgren [ 2016/01/20 ]

Totally agree that each without a block is more or less a no-op. I added it to satisfy your remark on symmetry only. I don't mind leaving that out.

I like the iterator function when it must be called with a block, albeit I think that perhaps that too, should be called iterable since that is what I think it returns.

I read what you wrote about Iterator but I don't agree. You can get the next element with next and that changes the state of the iterator instance. Compare to Ruby's Enumerator. Whether you then can ask it has_next? or simply get a StopIteration exception is just an implementation detail. You cannot do next on an Enumeration.

I don't understand you last example with 5.times(fib). We have no times function but Puppet defines the Ruby times as the default enumerable for a positive integer.

Comment by Henrik Lindberg [ 2016/01/20 ]

I think each should require a block.

I don't mind naming the iterator function iterable (if we end up without an Iterator type.

I don't understand why you think there is a next on an Iterator, Iterator would simply be a type, nothing else, there are no operations on it.

The last example with times invented the times function as something that iterates that many times and can accept an unbound iterable/iterator from which it picks values. It can be viewed as a zip that joins two sequences and stops when the first sequence stops; its actual design would probably be different (it is just an example of a "capping" function).

Comment by Thomas Hallgren [ 2016/01/20 ]

I am thinking about Iterator as a type. It's not "operations" on it that I'm reacting to. It's conceptual. What an iterator is compared to an iterable. In my world there's a distinct difference between the two in that an iterator is something that knows where it's at. An iterable doesn't. I see no other reason why we would ever expose an iterator type than to make that distinction.

Now I understand your times sample. For a function like that, I'd prefer something that could be chained. Your sample would then be:

fib.limit(5) |$n, $f| { notice "Fib[$n] = $d" }

or, perhaps using the term 'cap'?

fib.cap(5) |$n, $f| { notice "Fib[$n] = $d" }

Comment by Thomas Hallgren [ 2016/01/21 ]

I updated the PR to only cover the Iterable type and created PUP-5729 and PUP-5730 to cover the new reverse_each and step functions.

Comment by Henrik Lindberg [ 2016/01/21 ]

Thanks for splitting up the work into several tickets. Helps focusing on one thing at the time.

Agree on your suggestion that a cap function is best as coming after the unbound (it pulls at most n times).

Regarding the difference in opinion about Iterator - I like to think that objects of type T for which an Iterable[T] can be created are different than an instance of Iterable[?] even if asking for an Iterable for an Iterable. The only thing I want is to be able to tell an actual Iterable apart from the others. I cannot do that with:

$x =~ Iterable

Since that would be true for both an Array (and a bunch of others), as well as Iterable.

I think I would need the type_of function, and then check if the type == Iterable.
So, how do you tell them apart?

Comment by Thomas Hallgren [ 2016/01/21 ]

That's pretty much the same question as $x =~ Data, isn't it? You'll need to check if it's an Array, as well as Data.

Comment by Henrik Lindberg [ 2016/01/21 ]

There is never anything that is an instance of Data floating around in the system - that type is completely abstract. That is not the case for Iterable, and that is precisely why I want to be able to tell them a part.

I am fine if it is encoded in Iterable actually, but it is kind of ugly. I.e. Iterable[T, true] could mean what I called Iterator such that all of these are true:

[1,2,3] =~ Iterable
[1,2,3] =~ Iterable[Any, false]
[1,2,3] !~ Iterable[Any, true]

We currently have the type_of in stdlib. I would like that function in puppet (or that we had an operator that means type_of. If we have that, then it would be perfectly ok to do this:

$x.type_of == Type[Iterable]

That may be a bit tricky because the LHS may be Iterable[T] and RHS is Iterable (unspecified T). Those types are not really equal.

Comment by Thomas Hallgren [ 2016/01/21 ]

Why don't you consider Iterable to be completely abstract? I really want to do that because I can see no gain in exposing the internal workings of the iterative functions.

Perhaps it would be easier to understand if you could present a use-case where it makes sense to use an Iterator or the second argument of the Iterable type.

Comment by Henrik Lindberg [ 2016/01/21 ]

If I do this:

$a = [1,2,3].reverse_each

then, $a is bound to a instance of Iterable, and it is no longer a pure implementation concern as the instance of Iterable that wraps the Array knows it is in "reverse" (and it is not an Array - i.e. it is not abstract, there is an actual instance, and it is not the same as an Array.

Comment by Thomas Hallgren [ 2016/01/21 ]

I don't agree at all. Depending on implementation, this might well be true (it won't be with the current implementation, but that's beside the point):

[3,2,1] == [1,2,3].reverse_each
[1,2,3].reverse_each =~ Array

My point is, you should make no other assumption about what the result of an iterative function is other than:

[1,2,3].reverse_each ~= Iterable

If you do, then you lock your code into one specific implementation and we don't want to add implementation specifics into the Puppet spec.

Comment by Henrik Lindberg [ 2016/01/21 ]

If we add type promotion / transformation yes, than we never need to know the type in the Puppet Language. An implementor of a function in Ruby should expect certain methods to be there if it is one or the other. We must deliver something that ruby is_a? Array since we have specified that Puppet Array is represented by Ruby Array. Hence, the conversion must be considered when matching against types in a a dispatcher, and the invocation logic must handle the type transformation.
The transformation must also kick in when serializing. (It is both serialized and deserialized as an array, and if the Iterable is unbound it is an error to serialize it, or transform it into an Array.

As long as that is not handled, we should not have any functions that returns an implementation of an Iterable other than the already supported concrete data types (e.g. instances of Array). The same rationale as for lambda (you are not allowed to return one since it is not first order in the language. (For lambda, when we have a sane stack implementation, we can implement the call operation or a call function). Same thing here.

At the same time, we are in no particular hurry. It is very nice to have but no one is screaming for it. I can imagine adding the type first without exposing the functions to the puppet language..

The idea of representing the "iterator" is that you would not have to do all that yet. It is defined what it is, and the operators does not support it, and there is no promotion. Serialization must however be handled or they will for sure end up in catalogs with completely weird behavior as the result. This does not hide the implementation details completely, but to a very large degree, as the "iterator" type would include all such implementations whatever they may be (they must share some Ruby method API, but that is all). (I tried to find a synonym to Iterator, but found none, and Iterable/Iterator, sort of goes hand in hand (Enumerble and Enumerator in Ruby terminlogy). I do agree that it is not an Iterator in the sense of a Java Iterator as it does not hold iteration state, the actual iterator in puppet is the function that drives the iteration.

At the same time, I would rather have the nice and opaque implementation. Do you want to do it that way? If so, we do not release this as a feature until all parts are ready - say Puppet 5.0.0 (I don't think this has high enough priority for 4.4.0 to push out some of the other things (and 4.4.0 is not far away)).

Comment by Henrik Lindberg [ 2016/01/21 ]

I think it is important to have a defined way in Ruby to check for the type of Iterable in such a way that it does not have to have a priory knowledge of all possible kinds of iterables. This, if you are writing functions for communication, if you are in JRuby/Java land etc. Using the Puppet type system would be nice , i.e. having an Iterator type), but checking against a ruby "marker module" would also be fine.

Comment by Thomas Hallgren [ 2016/01/22 ]

Checking if something is an Iterable using Ruby is simple. Just do PIterableType::DEFAULT.instance?(something) or use the type Iterable in a dispatch. Using Ruby is_a? is not an option since Puppet consider objects like ruby String, Integer, and Enumeration to be Iterable. The PR does provide the "marker module" Puppet::Pops::Types::Iterable for those created by Puppet though.

I agree that we must provide a method of determining if an Iterable is unbound or not. I suggest that we do that by adding Iterable::unbound?(object) to the "marker module" (as a class method, not an instance method). This method can check if the object is an Iterable and if it is either an integer or if it responds to size. This will cover PIngeterType, PEnumType, String, Array, Hash, Enumerator, and other internal implementations. We might encounter a Ruby instance that responds to Enumerable but isn't an Enumerator. Those would be the ones that are unbounded. The internal wrapper for an Enumeration should raise an exception on #to_a for such objects.

Comment by Thomas Hallgren [ 2016/01/22 ]

And yes, I want the nice and opaque implementation. And besides lacking the unbound? method (I'll add that), I think it is ready right now

Comment by Henrik Lindberg [ 2016/01/25 ]

There are comments on the PR that are not addressed. I resolved a couple, but the others need to be addressed.

If we are going with only the "Iterable" runtime object. I want to see this ticket include the work that makes it not leak into a catalog (in a defined way rather than some mysterious crash, or whimsical result). I think that is easily done for a large class of cases; this since we transform arguments when handing them off to create resources, classes, etc. That transformer should simply error in a nice way and tell the user that it is not supported. Alternatively, it will end up as an actual value in a Puppet::Resource, and there will be attempts to serialize it. Unsure how that will end (in tears over some obscure error probably). I would like this ticket to cover that.

Alternatively, it is acceptable to do to_a in the transformation layer. And to do the same if there is an attempt to serialize such a value (alternatively, raise a meaningful error for that).

The tricker problem here is that (IIRC) the transformation to 3.x does not do deep transformations on arrays and hashes and thus unsupported values can end up inside of those. Some compromises may be needed speed vs. safety (IIRC, we do have things that can leak), and we expect those that accept hashes etc. to either only do clean functional style things (return values), or carefully check what is in the data they were given. (e.g. you can probably use create_resources to stick funny values into a resource).

We assume that if you are writing 4.x functions you are aware of the type system, and that you are reasonably aware of Iterables (esp. if you are trying to use them).

Comment by Thomas Hallgren [ 2016/01/26 ]

Regarding leaking into a catalog. It seems to me that what's needed is a general mechanism for protecting the catalog. This, since any Ruby function may return an unsupported value that can cause mysterious crashes or whimsical results. Consequently, I don't think such a protection mechanism isn't limited to Iterable and it should be covered by a separate ticket. I'm also not sure if that protection mechanism should do attempts to transform. I think it would be better to simply error out with a message informing the user about the unsupported value. If a transformation is needed, it's better that the user adds that.

Comment by Henrik Lindberg [ 2016/01/26 ]

I don't see how they can add that when they cannot tell things apart in Puppet. Hence my quest for an additional type. If it is supposed to be transparent it must be transparent. If users needs to take care they need to be able to detect the "Iterables" that are not serializeable/supported. With only one type that is impossible.

If we have the "Iterator", or "Enumerator" (if that feels better, the name of it is not important) type representing the non serializeable/"virtual" objects then it is ok to pass them around. The alternative is doing stuff like:

$x =~ Iterable and $x !~ Collection and $x != Integer and $x != Type[Integer] .and $x !~ String
and $x =~ Runtime[ruby, 'Range'] and ...

And such code needs to be adjusted if there happens to be another thing that is "Iterable" in a newer version, or something someone added support for in a function/module. With the separation into two types it is the responsibility of the implementor of the new kind of "Iterable" to ensure it is done correctly, otherwise it is every user's responsibility.

Just because there are mistakes, or mechanisms that allow others to make bad implementations (like returning something unsupported from functions) does not mean we should pile on to that.

I would be happy if the following is done:

  • There is an Iterator/Enumerator type that allows the potentially problematic objects to be detected. Users can then transform the Iterator/Enumerable to an Array.
  • A mechanism that allows them to do this transformation (As a crutch, I can imagine that adding an empty Array to it would produce an Array; a better solution is available when we add support for new and you can transform more freely between data types).
  • The 3.x transformer errors if it sees Iterator/Enumerable (this will at least protect those that make easy to do/typical mistakes. It will prevent the majority of problems. The rest can be dealt with in separate tickets (possibly rewrite the create_resources 3.x function etc.)

We can then pun on serialization, and continue the discussion to make the Iterable operations transparent.

Comment by Henrik Lindberg [ 2016/01/28 ]

The PR looks good in general. I left a couple of comments regarding bugs.
There is a commit that changes the signatures of functions but forgets to mention that.
The documentation for each function must be updated to reflect the change behavior and features, I left suggestions on the PR what I think needs to be covered.

As I said there, I think it should be brief in each function. Later we need to explain the concept of "Iterable" and "Iterator" on the docs site.

Before we are completely done with this, we also need to update the Language Specification. I think we also need to describe the concept of Iterable and Iterator in predocs as the specification will be too terse and not contain enough examples. (We can create additional tickets as needed), but I like to see the bugfixes and function docs updated before we merge.

Moving this back to doing.

Comment by Henrik Lindberg [ 2016/01/29 ]

Merged to master at: bce6b0a

Comment by Henrik Lindberg [ 2016/01/29 ]

The performance implications of supporting iterables is that chains of iterations can be performed lazily instead of creating intermediate objects at each step in the iteration.

It is small contribution but reduces the amount of generated garbage and peak memory usage.

Comment by Peter Huene [ 2016/02/10 ]

Verified in bce6b0a with:

function test($value, $filter = undef) {
    notice "$value is iterable: ${$value =~ Iterable}"
    if $value =~ Iterable {
 
        notice 'values:'
        $value.each |$element| {
            notice $element
        }
 
        notice "mapped: ", $value.map |$element| {
            $element
        }
 
        if $filter {
            notice "filtered: ", $value.filter |$element| {
                $element != $filter
            }
        }
    }
}
 
$a = 1
test($a, 0)
 
$b = foo
test($b, o)
 
$c = [1, 2, 3]
test($c, 2)
 
$d = {a => b, c => d, e => f}
test($d, [a, b])
 
$e = Integer
test($e)
 
$f = Integer[0]
test($f)
 
$g = Integer[default, 10]
test($g)
 
$h = Integer[0, 10]
test($h, 5)
 
$i = Enum[foo, bar, baz]
test($i, bar)
 
$j = Type
test($j)
 
$k = /foo/
test($k)

With output:

$ puppet apply test.pp
Notice: Scope(Class[main]): 1 is iterable: true
Notice: Scope(Class[main]): values:
Notice: Scope(Class[main]): 0
Notice: Scope(Class[main]): mapped:  [0]
Notice: Scope(Class[main]): filtered:  []
Notice: Scope(Class[main]): foo is iterable: true
Notice: Scope(Class[main]): values:
Notice: Scope(Class[main]): f
Notice: Scope(Class[main]): o
Notice: Scope(Class[main]): o
Notice: Scope(Class[main]): mapped:  [f, o, o]
Notice: Scope(Class[main]): filtered:  [f]
Notice: Scope(Class[main]): [1, 2, 3] is iterable: true
Notice: Scope(Class[main]): values:
Notice: Scope(Class[main]): 1
Notice: Scope(Class[main]): 2
Notice: Scope(Class[main]): 3
Notice: Scope(Class[main]): mapped:  [1, 2, 3]
Notice: Scope(Class[main]): filtered:  [1, 3]
Notice: Scope(Class[main]): {a => b, c => d, e => f} is iterable: true
Notice: Scope(Class[main]): values:
Notice: Scope(Class[main]): [a, b]
Notice: Scope(Class[main]): [c, d]
Notice: Scope(Class[main]): [e, f]
Notice: Scope(Class[main]): mapped:  [[a, b], [c, d], [e, f]]
Notice: Scope(Class[main]): filtered:  {c => d, e => f}
Notice: Scope(Class[main]): Integer is iterable: false
Notice: Scope(Class[main]): Integer[0, default] is iterable: false
Notice: Scope(Class[main]): Integer[default, 10] is iterable: false
Notice: Scope(Class[main]): Integer[0, 10] is iterable: true
Notice: Scope(Class[main]): values:
Notice: Scope(Class[main]): 0
Notice: Scope(Class[main]): 1
Notice: Scope(Class[main]): 2
Notice: Scope(Class[main]): 3
Notice: Scope(Class[main]): 4
Notice: Scope(Class[main]): 5
Notice: Scope(Class[main]): 6
Notice: Scope(Class[main]): 7
Notice: Scope(Class[main]): 8
Notice: Scope(Class[main]): 9
Notice: Scope(Class[main]): 10
Notice: Scope(Class[main]): mapped:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Notice: Scope(Class[main]): filtered:  [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]
Notice: Scope(Class[main]): Enum['bar', 'baz', 'foo'] is iterable: true
Notice: Scope(Class[main]): values:
Notice: Scope(Class[main]): bar
Notice: Scope(Class[main]): baz
Notice: Scope(Class[main]): foo
Notice: Scope(Class[main]): mapped:  [bar, baz, foo]
Notice: Scope(Class[main]): filtered:  [baz, foo]
Notice: Scope(Class[main]): Type is iterable: false
Notice: Scope(Class[main]): /foo/ is iterable: false
Notice: Compiled catalog for peterhu-osx in environment production in 0.10 seconds
Notice: Applied catalog in 1.09 seconds

Comment by Jo Rhett [ 2016/03/18 ]

Still no mention of these types on https://docs.puppetlabs.com/puppet/4.4/reference/lang_data.html

Comment by Henrik Lindberg [ 2016/03/18 ]

Ping Jorie Tappa - if there is a docs ticket, can it be linked here?

Generated at Thu Jan 23 00:32:22 PST 2020 using JIRA 7.7.1#77002-sha1:e75ca93d5574d9409c0630b81c894d9065296414.