[PUP-1640] Provide agnostic mechanism for Hiera Based Data in Modules Created: 2014/02/11  Updated: 2019/04/04  Resolved: 2015/03/27

Status: Closed
Project: Puppet
Component/s: Docs
Affects Version/s: None
Fix Version/s: PUP 4.0.0

Type: New Feature Priority: Normal
Reporter: Henrik Lindberg Assignee: Unassigned
Resolution: Fixed Votes: 9
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
blocks PUP-1157 puppet should support data in modules Closed
is blocked by PUP-2733 Puppet squats on the PuppetX::Puppetl... Closed
is blocked by PUP-3900 Data providers cannot be added in mod... Closed
relates to PUP-1157 puppet should support data in modules Closed
Epic Link: Data in Modules
Story Points: 2
Sprint: Platform Server 2014-11-26, Platform Server 2014-12-17, Language 2015-01-21, Language 2015-02-04
QA Contact: Eric Thompson


A mechanism is wanted that allows support for hiera based data in modules such that:

  • A module author that wants this feature makes the module depend on the
    'hiera-data-in-modules' module.
  • The use of 'hiera-data-in-modules' does not monopolize how data/lookup and injection is performed at a user site
  • The solution does not make the now available 'hiera-data-in-modules' the only possible implementation.

The current implementation of 'hiera-data-in-modules' performs monkey patching to wire the support for a new type of hiera backend. This was required to make the logic kick in early enough.

The idea is to use the binder/injector since it kicks in early. Two things can be bound:

  • HieraConfigManipulator - allows multiple manipulators to be bound, each is called with a hash containing the hiera.yaml config and returns a (possibly) manipulated config file.
  • HieraConfigOrganizer - is responsible for determining the order in which the HieraConfigManipulators are called.

Initially there will be an implementation of the HieraConfigOrganizer that simply calls the manipulators in an undefined order (this because there is currently only one known implementation of hiera manipulator). If a defined order is wanted, it should be possible to bind an array with names of the manipulators to define the order in which they are called.

This design has been discussed with R.I.Pienaar

Idea in a bit more detail:

class HieraConfigManipulator
  # @param hiera_config_hash [Hash] the parsed hiera.yaml config
  # @return [Hash] a (possibly) manipulated hiera config hash
  def manipulate(hiera_config_hash)
    # ...
  end
end

class HieraConfigOrganizer
  def organize(*config_manipulators)
    order = injector.lookup('Array[String]', 'hiera::manipulators::order', [])
    manipulators_hash = injector.lookup('Hash[Ruby[HieraConfigManipulator]]', 'hiera::manipulator')
    # process manipulators
  end
end

The module that provides a manipulator needs to make a binding - i.e. roughly do this:

Puppet::Bindings.newbindings('hiera_data_in_modules::default') do
  bind {
    name 'hiera_data_in_modules'
    to_instance 'Puppetx::MyModule::MyManipulator'
  }
end

The code above is just a rough idea of how this is done, details to be worked out while implementing.
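
To make the ordering idea concrete, here is a minimal, binder-free sketch in plain Ruby. It is not the proposed implementation: the two manipulator classes, the 'module_data' backend name, and the constructor taking a hash of named manipulators plus an order array are all assumptions made for illustration. Manipulators named in the order array are called first, in that order; any others follow.

```ruby
# Hypothetical sketch; class names and the 'module_data' backend are illustrative only.
class DataInModulesManipulator
  # Appends an assumed 'module_data' backend to the hiera config hash.
  def manipulate(config)
    backends = Array(config[:backends]) + ['module_data']
    config.merge(backends: backends.uniq)
  end
end

class LoggerManipulator
  # Sets the logger entry in the hiera config hash.
  def manipulate(config)
    config.merge(logger: 'console')
  end
end

class HieraConfigOrganizer
  # manipulators: Hash of name => object responding to #manipulate
  # order: optional Array of names defining the call order
  def initialize(manipulators, order = [])
    @manipulators = manipulators
    @order = order
  end

  # Feeds the config through each manipulator in turn and returns the result.
  def organize(config)
    names = (@order & @manipulators.keys) + (@manipulators.keys - @order)
    names.reduce(config) { |cfg, name| @manipulators[name].manipulate(cfg) }
  end
end

organizer = HieraConfigOrganizer.new(
  { 'logger' => LoggerManipulator.new,
    'hiera_data_in_modules' => DataInModulesManipulator.new },
  ['hiera_data_in_modules']
)
result = organizer.organize({ backends: ['yaml'] })
```

In the binder-based design, the hash of manipulators and the order array would come from injector lookups rather than constructor arguments.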

The idea is to make this available in 3.6.0 when using --parser future --binder, and for it to become standard in Puppet 4.

Comment by Henrik Lindberg [ 2014/02/11 ]

ping Eric Sorenson - any additional input / comments ?

Comment by R.I.Pienaar [ 2014/02/11 ]

Hmm, I think I missed in our discussion that you wanted to limit the 3.6.x inclusion of this only to those who have --parser=future. Do not think that would be that useful

Any chance we can come up with something that will work for everyone? I know this is old code and stuff, but for it to be useful it needs to work for everyone not just those willing to enable the experimental parser

Comment by Jan Örnstedt [ 2014/02/11 ]

I support R.I.Pienaar that it is needed even when running the old parser. Publishing modules that depend on the future parser will limit collaboration on modules.

Comment by Henrik Lindberg [ 2014/02/11 ]

Oops, my mistake, the real requirement is not on --parser future but on rgen being installed since the bindings require this, and it has to be an opt in since making rgen required breaks people. Turning on --binder has the same effect as --parser future in that respect. (To be clear, using --parser future is not

Is it ok to require that users turn on --binder and that they should have Rgen installed? (That is something I think we discussed, and then it sounded like it was ok).

Comment by R.I.Pienaar [ 2014/02/11 ]

Sure - in hindsight the mere fact that you're talking about bindings and such should have been hint enough to me that this would be a requirement.

The way I think of it, for this to be useful we might have a feature in the module system that says 'requires puppet version x' we're unlikely to have a feature that says 'requires puppet version x with y flag enabled'. Similarly if people want to rely on this feature in their module, the first statement is one they could acceptably document but the 2nd becomes too hard to really maintain the feeling of having written a reusable module.

Comment by Henrik Lindberg [ 2014/02/11 ]

I read that as dependency on rgen gem is ok, but not adding the --binder flag. Is that what you meant?

Will discuss with others if it is ok to automatically turn on binder if the rgen gem is available. That would be the simplest solution. I want to avoid having to invent yet another setting/ early boot config thing that is special purpose.

Comment by R.I.Pienaar [ 2014/02/11 ]

Hmm, can make the same argument 'must have puppet version x with gem y' isn't that different from 'must have puppet version x with flag y' (since that flag requires the gem).

I get why you're reluctant, just saying the desired user experience is that for a certain version of puppet it just works

Comment by Henrik Lindberg [ 2014/02/11 ]

The use of rgen can probably be expressed as a required feature - not sure if that can be expressed in modules? There are other such cases in the indirector where certain formats are only available if an extra gem is installed.

I do understand the desired user experience - I also like when things just work, and I am not dead-set on this being done one way or the other. Let's just talk about this until we have a decent solution...

Comment by R.I.Pienaar [ 2014/02/11 ]

Kewl, thanks. When I was looking at ways to plug into this I found several existing plugin systems. One seems to be in place to allow the PE CA cert count checks to function. It was tantalisingly close to what I needed but could do with just a little bit more - iirc I think if it just looked in the module libdirs rather than just the normal ruby libdirs for a named file it would have worked fine.

Could we add a slight enhancement to that for 3.x nodes, so that I can drop in one file that patches hiera like I do today? Once 4 is out we do a proper thing as you proposed and then I add hooks in for that in the module. This way the fix is a small change to an existing bit of code, and it has a natural EOL built into it as it would be 3.x only.

Now I am not 100% up to scratch with that plugin system and the code paths there but it's an option that might be worth exploring?

Comment by Henrik Lindberg [ 2014/02/11 ]

We are working on the overall boot of the puppet runtime and towards having only one plugin mechanism. We introduced something known as the Context and it has a simplified kind of injector in the 3.x branch. I think I would rather make use of that and more explicitly do what the binder would do - at least with respect to the class that gets the configuration. The parts of the binder that deal with loading can probably still be used as that is not tied to Rgen IIRC.

That way we would replace the explicit and very specific call to, say, load one manipulator with a more general solution. The module with the "hiera data in modules" will have to change for 4.0 as the bindings then need to be there. I would probably implement the proposed solution based on the binder first - and then make it work without the binder being present. (The binder based implementation is trivial - it is pretty much just the code that I posted as a rough outline).

Comment by R.I.Pienaar [ 2014/02/11 ]

ok- not aware of the context stuff but this sounds like the right direction

Comment by Henrik Lindberg [ 2014/11/23 ]

I started on an implementation of what I first suggested, but I soon realized that the solution was still both hiera specific, and very specific to the particular way that the "hiera data in modules" module had to be implemented due to the way the resource implementation looks up the one and only hiera to get default data. Doing the implementation that way has several problems (cannot support directory environments well, and they are standard in Puppet 4.0.0), there are issues with file watching (not evicting caches for environments that come and go) etc.

The implementation I came up with is agnostic, and it clearly separates the older API (left unchanged) from an extensible plugin mechanism to look up data in an environment, and in modules. A write up of this implementation is found here: https://docs.google.com/a/puppetlabs.com/document/d/1N5xnmhrC4v0EqXaxjneAzVjNvlAZA31O0y6KRLym7bE/edit#

I believe that it should be possible to use this very simple API to implement a data provider for hiera both for environments and modules, but I realize that the hiera config manipulation API may still be needed due to hiera's singleton nature. It would be a positive thing to not have to do all that hiera patching - which may be possible since the new API differentiates between the global singleton hiera, a call to lookup for the environment, and a call to lookup in a module, and these providers can have their own lifecycle and associate caches with either the environment, or the compiler (depending on what is being cached).
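
The separation between the global hiera, an environment-level lookup, and a module-level lookup can be pictured with a small plain-Ruby sketch. This is not the API from the write-up: the class names, the `[found, value]` return convention, the precedence (global, then environment, then module), and the rule that the module is taken from the first segment of the key are all assumptions for illustration.

```ruby
# Hypothetical sketch; these classes are not Puppet's actual data provider API.
class HashDataProvider
  def initialize(data)
    @data = data
  end

  # Returns [found, value] so that nil or false can be stored as real values.
  def lookup(key)
    @data.key?(key) ? [true, @data[key]] : [false, nil]
  end
end

class LayeredLookup
  # global/environment: providers; modules: Hash of module name => provider
  def initialize(global:, environment:, modules: {})
    @global = global
    @environment = environment
    @modules = modules
  end

  # Consults each layer in turn and returns the first value found.
  def lookup(key, default = nil)
    module_name = key.split('::').first
    providers = [@global, @environment, @modules[module_name]].compact
    providers.each do |provider|
      found, value = provider.lookup(key)
      return value if found
    end
    default
  end
end

layers = LayeredLookup.new(
  global: HashDataProvider.new('ntp::servers' => ['site.example.com']),
  environment: HashDataProvider.new('ntp::package' => 'ntp-env'),
  modules: {
    'ntp' => HashDataProvider.new(
      'ntp::servers' => ['pool.ntp.org'],
      'ntp::package' => 'ntp',
      'ntp::service' => 'ntpd'
    )
  }
)
```

Because each layer is a separate object, each can own its cache and be discarded independently, for example when an environment goes away.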

The work is available on branches as noted in the document - for people to play with and give feedback.

Comment by Henrik Lindberg [ 2014/11/25 ]

PR is now ready. (The main branch as mentioned in the document linked to this ticket has been rebased and has a rewritten history).

Comment by Damon Atkins [ 2014/11/30 ]

We just want a function which looks up a named file in a module data dir, e.g. heira_local_data(relative file name, key). It does not have to do bindings for params. E.g. heira_local_data($osfamily, 'packagename')
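
For illustration, the kind of helper being asked for could be sketched in plain Ruby as follows. The name `local_data`, the `data/` subdirectory, the `.yaml` extension, and the explicit module directory argument are assumptions; this is not an existing Puppet or Hiera function.

```ruby
require 'yaml'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: reads <module_dir>/data/<relative_name>.yaml
# and returns the value stored under key, or nil if the file or key is absent.
def local_data(module_dir, relative_name, key)
  path = File.join(module_dir, 'data', "#{relative_name}.yaml")
  return nil unless File.file?(path)

  data = YAML.safe_load(File.read(path)) || {}
  data[key]
end

# Usage: build a throwaway module layout and look up one key.
module_dir = Dir.mktmpdir
FileUtils.mkdir_p(File.join(module_dir, 'data'))
File.write(File.join(module_dir, 'data', 'RedHat.yaml'), "packagename: httpd\n")
package = local_data(module_dir, 'RedHat', 'packagename')
FileUtils.remove_entry(module_dir)
```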

Comment by Henrik Lindberg [ 2014/11/30 ]

Damon Atkins That is something slightly different. With the mechanism made available for this ticket, you can just use the data function directly - it returns a hash. If you really want to have a local hiera, you need to wait for the hiera module-data plugin support, and for a function like the one you suggest to be included.

If you want to try out the function support I can show you an example - do you have a small sample of what you want to do that I can base the example on?

Comment by Hailee Kenney [ 2014/12/02 ]

Merged into master in 74c07fb

Comment by Kurt Wall [ 2015/01/14 ]

Hi, Eric. This is ready for your review as to suitability and functionality.

Comment by Henrik Lindberg [ 2015/01/14 ]

I promised R.I.Pienaar to write a module with sample providers to have something to look at when implementing support for hiera. I do not expect to publish that to the Forge, but will link the repo to this ticket when I have one.

Comment by Henrik Lindberg [ 2015/01/21 ]

I started working on a sample, and ran into issues with delivering data providers in a separate module. Will log tickets for those. Currently it is possible to review the built in support though, but the feature as such is not ready for release due to the problems I found.

Comment by R.I.Pienaar [ 2015/01/21 ]

Glad it was not just me. I was going to catch up with you in Ghent to work around what I tried, but I guess there's a bigger problem. Thanks, Henrik Lindberg

Comment by Henrik Lindberg [ 2015/02/18 ]

Nicholas Fagerlund Added links to blog posts.

Comment by Rajasree Talla [ 2015/03/27 ]

Moving back to resolved- It was changed to closed by mistake

Generated at Sat Feb 29 06:20:38 PST 2020 using Jira 8.5.2#805002-sha1:a66f9354b9e12ac788984e5d84669c903a370049.