[PUP-5934] Updated fact values should be submitted after each Puppet run Created: 2016/02/19  Updated: 2019/11/26  Resolved: 2019/11/14

Status: Resolved
Project: Puppet
Component/s: None
Affects Version/s: None
Fix Version/s: PUP 5.5.18, PUP 6.4.5, PUP 6.11.0

Type: New Feature Priority: Normal
Reporter: Reid Vandewiele Assignee: Josh Cooper
Resolution: Fixed Votes: 5
Labels: resolved-issue-added
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to PUP-6040 "puppet facts" outputs indirector obj... Ready for Engineering
relates to PUP-7779 Re-implement `puppet facts upload` an... Closed
Epic Link: Improve Agent Lifecycle
Team: Coremunity
Sprint: Platform Core KANBAN
Release Notes: New Feature
Release Notes Summary: Puppet submits facts when requesting a catalog, but if the agent modifies the system while applying the catalog, the facts in PuppetDB won't be refreshed until the agent runs again, which may be 30 minutes away (or however runinterval is configured). This feature makes it possible to submit facts again at the end of the agent's run, after the catalog has been applied. To enable it, set "resubmit_facts=true" in the agent's puppet.conf. Note that doing so will double the fact submission load on PuppetDB, since each agent will submit facts twice per run. The feature is disabled by default.

 Description   

Today, facts are generated prior to a Puppet run, and then Puppet makes changes to the system. Because of those changes, the true value of a fact for a system may have changed by the end of the run, but it will not be updated in PuppetDB until the run interval elapses (typically 30 minutes) and the next run begins.

It is becoming common for PuppetDB queries to be used to select nodes based on fact values, for grouping and reporting. In addition, modules like Erik Dalén's puppetdbquery allow the master to build catalogs for node A based on facts about node B. The longer the time window during which node B's PuppetDB facts are out of sync with the node's real fact values, the longer it takes for the infrastructure as a whole to converge on a correct, consistent configuration.
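As an illustration of the kind of fact-based node selection described above, a rough sketch of a PuppetDB query (the PQL syntax and the /pdb/query/v4 endpoint are standard, but the hostname, port, and fact path here are placeholders):

# Return the certnames of all nodes whose os.family fact is currently "Debian".
curl -G 'http://puppetdb.example.com:8080/pdb/query/v4' \
  --data-urlencode 'query=inventory[certname] { facts.os.family = "Debian" }'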

Additionally, the intuitive mental model of reading Puppet reports and fact lists in the console is "Puppet submitted a report at time X, therefore I know that the facts shown are valid as of time X". However, this is not true today as described above.

A use case customers commonly assume is supported is querying facts in real time to validate information about a server. Because of today's limitations, querying facts does not provide real-time data; at best, the data lags real time by up to runinterval on each node.

Feature Request

In addition to submitting facts at the beginning of each run, Puppet should submit updated fact values at the end of each run along with the report.
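As eventually implemented (per the release note above), this is opt-in via the resubmit_facts setting. A minimal sketch of enabling it on an agent, assuming a standard AIO install path:

# Enable post-run fact submission; this writes "resubmit_facts = true" into the
# [agent] section of puppet.conf.
/opt/puppetlabs/bin/puppet config set resubmit_facts true --section agent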



 Comments   
Comment by Henrik Lindberg [ 2016/02/19 ]

Ping Kylo Ginsberg

Comment by Kylo Ginsberg [ 2016/02/19 ]

Hmm, this may warrant a chat to get right.

If anything, I've thought about decoupling fact submission from agent runs entirely, and ideally submitting facts at different rates depending on their volatility. In part that idea is inspired by concerns like the one you raise here:

The longer the time window during which node B's PuppetDB facts are out of sync with the node's real fact values, the longer it takes for the infrastructure as a whole to converge on a correct, consistent configuration.

In any event, re the solution requested:

Puppet should submit updated fact values at the end of each run along with the report.

The facts at the time the report was submitted seem kind of meaningless. Those fact values weren't used for anything and so don't correspond to anything else, including the report. What you're really asking for is very current facts. I think that's what we should aim for, not piggybacking the facts on the last known thing the agent did.

Comment by Reid Vandewiele [ 2016/02/19 ]

The facts at the time the report was submitted seem kind of meaningless. Those fact values weren't used for anything and so don't correspond to anything else, including the report. What you're really asking for is very current facts. I think that's what we should aim for, not piggybacking the facts on the last known thing the agent did.

Yes, I think that's accurate. The suggestion of report submission time was born out of an imagined implementation, and still might make a good first "all facts must be up to date when this event occurs no matter what" milestone. That said, if there were some magical way of just always having up-to-date facts instead, that would totally solve the problem statement.

Comment by Kylo Ginsberg [ 2016/02/29 ]

Email thread on puppet-users group requesting the same. It dates from 2013, but got a me-too in early 2016: https://groups.google.com/d/msg/puppet-users/z8VijBU3oH4/cxKSti4gVeoJ

Comment by Verne Lindner [ 2016/05/10 ]

Kylo Ginsberg Are there any near-term plans to address this ticket? I'm wondering if we need to think about labelling in the PE GUI for nodes to indicate the last time their facts were updated, or should we wait for the solution you propose in comments above.

Comment by Kylo Ginsberg [ 2016/07/20 ]

Verne Lindner no near-term plans to address this, so a label in the PE GUI with the last update time makes sense to me.

Also, /cc Eric Sorenson who's been thinking some about async fact submission and the like.

Comment by Henrik Lindberg [ 2017/01/25 ]

Got one idea around this set of functionality...

What if the agent ran facter on a regular interval and compared the set of facts against the last run? In the simplest case, if the facts differ, initiate a scheduled run, or just send them to the server. (You would obviously always filter out ever-changing facts before computing whether there is a diff.)
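A rough sketch of that idea as a periodically scheduled shell script (the set of volatile facts filtered out below is illustrative, not exhaustive):

#!/bin/sh
# Compare current facts (minus ever-changing ones) against the last snapshot
# and trigger a Puppet run only when something has actually changed.
CACHE=/var/cache/fact-snapshot.json
CURRENT=$(mktemp)

facter --json | jq 'del(.uptime, .uptime_seconds, .system_uptime, .memory, .load_averages)' > "$CURRENT"

if ! diff -q "$CACHE" "$CURRENT" >/dev/null 2>&1; then
  /opt/puppetlabs/bin/puppet agent --test
  cp "$CURRENT" "$CACHE"
fi
rm -f "$CURRENT"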

Comment by Reid Vandewiele [ 2017/01/25 ]

While real-time is the shining ideal, real-world use cases usually correspond pretty directly to Puppet reports. The impression customers usually have is that when looking at a node detail page, for a node that last reported at a given timestamp, the facts shown are representative of the enforced state of that node at that time. That is, the fact values at the end of a run.

Aiming for real-time as a first iteration seems like a "perfect is the enemy of the good" scenario with diminishing returns.

Comment by Eric Sorenson [ 2017/01/30 ]

Can users who are interested in this run puppet facts upload as a post-run command? I've linked this ticket to another one regarding puppet facts, which will become more relevant as cached catalogs become a common thing.

Comment by Nick Walker [ 2017/01/31 ]

Eric Sorenson puppet facts upload no longer exists. In past versions of Puppet, the agent would upload its facts to the master before compiling a catalog, and those facts were then used to compile the catalog. Now you have to submit facts with a request to compile a catalog, and there is no way to submit facts outside of that request (or at least that is my understanding).

Comment by Eric Sorenson [ 2017/01/31 ]

Well, that sucks.

[root@cloudline repros]# puppet help facts save
 
USAGE: puppet facts save [--terminus TERMINUS] [--extra HASH] <key>
 
API only: create or overwrite an object. As the Faces framework does not
currently accept data from STDIN, save actions cannot currently be invoked
from the command line.

Comment by Reid Vandewiele [ 2017/01/31 ]

Note, Eric Sorenson: my understanding is that there is no longer an API endpoint on puppetserver to accept facts independent of a catalog request, even if you were inclined to modify the face or agent code to send them.

Comment by Nick Walker [ 2017/01/31 ]

Yes, one possible way to achieve this would be to allow agents to update their own facts, and only their own facts, directly via the PuppetDB API. However, I'm not sure that's desirable in more locked-down architectures, and it may still make sense to have an API on the master that forwards the facts to PuppetDB.

In addition to submitting facts after a puppet run, you should be able to submit facts on whatever interval you prefer.
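(For reference, a heavily hedged sketch of what submitting facts straight to PuppetDB's command API might look like. The /pdb/cmd/v1 endpoint and the "replace facts" command exist, but the command version and required payload fields below should be checked against the PuppetDB docs for your version, and the hostname is a placeholder.)

# Wrap the local facts in a "replace facts" command and POST it to PuppetDB.
CERTNAME=$(/opt/puppetlabs/bin/puppet config print certname)
facter --json | jq --arg cn "$CERTNAME" \
  '{command: "replace facts", version: 5,
    payload: {certname: $cn, environment: "production",
              producer_timestamp: (now | todate), producer: $cn, values: .}}' \
  | curl -X POST 'http://puppetdb.example.com:8080/pdb/cmd/v1' \
      -H 'Content-Type: application/json' --data-binary @-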

Comment by Verne Lindner [ 2017/02/06 ]

Reid Vandewiele If, in the console, we were to report fact submission time alongside report time, do you believe that would provide any value to users?

Comment by Reid Vandewiele [ 2017/02/06 ]

Verne Lindner in that it would help clarify to users what was happening, I think yes, there is some potential value there. However, it's pretty small. The timestamps change all the time, and it's a bit of a stretch for people to read the timestamp and realize that the facts are always lagging the report, and then further connect that the facts are older than the report and could have changed.

There would be far more value in making the system match people's intuition about how it should work. How it should work is simple. How it does work is very unintuitive.

Comment by Verne Lindner [ 2017/05/25 ]

Henrik Lindberg Just noticed a "triaged" label was applied to this ticket. Does that mean it's heading into a backlog for grooming?

Comment by Verne Lindner [ 2017/06/16 ]

Henrik Lindberg In reference to PUP-5934, do you have an idea of when that ticket might be groomed? I noticed it going into Triage a while back. For PE Console reporting, this issue impacts accuracy.

Comment by Henrik Lindberg [ 2017/06/17 ]

The fact that it was "triaged" and not closed only means that this ticket was considered to be a valid ticket.

Comment by Reid Vandewiele [ 2017/07/07 ]

Verne Lindner this came up in discussion on the Professional Services channel today in the context of looking for docs that describe Facter at a high level. The gist of the conversation was that our docs do a decent job of describing WHAT Facter is and HOW Facter works, but don't really touch on WHY Facter is useful. Some suggested answers to "Why do we have Facter?" were:

  1. Facter allows Puppet manifest writers to create dynamic configurations that adapt to machine-level specifics automatically
  2. Facter allows administrators to group nodes together by business purpose, application type, SDLC level, or other characteristic facts
  3. Facter allows users to create queries and node lists grouped by software licensing status, installed package versions, etc.

Today, Facter excels only at the first use case, because of the issue described in this ticket. The second and third use cases are just as compelling, if not more so, but realizing use cases #2 and #3 is at a real-world disadvantage because the information available lags up to 30 minutes behind reality.

This ticket might well be re-titled "Promote Facter from a supporting role into a first-class feature", one that fully supports all the use cases ascribed to it, not just use case #1 above.

Comment by Eric Sorenson [ 2017/07/14 ]

I'm closing this in favor of what seems to be the correct path forward: a re-implementation of the "puppet facts upload" functionality which we erroneously removed in Puppet 4. See PUP-7779 for the details.

Comment by Nick Walker [ 2017/07/15 ]

Eric Sorenson Are you saying that Puppet will never automatically submit facts after an agent run and it will always be up to the user to configure that themselves if they want to?

My suggestion is that this ticket should be implemented on top of PUP-7779. I think Puppet should have the ability to natively submit facts before and after a Puppet run, and not require the user to set up `puppet facts upload` as a postrun_command.
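(For context, the manual workaround being discussed would look roughly like this on an agent, assuming a Puppet version where `puppet facts upload` exists again per PUP-7779 and a standard AIO install path.)

# Re-upload facts after every run via a post-run hook.
/opt/puppetlabs/bin/puppet config set postrun_command \
  '/opt/puppetlabs/bin/puppet facts upload' --section agent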

Comment by Reid Vandewiele [ 2017/07/17 ]

+1 to Nick Walker's point. I am strongly in favor of PUP-7779 and think we need it. This ticket most naturally then becomes an improvement request to be built on top of PUP-7779. PUP-7779 is not an alternate solution; it's a prerequisite.

If "won't fix" is still correct for this ticket, can you please provide more background information about the decision?

Comment by Jon Pugh [ 2017/07/21 ]

+1 to the points made by Nick and Reid. As a customer, I find the Puppet Console's display of facts as they were before the last run to be extremely unintuitive, and I find myself explaining to new team members that this is why they are seeing old values for facts they expected to be updated or created by a Puppet run. A "before and after" view would be ideal; at the moment the "after" view only appears after the next run, and some people end up doing a second run just to see the facts change.
Perhaps make it a parameter on the agent if you want customers to actively acknowledge the behaviour?

We also find it devalues the use of fact queries via PuppetDB significantly though I acknowledge the points made about "very current facts" being the underlying need.

Comment by Michael Smith [ 2017/08/24 ]

Submitting facts as part of the report seems like intuitive behavior to me.

Comment by Jeff Sparrow [ 2017/10/11 ]

I have wanted this for 5 years

Comment by William Rodriguez [ 2018/03/30 ]

I'd love to see this, especially with PUP-7779. We have many use-cases where having facts updated faster would be a great value add for us. For example, with the package inventory, we don't see packages being updated until the next time puppet runs, which slows down remediation efforts. A simple flag to have the new facts upload face called after execution would be nice and we'd probably set it across the board. I'm pretty sure I've actually commented before to Nick Walker my thoughts on this, well before I knew this ticket even existed.

Comment by Verne Lindner [ 2018/03/30 ]

William Rodriguez

If a user creates a task in the console to update a package version, then runs a job, then checks their console Packages inventory, will they still see the old package version appear?

 

Comment by William Rodriguez [ 2018/04/02 ]

Ok, that was a bad example. I realize now that packages actually are updated if changed via the package inventory. The situation I was thinking of was one where we had updated the puppet-agent package across the fleet via a Puppet agent run, using the puppet-agent module rather than a task, and it took us the length of an agent run to see the changes reflected across the fleet. I had forgotten that the package task actually does update the inventory once a remediation action is taken. Sorry for scaring you, Verne Lindner

Comment by Reid Vandewiele [ 2018/04/02 ]

Verne Lindner that's correct. When a user runs a task in the console to update a package version, the Packages inventory will not display the correct package version(s) until the next Puppet run begins.

Interestingly, the Puppet run doesn't have to finish for the new package version to show up in the package inventory; it only has to start. Facts, including package inventory, are updated at the beginning of a Puppet run.

Comment by Verne Lindner [ 2018/04/02 ]

William Rodriguez Thanks for the clarification!

Ryan Coleman How do you feel about re-opening this ticket as an OPTY?

Not seeing updated facts after a run remains a source of confusion for users; as we promote use of tasks in the console, it seems even more important for users to see what they expect to see: create task > run job > see results (and, in some cases, report results), without having to know that an additional run is required if those results include a fact value change.

Comment by Charlie Sharpsteen [ 2018/07/20 ]

Now that PUP-7779 is in, I think we could do this in a fairly straightforward manner by adding a report_facts setting that would cause the Puppet agent to load and submit a new factset to `/puppet/v3/facts` at the end of the run, before the report is sent to `/puppet/v3/report`.

This setting should default to false, as the additional fact submission would represent a 33% increase in the datasets sent to PuppetDB on each run: from facts, catalog, report to facts, catalog, facts, report. For most folks, PuppetDB will handle that just fine, but at large scale it is a big enough increase that it should be enabled deliberately.

We could follow the naive implementation above with something more sophisticated that computes the diff between facts loaded at the beginning of the run and those loaded at the end and attaches it to the report. That would be much more efficient as both the amount of data and number of processing operations would be reduced. However, this would require changes to the report format and PuppetDB and thus is a much bigger lift to accomplish.

Comment by Greg Dubicki [ 2018/08/11 ]

Big +1 for Charlie Sharpsteen's proposal. I can try to create a PR for this if this idea is accepted.

Comment by Eric Sorenson [ 2018/08/14 ]

Greg Dubicki that'd be awesome!

Comment by Charlie Sharpsteen [ 2019/08/09 ]

PR up with an implementation: https://github.com/puppetlabs/puppet/pull/7666

Targeted Puppet 5.5.z as this feels like something that will be useful to folks on PE 2018.1.

Comment by Josh Cooper [ 2019/11/12 ]

Merged to master in https://github.com/puppetlabs/puppet/commit/4f448a3e3149b9d0ce7fae6d4eac0fb5d6cb0074. This will be backported to 5.5.x in a future PR.

Comment by Josh Cooper [ 2019/11/13 ]

Backported to 5.5.x in https://github.com/puppetlabs/puppet/commit/3e6abbc836f69fcd08289c89bb765d02e5519f18. It will be released in 5.5.18, 6.4.5, and 6.11.0.

Comment by Adam Buxton [ 2019/11/26 ]

Charlie Sharpsteen add it to the quiver of useful `PS made this` modules: https://forge.puppet.com/laura/puppet_agent_settings
