Details
-
Bug
-
Status: Closed
-
Normal
-
Resolution: Duplicate
-
PDB 6.20.0, PDB 7.9.0
-
None
-
None
-
HA
-
HA 2022-02-16, HA 2022-03-02
-
Customer Feedback
-
Needs Assessment
Description
This is a pretty vague report, I know, but I file it anyhow.
I'm running PuppetDB (and puppetserver and puppetexplorer) as a docker container, managed via pupperware.
I just upgraded the PuppetDB container from v7.8.0 to v7.9.0.
After that, querys from PuppetExplorer are really slow. Sometimes they even timeout.
All the cells in the PuppetExplorer dashboard usually update quickly (less than 1 second) for the last year during various PuppetDB 7.x.x versions.
Now when I reload the dashboard, a few cells immediately show a value while the others show the rotating animation, waiting for a response. After 40 seconds they finally show a value (and sometimes is times out completely and an error page is displayed).
In the PuppetDB container I see no warnings in the log.
In the PuppetExplorer container I see no warnings if there is a response (after ~40s), otherwise it logs this (and my browser show a red puppetexplorer timeout page):
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
|
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
|
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
|
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
|
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
|
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
|
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
|
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
|
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
|
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
|
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
|
I'm running 'top' on the docker host.
I reload the PuppetExplorer dashboard (and it sends a bunch of querys to PDB).
In 'top' I can see how ca 20 postgres processes appear and each use 10% CPU.
The processes are present for ~40 seconds and then disappear. At the same time my browser finally show values in all dashboard cells.
So, my guess is that something in those postgres threads take 40 seconds before they finally return.
The question is what.
(Let me know if I can help debugging this somehow)
Meanwhile, all puppet agents speak to the puppetserver as usual - no increase in the time to apply the catalog, and the server replaces the facts and stores the reports to PuppetDB without problem.
In https://puppet.com/docs/puppetdb/7/release_notes.html#puppetdb-790 I see:
"Improved performance of the fact-contents endpoint. Testing against a database of 10,000 mocked nodes, there was an observed 84% decrease in time taken to complete a difficult query. https://tickets.puppetlabs.com/browse/PDB-5259"
Could it be this change that degraded the performance for me?
Possibly a bug in some library that distributes multiple querys over a pool of postgres threads?
Update:
I really think it is the changes in PDB-5259 that is the culprit.
I looked at all the cells in the PuppetExplorer dashboard, and all the cells that do not query a fact are updated immediately (like number of nodes in Production env, nodes that has not sent a report in the last 24h).
But all the cells that query a fact (like: apt_reboot_required=true) all take 40 seconds before they update.