Uploaded image for project: 'PuppetDB'
  1. PuppetDB
  2. PDB-5450

Slow fact queries after PuppetDB 7.9.0 upgrade when using Postgres 12 with jit enabled

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Normal
    • Resolution: Duplicate
    • PDB 6.20.0, PDB 7.9.0
    • None
    • PuppetDB
    • None
    • HA
    • HA 2022-02-16, HA 2022-03-02
    • Customer Feedback
    • Needs Assessment

    Description

      This is a pretty vague report, I know, but I file it anyhow.

      I'm running PuppetDB (and puppetserver and puppetexplorer) as a docker container, managed via pupperware.
      I just upgraded the PuppetDB container from v7.8.0 to v7.9.0.
      After that, querys from PuppetExplorer are really slow. Sometimes they even timeout.

      All the cells in the PuppetExplorer dashboard usually update quickly (less than 1 second) for the last year during various PuppetDB 7.x.x versions.
      Now when I reload the dashboard, a few cells immediately show a value while the others show the rotating animation, waiting for a response. After 40 seconds they finally show a value (and sometimes is times out completely and an error page is displayed).

      In the PuppetDB container I see no warnings in the log.
      In the PuppetExplorer container I see no warnings if there is a response (after ~40s), otherwise it logs this (and my browser show a red puppetexplorer timeout page):

      puppetexplorer_1  | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
      puppetexplorer_1  | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
      puppetexplorer_1  | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
      puppetexplorer_1  | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
      puppetexplorer_1  | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
      puppetexplorer_1  | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
      puppetexplorer_1  | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
      puppetexplorer_1  | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
      puppetexplorer_1  | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
      puppetexplorer_1  | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
      puppetexplorer_1  | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
      

      I'm running 'top' on the docker host.
      I reload the PuppetExplorer dashboard (and it sends a bunch of querys to PDB).
      In 'top' I can see how ca 20 postgres processes appear and each use 10% CPU.
      The processes are present for ~40 seconds and then disappear. At the same time my browser finally show values in all dashboard cells.

      So, my guess is that something in those postgres threads take 40 seconds before they finally return.
      The question is what.
      (Let me know if I can help debugging this somehow)

      Meanwhile, all puppet agents speak to the puppetserver as usual - no increase in the time to apply the catalog, and the server replaces the facts and stores the reports to PuppetDB without problem.

      In https://puppet.com/docs/puppetdb/7/release_notes.html#puppetdb-790 I see:
      "Improved performance of the fact-contents endpoint. Testing against a database of 10,000 mocked nodes, there was an observed 84% decrease in time taken to complete a difficult query. https://tickets.puppetlabs.com/browse/PDB-5259"
      Could it be this change that degraded the performance for me?
      Possibly a bug in some library that distributes multiple querys over a pool of postgres threads?

      Update:
      I really think it is the changes in PDB-5259 that is the culprit.
      I looked at all the cells in the PuppetExplorer dashboard, and all the cells that do not query a fact are updated immediately (like number of nodes in Production env, nodes that has not sent a report in the last 24h).
      But all the cells that query a fact (like: apt_reboot_required=true) all take 40 seconds before they update.

      Attachments

        Issue Links

          Activity

            People

              rob.browning Rob Browning
              elof Elof Ofel
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Zendesk Support