Uploaded image for project: 'PuppetDB'
  1. PuppetDB
  2. PDB-4830

Big performance impact when upgrading to PuppetDB5/6

    XMLWordPrintable

Details

    • Bug
    • Status: Open
    • Normal
    • Resolution: Unresolved
    • None
    • None
    • PuppetDB
    • None
    • Needs Assessment

    Description

      Hi,

      We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from PuppetDB4 to PuppetDB5.

      There's a type of query that we do rather often (directly to the API or via the PuppetDB query module when compiling catalogs) which is to select certnames for which a certain fact has a given value. The simplest way to reproduce this scenario is to issue something like:

      GET pdb/query/v4/facts/factname/factvalue
      

      In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the facts table we added an index like this:

      "facts_value_string_idx" btree (value_string)
      

      which for a query on the database the size above for nodes with the fact fqdn set  to node1.example.org performed pretty well thanks to the index:

      [
       { 
       "certname": "node1.example.org",
       "environment": "production",
       "name": "fqdn",
       "value": "node1.example.org"
       } 
      ]
      

      real 0m0.319s

      The index is used avoiding a full table scan and the performance is acceptable. We normally query for more complex fact combinations using several string fact values but for illustration purposes the above is good enough.

      However, in the new world (PuppetDB5 and beyond) it seems that the facts table is gone and now the fact values are in two columns in the factsets table (stable and volatile) of type jsonb.

      With the same HW setup, PostgreSQL configuration, OS configuration and a copy of the data (we're populating both instances at the same time using an extra submit_only_server_urls in the terminus), the query above produces the same results as expected but it is much slower:

      real 0m7.403s

      We could easily reproduce the (bad) ratio PDB4/PDB5 with queries of different complexities.

      Looking at the possible indices that this query could use we found this one:

      "idx_factsets_jsonb_merged" gin ((stable || volatile) jsonb_path_ops)
      

      However, taking a look at the generated SQL, PuppetDB seems to be using jsonb_extract_path() to then compare to the value being looked for. However, this approach cannot benefit from the index above.

      I have the whole query if needed but to illustrate the issue what I did was to analyze the query and extract the juiciest subquery that could show the problem which is:

      explain analyze SELECT certname FROM factsets WHERE jsonb_extract_path(stable||volatile, 'fqdn') = '"node1.example.org"'::jsonb;
       QUERY PLAN
      ------------------------------------------------------------------------------------------------------------------
       Seq Scan on factsets (cost=0.00..7777.63 rows=182 width=20) (actual time=2708.687..5292.499 rows=1 loops=1)
       Filter: (jsonb_extract_path((stable || volatile), VARIADIC '\{fqdn}'::text[]) = '"node1.example.org"'::jsonb)
       Rows Removed by Filter: 36492
       Execution Time: 5292.528 ms
      

      As you can see it's a full table scan. However, if the query constructed by PuppetDB used for instance the @> operator which can make use of the GIN index above, the query would be much faster:

      explain analyze SELECT certname FROM factsets WHERE stable||volatile @> '\{"fqdn": "node1.example.org"}';
       QUERY PLAN
      -------------------------------------------------------------------------------------------------------------------------------------
       Bitmap Heap Scan on factsets (cost=288.28..425.15 rows=36 width=20) (actual time=0.767..0.768 rows=1 loops=1)
       Recheck Cond: ((stable || volatile) @> '\{"fqdn": "node1.example.org"}'::jsonb)
       Heap Blocks: exact=1
       -> Bitmap Index Scan on idx_factsets_jsonb_merged (cost=0.00..288.27 rows=36 width=0) (actual time=0.521..0.521 rows=1 loops=1)
       Index Cond: ((stable || volatile) @> '\{"fqdn": "node1.example.org"}'::jsonb)
       Execution Time: 0.827 ms
      

      I haven't looked very deep into the code itself but it seems to be same in HEAD. No idea either how feasible it'd be to patch PuppetDB so a more efficient query was constructed.

      This is a huge blocker for us as we cannot upgrade our systems taking such a big performance penalty. This would create bottlenecks in PuppetDB making Puppet runs much slower, increasing the load on the system and surely leading to time-outs when requesting catalogs.

      Do you have any suggestion on how to work around this? I'm happy to provide full query plans, etc.

      Thanks.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              nbarrientos Nacho Barrientos
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:

                Zendesk Support