Affects Version/s: None
Fix Version/s: None
Method Found:Needs Assessment
QA Risk Assessment:Needs Assessment
We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from PuppetDB4 to PuppetDB5.
There's a type of query that we do rather often (directly to the API or via the PuppetDB query module when compiling catalogs) which is to select certnames for which a certain fact has a given value. The simplest way to reproduce this scenario is to issue something like:
In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the facts table we added an index like this:
which for a query on the database the size above for nodes with the fact fqdn set to node1.example.org performed pretty well thanks to the index:
The index is used avoiding a full table scan and the performance is acceptable. We normally query for more complex fact combinations using several string fact values but for illustration purposes the above is good enough.
However, in the new world (PuppetDB5 and beyond) it seems that the facts table is gone and now the fact values are in two columns in the factsets table (stable and volatile) of type jsonb.
With the same HW setup, PostgreSQL configuration, OS configuration and a copy of the data (we're populating both instances at the same time using an extra submit_only_server_urls in the terminus), the query above produces the same results as expected but it is much slower:
We could easily reproduce the (bad) ratio PDB4/PDB5 with queries of different complexities.
Looking at the possible indices that this query could use we found this one:
However, taking a look at the generated SQL, PuppetDB seems to be using jsonb_extract_path() to then compare to the value being looked for. However, this approach cannot benefit from the index above.
I have the whole query if needed but to illustrate the issue what I did was to analyze the query and extract the juiciest subquery that could show the problem which is:
As you can see it's a full table scan. However, if the query constructed by PuppetDB used for instance the @> operator which can make use of the GIN index above, the query would be much faster:
I haven't looked very deep into the code itself but it seems to be same in HEAD. No idea either how feasible it'd be to patch PuppetDB so a more efficient query was constructed.
This is a huge blocker for us as we cannot upgrade our systems taking such a big performance penalty. This would create bottlenecks in PuppetDB making Puppet runs much slower, increasing the load on the system and surely leading to time-outs when requesting catalogs.
Do you have any suggestion on how to work around this? I'm happy to provide full query plans, etc.