[PDB-3323] Checking if a report exists before inserting it is SLOW Created: 2017/03/07 Updated: 2018/07/23 Resolved: 2017/03/08
|Affects Version/s:||PDB 4.2.4, PDB 4.2.5, PDB 4.3.0, PDB 4.3.1|
|Fix Version/s:||PDB 220.127.116.11, PDB 4.3.2|
|Reporter:||Nick Walker||Assignee:||Russell Mull|
|Labels:||docs_reviewed, flanders-ss, maintenance, tcse|
|Remaining Estimate:||Not Specified|
|Time Spent:||Not Specified|
|Original Estimate:||Not Specified|
|Attachments:||Screen Shot 2017-03-07 at 10.45.00 AM.png|
|Epic Link:||System Team Escalations|
|Release Notes:||Known Issue|
|Release Notes Summary:|| PostgreSQL CPU usage increase
After upgrading to 2016.5.1 or 2016.5.2, PostgreSQL CPU usage increases by 50 percent. To workaround this issue, we've created the pdb_3323_workaround module.
1. On the Puppet master, download the [module](https://forge.puppet.com/puppetlabs/pdb_3323_workaround).
2. Classify the node hosting pe-postgresql with the `pdb_3323_workaround` class.
3. On the node hosting pe-postgresql, run Puppet.
After you install this module, it's safe to forget about. The index will be removed when you upgrade in the future.
|QA Risk Assessment:||Needs Assessment|
Working with a customer we noticed performance degradation after updating from PE 2016.4 to 2016.5.2.
Using PgBadger to analyze the PostgreSQL logs we can see that checking to see if a report already exists is taking longer than 5 seconds much more frequently than expected.
Researching on my own test install I can see that the query performs a Seq Scan on reports because the where clause doesn't match an index.
Adding an index on the hash column that isn't wrapped in a function causes the query plan to switch to using a Index Only Scan.
We saw that PuppetDB could only manage to work 5 commands / sec despite trying to bump command processing threads to 10 on a 8 core PuppetDB node.
CPU usage on the PostgreSQL node showed steady 40% use on 2016.4 and immediately increased to 60% after the upgrade to 2016.5. Not perfect science here but that's a 50% degradation in CPU use.
Adding the index should make the query faster, CREATE UNIQUE INDEX idx_reports_hash on reports(hash)
Otherwise we can simply stop checking if the report already exists before inserting it.
I think we should research the current index on the hash column and determine if queries are being designed to hit it or if queries are being constructed like the one above.
We hopefully shouldn't need two indexes on the same column but we should make it easy to construct queries without special knowledge of how to make the where clause.
|Comment by Russell Mull [ 2017/03/07 ]|
I can easily repro this on puppetdb 4.3.1; looks like it's limited to the insert check, fortunately. We should probably change the query to use the index we already have, to avoid a costly reindex migration.