[PDB-3323] Checking if a report exists before inserting it is SLOW Created: 2017/03/07  Updated: 2018/07/23  Resolved: 2017/03/08

Status: Closed
Project: PuppetDB
Component/s: None
Affects Version/s: PDB 4.2.4, PDB 4.2.5, PDB 4.3.0, PDB 4.3.1
Fix Version/s: PDB, PDB 4.3.2

Type: Bug Priority: Critical
Reporter: Nick Walker Assignee: Russell Mull
Resolution: Fixed Votes: 0
Labels: docs_reviewed, flanders-ss, maintenance, tcse
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2017-03-07 at 10.45.00 AM.png    
Issue Links:
relates to PDB-3231 A query in Everett is using too much cpu Closed
Epic Link: System Team Escalations
Team: Systems Engineering
Story Points: 2
Sprint: SE 2017-03-08
Release Notes: Known Issue
Release Notes Summary: PostgreSQL CPU usage increase

After upgrading to 2016.5.1 or 2016.5.2, PostgreSQL CPU usage increases by 50 percent. To work around this issue, we've created the pdb_3323_workaround module.

1. On the Puppet master, download the [module](https://forge.puppet.com/puppetlabs/pdb_3323_workaround).
2. Classify the node hosting pe-postgresql with the `pdb_3323_workaround` class.
3. On the node hosting pe-postgresql, run Puppet.

After you install this module, it's safe to leave in place and forget about. The index will be removed when you upgrade in the future.


QA Risk Assessment: Needs Assessment


Suggested workaround


The Problem

Working with a customer, we noticed performance degradation after upgrading from PE 2016.4 to 2016.5.2.

Using pgBadger to analyze the PostgreSQL logs, we can see that the query that checks whether a report already exists takes longer than 5 seconds far more frequently than expected.

Investigating on my own test install, I can see that the query performs a Seq Scan on reports because the WHERE clause doesn't match any index.

Adding an index on the hash column that isn't wrapped in a function causes the query plan to switch to an Index Only Scan.

pe-puppetdb=# explain analyze select 1 from reports where hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6';
                                            QUERY PLAN                                            
 Seq Scan on reports  (cost=0.00..39.31 rows=1 width=0) (actual time=0.012..0.105 rows=1 loops=1)
   Filter: (hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6'::bytea)
   Rows Removed by Filter: 186
 Planning time: 0.142 ms
 Execution time: 0.133 ms
(5 rows)
pe-puppetdb=# create unique index test on reports(hash);
pe-puppetdb=# explain analyze select 1 from reports where hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6';
                                                    QUERY PLAN                                                     
 Index Only Scan using test on reports  (cost=0.14..8.16 rows=1 width=0) (actual time=0.033..0.035 rows=1 loops=1)
   Index Cond: (hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6'::bytea)
   Heap Fetches: 1
 Planning time: 0.346 ms
 Execution time: 0.062 ms
(5 rows)

Customer Impact / How it manifested

We saw that PuppetDB could only process 5 commands/sec, despite bumping command processing threads to 10 on an 8-core PuppetDB node.

CPU usage on the PostgreSQL node showed a steady 40% on 2016.4 and jumped to 60% immediately after the upgrade to 2016.5. Not an exact measurement, but that's a 50% relative increase in CPU use.

Possible Solutions

Adding the index should make the query faster: CREATE UNIQUE INDEX idx_reports_hash ON reports (hash)
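A sketch of what that migration might look like on a live system (my assumption, not a committed fix): building the index with CONCURRENTLY avoids holding a write-blocking lock on the busy reports table while the index builds.

```sql
-- Sketch only: same index as proposed above, built without blocking
-- concurrent inserts into reports. CONCURRENTLY cannot run inside a
-- transaction block, so this must be issued as a standalone statement.
CREATE UNIQUE INDEX CONCURRENTLY idx_reports_hash ON reports (hash);
```

The trade-off is that a concurrent build takes longer and can leave an invalid index behind if it fails, which then has to be dropped and rebuilt.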

Alternatively, we could simply stop checking whether the report already exists before inserting it.
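One way to fold the existence check into the insert itself (a sketch, assuming PostgreSQL 9.5+ and a unique index or constraint on hash; the column list below is illustrative, not PuppetDB's actual schema):

```sql
-- Hypothetical shape of the change: rely on the unique constraint and
-- let PostgreSQL skip duplicates, instead of a separate SELECT first.
INSERT INTO reports (hash, certname, receive_time)
VALUES ('\x7ad27615d5c7de439855b245e5eb6d7245e812f6', 'node1.example.com', now())
ON CONFLICT (hash) DO NOTHING;
```

This removes one round trip per command and is race-free under concurrent inserts, which the SELECT-then-INSERT pattern is not.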

Additional Thoughts

I think we should look into the existing index on the hash column and determine whether queries are being designed to hit it, or whether they are being constructed like the one above.

We hopefully shouldn't need two indexes on the same column, but we should make it easy to construct queries that hit the index without special knowledge of how to write the WHERE clause.
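To illustrate the "special knowledge" problem: if the existing index is an expression index (an assumption on my part; the index name and expression below are hypothetical), the planner only uses it when the WHERE clause repeats the exact same expression.

```sql
-- Assumed existing index (hypothetical name/expression):
--   CREATE INDEX reports_hash_expr_idx ON reports (encode(hash, 'hex'));
-- A lookup must then match that expression verbatim to get an index scan;
-- WHERE hash = '\x...' would not use it and falls back to a Seq Scan.
EXPLAIN ANALYZE
SELECT 1 FROM reports
WHERE encode(hash, 'hex') = '7ad27615d5c7de439855b245e5eb6d7245e812f6';
```

If that is the situation here, Russell's suggestion of rewriting the query to match the existing index would avoid both the Seq Scan and a second index on the same column.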

Comment by Russell Mull [ 2017/03/07 ]

I can easily repro this on PuppetDB 4.3.1; fortunately, it looks like it's limited to the insert check. We should probably change the query to use the index we already have, to avoid a costly reindex migration.

Generated at Sat Aug 08 17:53:09 PDT 2020 using Jira 8.5.2#805002-sha1:a66f9354b9e12ac788984e5d84669c903a370049.