Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
PDB 4.2.4, PDB 4.2.5, PDB 4.3.0, PDB 4.3.1
Description
Suggested workaround
https://forge.puppet.com/puppetlabs/pdb_3323_workaround
The Problem
Working with a customer we noticed performance degradation after updating from PE 2016.4 to 2016.5.2.
Using PgBadger to analyze the PostgreSQL logs we can see that checking to see if a report already exists is taking longer than 5 seconds much more frequently than expected.
Researching on my own test install I can see that the query performs a Seq Scan on reports because the where clause doesn't match an index.
Adding an index on the hash column that isn't wrapped in a function causes the query plan to switch to using a Index Only Scan.
pe-puppetdb=# explain analyze select 1 from reports where hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6';
|
QUERY PLAN
|
--------------------------------------------------------------------------------------------------
|
Seq Scan on reports (cost=0.00..39.31 rows=1 width=0) (actual time=0.012..0.105 rows=1 loops=1)
|
Filter: (hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6'::bytea)
|
Rows Removed by Filter: 186
|
Planning time: 0.142 ms
|
Execution time: 0.133 ms
|
(5 rows)
|
|
pe-puppetdb=# create unique index test on reports(hash);
|
CREATE INDEX
|
pe-puppetdb=# explain analyze select 1 from reports where hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6';
|
QUERY PLAN
|
-------------------------------------------------------------------------------------------------------------------
|
Index Only Scan using test on reports (cost=0.14..8.16 rows=1 width=0) (actual time=0.033..0.035 rows=1 loops=1)
|
Index Cond: (hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6'::bytea)
|
Heap Fetches: 1
|
Planning time: 0.346 ms
|
Execution time: 0.062 ms
|
(5 rows)
|
Customer Impact / How it manifested
We saw that PuppetDB could only manage to work 5 commands / sec despite trying to bump command processing threads to 10 on a 8 core PuppetDB node.
CPU usage on the PostgreSQL node showed steady 40% use on 2016.4 and immediately increased to 60% after the upgrade to 2016.5. Not perfect science here but that's a 50% degradation in CPU use.
Possible Solutions
Adding the index should make the query faster, CREATE UNIQUE INDEX idx_reports_hash on reports(hash)
Otherwise we can simply stop checking if the report already exists before inserting it.
Additional Thoughts
I think we should research the current index on the hash column and determine if queries are being designed to hit it or if queries are being constructed like the one above.
We hopefully shouldn't need two indexes on the same column but we should make it easy to construct queries without special knowledge of how to make the where clause.
Attachments
Issue Links
- relates to
-
PDB-3231 A query in Everett is using too much cpu
-
- Closed
-