Uploaded image for project: 'PuppetDB'
  1. PuppetDB
  2. PDB-3323

Checking if a report exists before inserting it is SLOW

    Details

    • Template:
    • Team:
      Systems Engineering
    • Story Points:
      2
    • Sprint:
      SE 2017-03-08
    • Release Notes:
      Known Issue
    • Release Notes Summary:
      Hide
      PostgreSQL CPU usage increase

      After upgrading to 2016.5.1 or 2016.5.2, PostgreSQL CPU usage increases by 50 percent. To workaround this issue, we've created the pdb_3323_workaround module.

      1. On the Puppet master, download the [module](https://forge.puppet.com/puppetlabs/pdb_3323_workaround).
      2. Classify the node hosting pe-postgresql with the `pdb_3323_workaround` class.
      3. On the node hosting pe-postgresql, run Puppet.

      After you install this module, it's safe to forget about. The index will be removed when you upgrade in the future.

      <!--PDB-3323-->

      Show
      PostgreSQL CPU usage increase After upgrading to 2016.5.1 or 2016.5.2, PostgreSQL CPU usage increases by 50 percent. To workaround this issue, we've created the pdb_3323_workaround module. 1. On the Puppet master, download the [module]( https://forge.puppet.com/puppetlabs/pdb_3323_workaround ). 2. Classify the node hosting pe-postgresql with the `pdb_3323_workaround` class. 3. On the node hosting pe-postgresql, run Puppet. After you install this module, it's safe to forget about. The index will be removed when you upgrade in the future. <!-- PDB-3323 -->
    • QA Risk Assessment:
      Needs Assessment

      Description

      Suggested workaround

      https://forge.puppet.com/puppetlabs/pdb_3323_workaround

      The Problem

      Working with a customer we noticed performance degradation after updating from PE 2016.4 to 2016.5.2.

      Using PgBadger to analyze the PostgreSQL logs we can see that checking to see if a report already exists is taking longer than 5 seconds much more frequently than expected.

      Researching on my own test install I can see that the query performs a Seq Scan on reports because the where clause doesn't match an index.

      Adding an index on the hash column that isn't wrapped in a function causes the query plan to switch to using a Index Only Scan.

      pe-puppetdb=# explain analyze select 1 from reports where hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6';
                                                  QUERY PLAN                                            
      --------------------------------------------------------------------------------------------------
       Seq Scan on reports  (cost=0.00..39.31 rows=1 width=0) (actual time=0.012..0.105 rows=1 loops=1)
         Filter: (hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6'::bytea)
         Rows Removed by Filter: 186
       Planning time: 0.142 ms
       Execution time: 0.133 ms
      (5 rows)
       
      pe-puppetdb=# create unique index test on reports(hash);
      CREATE INDEX
      pe-puppetdb=# explain analyze select 1 from reports where hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6';
                                                          QUERY PLAN                                                     
      -------------------------------------------------------------------------------------------------------------------
       Index Only Scan using test on reports  (cost=0.14..8.16 rows=1 width=0) (actual time=0.033..0.035 rows=1 loops=1)
         Index Cond: (hash = '\x7ad27615d5c7de439855b245e5eb6d7245e812f6'::bytea)
         Heap Fetches: 1
       Planning time: 0.346 ms
       Execution time: 0.062 ms
      (5 rows)
      

      Customer Impact / How it manifested

      We saw that PuppetDB could only manage to work 5 commands / sec despite trying to bump command processing threads to 10 on a 8 core PuppetDB node.

      CPU usage on the PostgreSQL node showed steady 40% use on 2016.4 and immediately increased to 60% after the upgrade to 2016.5. Not perfect science here but that's a 50% degradation in CPU use.

      Possible Solutions

      Adding the index should make the query faster, CREATE UNIQUE INDEX idx_reports_hash on reports(hash)

      Otherwise we can simply stop checking if the report already exists before inserting it.

      Additional Thoughts

      I think we should research the current index on the hash column and determine if queries are being designed to hit it or if queries are being constructed like the one above.

      We hopefully shouldn't need two indexes on the same column but we should make it easy to construct queries without special knowledge of how to make the where clause.

        Attachments

          Issue Links

            Activity

              jsd-sla-details-panel

                People

                • Assignee:
                  russell.mull Russell Mull
                  Reporter:
                  nick.walker Nick Walker
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  11 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:

                    Zendesk Support