Uploaded image for project: 'PuppetDB'
  1. PuppetDB
  2. PDB-4504

add-resource-events-pk migration does not use batched inserts

    XMLWordPrintable

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: PDB 6.3.4, PDB 6.5.0
    • Fix Version/s: PDB 6.3.5, PDB 6.7.0, PDB 6.8.0
    • Component/s: None
    • Labels:
      None
    • Template:
    • Team:
      PuppetDB
    • CS Priority:
      Needs Priority
    • Release Notes:
      Not Needed
    • QA Risk Assessment:
      Needs Assessment

      Description

      PuppetDB 6.3 added a primary key to the resource_events table in order to
      facilitate functionality such as repacking and partitioning. In order to create
      a unique key, the hashing algorithm for resource events was changed. This
      resulted in a migration that re-writes the entire table in order to update the
      hash and remove duplicates. The re-write portion does not use batched inserts
      when sending transformed data back to the database, which results in a separate
      INSERT statement being executed for each row in the resource_events table.

      Reproduction Case

      • Install PuppetDB 5.2
      • Upgrade to PuppetDB 6.3 and wait for migration 69 to finish
      • Compare the number of calls to INSERT INTO resource_events_transform to
        the number of rows modified by that query and the number of rows in the
        resource_events table:

      SELECT sum(calls) AS calls, sum(rows) AS rows FROM pg_stat_statements WHERE query ILIKE 'insert into resource_events_transform%';
      SELECT count(*) FROM resource_events;
      

      Outcome

      pg_stat_statements shows that the INSERT is executed once per row in
      the resource_events table:

      pe-puppetdb=# SELECT sum(calls) AS calls, sum(rows) AS rows FROM pg_stat_statements WHERE query ILIKE 'insert into resource_events_transform%';
      -[ RECORD 1 ]--
      calls | 5698404
      rows  | 5698404
       
      pe-puppetdb=# SELECT count(*) FROM resource_events;
      -[ RECORD 1 ]--
      count | 5779611
      

      Expected Outcome

      The insertion of transformed rows is batched such that the number of calls
      to INSERT is a few orders of magnitude smaller than the number of rows
      in the resource_events table.

        Attachments

          Activity

            People

            Assignee:
            chuck Charlie Sharpsteen
            Reporter:
            chuck Charlie Sharpsteen
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Zendesk Support