  1. PuppetDB
  2. PDB-3529

Report insertion should determine the latest_report_id without recalculating the latest report from the reports table

Details

    • Improvement
    • Status: Closed
    • Normal
    • Resolution: Fixed
    • PDB 4.4.0
    • PDB 4.4.1, PDB 5.0.0
    • PuppetDB
    • None
    • Data Platform
    • Bug Fix
    • Report storage has been optimized by requiring fewer tables to be consulted when updating the record of which report is latest.
    • Needs Assessment

    Description

      The Problem

      Looking at the PostgreSQL logs from scale testing, I can see that we repeatedly recalculate latest_report_id from the reports table. This happens every time we insert a report, but we should not need to recalculate the latest report on every insert.

      https://github.com/puppetlabs/puppetdb/blob/def0dc5cedbc31c0d0315abd95dfb20107e2ea45/src/puppetlabs/puppetdb/scf/storage.clj#L1291-L1295

      The Suggestion

      We should store latest_report_timestamp (the producer_timestamp of the latest report) in certnames alongside latest_report_id. Then, when we insert a new report, we can compare its producer_timestamp to latest_report_timestamp and update both latest_report_id and latest_report_timestamp only if the new report is actually the latest.

      This would eliminate an expensive query that sorts all of the reports for a certname just to find the most recent report.
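
The suggestion above could be sketched roughly as follows. This is only an illustration, not PuppetDB's actual code: it uses Python's sqlite3 module with a heavily simplified, hypothetical schema (the real certnames and reports tables have many more columns), the helper names make_db, insert_report and latest are invented for the example, and timestamps are assumed to be ISO 8601 UTC strings so that plain string comparison matches chronological order.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE certnames (
            certname TEXT PRIMARY KEY,
            latest_report_id INTEGER,
            latest_report_timestamp TEXT  -- proposed new column
        );
        CREATE TABLE reports (
            id INTEGER PRIMARY KEY,
            certname TEXT NOT NULL,
            producer_timestamp TEXT NOT NULL
        );
        """
    )
    return conn


def insert_report(conn, certname, producer_timestamp):
    """Insert a report and update the latest-report pointer only if it is newer."""
    cur = conn.execute(
        "INSERT INTO reports (certname, producer_timestamp) VALUES (?, ?)",
        (certname, producer_timestamp),
    )
    report_id = cur.lastrowid
    # Compare against the stored timestamp instead of sorting the reports table.
    conn.execute(
        """
        UPDATE certnames
           SET latest_report_id = ?, latest_report_timestamp = ?
         WHERE certname = ?
           AND (latest_report_timestamp IS NULL
                OR latest_report_timestamp < ?)
        """,
        (report_id, producer_timestamp, certname, producer_timestamp),
    )
    conn.commit()
    return report_id


def latest(conn, certname):
    """Return (latest_report_id, latest_report_timestamp) for a certname."""
    return conn.execute(
        "SELECT latest_report_id, latest_report_timestamp"
        " FROM certnames WHERE certname = ?",
        (certname,),
    ).fetchone()
```

With this shape, a report that arrives out of order (with an older producer_timestamp than the stored one) is still inserted, but the UPDATE matches no rows, so the certnames entry keeps pointing at the newer report.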

          People

            Unassigned Unassigned
            nick.walker Nick Walker