Fix issue with lock_timeout format during partition drop

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: PDB 6.17.0, PDB 7.4.1
    • Component/s: None
    • Labels:
      None
    • Template:
    • Team:
      HA
    • Story Points:
      1
    • Sprint:
      HA 2021-06-02, HA 2021-06-16
    • Method Found:
      Needs Assessment
    • Release Notes:
      Bug Fix
    • Release Notes Summary:
      Lock timeouts should be parsed correctly now. Previously, if a lock timeout had been set either via the experimental [PDB_GC_DAILY_PARTITION_DROP_LOCK_TIMEOUT_MS](https://puppet.com/docs/puppetdb/latest/configure.html#experimental-environment-variables) variable, or other means, PuppetDB might fail to interpret the value correctly, and as a result, fail to prune older data correctly. [(PDB-5141)](https://tickets.puppetlabs.com/browse/PDB-5141)
    • QA Risk Assessment:
      Needs Assessment

      Description

      The `show lock_timeout` query we run to read any existing system lock_timeout returns a result that includes a unit suffix (for example, "300ms"), which Long/parseLong cannot parse. We need to fix this so that partition GC doesn't throw when a system lock_timeout is already set.

      To reproduce with a local PDB:

      in psql: alter role pdb_test set lock_timeout=300;
      run: lein test :only puppetlabs.puppetdb.cli.services-test/regular-gc-drops-oldest-partitions-incrementally
      

      This will cause the test to fail with:

      21349 [pool-3-thread-3] ERROR puppetlabs.puppetdb.cli.services - Error while sweeping reports and resource events
      java.lang.NumberFormatException: For input string: "300ms"
      	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
      	at java.base/java.lang.Long.parseLong(Long.java:692)
      	at java.base/java.lang.Long.parseLong(Long.java:817)
      	at puppetlabs.puppetdb.scf.storage$prune_daily_partitions.invokeStatic(storage.clj:1571)
              ...
      

      If a lock_timeout is set for postgres or for the puppetdb/pe-puppetdb role, partition GC will fail, and partitions will build up until PDB is restarted or the lock_timeout is reset. If no lock_timeout is set, the query returns 0, which Long/parseLong handles without a problem.
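
To illustrate the parsing problem described above, here is a minimal Java sketch that converts the text PostgreSQL prints for a millisecond-based setting into a number of milliseconds. It is not the fix that was applied in PuppetDB. The class name `LockTimeoutParser` and method `toMillis` are hypothetical, and the set of unit suffixes (ms, s, min, h, d) is an assumption about what `show lock_timeout` can return.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not PuppetDB's actual code. */
final class LockTimeoutParser {
    // Digits followed by an optional unit suffix (assumed set: ms, s, min, h, d).
    private static final Pattern DURATION =
        Pattern.compile("^(\\d+)\\s*(ms|s|min|h|d)?$");

    private LockTimeoutParser() {}

    /** Converts text such as "0", "300ms" or "5min" to milliseconds. */
    static long toMillis(String shown) {
        Matcher m = DURATION.matcher(shown.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException(
                "Unrecognized lock_timeout value: " + shown);
        }
        long amount = Long.parseLong(m.group(1));
        // A bare number (such as "0") is treated as milliseconds.
        String unit = m.group(2) == null ? "ms" : m.group(2);
        switch (unit) {
            case "ms":  return amount;
            case "s":   return Math.multiplyExact(amount, 1_000L);
            case "min": return Math.multiplyExact(amount, 60_000L);
            case "h":   return Math.multiplyExact(amount, 3_600_000L);
            case "d":   return Math.multiplyExact(amount, 86_400_000L);
            default:    throw new IllegalStateException("Unexpected unit: " + unit);
        }
    }
}
```

With this sketch, `LockTimeoutParser.toMillis("300ms")` yields 300, whereas passing "300ms" directly to `Long.parseLong` throws the NumberFormatException shown in the trace. Another option, also an assumption rather than something this ticket states, would be to read the numeric `setting` column from `pg_settings` instead of parsing the `show` output.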

      As a workaround, resetting the lock_timeout should allow partition GC to succeed. For example:

       alter role "pe-puppetdb" reset lock_timeout;
      

      This resets the lock_timeout on the pe-puppetdb role and should resolve the error seen above. Partition drops are still protected by a separate lock_timeout (5 minutes by default), which is set via an environment variable.

            People

            Assignee:
            rob.browning Rob Browning
            Reporter:
            zachary.kent Zachary Kent