Details
- Type: Bug
- Status: Resolved
- Priority: Normal
- Resolution: Fixed
- Team: HA
- Sprint: HA 2021-06-02, HA 2021-06-16
- Release Notes: Bug Fix
- Needs Assessment
Description
The `show lock_timeout` query we do here to grab any existing system lock_timeout returns a result that isn't formatted properly for Long/parseLong: when a timeout is set, Postgres reports the value with a unit suffix (e.g. "300ms"). We need to fix this query so that it doesn't throw if a system lock_timeout is already set.
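A minimal sketch of a more tolerant parse (illustrative only, not the actual fix; `parse-lock-timeout-ms` is a hypothetical helper, and it assumes the ms/s/min suffixes Postgres commonly uses for this setting):

```clojure
;; Hypothetical helper, not the actual PuppetDB fix: strip the unit
;; suffix that `show lock_timeout` appends before handing the number
;; to Long/parseLong. Assumes ms/s/min suffixes only.
(defn parse-lock-timeout-ms
  [s]
  (let [[_ n unit] (re-matches #"(\d+)\s*(ms|s|min)?" s)]
    (when n
      (* (Long/parseLong n)
         (case unit
           (nil "ms") 1
           "s"        1000
           "min"      60000)))))

(parse-lock-timeout-ms "0")     ;=> 0
(parse-lock-timeout-ms "300ms") ;=> 300
```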
To reproduce with a local PDB:
1. In psql: `alter role pdb_test set lock_timeout=300;`
2. Run: `lein test :only puppetlabs.puppetdb.cli.services-test/regular-gc-drops-oldest-partitions-incrementally`
This will cause the test to fail with:

```
21349 [pool-3-thread-3] ERROR puppetlabs.puppetdb.cli.services - Error while sweeping reports and resource events
java.lang.NumberFormatException: For input string: "300ms"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.base/java.lang.Long.parseLong(Long.java:692)
    at java.base/java.lang.Long.parseLong(Long.java:817)
    at puppetlabs.puppetdb.scf.storage$prune_daily_partitions.invokeStatic(storage.clj:1571)
...
```
If a lock_timeout is set for Postgres or for the puppetdb/pe-puppetdb role, partition GC will fail, and partitions will build up until PDB is restarted or the lock_timeout is reset. If no lock_timeout is set, the query returns 0, which Long/parseLong handles fine.
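To illustrate the two cases, a sketch assuming a clojure.java.jdbc db-spec named `db`:

```clojure
(require '[clojure.java.jdbc :as jdbc])

;; What the query hands back in each case:
(jdbc/query db ["show lock_timeout"])
;; => ({:lock_timeout "0"})     when no timeout is set
;; => ({:lock_timeout "300ms"}) after `alter role ... set lock_timeout=300`

;; Long/parseLong accepts "0" but throws on "300ms".
(Long/parseLong "300ms") ; throws NumberFormatException
```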
As a workaround, resetting the lock_timeout should allow partition GC to succeed. For example:

`alter role "pe-puppetdb" reset lock_timeout;`

This will reset the lock_timeout on the pe-puppetdb role and should resolve the error seen above. Partition drops are still protected by a lock_timeout (5 minutes by default), which is set via an env var here.
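For context, a rough sketch of how a transaction-scoped timeout can protect the drop without depending on session or role settings (illustrative only; `PDB_GC_LOCK_TIMEOUT_MS` and `drop-partition!` are hypothetical names, not the actual PuppetDB code or env var):

```clojure
(require '[clojure.java.jdbc :as jdbc])

;; Hypothetical names throughout; the real env var and code live
;; behind the links in the description above.
(def gc-lock-timeout-ms
  (or (some-> (System/getenv "PDB_GC_LOCK_TIMEOUT_MS") Long/parseLong)
      300000)) ; 5 minute default

(defn drop-partition!
  "Drop one partition table, bounded by a transaction-local lock_timeout.
  SET LOCAL expires with the transaction, so role- and session-level
  settings are left untouched."
  [db table]
  (jdbc/with-db-transaction [tx db]
    (jdbc/execute! tx [(format "set local lock_timeout = %d" gc-lock-timeout-ms)])
    (jdbc/execute! tx [(format "drop table if exists %s" table)])))
```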