Details
-
Bug
-
Status: Resolved
-
Normal
-
Resolution: Fixed
-
None
-
- Create an isolated reproduction case that shows how sync got mismatched hashes for the same factset
- Create a ticket for any follow-on work to address the issues found during this investigation
-
HA
-
5
-
HA 2020-02-24, HA 2020-03-10, HA 2020-04-07, HA 2020-04-21
-
Needs Assessment
-
Bug Fix
-
Fixed an issue causing unnecessary factset sync
-
Needs Assessment
Description
While testing sync's memory usage with the new non-lazy approach, I noticed that factsets were being pulled again on every sync run. Austin was able to reproduce this behavior on his machine using the following steps:
- set up a local sync pair using pe-pdbbox and Austin's helper pdb script (sync-1 & sync-2)
- stop sync-2
- load benchmark data into sync-1
- restart sync-2
Once the initial sync had run, we observed that periodic sync continued to pull factsets in both directions: sync-1 would pull ~20 factsets out of 2,000, whereas sync-2 would pull ~1,000 out of 2,000. We saw similar behavior once before at a customer's site, but in that case only a handful of factsets were being transferred, and according to the logs the issue appeared to resolve itself after a while.
I'm wondering whether this can happen when a factset is first ingested via sync rather than through the normal command ingestion path. That could help explain why the similar issue we saw at a customer's site looked like it resolved itself after a bit of time.
Example of what we saw in the debug logs for sync when it was repeatedly pulling factsets:
2021-02-05 16:02:29,948 DEBUG [clojure-agent-send-off-pool-0] [p.p.s.core] Identified remote factset (host-1574 2021-02-06T00:00:05.374Z a657a432359dcd750c7df412d51a67570e9190a4) to sync due to local factset (host-1574 2021-02-06T00:00:05.374Z 23d9f61d312f3b72b46bf6a7974f8698ac9f9abd)
You can see in the example above that the hashes used to compare factset contents in the sync summary query didn't match, even though the certname and timestamp were identical, which caused the sync-2 side to pull the factsets repeatedly.
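To make the failure mode concrete, here is a minimal Python sketch of a summary comparison. It is not the actual pe-puppetdb sync code: `FactsetSummary` and `factsets_to_pull` are hypothetical names, and the rule "pull when the local entry is missing or its hash differs" is an assumption based only on the log line above.

```python
from typing import List, NamedTuple


class FactsetSummary(NamedTuple):
    """Hypothetical summary record: one row per certname."""
    certname: str
    producer_timestamp: str
    content_hash: str


def factsets_to_pull(local: List[FactsetSummary],
                     remote: List[FactsetSummary]) -> List[FactsetSummary]:
    """Return remote summaries that are missing locally or whose hash differs.

    Assumed comparison rule, for illustration only.
    """
    local_by_certname = {s.certname: s for s in local}
    to_pull = []
    for remote_summary in remote:
        local_summary = local_by_certname.get(remote_summary.certname)
        if (local_summary is None
                or local_summary.content_hash != remote_summary.content_hash):
            to_pull.append(remote_summary)
    return to_pull


# Values taken from the debug log line above.
local = [FactsetSummary("host-1574", "2021-02-06T00:00:05.374Z",
                        "23d9f61d312f3b72b46bf6a7974f8698ac9f9abd")]
remote = [FactsetSummary("host-1574", "2021-02-06T00:00:05.374Z",
                         "a657a432359dcd750c7df412d51a67570e9190a4")]

pulled = factsets_to_pull(local, remote)
```

Under this assumed rule, if the two sides compute different hashes for the same factset, pulling the factset does not change the outcome of the next comparison, so the same factset is selected on every sync run.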
We should investigate this issue, figure out exactly how it can happen, and see if there is a way to mitigate it in pe-puppetdb sync.
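As a possible aid for the investigation, the following Python sketch extracts the certname, timestamps, and hashes from debug log lines shaped like the example above and lists the certnames whose timestamps match but whose hashes differ. The regular expression is written against that single example line, so the assumption is that other lines follow the same layout; `parse_mismatch` and `hash_only_mismatches` are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, List, Optional

# Pattern derived from the one example log line in this ticket.
LOG_PATTERN = re.compile(
    r"Identified remote factset "
    r"\((?P<remote_certname>\S+) (?P<remote_ts>\S+) (?P<remote_hash>[0-9a-f]+)\) "
    r"to sync due to local factset "
    r"\((?P<local_certname>\S+) (?P<local_ts>\S+) (?P<local_hash>[0-9a-f]+)\)"
)


def parse_mismatch(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields from a matching log line, or None."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None


def hash_only_mismatches(lines: Iterable[str]) -> List[str]:
    """Certnames whose timestamps are equal on both sides but whose hashes differ."""
    certnames = set()
    for line in lines:
        fields = parse_mismatch(line)
        if fields is None:
            continue
        if (fields["remote_ts"] == fields["local_ts"]
                and fields["remote_hash"] != fields["local_hash"]):
            certnames.add(fields["remote_certname"])
    return sorted(certnames)


example_line = (
    "2021-02-05 16:02:29,948 DEBUG [clojure-agent-send-off-pool-0] [p.p.s.core] "
    "Identified remote factset (host-1574 2021-02-06T00:00:05.374Z "
    "a657a432359dcd750c7df412d51a67570e9190a4) to sync due to local factset "
    "(host-1574 2021-02-06T00:00:05.374Z 23d9f61d312f3b72b46bf6a7974f8698ac9f9abd)"
)

mismatched = hash_only_mismatches([example_line, "unrelated log line"])
```

Running this over the sync debug logs from both sync-1 and sync-2 would show whether the same certnames keep reappearing across runs, which is what we would expect if the hashes never converge.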