[PDB-3544] Fact Updating appears slower with Fact Reduplication Created: 2017/06/01  Updated: 2018/07/23  Resolved: 2017/06/12

Status: Closed
Project: PuppetDB
Component/s: None
Affects Version/s: PDB 4.4.0
Fix Version/s: PDB 4.4.1

Type: Bug Priority: Normal
Reporter: Nick Walker Assignee: Russell Mull
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
Template:
Method Found: Customer Feedback
QA Risk Assessment: Needs Assessment

 Description   

The Problem

I'm seeing an increase in time to update the facts table in PE 2017.2.1.

See, https://explain.depesz.com/s/GG28

I'm also see a large increase in log messages where queries on the facts table are making temporary files.

2017-06-01 13:04:53.254 CDT [db:pe-puppetdb,sess:59305624.1a9d,pid:6813,vtid:16/146,tid:0] LOG:  temporary file: path "pg_tblspc/16384/PG_9.4_201409291/pgsql_tmp/pgsql_tmp6813.3", size 45768
2017-06-01 13:04:53.254 CDT [db:pe-puppetdb,sess:59305624.1a9d,pid:6813,vtid:16/146,tid:0] STATEMENT:  select    path, id, value_type_id, large_value_hash,    case when large_value_hash is null               and value_type_id = 5           then value    end as value,    case when large_value_hash is null then value_string    end as value_string,    value_integer, value_float, value_boolean  from facts as f  inner join fact_paths fp on f.fact_path_id = fp.id  where f.factset_id = $1

That query has a plan like this:

https://explain.depesz.com/s/Zsc

Suggested Solution

I believe both of these plans can be improved with the following index.

CREATE INDEX facts_factset_id_fact_path_id_idx ON facts(factset_id, fact_path_id)

Pull Request

https://github.com/puppetlabs/puppetdb/pull/2307



 Comments   
Comment by Nick Walker [ 2017/06/01 ]

On my test machine I'm noticing that it appears we make one update per fact changes which on my machine results in about 13 individual updates.

Without index: https://gist.github.com/npwalker/45cc8634509cdc7f0c9bca67691c2431

With Index: https://gist.github.com/npwalker/4fd885a7dd1c83c5b9624263a04e402b

You can see that the 13 updates become significantly faster ( 2-4x faster) with the index:

With Index:
LOG: duration: 0.059 ms plan:
LOG: duration: 0.089 ms plan:
LOG: duration: 0.022 ms plan:
LOG: duration: 0.071 ms plan:
LOG: duration: 0.021 ms plan:
LOG: duration: 0.023 ms plan:
LOG: duration: 0.022 ms plan:
LOG: duration: 0.062 ms plan:
LOG: duration: 0.067 ms plan:
LOG: duration: 0.020 ms plan:
LOG: duration: 0.071 ms plan:
LOG: duration: 0.065 ms plan:
LOG: duration: 0.066 ms plan:

Without Index:
LOG: duration: 0.198 ms plan:
LOG: duration: 0.374 ms plan:
LOG: duration: 0.150 ms plan:
LOG: duration: 0.194 ms plan:
LOG: duration: 0.148 ms plan:
LOG: duration: 0.098 ms plan:
LOG: duration: 0.083 ms plan:
LOG: duration: 0.080 ms plan:
LOG: duration: 0.090 ms plan:
LOG: duration: 0.114 ms plan:
LOG: duration: 0.168 ms plan:
LOG: duration: 1.385 ms plan:
LOG: duration: 0.244 ms plan:

Comment by Nick Walker [ 2017/06/12 ]

Russell Mull This was merged correct? Can you confirm the fix versions and close it?

Generated at Thu Nov 14 00:46:04 PST 2019 using JIRA 7.7.1#77002-sha1:e75ca93d5574d9409c0630b81c894d9065296414.