[PDB-470] Investigate and implement setStatementsCacheSize for repetitive SQL performance Created: 2014/02/20  Updated: 2016/01/20  Resolved: 2014/02/28

Status: Closed
Project: PuppetDB
Component/s: None
Affects Version/s: None
Fix Version/s: PDB 2.0.0

Type: Improvement Priority: Normal
Reporter: Ken Barber Assignee: Ken Barber
Resolution: Fixed Votes: 0
Labels: trivial
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2014-02-21 at 02.44.58.png     PNG File Screen Shot 2014-02-21 at 02.45.07.png     PNG File Screen Shot 2014-02-28 at 13.51.35.png     PNG File Screen Shot 2014-02-28 at 13.51.49.png    
Issue Links:
Duplicate
is duplicated by PDB-235 Investigate/default BoneCP Prepared S... Closed
Template:
Story Points: 2
Sprint: 20140226 to 20140305

 Description   

It seems that the setStatementsCacheSize setting on BoneCPConfig enables a prepared statement cache.

This seems to have a significant effect on our insert time, especially at high load.

So far I've tested this three times and found that the time spent doing inserts is almost halved.

The proposal is to make this setting user-configurable, and perhaps pick a reasonable default. We should first confirm the numbers I've seen here, and do some more research on the feature to understand its side effects. We might also want to weigh rolling our own prepared statement handling against this approach, although this might be a good phase 1 regardless. A sketch of the configurable form follows the patch below.

The patch I was working with is here:

--- a/src/com/puppetlabs/jdbc.clj
+++ b/src/com/puppetlabs/jdbc.clj
@@ -300,6 +300,7 @@ (defn make-connection-pool
                           (.setStatisticsEnabled stats)
                           (.setIdleMaxAgeInMinutes (pl-time/to-minutes conn-max-age))
                          (.setIdleConnectionTestPeriodInMinutes (pl-time/to-minutes conn-keep-alive))
+                          (.setStatementsCacheSize 1000)
                           ;; paste the URL back together from parts.
                           (.setJdbcUrl (str "jdbc:" subprotocol ":" subname))

My test code for submitting a random catalog is here:

#!/usr/bin/env ruby
 
require 'rubygems'
require 'net/http'
require 'uri'
require 'json'
require 'pp'
 
# POST a "replace catalog" command to a local PuppetDB (v3 commands endpoint).
def replace_catalog(catalog)
  uri = URI.parse("http://localhost:8080/v3/commands")
  http = Net::HTTP.new(uri.host, uri.port)
 
  request = Net::HTTP::Post.new(uri.request_uri)
  request.set_form_data(
    {
      "payload" => {
        "command" => "replace catalog",
        "version" => 3,
        "payload" => catalog.to_json,
      }.to_json
    }
  )
  response = http.request(request)
  puts response.body
end
 
# Generate a random 50-character lowercase string for unique resource titles.
def random_string
  (0...50).map { ('a'..'z').to_a[rand(26)] }.join
end
 
# Build a minimal Class resource with a random title.
def random_resource
  {
    "type" => "Class",
    "title" => random_string,
    "exported" => false,
    "file" => "asdf",
    "line" => 3,
    "tags" => ["asdf"],
    "parameters" => {
    }
  }
end
 
# Skeleton catalog for host-0: one containment edge and two fixed resources.
payload1 = {
  "metadata" => { "api_version" => 1 },
  "data" => {
    "name" => "host-0",
    "version" => "3",
    "transaction-uuid" => "foo",
    "edges" => [
      {
        "source" => {
          "type" => "Class",
          "title" => "foo",
        },
        "target" => {
          "type" => "File",
          "title" => "foo",
        },
        "relationship" => "contains",
      },
    ],
    "resources" => [
      {
        "type" => "Class",
        "title" => "foo",
        "exported" => false,
        "file" => "asdf",
        "line" => 3,
        "tags" => ["asdf"],
        "parameters" => {
        },
      },
      {
        "type" => "File",
        "title" => "foo",
        "exported" => false,
        "file" => "asdf",
        "line" => 3,
        "tags" => ["asdf"],
        "parameters" => {
        },
      },
    ]
  }
}
 
# Pad the catalog with 10,000 random resources to make the inserts expensive.
10000.times do
  payload1["data"]["resources"].push random_resource
end
 
#pp payload1
 
replace_catalog(payload1)



 Comments   
Comment by Ken Barber [ 2014/02/20 ]

Find attached a couple of screenshots of before/after patch results. Like I said, I've tried this three times and found similar results each time, so I'm sure it's not an accident.

This appears to be a simple facility that provides hashed lookups of previously prepared statements so they can be reused. Without it, prepared statements are created and then destroyed after a single use. Reusing prepared statements avoids a hard parse in PostgreSQL, and with repeated use PostgreSQL can skip plan calculation once it considers a plan stable. So the performance benefits look substantial.

The alternative is to do our own prepared statement management; that would, however, require much larger changes to the code.
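
For comparison, here is a minimal sketch of what hand-rolled prepared statement reuse looks like at the raw JDBC level; the table and column names are made up for illustration. Preparing once and executing many times is what the BoneCP cache gives us transparently, without this kind of restructuring.

;; Sketch only: prepare the INSERT once per batch and reuse it for every
;; row, instead of re-preparing per insert. Table/columns are hypothetical.
(defn insert-rows!
  [^java.sql.Connection conn rows]
  (with-open [stmt (.prepareStatement
                     conn
                     "INSERT INTO example_params (resource, name, value) VALUES (?, ?, ?)")]
    (doseq [[resource param-name param-value] rows]
      (doto stmt
        (.setString 1 resource)
        (.setString 2 param-name)
        (.setString 3 param-value)
        (.executeUpdate)))))

PostgreSQL then only hard-parses the INSERT once per connection, which is the same effect the cache achieves without touching the calling code.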

Comment by Ken Barber [ 2014/02/28 ]

PR is here: https://github.com/puppetlabs/puppetdb/pull/874

Comment by Ken Barber [ 2014/02/28 ]

Attached more profiling with this patch applied, comparing the 'statements-cache-size' setting at a value of 0 versus a value of 1000.

With a value of 1000, the total insert time for resources_params_cache is 3403 milliseconds; with a value of 0 it is 5541 milliseconds, roughly a 40% reduction.
