Puppet Communications Protocol / PCP-862

pxp-agent connection flapping exhausts resources on broker

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: pcp-broker 1.5.3
    • Component/s: None
    • Labels:
      None
    • Template:
    • Team:
      Bolt
    • Sprint:
      Bolt Kanban
    • Method Found:
      Customer Feedback
    • Release Notes:
      Bug Fix
    • Release Notes Summary:
      This fixes PCP broker to close superseded connections with `disconnect` in place of `close!`, which frees resources (namely file descriptors) much more quickly, avoiding using system resources unnecessarily when there's a flapping connection or multiple pxp-agents competing for connections.
    • QA Risk Assessment:
      Needs Assessment

      Description

      Summary:

      If pxp-agent connections are flapping (see PCP-833), this can exhaust resources on the broker.

      Reproduction:

      SLES11 agents are susceptible to PCP-833. To reproduce, remove the pxp-agent pidfile and start a second pxp-agent process:

      rm /var/run/puppetlabs/pxp-agent.pid
      service pxp-agent start
      ## confirm two pids:
      pgrep pxp-agent
      

       

      On the broker, note the number of connections in TIME_WAIT:

      netstat -tunap | grep 8142 | grep TIME_WAIT 
      tcp6 0 0 192.168.0.6:8142 10.32.114.98:51034 TIME_WAIT - 
      tcp6 0 0 192.168.0.6:8142 10.32.114.98:50997 TIME_WAIT - 
      tcp6 0 0 192.168.0.6:8142 10.32.114.98:51144 TIME_WAIT - 
      tcp6 0 0 192.168.0.6:8142 10.32.114.98:51103 TIME_WAIT -
      <snip>
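
To track the count over time rather than reading the raw listing, the filter above can be wrapped in a small helper. This is a minimal sketch: `count_time_wait` is a hypothetical name, and it assumes the Linux `netstat` column layout shown above (local address in column 4, state in column 6).

```shell
# Hypothetical helper: counts TIME_WAIT rows for a given local port.
# Reads netstat-style output on stdin; the port defaults to 8142.
count_time_wait() {
  awk -v port="${1:-8142}" '
    # Column 4 is the local address, column 6 the TCP state.
    $6 == "TIME_WAIT" && $4 ~ (":" port "$") { n++ }
    END { print n + 0 }
  '
}

# Usage on the broker:
#   netstat -tan | count_time_wait 8142
```

A count that keeps climbing between runs suggests the flapping described in this ticket is still in progress.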
      

      Also note that the number of file descriptors held by the broker service keeps increasing:

      watch 'ls /proc/$(systemctl show -p MainPID pe-orchestration-services|cut -d= -f2)/fd | wc -l'
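
When `watch` is not convenient, for example when capturing evidence for a support case, the same count can be sampled into a log. A minimal sketch, assuming a Linux `/proc` filesystem; `fd_count` and `log_fd_growth` are hypothetical helper names, not part of any Puppet tooling.

```shell
# Hypothetical helper: number of open file descriptors for a PID (Linux only).
fd_count() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Hypothetical helper: print "<epoch-seconds> <fd-count>" once per interval.
# Arguments: pid, number of samples (default 5), interval in seconds (default 1).
log_fd_growth() {
  pid=$1
  samples=${2:-5}
  interval=${3:-1}
  i=0
  while [ "$i" -lt "$samples" ]; do
    printf '%s %s\n' "$(date +%s)" "$(fd_count "$pid")"
    i=$((i + 1))
    sleep "$interval"
  done
}

# Usage on the broker:
#   pid=$(systemctl show -p MainPID pe-orchestration-services | cut -d= -f2)
#   log_fd_growth "$pid" 60 5 >> /tmp/broker-fd.log
```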

      This will continue until the service either runs out of file descriptors or exhausts its heap.
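
To estimate how close the service is to running out of file descriptors, the open descriptor count can be compared with the process's soft limit. A rough sketch, assuming Linux `/proc/<pid>/limits`; `fd_headroom` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<open-fds> <soft-limit>" for a PID (Linux only).
fd_headroom() {
  pid=$1
  used=$(ls "/proc/$pid/fd" | wc -l)
  # The "Max open files" row lists the soft limit in field 4 and the hard limit in field 5.
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "$used $limit"
}

# Usage on the broker:
#   fd_headroom "$(systemctl show -p MainPID pe-orchestration-services | cut -d= -f2)"
```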

      If the broker runs on a compile master, this interrupts the puppetserver service as well.

      Note: As part of this work, update Kim Oehmichen when this ticket is in merging and once it's merged.

              People

              • Assignee:
                Lucy Wyman (lucy)
              • Reporter:
                Erik Hansen (erik.hansen)