Puppet / PUP-10218

Puppet incorrectly detecting stale pidfile


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: PUP 6.11.1
    • Fix Version/s: PUP 6.13.0
    • Component/s: None
    • Template:
      PUP Bug Template
    • Agent OS:
      CentOS 7
    • Master OS:
      CentOS 7
    • Team:
      Night's Watch
    • Story Points:
      2
    • Sprint:
      NW - 2020-02-05, NW - 2020-02-19
    • Method Found:
      Needs Assessment
    • Release Notes:
      Bug Fix
    • Release Notes Summary:
      Fixed removal of a stale pidfile lock when Puppet agent was started as a lightweight process (LWP) and terminated abnormally on POSIX operating systems.
    • QA Risk Assessment:
      Needs Assessment

      Description

      Puppet Version: 6.11.1
      Puppet Server Version: 6.11.1
      OS Name/Version: CentOS 7

      When the Puppet agent is terminated abnormally (e.g. killed with the KILL signal), it can fail to detect the stale PID file it left behind. The code in question is this:

      puppet/lib/ruby/vendor_ruby/puppet/util/pidlock.rb

      def clear_if_stale
        # Signal 0 delivers nothing; it only checks that lock_pid exists.
        # (`errors` is defined outside this excerpt.)
        begin
          Process.kill(0, lock_pid)
        rescue *errors
          return @lockfile.unlock
        end

        if Puppet.features.posix?
          # ps -p matches processes only, so it fails when lock_pid is an LWP.
          procname = Puppet::Util::Execution.execute(["ps", "-p", lock_pid, "-o", "comm="]).strip
          args     = Puppet::Util::Execution.execute(["ps", "-p", lock_pid, "-o", "args="]).strip
          @lockfile.unlock unless procname =~ /ruby/ && args =~ /puppet/ || procname =~ /puppet(-.*)?$/
        elsif Puppet.features.microsoft_windows?
          # On Windows, we're checking if the filesystem path name of the running
          # process is our vendored ruby:
          exe_path = Puppet::Util::Windows::Process::get_process_image_name_by_pid(lock_pid)
          @lockfile.unlock unless exe_path =~ /\\bin\\ruby.exe$/
        end
      end

      Process.kill(0, pid) checks whether a process with the given PID exists. The problem is that Process.kill matches lightweight processes (LWPs) as well as regular processes, so the check succeeds if either a process or an LWP with that ID currently exists. When it succeeds, Puppet then tries to look up the command name for that ID. Unfortunately, ps -p only considers processes, not LWPs. If the stale file contains an ID that now belongs to an LWP, ps exits with an error and Puppet cannot recover until the stale lock file is removed by hand, because LWPs are usually spawned by long-running daemons and so the ID stays in use.

      The attached patch (tested on CentOS 7) addresses this by making the ps command consider LWPs as well. Without it, you can run into the error shown below.
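
The mismatch described above can also be avoided by not calling ps at all. The following is a minimal sketch of one Linux-only alternative, not the attached patch and not Puppet's actual fix. It assumes /proc is mounted and relies on the fact that /proc/<id> can be opened for thread IDs (LWPs) as well as for PIDs. The helper name puppet_process? and its proc_root parameter are hypothetical; the pattern is copied from the POSIX branch of clear_if_stale.

```ruby
# Hypothetical helper, not part of Puppet: report whether the ID stored in a
# pidfile currently belongs to a Puppet process, using /proc instead of ps -p.
# Linux only. proc_root is a parameter so the lookup can be exercised
# against a fixture directory.
def puppet_process?(id, proc_root: "/proc")
  dir = File.join(proc_root, id.to_s)
  procname = File.binread(File.join(dir, "comm")).strip
  # cmdline separates the arguments with NUL bytes.
  args = File.binread(File.join(dir, "cmdline")).tr("\0", " ").strip
  # Same pattern as the POSIX branch of clear_if_stale.
  !!((procname =~ /ruby/ && args =~ /puppet/) || procname =~ /puppet(-.*)?$/)
rescue Errno::ENOENT, Errno::ESRCH
  # No such task, or it exited while we were reading: not a Puppet process.
  false
end
```

With a helper like this, the POSIX branch could unlock the lockfile whenever puppet_process?(lock_pid) is false. Note that for an LWP, comm holds the thread's name, which may differ from the name of the owning process.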

      Desired Behavior:

      Puppet agent starts up and runs correctly.

      Actual Behavior:

      # puppet agent --test
      Error: Could not run Puppet configuration client: Execution of 'ps -p 2181 -o comm=' returned 1: 
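
To confirm that the ID in the error above (2181 in this report) belongs to an LWP rather than a process, compare the Tgid and Pid fields of /proc/<id>/status on Linux: for an LWP, Pid holds the thread ID and Tgid holds the PID of the owning process, so the two differ. The sketch below assumes a Linux /proc; the helper name lwp? and its proc_root parameter are hypothetical and not part of Puppet.

```ruby
# Hypothetical diagnostic, not part of Puppet. Returns true when the ID is a
# thread (LWP) of some other process, false when it is a process's main
# thread, and nil when no such task exists or the fields are missing.
def lwp?(id, proc_root: "/proc")
  status = File.binread(File.join(proc_root, id.to_s, "status"))
  tgid = status[/^Tgid:\s+(\d+)/, 1]
  # Anchored at line start, so PPid: and TracerPid: do not match.
  pid = status[/^Pid:\s+(\d+)/, 1]
  return nil if tgid.nil? || pid.nil?
  tgid != pid
rescue Errno::ENOENT, Errno::ESRCH
  nil
end
```

A true result for the ID in the lock file is consistent with the behavior reported here: Process.kill(0, id) succeeds while ps -p finds nothing.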
      

            People

            Assignee:
            luchian.nemes Luchian Nemes
            Reporter:
            mderanek Marcin Deranek
