Puppet / PUP-10218

Puppet incorrectly detecting stale pidfile

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: PUP 6.11.1
    • Fix Version/s: PUP 6.13.0
    • None
    • PUP Bug Template
    • CentOS 7
    • CentOS 7
    • Night's Watch
    • 2
    • NW - 2020-02-05, NW - 2020-02-19
    • Needs Assessment
    • Bug Fix
    • Fixed pidfile lock removal for when Puppet Agent is started as a LightWeight Process and is incorrectly terminated on POSIX operating systems.
    • Needs Assessment

    Description

      Puppet Version: 6.11.1
      Puppet Server Version: 6.11.1
      OS Name/Version: CentOS 7

      When the Puppet agent is terminated abnormally (e.g. killed with the KILL signal), it can later fail to detect that its PID file is stale. The code in question is this:

      puppet/lib/ruby/vendor_ruby/puppet/util/pidlock.rb

      def clear_if_stale
        # `errors` is defined outside this excerpt.
        begin
          # Signal 0 delivers nothing; it only checks that the ID exists.
          Process.kill(0, lock_pid)
        rescue *errors
          return @lockfile.unlock
        end

        if Puppet.features.posix?
          procname = Puppet::Util::Execution.execute(["ps", "-p", lock_pid, "-o", "comm="]).strip
          args     = Puppet::Util::Execution.execute(["ps", "-p", lock_pid, "-o", "args="]).strip
          @lockfile.unlock unless (procname =~ /ruby/ && args =~ /puppet/) || procname =~ /puppet(-.*)?$/
        elsif Puppet.features.microsoft_windows?
          # On Windows, we're checking if the filesystem path name of the running
          # process is our vendored ruby:
          exe_path = Puppet::Util::Windows::Process::get_process_image_name_by_pid(lock_pid)
          @lockfile.unlock unless exe_path =~ /\\bin\\ruby.exe$/
        end
      end

      Process.kill(0, pid) checks whether a process with the given PID exists. The problem is that this check succeeds for regular processes as well as for LightWeight Processes (LWPs), so it reports success if either a process or an LWP with that ID currently exists. When the check succeeds, Puppet tries to look up the command name for that ID. Unfortunately, ps -p only considers processes, not LWPs, so if the stale file contains an ID that now belongs to an LWP, the ps call fails and Puppet raises an error instead of clearing the lock. Because LWPs are usually spawned by long-running daemons, Puppet will never recover on its own; the stale lock file has to be removed manually.

      Please find attached a patch (tested on CentOS 7) that addresses this issue: it makes sure the ps command also considers LWPs. Without it, you might run into the error shown below.
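The process-versus-LWP distinction described above can be checked directly on Linux, where /proc/<id>/status reports both a Tgid (thread group ID) and a Pid field; the two differ when the ID names a thread rather than a thread-group leader. The following is a minimal sketch only: the helper names lwp_only? and lwp_id? are hypothetical, are not part of Puppet, and are not taken from the attached patch.

```ruby
# Hypothetical helper (not Puppet's API): given the text of
# /proc/<id>/status, report whether <id> names a thread (LWP) rather
# than a thread-group leader, i.e. something `ps -p` would not find.
def lwp_only?(status_text)
  tgid = status_text[/^Tgid:\s+(\d+)/, 1]
  pid  = status_text[/^Pid:\s+(\d+)/, 1]
  # Without both fields we cannot tell, so do not claim it is an LWP.
  return false if tgid.nil? || pid.nil?

  tgid != pid
end

# Linux-only convenience wrapper (assumes /proc is mounted).
# Returns nil when no process or thread with this ID exists.
def lwp_id?(id)
  lwp_only?(File.read("/proc/#{id}/status"))
rescue Errno::ENOENT, Errno::ESRCH
  nil
end
```

For example, lwp_id?(Process.pid) should be false for the main thread of the calling Ruby process, while an ID reused by a daemon's worker thread would yield true even though Process.kill(0, id) succeeds for both.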

      Desired Behavior:

      Puppet agent starts up and runs correctly.

      Actual Behavior:

      # puppet agent --test
      Error: Could not run Puppet configuration client: Execution of 'ps -p 2181 -o comm=' returned 1: 
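The error above arises because a non-zero exit status from ps is raised as an exception instead of being read as "this ID is not a Puppet process". As an illustration only, and as a different approach from the attached patch (which changes the ps invocation), a more tolerant lookup might look like the sketch below. The names command_name_or_nil and puppet_process? are hypothetical, and the runner parameter exists only so the lookup can be exercised without calling ps.

```ruby
require "open3"

# Hypothetical helper (not Puppet's API): ask `ps` for the command name
# of an ID and return nil, rather than raising, when `ps` exits
# non-zero (as it does in the report when the ID belongs to an LWP).
def command_name_or_nil(id, runner: Open3.method(:capture2))
  out, status = runner.call("ps", "-p", id.to_s, "-o", "comm=")
  status.success? ? out.strip : nil
end

# The same ownership test as the POSIX branch quoted in the report,
# expressed as a pure predicate that returns true or false.
def puppet_process?(procname, args)
  !!((procname =~ /ruby/ && args =~ /puppet/) || procname =~ /puppet(-.*)?$/)
end
```

With a lookup like this, a nil command name can be treated the same way as a name that does not match Puppet: the lock is considered stale and can be released.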
      

      Attachments

        1. pidlock.patch
          0.9 kB
          Marcin Deranek

          People

            Assignee: Luchian Nemes (luchian.nemes)
            Reporter: Marcin Deranek (mderanek)