MCollective / MCO-802

MCollective service intermittently fails to start or stop on Windows agents

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Normal
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Platform
    • Environment:

      Windows 10 Enterprise 64-bit, Windows Server 2012 64-bit

    • Team:
      Dumpling
    • QA Risk Assessment:
      Needs Assessment

      Description

      The mcollective service intermittently fails to start or stop on Windows agents.

      Background: While troubleshooting the issue, I came across and ran this test: https://github.com/puppetlabs/puppet/blob/master/acceptance/tests/windows/QA-563_windows_exit_mcollective.rb.

      That test passes on a Windows agent running the same OS and PE versions on which my script fails. I therefore mimicked the methods it uses to start and stop the service, but the same sequence for the mcollective service, on an identical configuration, still fails when invoked from my test script (attached: windows_agent_tls_verification.rb). In the last couple of runs the failure has mostly been an inability to stop the service, though I have also seen it fail to start the service. The latter seems to happen when using puppet('resource service mcollective ensure=running') rather than 'net start mcollective'. When the service gets into the state where it cannot be started, this is not because the prior instance failed to terminate: sc query does not report any PID, and the service stubbornly refuses to start even after several manual attempts.
      With the sequence and methods currently used in the script, it mostly fails on the first attempt to stop the service.
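
      One way to make a start/stop sequence in a test script less sensitive to timing is to wait for a definite service state, and record the reported PID, after each net start/stop or puppet resource call instead of proceeding immediately. This is a minimal sketch in Ruby, assuming the output format of `sc queryex`, which prints `STATE` and `PID` lines. `parse_sc_query` and `wait_for_service_state` are hypothetical helpers, not part of beaker or the attached scripts. The `query` argument is any callable returning that output, for example a lambda wrapping something like beaker's `on(host, 'sc queryex mcollective').stdout`.

```ruby
# Hypothetical helpers (not part of beaker or the attached scripts) for
# checking the state that `sc queryex <service>` reports on a Windows agent.

# Parse `sc queryex` text into a hash such as { state: "RUNNING", pid: 1234 }.
# :state is nil if no STATE line is found; :pid is nil when the PID line is
# missing or reports 0 (as it does for a stopped service).
def parse_sc_query(output)
  state = output[/^\s*STATE\s*:\s*\d+\s+(\w+)/, 1]
  pid = output[/^\s*PID\s*:\s*(\d+)/, 1]&.to_i
  pid = nil if pid&.zero?
  { state: state, pid: pid }
end

# Poll `query` (any callable returning raw sc output) until the service
# reports the `wanted` state, raising if it has not done so within
# `timeout` seconds. The error message includes the last state and PID seen.
def wait_for_service_state(query, wanted, timeout: 60, interval: 2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    status = parse_sc_query(query.call)
    return status if status[:state] == wanted

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "service still #{status[:state].inspect} after #{timeout}s " \
            "(pid: #{status[:pid].inspect})"
    end
    sleep interval
  end
end
```

      Waiting for 'STOPPED' after each stop attempt, and for 'RUNNING' after each start, would separate a service that is merely slow to change state from one that never reaches it; on timeout, the message shows whether a PID was still being reported.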

      I have tried this on two different Windows versions so far, with similar results. The failures are not consistent, although they appear more random when using the puppet('resource service mcollective ensure=xxxx') method than when using net start/stop.

      Versions: I have seen this happen with PE 2016.4.4 builds and suspect it may also be happening with 2017.2 builds.

      Attachments: place these scripts in pe_acceptance_tests and run them from there:

      • windows_agent_tls_verification.rb (acceptance/tests/security folder)
      • beaker_helper.rb (lib folder)
      • host_helpers.rb (lib/puppet_enterprise_acceptance folder)

      Run the script with the standard beaker command, specifying a host config that includes Windows agents (in addition to the master, etc.).

              People

              • Assignee:
                Unassigned
                Reporter:
                Jayant Sane (jayant.sane)
              • Votes:
                0
                Watchers:
                3