Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disabled (but running) systemd service is not restarted with mod_watch #47361

Open
edusperoni opened this issue Apr 27, 2018 · 17 comments
Open
Labels
Bug broken, incorrect, or confusing behavior Confirmed Salt engineer has confirmed bug/feature - often including a MCVE help-wanted Community help is needed to resolve this pending-community-assignment Pending community contributor assignment severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Milestone

Comments

@edusperoni
Copy link
Contributor

Description of Issue/Question

When a systemd service is created, started, but not enabled, salt does not restart the service on mod_watch:

----------
          ID: /etc/zabbix/zabbix_agentd.conf
    Function: file.managed
      Result: True
     Comment: File /etc/zabbix/zabbix_agentd.conf updated
     Started: 10:14:52.621831
    Duration: 4298.499 ms
     Changes:
              ----------
              diff:
                  ---
                  <snip>
----------
          ID: zabbix-agent-service
    Function: service.running
        Name: zabbix-agent
      Result: True
     Comment: Service zabbix-agent has been enabled, and is in the desired state
     Started: 10:14:56.922787
    Duration: 1427.353 ms
     Changes:
              ----------
              zabbix-agent:
                  True

Some packages (zabbix, especifically), don't enable their services when installed, so Salt only enables it, instead of enabling+restarting.

This issue is discussed more thoroughly at this PR:
saltstack-formulas/zabbix-formula#77

Setup

Formula: https://github.com/edusperoni/zabbix-formula
states:

  • zabbix-agent.repo
  • zabbix-agent.conf

Steps to Reproduce Issue

On a machine that has never installed zabbix-agent, install it using the formula above.

On a machine that has zabbix-agent installed, run systemctl disable zabbix-agent && apt-get purge zabbix-agent -y and then apply the state.

Versions Report

Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 0.8.6
       cherrypy: 3.5.0
       dateutil: 2.2
      docker-py: Not Installed
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: 1.2.3
      pycparser: 2.10
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.9 (default, Jun 29 2016, 13:08:31)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.4.0
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5

System Versions:
           dist: debian 8.10
         locale: UTF-8
        machine: x86_64
        release: 3.16.0-4-amd64
         system: Linux
        version: debian 8.10
@Ch3LL
Copy link
Contributor

Ch3LL commented Apr 30, 2018

I'm having a hard time replicating this with a small use case as below:

root@localhost:~# cat /srv/salt/test.sls 
test:
  file.managed:
    - name: /tmp/test
    - watch_in:
      - service: apache2

apache2:
  service.running:
    - enable: True

When running the state it restarts the service:

----------
          ID: apache2
    Function: service.running
      Result: True
     Comment: Service restarted
     Started: 14:31:23.317472
    Duration: 2242.205 ms
     Changes:   
              ----------
              apache2:
                  True

Summary for local

I tried capturing what I found in the states you pointed out and applyling it to this service. Is the use case the same?

@Ch3LL Ch3LL added the info-needed waiting for more info label Apr 30, 2018
@Ch3LL Ch3LL added this to the Blocked milestone Apr 30, 2018
@edusperoni
Copy link
Contributor Author

Try running the following commands:

systemctl disable apache2
service apache2 start

I'm not near my computer at this moment, but it should do the trick.

The idea is that systemctl status should report the service as disabled AND running

@Ch3LL
Copy link
Contributor

Ch3LL commented May 1, 2018

I just tripled check that i ran those commands before running the state and I am still seeing the service restart. I even just changed it to zabbix-agent and cannot replicate.

Heres a detailed use case:

root@localhost:~# systemctl disable zabbix-agent
Synchronizing state for zabbix-agent.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d zabbix-agent defaults
insserv: warning: current start runlevel(s) (empty) of script `zabbix-agent' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `zabbix-agent' overrides LSB defaults (0 1 6).
Executing /usr/sbin/update-rc.d zabbix-agent disable
insserv: warning: current start runlevel(s) (empty) of script `zabbix-agent' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `zabbix-agent' overrides LSB defaults (0 1 6).
root@localhost:~# service zabbix-agent start
root@localhost:~# sudo salt-call --local state.sls test
[WARNING ] State for file: /tmp/test - Neither 'source' nor 'contents' nor 'contents_pillar' nor 'contents_grains' was defined, yet 'replace' was set to 'True'. As there is no source to replace the file with, 'replace' has been set to 'False' to avoid reading the file unnecessarily.
local:
----------
          ID: test
    Function: file.managed
        Name: /tmp/test
      Result: True
     Comment: Empty file
     Started: 14:01:50.789805
    Duration: 36.405 ms
     Changes:   
              ----------
              new:
                  file /tmp/test created
----------
          ID: zabbix-agent
    Function: service.running
      Result: True
     Comment: Service restarted
     Started: 14:01:51.760442
    Duration: 109.101 ms
     Changes:   
              ----------
              zabbix-agent:
                  True

Summary for local
------------
Succeeded: 2 (changed=2)
Failed:    0
------------
Total states run:     2
Total run time: 145.506 ms

i'm wondering if my state is just not setup in the same way as your use case

@edusperoni
Copy link
Contributor Author

edusperoni commented May 1, 2018

Oh, I see now, you're using sysvinit, not systemd. Try it using either Ubuntu 16.04 or Debian 8 (or newer)

@Ch3LL
Copy link
Contributor

Ch3LL commented May 2, 2018

my understanding is debian ensures they synchronize the services through both sysvinit and systemd. I even confirmed on the minion that it was running the salt/modules/systemd.py file.

@Ch3LL
Copy link
Contributor

Ch3LL commented May 2, 2018

also to note i tested on both debian 8 and ubuntu 16

@edusperoni
Copy link
Contributor Author

That's weird, it happened to both me and another commenter on the issue I linked, and we were able to replicate it by running:

systemctl disable zabbix-agent
apt-get purge zabbix-agent -y

and then applying the formula. (or applying the formula on a clean system)

Can you try with the formula I linked (my fork is on the correct commit, before the workaround)? Just use zabbix.agent.repo and zabbix.agent.conf and this pillar (just in case):

zabbix:
  lookup:
     version_repo: 3.4

Although it should work the same on debian 8, you could try it on debian 9, which is the exact version I'm using.

@hatifnatt
Copy link

I can confirm, problem exist. I just checked one more time.
Disable and remove agent:
systemctl disable zabbix-agent.service; systemctl stop zabbix-agent.service; apt-get -y purge zabbix-agent

Install it again:
salt 'testvm*' state.sls zabbix.agent.repo,zabbix.agent.conf

  Name: deb http://repo.zabbix.com/zabbix/3.0/debian jessie main - Function: pkgrepo.managed - Result: Clean Started: - 22:00:40.098412 Duration: 30.905 ms
----------
          ID: zabbix-agent
    Function: pkg.installed
      Result: True
     Comment: The following packages were installed/updated: zabbix-agent
              The following packages were already installed: zabbix-sender
     Started: 22:00:40.129466
    Duration: 7967.602 ms
     Changes:
              ----------
              zabbix-agent:
                  ----------
                  new:
                      1:3.0.16-1+jessie
                  old:
  Name: zabbix - Function: group.present - Result: Clean Started: - 22:00:48.098539 Duration: 1.68 ms
  Name: zabbix - Function: user.present - Result: Clean Started: - 22:00:48.100595 Duration: 14.298 ms
  Name: /var/log/zabbix - Function: file.directory - Result: Clean Started: - 22:00:48.469282 Duration: 0.6 ms
  Name: /var/run/zabbix - Function: file.directory - Result: Clean Started: - 22:00:48.470001 Duration: 0.519 ms
----------
          ID: /etc/zabbix/zabbix_agentd.conf
    Function: file.managed
      Result: True
     Comment: File /etc/zabbix/zabbix_agentd.conf updated
     Started: 22:00:48.470621
    Duration: 99.676 ms
     Changes:
              ----------
              diff:
                  ---
                  +++
                  @@ -1,3 +1,4 @@
                  +# Managed by saltstack. Do not edit this file.
                  ... snip ...
                   # TLSPSKFile=
                  +
----------
          ID: zabbix-agent
    Function: service.running
      Result: True
     Comment: Service zabbix-agent has been enabled, and is in the desired state
     Started: 22:00:48.570671
    Duration: 173.452 ms
     Changes:
              ----------
              zabbix-agent:
                  True

zabbix-agent log file

 12498:20180502:220047.523 agent #2 started [listener #1]
 12501:20180502:220047.525 active check configuration update from [127.0.0.1:10051] started to fail (cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)

It's still using 127.0.0.1 as server, restart was not performed.

I will try to replicate this issue with simple test state later. May be there is some race condition.

@zerthimon
Copy link
Contributor

Something weird is going on with this state.
From what I observed, when the service is dead and needs to be started AND enabled it's only gets started. The next run enables the service.

@Ch3LL
Copy link
Contributor

Ch3LL commented Jun 21, 2018

are any of you seeing this with any other services or only zabbix? Also are you certain this was working on previous salt versions?

@zerthimon
Copy link
Contributor

I'm seeing this with docker.service, not sure if it's salt, docker or systemd problem :)

@Ch3LL
Copy link
Contributor

Ch3LL commented Jun 28, 2018

and do you know if it was working on previous versions?

@bandikoot
Copy link

Hello, @Ch3LL!

I'd try to bring this up again. The issue is reproducable and can be simplified to the description below.

Situation 1

Assume, we have an .sls with:

  1. config file's state:
/etc/unbound/unbound.conf:
  file.managed:
    - require:
      - pkg: unbound
    - user: unbound
    - contents: |
        include: "/etc/unbound/unbound.conf.d/*.conf"
        #test file changes
  1. service state with a watch requisite:
unbound.service:
  service.running:
    - enable: True
    - require:
      - cmd: unbound-systemctl-daemon-reload
      - pkg: unbound
    - watch:
      - file: /etc/systemd/system/unbound.service
      - file: /etc/unbound/unbound.conf
      - file: /etc/unbound/unbound.conf.d
  1. config file is going to have changes
  2. systemd-service being currently disabled, but running

In this case on applying state we get:

----------
          ID: /etc/unbound/unbound.conf
    Function: file.managed
      Result: True
     Comment: File /etc/unbound/unbound.conf updated
     Started: 20:18:22.972941
    Duration: 14.977 ms
     Changes:   
              ----------
              diff:
                  --- 
                  +++ 
                  @@ -1 +1,2 @@
                   include: "/etc/unbound/unbound.conf.d/*.conf"
                  +#test file changes
...
----------
          ID: unbound.service
    Function: service.running
        Name: unbound
      Result: True
     Comment: Service unbound has been enabled, and is in the desired state
     Started: 20:18:23.218767
    Duration: 531.541 ms
     Changes:   
              ----------
              unbound:
                  True

--------------------------------

$ systemctl status unbound
● unbound.service - Unbound recursive Domain Name Server
   Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/unbound.service.d
           └─50-insserv.conf-$named.conf, 50-unbound-$named.conf
   Active: active (running) since Thu 2019-08-08 16:56:43 MSK; 3h 21min ago
 Main PID: 1133 (unbound)
   CGroup: /system.slice/unbound.service
           └─1133 /usr/sbin/unbound -c /etc/unbound/unbound.conf

and the service hasn't been restarted (take a look on active since) despite the watch for the config.

Situation 2

When the service is already enabled, but we still have changes in the watched config file, applying the state results in the service's restart correctly:

----------
          ID: /etc/unbound/unbound.conf
    Function: file.managed
      Result: True
     Comment: File /etc/unbound/unbound.conf updated
     Started: 20:29:21.331920
    Duration: 6.818 ms
     Changes:   
              ----------
              diff:
                  --- 
                  +++ 
                  @@ -1,2 +1 @@
                   include: "/etc/unbound/unbound.conf.d/*.conf"
                  -#test file changes
...
----------
          ID: unbound.service
    Function: service.running
        Name: unbound
      Result: True
     Comment: Service restarted
     Started: 20:29:21.638848
    Duration: 377.89 ms
     Changes:   
              ----------
              unbound:
                  True

--------------------------------

$ systemctl status unbound
● unbound.service - Unbound recursive Domain Name Server
   Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/unbound.service.d
           └─50-insserv.conf-$named.conf, 50-unbound-$named.conf
   Active: active (running) since Thu 2019-08-08 20:31:09 MSK; 37s ago
  Process: 13367 ExecStart=/usr/sbin/unbound -c /etc/unbound/unbound.conf (code=exited, status=0/SUCCESS)
  Process: 13360 ExecStartPre=/usr/sbin/unbound-checkconf (code=exited, status=0/SUCCESS)
  Process: 13356 ExecStartPre=/usr/sbin/unbound-anchor -a /var/lib/unbound/root.key -v (code=exited, status=0/SUCCESS)
 Main PID: 13369 (unbound)
    Tasks: 1
   Memory: 5.6M
      CPU: 36ms
   CGroup: /system.slice/unbound.service
           └─13369 /usr/sbin/unbound -c /etc/unbound/unbound.conf

I believe, service state should trigger systemd-service's enabling and restarting (or starting if it was inactive) in Situation 1. It also seems to me, that the issue is not somehow service-related. We're currently using 2018.3.2 version.

Could you take another look on it, please?

@stale
Copy link

stale bot commented Jan 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Jan 8, 2020
@Ch3LL
Copy link
Contributor

Ch3LL commented Jan 8, 2020

ping @bandikoot thanks for the example and sorry for the late reply. thanks to stalebot I saw this issue today.

I was finally able to replicate this and I'm just gonna paste my exact steps so i don't have an issue replicating this in the future:

Using this vagrant file:

Vagrant.configure("2") do |config|
  config.vm.box = "debian/buster64"
end
  1. install salt
  2. install unbound - apt-get install unbound
  3. disable the service - systemctl disable unbound
  4. run this state:
/etc/unbound/unbound.conf:
  file.managed:
    - user: unbound
    - contents: |
        include: "/etc/unbound/unbound.conf.d/*.conf"
        #test file changes

unbound.service:
  service.running:
    - enable: True
    - watch:
      - file: /etc/unbound/unbound.conf

salt-call --local state.sls test -ldebug

and now i see the issue. will label as a bug for us to get fixed thanks

@stale
Copy link

stale bot commented Jan 8, 2020

Thank you for updating this issue. It is no longer marked as stale.

@stale stale bot removed the stale label Jan 8, 2020
@Ch3LL Ch3LL added Bug broken, incorrect, or confusing behavior Confirmed Salt engineer has confirmed bug/feature - often including a MCVE severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around P3 Priority 3 and removed info-needed waiting for more info labels Jan 8, 2020
@Ch3LL Ch3LL modified the milestones: Blocked, Approved Jan 8, 2020
@Ch3LL Ch3LL self-assigned this Jan 13, 2020
@nicholasmhughes
Copy link
Collaborator

Confirmed on:

Versions Report
Salt Version:
          Salt: 3003.3
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: unknown
      dateutil: 2.7.3
     docker-py: Not Installed
         gitdb: 2.0.5
     gitpython: 2.1.11
        Jinja2: 2.10
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 0.5.6
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: Not Installed
  pycryptodome: 3.6.1
        pygit2: Not Installed
        Python: 3.7.3 (default, Jan 22 2021, 20:04:44)
  python-gnupg: Not Installed
        PyYAML: 3.13
         PyZMQ: 17.1.2
         smmap: 2.0.5
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.1
 
System Versions:
          dist: debian 10 buster
        locale: UTF-8
       machine: x86_64
       release: 4.19.0-17-amd64
        system: Linux
       version: Debian GNU/Linux 10 buster

Given the text in the watch requisite documentation, it seems this is the intended (or at least expected) behavior.

If the watching state changes key contains values, then mod_watch will not be called. If you're using watch or watch_in then it's a good idea to have a state that only enforces one attribute - such as splitting out service.running into its own state and have service.enabled in another.

I haven't dug into the code for this too deep yet, but does anyone know if this is something that's "fixable" in mod_watch for the service state without fundamentally changing global watch behavior?

@twangboy twangboy added help-wanted Community help is needed to resolve this pending-community-assignment Pending community contributor assignment labels Sep 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior Confirmed Salt engineer has confirmed bug/feature - often including a MCVE help-wanted Community help is needed to resolve this pending-community-assignment Pending community contributor assignment severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Projects
None yet
Development

No branches or pull requests

8 participants