Ansible tasks slow down as host facts grow #73654

@mwhahaha

Description

SUMMARY

As the size of host facts increases, the number of tasks that can be processed per second drops sharply. The effect is visible at smaller scale, but is dramatic at ~150 nodes. It can be triggered by network facts (with 100+ interfaces), service facts, or package facts. It can also be easily reproduced by loading a large static fact structure (>100KB) and filtering to ansible_local.
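For anyone wanting to try the static-fact reproduction, here is a minimal sketch of how such a structure could be generated. It assumes the facts are delivered as a JSON `.fact` file in the default local facts directory (`/etc/ansible/facts.d`); the file name `large.fact`, the key names, and the entry layout are hypothetical, since the structure used for the original measurements is not included in this report.

```python
import json


def build_fact_payload(min_bytes=100 * 1024):
    """Return a JSON string of synthetic facts at least min_bytes long."""
    facts = {}
    index = 0
    payload = json.dumps(facts)
    while len(payload) < min_bytes:
        # Hypothetical entries: any nested data of similar size should do.
        for _ in range(100):
            facts["item_%05d" % index] = {
                "name": "service-%d" % index,
                "state": "running",
                "tags": ["a", "b", "c"],
            }
            index += 1
        payload = json.dumps(facts)
    return payload


def write_fact_file(path, min_bytes=100 * 1024):
    """Write the synthetic facts to path and return the number of bytes written."""
    payload = build_fact_payload(min_bytes)
    with open(path, "w") as handle:
        handle.write(payload)
    return len(payload)
```

Calling `write_fact_file("large.fact")` and copying the result into the facts directory on each target host should make the data appear under `ansible_local` the next time facts are gathered.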

I was able to pinpoint one of the issues to the get_vars call in the host task processing:
https://github.com/ansible/ansible/blob/stable-2.9/lib/ansible/plugins/strategy/linear.py#L282-L283

This call was seen to take <10ms with no facts and ~200ms with facts. Since host task processing is performed serially, the result is that <= 5 tasks can be processed per second at this scale.
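The arithmetic behind that limit can be sketched as follows. This is a simplified model that assumes the per-host get_vars call is the only serial cost and ignores everything else the strategy does; the function names are illustrative and not part of Ansible.

```python
def tasks_per_second(get_vars_seconds):
    """Upper bound on host tasks queued per second when get_vars runs serially."""
    return 1.0 / get_vars_seconds


def queue_time_seconds(host_count, get_vars_seconds):
    """Time spent in get_vars alone to queue one task for every host."""
    return host_count * get_vars_seconds


if __name__ == "__main__":
    # ~200 ms per call with large facts, as measured above.
    print("tasks/sec with facts: %.1f" % tasks_per_second(0.200))
    # 179 hosts is the inventory size reported under ACTUAL RESULTS.
    print("seconds per task across 179 hosts: %.1f" % queue_time_seconds(179, 0.200))
```

Under this model a single task across 179 hosts spends about 36 seconds in get_vars alone, which is the same order of magnitude as the "Sleep 1" timings reported below.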

There is another issue with the amount of memory and the workers being spawned, but the task execution slowdown seems to be more consistent.

ISSUE TYPE
  • Bug Report
COMPONENT NAME

lib/ansible/vars/manager.py
lib/ansible/plugins/strategy/linear.py

ANSIBLE VERSION

ansible-2.9.17-1.el8ae.noarch

$ ansible --version
ansible 2.9.17
  config file = /home/stack/scale-ssh-ansible/ansible.cfg
  configured module search path = ['/home/stack/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.6/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
CONFIGURATION
ANSIBLE_PIPELINING(/home/stack/scale-ssh-ansible/ansible.cfg) = True
ANSIBLE_SSH_ARGS(/home/stack/scale-ssh-ansible/ansible.cfg) = -o UserKnownHostsFile=/dev/null -o StrictHostKey>
ANSIBLE_SSH_CONTROL_PATH_DIR(/home/stack/scale-ssh-ansible/ansible.cfg) = /tmp/scale-ansible-ssh
ANSIBLE_SSH_RETRIES(/home/stack/scale-ssh-ansible/ansible.cfg) = 8
DEFAULT_CALLBACK_WHITELIST(/home/stack/scale-ssh-ansible/ansible.cfg) = ['profile_tasks']
DEFAULT_FORKS(/home/stack/scale-ssh-ansible/ansible.cfg) = 32
DEFAULT_INTERNAL_POLL_INTERVAL(/home/stack/scale-ssh-ansible/ansible.cfg) = 0.005
DEFAULT_TIMEOUT(/home/stack/scale-ssh-ansible/ansible.cfg) = 30
INTERPRETER_PYTHON(/home/stack/scale-ssh-ansible/ansible.cfg) = auto
OS / ENVIRONMENT

RHEL8.2

This has been seen on OpenStack compute nodes with an average of 200 network interfaces. The same effect can be reproduced without network facts by using service facts or package facts instead.

STEPS TO REPRODUCE
# baseline.yml
- hosts: all
  gather_facts: false
  tasks:
    - meta: clear_facts
    - name: Gather facts
      setup:
        gather_subset:
          - '!all'
          - min
    - name: Sleep 1
      shell: sleep 1
# package-facts.yml
- hosts: all
  gather_facts: false
  tasks:
    - meta: clear_facts
    - name: Gather facts
      setup:
        gather_subset:
          - '!all'
          - min
    - name: Gather package facts
      package_facts:
        manager: auto
    - name: Sleep 1
      shell: sleep 1
# service-facts.yml
- hosts: all
  gather_facts: false
  tasks:
    - meta: clear_facts
    - name: Gather facts
      setup:
        gather_subset:
          - '!all'
          - min
    - name: Gather service facts
      service_facts: # sup
    - name: Sleep 1
      shell: sleep 1
EXPECTED RESULTS

Task execution should not slow down when the number of Ansible host facts increases.

ACTUAL RESULTS

Execution against 179 hosts

$ ansible-playbook -i inventory.ini baseline.yml
...SNIP...
Thursday 18 February 2021  15:32:19 +0000 (0:00:04.784)       0:00:08.455 ***** 
=============================================================================== 
Sleep 1 ----------------------------------------------------------------- 4.78s
Gather facts ------------------------------------------------------------ 3.58s
$ ansible-playbook -i inventory.ini package-facts.yml
...SNIP...
Thursday 18 February 2021  15:34:26 +0000 (0:00:26.821)       0:00:52.222 ***** 
=============================================================================== 
Sleep 1 ---------------------------------------------------------------- 26.82s
Gather package facts --------------------------------------------------- 21.53s
Gather facts ------------------------------------------------------------ 3.78s
$ ansible-playbook -i inventory.ini service-facts.yml
...SNIP...
Thursday 18 February 2021  15:36:00 +0000 (0:00:08.965)       0:00:26.258 ***** 
=============================================================================== 
Gather service facts --------------------------------------------------- 12.44s
Sleep 1 ----------------------------------------------------------------- 8.97s
Gather facts ------------------------------------------------------------ 3.94s
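To put the three runs side by side, the sketch below takes the "Sleep 1" durations from the output above and derives a slowdown factor and a rough per-host overhead relative to the baseline. It assumes the extra time is spread evenly across the 179 hosts, which is a simplification; the variable and function names are illustrative only.

```python
HOSTS = 179

# "Sleep 1" task durations in seconds, copied from the profile_tasks output above.
SLEEP_SECONDS = {
    "baseline": 4.78,
    "package-facts": 26.82,
    "service-facts": 8.97,
}


def slowdown(run, baseline="baseline"):
    """How many times longer the task took than in the baseline run."""
    return SLEEP_SECONDS[run] / SLEEP_SECONDS[baseline]


def overhead_ms_per_host(run, baseline="baseline", hosts=HOSTS):
    """Extra time over the baseline, averaged per host, in milliseconds."""
    extra = SLEEP_SECONDS[run] - SLEEP_SECONDS[baseline]
    return extra / hosts * 1000.0


if __name__ == "__main__":
    for run in ("package-facts", "service-facts"):
        print("%s: %.1fx slower, ~%.0f ms extra per host"
              % (run, slowdown(run), overhead_ms_per_host(run)))
```

With these numbers, package facts add roughly 123 ms per host and service facts roughly 23 ms per host to an otherwise identical task.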
