SUMMARY
As the size of host facts increases, the number of tasks that can be processed per second drops sharply. The effect is visible at smaller scale but becomes dramatic at ~150 nodes. Large facts can come from network facts (hosts with 100+ interfaces), service facts, or package facts. The slowdown can also be reproduced easily by loading a large static fact structure (>100KB) and filtering to ansible_local.
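One way to set up the static-fact reproduction is a generated local fact file. Ansible's setup module reads *.fact files from its fact_path (default /etc/ansible/facts.d) and exposes them under ansible_local. The minimal sketch below builds a synthetic JSON fact of roughly 270 KB. The interface-like shape, the 2000-entry count, and the large.fact file name are illustrative assumptions, not the structure used in the original report.

```python
import json
import sys


def build_large_fact(num_entries: int = 2000) -> dict:
    """Build a synthetic fact tree (hypothetical shape) loosely mimicking many NICs."""
    return {
        "interfaces": {
            f"eth{i}": {
                "mtu": 1500,
                "active": True,
                "ipv4": {
                    "address": f"10.{i // 256 % 256}.{i % 256}.1",
                    "netmask": "255.255.255.0",
                },
                "macaddress": f"52:54:00:{(i >> 16) & 0xff:02x}:{(i >> 8) & 0xff:02x}:{i & 0xff:02x}",
            }
            for i in range(num_entries)
        }
    }


def render_fact_file(num_entries: int = 2000) -> str:
    """Serialize the synthetic facts as JSON, the format *.fact files may use."""
    return json.dumps(build_large_fact(num_entries))


if __name__ == "__main__":
    payload = render_fact_file()
    print(f"{len(payload.encode('utf-8')) / 1024:.0f} KiB")
    # Only write a file when a destination path is given on the command line.
    if len(sys.argv) > 1:
        with open(sys.argv[1], "w", encoding="utf-8") as fh:
            fh.write(payload)
```

Running it on each managed node with a destination such as /etc/ansible/facts.d/large.fact (the script and file names are hypothetical), then gathering with setup and filter: ansible_local, should expose the structure as ansible_local.large.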
I was able to pinpoint one of the issues to the get_vars call in the per-host task processing:
https://github.com/ansible/ansible/blob/stable-2.9/lib/ansible/plugins/strategy/linear.py#L282-L283
This call was measured at <10ms with no facts and ~200ms with facts. Because host task processing is performed serially, at most about 5 tasks (1s / 200ms) can be processed per second at this scale.
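The throughput bound follows from simple arithmetic. The sketch below is a simplified model, not Ansible's code: it assumes every host-task dispatch serially pays a fixed variable-preparation cost (the measured get_vars time) and ignores forks, connection setup, and module execution. It uses 10ms as the no-facts figure, which is an upper bound on the measured cost, so the no-facts throughput it reports is a lower bound.

```python
def max_host_tasks_per_second(per_host_overhead_s: float) -> float:
    """Upper bound on serial host-task dispatches per second."""
    return 1.0 / per_host_overhead_s


def min_dispatch_time_s(num_hosts: int, per_host_overhead_s: float) -> float:
    """Lower bound on the time to queue one task for every host."""
    return num_hosts * per_host_overhead_s


if __name__ == "__main__":
    hosts = 179  # host count used in the runs in this report
    for label, overhead in (("no facts", 0.010), ("with facts", 0.200)):
        print(
            f"{label}: <= {max_host_tasks_per_second(overhead):.0f} host-tasks/s, "
            f">= {min_dispatch_time_s(hosts, overhead):.1f}s to dispatch one task to {hosts} hosts"
        )
```

Under this model, ~200ms per host means a single task cannot be dispatched to all 179 hosts in less than about 36 seconds, regardless of how fast the task itself runs.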
There is also a separate issue with memory usage and the number of worker processes spawned, but the task execution slowdown appears more consistently.
ISSUE TYPE
- Bug Report
COMPONENT NAME
lib/ansible/vars/manager.py
lib/ansible/plugins/strategy/linear.py
ANSIBLE VERSION
ansible-2.9.17-1.el8ae.noarch
```
$ ansible --version
ansible 2.9.17
  config file = /home/stack/scale-ssh-ansible/ansible.cfg
  configured module search path = ['/home/stack/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.6/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
```
CONFIGURATION
```
ANSIBLE_PIPELINING(/home/stack/scale-ssh-ansible/ansible.cfg) = True
ANSIBLE_SSH_ARGS(/home/stack/scale-ssh-ansible/ansible.cfg) = -o UserKnownHostsFile=/dev/null -o StrictHostKey>
ANSIBLE_SSH_CONTROL_PATH_DIR(/home/stack/scale-ssh-ansible/ansible.cfg) = /tmp/scale-ansible-ssh
ANSIBLE_SSH_RETRIES(/home/stack/scale-ssh-ansible/ansible.cfg) = 8
DEFAULT_CALLBACK_WHITELIST(/home/stack/scale-ssh-ansible/ansible.cfg) = ['profile_tasks']
DEFAULT_FORKS(/home/stack/scale-ssh-ansible/ansible.cfg) = 32
DEFAULT_INTERNAL_POLL_INTERVAL(/home/stack/scale-ssh-ansible/ansible.cfg) = 0.005
DEFAULT_TIMEOUT(/home/stack/scale-ssh-ansible/ansible.cfg) = 30
INTERPRETER_PYTHON(/home/stack/scale-ssh-ansible/ansible.cfg) = auto
```
OS / ENVIRONMENT
RHEL8.2
This has been seen on OpenStack compute nodes with an average of 200 network interfaces. The same effect can be reproduced without network facts by using service facts or package facts instead.
STEPS TO REPRODUCE
```yaml
# baseline.yml
- hosts: all
  gather_facts: false
  tasks:
    - meta: clear_facts
    - name: Gather facts
      setup:
        gather_subset:
          - '!all'
          - min
    - name: Sleep 1
      shell: sleep 1
```

```yaml
# package-facts.yml
- hosts: all
  gather_facts: false
  tasks:
    - meta: clear_facts
    - name: Gather facts
      setup:
        gather_subset:
          - '!all'
          - min
    - name: Gather package facts
      package_facts:
        manager: auto
    - name: Sleep 1
      shell: sleep 1
```

```yaml
# service-facts.yml
- hosts: all
  gather_facts: false
  tasks:
    - meta: clear_facts
    - name: Gather facts
      setup:
        gather_subset:
          - '!all'
          - min
    - name: Gather service facts
      service_facts: # sup
    - name: Sleep 1
      shell: sleep 1
```
EXPECTED RESULTS
Task execution should not slow down when the number of Ansible host facts increases.
ACTUAL RESULTS
Execution against 179 hosts:
```
$ ansible-playbook -i inventory.ini baseline.yml
...SNIP...
Thursday 18 February 2021 15:32:19 +0000 (0:00:04.784) 0:00:08.455 *****
===============================================================================
Sleep 1 ----------------------------------------------------------------- 4.78s
Gather facts ------------------------------------------------------------ 3.58s

$ ansible-playbook -i inventory.ini package-facts.yml
...SNIP...
Thursday 18 February 2021 15:34:26 +0000 (0:00:26.821) 0:00:52.222 *****
===============================================================================
Sleep 1 ---------------------------------------------------------------- 26.82s
Gather package facts --------------------------------------------------- 21.53s
Gather facts ------------------------------------------------------------ 3.78s

$ ansible-playbook -i inventory.ini service-facts.yml
...SNIP...
Thursday 18 February 2021 15:36:00 +0000 (0:00:08.965) 0:00:26.258 *****
===============================================================================
Gather service facts --------------------------------------------------- 12.44s
Sleep 1 ----------------------------------------------------------------- 8.97s
Gather facts ------------------------------------------------------------ 3.94s
```
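The Sleep 1 task does the same work in every play, so comparing its duration across runs isolates the per-task overhead described in the summary. The sketch below parses the profile_tasks rows shown above and computes each run's slowdown relative to the baseline. The row regex is an assumption based only on the row format visible in this output.

```python
import re
from typing import Dict

# Matches profile_tasks summary rows such as "Sleep 1 ----- 4.78s".
ROW = re.compile(r"^(?P<name>.+?) -+ (?P<secs>\d+(?:\.\d+)?)s$")

# Summary rows copied from the three runs above (179 hosts each).
RUNS = {
    "baseline": """
Sleep 1 ----------------------------------------------------------------- 4.78s
Gather facts ------------------------------------------------------------ 3.58s
""",
    "package-facts": """
Sleep 1 ---------------------------------------------------------------- 26.82s
Gather package facts --------------------------------------------------- 21.53s
Gather facts ------------------------------------------------------------ 3.78s
""",
    "service-facts": """
Gather service facts --------------------------------------------------- 12.44s
Sleep 1 ----------------------------------------------------------------- 8.97s
Gather facts ------------------------------------------------------------ 3.94s
""",
}


def parse_summary(text: str) -> Dict[str, float]:
    """Map task name -> seconds for every profile_tasks row found in text."""
    timings = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            timings[match.group("name")] = float(match.group("secs"))
    return timings


def slowdown(runs: Dict[str, str], task: str, baseline: str = "baseline") -> Dict[str, float]:
    """Ratio of each run's duration for task to the baseline run's duration."""
    base = parse_summary(runs[baseline])[task]
    return {name: parse_summary(text)[task] / base for name, text in runs.items()}


if __name__ == "__main__":
    for name, ratio in slowdown(RUNS, "Sleep 1").items():
        secs = parse_summary(RUNS[name])["Sleep 1"]
        print(f"{name}: Sleep 1 took {secs:.2f}s ({ratio:.1f}x baseline)")
```

With these numbers, the one-second sleep takes about 5.6x as long after package facts are gathered and about 1.9x as long after service facts are gathered.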