Description
Apache Airflow version
2.8.1
If "Other Airflow 2 version" selected, which one?
No response
What happened?
Scheduler stops processing DAGs and moving them to the queued status. When looking at the scheduler is debug mode following information is displayed.
[2024-01-24T13:40:15.828+0000] {scheduler_job_runner.py:1092} DEBUG - Executor full, skipping critical section
[2024-01-24T13:40:15.828+0000] {base_executor.py:217} DEBUG - 32 running task instances
[2024-01-24T13:40:15.829+0000] {base_executor.py:218} DEBUG - 0 in queue
[2024-01-24T13:40:15.829+0000] {base_executor.py:219} DEBUG - 0 open slots
We noticed that a fix was addressed here #36240, however still seeing the same issues.
We are utilizing the airflow helm chart version 1.10, and we have the same issue happening in multiple environments.
Two environments have parallelism set to 32 with 1 scheduler running.
The other has 3 schedulers all with 32 parallelism.
What you think should happen instead?
When a task is complete it should release the slot.
How to reproduce
Currently it seems to just be a time thing, after a certain period of time running the slots fill up with completed tasks.
Operating System
Debian GNU/Linux 12
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==8.16.0
apache-airflow-providers-celery==3.5.1
apache-airflow-providers-cncf-kubernetes==7.13.0
apache-airflow-providers-common-io==1.2.0
apache-airflow-providers-common-sql==1.10.0
apache-airflow-providers-docker==3.9.1
apache-airflow-providers-elasticsearch==5.3.1
apache-airflow-providers-ftp==3.7.0
apache-airflow-providers-google==10.13.1
apache-airflow-providers-grpc==3.4.1
apache-airflow-providers-hashicorp==3.6.1
apache-airflow-providers-http==4.8.0
apache-airflow-providers-imap==3.5.0
apache-airflow-providers-microsoft-azure==8.5.1
apache-airflow-providers-mysql==5.5.1
apache-airflow-providers-odbc==4.4.0
apache-airflow-providers-openlineage==1.4.0
apache-airflow-providers-postgres==5.10.0
apache-airflow-providers-redis==3.6.0
apache-airflow-providers-sendgrid==3.4.0
apache-airflow-providers-sftp==4.8.1
apache-airflow-providers-slack==8.5.1
apache-airflow-providers-snowflake==5.2.1
apache-airflow-providers-sqlite==3.7.0
apache-airflow-providers-ssh==3.10.0
Deployment
Official Apache Airflow Helm Chart
Deployment details
Deploy via helm chart (1.10) to an azure aks.
Deploy our own image with required packages/dags copied FROM apache/airflow:2.8.1-python3.11
Process is synced with ArgoCD deployment pipeline.
Anything else?
This problem for the most part occurs daily. We have a test instance with only 5 running dags that run once every hour and we are still seeing the issue.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct