Kubernetes Executor Task Leak #36998

Closed
@smhood

Description

Apache Airflow version

2.8.1

If "Other Airflow 2 version" selected, which one?

No response

What happened?

The scheduler stops processing DAGs and stops moving them to the queued status. When running the scheduler in debug mode, the following information is displayed:

[2024-01-24T13:40:15.828+0000] {scheduler_job_runner.py:1092} DEBUG - Executor full, skipping critical section
[2024-01-24T13:40:15.828+0000] {base_executor.py:217} DEBUG - 32 running task instances
[2024-01-24T13:40:15.829+0000] {base_executor.py:218} DEBUG - 0 in queue
[2024-01-24T13:40:15.829+0000] {base_executor.py:219} DEBUG - 0 open slots
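
For anyone triaging similar logs, here is a minimal sketch of how the three counters above could be pulled out of the scheduler's debug output and checked for this symptom. The function names `parse_executor_counters` and `looks_leaked` are hypothetical helpers, not part of Airflow, and the "leak" heuristic (zero open slots with an empty queue) is only an assumption based on the excerpt above.

```python
import re

# Matches the three counters logged by base_executor.py at DEBUG level,
# in the form they take in the excerpt above.
COUNTER_RE = re.compile(
    r"DEBUG - (\d+) (running task instances|in queue|open slots)$"
)


def parse_executor_counters(lines):
    """Collect executor counters from an iterable of scheduler log lines."""
    counters = {}
    for line in lines:
        match = COUNTER_RE.search(line.strip())
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def looks_leaked(counters):
    """Heuristic only (an assumption, not an Airflow check): every slot is
    taken while nothing is waiting in the executor queue."""
    return (
        counters.get("open slots") == 0
        and counters.get("in queue") == 0
        and counters.get("running task instances", 0) > 0
    )


sample = [
    "[2024-01-24T13:40:15.828+0000] {scheduler_job_runner.py:1092} DEBUG - Executor full, skipping critical section",
    "[2024-01-24T13:40:15.828+0000] {base_executor.py:217} DEBUG - 32 running task instances",
    "[2024-01-24T13:40:15.829+0000] {base_executor.py:218} DEBUG - 0 in queue",
    "[2024-01-24T13:40:15.829+0000] {base_executor.py:219} DEBUG - 0 open slots",
]

counters = parse_executor_counters(sample)
print(counters, looks_leaked(counters))
```

A single match does not prove a leak, since a busy executor can legitimately be full; the pattern becomes suspicious when it persists while no task pods are actually running.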

We noticed that a fix for this was made in #36240; however, we are still seeing the same issue.

We are using the Airflow Helm chart version 1.10, and the same issue is happening in multiple environments.
Two environments have parallelism set to 32 with 1 scheduler running.
The other has 3 schedulers, each with parallelism set to 32.

What you think should happen instead?

When a task completes, it should release its slot.
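
To make the expectation concrete, here is a toy model of the slot bookkeeping. It is not Airflow's code: `ToyExecutor` and its methods are hypothetical, and the only assumption is that open slots equal parallelism minus the number of running task instances, which matches the counters in the debug output.

```python
class ToyExecutor:
    """Simplified stand-in for an executor's slot bookkeeping (not Airflow code)."""

    def __init__(self, parallelism):
        self.parallelism = parallelism
        self.running = set()

    @property
    def open_slots(self):
        # Assumed relationship: free slots = parallelism - running task count
        return self.parallelism - len(self.running)

    def start(self, task_key):
        if self.open_slots <= 0:
            raise RuntimeError("executor full")
        self.running.add(task_key)

    def finish(self, task_key):
        # Expected behaviour: a completed task leaves the running set,
        # which frees its slot again.
        self.running.discard(task_key)


executor = ToyExecutor(parallelism=32)
for i in range(32):
    executor.start(f"task_{i}")
full_slots = executor.open_slots      # no slots left

executor.finish("task_0")
slots_after_finish = executor.open_slots  # one slot released
print(full_slots, slots_after_finish)
```

In the behaviour reported here, the equivalent of `finish` appears never to take effect for some completed tasks, so the running count stays at 32 and the scheduler keeps skipping its critical section.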

How to reproduce

Currently it seems to depend only on time: after the scheduler has been running for a certain period, the slots fill up with completed tasks.

Operating System

Debian GNU/Linux 12

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.16.0
apache-airflow-providers-celery==3.5.1
apache-airflow-providers-cncf-kubernetes==7.13.0
apache-airflow-providers-common-io==1.2.0
apache-airflow-providers-common-sql==1.10.0
apache-airflow-providers-docker==3.9.1
apache-airflow-providers-elasticsearch==5.3.1
apache-airflow-providers-ftp==3.7.0
apache-airflow-providers-google==10.13.1
apache-airflow-providers-grpc==3.4.1
apache-airflow-providers-hashicorp==3.6.1
apache-airflow-providers-http==4.8.0
apache-airflow-providers-imap==3.5.0
apache-airflow-providers-microsoft-azure==8.5.1
apache-airflow-providers-mysql==5.5.1
apache-airflow-providers-odbc==4.4.0
apache-airflow-providers-openlineage==1.4.0
apache-airflow-providers-postgres==5.10.0
apache-airflow-providers-redis==3.6.0
apache-airflow-providers-sendgrid==3.4.0
apache-airflow-providers-sftp==4.8.1
apache-airflow-providers-slack==8.5.1
apache-airflow-providers-snowflake==5.2.1
apache-airflow-providers-sqlite==3.7.0
apache-airflow-providers-ssh==3.10.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

Deployed via the Helm chart (1.10) to an Azure AKS cluster.
We deploy our own image, built FROM apache/airflow:2.8.1-python3.11 with the required packages and DAGs copied in.
The process is synced with an ArgoCD deployment pipeline.

Anything else?

This problem occurs daily for the most part. We have a test instance with only 5 DAGs, each running once every hour, and we are still seeing the issue.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
