Copy of [AIRFLOW-5071] JIRA: Thousands of Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally? #10790
Kubernetes version (if you are using kubernetes) (use kubectl version): Server: v1.10.13, Client: v1.17.0
Environment:
Cloud provider or hardware configuration: AWS
OS (e.g. from /etc/os-release): Debian GNU/Linux 9 (stretch)
Kernel (e.g. uname -a): Linux airflow-web-54fc4fb694-ftkp5 4.19.123-coreos #1 SMP Fri May 22 19:21:11 -00 2020 x86_64 GNU/Linux
Others: Redis, CeleryExecutor
What happened:
In line with the guidelines laid out in AIRFLOW-7120, I'm copying over a JIRA ticket for a bug that has a significant negative impact on our pipeline SLAs. The original ticket, AIRFLOW-5071, contains many details from users who run ExternalTaskSensors in reschedule mode and see their tasks go through the following unexpected state transitions:
In our case, this issue seems to affect roughly 2,000 tasks per day.
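To make the error in the title concrete, here is a simplified, hypothetical model of the kind of consistency check that produces it. It assumes the scheduler compares an executor event (reported state plus try number) with what the metadata database holds for the same task instance, and flags the event when the executor reports a terminal state for the same try while the database still says `queued`. `TaskInstanceRecord` and `check_executor_event` are illustrative names, not Airflow's scheduler code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified model; this is NOT Airflow's scheduler code.
TERMINAL_EXECUTOR_STATES = {"success", "failed"}


@dataclass
class TaskInstanceRecord:
    """What the metadata database believes about one task instance."""
    task_id: str
    try_number: int
    state: str


def check_executor_event(ti: TaskInstanceRecord, reported_state: str,
                         reported_try: int) -> Optional[str]:
    """Return an error message if the executor event contradicts the DB state.

    Assumption: an event is flagged when the executor reports a terminal
    state for the same try number while the database still says "queued".
    """
    if reported_state not in TERMINAL_EXECUTOR_STATES:
        return None
    if ti.try_number == reported_try and ti.state == "queued":
        # Wording mirrors the log message quoted in the issue title.
        return (f"Executor reports task instance {ti.task_id} finished "
                f"({reported_state}) although the task says its {ti.state}. "
                "Was the task killed externally?")
    return None


# Assumption for illustration: a reschedule-mode sensor keeps the same try
# number between pokes, so a late "success" event for an earlier poke can
# arrive after the next poke has already been queued.
sensor = TaskInstanceRecord(task_id="wait_for_upstream", try_number=1, state="queued")
print(check_executor_event(sensor, "success", reported_try=1))
```

Under this model, any stale terminal event that arrives while the same try is queued again would be reported as an externally killed task; whether that matches the real race in the scheduler is a hypothesis, not something this report confirms.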
What you expected to happen:
I would expect tasks to go through the following state transitions instead: running -> up_for_reschedule -> scheduled -> queued -> running
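The expected reschedule loop above can be written down as a small transition table, which makes it easy to check a recorded sequence of states for the first step that leaves the loop. This is a hypothetical sketch built only from the transitions listed here; `EXPECTED_NEXT` and `first_unexpected_transition` are illustrative names, and terminal outcomes such as the sensor finally succeeding are deliberately left out.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical model of the expected reschedule loop for a sensor, taken
# only from the transitions listed above; it is not Airflow's state machine.
EXPECTED_NEXT = {
    "running": {"up_for_reschedule"},
    "up_for_reschedule": {"scheduled"},
    "scheduled": {"queued"},
    "queued": {"running"},
}


def first_unexpected_transition(states: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return the first (from, to) pair that is not in EXPECTED_NEXT, or None."""
    for current, nxt in zip(states, states[1:]):
        if nxt not in EXPECTED_NEXT.get(current, set()):
            return (current, nxt)
    return None


expected = ["running", "up_for_reschedule", "scheduled", "queued", "running"]
print(first_unexpected_transition(expected))  # → None
```

For example, a sequence that skips `scheduled` (`["running", "up_for_reschedule", "queued"]`) is reported as `("up_for_reschedule", "queued")`.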
How to reproduce it:
Unfortunately, I don't currently have a configuration that could be used to reproduce the issue easily. However, based on the thread in AIRFLOW-5071, the problem seems to arise in deployments that use a large number of sensors in reschedule mode.