Container remains running after its process (PID) has died #9274

Open
@lance5890

Description

What happened?

  1. When the system disk is full, the router container dies.
  2. The kubelet liveness and readiness probes then error out because the container's PID cannot be found:
[root@yf-master-0-0 crio]# journalctl -u kubelet.service  --since "2025-06-23 14:08:28" | grep "router-default-8d89fcb6d-pf8tx\|913720ce-6f30"
Jun 23 14:08:28 yf-master-0-0 hyperkube[3381926]: E0623 14:08:28.608550 3381926 prober.go:104] "Probe errored" err="rpc error: code = NotFound desc = container is not created or running: checking if PID of c4743be5a13d8787a72acb82642057c03a9fd076dbfb36809fef960f287b1b29 is running failed: open /proc/2563009/stat: no such file or directory: container process not found" probeType="Liveness" pod="ccos-ingress/router-default-8d89fcb6d-pf8tx" podUID="913720ce-6f30-4612-8ba0-d451dc991303" containerName="router"
Jun 23 14:08:28 yf-master-0-0 hyperkube[3381926]: E0623 14:08:28.608625 3381926 prober.go:104] "Probe errored" err="rpc error: code = NotFound desc = container is not created or running: checking if PID of c4743be5a13d8787a72acb82642057c03a9fd076dbfb36809fef960f287b1b29 is running failed: open /proc/2563009/stat: no such file or directory: container process not found" probeType="Readiness" pod="ccos-ingress/router-default-8d89fcb6d-pf8tx" podUID="913720ce-6f30-4612-8ba0-d451dc991303" containerName="router"
Jun 23 14:08:38 yf-master-0-0 hyperkube[3381926]: E0623 14:08:38.608556 3381926 prober.go:104] "Probe errored" err="rpc error: code = NotFound desc = container is not created or running: checking if PID of c4743be5a13d8787a72acb82642057c03a9fd076dbfb36809fef960f287b1b29 is running failed: open /proc/2563009/stat: no such file or directory: container process not found" probeType="Liveness" pod="ccos-ingress/router-default-8d89fcb6d-pf8tx" podUID="913720ce-6f30-4612-8ba0-d451dc991303" containerName="router"
Jun 23 14:08:38 yf-master-0-0 hyperkube[3381926]: E0623 14:08:38.608612 3381926 prober.go:104] "Probe errored" err="rpc error: code = NotFound desc = container is not created or running: checking if PID of c4743be5a13d8787a72acb82642057c03a9fd076dbfb36809fef960f287b1b29 is running failed: open /proc/2563009/stat: no such file or directory: container process not found" probeType="Readiness" pod="ccos-ingress/router-default-8d89fcb6d-pf8tx" podUID="913720ce-6f30-4612-8ba0-d451dc991303" containerName="router"
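The probe error above comes from a failed open of `/proc/2563009/stat`. As a rough way to confirm by hand that the PID is really gone, a minimal sketch follows. The helper name `pid_state` is hypothetical, and it only mirrors the existence check visible in the error message, not whatever additional checks CRI-O may perform:

```shell
#!/bin/sh
# pid_state PID: report whether /proc/PID/stat exists.
# Hypothetical helper for manual debugging; not part of CRI-O or kubelet.
pid_state() {
    if [ -e "/proc/$1/stat" ]; then
        echo "running"
    else
        echo "not found"
    fi
}

# 2563009 is the PID taken from the probe error in this report.
pid_state 2563009
```

Note that a zombie process still has a `/proc/<pid>/stat` entry, so "running" here only means the kernel still tracks the PID.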

What did you expect to happen?

I know the exit file is written by conmon. Since the system disk is full, conmon may be unable to write the exit code to this file, so the container status is never updated.

Is there another way to recover the container besides restarting CRI-O?
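To test the suspicion that the exit file cannot be written, one option is to try creating a small file in the exits directory. The sketch below is only an illustration: the helper name `check_exit_dir` is hypothetical, and the default path `/var/run/crio/exits` is an assumption that should be checked against `container_exits_dir` in the node's CRI-O configuration.

```shell
#!/bin/sh
# check_exit_dir DIR: print "missing", "writable", or "cannot write".
# Hypothetical helper; it writes and removes a throwaway file in DIR
# as a stand-in for conmon writing an exit file.
check_exit_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 0
    fi
    # Creating the file can fail when no space or inodes are left.
    tmp=$(mktemp "$dir/exit-test.XXXXXX" 2>/dev/null) || { echo "cannot write"; return 0; }
    # Writing data can fail even if the file itself was created.
    if echo 0 > "$tmp" 2>/dev/null; then
        echo "writable"
    else
        echo "cannot write"
    fi
    rm -f "$tmp"
    return 0
}

# Assumed default location; override with EXITS_DIR if the node is configured differently.
EXITS_DIR=${EXITS_DIR:-/var/run/crio/exits}
if [ -d "$EXITS_DIR" ]; then
    df -P "$EXITS_DIR"   # show which filesystem backs the directory and how full it is
fi
check_exit_dir "$EXITS_DIR"
```

If the directory turns out to live on a tmpfs rather than on the full system disk, the exit file may not be the failing piece, and the cause would have to be looked for elsewhere.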

How can we reproduce it (as minimally and precisely as possible)?

Fill the system disk, then kill the container's main process (possibly stopping the conmon process at the same time).

Anything else we need to know?

No response

CRI-O and Kubernetes version

$ crio --version
# paste output here

1.29.11

$ kubectl version --output=json
# paste output here

1.29.6

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

Additional environment details (AWS, VirtualBox, physical, etc.)

Metadata

Assignees

No one assigned

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
