cmd, pkg/utils: Stop the container once the last session finishes #1679
This pull request can't be merged until the work to enable Ptyxis to use |
c7c8158 to 0bf4ee2
Weird. One of the tests for the
ea2db5d to d410d09
A subsequent commit will use this to stop the Toolbx container once the last 'enter' or 'run' session finishes. containers#114
Currently, once a Toolbx container gets started with 'podman start', as part of the 'enter' or 'run' commands, it doesn't stop unless the host is shut down or someone explicitly calls 'podman stop'. This becomes annoying if someone tries to remove the container, because commands like 'podman rm' don't work without the '--force' flag, even if all active 'enter' and 'run' sessions have ended, and the lingering entry points of those containers can be considered a waste of resources.

A system of reference counting based on advisory file locks has been used to automatically exit the container's entry point once all the active sessions have ended. Two locks are used: a global lock that's common to all containers, and a local lock that's specific to each container. The initialization stamp file is conveniently used as the local lock.

The 'enter' and 'run' sessions acquire shared file locks and the container's entry point acquires exclusive ones. All attempts at acquiring the locks are blocking unless otherwise noted.

The global lock is acquired at the beginning of 'enter' and 'run', before they inspect the container and negotiate the path to the local lock (i.e., the initialization stamp file) with the entry point. The local lock is created by the entry point. Once the local lock is known by 'enter' and 'run', they acquire it and only then release the global.

The Toolbx container's entry point tries to acquire the global lock as it creates the initialization stamp file (i.e., the local lock). This waits for the 'enter' and 'run' invocations to receive the location of the local lock, acquire it and release the global. Once the entry point acquires the global lock, it releases it, and waits trying to acquire the local lock.

This sequence of acquiring and releasing the locks lets the entry point track the state of the 'enter' and 'run' invocations. The entry point should only try to acquire the local lock after the 'enter' and 'run' invocations have acquired it, which they do before invoking 'podman exec'.

The entry point is able to acquire the local lock after all 'enter' and 'run' sessions end and release their local locks.

At this point, a new 'enter' or 'run' invocation might be in the process of starting. Both sides need to be careful not to race against each other and end up in an invalid state, e.g., a 'podman start' being invoked against a container whose entry point is just about to exit, or a 'podman exec' being invoked against a container whose entry point is about to exit or has already exited.

Therefore, the entry point makes a non-blocking attempt to acquire the global lock while holding the local. If it fails, then it's because a new 'enter' or 'run' was invoked that is in the process of negotiating the path to the local lock with the entry point. In this case, the entry point releases the local lock and goes back to trying to acquire the global lock, as it did when creating the initialization stamp file (i.e., the local lock). If it succeeds, then no new 'enter' or 'run' is in the process of starting, and the entry point can exit.

If this system of reference counting is simplified to just the global lock, then all the entry points of all Toolbx containers will exit only after all the 'enter' and 'run' sessions across all Toolbx containers have ended. The local lock makes it possible to do this for each container separately.

This system will not work without the global lock. Without it, there would be races whenever a new 'enter' or 'run' is invoked just as the last of the previous batch of sessions ends, letting the entry point acquire the local lock and prepare to exit.

Sometimes, a Toolbx container's entry point is started directly with 'podman start', without going through the 'enter' or 'run' commands, for debugging. Care was taken to detect this case by making a non-blocking attempt to acquire the global lock from the entry point before creating the initialization stamp file (i.e., the local lock).

If it fails, then it's because an 'enter' or 'run' is waiting for the container to get initialized by the entry point, and things proceed as described above. If it succeeds, then it's because the entry point was started directly. In this case, the entry point releases the global lock, and adds a timeout after creating the initialization stamp file, before trying to acquire any other locks, to give the user time to invoke 'enter' or 'run'. A timeout of 25 seconds is used, the same as the default for D-Bus method calls [1] and as the timeout used when waiting for the entry point to initialize the container.

A variation of this system of reference counting could use the advisory file locks only in the 'enter' and 'run' commands, and invoke 'podman inspect --format {{.ExecIDs}} ...' after each 'podman exec' to find out if there are any remaining sessions [2]. This was not done because each podman(1) invocation is sufficiently expensive and there is a desire to keep them to a minimum in the 'enter' and 'run' commands, because these are the most frequently used commands and users expect them to be as lean as possible [3,4].

A totally different approach could be to pass an AF_UNIX socket to the Toolbx container through the NOTIFY_SOCKET environment variable and 'podman create --sdnotify container ...', and do the reference counting by sending messages from the host to the entry point before and after each 'podman exec' [2]. One downside is that the reference counting will break if the host process crashes before sending the message to decrement the count after a 'podman exec' ends. Another downside is that it becomes complicated to directly call 'podman start', without going through the 'enter' or 'run' commands, for debugging.

[1] https://docs.gtk.org/gio/property.DBusProxy.g-default-timeout.html
[2] containers/podman#26589
[3] Commit 4536e2c containers@4536e2c8c28f6c4f containers#813 containers#654
[4] Commit 74d4fcf containers@74d4fcf00c6ec3d1 containers#1491 containers#1070

containers#114
3578ee3 to 0882df7
Currently, once a Toolbx container gets started with `podman start`, as part of the `enter` or `run` commands, it doesn't stop unless the host is shut down or someone explicitly calls `podman stop`. This becomes annoying if someone tries to remove the container, because commands like `podman rm` don't work without the `--force` flag, even if all active `enter` and `run` sessions have ended.

A system of reference counting based on advisory file locks has been used to automatically exit the container's entry point once all the active sessions have ended. Two locks are used: a global lock that's common to all containers, and a local lock that's specific to each container. The initialization stamp file is conveniently used as the local lock.

The `enter` and `run` sessions acquire shared file locks and the container's entry point acquires exclusive ones. All attempts at acquiring the locks are blocking unless otherwise noted.
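
To make the shared versus exclusive distinction concrete, here is a minimal Python sketch. It assumes `flock(2)`-style advisory locks, which the description does not specify, and the file name and the `try_lock` and `demo` helpers are hypothetical. It is not the code from this pull request.

```python
import fcntl
import os
import tempfile


def try_lock(fd: int, operation: int) -> bool:
    """Make a non-blocking attempt at a flock(2) operation on fd."""
    try:
        fcntl.flock(fd, operation | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def demo() -> list:
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "lock")  # hypothetical lock file
        # Each os.open() yields its own open file description, which is what
        # flock(2) locks belong to, so three fds stand in for three processes.
        session_a = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        session_b = os.open(path, os.O_RDWR)
        entry_point = os.open(path, os.O_RDWR)
        results = [
            try_lock(session_a, fcntl.LOCK_SH),    # first session gets a shared lock
            try_lock(session_b, fcntl.LOCK_SH),    # a second shared lock coexists
            try_lock(entry_point, fcntl.LOCK_EX),  # exclusive is refused meanwhile
        ]
        fcntl.flock(session_a, fcntl.LOCK_UN)
        fcntl.flock(session_b, fcntl.LOCK_UN)
        # With every shared lock released, the exclusive lock can be taken.
        results.append(try_lock(entry_point, fcntl.LOCK_EX))
        for fd in (session_a, session_b, entry_point):
            os.close(fd)
        return results
```

On Linux, `demo()` should return `[True, True, False, True]`: the exclusive holder acts as a "zero sessions left" detector, which is the reference-counting idea described above.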

The global lock is acquired at the beginning of `enter` and `run` before they inspect the container, negotiate the path to the local lock (i.e., the initialization stamp file) with the entry point and create it. Once the local lock is known by `enter` and `run`, they acquire it and only then release the global.
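
The hand-over from the global lock to the local lock on the `enter` and `run` side could look roughly like the sketch below. It assumes `flock(2)`-style locks; `begin_session` and its `negotiate` callback (a stand-in for inspecting the container and learning the stamp file path) are hypothetical names, not the pull request's API.

```python
import fcntl
import os
from typing import Callable


def begin_session(global_path: str, negotiate: Callable[[], str]) -> int:
    """Return an fd holding a shared local lock; closing it ends the session."""
    global_fd = os.open(global_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Blocking shared lock: announces that a session is starting.
        fcntl.flock(global_fd, fcntl.LOCK_SH)
        # Hypothetical step: start the container if needed and learn the
        # path of the initialization stamp file, i.e. the local lock.
        local_path = negotiate()
        local_fd = os.open(local_path, os.O_RDWR)
        try:
            # Blocking shared lock on the local lock.
            fcntl.flock(local_fd, fcntl.LOCK_SH)
        except BaseException:
            os.close(local_fd)
            raise
        return local_fd
    finally:
        # Closing the only fd for this open file description drops the
        # global lock, and this runs only after the local lock is held.
        os.close(global_fd)
```

The caller would keep the returned descriptor open for the duration of the `podman exec` and close it afterwards, which releases the shared local lock.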

The Toolbx container's entry point tries to acquire the global lock as it creates the initialization stamp file (i.e., the local lock). This waits for the `enter` and `run` invocations to receive the location of the local lock, acquire it and release the global. Once the entry point acquires the global lock, it releases it, and waits trying to acquire the local lock.

This sequence of acquiring and releasing the locks lets the entry point track the state of the `enter` and `run` invocations. It should only try to acquire the local lock after the `enter` and `run` invocations have acquired it, which they do before invoking `podman exec`.

The entry point is able to acquire the local lock after all `enter` and `run` sessions end and release their local locks.

At this point, a new `enter` or `run` invocation might be in the process of starting. Both sides need to be careful not to race against each other and end up in an invalid state, e.g., a `podman start` being invoked against a container whose entry point is just about to exit, or a `podman exec` being invoked against a container whose entry point is about to exit or has already exited.

Therefore, the entry point makes a non-blocking attempt to acquire the global lock while holding the local. If it fails, then it's because a new `enter` or `run` was invoked that is in the process of negotiating the path to the local lock with the entry point. In this case, the entry point releases the local lock and goes back to trying to acquire the global lock, as it did when creating the initialization stamp file (i.e., the local lock). If it succeeds, then no new `enter` or `run` is in the process of starting, and the entry point can exit.
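
The entry point's side of the protocol can be sketched as a loop. This is a simplified Python illustration that assumes `flock(2)`-style locks and already opened descriptors; `wait_for_last_session` is a hypothetical name, and the real entry point does more than this.

```python
import fcntl


def wait_for_last_session(global_fd: int, local_fd: int) -> None:
    """Block until no session is active and none is starting.

    Returns while holding both locks exclusively, at which point the
    caller can exit.
    """
    while True:
        # Wait for starting sessions to take the local lock and release
        # the global one, then let go of the global lock again.
        fcntl.flock(global_fd, fcntl.LOCK_EX)
        fcntl.flock(global_fd, fcntl.LOCK_UN)

        # Wait for every active session to release its shared local lock.
        fcntl.flock(local_fd, fcntl.LOCK_EX)

        try:
            # Non-blocking check for a session that is just starting.
            fcntl.flock(global_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # A new session holds the global lock: back off and repeat.
            fcntl.flock(local_fd, fcntl.LOCK_UN)
            continue
        return
```

Holding the local lock while probing the global one is what closes the race: a new session either already holds the global lock, so the probe fails, or it will block on one of the two locks until the entry point has exited.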

If this system of reference counting is simplified to just the global lock, then all the entry points of all Toolbx containers will exit only after all the `enter` and `run` sessions across all Toolbx containers have ended. The local lock makes it possible to do this for each container separately.

This system will not work without the global lock. It will cause a few races if a new `enter` or `run` is invoked, just as the last of the previous batch of sessions end, letting the entry point acquire the local lock and prepare to exit.

Sometimes, a Toolbx container's entry point is started directly with `podman start`, without going through the `enter` or `run` commands, for debugging. Care was taken to detect this case by making a non-blocking attempt to acquire the global lock from the entry point before creating the initialization stamp file (i.e., the local lock).

If it fails, then it's because an `enter` or `run` is waiting for the container to get initialized by the entry point, and things proceed as described above. If it succeeds, then it's because the entry point was started directly. In this case, the entry point releases the global lock, and adds a timeout after creating the initialization stamp file before trying to acquire any other locks to give the user time to invoke `enter` or `run`. A timeout of 25 seconds is used, as is the default for D-Bus method calls [1] and when waiting for the entry point to initialize the container.
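
Detecting a directly started entry point might be sketched as follows. This again assumes `flock(2)`-style locks; `started_directly`, `initialize`, the `create_stamp` callback and the injectable `sleep` are hypothetical names used only for illustration, and only the 25 second value comes from the description.

```python
import fcntl
import time
from typing import Callable

DIRECT_START_GRACE_SECONDS = 25  # timeout quoted in the description


def started_directly(global_fd: int) -> bool:
    """Return True if no 'enter' or 'run' currently holds the global lock."""
    try:
        fcntl.flock(global_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # An 'enter' or 'run' is waiting for the container to initialize.
        return False
    fcntl.flock(global_fd, fcntl.LOCK_UN)
    return True


def initialize(
    global_fd: int,
    create_stamp: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe the global lock, create the stamp file, then maybe wait."""
    direct = started_directly(global_fd)  # probe before creating the stamp
    create_stamp()
    if direct:
        # Give the user time to invoke 'enter' or 'run' before the entry
        # point starts acquiring locks and possibly exits.
        sleep(DIRECT_START_GRACE_SECONDS)
    return direct
```

Making `sleep` a parameter is only a convenience for exercising the logic without waiting; after `initialize` returns, the entry point would continue with the usual sequence of acquiring the global and local locks.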
[1] https://docs.gtk.org/gio/property.DBusProxy.g-default-timeout.html
#114