Set object lock for volumes for cephfs encryption #4697
Conversation
Force-pushed 2b27638 to adb8ef9
nixpanic
left a comment
Just a nit to improve the readability of the code a little; it looks quite good already.
Force-pushed adb8ef9 to d1aef8f
internal/cephfs/nodeserver.go
Outdated
```go
}
defer ioctx.Destroy()

res, err := ioctx.LockExclusive(volOptions.VolID, lockName, string(lockCookie), lockDesc, lockDuration, &flags)
```
Suggested change:

```diff
-res, err := ioctx.LockExclusive(volOptions.VolID, lockName, string(lockCookie), lockDesc, lockDuration, &flags)
+res, err := ioctx.LockExclusive(volOptions.VolID, lockName, lockCookie, lockDesc, lockDuration, &flags)
```

lockCookie is already a string.
internal/cephfs/nodeserver.go
Outdated
```go
log.DebugLog(ctx, "Releasing following lock %s", lockName)

_, err = ioctx.Unlock(string(volID), lockName, lockCookie)

return nil
```
Won't we run into a problem if we try to unlock while someone else is already holding the lock? (In the case of RWX volumes mounted on multiple nodes.)
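As I understand the rados lock semantics, each holder presents its own cookie, and an unlock only succeeds when the presented cookie matches the one that took the lock, so a node cannot accidentally release a peer's lock. A minimal in-memory model of those cookie semantics (the types and helpers below are illustrative assumptions, not the go-ceph API):

```go
package main

import (
	"errors"
	"fmt"
)

// errNotHeld models the error rados returns when the unlocking
// (client, cookie) pair does not hold the lock.
var errNotHeld = errors.New("lock not held by this cookie")

// exclusiveLock tracks the single cookie currently holding the lock.
type exclusiveLock struct {
	held   bool
	cookie string
}

// lock succeeds only when the lock is free or re-taken with the same cookie.
func (l *exclusiveLock) lock(cookie string) error {
	if l.held && l.cookie != cookie {
		return errors.New("lock is already held by another client and cookie pair")
	}
	l.held, l.cookie = true, cookie
	return nil
}

// unlock releases the lock only when the caller presents the matching
// cookie, so a losing node's unlock fails instead of freeing a peer's lock.
func (l *exclusiveLock) unlock(cookie string) error {
	if !l.held || l.cookie != cookie {
		return errNotHeld
	}
	l.held = false
	return nil
}

func main() {
	var l exclusiveLock
	fmt.Println(l.lock("node-a"))   // acquired
	fmt.Println(l.lock("node-b"))   // rejected: held by node-a
	fmt.Println(l.unlock("node-b")) // rejected: wrong cookie, lock survives
	fmt.Println(l.unlock("node-a")) // released by the real holder
}
```

Under this model, an unconditional unlock with a per-node cookie is harmless on RWX: the wrong-cookie call returns an error rather than releasing the other node's lock.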
internal/cephfs/nodeserver.go
Outdated
```go
if res == 0 {
	log.DebugLog(ctx, "cephfs: unlocking fscrypt on volume %q path %s", volID, stagingTargetPath)
	return fscrypt.Unlock(ctx, volOptions.Encryption, stagingTargetPath, string(volID))
```
Don't we need to release the lock once fscrypt.Unlock is done?
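One way to address this is to release the lock on every return path with a defer, so it is freed whether the fscrypt step succeeds or fails. A hedged sketch of that pattern; acquireVolumeLock, releaseVolumeLock, and unlockEncryption are hypothetical stand-ins, not the ceph-csi functions:

```go
package main

import (
	"errors"
	"fmt"
)

// locked is an in-memory stand-in for the rados lock state.
var locked = map[string]bool{}

func acquireVolumeLock(volID string) error {
	if locked[volID] {
		return errors.New("already locked: " + volID)
	}
	locked[volID] = true
	return nil
}

func releaseVolumeLock(volID string) { delete(locked, volID) }

// unlockEncryption stands in for the fscrypt.Unlock call.
func unlockEncryption(volID string) error { return nil }

func stageVolume(volID string) error {
	if err := acquireVolumeLock(volID); err != nil {
		return err
	}
	// Released on every return path, including when unlockEncryption fails,
	// so other nodes do not have to wait for the lock duration to expire.
	defer releaseVolumeLock(volID)

	return unlockEncryption(volID)
}

func main() {
	err := stageVolume("vol-1")
	fmt.Println(err, locked["vol-1"]) // lock no longer held after staging
}
```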
internal/cephfs/nodeserver.go
Outdated
```go
	}
}()

return fmt.Errorf("There is already one file system with name %s", string(volID))
```
Need to update the error message
Force-pushed d1aef8f to 7af9685
@Sunnatillo can you please check the CI failures?
Force-pushed 7af9685 to 44838e4
Force-pushed cf7fa8d to 75eab26
Force-pushed 75eab26 to 866f3e4
Force-pushed 2202913 to 80b84a8
Force-pushed 80b84a8 to 31bd222
Madhu-1
left a comment
Thank you @Sunnatillo 🎉
@Mergifyio rebase

❌ Pull request can't be updated with latest base branch changes. Mergify needs the author permission to update the base branch of the pull request.
The way the fscrypt client handles metadata and policy creation causes errors when multiple instances start simultaneously. This commit adds a lock to ensure the initial setup completes correctly, preventing race conditions and mismatches.

Signed-off-by: Sunnatillo <sunnat.samadov@est.tech>
Force-pushed 31bd222 to 502178b
/test ci/centos/k8s-e2e-external-storage/1.30
/test ci/centos/k8s-e2e-external-storage/1.29
/test ci/centos/k8s-e2e-external-storage/1.27
/test ci/centos/mini-e2e-helm/k8s-1.30
/test ci/centos/mini-e2e-helm/k8s-1.29
/test ci/centos/mini-e2e-helm/k8s-1.27
/test ci/centos/mini-e2e/k8s-1.30
/test ci/centos/mini-e2e/k8s-1.29
/test ci/centos/mini-e2e/k8s-1.27
/test ci/centos/k8s-e2e-external-storage/1.28
/test ci/centos/mini-e2e-helm/k8s-1.28
/test ci/centos/mini-e2e/k8s-1.28
/test ci/centos/upgrade-tests-cephfs
/test ci/centos/upgrade-tests-rbd
@nixpanic, I see ceph-csi/internal/rbd/nodeserver.go, lines 467 to 474 in e71a95f.

No, for RBD that is not needed. An RBD volume with a filesystem is never attached to more than one worker node at the same time, so initialization/unlocking of the filesystem will always happen on a single node. It is different for ReadWriteMany RBD volumes in block mode with encryption; the LUKS calls would benefit from similar locking.
iPraveenParihar
left a comment
Thanks!
NymanRobin
left a comment
Thanks a lot for fixing this 🎉
But I have a slight concern about the scenario where fscrypt.Unlock fails. What do you think?
Is there a config to use the lock? We built an image from the devel branch (https://github.com/ceph/ceph-csi/tree/devel) and tried it out. However, pods failed to come up and we got the following error:

```
Warning  FailedMount  15s (x6 over 31s)  kubelet  MountVolume.MountDevice failed for volume "pvc-4642810a-7f00-4ae0-8583-ddbcee6ed314" : rpc error: code = Internal desc = Failed to lock volume ID 0001-0009-rook-ceph-00000000000000ad-d71d2949-db34-434e-8008-5497dad5c70b: rados: ret=-1, Operation not permitted
```
I have tested this and encountered the same error as @z2000l. @Sunnatillo, could you please document any additional settings required or anything we might be overlooking?
```go
}
defer ioctx.Destroy()

res, err := ioctx.LockExclusive(volOptions.VolID, lockName, lockCookie, lockDesc, lockDuration, &flags)
```
Reading the code in go-ceph, it seems the object we want to lock should already exist, and LockExclusive won't create a new one. Are we sure there exists an object in rados corresponding to VolID? My initial assumption is no, but I am not 100% confident without further investigation.
```go
// LockExclusive takes an exclusive lock on an object.
func (ioctx *IOContext) LockExclusive(oid, name, cookie, desc string, duration time.Duration, flags *byte) (int, error) {
```

Edit: the object seems not to be needed.
Continuing my dig here, I added the following logging before the LockExclusive call:

```go
_, err = ioctx.Stat(volOptions.VolID)
if err != nil {
	log.ErrorLog(ctx, "The error when object is not here: %s", err)
	log.ErrorLog(ctx, "Object %s does not exist", volOptions.VolID)
}
```

I get these logs:

```
E0724 09:03:43.715019 1 nodeserver.go:156] ID: 142 Req-ID: 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de The error when object is not here: rados: ret=-2, No such file or directory
E0724 09:03:43.715051 1 nodeserver.go:157] ID: 142 Req-ID: 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de Object csi-vol-489ccf33-13a5-40fc-8460-7dd866bc44de does not exist
E0724 09:03:43.715477 1 utils.go:240] ID: 142 Req-ID: 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de GRPC error: rpc error: code = Internal desc = Failed to lock volume ID 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de: rados: ret=-1, Operation not permitted
```
This makes me highly suspicious... However, I wonder why Ceph does not return -2 when a lock is attempted on an object that does not exist. Maybe they have not thought of this scenario and it could be reported? I guess the next step in this journey to the core would be to create the object first and see if the lock can then be acquired 😄

Edit: the actual problem is in the comments below; the object seems not to be needed.
Now I know at least why the -1 comes, and it is correct that the permissions are not there!
After adding the following caps to the client I can acquire the lock:

```shell
ceph auth caps client.csi-cephfs-node \
  mon 'allow *' \
  mgr 'allow *' \
  osd 'allow *' \
  mds 'allow *'
```

But I think what is actually needed is the following:

```shell
ceph auth caps client.csi-cephfs-node rados 'allow rw'
```
Okay, to conclude: it seems the lock can be created without the object existing, so that is all good and my ramblings from above can be ignored! Except that some more permissions for client.csi-cephfs-node are needed to be able to create the lock.

However, it is currently extremely slow for me: bringing up five deployments with 3 replicas each, after 5 minutes I have only one replica of each. The error reported is the following:
```
Warning  FailedMount  3m32s (x2 over 5m47s)  kubelet  Unable to attach or mount volumes: unmounted volumes=[my-volume], unattached volumes=[my-volume kube-api-access-8cmvd]: timed out waiting for the condition
Warning  FailedMount  81s (x11 over 7m35s)   kubelet  MountVolume.MountDevice failed for volume "pvc-6afa784b-dc4f-44fb-bed9-8c6fe985c7e8" : rpc error: code = Internal desc = Lock is already held by another client and cookie pair for 0001-0009-rook-ceph-0000000000000001-c7f5d8f9-f9ca-4af8-8457-7ffd1bfb4437 volume
Warning  FailedMount  74s                    kubelet  Unable to attach or mount volumes: unmounted volumes=[my-volume], unattached volumes=[kube-api-access-8cmvd my-volume]: timed out waiting for the condition
```
This is the status of the pods after 8m46s:

```
$ kubectl get pods -n default
NAME                            READY   STATUS              RESTARTS   AGE
deploy-pod-1-7f7789fd5f-92x4f   1/1     Running             0          8m46s
deploy-pod-1-7f7789fd5f-99qmp   1/1     Running             0          8m46s
deploy-pod-1-7f7789fd5f-dnr79   0/1     ContainerCreating   0          8m46s
deploy-pod-2-58c654874c-j2mvp   0/1     ContainerCreating   0          8m46s
deploy-pod-2-58c654874c-jh2wp   1/1     Running             0          8m46s
deploy-pod-2-58c654874c-svjws   1/1     Running             0          8m46s
deploy-pod-3-6674fc79f-rbtk2    1/1     Running             0          8m46s
deploy-pod-3-6674fc79f-w452p    1/1     Running             0          8m46s
deploy-pod-3-6674fc79f-w8wqb    0/1     ContainerCreating   0          8m46s
deploy-pod-4-6c958fb787-cqxlb   1/1     Running             0          8m46s
deploy-pod-4-6c958fb787-kvgbl   0/1     ContainerCreating   0          8m46s
deploy-pod-4-6c958fb787-q794c   1/1     Running             0          8m46s
deploy-pod-5-86b9bb47-5vmrq     0/1     ContainerCreating   0          8m45s
deploy-pod-5-86b9bb47-dd6v7     1/1     Running             0          8m45s
deploy-pod-5-86b9bb47-g5jvg     0/1     ContainerCreating   0          8m45s
```
After a whopping 13 minutes all the pods were finally running!

Should we actually poll the lock when trying to acquire it? I think that might make it faster.
Thank you @NymanRobin for testing.
I also tested; it is slow. It took 8 minutes for 6 replicas to become ready.
We should definitely improve it.
I can write up a ticket for the performance and for the permissions!
Related issues
Mention any GitHub issues relevant to this PR. Adding the lines below
will help auto-close the issues once the PR is merged.
Closes: #4688
Fixes: #4654
Show available bot commands

These commands are normally not required, but in case of issues, leave any of the following bot commands in an otherwise empty comment in this PR:

/retest ci/centos/<job-name>: retest the <job-name> after an unrelated failure (please report the failure too!)