
Conversation

@Sunnatillo (Contributor) commented Jun 27, 2024

Describe what this PR does

Provide some context for the reviewer

Is there anything that requires special attention

Do you have any questions?

Is the change backward compatible?

Are there concerns around backward compatibility?

Provide any external context for the change, if any.

For example:

  • Kubernetes links that explain why the change is required
  • CSI spec related changes/catch-up that necessitate this patch
  • Golang-related practices that necessitate this change

Related issues

Mention any GitHub issues relevant to this PR. Adding a line like the
ones below will help auto-close the issue once the PR is merged.

Closes: #4688
Fixes: #4654

Future concerns

List items that are not part of the PR and do not impact its
functionality, but are work items that can be taken up subsequently.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow
    guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request.
  • Pending release notes updated with breaking and/or notable changes
    for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@Sunnatillo Sunnatillo force-pushed the set-mutex-lock/sunnat branch from 2b27638 to adb8ef9 on June 27, 2024 13:25
@nixpanic (Member) left a comment

Just a nit to improve the readability of the code a little; it looks quite good already.

@Sunnatillo Sunnatillo force-pushed the set-mutex-lock/sunnat branch from adb8ef9 to d1aef8f on July 1, 2024 06:47
@Sunnatillo Sunnatillo changed the title from "WIP:Set object lock for volumes for cephfs encryption" to "WIP: Set object lock for volumes for cephfs encryption" Jul 1, 2024
}
defer ioctx.Destroy()

res, err := ioctx.LockExclusive(volOptions.VolID, lockName, string(lockCookie), lockDesc, lockDuration, &flags)
Collaborator left a comment

Suggested change
res, err := ioctx.LockExclusive(volOptions.VolID, lockName, string(lockCookie), lockDesc, lockDuration, &flags)
res, err := ioctx.LockExclusive(volOptions.VolID, lockName, lockCookie, lockDesc, lockDuration, &flags)

lockCookie is already a string, so the string() conversion is unnecessary.
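
For readers following the review, here is a minimal, self-contained sketch of the go-ceph lock/release pattern under discussion. It is not the PR's actual code: the lock name, cookie, description, and duration values are illustrative assumptions.

package cephfslock

import (
	"fmt"
	"time"

	"github.com/ceph/go-ceph/rados"
)

// withVolumeLock takes a RADOS exclusive lock on the volume's object ID, runs
// the critical section, and releases the lock afterwards. lockName, lockCookie,
// lockDesc and the duration are hypothetical values, not taken from this PR.
func withVolumeLock(ioctx *rados.IOContext, volID string, critical func() error) error {
	const (
		lockName   = "csi-fscrypt-setup"  // hypothetical
		lockCookie = "csi-fscrypt-cookie" // hypothetical
		lockDesc   = "serialize fscrypt metadata setup"
	)
	var flags byte

	// LockExclusive takes the oid, lock name, cookie and description as plain
	// strings, so no string() conversion is needed for lockCookie.
	res, err := ioctx.LockExclusive(volID, lockName, lockCookie, lockDesc, 3*time.Minute, &flags)
	if err != nil {
		return fmt.Errorf("failed to lock volume ID %s: %w", volID, err)
	}
	if res != 0 {
		// A non-zero result means the lock could not be taken, typically
		// because it is already held by another client/cookie pair.
		return fmt.Errorf("lock on %s is already held by another client and cookie pair", volID)
	}

	defer func() {
		// Unlock releases only a lock taken with the same name and cookie.
		if _, unlockErr := ioctx.Unlock(volID, lockName, lockCookie); unlockErr != nil {
			fmt.Printf("failed to release lock on %s: %v\n", volID, unlockErr)
		}
	}()

	return critical()
}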

log.DebugLog(ctx, "Releasing following lock %s", lockName)

return nil
_, err = ioctx.Unlock(string(volID), lockName, lockCookie)
Collaborator left a comment

Won't we run into a problem if we try to unlock while someone else is already holding the lock, in the case of RWX and multiple nodes?


if res == 0 {
	log.DebugLog(ctx, "cephfs: unlocking fscrypt on volume %q path %s", volID, stagingTargetPath)
	return fscrypt.Unlock(ctx, volOptions.Encryption, stagingTargetPath, string(volID))
Collaborator left a comment

Don't we need to release the lock once fscrypt.Unlock is done?
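
To illustrate the point behind this question, a minimal sketch of releasing the lock once the fscrypt unlock finishes, whether or not it succeeds. The helpers acquireVolumeLock/releaseVolumeLock and the unlockWithFscrypt callback are hypothetical stand-ins, not functions from this PR.

package cephfslock

import (
	"context"
	"fmt"
)

// acquireVolumeLock and releaseVolumeLock are hypothetical stand-ins for the
// go-ceph LockExclusive/Unlock calls shown elsewhere in this PR.
func acquireVolumeLock(ctx context.Context, volID string) error { return nil }
func releaseVolumeLock(ctx context.Context, volID string) error { return nil }

// stageEncryptedVolume shows only the ordering: the deferred release runs
// after unlockWithFscrypt returns, regardless of whether it succeeded.
func stageEncryptedVolume(ctx context.Context, volID string, unlockWithFscrypt func() error) error {
	if err := acquireVolumeLock(ctx, volID); err != nil {
		return fmt.Errorf("failed to lock volume ID %s: %w", volID, err)
	}
	defer func() {
		// If the lock is not released here, other nodes staging the same RWX
		// volume stay blocked until the lock duration expires.
		if err := releaseVolumeLock(ctx, volID); err != nil {
			fmt.Printf("failed to release lock on %s: %v\n", volID, err)
		}
	}()

	return unlockWithFscrypt()
}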

}
}()

return fmt.Errorf("There is already one file system with name %s", string(volID))
Collaborator left a comment

The error message needs to be updated; it refers to an existing file system rather than to the lock that could not be acquired.

@Sunnatillo Sunnatillo force-pushed the set-mutex-lock/sunnat branch from d1aef8f to 7af9685 on July 2, 2024 14:36

@Madhu-1 (Collaborator) commented Jul 3, 2024

@Sunnatillo, can you please check the CI failures?

@Sunnatillo Sunnatillo force-pushed the set-mutex-lock/sunnat branch from 7af9685 to 44838e4 on July 8, 2024 07:18
@Sunnatillo Sunnatillo changed the title from "WIP: Set object lock for volumes for cephfs encryption" to "Set object lock for volumes for cephfs encryption" Jul 8, 2024
@Sunnatillo Sunnatillo marked this pull request as ready for review July 8, 2024 07:40
@Sunnatillo Sunnatillo force-pushed the set-mutex-lock/sunnat branch 2 times, most recently from cf7fa8d to 75eab26 on July 10, 2024 14:51
@nixpanic nixpanic added the component/cephfs label (Issues related to CephFS) Jul 11, 2024
@Sunnatillo Sunnatillo force-pushed the set-mutex-lock/sunnat branch from 75eab26 to 866f3e4 on July 11, 2024 09:08
@Sunnatillo Sunnatillo force-pushed the set-mutex-lock/sunnat branch 2 times, most recently from 2202913 to 80b84a8 on July 11, 2024 11:08
@Sunnatillo Sunnatillo force-pushed the set-mutex-lock/sunnat branch from 80b84a8 to 31bd222 on July 11, 2024 12:28
@nixpanic previously approved these changes Jul 11, 2024
@nixpanic nixpanic requested a review from Madhu-1 July 11, 2024 12:39
@Madhu-1 previously approved these changes Jul 11, 2024

@Madhu-1 (Collaborator) left a comment

Thank you @Sunnatillo 🎉

@nixpanic (Member) commented

@Mergifyio rebase

@mergify mergify bot (Contributor) commented Jul 11, 2024

rebase

❌ Pull request can't be updated with latest base branch changes

Mergify needs the author permission to update the base branch of the pull request.
@Nordix needs to authorize modification on its head branch.

The way the fscrypt client handles metadata and policy creation
causes errors when multiple instances start simultaneously.
This commit adds a lock to ensure the initial setup
completes correctly, preventing race conditions and
mismatches.

Signed-off-by: Sunnatillo <sunnat.samadov@est.tech>
@Sunnatillo Sunnatillo force-pushed the set-mutex-lock/sunnat branch from 31bd222 to 502178b on July 11, 2024 12:51
@nixpanic nixpanic requested a review from Madhu-1 July 11, 2024 13:13
@ceph-csi-bot (Collaborator) posted the following retest commands in separate comments:

/test ci/centos/k8s-e2e-external-storage/1.30
/test ci/centos/k8s-e2e-external-storage/1.29
/test ci/centos/k8s-e2e-external-storage/1.27
/test ci/centos/mini-e2e-helm/k8s-1.30
/test ci/centos/mini-e2e-helm/k8s-1.29
/test ci/centos/mini-e2e-helm/k8s-1.27
/test ci/centos/mini-e2e/k8s-1.30
/test ci/centos/mini-e2e/k8s-1.29
/test ci/centos/mini-e2e/k8s-1.27
/test ci/centos/k8s-e2e-external-storage/1.28
/test ci/centos/mini-e2e-helm/k8s-1.28
/test ci/centos/mini-e2e/k8s-1.28
/test ci/centos/upgrade-tests-cephfs
/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot ceph-csi-bot removed the ok-to-test label (Label to trigger E2E tests) Jul 11, 2024
@iPraveenParihar (Contributor) commented

@nixpanic, I see an fscrypt.Unlock call for RBD fscrypt. Do we need to do the same here as well? That is, do we need to:

if volOptions.isFileEncrypted() {
	log.DebugLog(ctx, "rbd fscrypt: trying to unlock filesystem on %s image %s", stagingTargetPath, volOptions.VolID)
	err = fscrypt.Unlock(ctx, volOptions.fileEncryption, stagingTargetPath, volOptions.VolID)
	if err != nil {
		return transaction, fmt.Errorf("file system encryption unlock in %s image %s failed: %w",
			stagingTargetPath, volOptions.VolID, err)
	}
}

@nixpanic (Member) replied to the question above ("Do we need to do the same here as well?"):

No, for RBD that is not needed. An RBD volume with a filesystem is never attached to more than one worker node at the same time, so initialization/unlocking of the filesystem always happens on a single node. It is different with ReadWriteMany RBD volumes in block mode with encryption; the LUKS calls there would benefit from similar locking.

@nixpanic nixpanic requested a review from iPraveenParihar July 11, 2024 15:52
@iPraveenParihar (Contributor) left a comment

Thanks!

@mergify mergify bot merged commit e7762ac into ceph:devel Jul 11, 2024
3 checks passed
@NymanRobin (Contributor) left a comment

Thanks a lot for fixing this 🎉

But I have a slight concern about the scenario where fscrypt.Unlock fails. What do you think?

@z2000l commented Jul 23, 2024

Is there a config to use the lock? We built an image from the devel branch (https://github.com/ceph/ceph-csi/tree/devel) and tried it out. However, the pods failed to come up and we got the following error:

Warning FailedMount 15s (x6 over 31s) kubelet MountVolume.MountDevice failed for volume "pvc-4642810a-7f00-4ae0-8583-ddbcee6ed314" : rpc error: code = Internal desc = Failed to lock volume ID 0001-0009-rook-ceph-00000000000000ad-d71d2949-db34-434e-8008-5497dad5c70b: rados: ret=-1, Operation not permitted

@NymanRobin (Contributor) commented

I have tested this and encountered the same error as @z2000l. @Sunnatillo, could you please document any additional settings required or anything we might be overlooking?

}
defer ioctx.Destroy()

res, err := ioctx.LockExclusive(volOptions.VolID, lockName, lockCookie, lockDesc, lockDuration, &flags)
@NymanRobin (Contributor) commented Jul 24, 2024

Reading the code in go-ceph, it seems that the object we want to lock should already exist and that this call won't create a new object. Are we sure there exists an object in RADOS corresponding to VolID? My initial assumption is no, but I am not 100% confident without further investigation.

// LockExclusive takes an exclusive lock on an object.
func (ioctx *IOContext) LockExclusive(oid, name, cookie, desc string, duration time.Duration, flags *byte) (int, error) {

Edit: the object turns out not to be needed; see the follow-up comments below.

@NymanRobin (Contributor) commented Jul 24, 2024

Continuing my dig here, I added the following logging before the LockExclusive call:

_, err = ioctx.Stat(volOptions.VolID)
if err != nil {
  log.ErrorLog(ctx, "The error when object is not here: %s", err)
  log.ErrorLog(ctx, "Object %s does not exist", volOptions.VolID)
}

I get these logs:

E0724 09:03:43.715019       1 nodeserver.go:156] ID: 142 Req-ID: 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de The error when object is not here: rados: ret=-2, No such file or directory
E0724 09:03:43.715051       1 nodeserver.go:157] ID: 142 Req-ID: 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de Object csi-vol-489ccf33-13a5-40fc-8460-7dd866bc44de does not exist
E0724 09:03:43.715477       1 utils.go:240] ID: 142 Req-ID: 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de GRPC error: rpc error: code = Internal desc = Failed to lock volume ID 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de: rados: ret=-1, Operation not permitted

This makes me highly suspicious... However, I wonder why Ceph does not return -2 when a lock is attempted on an object that does not exist. Maybe they have not thought of this scenario and it could be reported? I guess the next step in this journey to the core would be to create the object first and see if the lock can then be acquired 😄

Edit: the actual problem is described in the comments below; the object does not seem to be needed.

Contributor left a comment

Now I know at least why the -1 comes back, and it is correct that the permissions are not there!
After adding the following caps to the authentication, I can acquire the lock:

ceph auth caps client.csi-cephfs-node \
  mon 'allow *' \
  mgr 'allow *' \
  osd 'allow *' \
  mds 'allow *'

But I think what is actually needed is the following:
ceph auth caps client.csi-cephfs-node rados 'allow rw'

Contributor left a comment

Okay, to conclude this: it seems the lock can be created without the object existing, so that is all good and my ramblings above can be ignored!

Except that some additional permissions for client.csi-cephfs-node are needed to be able to create the lock.

However, it is currently extremely slow for me: bringing up five deployments with 3 replicas each, after 5 minutes I have only one replica of each running. The error reported is the following:

  Warning  FailedMount             3m32s (x2 over 5m47s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[my-volume], unattached volumes=[my-volume kube-api-access-8cmvd]: timed out waiting for the condition
  Warning  FailedMount             81s (x11 over 7m35s)   kubelet                  MountVolume.MountDevice failed for volume "pvc-6afa784b-dc4f-44fb-bed9-8c6fe985c7e8" : rpc error: code = Internal desc = Lock is already held by another client and cookie pair for 0001-0009-rook-ceph-0000000000000001-c7f5d8f9-f9ca-4af8-8457-7ffd1bfb4437 volume
  Warning  FailedMount             74s                    kubelet                  Unable to attach or mount volumes: unmounted volumes=[my-volume], unattached volumes=[kube-api-access-8cmvd my-volume]: timed out waiting for the condition

This is the status of the pods after 8m46s:

$ kubectl get pods -n default 
NAME                            READY   STATUS              RESTARTS   AGE
deploy-pod-1-7f7789fd5f-92x4f   1/1     Running             0          8m46s
deploy-pod-1-7f7789fd5f-99qmp   1/1     Running             0          8m46s
deploy-pod-1-7f7789fd5f-dnr79   0/1     ContainerCreating   0          8m46s
deploy-pod-2-58c654874c-j2mvp   0/1     ContainerCreating   0          8m46s
deploy-pod-2-58c654874c-jh2wp   1/1     Running             0          8m46s
deploy-pod-2-58c654874c-svjws   1/1     Running             0          8m46s
deploy-pod-3-6674fc79f-rbtk2    1/1     Running             0          8m46s
deploy-pod-3-6674fc79f-w452p    1/1     Running             0          8m46s
deploy-pod-3-6674fc79f-w8wqb    0/1     ContainerCreating   0          8m46s
deploy-pod-4-6c958fb787-cqxlb   1/1     Running             0          8m46s
deploy-pod-4-6c958fb787-kvgbl   0/1     ContainerCreating   0          8m46s
deploy-pod-4-6c958fb787-q794c   1/1     Running             0          8m46s
deploy-pod-5-86b9bb47-5vmrq     0/1     ContainerCreating   0          8m45s
deploy-pod-5-86b9bb47-dd6v7     1/1     Running             0          8m45s
deploy-pod-5-86b9bb47-g5jvg     0/1     ContainerCreating   0          8m45s

After a whopping 13 minutes all the pods were finally running!

Should we actually poll the lock when trying to acquire it? I think that might make it faster.
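
As an illustration of the polling idea, here is a hedged sketch of retrying LockExclusive until the lock is free, instead of failing the call and waiting for kubelet's slower retry back-off. The retry interval, timeout handling, and lock parameters are illustrative assumptions, not values from this PR.

package cephfslock

import (
	"context"
	"fmt"
	"time"

	"github.com/ceph/go-ceph/rados"
)

// lockWithRetry polls LockExclusive until the lock is acquired or ctx expires.
// All parameter values here are illustrative, not taken from the PR.
func lockWithRetry(ctx context.Context, ioctx *rados.IOContext, volID string) error {
	const (
		lockName     = "csi-fscrypt-setup"  // hypothetical
		lockCookie   = "csi-fscrypt-cookie" // hypothetical
		lockDesc     = "serialize fscrypt metadata setup"
		lockDuration = 3 * time.Minute
		pollInterval = 2 * time.Second
	)
	var flags byte

	ticker := time.NewTicker(pollInterval)
	defer ticker.Stop()

	for {
		res, err := ioctx.LockExclusive(volID, lockName, lockCookie, lockDesc, lockDuration, &flags)
		if err != nil {
			return fmt.Errorf("failed to lock volume ID %s: %w", volID, err)
		}
		if res == 0 {
			return nil // lock acquired
		}
		// Non-zero result: the lock is held by another client/cookie pair.
		// Wait briefly and retry instead of returning an error to the kubelet.
		select {
		case <-ctx.Done():
			return fmt.Errorf("timed out waiting for lock on %s: %w", volID, ctx.Err())
		case <-ticker.C:
		}
	}
}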

@Sunnatillo (Contributor, author) left a comment

Thank you @NymanRobin for testing.
I also tested, and it is slow: it took 8 minutes for 6 replicas to become ready.
We should definitely improve it.

Contributor left a comment

I can write up a ticket for the performance and for the permissions!


Labels

component/cephfs (Issues related to CephFS)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Concurrency issue when pvc is mounted to multiple pods

7 participants