Race condition in nvmetcli operations causes NVMe-oF volume mount failures #504

@mlipscombe

Description

When multiple NVMe-oF volumes are created in quick succession, democratic-csi encounters a race condition where multiple nvmetcli processes attempt to write to the same configuration file simultaneously. This results in volumes that appear to be created successfully but fail to mount with connection errors.

Symptoms
Target host kernel logs:

[219274.455448] nvmet: connect request for invalid subsystem nqn.2003-01.org.linux-nvme:pvc-767c946f-4190-4bb8-99c6-e9fe5c0e9bf4!

Pod mount error:

MountVolume.MountDevice failed for volume "pvc-767c946f-4190-4bb8-99c6-e9fe5c0e9bf4" : rpc error: code = Unknown desc = unable to attach any nvme devices

nvmetcli log.txt:

[ERROR] 2025-07-23 11:33:33 [Errno 2] No such file or directory: '/etc/nvmet/config.json.temp' -> '/etc/nvmet/config.json'
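The kernel message above indicates that the initiator asked for a subsystem the target is not currently exporting. One way to confirm this on the target host is to look for the NQN under the nvmet configfs tree. This is a minimal sketch, assuming configfs is mounted at the usual /sys/kernel/config; the function name subsystem_exists is hypothetical and not part of nvmetcli or democratic-csi.

```bash
#!/bin/bash

# Report whether a subsystem NQN exists under the nvmet configfs tree.
# The second argument defaults to the standard configfs location (an
# assumption) and is a parameter only so the check can be exercised
# against a scratch directory.
subsystem_exists() {
    local nqn=$1
    local root=${2:-/sys/kernel/config/nvmet}
    [ -d "$root/subsystems/$nqn" ]
}

# Usage on the target host, with the NQN taken from the kernel log line:
#   subsystem_exists "nqn.2003-01.org.linux-nvme:pvc-767c946f-4190-4bb8-99c6-e9fe5c0e9bf4" \
#       && echo present || echo missing
```

If the check reports the subsystem as missing for a volume that democratic-csi considers provisioned, that is consistent with the lost-update behaviour described in this issue.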

Cause
The issue occurs because democratic-csi runs multiple nvmetcli commands concurrently without synchronizing them. When nvmetcli restore operations overlap, they race to write /etc/nvmet/config.json: only one write survives, and the configuration newly added by the other operation is lost.
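The log line in the Symptoms section shows a rename from config.json.temp to config.json failing with ENOENT, which suggests a write-temp-then-rename save. The following is a simulation only: it does not call nvmetcli, and it assumes that both processes share the same temp path. It replays one possible interleaving, step by step, in a scratch directory; the subsystem names pvc-a and pvc-b are placeholders.

```bash
#!/bin/bash

# Simulation only: replays an interleaving of two "save" operations that
# both use the same temp file, assuming a write-then-rename pattern.
workdir=$(mktemp -d)
config="$workdir/config.json"
temp="$config.temp"

# Process A writes its view (only subsystem A) to the shared temp path.
echo '{"subsystems": ["pvc-a"]}' > "$temp"

# Process B overwrites the same temp path with its own view (only subsystem B).
echo '{"subsystems": ["pvc-b"]}' > "$temp"

# Process A renames the temp file into place; it publishes B's content.
mv "$temp" "$config"

# Process B attempts the same rename, but the temp file no longer exists.
if mv "$temp" "$config" 2>/dev/null; then
    second_rename=ok
else
    second_rename=failed
fi

echo "second rename: $second_rename"
echo "final config: $(cat "$config")"
```

In this interleaving the second rename fails, mirroring the ENOENT in the log, and the final file no longer contains pvc-a, even though process A believed its save succeeded.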

Reproduction
1. Create multiple PVCs rapidly in Kubernetes.
2. Observe that some volumes fail to mount despite appearing to be provisioned successfully.
3. Check target host kernel logs for "invalid subsystem" errors.
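To create the PVCs close enough together to trigger the race, it can help to submit them as a single multi-document manifest. This is a rough sketch: the function name generate_pvcs, the race-test name prefix, the 1Gi size, and the storage class name are all placeholders, and it assumes the storage class provisions immediately rather than waiting for a consumer.

```bash
#!/bin/bash

# Emit N PersistentVolumeClaim manifests as one multi-document YAML stream.
# The name prefix, size, and storage class are placeholders for illustration.
generate_pvcs() {
    local count=$1 storage_class=$2
    local i
    for i in $(seq 1 "$count"); do
        cat <<EOF
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: race-test-$i
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: $storage_class
  resources:
    requests:
      storage: 1Gi
EOF
    done
}

# Usage (requires cluster access); "my-nvmeof-class" is a placeholder:
#   generate_pvcs 10 my-nvmeof-class | kubectl apply -f -
# Then, on the target host:
#   dmesg | grep 'invalid subsystem'
```

The mount failure itself only shows up once pods consume the claims, so pods that mount the resulting volumes are still needed to observe the second step.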

Workaround
Create a wrapper script that serializes nvmetcli operations using file locking:

#!/bin/bash

# Use flock to ensure only one nvmetcli process runs at a time
exec 200>/var/lock/nvmetcli.lock
flock -x 200

# Pass all arguments directly to nvmetcli
exec nvmetcli "$@"

Make the script executable and configure democratic-csi to use it by setting:

nvmeof:
  shareStrategyNvmetCli:
    nvmetCli: "/path/to/nvmetcli-wrapper.sh"

This wrapper ensures that only one nvmetcli operation runs at a time, preventing the race condition.
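The locking pattern can be sanity-checked without touching nvmetcli. The sketch below is a stand-in test: it uses a lock file in a scratch directory and a fake operation that records start and end markers, then launches several callers at once. The names locked_operation and trace.log are hypothetical, and it assumes flock from util-linux is installed.

```bash
#!/bin/bash

# Stand-in test: checks that the flock pattern serializes concurrent callers.
# It uses a scratch lock file and a fake "operation", not nvmetcli itself.
workdir=$(mktemp -d)
lockfile="$workdir/nvmetcli.lock"
trace="$workdir/trace.log"

locked_operation() {
    (
        # Same pattern as the wrapper: open the lock file on fd 200, then
        # block until an exclusive lock is granted.
        exec 200>"$lockfile"
        flock -x 200
        echo "start $1" >> "$trace"
        sleep 0.2    # pretend to do work while holding the lock
        echo "end $1" >> "$trace"
    )   # the subshell exits here, closing fd 200 and releasing the lock
}

# Launch four callers at once, then wait for all of them to finish.
for id in 1 2 3 4; do
    locked_operation "$id" &
done
wait

cat "$trace"
```

The order in which callers acquire the lock is not deterministic, but every start line in the trace should be followed immediately by the end line with the same id. Interleaved start lines would indicate that the lock is not being held for the full operation.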
