
Docker Overlay Network Bug: Standalone Containers Cannot Communicate Across Nodes Despite Working VXLAN #50282

Open
@SkoricIT

Description

Standalone containers attached to attachable overlay networks cannot communicate across swarm nodes, even though VXLAN tunneling works correctly at the network layer. This contradicts the official documentation, which states that attachable overlay networks support communication between standalone containers.

Reproduce

Environment Setup

  • Two Debian hosts in Docker Swarm (hp-server-1 as manager, hp-server-2 as worker)
  • All required ports open: 2377/tcp, 7946/tcp+udp, 4789/udp
  • Docker Swarm services work perfectly on overlay networks

Reproduction Steps

  1. Create attachable overlay network:

    # On manager node
    docker network create -d overlay --attachable test-standalone-overlay
  2. Create standalone containers on both nodes:

    # On hp-server-1:
    docker run -d --name standalone1 --network test-standalone-overlay alpine sleep 3600
    
    # On hp-server-2:
    docker run -d --name standalone2 --network test-standalone-overlay alpine sleep 3600
  3. Attempt cross-node communication:

    # From either container to the other (run each command on the node hosting that container)
    docker exec standalone1 ping standalone2  # No replies; ping runs until interrupted
    docker exec standalone2 ping standalone1  # No replies; ping runs until interrupted

Expected behavior

According to [official Docker documentation](https://docs.docker.com/engine/network/tutorials/overlay/#use-an-overlay-network-for-standalone-containers), standalone containers should be able to communicate across nodes when using attachable overlay networks.

Actual Behavior

  • Ping never receives a reply (100% packet loss)
  • ARP resolution works - containers can see each other's MAC addresses
  • DNS resolution works - container names resolve to correct overlay network IPs
  • Gateway connectivity works - containers can ping the overlay gateway (10.0.1.1)

docker version

Client: Docker Engine - Community
 Version:           28.1.1
 API version:       1.49
 Go version:        go1.23.8
 Git commit:        4eba377
 Built:             Fri Apr 18 09:52:57 2025
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          28.1.1
  API version:      1.49 (minimum version 1.24)
  Go version:       go1.23.8
  Git commit:       01f442b
  Built:            Fri Apr 18 09:52:57 2025
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.27
  GitCommit:        05044ec0a9a75232cad458027ca83437aae3f4da
 runc:
  Version:          1.2.5
  GitCommit:        v1.2.5-0-g59923ef
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    28.1.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.35.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 17
 Server Version: 28.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: active
  NodeID: c2ux8ri1xd4witjv78uzeyphv
  Is Manager: true
  ClusterID: t9edk1tc04u7nrrndjqsmq5sh
  Managers: 1
  Nodes: 2
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.1.6
  Manager Addresses:
   192.168.1.6:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
 runc version: v1.2.5-0-g59923ef
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.1.0-34-amd64
 Operating System: Debian GNU/Linux 12 (bookworm)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 30.29GiB
 Name: hp-server-1
 ID: 2fa4566a-662e-44fa-8a8a-00372f9d6b8b
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

Root Cause Analysis

Through detailed packet capture analysis, I've identified the precise failure point:

1. VXLAN Tunnel Works Perfectly

Host-level tcpdump shows bidirectional VXLAN traffic on port 4789:

# VXLAN traffic flows correctly in both directions
hp-server-2.44558 > hp-server-1.4789: VXLAN, flags [I] (0x08), vni 4098
IP 10.0.1.4 > 10.0.1.2: ICMP echo request, id 13, seq 364, length 64

hp-server-1.45607 > hp-server-2.4789: VXLAN, flags [I] (0x08), vni 4098  
IP 10.0.1.2 > 10.0.1.4: ICMP echo reply, id 13, seq 364, length 64
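The `flags [I] (0x08), vni 4098` fields in this capture come from the 8-byte VXLAN header defined in RFC 7348. That header consists of one flags byte (0x08 sets the I bit, meaning the VNI is valid), three reserved bytes, a 24-bit VNI, and one reserved byte. As a rough illustration, the bash sketch below decodes such a header from a hex string. `decode_vxlan_header` is a hypothetical helper. The sample header is rebuilt from the flag and VNI values shown above rather than taken from raw packet bytes.

```shell
# decode_vxlan_header is a hypothetical helper, not part of Docker or tcpdump.
# Input: the 8-byte VXLAN header (RFC 7348) as 16 hex characters.
# Layout: flags (1 byte) | reserved (3 bytes) | VNI (3 bytes) | reserved (1 byte)
decode_vxlan_header() {
  local hex="$1"
  if [[ ! "$hex" =~ ^[0-9a-fA-F]{16}$ ]]; then
    echo "expected 16 hex characters" >&2
    return 1
  fi
  local flags=$((16#${hex:0:2}))
  local vni=$((16#${hex:8:6}))
  # Bit 0x08 is the I flag: when set, the VNI field is valid.
  local i_flag=0
  if (( flags & 0x08 )); then
    i_flag=1
  fi
  printf 'flags=0x%02x I=%d vni=%d\n' "$flags" "$i_flag" "$vni"
}

# Header rebuilt from the capture above: flags 0x08, VNI 4098 (0x001002).
decode_vxlan_header 0800000000100200
```

The same VNI appears in both directions of the capture, which is consistent with both hosts encapsulating traffic for the same overlay network.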

2. Container-Level Routing Failure

Container-level tcpdump shows only outgoing packets:

# Container tcpdump shows requests but NO replies
docker exec standalone2 tcpdump -i eth0 host 10.0.1.2
# Output shows only: "ICMP echo request" packets
# Missing: "ICMP echo reply" packets
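The tcpdump output can be piped through a small counter so that the one-way pattern does not have to be spotted by eye. This is a sketch under assumptions. `summarize_icmp` is a hypothetical bash function, and it relies on tcpdump's default text format, in which ICMP lines contain `ICMP echo request` or `ICMP echo reply`.

```shell
# summarize_icmp is a hypothetical helper, not part of tcpdump or Docker.
# It reads tcpdump text output on stdin and counts ICMP echo requests/replies.
summarize_icmp() {
  local requests=0 replies=0 line
  while IFS= read -r line; do
    case "$line" in
      *"ICMP echo request"*) requests=$((requests + 1)) ;;
      *"ICMP echo reply"*)   replies=$((replies + 1)) ;;
    esac
  done
  echo "requests=$requests replies=$replies"
  # Requests with zero replies is the one-way pattern seen inside standalone2.
  if (( requests > 0 && replies == 0 )); then
    echo "one-way: no echo replies seen on this interface"
  fi
}
```

For example, assuming tcpdump is installed in the container: `docker exec standalone2 tcpdump -l -n -i eth0 -c 20 host 10.0.1.2 | summarize_icmp`. Here `-l` line-buffers the output and `-c 20` stops the capture after 20 packets so that the pipeline terminates.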

3. Network Configuration is Correct

Both containers are properly attached to the overlay network:

{
  "Containers": {
    "standalone1": {
      "IPv4Address": "10.0.1.2/24",
      "MacAddress": "02:42:0a:00:01:02"
    },
    "standalone2": {
      "IPv4Address": "10.0.1.4/24", 
      "MacAddress": "02:42:0a:00:01:04"
    }
  }
}
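Both addresses fall inside the same /24, so the two containers share one layer-2 segment on the overlay. This matches the working ARP resolution noted above. A minimal bash sketch of that subnet check follows. `ip_to_int` and `same_subnet` are hypothetical helpers written for illustration, and they expect addresses in `a.b.c.d/prefix` form.

```shell
# Hypothetical helpers for illustration; not part of Docker.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# 10# forces base 10 so octets such as "08" are not parsed as octal.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Succeed if two CIDR addresses (a.b.c.d/prefix) share the same network.
same_subnet() {
  local ip1=${1%/*} p1=${1#*/} ip2=${2%/*} p2=${2#*/}
  [[ "$p1" == "$p2" ]] || return 1
  local mask=$(( (0xFFFFFFFF << (32 - p1)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip1") & mask) == ($(ip_to_int "$ip2") & mask) ))
}

# Addresses from the network inspect output above.
if same_subnet 10.0.1.2/24 10.0.1.4/24; then
  echo "same subnet"
else
  echo "different subnets"
fi
```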

The Bug

ICMP echo replies reach the requesting container's host (hp-server-2) via VXLAN but are never delivered into that container's network namespace. This appears to be a routing or forwarding issue in Docker's overlay network implementation that is specific to standalone containers.

Comparison: Swarm Services vs Standalone Containers

Swarm services work perfectly:

docker service create --name test1 --network test-overlay --replicas 1 alpine sleep 3600
docker service create --name test2 --network test-overlay --replicas 1 alpine sleep 3600
# Cross-node ping works: 0% packet loss ✅

Standalone containers fail:

docker run -d --name standalone1 --network test-overlay alpine sleep 3600  # on hp-server-1
docker run -d --name standalone2 --network test-overlay alpine sleep 3600  # on hp-server-2
# Cross-node ping fails: 100% packet loss ❌

Impact

This bug makes the --attachable flag essentially non-functional for cross-node standalone container communication, contradicting official documentation and user expectations.

Additional Notes

Update: HTTP Protocol Also Affected - Complete Communication Failure

Tested HTTP connectivity and confirmed that TCP traffic fails the same way as ICMP, in both directions:

Bidirectional HTTP Test Results:

# hp-server-2 → hp-server-1
docker exec http-server2 wget -O- --timeout=10 http://http-server1
# Result: Connecting to http-server1 (10.0.1.6:80)
#         wget: download timed out

# hp-server-1 → hp-server-2
docker exec http-server1 wget -O- --timeout=10 http://http-server2
# Result: Connecting to http-server2 (10.0.1.7:80)
#         wget: download timed out

Update: Bug Confirmed Across Multiple Major Docker Versions

Tested with Docker 20.10.24 (Debian repository version) and confirmed the issue persists:

Environment:

  • hp-server-1: Docker 20.10.24 (manager)
  • hp-server-2: Docker 20.10.24 (worker)
  • Fresh swarm cluster with identical versions

Test Results:

# Both directions still fail with 100% packet loss
docker exec standalone1-v20 ping -c 3 standalone2-v20
# PING standalone2-v20 (10.0.1.4): 3 packets transmitted, 0 packets received, 100% packet loss

docker exec standalone2-v20 ping -c 3 standalone1-v20  
# PING standalone1-v20 (10.0.1.2): 3 packets transmitted, 0 packets received, 100% packet loss

Metadata

Assignees: No one assigned
Type: No type
Projects: Status: New
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests