Description
Standalone containers attached to attachable overlay networks cannot communicate across swarm nodes, even though VXLAN tunneling works correctly at the network layer. This contradicts the official documentation, which states that attachable overlay networks should support communication between standalone containers.
Reproduce
Environment Setup
- Two Debian hosts in Docker Swarm (hp-server-1 as manager, hp-server-2 as worker)
- All required ports open: 2377/tcp, 7946/tcp+udp, 4789/udp
- Docker Swarm services work perfectly on overlay networks
Reproduction Steps
- Create an attachable overlay network:
  # On the manager node
  docker network create -d overlay --attachable test-standalone-overlay
- Create standalone containers on both nodes:
  # On hp-server-1:
  docker run -d --name standalone1 --network test-standalone-overlay alpine sleep 3600
  # On hp-server-2:
  docker run -d --name standalone2 --network test-standalone-overlay alpine sleep 3600
- Attempt cross-node communication (run each command on the node that hosts the source container):
  docker exec standalone1 ping standalone2   # hangs indefinitely
  docker exec standalone2 ping standalone1   # hangs indefinitely
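Because ping without a count never exits, a bounded variant makes the reproduction scriptable. This is a minimal sketch assuming a POSIX shell on the node that hosts the source container (docker exec only reaches containers on the local daemon); the helper names loss_percent and ping_loss are hypothetical, and the parsing assumes the BusyBox ping summary line quoted later in this report.

```sh
#!/bin/sh
# Sketch: bounded cross-node ping between standalone containers.
# Hypothetical helper names; run on the node that hosts the source container.

# Read ping output on stdin and print the packet-loss percentage, e.g.
# "3 packets transmitted, 0 packets received, 100% packet loss" -> 100
loss_percent() {
  sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}

# ping_loss SRC DST: send 3 echo requests from container SRC to DST
# (resolved by Docker's embedded DNS) and print the loss percentage.
ping_loss() {
  docker exec "$1" ping -c 3 "$2" 2>&1 | loss_percent
}

# Example usage:
#   on hp-server-1: ping_loss standalone1 standalone2
#   on hp-server-2: ping_loss standalone2 standalone1
```

With the behavior described in this report, both calls would be expected to print 100.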
Expected behavior
According to [official Docker documentation](https://docs.docker.com/engine/network/tutorials/overlay/#use-an-overlay-network-for-standalone-containers), standalone containers should be able to communicate across nodes when using attachable overlay networks.
Actual behavior
- Ping hangs indefinitely with 100% packet loss
- ARP resolution works - containers can see each other's MAC addresses
- DNS resolution works - container names resolve to correct overlay network IPs
- Gateway connectivity works - containers can ping the overlay gateway (10.0.1.1)
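The three checks that succeed (DNS, gateway, neighbour/ARP table) can be run together for easier comparison between nodes. This is a sketch assuming the BusyBox nslookup, ping and ip applets shipped in the alpine image; check_layers is a hypothetical helper name, and 10.0.1.1 is the gateway address reported above.

```sh
#!/bin/sh
# Sketch: re-run the checks that reportedly DO work between two
# standalone containers. Hypothetical helper; run on the node hosting SRC.
# check_layers SRC DST GATEWAY
check_layers() {
  echo "== DNS: resolve $2 from $1"
  docker exec "$1" nslookup "$2"
  echo "== Gateway: ping $3 from $1"
  docker exec "$1" ping -c 1 "$3"
  echo "== Neighbour (ARP) table of $1"
  docker exec "$1" ip neigh show
}

# Example usage (on hp-server-1):
#   check_layers standalone1 standalone2 10.0.1.1
```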
docker version
Client: Docker Engine - Community
 Version: 28.1.1
 API version: 1.49
 Go version: go1.23.8
 Git commit: 4eba377
 Built: Fri Apr 18 09:52:57 2025
 OS/Arch: linux/amd64
 Context: default

Server: Docker Engine - Community
 Engine:
  Version: 28.1.1
  API version: 1.49 (minimum version 1.24)
  Go version: go1.23.8
  Git commit: 01f442b
  Built: Fri Apr 18 09:52:57 2025
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.7.27
  GitCommit: 05044ec0a9a75232cad458027ca83437aae3f4da
 runc:
  Version: 1.2.5
  GitCommit: v1.2.5-0-g59923ef
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0
docker info
Client: Docker Engine - Community
 Version: 28.1.1
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version: v0.23.0
    Path: /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version: v2.35.1
    Path: /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 17
 Server Version: 28.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: active
  NodeID: c2ux8ri1xd4witjv78uzeyphv
  Is Manager: true
  ClusterID: t9edk1tc04u7nrrndjqsmq5sh
  Managers: 1
  Nodes: 2
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.1.6
  Manager Addresses:
   192.168.1.6:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
 runc version: v1.2.5-0-g59923ef
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.1.0-34-amd64
 Operating System: Debian GNU/Linux 12 (bookworm)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 30.29GiB
 Name: hp-server-1
 ID: 2fa4566a-662e-44fa-8a8a-00372f9d6b8b
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
Additional Info
Root Cause Analysis
Through detailed packet capture analysis, I've identified the precise failure point:
1. VXLAN Tunnel Works Perfectly
Host-level tcpdump shows bidirectional VXLAN traffic on port 4789:
# VXLAN traffic flows correctly in both directions
hp-server-2.44558 > hp-server-1.4789: VXLAN, flags [I] (0x08), vni 4098
IP 10.0.1.4 > 10.0.1.2: ICMP echo request, id 13, seq 364, length 64
hp-server-1.45607 > hp-server-2.4789: VXLAN, flags [I] (0x08), vni 4098
IP 10.0.1.2 > 10.0.1.4: ICMP echo reply, id 13, seq 364, length 64
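To narrow a host-level capture to this one overlay network, the VXLAN network identifier can be matched directly in the capture filter. This is a sketch assuming tcpdump on the host, the data path port 4789 shown in docker info, and VNI 4098 from the excerpt above; the helper names are hypothetical. The VNI occupies bytes 4 to 6 of the 8-byte VXLAN header, which places it at bytes 12 to 14 from the start of the UDP header, so loading four bytes at udp[12] and shifting right by 8 isolates it.

```sh
#!/bin/sh
# Sketch: capture only VXLAN packets for one overlay VNI on the host.
# Hypothetical helper names; tcpdump normally needs root.

# vxlan_filter VNI [PORT]: print a pcap filter matching that VNI.
# UDP header = 8 bytes; VXLAN header = flags(1) reserved(3) VNI(3) reserved(1),
# so the VNI sits in udp[12..14] and "udp[12:4] >> 8" extracts it.
vxlan_filter() {
  printf 'udp port %s and udp[12:4] >> 8 = %s\n' "${2:-4789}" "$1"
}

# capture_vni VNI [IFACE]: capture without name resolution (-n).
capture_vni() {
  tcpdump -ni "${2:-any}" "$(vxlan_filter "$1")"
}

# Example usage (on each host):
#   capture_vni 4098
```

tcpdump decodes the encapsulated frames, so inner echo requests and replies should appear in the same form as the excerpt above.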
2. Container-Level Routing Failure
Container-level tcpdump shows only outgoing packets:
# Container tcpdump shows requests but NO replies
docker exec standalone2 tcpdump -i eth0 host 10.0.1.2
# Output shows only: "ICMP echo request" packets
# Missing: "ICMP echo reply" packets
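Counting echo requests and replies makes the host-versus-container comparison explicit. This is a sketch assuming tcpdump's usual text output (as in the excerpts above) arrives on stdin; count_icmp is a hypothetical helper name. Per this report, the host capture would show replies while the capture inside standalone2 would show none.

```sh
#!/bin/sh
# Sketch: summarize ICMP echo traffic from tcpdump text output on stdin.
# Works for the host VXLAN capture (inner packets are decoded) and for
# the container capture alike. Hypothetical helper name.
count_icmp() {
  awk '/ICMP echo request/ { req++ }
       /ICMP echo reply/   { rep++ }
       END { printf "requests=%d replies=%d\n", req, rep }'
}

# Example usage:
#   host:      tcpdump -ni any -c 40 udp port 4789 | count_icmp
#   container: docker exec standalone2 tcpdump -ni eth0 -c 20 icmp | count_icmp
```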
3. Network Configuration is Correct
Both containers are properly attached to the overlay network:
{
  "Containers": {
    "standalone1": {
      "IPv4Address": "10.0.1.2/24",
      "MacAddress": "02:42:0a:00:01:02"
    },
    "standalone2": {
      "IPv4Address": "10.0.1.4/24",
      "MacAddress": "02:42:0a:00:01:04"
    }
  }
}
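The attachment details above can be pulled from docker network inspect with a Go template instead of reading the full JSON. This is a sketch; overlay_endpoints is a hypothetical helper, and on a swarm-scoped overlay network each node's inspect output is expected to list only the containers attached on that node, so it should be run on both hosts.

```sh
#!/bin/sh
# Sketch: print name, IPv4 address and MAC of each container endpoint
# on an overlay network, one per line. Hypothetical helper name.
# overlay_endpoints [NETWORK]
overlay_endpoints() {
  docker network inspect "${1:-test-standalone-overlay}" \
    --format '{{range .Containers}}{{.Name}} {{.IPv4Address}} {{.MacAddress}}{{println}}{{end}}'
}
```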
The Bug
ICMP echo replies arrive via VXLAN at the host of the container that sent the requests, but are never delivered into that container's network namespace. This appears to be a routing issue in Docker's overlay network implementation that is specific to standalone containers.
Comparison: Swarm Services vs Standalone Containers
Swarm services work perfectly:
docker service create --name test1 --network test-overlay --replicas 1 alpine sleep 3600
docker service create --name test2 --network test-overlay --replicas 1 alpine sleep 3600
# Cross-node ping works: 0% packet loss ✅
Standalone containers fail:
# On hp-server-1:
docker run -d --name standalone1 --network test-overlay alpine sleep 3600
# On hp-server-2:
docker run -d --name standalone2 --network test-overlay alpine sleep 3600
# Cross-node ping fails: 100% packet loss ❌
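With --replicas 1 and no placement constraints, nothing guarantees that the two service tasks land on different nodes; if they share a node, their traffic never crosses VXLAN. Pinning each service to a node keeps the services-versus-standalone comparison explicitly cross-node. This is a sketch assuming the node hostnames from this report; create_pinned_service is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: create a single-replica alpine service pinned to one node.
# Hypothetical helper; run on a manager node.
# create_pinned_service NAME NODE_HOSTNAME NETWORK
create_pinned_service() {
  docker service create --name "$1" --network "$3" --replicas 1 \
    --constraint "node.hostname==$2" alpine sleep 3600
}

# Example usage:
#   create_pinned_service test1 hp-server-1 test-overlay
#   create_pinned_service test2 hp-server-2 test-overlay
```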
Impact
This bug makes the --attachable flag essentially non-functional for cross-node standalone container communication, contradicting the official documentation and user expectations.
Additional Notes
- This issue appears to be a long-standing bug, based on similar reports in issues #30972 ("Can't ping containers across swarm overlay network"), #43643 ("Swarm overlay network doesn't work when advertised over IPv6"), and #34641 ("Docker overlay network allows pings but not other network traffic through")
- The problem is specifically with Docker's internal routing from host network namespace to container network namespace
- VXLAN infrastructure works correctly, indicating this is a Docker software issue, not a network configuration problem
Update: HTTP Protocol Also Affected - Complete Communication Failure
Tested HTTP connectivity and confirmed that TCP traffic is affected as well, not only ICMP:
Bidirectional HTTP Test Results:
# hp-server-2 → hp-server-1
docker exec http-server2 wget -O- --timeout=10 http://http-server1
# Result: Connecting to http-server1 (10.0.1.6:80)
# wget: download timed out
# hp-server-1 → hp-server-2
docker exec http-server1 wget -O- --timeout=10 http://http-server2
# Result: Connecting to http-server2 (10.0.1.7:80)
# wget: download timed out
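The report does not show how http-server1 and http-server2 were started. One hypothetical way to reproduce the HTTP test is sketched below, assuming the nginx:alpine image (which listens on port 80 and includes BusyBox wget) and the attachable network from the reproduction steps; start_http and fetch are hypothetical helper names.

```sh
#!/bin/sh
# Sketch (hypothetical setup; the original one is not shown in the report).
# start_http NAME [NETWORK]: run an nginx container on the overlay network.
start_http() {
  docker run -d --name "$1" --network "${2:-test-standalone-overlay}" nginx:alpine
}

# fetch SRC DST: request DST's default page from inside SRC, 10 s timeout.
fetch() {
  docker exec "$1" wget -O- -T 10 "http://$2"
}

# Example usage:
#   on hp-server-1: start_http http-server1
#   on hp-server-2: start_http http-server2
#   on hp-server-2: fetch http-server2 http-server1
```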
Update: Bug Confirmed Across Multiple Major Docker Versions
Tested with Docker 20.10.24 (Debian repository version) and confirmed the issue persists:
Environment:
- hp-server-1: Docker 20.10.24 (manager)
- hp-server-2: Docker 20.10.24 (worker)
- Fresh swarm cluster with identical versions
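Matching engine versions across the cluster can be confirmed from the manager. This is a sketch using the Hostname and EngineVersion placeholders of docker node ls --format; same_engine_version is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: list each swarm node's engine version and check they all match.
# Hypothetical helper; run on a manager node. Exit status 0 means all match.
same_engine_version() {
  versions=$(docker node ls --format '{{.Hostname}} {{.EngineVersion}}')
  printf '%s\n' "$versions"
  # Count distinct versions (second column); exactly one means all match.
  n=$(printf '%s\n' "$versions" | awk '{ print $2 }' | sort -u | wc -l | tr -d ' ')
  [ "$n" -eq 1 ]
}
```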
Test Results:
# Both directions still fail with 100% packet loss
docker exec standalone1-v20 ping -c 3 standalone2-v20
# PING standalone2-v20 (10.0.1.4): 3 packets transmitted, 0 packets received, 100% packet loss
docker exec standalone2-v20 ping -c 3 standalone1-v20
# PING standalone1-v20 (10.0.1.2): 3 packets transmitted, 0 packets received, 100% packet loss