
How to use the Mooncake backend? #979

@yangshanjun

Description


ip route show
default via 10.16.6.254 dev eth10 proto static metric 100
10.16.6.0/24 dev eth10 proto kernel scope link src 10.16.6.13 metric 100
12.12.0.0/16 dev eth2 proto kernel scope link src 12.12.12.43 metric 101
13.13.2.0/24 dev eth3 proto kernel scope link src 13.13.2.13 metric 102
13.13.3.0/24 dev eth4 proto kernel scope link src 13.13.3.13 metric 103
13.13.4.0/24 dev eth5 proto kernel scope link src 13.13.4.13 metric 104
13.13.5.0/24 dev eth6 proto kernel scope link src 13.13.5.13 metric 105
13.13.6.0/24 dev eth7 proto kernel scope link src 13.13.6.13 metric 106
13.13.7.0/24 dev eth8 proto kernel scope link src 13.13.7.13 metric 107
13.13.8.0/24 dev eth9 proto kernel scope link src 13.13.8.13 metric 108
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1

I am using the LMCache NIXL backend to deploy my PD-disaggregation scenario, with Mooncake transferring the KV cache. The prefill node reports this error:

E1105 08:51:39.981916 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_9, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:39.981940 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_9, mark it inactive
E1105 08:51:41.005860 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_2, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:41.005877 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_2, mark it inactive
E1105 08:51:42.029819 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_7, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:42.029832 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_7, mark it inactive
E1105 08:51:43.053889 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_8, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:43.053903 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_8, mark it inactive
E1105 08:51:44.077847 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_5, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:44.077865 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_5, mark it inactive
E1105 08:51:45.101872 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_3, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:45.101890 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_3, mark it inactive
E1105 08:51:46.125849 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_4, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:46.125866 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_4, mark it inactive
E1105 08:51:47.149879 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_2, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:47.149895 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_2, mark it inactive
E1105 08:51:48.173839 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_3, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:48.173853 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_3, mark it inactive
E1105 08:51:49.198797 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_4, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:49.198813 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_4, mark it inactive
E1105 08:51:50.221827 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_5, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:50.221841 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_5, mark it inactive
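
For context, the prefill node is launched roughly as follows (model path, TP size, port, and config file are taken from the full log below; this is a sketch, not the verbatim command):

# Sketch of the prefill-node launch; flags may differ slightly from the actual command
LMCACHE_CONFIG_FILE=/data/ysj/configs/lmcache-prefiller-config.yaml \
vllm serve /models/deepseek-v2/DeepSeek-V2-Lite \
    --port 7100 \
    --tensor-parallel-size 2 \
    --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_producer"}'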

It looks like the topology is not right: why do all the NICs use the same IP address?
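
To check how each RDMA device maps to a kernel netdev and IP (assuming the Mellanox OFED tools are installed):

# Map RDMA devices to kernel netdevs (Mellanox OFED utility)
ibdev2netdev
# expected output like: mlx5_2 port 1 ==> eth2 (Up), mlx5_3 port 1 ==> eth3 (Up), ...
# Show each netdev's address in brief form; per the route table above,
# each ethX should carry its own subnet
ip -br addr show

In the errors above, every nic_path shares the same 12.12.12.43:port prefix (eth2's address per the route table), even though each ethX has its own subnet, which is what prompted my question.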

Here is the full log:
I1105 08:49:14.909394 567554 rdma_context.cpp:133] RDMA device: mlx5_3, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:02:0d
I1105 08:49:14.912278 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_2/
I1105 08:49:14.913288 567553 rdma_context.cpp:133] RDMA device: mlx5_2, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0c:0c:0c:2b
I1105 08:49:14.924805 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_4/
I1105 08:49:14.925806 567554 rdma_context.cpp:133] RDMA device: mlx5_4, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:03:0d
I1105 08:49:14.929670 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_5/
I1105 08:49:14.929729 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_3/
I1105 08:49:14.930569 567554 rdma_context.cpp:133] RDMA device: mlx5_5, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:04:0d
I1105 08:49:14.930648 567553 rdma_context.cpp:133] RDMA device: mlx5_3, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:02:0d
I1105 08:49:14.934268 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_6/
I1105 08:49:14.935166 567554 rdma_context.cpp:133] RDMA device: mlx5_6, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:05:0d
I1105 08:49:14.963706 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_4/
I1105 08:49:14.964675 567553 rdma_context.cpp:133] RDMA device: mlx5_4, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:03:0d
I1105 08:49:14.968204 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_5/
I1105 08:49:14.969019 567553 rdma_context.cpp:133] RDMA device: mlx5_5, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:04:0d
I1105 08:49:14.978929 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_7/
I1105 08:49:14.980372 567554 rdma_context.cpp:133] RDMA device: mlx5_7, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:06:0d
I1105 08:49:14.984109 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_8/
I1105 08:49:14.984907 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_6/
I1105 08:49:14.985038 567554 rdma_context.cpp:133] RDMA device: mlx5_8, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:07:0d
I1105 08:49:14.985738 567553 rdma_context.cpp:133] RDMA device: mlx5_6, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:05:0d
I1105 08:49:14.989444 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_7/
I1105 08:49:14.990455 567553 rdma_context.cpp:133] RDMA device: mlx5_7, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:06:0d
I1105 08:49:14.994062 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_8/
I1105 08:49:14.995044 567553 rdma_context.cpp:133] RDMA device: mlx5_8, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:07:0d
I1105 08:49:14.999848 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_9/
I1105 08:49:15.000885 567554 rdma_context.cpp:133] RDMA device: mlx5_9, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:08:0d
I1105 08:49:15.000991 567554 nixl_agent.cpp:392] Created backend: Mooncake
(VllmWorker rank=1 pid=567554) 2025-11-05 08:49:15 NIXL INFO _api.py:361 Backend Mooncake was instantiated
(VllmWorker rank=1 pid=567554) 2025-11-05 08:49:15 NIXL INFO _api.py:251 Initialized NIXL agent: aab220e3-e6ef-4566-bd31-9de87a45cd04
(VllmWorker rank=1 pid=567554) [2025-11-05 08:49:15,005] LMCache INFO: Initializing LRUCachePolicy (lru.py:20:lmcache.v1.storage_backend.cache_policy.lru)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:49:15,005] LMCache INFO: NUMA mapping None (local_cpu_backend.py:279:lmcache.v1.storage_backend.local_cpu_backend)
I1105 08:49:15.009850 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_9/
I1105 08:49:15.010886 567553 rdma_context.cpp:133] RDMA device: mlx5_9, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:08:0d
I1105 08:49:15.010982 567553 nixl_agent.cpp:392] Created backend: Mooncake
(VllmWorker rank=0 pid=567553) 2025-11-05 08:49:15 NIXL INFO api.py:361 Backend Mooncake was instantiated
(VllmWorker rank=0 pid=567553) 2025-11-05 08:49:15 NIXL INFO api.py:251 Initialized NIXL agent: 1fae4e72-af57-4911-9dd8-30282f6e2490
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:15,014] LMCache INFO: Initializing LRUCachePolicy (lru.py:20:lmcache.v1.storage_backend.cache_policy.lru)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:15,015] LMCache INFO: NUMA mapping None (local_cpu_backend.py:279:lmcache.v1.storage_backend.local_cpu_backend)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:49:29,689] LMCache INFO: Initializing usage context. (usage_context.py:357:lmcache.usage_context)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:49:31,814] LMCache INFO: Internal API server disabled. internal_api_server_enabled=False, port_offset=2, port=7001, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:49:31,814] LMCache INFO: LMCache initialized for role KVConnectorRole.WORKER with version 0.3.8.dev27-gfb0d8c08a, vllm version 0.9.2, lmcache cache_engine metadata: LMCacheEngineMetadata(model_name='/models/deepseek-v2/DeepSeek-V2-Lite', world_size=2, worker_id=1, fmt='vllm', kv_dtype=torch.bfloat16, kv_shape=(27, 1, 256, 1, 576), use_mla=True) (vllm_v1_adapter.py:694:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=1 pid=567554) INFO 11-05 08:49:31 [gpu_model_runner.py:1785] Starting to load model /models/deepseek-v2/DeepSeek-V2-Lite...
(VllmWorker rank=1 pid=567554) INFO 11-05 08:49:32 [gpu_model_runner.py:1790] Loading model from scratch...
(VllmWorker rank=1 pid=567554) INFO 11-05 08:49:32 [rocm.py:215] Using FlashMLA backend on V1 engine.
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:36,439] LMCache INFO: Initializing usage context. (usage_context.py:357:lmcache.usage_context)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:38,592] LMCache INFO: lmcache lookup server start on /tmp/engine_53bee696-4f9d-4fe1-9bbf-285f550c0ea4_service_lookup_lmcache_rpc_port_producer10 (lmcache_lookup_client.py:263:lmcache.v1.lookup_client.lmcache_lookup_client)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:38,594] LMCache INFO: Internal API server disabled. internal_api_server_enabled=False, port_offset=1, port=7000, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:38,594] LMCache INFO: LMCache initialized for role KVConnectorRole.WORKER with version 0.3.8.dev27-gfb0d8c08a, vllm version 0.9.2, lmcache cache_engine metadata: LMCacheEngineMetadata(model_name='/models/deepseek-v2/DeepSeek-V2-Lite', world_size=2, worker_id=0, fmt='vllm', kv_dtype=torch.bfloat16, kv_shape=(27, 1, 256, 1, 576), use_mla=True) (vllm_v1_adapter.py:694:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=0 pid=567553) INFO 11-05 08:49:38 [gpu_model_runner.py:1785] Starting to load model /models/deepseek-v2/DeepSeek-V2-Lite...
(VllmWorker rank=0 pid=567553) INFO 11-05 08:49:39 [gpu_model_runner.py:1790] Loading model from scratch...
(VllmWorker rank=0 pid=567553) INFO 11-05 08:49:39 [rocm.py:215] Using FlashMLA backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:04<00:13, 4.48s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:08<00:08, 4.18s/it]
(VllmWorker rank=1 pid=567554) INFO 11-05 08:49:49 [default_loader.py:272] Loading weights took 16.41 seconds
(VllmWorker rank=1 pid=567554) INFO 11-05 08:49:49 [gpu_model_runner.py:1816] Model loading took 14.7097 GiB and 16.912219 seconds
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:11<00:03, 3.63s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:15<00:00, 3.86s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:15<00:00, 3.91s/it]
(VllmWorker rank=0 pid=567553)
(VllmWorker rank=0 pid=567553) INFO 11-05 08:49:55 [default_loader.py:272] Loading weights took 15.83 seconds
(VllmWorker rank=0 pid=567553) INFO 11-05 08:49:56 [gpu_model_runner.py:1816] Model loading took 14.7097 GiB and 16.015288 seconds
(VllmWorker rank=0 pid=567553) INFO 11-05 08:50:06 [fused_moe.py:913] Using configuration from /data/ysj/vllm/vllm/model_executor/layers/fused_moe/configs/E=64,N=704,device_name=BW200_nn.json for MoE layer.
(VllmWorker rank=1 pid=567554) INFO 11-05 08:50:06 [fused_moe.py:913] Using configuration from /data/ysj/vllm/vllm/model_executor/layers/fused_moe/configs/E=64,N=704,device_name=BW200_nn.json for MoE layer.
I1105 08:50:06.842206 567553 ProcessGroupNCCL.cpp:2279] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.05677 ms
I1105 08:50:06.842324 567554 ProcessGroupNCCL.cpp:2279] [PG ID 2 PG GUID 3 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.42184 ms
I1105 08:50:06.903478 567553 ProcessGroupNCCL.cpp:2318] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL created ncclComm 0x557d5995cd50 on CUDA device:
I1105 08:50:06.903497 567553 ProcessGroupNCCL.cpp:2323] [PG ID 2 PG GUID 3 Rank 0] NCCL_DEBUG: N/A
I1105 08:50:06.903535 567554 ProcessGroupNCCL.cpp:2318] [PG ID 2 PG GUID 3 Rank 1] ProcessGroupNCCL created ncclComm 0x55a1b9327f20 on CUDA device:
I1105 08:50:06.903553 567554 ProcessGroupNCCL.cpp:2323] [PG ID 2 PG GUID 3 Rank 1] NCCL_DEBUG: N/A
(VllmWorker rank=0 pid=567553) INFO 11-05 08:50:07 [gpu_worker.py:239] Available KV cache memory: 40.69 GiB
(VllmWorker rank=1 pid=567554) INFO 11-05 08:50:08 [gpu_worker.py:239] Available KV cache memory: 40.69 GiB
INFO 11-05 08:50:08 [kv_cache_utils.py:716] GPU KV cache size: 1,404,480 tokens
INFO 11-05 08:50:08 [kv_cache_utils.py:720] Maximum concurrency for 32,768 tokens per request: 42.86x
INFO 11-05 08:50:08 [kv_cache_utils.py:716] GPU KV cache size: 1,404,480 tokens
INFO 11-05 08:50:08 [kv_cache_utils.py:720] Maximum concurrency for 32,768 tokens per request: 42.86x
INFO 11-05 08:50:09 [core.py:174] init engine (profile, create kv cache, warmup model) took 13.27 seconds
INFO 11-05 08:50:10 [factory.py:74] Creating v1 connector with name: LMCacheConnectorV1 and engine_id: 53bee696-4f9d-4fe1-9bbf-285f550c0ea4
WARNING 11-05 08:50:10 [base.py:71] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
[2025-11-05 08:50:10,151] LMCache INFO: Loading LMCache config file /data/ysj/configs/lmcache-prefiller-config.yaml (utils.py:72:lmcache.integration.vllm.utils)
[2025-11-05 08:50:10,153] LMCache INFO: lmcache lookup client connect to tp_rank 0 with socket path /tmp/engine_53bee696-4f9d-4fe1-9bbf-285f550c0ea4_service_lookup_lmcache_rpc_port_producer10 (lmcache_lookup_client.py:77:lmcache.v1.lookup_client.lmcache_lookup_client)
[2025-11-05 08:50:10,203] LMCache INFO: Internal API server disabled. internal_api_server_enabled=False, port_offset=0, port=6999, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
[2025-11-05 08:50:10,204] LMCache INFO: LMCache initialized for role KVConnectorRole.SCHEDULER with version 0.3.8.dev27-gfb0d8c08a, vllm version 0.9.2, lmcache cache_engine metadata: None (vllm_v1_adapter.py:694:lmcache.integration.vllm.vllm_v1_adapter)
INFO 11-05 08:50:10 [loggers.py:137] Engine 000: vllm cache_config_info with initialization after num_gpu_blocks is: 21945
WARNING 11-05 08:50:10 [config.py:1408] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with --generation-config vllm.
INFO 11-05 08:50:10 [serving_chat.py:125] Using default chat sampling params from model: {'temperature': 0.3, 'top_p': 0.95}
INFO 11-05 08:50:10 [serving_completion.py:72] Using default completion sampling params from model: {'temperature': 0.3, 'top_p': 0.95}
INFO 11-05 08:50:10 [api_server.py:1457] Starting vLLM API server 0 on http://0.0.0.0:7100
INFO 11-05 08:50:10 [launcher.py:29] Available routes are:
INFO 11-05 08:50:10 [launcher.py:37] Route: /openapi.json, Methods: GET, HEAD
INFO 11-05 08:50:10 [launcher.py:37] Route: /docs, Methods: GET, HEAD
INFO 11-05 08:50:10 [launcher.py:37] Route: /docs/oauth2-redirect, Methods: GET, HEAD
INFO 11-05 08:50:10 [launcher.py:37] Route: /redoc, Methods: GET, HEAD
INFO 11-05 08:50:10 [launcher.py:37] Route: /health, Methods: GET
INFO 11-05 08:50:10 [launcher.py:37] Route: /load, Methods: GET
INFO 11-05 08:50:10 [launcher.py:37] Route: /ping, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /ping, Methods: GET
INFO 11-05 08:50:10 [launcher.py:37] Route: /tokenize, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /detokenize, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/models, Methods: GET
INFO 11-05 08:50:10 [launcher.py:37] Route: /version, Methods: GET
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/chat/completions, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/completions, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/embeddings, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /pooling, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /classify, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /score, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/score, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/audio/transcriptions, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/audio/translations, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /rerank, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/rerank, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v2/rerank, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /invocations, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /metrics, Methods: GET
INFO: Started server process [566348]
INFO: Waiting for application startup.
INFO: Application startup complete.
[2025-11-05 08:51:38,785] LMCache INFO: Reqid: cmpl-0e43e724dee34be7b05a64cce801ba47-0, Total tokens 49, LMCache hit tokens: 0, need to load: 0 (vllm_v1_adapter.py:1190:lmcache.integration.vllm.vllm_v1_adapter)
[2025-11-05 08:51:38,787] LMCache INFO: is_last_prefill is true (vllm_v1_adapter.py:290:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:51:38,792] LMCache INFO: Post-initializing LMCacheEngine (cache_engine.py:170:lmcache.v1.cache_engine)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:38,792] LMCache INFO: Post-initializing LMCacheEngine (cache_engine.py:170:lmcache.v1.cache_engine)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:51:38,924] LMCache INFO: Storing KV cache for 49 out of 49 tokens (skip_leading_tokens=0) for request cmpl-0e43e724dee34be7b05a64cce801ba47-0 (vllm_v1_adapter.py:1078:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:51:38,924] LMCache INFO: rank=1 ignore store (cache_engine.py:204:lmcache.v1.cache_engine)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:38,924] LMCache INFO: Storing KV cache for 49 out of 49 tokens (skip_leading_tokens=0) for request cmpl-0e43e724dee34be7b05a64cce801ba47-0 (vllm_v1_adapter.py:1078:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:38,943] LMCache INFO: batch_submit_put_task enter. (pd_backend.py:375:lmcache.v1.storage_backend.pd_backend)
I1105 08:51:38.947638 567553 nixl_agent.cpp:1612] Loading remote metadata for agent: 840e69f3-0924-43b7-a2e6-4f778f85717b
I1105 08:51:38.951772 567553 nixl_agent.cpp:810] reqH descList size down to 1
E1105 08:51:39.981916 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_9, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:39.981940 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_9, mark it inactive
E1105 08:51:41.005860 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_2, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:41.005877 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_2, mark it inactive
E1105 08:51:42.029819 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_7, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:42.029832 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_7, mark it inactive
E1105 08:51:43.053889 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_8, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:43.053903 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_8, mark it inactive
E1105 08:51:44.077847 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_5, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:44.077865 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_5, mark it inactive
E1105 08:51:45.101872 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_3, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:45.101890 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_3, mark it inactive
E1105 08:51:46.125849 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_4, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:46.125866 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_4, mark it inactive
E1105 08:51:47.149879 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_2, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:47.149895 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_2, mark it inactive
E1105 08:51:48.173839 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_3, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:48.173853 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_3, mark it inactive
E1105 08:51:49.198797 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_4, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:49.198813 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_4, mark it inactive
E1105 08:51:50.221827 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_5, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:50.221841 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_5, mark it inactive
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:50,223] LMCache INFO: proxy side channel send proxy_url: localhost:7500 req_id: 5 (pd_backend.py:429:lmcache.v1.storage_backend.pd_backend)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:50,224] LMCache INFO: Allocating memory in local cpu backend with busy loop: False (local_cpu_backend.py:323:lmcache.v1.storage_backend.local_cpu_backend)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:50,251] LMCache INFO: batched_submit_put_task in local cpu backend (local_cpu_backend.py:159:lmcache.v1.storage_backend.local_cpu_backend)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:50,251] LMCache INFO: Stored 49 out of total 49 tokens. size: 0.0014 gb, cost 11327.2511 ms, throughput: 0.0001 GB/s; offload_time: 19.0638 ms, put_time: 11308.1872 ms (cache_engine.py:289:lmcache.v1.cache_engine)
INFO: 127.0.0.1:58952 - "POST /v1/completions HTTP/1.1" 200 OK
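
Note that the reported put_time (11308 ms) matches almost exactly the ~11 s spent retrying the failed RDMA connections (08:51:38.9 to 08:51:50.2), so each store appears to block until every endpoint is marked inactive.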
