
How to use the Mooncake backend? #979

@yangshanjun

Description


ip route show
default via 10.16.6.254 dev eth10 proto static metric 100
10.16.6.0/24 dev eth10 proto kernel scope link src 10.16.6.13 metric 100
12.12.0.0/16 dev eth2 proto kernel scope link src 12.12.12.43 metric 101
13.13.2.0/24 dev eth3 proto kernel scope link src 13.13.2.13 metric 102
13.13.3.0/24 dev eth4 proto kernel scope link src 13.13.3.13 metric 103
13.13.4.0/24 dev eth5 proto kernel scope link src 13.13.4.13 metric 104
13.13.5.0/24 dev eth6 proto kernel scope link src 13.13.5.13 metric 105
13.13.6.0/24 dev eth7 proto kernel scope link src 13.13.6.13 metric 106
13.13.7.0/24 dev eth8 proto kernel scope link src 13.13.7.13 metric 107
13.13.8.0/24 dev eth9 proto kernel scope link src 13.13.8.13 metric 108
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1

I am using the LMCache NIXL backend to deploy my PD-disaggregation scenario, with Mooncake transferring the KV cache. The prefill node reports this error:

E1105 08:51:39.981916 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_9, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:39.981940 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_9, mark it inactive
E1105 08:51:41.005860 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_2, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:41.005877 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_2, mark it inactive
E1105 08:51:42.029819 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_7, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:42.029832 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_7, mark it inactive
E1105 08:51:43.053889 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_8, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:43.053903 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_8, mark it inactive
E1105 08:51:44.077847 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_5, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:44.077865 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_5, mark it inactive
E1105 08:51:45.101872 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_3, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:45.101890 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_3, mark it inactive
E1105 08:51:46.125849 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_4, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:46.125866 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_4, mark it inactive
E1105 08:51:47.149879 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_2, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:47.149895 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_2, mark it inactive
E1105 08:51:48.173839 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_3, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:48.173853 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_3, mark it inactive
E1105 08:51:49.198797 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_4, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:49.198813 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_4, mark it inactive
E1105 08:51:50.221827 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_5, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:50.221841 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_5, mark it inactive
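
For context, the prefill node is launched roughly as follows (model path, TP size, port, and config file are taken from the full log below; this is a sketch, not the verbatim command):

# Sketch of the prefill-node launch; flags may differ slightly from the actual command
LMCACHE_CONFIG_FILE=/data/ysj/configs/lmcache-prefiller-config.yaml \
vllm serve /models/deepseek-v2/DeepSeek-V2-Lite \
    --port 7100 \
    --tensor-parallel-size 2 \
    --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_producer"}'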

It looks like the topology is not right: why do all the NICs use the same IP address?
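
To check how each RDMA device maps to a kernel netdev and IP (assuming the Mellanox OFED tools are installed):

# Map RDMA devices to kernel netdevs (Mellanox OFED utility)
ibdev2netdev
# expected output like: mlx5_2 port 1 ==> eth2 (Up), mlx5_3 port 1 ==> eth3 (Up), ...
# Show each netdev's address in brief form; per the route table above,
# each ethX should carry its own subnet
ip -br addr show

In the errors above, every nic_path shares the same 12.12.12.43:port prefix (eth2's address per the route table), even though each ethX has its own subnet, which is what prompted my question.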

Here is the full log:
I1105 08:49:14.909394 567554 rdma_context.cpp:133] RDMA device: mlx5_3, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:02:0d
I1105 08:49:14.912278 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_2/
I1105 08:49:14.913288 567553 rdma_context.cpp:133] RDMA device: mlx5_2, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0c:0c:0c:2b
I1105 08:49:14.924805 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_4/
I1105 08:49:14.925806 567554 rdma_context.cpp:133] RDMA device: mlx5_4, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:03:0d
I1105 08:49:14.929670 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_5/
I1105 08:49:14.929729 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_3/
I1105 08:49:14.930569 567554 rdma_context.cpp:133] RDMA device: mlx5_5, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:04:0d
I1105 08:49:14.930648 567553 rdma_context.cpp:133] RDMA device: mlx5_3, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:02:0d
I1105 08:49:14.934268 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_6/
I1105 08:49:14.935166 567554 rdma_context.cpp:133] RDMA device: mlx5_6, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:05:0d
I1105 08:49:14.963706 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_4/
I1105 08:49:14.964675 567553 rdma_context.cpp:133] RDMA device: mlx5_4, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:03:0d
I1105 08:49:14.968204 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_5/
I1105 08:49:14.969019 567553 rdma_context.cpp:133] RDMA device: mlx5_5, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:04:0d
I1105 08:49:14.978929 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_7/
I1105 08:49:14.980372 567554 rdma_context.cpp:133] RDMA device: mlx5_7, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:06:0d
I1105 08:49:14.984109 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_8/
I1105 08:49:14.984907 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_6/
I1105 08:49:14.985038 567554 rdma_context.cpp:133] RDMA device: mlx5_8, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:07:0d
I1105 08:49:14.985738 567553 rdma_context.cpp:133] RDMA device: mlx5_6, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:05:0d
I1105 08:49:14.989444 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_7/
I1105 08:49:14.990455 567553 rdma_context.cpp:133] RDMA device: mlx5_7, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:06:0d
I1105 08:49:14.994062 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_8/
I1105 08:49:14.995044 567553 rdma_context.cpp:133] RDMA device: mlx5_8, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:07:0d
I1105 08:49:14.999848 567554 rdma_context.cpp:540] Find best gid index: 3 on mlx5_9/
I1105 08:49:15.000885 567554 rdma_context.cpp:133] RDMA device: mlx5_9, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:08:0d
I1105 08:49:15.000991 567554 nixl_agent.cpp:392] Created backend: Mooncake
(VllmWorker rank=1 pid=567554) 2025-11-05 08:49:15 NIXL INFO _api.py:361 Backend Mooncake was instantiated
(VllmWorker rank=1 pid=567554) 2025-11-05 08:49:15 NIXL INFO _api.py:251 Initialized NIXL agent: aab220e3-e6ef-4566-bd31-9de87a45cd04
(VllmWorker rank=1 pid=567554) [2025-11-05 08:49:15,005] LMCache INFO: Initializing LRUCachePolicy (lru.py:20:lmcache.v1.storage_backend.cache_policy.lru)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:49:15,005] LMCache INFO: NUMA mapping None (local_cpu_backend.py:279:lmcache.v1.storage_backend.local_cpu_backend)
I1105 08:49:15.009850 567553 rdma_context.cpp:540] Find best gid index: 3 on mlx5_9/
I1105 08:49:15.010886 567553 rdma_context.cpp:133] RDMA device: mlx5_9, LID: 0, GID: (GID_Index 3) 00:00:00:00:00:00:00:00:00:00:ff:ff:0d:0d:08:0d
I1105 08:49:15.010982 567553 nixl_agent.cpp:392] Created backend: Mooncake
(VllmWorker rank=0 pid=567553) 2025-11-05 08:49:15 NIXL INFO api.py:361 Backend Mooncake was instantiated
(VllmWorker rank=0 pid=567553) 2025-11-05 08:49:15 NIXL INFO api.py:251 Initialized NIXL agent: 1fae4e72-af57-4911-9dd8-30282f6e2490
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:15,014] LMCache INFO: Initializing LRUCachePolicy (lru.py:20:lmcache.v1.storage_backend.cache_policy.lru)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:15,015] LMCache INFO: NUMA mapping None (local_cpu_backend.py:279:lmcache.v1.storage_backend.local_cpu_backend)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:49:29,689] LMCache INFO: Initializing usage context. (usage_context.py:357:lmcache.usage_context)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:49:31,814] LMCache INFO: Internal API server disabled. internal_api_server_enabled=False, port_offset=2, port=7001, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:49:31,814] LMCache INFO: LMCache initialized for role KVConnectorRole.WORKER with version 0.3.8.dev27-gfb0d8c08a, vllm version 0.9.2, lmcache cache_engine metadata: LMCacheEngineMetadata(model_name='/models/deepseek-v2/DeepSeek-V2-Lite', world_size=2, worker_id=1, fmt='vllm', kv_dtype=torch.bfloat16, kv_shape=(27, 1, 256, 1, 576), use_mla=True) (vllm_v1_adapter.py:694:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=1 pid=567554) INFO 11-05 08:49:31 [gpu_model_runner.py:1785] Starting to load model /models/deepseek-v2/DeepSeek-V2-Lite...
(VllmWorker rank=1 pid=567554) INFO 11-05 08:49:32 [gpu_model_runner.py:1790] Loading model from scratch...
(VllmWorker rank=1 pid=567554) INFO 11-05 08:49:32 [rocm.py:215] Using FlashMLA backend on V1 engine.
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:36,439] LMCache INFO: Initializing usage context. (usage_context.py:357:lmcache.usage_context)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:38,592] LMCache INFO: lmcache lookup server start on /tmp/engine_53bee696-4f9d-4fe1-9bbf-285f550c0ea4_service_lookup_lmcache_rpc_port_producer10 (lmcache_lookup_client.py:263:lmcache.v1.lookup_client.lmcache_lookup_client)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:38,594] LMCache INFO: Internal API server disabled. internal_api_server_enabled=False, port_offset=1, port=7000, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:49:38,594] LMCache INFO: LMCache initialized for role KVConnectorRole.WORKER with version 0.3.8.dev27-gfb0d8c08a, vllm version 0.9.2, lmcache cache_engine metadata: LMCacheEngineMetadata(model_name='/models/deepseek-v2/DeepSeek-V2-Lite', world_size=2, worker_id=0, fmt='vllm', kv_dtype=torch.bfloat16, kv_shape=(27, 1, 256, 1, 576), use_mla=True) (vllm_v1_adapter.py:694:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=0 pid=567553) INFO 11-05 08:49:38 [gpu_model_runner.py:1785] Starting to load model /models/deepseek-v2/DeepSeek-V2-Lite...
(VllmWorker rank=0 pid=567553) INFO 11-05 08:49:39 [gpu_model_runner.py:1790] Loading model from scratch...
(VllmWorker rank=0 pid=567553) INFO 11-05 08:49:39 [rocm.py:215] Using FlashMLA backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:04<00:13, 4.48s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:08<00:08, 4.18s/it]
(VllmWorker rank=1 pid=567554) INFO 11-05 08:49:49 [default_loader.py:272] Loading weights took 16.41 seconds
(VllmWorker rank=1 pid=567554) INFO 11-05 08:49:49 [gpu_model_runner.py:1816] Model loading took 14.7097 GiB and 16.912219 seconds
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:11<00:03, 3.63s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:15<00:00, 3.86s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:15<00:00, 3.91s/it]
(VllmWorker rank=0 pid=567553)
(VllmWorker rank=0 pid=567553) INFO 11-05 08:49:55 [default_loader.py:272] Loading weights took 15.83 seconds
(VllmWorker rank=0 pid=567553) INFO 11-05 08:49:56 [gpu_model_runner.py:1816] Model loading took 14.7097 GiB and 16.015288 seconds
(VllmWorker rank=0 pid=567553) INFO 11-05 08:50:06 [fused_moe.py:913] Using configuration from /data/ysj/vllm/vllm/model_executor/layers/fused_moe/configs/E=64,N=704,device_name=BW200_nn.json for MoE layer.
(VllmWorker rank=1 pid=567554) INFO 11-05 08:50:06 [fused_moe.py:913] Using configuration from /data/ysj/vllm/vllm/model_executor/layers/fused_moe/configs/E=64,N=704,device_name=BW200_nn.json for MoE layer.
I1105 08:50:06.842206 567553 ProcessGroupNCCL.cpp:2279] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.05677 ms
I1105 08:50:06.842324 567554 ProcessGroupNCCL.cpp:2279] [PG ID 2 PG GUID 3 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.42184 ms
I1105 08:50:06.903478 567553 ProcessGroupNCCL.cpp:2318] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL created ncclComm 0x557d5995cd50 on CUDA device:
I1105 08:50:06.903497 567553 ProcessGroupNCCL.cpp:2323] [PG ID 2 PG GUID 3 Rank 0] NCCL_DEBUG: N/A
I1105 08:50:06.903535 567554 ProcessGroupNCCL.cpp:2318] [PG ID 2 PG GUID 3 Rank 1] ProcessGroupNCCL created ncclComm 0x55a1b9327f20 on CUDA device:
I1105 08:50:06.903553 567554 ProcessGroupNCCL.cpp:2323] [PG ID 2 PG GUID 3 Rank 1] NCCL_DEBUG: N/A
(VllmWorker rank=0 pid=567553) INFO 11-05 08:50:07 [gpu_worker.py:239] Available KV cache memory: 40.69 GiB
(VllmWorker rank=1 pid=567554) INFO 11-05 08:50:08 [gpu_worker.py:239] Available KV cache memory: 40.69 GiB
INFO 11-05 08:50:08 [kv_cache_utils.py:716] GPU KV cache size: 1,404,480 tokens
INFO 11-05 08:50:08 [kv_cache_utils.py:720] Maximum concurrency for 32,768 tokens per request: 42.86x
INFO 11-05 08:50:08 [kv_cache_utils.py:716] GPU KV cache size: 1,404,480 tokens
INFO 11-05 08:50:08 [kv_cache_utils.py:720] Maximum concurrency for 32,768 tokens per request: 42.86x
INFO 11-05 08:50:09 [core.py:174] init engine (profile, create kv cache, warmup model) took 13.27 seconds
INFO 11-05 08:50:10 [factory.py:74] Creating v1 connector with name: LMCacheConnectorV1 and engine_id: 53bee696-4f9d-4fe1-9bbf-285f550c0ea4
WARNING 11-05 08:50:10 [base.py:71] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
[2025-11-05 08:50:10,151] LMCache INFO: Loading LMCache config file /data/ysj/configs/lmcache-prefiller-config.yaml (utils.py:72:lmcache.integration.vllm.utils)
[2025-11-05 08:50:10,153] LMCache INFO: lmcache lookup client connect to tp_rank 0 with socket path /tmp/engine_53bee696-4f9d-4fe1-9bbf-285f550c0ea4_service_lookup_lmcache_rpc_port_producer10 (lmcache_lookup_client.py:77:lmcache.v1.lookup_client.lmcache_lookup_client)
[2025-11-05 08:50:10,203] LMCache INFO: Internal API server disabled. internal_api_server_enabled=False, port_offset=0, port=6999, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
[2025-11-05 08:50:10,204] LMCache INFO: LMCache initialized for role KVConnectorRole.SCHEDULER with version 0.3.8.dev27-gfb0d8c08a, vllm version 0.9.2, lmcache cache_engine metadata: None (vllm_v1_adapter.py:694:lmcache.integration.vllm.vllm_v1_adapter)
INFO 11-05 08:50:10 [loggers.py:137] Engine 000: vllm cache_config_info with initialization after num_gpu_blocks is: 21945
WARNING 11-05 08:50:10 [config.py:1408] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with --generation-config vllm.
INFO 11-05 08:50:10 [serving_chat.py:125] Using default chat sampling params from model: {'temperature': 0.3, 'top_p': 0.95}
INFO 11-05 08:50:10 [serving_completion.py:72] Using default completion sampling params from model: {'temperature': 0.3, 'top_p': 0.95}
INFO 11-05 08:50:10 [api_server.py:1457] Starting vLLM API server 0 on http://0.0.0.0:7100
INFO 11-05 08:50:10 [launcher.py:29] Available routes are:
INFO 11-05 08:50:10 [launcher.py:37] Route: /openapi.json, Methods: GET, HEAD
INFO 11-05 08:50:10 [launcher.py:37] Route: /docs, Methods: GET, HEAD
INFO 11-05 08:50:10 [launcher.py:37] Route: /docs/oauth2-redirect, Methods: GET, HEAD
INFO 11-05 08:50:10 [launcher.py:37] Route: /redoc, Methods: GET, HEAD
INFO 11-05 08:50:10 [launcher.py:37] Route: /health, Methods: GET
INFO 11-05 08:50:10 [launcher.py:37] Route: /load, Methods: GET
INFO 11-05 08:50:10 [launcher.py:37] Route: /ping, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /ping, Methods: GET
INFO 11-05 08:50:10 [launcher.py:37] Route: /tokenize, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /detokenize, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/models, Methods: GET
INFO 11-05 08:50:10 [launcher.py:37] Route: /version, Methods: GET
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/chat/completions, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/completions, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/embeddings, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /pooling, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /classify, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /score, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/score, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/audio/transcriptions, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/audio/translations, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /rerank, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v1/rerank, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /v2/rerank, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /invocations, Methods: POST
INFO 11-05 08:50:10 [launcher.py:37] Route: /metrics, Methods: GET
INFO: Started server process [566348]
INFO: Waiting for application startup.
INFO: Application startup complete.
[2025-11-05 08:51:38,785] LMCache INFO: Reqid: cmpl-0e43e724dee34be7b05a64cce801ba47-0, Total tokens 49, LMCache hit tokens: 0, need to load: 0 (vllm_v1_adapter.py:1190:lmcache.integration.vllm.vllm_v1_adapter)
[2025-11-05 08:51:38,787] LMCache INFO: is_last_prefill is true (vllm_v1_adapter.py:290:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:51:38,792] LMCache INFO: Post-initializing LMCacheEngine (cache_engine.py:170:lmcache.v1.cache_engine)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:38,792] LMCache INFO: Post-initializing LMCacheEngine (cache_engine.py:170:lmcache.v1.cache_engine)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:51:38,924] LMCache INFO: Storing KV cache for 49 out of 49 tokens (skip_leading_tokens=0) for request cmpl-0e43e724dee34be7b05a64cce801ba47-0 (vllm_v1_adapter.py:1078:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=1 pid=567554) [2025-11-05 08:51:38,924] LMCache INFO: rank=1 ignore store (cache_engine.py:204:lmcache.v1.cache_engine)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:38,924] LMCache INFO: Storing KV cache for 49 out of 49 tokens (skip_leading_tokens=0) for request cmpl-0e43e724dee34be7b05a64cce801ba47-0 (vllm_v1_adapter.py:1078:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:38,943] LMCache INFO: batch_submit_put_task enter. (pd_backend.py:375:lmcache.v1.storage_backend.pd_backend)
I1105 08:51:38.947638 567553 nixl_agent.cpp:1612] Loading remote metadata for agent: 840e69f3-0924-43b7-a2e6-4f778f85717b
I1105 08:51:38.951772 567553 nixl_agent.cpp:810] reqH descList size down to 1
E1105 08:51:39.981916 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_9, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:39.981940 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_9, mark it inactive
E1105 08:51:41.005860 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_2, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:41.005877 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_2, mark it inactive
E1105 08:51:42.029819 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_7, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:42.029832 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_7, mark it inactive
E1105 08:51:43.053889 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_8, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:43.053903 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_8, mark it inactive
E1105 08:51:44.077847 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_5, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:44.077865 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_5, mark it inactive
E1105 08:51:45.101872 567815 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_3, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:45.101890 567815 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_3, mark it inactive
E1105 08:51:46.125849 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_4, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:46.125866 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_4, mark it inactive
E1105 08:51:47.149879 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_2, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:47.149895 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_2, mark it inactive
E1105 08:51:48.173839 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_3, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:48.173853 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_3, mark it inactive
E1105 08:51:49.198797 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_4, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:49.198813 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_4, mark it inactive
E1105 08:51:50.221827 567816 rdma_endpoint.cpp:157] Invalid argument: received packet mismatch, local.local_nic_path: 12.12.12.43:16116@mlx5_6, local.peer_nic_path: 12.12.12.43:15608@mlx5_5, peer.local_nic_path: , peer.peer_nic_path:
E1105 08:51:50.221841 567816 worker_pool.cpp:242] Worker: Cannot make connection for endpoint: 12.12.12.43:15608@mlx5_5, mark it inactive
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:50,223] LMCache INFO: proxy side channel send proxy_url: localhost:7500 req_id: 5 (pd_backend.py:429:lmcache.v1.storage_backend.pd_backend)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:50,224] LMCache INFO: Allocating memory in local cpu backend with busy loop: False (local_cpu_backend.py:323:lmcache.v1.storage_backend.local_cpu_backend)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:50,251] LMCache INFO: batched_submit_put_task in local cpu backend (local_cpu_backend.py:159:lmcache.v1.storage_backend.local_cpu_backend)
(VllmWorker rank=0 pid=567553) [2025-11-05 08:51:50,251] LMCache INFO: Stored 49 out of total 49 tokens. size: 0.0014 gb, cost 11327.2511 ms, throughput: 0.0001 GB/s; offload_time: 19.0638 ms, put_time: 11308.1872 ms (cache_engine.py:289:lmcache.v1.cache_engine)
INFO: 127.0.0.1:58952 - "POST /v1/completions HTTP/1.1" 200 OK
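
Note that the reported put_time (11308 ms) matches almost exactly the ~11 s spent retrying the failed RDMA connections (08:51:38.9 to 08:51:50.2), so each store appears to block until every endpoint is marked inactive.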
