Open
Labels: bug (Confirmed bugs)
Description
[Bug] Speculative decoding fails on Android OpenCL with CL_INVALID_BUFFER_SIZE
Hi, I'm trying to run speculative decoding (Medusa or EAGLE mode) on Android with an OpenCL device. When I start the MLC LLM server with speculative decoding enabled, it crashes with the error CL_INVALID_BUFFER_SIZE (full log below).
Notably, the server runs fine without speculative decoding, and the device should have enough memory (see the clinfo output at the end). The issue only appears when a speculative decoding mode is enabled.
Has anyone else run into this, or does anyone know what might be causing it? Any insights or suggestions would be greatly appreciated.
Thanks in advance!
error:
[15:13:43] /opt/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_device_api.cc:845: Warning: Trying to release all unused memory and reallocate...
libc++abi: terminating due to uncaught exception of type tvm::runtime::InternalError: Traceback (most recent call last):
File "/opt/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_device_api.cc", line 317, in
InternalError: Check failed: (err_code == CL_SUCCESS) is false: OpenCL Error, code=-61: CL_INVALID_BUFFER_SIZE
Steps to reproduce the behavior:
## data type 14fp16_1
python3 python/mlc_llm/__main__.py serve \
/model/Llama-3.2-3B-Instruct-MLC --model-lib /model/Llama-3.2-3B-Instruct-MLC/model_opencl.so \
--mode server --additional-models /model/EAGLE-Llama-3.2-3B-Instruct-MLC,/model/EAGLE-Llama-3.2-3B-Instruct-MLC/model_opencl.so \
--speculative-mode eagle --device opencl --overrides "spec_draft_length=1"
Expected behavior
Environment
- Platform: Android (Termux)
- Operating system (e.g. Ubuntu/Windows/MacOS/...):
- Device: OpenCL
- How you installed MLC-LLM (source):
- How you installed TVM-Unity (source):
- Python version (e.g. 3.12):
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
BUILD_STATIC_RUNTIME: OFF
BUILD_DUMMY_LIBTVM: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
CUDA_VERSION: NOT-FOUND
DLPACK_PATH: 3rdparty/dlpack/include
DMLC_PATH: 3rdparty/dmlc-core/include
GIT_COMMIT_HASH: 2d2d2ea7763b3cf5ed42cda79315103cc82d2309
GIT_COMMIT_TIME: 2025-07-09 10:06:40 -0400
HIDE_PRIVATE_SYMBOLS: ON
INDEX_DEFAULT_I64: ON
INSTALL_DEV: OFF
LLVM_VERSION: NOT-FOUND
MLIR_VERSION: NOT-FOUND
PICOJSON_PATH: 3rdparty/picojson
RANG_PATH: 3rdparty/rang/include
ROCM_PATH: /opt/rocm
SUMMARIZE: OFF
TVM_CXX_COMPILER_PATH: /opt/android-ndk-r28/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++
USE_ALTERNATIVE_LINKER: AUTO
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_ARM_COMPUTE_LIB: OFF
USE_BLAS: none
USE_BNNS: OFF
USE_BYODT_POSIT: OFF
USE_COREML: OFF
USE_CPP_RPC: OFF
USE_CPP_RTVM: OFF
USE_CUBLAS: OFF
USE_CUDA: OFF
USE_NVTX: OFF
USE_NCCL: OFF
USE_MSCCL: OFF
USE_CUDNN: OFF
USE_CUSTOM_LOGGING: OFF
USE_CUTLASS: OFF
USE_AMX: OFF
USE_DNNL: OFF
USE_FALLBACK_STL_MAP: OFF
USE_GTEST: AUTO
USE_HEXAGON: OFF
USE_HEXAGON_RPC: OFF
USE_HEXAGON_SDK: /path/to/sdk
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_IOS_RPC: OFF
USE_KHRONOS_SPIRV: OFF
USE_LIBBACKTRACE: AUTO
USE_LIBTORCH: OFF
USE_LLVM: OFF
USE_MLIR: OFF
USE_METAL: OFF
USE_MIOPEN: OFF
USE_MKL: OFF
USE_MRVL: OFF
USE_MSVC_MT: OFF
USE_NNPACK: OFF
USE_OPENCL: ON
USE_OPENCL_ENABLE_HOST_PTR: OFF
USE_OPENCL_EXTN_QCOM: NOT-FOUND
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_OPENMP: none
USE_PAPI: OFF
USE_RANDOM: ON
TVM_DEBUG_WITH_ABI_CHANGE: OFF
TVM_LOG_BEFORE_THROW: OFF
USE_ROCBLAS: OFF
USE_HIPBLAS: OFF
USE_ROCM: OFF
USE_RCCL: OFF
USE_RPC: ON
USE_RTTI: ON
USE_RUST_EXT: OFF
USE_SORT: ON
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_TENSORFLOW_PATH: none
USE_TENSORRT_CODEGEN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_TFLITE: OFF
USE_THREADS: ON
USE_THRUST: OFF
USE_CURAND: OFF
USE_VULKAN: OFF
USE_CLML: OFF
TVM_CLML_VERSION:
USE_CLML_GRAPH_EXECUTOR: OFF
USE_UMA: OFF
USE_MSC: OFF
USE_CCACHE: AUTO
USE_NVSHMEM: OFF
USE_NNAPI_CODEGEN: OFF
USE_NNAPI_RUNTIME: OFF
BACKTRACE_ON_SEGFAULT: OFF
Memory
clinfo | grep -i "global memory size"
Global memory size 7923193856 (7.379GiB)
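One thing that may be worth checking alongside the global memory size: per the OpenCL specification, clCreateBuffer can return CL_INVALID_BUFFER_SIZE when the requested size is 0 or exceeds the device's CL_DEVICE_MAX_MEM_ALLOC_SIZE, which is a per-buffer limit and is often much smaller than total global memory. The sketch below is only an illustration of that comparison, not a diagnosis. The helper names are hypothetical, the "Max memory allocation" label is what clinfo typically prints for that limit, and the one-quarter figure is an assumed example value, not something measured on this device.

```python
import re


def parse_clinfo_bytes(clinfo_text, field):
    """Return the byte count clinfo reports for `field`, or None if absent."""
    pattern = re.compile(
        rf"^\s*{re.escape(field)}\s+(\d+)", re.IGNORECASE | re.MULTILINE
    )
    match = pattern.search(clinfo_text)
    return int(match.group(1)) if match else None


def fits_single_buffer(request_bytes, max_alloc_bytes):
    """A single buffer must be non-empty and no larger than the per-buffer limit."""
    return 0 < request_bytes <= max_alloc_bytes


# The line reported above; on the device, pipe the full `clinfo` output in instead.
reported = "  Global memory size    7923193856 (7.379GiB)\n"
global_bytes = parse_clinfo_bytes(reported, "Global memory size")

# ASSUMPTION: the driver caps one allocation at a quarter of global memory.
# Replace this with the real value from `clinfo | grep -i "max memory allocation"`.
assumed_max_alloc = global_bytes // 4

print(f"global memory:        {global_bytes / 2**30:.3f} GiB")
print(f"assumed per-buffer cap: {assumed_max_alloc / 2**30:.3f} GiB")
print("2 GiB buffer fits:", fits_single_buffer(2 * 2**30, assumed_max_alloc))
```

If the real per-buffer limit turns out to be well below the total, a single large allocation made only in speculative mode could fail even though overall memory is sufficient.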