
[Bug] Speculative decoding fails on Android OpenCL with CL_INVALID_BUFFER_SIZE #3290

@hj-wei

Description


Bug

Hi, I'm currently trying to run speculative decoding (Medusa or EAGLE mode) on Android using an OpenCL device. However, when I start the MLC LLM server with speculative decoding enabled, it crashes with the following error: CL_INVALID_BUFFER_SIZE.
Notably, the server runs fine without speculative decoding, and the device should have enough memory (clinfo reports about 7.4 GiB of global memory; see below). The error appears only when a speculative decoding mode is enabled.

Has anyone else experienced this issue or knows what might be causing it? Any insights or suggestions would be greatly appreciated.

Thanks in advance!
Error log:

[15:13:43] /opt/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_device_api.cc:845: Warning: Trying to release all unused memory and reallocate...
libc++abi: terminating due to uncaught exception of type tvm::runtime::InternalError: Traceback (most recent call last):
  File "/opt/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_device_api.cc", line 317, in 
InternalError: Check failed: (err_code == CL_SUCCESS) is false: OpenCL Error, code=-61: CL_INVALID_BUFFER_SIZE
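For context on the error code: per the OpenCL specification, `clCreateBuffer` returns `CL_INVALID_BUFFER_SIZE` (-61) when the requested size is 0 or larger than the device's `CL_DEVICE_MAX_MEM_ALLOC_SIZE`, which limits a single buffer rather than total memory. The sketch below models only that size check in Python. It assumes the failing call in `opencl_device_api.cc` is a buffer allocation (the preceding "release all unused memory and reallocate" warning suggests so, but the log does not name the call). The helper names are hypothetical, and `spec_min_max_alloc` encodes the OpenCL 1.2 lower bound for the per-buffer limit, max(global memory / 4, 128 MiB); a real device may report a larger value.

```python
# Value of CL_INVALID_BUFFER_SIZE in the OpenCL headers (CL/cl.h).
CL_INVALID_BUFFER_SIZE = -61
CL_SUCCESS = 0


def spec_min_max_alloc(global_mem_size: int) -> int:
    """Lower bound on CL_DEVICE_MAX_MEM_ALLOC_SIZE from the OpenCL 1.2 spec
    (full profile, non-custom devices): max(global / 4, 128 MiB)."""
    return max(global_mem_size // 4, 128 * 1024 * 1024)


def create_buffer_size_status(nbytes: int, max_mem_alloc_size: int) -> int:
    """Model only the size check of clCreateBuffer (hypothetical helper).

    Returns CL_INVALID_BUFFER_SIZE when nbytes is 0 or exceeds the
    per-buffer limit, otherwise CL_SUCCESS. Other failure modes
    (e.g. running out of total memory) are not modeled.
    """
    if nbytes == 0 or nbytes > max_mem_alloc_size:
        return CL_INVALID_BUFFER_SIZE
    return CL_SUCCESS


if __name__ == "__main__":
    # Global memory size taken from the clinfo output in this issue.
    guaranteed = spec_min_max_alloc(7923193856)
    print(guaranteed)  # 1980798464 bytes, about 1.84 GiB
    # Hypothetical 2 GiB request: larger than the guaranteed per-buffer limit.
    print(create_buffer_size_status(2 * 1024**3, guaranteed))
```

With the 7923193856-byte global memory reported in this issue, the spec only guarantees about 1.84 GiB per buffer, so a single large request can fail with -61 even when total memory looks ample. The device's actual limit should be read from `CL_DEVICE_MAX_MEM_ALLOC_SIZE` rather than assumed.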

Steps to reproduce the behavior:

## data type q4f16_1
python3 python/mlc_llm/__main__.py serve \
/model/Llama-3.2-3B-Instruct-MLC --model-lib /model/Llama-3.2-3B-Instruct-MLC/model_opencl.so \
--mode server --additional-models /model/EAGLE-Llama-3.2-3B-Instruct-MLC,/model/EAGLE-Llama-3.2-3B-Instruct-MLC/model_opencl.so \
--speculative-mode eagle --device opencl --overrides "spec_draft_length=1"
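To isolate the speculative-decoding flags, it can help to launch the same server twice, once with and once without them. The sketch below builds both argument lists from the paths in the reproduction command above. `build_serve_argv` is a hypothetical helper, not part of MLC LLM, and it uses only the flags already present in that command.

```python
import shlex
from typing import List

# Paths copied from the reproduction command; adjust to your device.
MODEL_DIR = "/model/Llama-3.2-3B-Instruct-MLC"
MODEL_LIB = f"{MODEL_DIR}/model_opencl.so"
DRAFT_DIR = "/model/EAGLE-Llama-3.2-3B-Instruct-MLC"
DRAFT_LIB = f"{DRAFT_DIR}/model_opencl.so"


def build_serve_argv(speculative: bool, draft_length: int = 1) -> List[str]:
    """Build the serve argv with or without EAGLE speculative decoding
    (hypothetical helper)."""
    argv = [
        "python3", "python/mlc_llm/__main__.py", "serve",
        MODEL_DIR,
        "--model-lib", MODEL_LIB,
        "--mode", "server",
        "--device", "opencl",
    ]
    if speculative:
        # The draft model is passed as "<model_dir>,<model_lib>".
        argv += [
            "--additional-models", f"{DRAFT_DIR},{DRAFT_LIB}",
            "--speculative-mode", "eagle",
            "--overrides", f"spec_draft_length={draft_length}",
        ]
    return argv


if __name__ == "__main__":
    # Print both variants; they differ only in the speculative-decoding flags.
    for spec in (False, True):
        print(shlex.join(build_serve_argv(spec)))
```

Each list can also be passed directly to `subprocess.run` on the device, which keeps the baseline and the failing run otherwise identical.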

Expected behavior

Environment

  • Platform: Android (Termux)
  • Operating system (e.g. Ubuntu/Windows/MacOS/...):
  • Device: OpenCL
  • How you installed MLC-LLM: from source
  • How you installed TVM-Unity: from source
  • Python version (e.g. 3.12):
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
    BUILD_STATIC_RUNTIME: OFF
    BUILD_DUMMY_LIBTVM: OFF
    COMPILER_RT_PATH: 3rdparty/compiler-rt
    CUDA_VERSION: NOT-FOUND
    DLPACK_PATH: 3rdparty/dlpack/include
    DMLC_PATH: 3rdparty/dmlc-core/include
    GIT_COMMIT_HASH: 2d2d2ea7763b3cf5ed42cda79315103cc82d2309
    GIT_COMMIT_TIME: 2025-07-09 10:06:40 -0400
    HIDE_PRIVATE_SYMBOLS: ON
    INDEX_DEFAULT_I64: ON
    INSTALL_DEV: OFF
    LLVM_VERSION: NOT-FOUND
    MLIR_VERSION: NOT-FOUND
    PICOJSON_PATH: 3rdparty/picojson
    RANG_PATH: 3rdparty/rang/include
    ROCM_PATH: /opt/rocm
    SUMMARIZE: OFF
    TVM_CXX_COMPILER_PATH: /opt/android-ndk-r28/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++
    USE_ALTERNATIVE_LINKER: AUTO
    USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
    USE_ARM_COMPUTE_LIB: OFF
    USE_BLAS: none
    USE_BNNS: OFF
    USE_BYODT_POSIT: OFF
    USE_COREML: OFF
    USE_CPP_RPC: OFF
    USE_CPP_RTVM: OFF
    USE_CUBLAS: OFF
    USE_CUDA: OFF
    USE_NVTX: OFF
    USE_NCCL: OFF
    USE_MSCCL: OFF
    USE_CUDNN: OFF
    USE_CUSTOM_LOGGING: OFF
    USE_CUTLASS: OFF
    USE_AMX: OFF
    USE_DNNL: OFF
    USE_FALLBACK_STL_MAP: OFF
    USE_GTEST: AUTO
    USE_HEXAGON: OFF
    USE_HEXAGON_RPC: OFF
    USE_HEXAGON_SDK: /path/to/sdk
    USE_HEXAGON_GTEST: /path/to/hexagon/gtest
    USE_HEXAGON_EXTERNAL_LIBS: OFF
    USE_IOS_RPC: OFF
    USE_KHRONOS_SPIRV: OFF
    USE_LIBBACKTRACE: AUTO
    USE_LIBTORCH: OFF
    USE_LLVM: OFF
    USE_MLIR: OFF
    USE_METAL: OFF
    USE_MIOPEN: OFF
    USE_MKL: OFF
    USE_MRVL: OFF
    USE_MSVC_MT: OFF
    USE_NNPACK: OFF
    USE_OPENCL: ON
    USE_OPENCL_ENABLE_HOST_PTR: OFF
    USE_OPENCL_EXTN_QCOM: NOT-FOUND
    USE_OPENCL_GTEST: /path/to/opencl/gtest
    USE_OPENMP: none
    USE_PAPI: OFF
    USE_RANDOM: ON
    TVM_DEBUG_WITH_ABI_CHANGE: OFF
    TVM_LOG_BEFORE_THROW: OFF
    USE_ROCBLAS: OFF
    USE_HIPBLAS: OFF
    USE_ROCM: OFF
    USE_RCCL: OFF
    USE_RPC: ON
    USE_RTTI: ON
    USE_RUST_EXT: OFF
    USE_SORT: ON
    USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
    USE_TENSORFLOW_PATH: none
    USE_TENSORRT_CODEGEN: OFF
    USE_TENSORRT_RUNTIME: OFF
    USE_TFLITE: OFF
    USE_THREADS: ON
    USE_THRUST: OFF
    USE_CURAND: OFF
    USE_VULKAN: OFF
    USE_CLML: OFF
    TVM_CLML_VERSION:
    USE_CLML_GRAPH_EXECUTOR: OFF
    USE_UMA: OFF
    USE_MSC: OFF
    USE_CCACHE: AUTO
    USE_NVSHMEM: OFF
    USE_NNAPI_CODEGEN: OFF
    USE_NNAPI_RUNTIME: OFF
    BACKTRACE_ON_SEGFAULT: OFF

Memory

clinfo | grep -i "global memory size"
  Global memory size                              7923193856 (7.379GiB)
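Global memory size is not the value that `CL_INVALID_BUFFER_SIZE` is checked against; the per-buffer limit is reported separately. Assuming the common `clinfo` output format, it appears on a line labeled `Max memory allocation`, so `clinfo | grep -iE "global memory size|max memory allocation"` shows both. The hypothetical helper below extracts the two byte counts from `clinfo` text so they can be compared.

```python
import re
from typing import Dict

# Matches clinfo lines such as
#   "  Global memory size        7923193856 (7.379GiB)"
# and captures the label and the raw byte count.
_SIZE_LINE = re.compile(
    r"^\s*(Global memory size|Max memory allocation)\s+(\d+)", re.MULTILINE
)


def parse_clinfo_sizes(text: str) -> Dict[str, int]:
    """Map each matched label to its size in bytes (hypothetical helper).

    If several devices are listed, the last value for each label wins.
    """
    return {label: int(value) for label, value in _SIZE_LINE.findall(text)}
```

For example, `parse_clinfo_sizes(subprocess.run(["clinfo"], capture_output=True, text=True).stdout)` would return both values for the device, making it easy to see how much smaller the per-buffer limit is than total global memory.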


Labels: bug (Confirmed bugs)
