
Memory leak on each execute_query #1205

@danvratil

Description

We are running a fairly simple Python service in Kubernetes - a simple webserver (via aiohttp) with a REST API that interacts with Google BigTable (through both mutations and SQL queries). The service currently handles about 600 requests/second. We noticed that the memory usage of the service kept growing indefinitely (well, until it hit the pod memory limit). After much trial and error we narrowed it down to Google BigTable. The main indicator for us was that how fast the memory grows depends on the number of requests handled, so it had to be either the webserver or BigTable...

We use both mutate_row and execute_query, and we ran an experiment in production where we patched the service so that either mutate_row or execute_query requests could be disabled at runtime, without restarting the service (see the sketch below). We observed that when we disabled mutate_row, the memory usage kept growing, while when we disabled execute_query, the curve flattened almost immediately - once we re-enabled execute_query, the memory usage started growing again.
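
A minimal sketch of the kind of runtime kill switch we used (all names here are hypothetical, the real service is more involved):

from aiohttp import web

# One flag per BigTable call path, flipped at runtime through an
# internal endpoint, so either call path can be disabled without
# restarting the service.
flags = {"execute_query": True, "mutate_row": True}


async def toggle(request: web.Request) -> web.Response:
    name = request.match_info["name"]
    flags[name] = not flags.get(name, True)
    return web.json_response({name: flags[name]})


# The real request handlers check flags["execute_query"] and
# flags["mutate_row"] before issuing the corresponding BigTable call.
app = web.Application()
app.add_routes([web.post("/toggle/{name}", toggle)])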

Eventually we arrived at the smallest possible reproducer, which is basically just calling SELECT 1 FROM table WHERE _key = @key in a loop - see the code below and the attached memory usage graph.

We have only one instance of BigtableDataClientAsync and one instance of TableAsync, and we are running the latest version of google-cloud-bigtable.

Environment details

  • OS type and version: Debian 12 docker image in GKE
  • Python version: 3.11.11
  • pip version: 24.3.1
  • google-cloud-bigtable version: 2.32.0

Steps to reproduce

  1. Run the attached example
  2. Observe unbounded memory growth (one way to watch the process RSS locally is sketched below)
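
One way to watch the process RSS without Kubernetes metrics is to add a small logging coroutine to the reproducer - this is just a convenience sketch using the standard library (resource is Unix-only; on Linux, ru_maxrss is reported in kilobytes):

import asyncio
import resource


async def log_rss(interval_s: float = 10.0) -> None:
    # Print the peak RSS periodically; for a monotonic leak the peak
    # tracks the current usage closely.
    while True:
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"peak RSS: {peak_kb / 1024:.1f} MiB", flush=True)
        await asyncio.sleep(interval_s)

Spawning it alongside the query tasks (e.g. asyncio.create_task(log_rss()) in run()) is enough to see the trend in the container logs.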

Code example

import asyncio
import os
import uuid

from google.cloud.bigtable.data import BigtableDataClientAsync

try:
    BIGTABLE_PROJECT = os.environ["BIGTABLE_PROJECT"]
    BIGTABLE_INSTANCE = os.environ["BIGTABLE_INSTANCE"]
    BIGTABLE_TABLE = os.environ["BIGTABLE_TABLE"]
except KeyError as e:
    # os.environ[...] raises KeyError for a missing variable;
    # os.environ.get() would silently return None instead.
    raise RuntimeError(f"Missing environment variable: {e}") from e


async def run_query(client: BigtableDataClientAsync, key: str) -> bool:
    result = await client.execute_query(
        f"SELECT 1 FROM {BIGTABLE_TABLE} WHERE _key = @key",
        instance_id=BIGTABLE_INSTANCE,
        parameters={"key": key.encode()},
        app_profile_id="default",
    )

    row = await anext(result, None)
    if row:
        # Make sure we exhaust the iterator, although there should only ever be a single result
        async for _ in result:
            pass
        return True
    else:
        return False


async def loop(client: BigtableDataClientAsync) -> None:
    # just keep querying the client with random keys
    while True:
        await run_query(client, uuid.uuid4().hex)


async def run() -> None:
    client = BigtableDataClientAsync(project=BIGTABLE_PROJECT)
    # 10 tasks in parallel
    tasks = [asyncio.create_task(loop(client)) for _ in range(10)]
    await asyncio.gather(*tasks)


def main() -> None:
    asyncio.run(run())


if __name__ == "__main__":
    main()

Memory usage

This is a graph from Kubernetes showing the unbounded memory growth of the Python script over the past few days.

[Image: memory usage graph]

We also tried to debug the problem via memray, with grpcio compiled with debug symbols. There is some indication that the memory might be leaking inside grpcio - but we find it odd that it would only leak when calling execute_query and not when calling mutate_row - so we suspect it may be due to some interaction between this library and grpcio, but that is pure speculation on our part.
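
For reference, the attached flamegraph was produced roughly like this, assuming the reproducer is saved as repro.py (the exact invocation may have differed slightly; --native is what makes frames from grpcio's C extension visible):

pip install memray
python -m memray run --native -o repro.bin repro.py
python -m memray flamegraph repro.bin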

memray-flamegraph-netrep-20250825-231600.html.gz

Labels

api: bigtable (Issues related to the googleapis/python-bigtable API.)
