Description
We are running a fairly simple Python service in Kubernetes: a webserver (via aiohttp) exposing a REST API that interacts with Google Bigtable (both through mutations and SQL queries). The service currently handles about 600 requests/second. We noticed that the memory usage of the service kept growing indefinitely (until it hit the pod memory limit). After much trial and error we narrowed it down to Google Bigtable: the main indicator was that how fast the memory grows depends on the number of requests handled, so it had to be either the webserver or Bigtable.
We use both mutate_row and execute_query, and we ran an experiment in production where we patched the service so that it could stop sending either mutate_row or execute_query requests at runtime, without a restart (a sketch of the toggle follows below). We observed that with mutate_row disabled the memory usage kept growing, while with execute_query disabled the curve flattened almost immediately; once we re-enabled execute_query, the memory usage started growing again.
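For reference, the kill switch was essentially a mutable flag checked before each Bigtable call. The sketch below is illustrative only - the endpoint path, flag names, and wrapper function are made up for this report, not our production code:

    from aiohttp import web

    # Runtime kill switches, togglable without restarting the service.
    FLAGS = {"mutate_row": True, "execute_query": True}

    async def set_flag(request: web.Request) -> web.Response:
        # e.g. POST /_flags/execute_query?enabled=false
        name = request.match_info["name"]
        if name not in FLAGS:
            raise web.HTTPNotFound()
        FLAGS[name] = request.query.get("enabled", "true").lower() == "true"
        return web.json_response(FLAGS)

    async def guarded_execute_query(client, query: str, instance_id: str, **kwargs):
        # Skip the Bigtable call entirely while the switch is off,
        # so we can tell which call type drives the memory growth.
        if not FLAGS["execute_query"]:
            return None
        return await client.execute_query(query, instance_id=instance_id, **kwargs)

    app = web.Application()
    app.router.add_post("/_flags/{name}", set_flag)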
Eventually we arrived at the smallest possible reproducer, which is basically just calling SELECT 1 FROM table WHERE _key = @key in a loop - see the code below and the attached memory usage graph.
We have only one instance of BigtableDataClientAsync and one instance of TableAsync, and we are running the latest version of google-cloud-bigtable (2.32.0).
Environment details
- OS type and version: Debian 12 docker image in GKE
- Python version: 3.11.11
- pip version: 24.3.1
- google-cloud-bigtable version: 2.32.0
Steps to reproduce
- Run the attached example
- Observe unbounded memory growth
Code example
import asyncio
import os
import uuid

from google.cloud.bigtable.data import BigtableDataClientAsync

try:
    # os.environ[...] (not .get()) so a missing variable actually raises KeyError
    BIGTABLE_PROJECT = os.environ["BIGTABLE_PROJECT"]
    BIGTABLE_INSTANCE = os.environ["BIGTABLE_INSTANCE"]
    BIGTABLE_TABLE = os.environ["BIGTABLE_TABLE"]
except KeyError as e:
    raise RuntimeError(f"Missing environment variable: {e}") from e

async def run_query(client: BigtableDataClientAsync, key: str) -> bool:
    result = await client.execute_query(
        f"SELECT 1 FROM {BIGTABLE_TABLE} WHERE _key = @key",
        instance_id=BIGTABLE_INSTANCE,
        parameters={"key": key.encode()},
        app_profile_id="default",
    )
    row = await anext(result, None)
    if row:
        # Make sure we exhaust the iterator, although there should only ever be a single result
        async for _ in result:
            pass
        return True
    else:
        return False

async def loop(client: BigtableDataClientAsync) -> None:
    # just keep querying the client with random keys
    while True:
        await run_query(client, uuid.uuid4().hex)

async def run() -> None:
    client = BigtableDataClientAsync(project=BIGTABLE_PROJECT)
    # 10 tasks in parallel
    tasks = [asyncio.create_task(loop(client)) for _ in range(10)]
    await asyncio.gather(*tasks)

def main() -> None:
    asyncio.run(run())

if __name__ == "__main__":
    main()

Memory Usage
This is a graph from Kubernetes showing the unbounded memory growth of the Python script over the past few days.
We also tried to debug the problem via memray, with grpcio compiled with debug symbols. There is some indication that the memory might be leaking inside grpcio - but we find it odd that it would only leak when calling execute_query and not when calling mutate_row - so we suspect it may be due to some interaction between this library and grpcio. That is pure speculation on our side, though.
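For completeness, this is roughly how we captured the profile - a minimal sketch assuming memray's Python API (memray.Tracker with native_traces=True); the output file name and the repro module import are arbitrary:

    import asyncio

    import memray

    from repro import run  # the run() coroutine from the reproducer above

    def main() -> None:
        # native_traces=True captures C/C++ frames, which is what surfaces
        # the grpcio allocations (given the debug-symbol build mentioned above).
        # Analyze the capture afterwards with e.g.:
        #   python -m memray flamegraph bigtable_repro.bin
        with memray.Tracker("bigtable_repro.bin", native_traces=True):
            asyncio.run(run())

    if __name__ == "__main__":
        main()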