Currently, the storage pickle cache is private memory, allocated per-process.
A common architecture for servers (e.g., gunicorn) is to spawn many worker processes on a single machine as a way to utilize multiple cores. Each such worker process gets its own pickle cache (per RelStorage storage, which could be greater than 1 in a multi-db scenario).
As the number of cores and workers goes up, the amount of memory needed to keep a reasonably sized RelStorage cache also goes up. Even if the memory is initially shared due to fork(), because of the nature of the cache, the pages quickly become dirty and have to be copied.
I've been investigating, and I think it should be possible to move the storage caches into shared memory on Unix and Windows. The option that requires the least code changes and keeps most of the caching logic intact uses boost.interprocess (we're already using boost.intrusive in the cache).
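For illustration, here is a minimal sketch (not RelStorage's actual cache code; the segment name, sizes, and key/value types are assumptions) of the basic boost.interprocess pattern this relies on: ordinary container code whose allocations come from a named shared-memory segment instead of the private process heap.

```cpp
// Minimal sketch only: names, sizes, and types are illustrative assumptions,
// not RelStorage's actual cache structures.
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/containers/map.hpp>
#include <boost/interprocess/containers/string.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

namespace bip = boost::interprocess;

using segment_manager = bip::managed_shared_memory::segment_manager;
using shm_char_alloc  = bip::allocator<char, segment_manager>;
using shm_string      = bip::basic_string<char, std::char_traits<char>, shm_char_alloc>;
using map_value       = std::pair<const std::int64_t, shm_string>;
using map_alloc       = bip::allocator<map_value, segment_manager>;
using shm_map         = bip::map<std::int64_t, shm_string,
                                 std::less<std::int64_t>, map_alloc>;

int main() {
    // One process creates (or opens) the named segment; others attach to the
    // same name and see the same memory.
    bip::managed_shared_memory segment(bip::open_or_create,
                                       "relstorage-cache-demo",   // hypothetical name
                                       64 * 1024 * 1024);         // 64 MB, illustrative

    // A map whose nodes are allocated out of the shared segment rather than
    // the private heap of whichever process touches it.
    shm_map *cache = segment.find_or_construct<shm_map>("oid-to-state")(
        std::less<std::int64_t>(), map_alloc(segment.get_segment_manager()));

    // Values stored in the map must also use the segment's allocator.
    shm_string state(shm_char_alloc(segment.get_segment_manager()));
    state.assign("pickled object state");
    cache->insert(map_value(42, state));
    return 0;
}
```

Any process that attaches to the same segment name sees the same map, which is what would let a cache built this way be shared across workers.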
Benefits include:
- A larger cache, shared among workers, could use less memory overall while still being effectively larger. For example, instead of 8 workers each with a 500MB cache (4GB total), you might use a single shared-memory cache of 2GB. Overall memory use goes down, but effective cache size goes up.
- If the workers are performing similar operations (e.g., there's nothing like zc.resumelb in use that tries to direct similar work to the same worker), this should result in overall better hit rates.
- When one worker performs a write, the cached value would be immediately available to other workers on the same machine without needing a database hit. The same goes for a read.
- The ability to drop the GIL. Right now we're relying on the GIL for all cache operations, but that will have to change.
- The possibility of storing the cache as a memory-mapped file, meaning it takes no time to load from or store to the SQLite database (a minimal sketch follows this list).
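The memory-mapped-file variant mentioned above is the same pattern with a different backing object; a minimal sketch, assuming boost.interprocess's managed_mapped_file (the file name and size are placeholders):

```cpp
// Sketch only: the file path and size are illustrative assumptions.
#include <boost/interprocess/managed_mapped_file.hpp>

namespace bip = boost::interprocess;

int main() {
    // The segment lives in an ordinary file; the OS pages it in lazily, so
    // there is no explicit load/save step against the SQLite persistent cache.
    bip::managed_mapped_file cache_file(bip::open_or_create,
                                        "relstorage-cache.bin",    // hypothetical path
                                        256 * 1024 * 1024);        // 256 MB, illustrative

    // Anything constructed here (as in the shared-memory sketch above)
    // survives process restarts because it lives in the backing file.
    long long *hit_count = cache_file.find_or_construct<long long>("hit-count")(0);
    ++*hit_count;
    return 0;
}
```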
Possible drawbacks/open questions include:
- Cross-process synchronization will be required. Benchmarks will be needed to measure the overhead under different workloads. (But this is the part that lets us drop the GIL; a locking sketch follows this list.)
- The memory limitations will be stricter, and depending on the allocation strategy, fragmentation may be an issue. Benchmarks/tests will be needed.
- Currently on CPython, we keep bytes objects in the cache directly, meaning there is no memory copy involved to read from or write to the cache. Shared memory will require at least a copy on write; it may or may not be possible to implement zero-copy reads.
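To make the synchronization point above concrete, here is a minimal sketch, with hypothetical structure names, of the kind of cross-process locking that would stand in for the GIL around cache operations:

```cpp
// Sketch only: the header structure is hypothetical, not RelStorage's design.
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <cstddef>

namespace bip = boost::interprocess;

// A mutex placed inside the shared segment, next to the data it protects.
struct SharedCacheHeader {
    bip::interprocess_mutex lock;
    std::size_t entry_count = 0;
};

void record_entry(bip::managed_shared_memory &segment) {
    SharedCacheHeader *header =
        segment.find_or_construct<SharedCacheHeader>("cache-header")();

    // Every mutating cache operation (and, depending on the data structure,
    // reads as well) would take this lock instead of relying on the GIL.
    bip::scoped_lock<bip::interprocess_mutex> guard(header->lock);
    ++header->entry_count;
}
```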
Initially, for the smallest code changes, shared-memory caches will only work with processes on Unix that are related via fork(): this is because the C++ objects have vtables in them, and those same vtable pointers must be valid in all processes accessing the cache. Only child processes have that guarantee (and only if RelStorage was loaded in the parent process before the fork()). Over time, it should be possible to remove this restriction.
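A minimal sketch of that restriction in practice (names and sizes are illustrative): the parent maps the segment and constructs objects while the relevant code is already loaded, and forked children inherit both the mapping and the loaded libraries at the same addresses.

```cpp
// Sketch only: a parent sets up the shared segment before forking workers.
#include <boost/interprocess/managed_shared_memory.hpp>
#include <sys/wait.h>
#include <unistd.h>

namespace bip = boost::interprocess;

int main() {
    // Parent: map the segment and construct the cache while the library whose
    // vtables the cached objects will point to is already loaded.
    bip::managed_shared_memory segment(bip::open_or_create,
                                       "relstorage-cache-demo", 64 * 1024 * 1024);
    int *generation = segment.find_or_construct<int>("generation")(1);

    for (int i = 0; i < 4; ++i) {          // e.g. four gunicorn-style workers
        if (fork() == 0) {
            // Child: inherits the mapping (and the loaded libraries) at the
            // same addresses, so pointers stored in the segment, including
            // vtable pointers, remain valid without any reattachment.
            int current = *generation;
            (void)current;
            _exit(0);
        }
    }
    while (wait(nullptr) > 0) {}
    return 0;
}
```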