Description
We are capable of emitting rays in chunks between the pipeline steps.
current approach: partially chunked
generate and trace are chunked, but all rays are collected before writing to h5
```mermaid
graph LR
A["generate rays"]
A --> B["trace rays"]
B --> A
B --> C["collect rays"]
C --> D["write all rays to h5"]
```
proposed approach: fully chunked
the whole sequence of steps is executed for each chunk
limits the amount of resources required at any instant.
steps could be executed simultaneously (pipelined), potentially improving runtime performance
```mermaid
graph LR
A["generate rays"]
A --> B["trace rays"]
B --> C["collect rays"]
C --> D["write chunk of rays to h5"]
D --> A
```
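The fully chunked loop can be sketched in Python. Everything here is illustrative: the function names (`generate_rays`, `trace_rays`, `collect_rays`) and the Nx6 ray layout are assumptions standing in for the real rayx-core steps, and a plain list stands in for the h5 file.

```python
import numpy as np

CHUNK_SIZE = 1000
TOTAL_RAYS = 3500

def generate_rays(total, chunk_size):
    """Yield chunks of generated rays (stand-in: Nx6 position+direction)."""
    rng = np.random.default_rng(42)
    produced = 0
    while produced < total:
        n = min(chunk_size, total - produced)
        yield rng.normal(size=(n, 6))
        produced += n

def trace_rays(chunk):
    """Stand-in for tracing: propagate positions along directions."""
    traced = chunk.copy()
    traced[:, :3] += traced[:, 3:]
    return traced

def collect_rays(chunk):
    """Stand-in for collection, e.g. dropping absorbed rays."""
    return chunk

written = []  # stand-in for the h5 file
for chunk in generate_rays(TOTAL_RAYS, CHUNK_SIZE):
    chunk = trace_rays(chunk)
    chunk = collect_rays(chunk)
    written.append(chunk)  # "write chunk of rays to h5"

total_written = sum(len(c) for c in written)
```

At no point does the loop hold more than one chunk of rays; a pipelined version would overlap these steps across chunks instead of running them strictly in sequence.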
rayx-core
h5 chunking & compression should be considered to balance compute and I/O
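As a sketch of what that tuning could look like with h5py: the dataset is made resizable so chunks can be appended as they arrive, and the HDF5 chunk shape and gzip level shown here are placeholders that would need benchmarking, not recommended values.

```python
import os
import tempfile

import h5py
import numpy as np

rays = np.random.default_rng(0).normal(size=(10_000, 6))
path = os.path.join(tempfile.mkdtemp(), "rays.h5")

with h5py.File(path, "w") as f:
    # chunk shape and compression level should be benchmarked:
    # larger chunks favor throughput, smaller chunks favor partial reads
    dset = f.create_dataset(
        "rays",
        shape=(0, 6),
        maxshape=(None, 6),   # resizable along the ray axis
        chunks=(1024, 6),     # HDF5 chunk size
        compression="gzip",
        compression_opts=4,
    )
    # append chunk by chunk instead of writing everything at once
    for start in range(0, len(rays), 1024):
        chunk = rays[start:start + 1024]
        dset.resize(dset.shape[0] + len(chunk), axis=0)
        dset[-len(chunk):] = chunk

with h5py.File(path, "r") as f:
    stored = f["rays"][:]
```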
rayx-python
In order to make this feature available, rayx-python has to integrate into the pipeline; essentially, it replaces the write-to-h5
step. We probably have to create an API for the chunked processing.
```mermaid
graph LR
A["generate rays"]
A --> B["trace rays"]
B --> C["collect rays"]
C --> D["process chunk by python"]
D --> A
```
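One possible shape for such an API, entirely hypothetical: `trace_chunked` and its callback signature are assumptions for discussion, not existing rayx-python functions, and the body below merely fakes the generate/trace/collect steps.

```python
from typing import Callable

import numpy as np

def trace_chunked(n_rays: int, chunk_size: int,
                  process: Callable[[np.ndarray], None]) -> None:
    """Hypothetical API sketch: run generate/trace/collect per chunk and
    hand each collected chunk to a user-supplied callback, replacing the
    write-to-h5 step."""
    rng = np.random.default_rng(1)
    for start in range(0, n_rays, chunk_size):
        n = min(chunk_size, n_rays - start)
        chunk = rng.normal(size=(n, 6))  # stand-in for generate+trace+collect
        process(chunk)                   # "process chunk by python"

# usage: accumulate a statistic without holding all rays in memory
seen = {"count": 0}

def process(chunk: np.ndarray) -> None:
    seen["count"] += len(chunk)

trace_chunked(2500, 1000, process)
```

An iterator-based variant that yields chunks (`for chunk in trace_chunked(...)`) would be equally viable and arguably more Pythonic than a callback.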
reading large h5 from python (no rayx-python)
Filtering the data before making it accessible to the user reduces the amount of data held in memory at any instant, hopefully enough to read amounts of data that usually wouldn't fit into memory.
```python
import pandas as pd

# where-filtering requires the dataset to be stored in "table" format
df = pd.read_hdf("data.h5", key="mydata", where="temperature > 300")
```
If the amount of data is still too large, the data could be loaded, filtered, and processed slice by slice.
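A minimal sketch of the slice-by-slice pattern, using an in-memory NumPy array as a stand-in for the HDF5 dataset (with pandas/PyTables, `pd.HDFStore.select(..., chunksize=...)` provides the same iteration over a real file):

```python
import numpy as np

data = np.random.default_rng(2).normal(loc=300, scale=50, size=100_000)
SLICE = 10_000

kept = 0
total = 0.0
# load, filter, and process one slice at a time;
# only SLICE values are held at once
for start in range(0, len(data), SLICE):
    sl = data[start:start + SLICE]
    filtered = sl[sl > 300]          # "temperature > 300"
    kept += len(filtered)
    total += filtered.sum()

mean_above_300 = total / kept
```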
finally: processing chunks in python
e.g. creating a histogram
A histogram has a fixed number of bins, each representing accumulated data from a potentially large number of datapoints. We can feed chunks of datapoints into the histogram, limiting the amount of resources required at any instant.
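A sketch with NumPy: fixing the bin edges up front lets per-chunk counts be summed into one histogram, giving the same result as histogramming all the data at once.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=50_000)

bins = np.linspace(-4.0, 4.0, 65)         # fixed bin edges for all chunks
counts = np.zeros(len(bins) - 1, dtype=np.int64)

CHUNK = 5_000
for start in range(0, len(data), CHUNK):
    chunk = data[start:start + CHUNK]
    chunk_counts, _ = np.histogram(chunk, bins=bins)
    counts += chunk_counts                # accumulate into the fixed bins

# reference: histogram over the full dataset in one pass
full_counts, _ = np.histogram(data, bins=bins)
```

Only one chunk plus the (small, fixed-size) count array is in memory at a time, so the dataset size no longer bounds memory use.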