
discussion: chunked simulation from start to end (improve capability to emit more rays) #406

@ichinii


we are capable of emitting between $10^7$ and $10^8$ rays. although this sounds like a lot, histograms created from the collected data still appear noisy. also, the maximum number of rays depends heavily on the system it runs on (e.g. the amount of RAM).
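
a rough way to see why even $10^8$ rays can look noisy: if the count in each bin is treated as Poisson-distributed, the relative fluctuation of a bin holding $n$ rays is about $1/\sqrt{n}$. the sketch below assumes rays are spread evenly over all bins, and the 1000 x 1000 image size is an illustrative assumption, not a rayx default.

```python
import math

def relative_bin_noise(n_rays: int, n_bins: int) -> float:
    """Approximate relative noise per bin, assuming Poisson counts and
    rays spread evenly over all bins (both are simplifications)."""
    rays_per_bin = n_rays / n_bins
    return 1.0 / math.sqrt(rays_per_bin)

# hypothetical 1000 x 1000 pixel histogram (assumption for illustration)
n_bins = 1000 * 1000
for n_rays in (10**7, 10**8, 10**10):
    noise = relative_bin_noise(n_rays, n_bins)
    print(f"{n_rays:.0e} rays -> ~{noise:.1%} noise per bin")
```

under these assumptions $10^8$ rays leave about 10% noise per bin, and getting down to 1% takes on the order of $10^{10}$ rays, which is more than fits in memory at once.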

current approach: partially chunked

generate and trace are chunked, but all rays are collected before writing to h5

graph LR
A["generate rays"]
A --> B["trace rays"]
B --> A
B --> C["collect rays"]
C --> D["write all rays to h5"]

proposed approach: fully chunked

the whole sequence of steps is executed for each chunk
limits the amount of resources required at any one time
steps could be executed simultaneously (pipelined), potentially improving runtime performance

graph LR
A["generate rays"]
A --> B["trace rays"]
B --> C["collect rays"]
C --> D["write chunk of rays to h5"]
D --> A
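
a minimal sketch of the pipelined variant, using only the python standard library. the producer stands in for "generate rays" + "trace rays" and the consumer for "write chunk of rays to h5"; all names, the chunk size and the queue depth are made up for illustration and are not rayx code.

```python
import queue
import threading

CHUNK_SIZE = 4   # rays per chunk; tiny so the example runs instantly
N_CHUNKS = 5
DONE = object()  # sentinel marking the end of the stream

def generate_and_trace(out_q: queue.Queue) -> None:
    """Producer: stand-in for 'generate rays' + 'trace rays'."""
    for chunk_id in range(N_CHUNKS):
        start = chunk_id * CHUNK_SIZE
        chunk = [float(i) for i in range(start, start + CHUNK_SIZE)]
        out_q.put(chunk)  # blocks while the queue is full
    out_q.put(DONE)

def write_chunks(in_q: queue.Queue, stats: dict) -> None:
    """Consumer: stand-in for 'write chunk of rays to h5'."""
    while True:
        chunk = in_q.get()
        if chunk is DONE:
            break
        # a real writer would append the chunk to the file here;
        # we only count, so nothing accumulates in memory
        stats["chunks"] += 1
        stats["rays"] += len(chunk)

def run_pipeline() -> dict:
    # maxsize bounds how many finished chunks can wait in the queue,
    # which in turn bounds peak memory
    q = queue.Queue(maxsize=2)
    stats = {"chunks": 0, "rays": 0}
    producer = threading.Thread(target=generate_and_trace, args=(q,))
    consumer = threading.Thread(target=write_chunks, args=(q, stats))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return stats

print(run_pipeline())
```

the bounded queue is the design choice that matters here: if writing is slower than tracing, the producer blocks instead of piling up chunks in RAM.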

rayx-core

h5 chunking and compression should be considered to balance compute and I/O

rayx-python

to make this feature available in rayx-python, rayx-python has to be integrated into the pipeline: essentially, it replaces the write-to-h5 step. we probably have to create an API for chunked processing.

graph LR
A["generate rays"]
A --> B["trace rays"]
B --> C["collect rays"]
C --> D["process chunk by python"]
D --> A
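
one possible shape for such an API is a callback that is invoked once per chunk. this is only a sketch to make the discussion concrete: `run_chunked`, `on_chunk` and the `Chunk` type are hypothetical names, and the "tracing" is faked with a counter rather than calling rayx-core.

```python
from typing import Callable, List

Chunk = List[float]  # placeholder for whatever ray record type rayx exposes

def run_chunked(n_rays: int, chunk_size: int,
                on_chunk: Callable[[Chunk], None]) -> int:
    """Hypothetical driver: calls on_chunk once per traced chunk.

    The body fakes 'generate' and 'trace'; in rayx this work would
    happen in rayx-core.
    """
    n_chunks = 0
    for start in range(0, n_rays, chunk_size):
        stop = min(start + chunk_size, n_rays)
        chunk = [float(i) for i in range(start, stop)]  # fake traced rays
        on_chunk(chunk)  # takes the place of the 'write to h5' step
        n_chunks += 1
    return n_chunks

# usage: keep only a running sum instead of the rays themselves
state = {"total": 0.0}

def accumulate(chunk: Chunk) -> None:
    state["total"] += sum(chunk)

n = run_chunked(n_rays=10, chunk_size=4, on_chunk=accumulate)
print(n, state["total"])
```

an iterator that yields chunks would work just as well; the callback form is shown because it maps directly onto the "process chunk by python" box in the graph.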

reading large h5 from python (no rayx-python)

filtering the data before making it accessible to the user reduces the amount of data held in memory at any one time, hopefully enough to read datasets that otherwise wouldn't fit into memory.

import pandas as pd
df = pd.read_hdf("data.h5", key="mydata", where="temperature > 300")  # `where` needs format="table" with temperature as a data column

if the amount of data is still too large, the data could be loaded, filtered and processed slice by slice

finally: processing chunks in python

e.g. creating a histogram

a histogram has a fixed number of bins, each representing the accumulated count of a potentially large number of datapoints. we can feed chunks of datapoints into the histogram one at a time, which limits the resources required at any one time.
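
a minimal sketch of that idea in plain python: the bin edges are fixed up front, so each chunk only adds to the counts and can be discarded afterwards. `StreamingHistogram` is a made-up helper, and the random values stand in for ray data.

```python
import random

class StreamingHistogram:
    """Fixed-bin 1D histogram that accumulates counts chunk by chunk."""

    def __init__(self, lo: float, hi: float, n_bins: int):
        self.lo, self.hi, self.n_bins = lo, hi, n_bins
        self.counts = [0] * n_bins

    def add_chunk(self, values) -> None:
        width = (self.hi - self.lo) / self.n_bins
        for v in values:
            if self.lo <= v < self.hi:  # values outside the range are dropped
                idx = int((v - self.lo) / width)
                # guard against float rounding right below the upper edge
                self.counts[min(idx, self.n_bins - 1)] += 1

rng = random.Random(0)
hist = StreamingHistogram(lo=-1.0, hi=1.0, n_bins=8)
for _ in range(10):  # 10 chunks; each one can be freed after add_chunk
    chunk = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    hist.add_chunk(chunk)
print(hist.counts)
```

memory use is set by the number of bins plus one chunk, independent of the total number of rays. with numpy the same pattern should work by passing explicit bin edges to `numpy.histogram` and summing the per-chunk counts.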

    Labels

    A-core (Area: The core library (rayx-core)), C-enhancement (Category: Adding a new feature)