> [!NOTE]
> `fasthep-flow` is still in early development, which means it is incomplete and the API is not yet stable. Please report any issues you find on the GitHub issue tracker.
`fasthep-flow` is a package for describing data analysis workflows in YAML and converting them into serializable compute graphs that can be evaluated by software like Dask, as Interaction Combinators, or step-wise with workflow managers like Snakemake. `fasthep-flow` is designed to be used with the FAST-HEP package ecosystem, but can be used independently.
The primary use case of this package is to define a data processing workflow, e.g. a High-Energy-Physics (HEP) analysis, in a YAML file, and then convert that YAML file into a compute graph that can be split into meaningful node sets and converted to Directed Acyclic Graphs (DAGs) for processing by executors external to fasthep-flow. These executors can then run on a local machine, or on a cluster using CERN's HTCondor (via Dask) or Google Cloud Composer.
`fasthep-flow`'s YAML files draw inspiration from Continuous Integration (CI) pipelines and Ansible Playbooks to define the workflow as independent tasks that can be run in parallel. `fasthep-flow` will check the parameters of each task and then generate the compute graph. The compute graph consists of nodes that describe input/output data and the compute task, and edges for the dependencies between the tasks.
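To make this concrete, the sketch below shows roughly what such a workflow definition could look like. The stage name, operator import path, and keys are illustrative assumptions rather than the definitive schema; see docs/examples/hello_world.yaml in the repository for a real example.

```yaml
# Illustrative sketch of a fasthep-flow workflow definition (assumed schema;
# see docs/examples/hello_world.yaml for the real thing).
stages:
  - name: printEcho                             # independent task; becomes a node in the compute graph
    type: fasthep_flow.operators.BashOperator   # assumed import path of the task implementation
    kwargs:
      bash_command: echo "Hello World!"         # parameters are checked before the graph is generated
```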
This project is in early development. The documentation is available at fasthep-flow.readthedocs.io and contains mostly fictional features. The most useful information can be found in the FAST-HEP documentation. It describes the current status and plans for the FAST-HEP projects, including `fasthep-flow` (see Developer's Corner).
```bash
pip install fasthep-flow[dask,visualisation]
fasthep-flow execute docs/examples/hello_world.yaml

# example with plugins
fasthep-flow execute tests/data/plugins.yaml --dev --save-path=$PWD/output
```
You had a look and are interested in contributing? That's great! There are three main ways to contribute to this project:
- Head to the issues tab and see if there is anything you can help with.
- If you have a new feature in mind, please open an issue first to discuss it. This way we can ensure that your work is not in vain.
- You can also help by improving the documentation or fixing typos.
Once you have something to work on, have a look at the contributing guidelines. They contain recommendations for setting up your development environment, testing, and more (compiled by the Scientific Python Community).
> [!IMPORTANT]
> How you customise your development environment is up to you. You like uv? Be our guest. You prefer nox? That's fine too. You want to use something else? Go ahead. We are happy as long as you are happy.
>
> Ideally you should be able to run `pylint`, `pytest`, and the pre-commit hooks. If you can do that, you are good to go.
> [!TIP]
> If you are looking for example workflows to run, have a look at the `tests/data/` directory. Each of the configs can be run via
>
> ```bash
> fasthep-flow execute <config> --dev --save-path=$PWD/output
> ```
>
> The `--dev` flag enables development mode, which allows you to (optionally) overwrite the workflow snapshot. This is useful when changing `fasthep-flow` code without changing the config file.
This project is licensed under the terms of the Apache 2.0 license. See LICENSE for more details.
Special thanks to the gracious help of FAST-HEP contributors:
- Maciek Glowacki
- Luke Kreczko