fasthep-flow

Introduction

Note

fasthep-flow is still in early development, which means it is incomplete and the API is not yet stable. Please report any issues you find on the GitHub issue tracker.

fasthep-flow is a package for describing data analysis workflows in YAML and converting them into serializable compute graphs. These graphs can be evaluated by software like Dask, as Interaction Combinators, or step-wise with workflow managers like Snakemake. fasthep-flow is designed to be used with the FAST-HEP package ecosystem, but can be used independently.
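To make the idea of a "serializable compute graph" concrete, here is a minimal sketch using only the Python standard library. It is not fasthep-flow's internal format or API: the `REGISTRY`, the `func`/`args` keys, and the `@name` reference convention are all assumptions made up for illustration. The point is only that a graph stored as plain data can be written out, read back, and then evaluated.

```python
import json
import operator

# Hypothetical registry mapping serializable task names to callables.
# (Illustrative only; not part of fasthep-flow.)
REGISTRY = {"add": operator.add, "mul": operator.mul}

# Each node names a function and its inputs. An input is either a literal
# value or a reference to another node, written as "@<node name>".
graph = {
    "a": {"func": "add", "args": [1, 2]},
    "b": {"func": "mul", "args": ["@a", 10]},
}


def evaluate(graph, key, cache=None):
    """Evaluate node `key`, resolving "@" references recursively."""
    cache = {} if cache is None else cache
    if key not in cache:
        node = graph[key]
        args = []
        for arg in node["args"]:
            if isinstance(arg, str) and arg.startswith("@"):
                # Reference to another node: evaluate it first.
                args.append(evaluate(graph, arg[1:], cache))
            else:
                args.append(arg)
        cache[key] = REGISTRY[node["func"]](*args)
    return cache[key]


# The graph is plain data, so it survives a JSON round trip unchanged.
restored = json.loads(json.dumps(graph))
result = evaluate(restored, "b")
```

Because the graph holds names rather than function objects, the same description could in principle be handed to different evaluators, which is the property the paragraph above relies on.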

The primary use-case of this package is to define a data processing workflow, e.g. a High-Energy-Physics (HEP) analysis, in a YAML file, and then convert that YAML file into a compute graph. The compute graph can be converted, in meaningful node sets, into Directed Acyclic Graphs (DAGs) that can be processed by executors external to fasthep-flow. These executors can then run on a local machine, or on a cluster using CERN's HTCondor (via Dask) or Google Cloud Composer.

fasthep-flow's YAML files draw inspiration from Continuous Integration (CI) pipelines and Ansible Playbooks to define the workflow, where each independent task can be run in parallel. fasthep-flow will check the parameters of each task, and then generate the compute graph. The compute graph consists of nodes, which describe the input/output data and the compute task, and edges, which describe the dependencies between the tasks.
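As a rough illustration of the nodes-and-edges picture, the sketch below turns a task list into a graph and groups tasks that could run in parallel. It uses a plain Python dict in place of a parsed YAML file, and the keys `name`, `needs` and `params`, the task names, and both helper functions are hypothetical; none of this is fasthep-flow's actual schema or API.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in for a parsed YAML workflow. The keys "name",
# "needs" and "params" are illustrative, not fasthep-flow's schema.
workflow = {
    "tasks": [
        {"name": "read_data", "needs": [], "params": {"path": "events.root"}},
        {"name": "select_muons", "needs": ["read_data"], "params": {"min_pt": 20}},
        {"name": "select_jets", "needs": ["read_data"], "params": {"min_pt": 30}},
        {
            "name": "make_histograms",
            "needs": ["select_muons", "select_jets"],
            "params": {},
        },
    ]
}


def build_graph(workflow):
    """Check each task, then return (nodes, edges)."""
    nodes = {}
    for task in workflow["tasks"]:
        if not isinstance(task.get("params"), dict):
            raise ValueError(f"task {task.get('name')!r} needs a 'params' mapping")
        nodes[task["name"]] = task["params"]
    edges = []
    for task in workflow["tasks"]:
        for dep in task["needs"]:
            if dep not in nodes:
                raise ValueError(f"task {task['name']!r} needs unknown task {dep!r}")
            # An edge points from the dependency to the task that needs it.
            edges.append((dep, task["name"]))
    return nodes, edges


def parallel_stages(nodes, edges):
    """Group tasks into stages; tasks within one stage are independent."""
    sorter = TopologicalSorter({name: set() for name in nodes})
    for upstream, downstream in edges:
        sorter.add(downstream, upstream)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


nodes, edges = build_graph(workflow)
stages = parallel_stages(nodes, edges)
```

Here `select_jets` and `select_muons` end up in the same stage because neither depends on the other, which is the sense in which independent tasks can run in parallel.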

Documentation

This project is in early development. The documentation is available at fasthep-flow.readthedocs.io and contains mostly fictional features. The most useful information can be found in the FAST-HEP documentation. It describes the current status and plans for the FAST-HEP projects, including fasthep-flow (see Developer's Corner).

Installation

pip install "fasthep-flow[dask,visualisation]"

Examples

fasthep-flow execute docs/examples/hello_world.yaml
# example with plugins
fasthep-flow execute tests/data/plugins.yaml --dev --save-path=$PWD/output

Contributing

You had a look and are interested in contributing? That's great! There are three main ways to contribute to this project:

  1. Head to the issues tab and see if there is anything you can help with.
  2. If you have a new feature in mind, please open an issue first to discuss it. This way we can ensure that your work is not in vain.
  3. You can also help by improving the documentation or fixing typos.

Once you have something to work on, have a look at the contributing guidelines. They contain recommendations for setting up your development environment, testing, and more (compiled by the Scientific Python Community).

Important

How you customise your development environment is up to you. You like uv? Be our guest. You prefer nox? That's fine too. You want to use ? Go ahead. We are happy as long as you are happy. Ideally you should be able to run pylint, pytest, and the pre-commit hooks. If you can do that, you are good to go.

Tip

If you are looking for example workflows to run, have a look at the tests/data/ directory. Each of the configs can be run via

fasthep-flow execute <config> --dev --save-path=$PWD/output

The --dev flag enables development mode, which allows you to (optionally) overwrite the workflow snapshot. This is useful when the fasthep-flow code changes but the config file does not.

License

This project is licensed under the terms of the Apache 2.0 license. See LICENSE for more details.

Acknowledgements

Special thanks to the FAST-HEP contributors for their gracious help:

m-glowacki (Maciek Glowacki)
seriksen
kreczko (Luke Kreczko)
