Myrio

Myrio is a command-line application designed to identify the taxonomy of plants (and potentially other organisms) using amplified sequences (potentially mixed) of barcode genes.

The name Myrio is inspired by the scientific name of Myriophyllum spicatum, commonly known as Eurasian watermilfoil, an aquatic plant found in Lake Geneva (Lac Léman).

Features

- Cross-platform: Windows, macOS, and Linux are all supported.
- Built with Rust: no need to install or run Python code.
- Zero external dependencies: myrio won't crash because another binary or library is missing.
- Optimized codebase, including but not limited to:
  - Custom sparse vector implementation with efficient operations
  - Parallelism via Rayon
  - Specialized database format able to store pre-computed k-mer counts efficiently
- Flexible output: results can be exported as .csv or visualized as a tree in .txt format.
- Computation over heuristics: Myrio relies on raw parallel computation and memory-efficient design rather than heuristics. For example, it uses full k-mer counts (not just k-mer sets, and no minimizers).
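To illustrate what "full k-mer counts" means, here is a minimal sketch using only the standard library. It is not Myrio's implementation (which uses a custom sparse vector and Rayon); the function name `kmer_counts` is hypothetical and chosen for this example only.

```rust
use std::collections::HashMap;

/// Count every overlapping k-mer in `seq`, keeping multiplicities
/// (a full count, not a presence/absence set and not a minimizer subset).
/// Hypothetical helper for illustration; not part of Myrio's API.
/// Returns an empty map when `k` is 0 or longer than the sequence.
fn kmer_counts(seq: &[u8], k: usize) -> HashMap<&[u8], u32> {
    let mut counts = HashMap::new();
    if k == 0 || k > seq.len() {
        return counts;
    }
    // `windows(k)` yields every contiguous slice of length k.
    for window in seq.windows(k) {
        *counts.entry(window).or_insert(0) += 1;
    }
    counts
}
```

For the sequence `ATATAT` with `k = 2`, this counts `AT` three times and `TA` twice, whereas a set-based approach would only record that both k-mers are present.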

Installation

To build and install myrio on your system, you'll need the Rust toolchain (including Cargo) installed. Then run:

git clone https://github.com/anesthetice/Myrio
cd Myrio
RUSTFLAGS="-C target-cpu=native" cargo install --path myrio-cli

Usage

Creating a database (a "tree")

Each reference database corresponds to a single barcode gene (e.g. one database for matK, another for rbcL). Each database is generated from a single .fasta file.

FASTA entries must contain a tax={...} annotation. For example:

>BOLD_PROCESS_ID=ZPLPP049-13|tax={p:Tracheophyta, c:Magnoliopsida, o:Rosales, f:Rosaceae, g:Prunus, s:Prunus persica}
ATACCCTACCCCATTCATCTGGAAATCTTGGTTCAAACCCTTCGCTATTGGGTGAAAGACGCCTCTTCTTTGCATTTATTACGACTCTTTCTTCACGAGTATTATAATTGGAATAG...

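The parser is flexible: it accepts full rank names as well as single-letter abbreviations, and `;` as well as `,` separators, so the following would also pass: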

>BOLD_PROCESS_ID=ZPLPP049-13|tax={domain: Eukarya, kingdom: Plantae, phylum: Tracheophyta, class: Magnoliopsida, order: Rosales, family: Rosaceae, genus: Prunus, species: Prunus persica}
ATACCCTACCCCATTCATCTGGAAATCTTGGTTCAAACCCTTCGCTATTGGGTGAAAGACGCCTCTTCTTTGCATTTATTACGACTCTTTCTTCACGAGTATTATAATTGGAATAG...

>BOLD_PROCESS_ID=ZPLPP049-13|tax={g:prunus; species: Prunus persica;}
ATACCCTACCCCATTCATCTGGAAATCTTGGTTCAAACCCTTCGCTATTGGGTGAAAGACGCCTCTTCTTTGCATTTATTACGACTCTTTCTTCACGAGTATTATAATTGGAATAG...
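As a rough illustration of how such headers can be read leniently, the sketch below extracts `(rank, name)` pairs from the `tax={...}` annotation. It is an assumption-laden example, not Myrio's parser: the function name `parse_tax` is hypothetical, and the rules it applies (split on `,` or `;`, ignore empty entries, lowercase the rank label) are inferred only from the example headers above.

```rust
/// Extract `(rank, name)` pairs from the `tax={...}` part of a FASTA header.
/// Hypothetical helper for illustration; Myrio's real parser may differ.
/// Assumption: entries are separated by ',' or ';' and look like `rank:name`.
/// Returns `None` if the annotation is missing or an entry has no ':'.
fn parse_tax(header: &str) -> Option<Vec<(String, String)>> {
    let start = header.find("tax={")? + "tax={".len();
    let end = start + header[start..].find('}')?;
    let mut pairs = Vec::new();
    for entry in header[start..end].split(|c: char| c == ',' || c == ';') {
        let entry = entry.trim();
        if entry.is_empty() {
            continue; // tolerate a trailing separator such as "...persica;}"
        }
        let (rank, name) = entry.split_once(':')?;
        // Rank labels are lowercased; names keep their original case.
        pairs.push((rank.trim().to_lowercase(), name.trim().to_string()));
    }
    Some(pairs)
}
```

Applied to the last header above, this returns the pairs `("g", "prunus")` and `("species", "Prunus persica")`. Mapping abbreviations such as `g` to full rank names is left out of the sketch.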

Important constraints:

- All entries must share the same highest rank. For example, if the highest rank of the first record is family, then every other record must also have family as its highest rank. (The highest rank defined is Domain; the lowest is Species.)
- No rank gaps are allowed. For instance, if you specify family, you cannot skip genus and go directly to species.
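The two constraints can be sketched as simple checks over an ordered rank type. This is only an illustration under stated assumptions: the `Rank` enum and both function names are hypothetical, the eight ranks are those named in this README, and the gap check only verifies that consecutive ranks are adjacent (it does not require a record to end at species).

```rust
/// The eight ranks named in this README, ordered from highest to lowest.
/// Hypothetical type for illustration; not Myrio's internal representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Rank {
    Domain,
    Kingdom,
    Phylum,
    Class,
    Order,
    Family,
    Genus,
    Species,
}

/// True when `ranks` (listed highest first) has no gaps, i.e. each rank
/// is immediately followed by the next lower one.
fn has_no_gaps(ranks: &[Rank]) -> bool {
    ranks.windows(2).all(|w| w[1] as u8 == w[0] as u8 + 1)
}

/// True when every record starts at the same highest rank.
fn share_highest_rank(records: &[Vec<Rank>]) -> bool {
    let mut firsts = records.iter().map(|r| r.first());
    match firsts.next() {
        Some(expected) => firsts.all(|f| f == expected),
        None => true, // no records, nothing to compare
    }
}
```

With these definitions, family, genus, species passes the gap check, while family followed directly by species does not.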

Once your FASTA database precursor file is ready, you can create a database with:

myrio tree new --input BOLD_Plantae_20250831_ITS.fasta --gene "ITS"
# To also pre-compute k-mer counts with `k=18` (highly recommended, though it significantly increases database size):
myrio tree new --input BOLD_Plantae_20250831_ITS.fasta --gene "ITS" -k 18

This will create a file called BOLD_Plantae_20250831_ITS.myrtree.
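The example suggests that the database name is the input name with its extension replaced. Assuming that is the rule (Myrio's actual naming logic may differ), it can be expressed with the standard library as follows; `myrtree_path` is a hypothetical helper name.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive the database path by swapping the input's
/// extension for `myrtree`. This mirrors the example above and is an
/// assumption, not a description of Myrio's code.
fn myrtree_path(input: &Path) -> PathBuf {
    input.with_extension("myrtree")
}
```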

If errors are encountered, they will be reported and the problematic entry skipped. For example:

Failed to parse taxonomic identity of the record starting on line 356021, Failed to parse string into a list of clade: cannot have rank gaps, expected 6 elements, got 5; string: '>BOLD_PROCESS_ID=MHPAF950-11|tax={p:Tracheophyta, c:Liliopsida, o:Poales, f:Poaceae, s:Poaceae A.guadamuz275}'

Bio-seq parsing error for the record starting on line 393997, Unrecognised character: 'I' (0x49)

See myrio-py/db_gen.py (a marimo notebook) for an example of how to generate a FASTA precursor file.

Pre-built databases are also available on the releases page.

Running

Example runs:

# Input must be a single `.fastq` file.
# `--trees` can be a directory containing multiple `.myrtree` databases.
myrio run --input Berberis_Julianae_matK_rbcL_psbA-trnH_ITS.fastq --trees myrio-db/

# k-mer counts computed during a run (when not already pre-computed) can be cached directly in their respective database files.
myrio run --input Berberis_Julianae_matK_rbcL_psbA-trnH_ITS.fastq --trees myrio-db/ --k-search 19 --cache-counts
# Note that if `--k-search` is not provided, the value is read from the configuration file (`~/.config/myrio/myrio.conf.toml`).

# If you expect more clusters than gene databases, you can set `--nb-clusters`.
myrio run --input Berberis_Julianae_matK_rbcL_psbA-trnH_ITS.fastq --trees myrio-db/BOLD_Plantae_20250831_ITS.myrtree --nb-clusters 4

Acknowledgments

Special thanks to the Paoli Lab for hosting this project.
Special thanks to GenoRobotics, and especially to our team at the 2025 Lemanic Life Sciences Hackathon, which built the proof-of-concept for this application.
