A high-performance Rust implementation of Z-HUNT, the Z-DNA formation prediction program originally written by Ping-jung Chou under the direction of Pui S. Ho.
Z-HUNT-3 is a computational tool that predicts the formation of Z-DNA (left-handed DNA) in naturally occurring sequences using a thermodynamic approach. This implementation is based on the seminal paper "A computer aided thermodynamic approach for predicting the formation of Z-DNA in naturally occurring sequences", published in The EMBO Journal, Vol. 5, No. 10, pp. 2737-2744, 1986.
- High Performance: Optimized Rust implementation with parallel processing capabilities
- Algorithmic Efficiency: Uses dynamic programming to reduce complexity from O(2^n) to O(n)
- Configurable Threading: Support for both parallel and sequential processing
- FASTA Support: Processes DNA sequences in FASTA format
- Statistical Analysis: Provides Z-scores and probability calculations for Z-DNA formation
- Rust 1.70 or later
- Cargo (comes with Rust)
```bash
git clone <repository-url>
cd zhunt
cargo build --release
```
The optimized binary will be available at target/release/zhunt.
```
zhunt <window_size> <min_size> <max_size> <filename> [OPTIONS]
```
Arguments:
- window_size: Window size in dinucleotides
- min_size: Minimum size for analysis
- max_size: Maximum size for analysis
- filename: Input sequence file (FASTA format)
Options:
- -t, --threads <THREADS>: Number of threads to use (0 = auto-detect) [default: 0]
- -s, --sequential: Use sequential processing instead of parallel
- -h, --help: Print help information
- -V, --version: Print version information
```bash
# Basic usage with auto-detected threads
./zhunt 50 6 50 sequence.fasta
# Use specific number of threads
./zhunt 50 6 50 sequence.fasta --threads 8
# Force sequential processing
./zhunt 50 6 50 sequence.fasta --sequential
```
The program generates a .Z-SCORE file containing:
- Position information (start and end positions)
- Sequence length
- Delta linking number
- Slope values
- Probability scores
- DNA sequence segments
- Anti-syn configuration strings
File layout (a header line, followed by one record per line):

```
filename sequence_length min_size max_size
position end_position length delta_linking slope probability sequence antisyn_config
```
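For downstream analysis it can help to load records back into a typed structure. A minimal sketch, assuming the column order shown above, integer positions and lengths, and floating-point scores; ZScoreRecord and parse_record are hypothetical names. It splits on any whitespace, so it accepts tab-separated lines too.

```rust
/// One data row of a .Z-SCORE file (field types are assumptions).
#[derive(Debug, PartialEq)]
struct ZScoreRecord {
    position: usize,
    end_position: usize,
    length: usize,
    delta_linking: f64,
    slope: f64,
    probability: f64,
    sequence: String,
    antisyn_config: String,
}

/// Parse one data line; returns None if a field is missing or malformed.
fn parse_record(line: &str) -> Option<ZScoreRecord> {
    let mut f = line.split_whitespace();
    // Struct fields are evaluated in the order written, matching the columns.
    Some(ZScoreRecord {
        position: f.next()?.parse().ok()?,
        end_position: f.next()?.parse().ok()?,
        length: f.next()?.parse().ok()?,
        delta_linking: f.next()?.parse().ok()?,
        slope: f.next()?.parse().ok()?,
        probability: f.next()?.parse().ok()?,
        sequence: f.next()?.to_string(),
        antisyn_config: f.next()?.to_string(),
    })
}
```

Because the header line starts with a file name rather than a number, parse_record returns None for it, so the header can be skipped simply by filtering out failed parses.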
The algorithm uses thermodynamic calculations to predict Z-DNA formation by:
- Energy Calculation: Computing delta BZ energy for dinucleotide pairs
- Configuration Analysis: Evaluating anti-syn conformations using dynamic programming
- Linking Number: Calculating optimal delta linking numbers
- Statistical Assessment: Providing probability scores based on a normal distribution
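The statistical step is described only as "based on a normal distribution", without the reference mean or standard deviation. As a hedged sketch, a raw score can be standardised and converted to a one-sided tail probability with the standard library alone. Rust's std has no erf, so this uses the Abramowitz and Stegun 7.1.26 approximation. mean and std_dev are hypothetical parameters, and tail_probability is an illustrative name rather than the crate's API.

```rust
/// erf(x) via Abramowitz & Stegun formula 7.1.26 (absolute error < 1.5e-7).
fn erf(x: f64) -> f64 {
    // erf is odd, so compute on |x| and restore the sign at the end.
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.327_591_1 * x);
    let poly = t * (0.254_829_592
        + t * (-0.284_496_736
            + t * (1.421_413_741 + t * (-1.453_152_027 + t * 1.061_405_429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// Upper-tail probability P(Z >= z) for a standard normal variable.
fn upper_tail(z: f64) -> f64 {
    0.5 * (1.0 - erf(z / std::f64::consts::SQRT_2))
}

/// Standardise `score` against an assumed reference distribution
/// (hypothetical `mean` / `std_dev`) and return its upper-tail probability.
fn tail_probability(score: f64, mean: f64, std_dev: f64) -> f64 {
    upper_tail((score - mean) / std_dev)
}
```

Smaller tail probabilities mean the score is less likely under the reference distribution. How the tool turns such a probability into its reported value is not documented here.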
- Parallel Processing: Uses Rayon for multi-threaded computation
- Memory Layout: Optimized data structures for cache efficiency
- Vectorization: Loop unrolling and SIMD-friendly operations
- Dynamic Programming: Reduces exponential complexity to linear
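The dynamic-programming idea can be illustrated with a generic two-state model. Each dinucleotide is assigned either an AS or an SA conformation, each choice has an energy, and a junction penalty is paid whenever neighbouring dinucleotides switch conformation. This is a minimal sketch under those assumptions, not Z-HUNT's actual energy tables or recurrence; Conf, best_conformation and the junction parameter are hypothetical. Each position only needs the best cost of the previous position in each state, so the search is O(n) instead of enumerating all 2^n assignments.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Conf {
    AS,
    SA,
}

/// Minimum-energy AS/SA assignment (Viterbi-style DP over two states).
/// `energy[i][0]` / `energy[i][1]`: hypothetical cost of dinucleotide i in AS / SA.
/// `junction`: hypothetical penalty when adjacent dinucleotides differ.
fn best_conformation(energy: &[[f64; 2]], junction: f64) -> (f64, Vec<Conf>) {
    let n = energy.len();
    if n == 0 {
        return (0.0, Vec::new());
    }
    // best[s] = lowest total energy of a prefix ending in state s.
    let mut best = energy[0];
    // back[i][s] = state of dinucleotide i-1 on the best path into (i, s).
    let mut back = vec![[0usize; 2]; n];
    for i in 1..n {
        let mut next = [0.0; 2];
        for s in 0..2 {
            let stay = best[s];
            let switch = best[1 - s] + junction;
            if stay <= switch {
                next[s] = stay + energy[i][s];
                back[i][s] = s;
            } else {
                next[s] = switch + energy[i][s];
                back[i][s] = 1 - s;
            }
        }
        best = next;
    }
    // Choose the cheaper final state, then follow back-pointers.
    let mut s = if best[0] <= best[1] { 0 } else { 1 };
    let total = best[s];
    let mut path = vec![Conf::AS; n];
    for i in (0..n).rev() {
        path[i] = if s == 0 { Conf::AS } else { Conf::SA };
        s = back[i][s];
    }
    (total, path)
}
```

Memory use is O(n) for the back-pointers. If only the minimum energy were needed, the two running totals alone would suffice.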
Dependencies:
- clap: Command-line argument parsing
- anyhow: Error handling
- memmap2: Memory-mapped file I/O
- rand: Random number generation
- rayon: Data parallelism
Z-DNA is a left-handed double helix form of DNA that can form under certain conditions, particularly in sequences with alternating purine-pyrimidine patterns. The formation of Z-DNA has biological significance in:
- Gene regulation
- Chromatin structure
- DNA-protein interactions
- Genomic instability
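The alternating purine-pyrimidine pattern mentioned above can be screened for directly. This minimal sketch finds the longest strictly alternating purine/pyrimidine stretch. It is a crude heuristic for illustration only and is not how Z-HUNT scores sequences, which uses the thermodynamic model described in the algorithm section. The function names are hypothetical.

```rust
/// Purine (A/G) -> Some(true), pyrimidine (C/T) -> Some(false);
/// N or anything else -> None, which breaks a run.
fn is_purine(b: u8) -> Option<bool> {
    match b.to_ascii_uppercase() {
        b'A' | b'G' => Some(true),
        b'C' | b'T' => Some(false),
        _ => None,
    }
}

/// Longest run of strictly alternating purine/pyrimidine bases,
/// returned as (start index, length).
fn longest_alternating_run(seq: &[u8]) -> (usize, usize) {
    let (mut best_start, mut best_len) = (0, 0);
    let (mut start, mut len) = (0, 0);
    let mut prev: Option<bool> = None;
    for (i, &b) in seq.iter().enumerate() {
        let cur = is_purine(b);
        match (prev, cur) {
            // Class flipped: extend the current run.
            (Some(p), Some(c)) if p != c => len += 1,
            // Same class or previous base was N: start a new run here.
            (_, Some(_)) => {
                start = i;
                len = 1;
            }
            // N (or other) base: no run is active.
            (_, None) => len = 0,
        }
        if len > best_len {
            best_start = start;
            best_len = len;
        }
        prev = cur;
    }
    (best_start, best_len)
}
```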
This tool helps researchers identify potential Z-DNA forming regions in genomic sequences, which can be important for understanding regulatory mechanisms and structural biology.
Compared with the original implementation, the Rust version aims to improve performance through:
- Multi-threading: Scales with available CPU cores
- Memory Efficiency: Reduced memory footprint through optimized data structures
- Cache Optimization: Improved memory access patterns
- Vectorization: Better utilization of modern CPU features
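Multi-threaded scaling comes from scoring independent windows on separate cores. The crate uses Rayon for this. The sketch below swaps in std::thread::scope so it runs without dependencies, splitting window indices into contiguous chunks, one per worker. score_all is a hypothetical name, and the score closure stands in for the per-window Z-DNA calculation.

```rust
use std::thread;

/// Score windows 0..n_windows in parallel; results keep input order.
/// `score` is a placeholder for the real per-window computation.
fn score_all<F>(n_windows: usize, workers: usize, score: F) -> Vec<f64>
where
    F: Fn(usize) -> f64 + Sync,
{
    let workers = workers.max(1);
    let mut out = vec![0.0; n_windows];
    // Ceiling division covers every index; at least 1 because
    // chunks_mut panics on a chunk size of 0.
    let chunk = ((n_windows + workers - 1) / workers).max(1);
    let score = &score;
    thread::scope(|s| {
        for (c, slot) in out.chunks_mut(chunk).enumerate() {
            // Each worker owns a disjoint mutable slice of the output.
            s.spawn(move || {
                let base = c * chunk;
                for (j, v) in slot.iter_mut().enumerate() {
                    *v = score(base + j);
                }
            });
        }
    });
    out
}
```

With Rayon the same idea is a one-liner, (0..n).into_par_iter().map(score).collect::<Vec<_>>(), which also preserves order and balances work through work stealing instead of fixed chunks.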
- FASTA format: Standard nucleotide sequence files
- Supported bases: A, T, G, C, N (case-insensitive)
- Circular sequences: Automatically handles sequence circularity
- Z-SCORE files: Tab-separated values with detailed analysis results
- Progress reporting: Real-time processing status updates
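The input rules above (FASTA, A/T/G/C/N in any case, circular sequences) can be sketched as follows. The code assumes a single-record FASTA file, reads from a string rather than memory-mapping the file, and models circularity as windows that wrap past the end of the sequence. parse_fasta and circular_window are hypothetical names, not the crate's reader.

```rust
/// Concatenate the sequence lines of a single-record FASTA string,
/// skipping '>' headers and blank lines, and upper-casing each base.
/// Returns an error for the first character outside A/T/G/C/N.
fn parse_fasta(text: &str) -> Result<Vec<u8>, String> {
    let mut seq = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('>') {
            continue;
        }
        for b in line.bytes() {
            let up = b.to_ascii_uppercase();
            match up {
                b'A' | b'T' | b'G' | b'C' | b'N' => seq.push(up),
                _ => return Err(format!("unsupported base '{}'", b as char)),
            }
        }
    }
    Ok(seq)
}

/// `len` bases starting at `start`, wrapping around the end of the
/// sequence as if it were circular.
fn circular_window(seq: &[u8], start: usize, len: usize) -> Vec<u8> {
    if seq.is_empty() {
        // Avoid a modulo-by-zero panic on empty input.
        return Vec::new();
    }
    (0..len).map(|k| seq[(start + k) % seq.len()]).collect()
}
```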
Contributions are welcome! Please ensure that:
- Code follows Rust best practices
- Performance optimizations maintain correctness
- Tests are included for new features
- Documentation is updated accordingly
This project maintains the spirit of the original research while providing a modern, high-performance implementation.
If you use Z-HUNT-3 in your research, please cite the original paper:
Ho, P.S., Ellison, M.J., Quigley, G.J. and Rich, A. (1986) A computer aided thermodynamic approach for predicting the formation of Z-DNA in naturally occurring sequences. The EMBO Journal, 5(10), 2737-2744.
- Original Z-HUNT program by Ping-jung Chou and Pui S. Ho
- Rust implementation optimizations and modernization
- Community contributions and feedback