This project provides a collection of scripts for analyzing small RNA sequencing data. The scripts generate nucleotide density plots, visualize read distributions, and process sequencing data files.
- src/: Contains the source code for the project.
  - nucleotide_density_plot.py: Generate density plots from BAM files
  - barcode_splitter.py: Split FASTQ files by barcode sequences
  - SAM_read_counter.py: Count and extract reads from SAM files
  - utils/: Utility functions package
- tests/: Unit tests and test data
- data/: Example datasets and file formats
- docs/: Documentation and usage examples
- requirements.txt: Python package dependencies

# Clone the repository
git clone https://github.com/username/small-RNA-bioinformatics-code.git
cd small-RNA-bioinformatics-code
# Install dependencies
pip install -r requirements.txt

python src/nucleotide_density_plot.py --input <input.bam> \
    --output <output.pdf> \
    --chromosome chr1 \
    --start 1000 \
    --end 2000 \
    --rpm 1.5 \
    --norm-factor 0.5

python src/barcode_splitter.py --input <input.fastq> \
    --output-dir output/ \
    --barcodes AGCG:PP333 CGTC:PP334

python src/SAM_read_counter.py --input <input.sam> \
    --output filtered.sam \
    --chromosome chr1 \
    --start 1000 \
    --end 2000 \
    --max-length 30

- --input: Input file path (BAM/SAM/FASTQ)
- --output: Output file path
- --output-dir: Output directory for split files

- --chromosome: Target chromosome name
- --start: Region start position
- --end: Region end position
- --rpm: Reads per million
- --norm-factor: Normalization factor
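
How nucleotide_density_plot.py applies --rpm and --norm-factor internally is not described here, so the following is only a sketch of one common approach: count how many reads cover each position in the region, then multiply the counts by a scaling factor. The function names `region_density` and `rpm_scale`, the `(start, length)` read representation, and the 1-based inclusive coordinates are assumptions for illustration, not the script's actual API.

```python
from collections import Counter


def region_density(reads, start, end, scale=1.0):
    """Per-position read coverage over [start, end], multiplied by scale.

    reads: iterable of (read_start, read_length) pairs (assumed 1-based).
    """
    depth = Counter()
    for read_start, read_length in reads:
        read_end = read_start + read_length - 1
        # Clip each read to the requested window before counting.
        for pos in range(max(read_start, start), min(read_end, end) + 1):
            depth[pos] += 1
    return [depth[pos] * scale for pos in range(start, end + 1)]


def rpm_scale(total_mapped_reads):
    """Factor that converts raw counts to reads per million mapped reads."""
    return 1_000_000 / total_mapped_reads
```

Under these assumptions, `region_density(reads, 1000, 2000, scale=rpm_scale(total))` would give a reads-per-million profile for the region used in the example command.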

- --barcodes: Space-separated list of "sequence:name" pairs
- Allows 1 mismatch in barcode matching
- Removes barcode from sequence after matching
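
The barcode behavior listed above (parse "sequence:name" pairs, tolerate one mismatch, strip the barcode) could be sketched as follows. This is not the code in barcode_splitter.py: the helper names are hypothetical, the barcode is assumed to sit at the start of the read, and reads that match zero or several barcodes are assumed to be discarded.

```python
def parse_barcodes(pairs):
    """Turn ["AGCG:PP333", "CGTC:PP334"] into {"AGCG": "PP333", "CGTC": "PP334"}."""
    barcodes = {}
    for pair in pairs:
        sequence, name = pair.split(":", 1)
        barcodes[sequence.upper()] = name
    return barcodes


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(sequence, quality, barcodes, max_mismatches=1):
    """Return (name, trimmed_sequence, trimmed_quality), or None.

    Assumption: the barcode occupies the first bases of the read.
    """
    hits = []
    for barcode, name in barcodes.items():
        prefix = sequence[:len(barcode)]
        if len(prefix) == len(barcode) and hamming(prefix, barcode) <= max_mismatches:
            hits.append((barcode, name))
    if len(hits) != 1:
        return None  # unmatched or ambiguous read
    barcode, name = hits[0]
    # Trim the barcode from both the bases and the quality string.
    return name, sequence[len(barcode):], quality[len(barcode):]
```

Trimming the quality string together with the sequence keeps each FASTQ record internally consistent after the barcode is removed.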

- --chromosome: Target chromosome
- --start: Region start position
- --end: Region end position
- --max-length: Maximum read length (default: 30)

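
As a rough illustration of the filtering that the SAM_read_counter.py options describe, the sketch below keeps alignments on the target chromosome whose leftmost position lies in the region and whose sequence is no longer than the maximum length. The function names are hypothetical, and testing only the leftmost position (SAM field 4) against the region is an assumption; the actual script may handle overlaps differently.

```python
def keep_read(sam_line, chromosome, start, end, max_length=30):
    """Return True if an alignment line passes the region and length filters."""
    fields = sam_line.rstrip("\n").split("\t")
    # SAM columns: 3 = RNAME, 4 = 1-based leftmost POS, 10 = SEQ.
    rname, pos, seq = fields[2], int(fields[3]), fields[9]
    return rname == chromosome and start <= pos <= end and len(seq) <= max_length


def filter_sam(lines, chromosome, start, end, max_length=30):
    """Return (kept_lines, read_count); header lines are passed through."""
    kept, count = [], 0
    for line in lines:
        if line.startswith("@"):
            kept.append(line)  # keep headers so the output stays valid SAM
        elif keep_read(line, chromosome, start, end, max_length):
            kept.append(line)
            count += 1
    return kept, count
```

With an open file handle as `lines`, the returned list could be written to the --output path and `count` reported as the number of matching reads.
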
If you use this code in your research, please cite:
Spracklin G, Fields B, Wan G, Becker D, Wallig A, Shukla A, Kennedy S. The RNAi Inheritance Machinery of Caenorhabditis elegans. Genetics. 2017 Jul;206(3):1403-1416. doi: 10.1534/genetics.116.198812. PMID: 28533440; PMCID: PMC5500139.
Contributions are welcome! Please submit a pull request or open an issue for any suggestions or improvements.