gspracklin/small-RNA-bioinformatics-code

small-RNA-bioinformatics-code

This project provides a collection of Python scripts for analyzing small RNA sequencing data. The scripts generate nucleotide density plots from BAM files, split FASTQ files by barcode sequence, and count and extract reads from SAM files.

Project Structure

  • src/: Contains the source code for the project.

    • nucleotide_density_plot.py: Generate density plots from BAM files
    • barcode_splitter.py: Split FASTQ files by barcode sequences
    • SAM_read_counter.py: Count and extract reads from SAM files
    • utils/: Utility functions package
  • tests/: Unit tests and test data

  • data/: Example datasets and file formats

  • docs/: Documentation and usage examples

  • requirements.txt: Python package dependencies

Installation

# Clone the repository
git clone https://github.com/gspracklin/small-RNA-bioinformatics-code.git
cd small-RNA-bioinformatics-code

# Install dependencies
pip install -r requirements.txt

Usage

Nucleotide Density Plots

python src/nucleotide_density_plot.py --input <input.bam> \
                                     --output <output.pdf> \
                                     --chromosome chr1 \
                                     --start 1000 \
                                     --end 2000 \
                                     --rpm 1.5 \
                                     --norm-factor 0.5

Barcode Splitting

python src/barcode_splitter.py --input <input.fastq> \
                              --output-dir output/ \
                              --barcodes AGCG:PP333 CGTC:PP334

SAM Read Counting

python src/SAM_read_counter.py --input <input.sam> \
                              --output filtered.sam \
                              --chromosome chr1 \
                              --start 1000 \
                              --end 2000 \
                              --max-length 30

Command Line Arguments

Common Arguments

  • --input: Input file path (BAM/SAM/FASTQ)
  • --output: Output file path
  • --output-dir: Output directory for split files

Nucleotide Density Plot

  • --chromosome: Target chromosome name
  • --start: Region start position
  • --end: Region end position
  • --rpm: Reads per million
  • --norm-factor: Normalization factor
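
As a rough illustration of what a per-nucleotide density over a region involves, the sketch below counts how many reads cover each position between --start and --end and multiplies the counts by a scaling factor. It is not the code in nucleotide_density_plot.py: the function name region_density, the (start, length) read representation, the 1-based inclusive coordinates, and the single scale parameter are all assumptions made for this example. How the script itself combines --rpm and --norm-factor is not documented here.

```python
def region_density(reads, start, end, scale=1.0):
    """Return per-position read depth over [start, end], multiplied by scale.

    reads: iterable of (read_start, read_length) tuples, 1-based coordinates.
    Hypothetical helper for illustration only; not part of this repository.
    """
    depth = [0.0] * (end - start + 1)
    for read_start, read_length in reads:
        read_end = read_start + read_length - 1
        # Clip the read to the requested window; reads outside it add nothing.
        lo = max(read_start, start)
        hi = min(read_end, end)
        for pos in range(lo, hi + 1):
            depth[pos - start] += 1
    return [d * scale for d in depth]


# Three made-up reads; the last one runs past the end of the window.
reads = [(1000, 3), (1001, 3), (1999, 5)]
density = region_density(reads, start=1000, end=2000, scale=0.5)
```

The resulting list has one value per position in the window and could be passed to any plotting library.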

Barcode Splitter

  • --barcodes: Space-separated list of "sequence:name" pairs
  • Allows 1 mismatch in barcode matching
  • Removes barcode from sequence after matching
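
The matching behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation in barcode_splitter.py: the function names are hypothetical, the barcode is assumed to sit at the 5' end of the read, and leaving ambiguous reads unassigned is a design choice made for this example only.

```python
def parse_barcodes(pairs):
    """Turn ["AGCG:PP333", ...] into {"AGCG": "PP333", ...}."""
    barcodes = {}
    for pair in pairs:
        sequence, name = pair.split(":", 1)
        barcodes[sequence.upper()] = name
    return barcodes


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(sequence, quality, barcodes, max_mismatches=1):
    """Return (name, trimmed_sequence, trimmed_quality), or None if unassigned.

    Assumes the barcode is the read's 5' prefix (an assumption of this sketch).
    Reads whose best match is shared by two barcodes are left unassigned.
    """
    best = None
    best_dist = max_mismatches + 1
    tie = False
    for barcode, name in barcodes.items():
        prefix = sequence[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read is shorter than the barcode
        dist = hamming(prefix, barcode)
        if dist < best_dist:
            best, best_dist, tie = (barcode, name), dist, False
        elif dist == best_dist and dist <= max_mismatches:
            tie = True
    if best is None or tie:
        return None
    barcode, name = best
    # Remove the barcode from both the sequence and its quality string.
    return name, sequence[len(barcode):], quality[len(barcode):]
```

With the barcodes from the usage example, a read starting with AGCT (one mismatch from AGCG) would be assigned to PP333 and written out without its first four bases.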

SAM Read Counter

  • --chromosome: Target chromosome
  • --start: Region start position
  • --end: Region end position
  • --max-length: Maximum read length (default: 30)
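
To make the filtering criteria concrete, here is a minimal sketch of region and length filtering over SAM text lines. It relies only on the standard SAM column order (RNAME in column 3, POS in column 4, SEQ in column 10). The function names are hypothetical, and testing the leftmost mapping position rather than any overlap with the region is an assumption; SAM_read_counter.py may apply different rules.

```python
def keep_alignment(sam_line, chromosome, start, end, max_length=30):
    """Return True if an alignment starts inside the region and is short enough."""
    fields = sam_line.rstrip("\n").split("\t")
    rname = fields[2]      # reference sequence name
    pos = int(fields[3])   # 1-based leftmost mapping position
    seq = fields[9]        # read sequence
    return (
        rname == chromosome
        and start <= pos <= end
        and len(seq) <= max_length
    )


def filter_sam(lines, chromosome, start, end, max_length=30):
    """Return (count, kept_lines); header lines starting with '@' pass through."""
    kept, count = [], 0
    for line in lines:
        if line.startswith("@"):
            kept.append(line)
        elif keep_alignment(line, chromosome, start, end, max_length):
            kept.append(line)
            count += 1
    return count, kept
```

The count covers alignment lines only, while the kept lines include the header so that the output remains a valid SAM file.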

Citation

If you use this code in your research, please cite:

Spracklin G, Fields B, Wan G, Becker D, Wallig A, Shukla A, Kennedy S. The RNAi Inheritance Machinery of Caenorhabditis elegans. Genetics. 2017 Jul;206(3):1403-1416. doi: 10.1534/genetics.116.198812. PMID: 28533440; PMCID: PMC5500139.

Contributing

Contributions are welcome! Please submit a pull request or open an issue for any suggestions or improvements.
