
selkamand/biotest

Biotest

Small examples of common filetypes for unit-testing bioinformatics frameworks.

All mutation data is either simulated, completely artificial, or subsampled somatic calls from publicly available cell lines.

Most files describe a single sample. Files describing a cohort are prefixed with 'cohort'.

Mutations (Small Variants)

VCFs

tumor_normal.2sample.purple.pave.hg38.vcf

  • Example somatic mutations in a typical tumor-normal VCF, as produced by oncoanalyser (PURPLE-enriched VCF calls). The filter status of some PASS variants was manually changed to 'readStrandBias' or '.'. Variants have been annotated with PAVE.

tumor_normal.2sample.purple.minimal.hg38.vcf

  • Removed INFO and FORMAT fields except for GT using bcftools annotate -x "INFO,FORMAT"

tumor_normal.2sample.purple.minimal.vep.hg38.vcf

  • Annotated with VEP (with the option to identify canonical transcripts turned on).

tumor.1sample.purple.pave.hg38.vcf

  • Single-sample version of tumor_normal.2sample.purple.pave.hg38.vcf. 'Normal' sample dropped using bcftools view -s tumor

tumor.1sample.purple.minimal.hg38.vcf

  • More minimal version of tumor_normal.2sample.purple.hg38.vcf with INFO and FORMAT fields dropped using bcftools annotate -x 'INFO,FORMAT'. Only GT field remains.

tumor.0sample.purple.minimal.hg38.vcf

  • Minimal VCF with no sample information. Mutations describe a single sample whose ID is not described anywhere in the file.
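A sample-free VCF keeps only the eight fixed columns (CHROM through INFO). The README does not say how this file was produced, so the following is only a minimal sketch of one way to strip sample columns. It runs on a hypothetical toy VCF created on the fly, not on a repository file.

```shell
# Hypothetical toy input; not one of the repository files.
tmpdir=$(mktemp -d)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ttumor\n'
  printf 'chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT\t0/1\n'
} > "$tmpdir/one_sample.vcf"

# Pass meta-information lines (##) through unchanged; for the column header
# and every record, keep only the 8 fixed columns (drops FORMAT and samples).
awk 'BEGIN { FS = OFS = "\t" }
     /^##/ { print; next }
     { print $1, $2, $3, $4, $5, $6, $7, $8 }' \
  "$tmpdir/one_sample.vcf" > "$tmpdir/no_sample.vcf"
```

Any FORMAT definitions in the header would remain after this step. The toy input has none, and the sketch does not attempt to remove them.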

tumor.1sample.purple.vep.hg38.vcf

  • Annotated with VEP (CSQ info field). See header for command

tumor.1sample.purple.vep_and_pave.hg38.vcf

  • Annotated with VEP (CSQ INFO field). PAVE annotations remain present. See tumor.singlesample.purple.vep.hg38.vcf for a VEP-only version.
  • Options: GRCh38.p14; GENCODE 48; Cache Version 114_GRCh38

Tabular

annovar.hg38.txt & annovar.hg38.csv

  • Annovar annotation files generated by running tumor.singlesample.purple.hg38.vcf through wAnnovar (TSV and CSV versions)

chromposrefalt.1based.hg38.tsv

  • Minimal Tabular Variant Format (pass only).

bcftools view -f PASS -H tumor.singlesample.purple.minimal.hg38.vcf | cut -f1,2,4,5 | awk 'BEGIN{OFS="\t"; print "Chromosome","Position","Ref","Alt"}{print $0}' | head > chromposrefalt.1based.hg38.tsv
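For readers without bcftools installed, the same idea (keep PASS records, then emit CHROM, POS, REF and ALT under a tab-separated header) can be sketched with awk alone. This is an illustrative equivalent, not the command used to build the file. It assumes an uncompressed VCF, and the toy input and file names are hypothetical.

```shell
# Hypothetical toy input with one non-PASS record; not a repository file.
tmpdir=$(mktemp -d)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tT\t.\tPASS\t.\n'
  printf 'chr1\t200\t.\tG\tC\t.\treadStrandBias\t.\n'
  printf 'chr2\t300\t.\tC\tCA\t.\tPASS\t.\n'
} > "$tmpdir/toy.vcf"

# OFS is set inside BEGIN so that the header line is tab-separated too.
# Header lines (#) are skipped; column 7 is FILTER; VCF POS is 1-based.
awk 'BEGIN { FS = OFS = "\t"; print "Chromosome", "Position", "Ref", "Alt" }
     /^#/ { next }
     $7 == "PASS" { print $1, $2, $4, $5 }' \
  "$tmpdir/toy.vcf" > "$tmpdir/toy.tsv"
```

Setting OFS inside BEGIN matters here: awk processes variable assignments given as trailing operands only after the BEGIN block has run, so a header printed from BEGIN would otherwise be space-separated.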

Mutations (Structural Variants)

VCFs

tumor_normal.2sample.purple.sv.hg38.vcf

  • PURPLE somatic SVs (PASS & INFERRED). Oncoanalyser output.

tumor.1sample.purple.sv.hg38.vcf

  • somatic SVs (PASS & INFERRED) with only 1 sample (tumor sample) described.

tumor.0sample.purple.sv.hg38.vcf

  • somatic SVs (PASS & INFERRED) describing a single sample, with no sample ID in VCF.

Tabular

purple.sv.breakpoints.hg38.bedpe

  • Somatic breakpoints from tumor_normal.2sample.purple.sv.hg38.vcf. Does not include single breakends, where the second breakpoint could not be found. See scripts/sv_vcf_to_tabular.R for code to reproduce.

purple.sv.breakends.hg38.bed

  • Somatic single breakends from tumor_normal.2sample.purple.sv.hg38.vcf. Does not include SVs where both ends of the breakpoint are found. The score of breakends inferred from a copy number change is set to zero. See scripts/sv_vcf_to_tabular.R for code to reproduce.
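The split between the two tabular files mirrors how the VCF specification writes breakends in the ALT column: a paired breakend names its mate position inside square brackets, while a single breakend has a leading or trailing '.'. The sketch below classifies an ALT string on that basis. It is an illustration only: classify_alt is a hypothetical helper, and it does not reproduce the logic in scripts/sv_vcf_to_tabular.R.

```shell
# Hypothetical helper: classify a VCF ALT string by its breakend notation.
classify_alt() {
  case "$1" in
    .)          echo "other" ;;            # a bare "." is a missing ALT
    *\[*|*\]*)  echo "breakpoint" ;;       # mate position given in brackets
    .*|*.)      echo "single_breakend" ;;  # leading or trailing "."
    *)          echo "other" ;;            # e.g. a plain SNV or indel allele
  esac
}

# Example ALT strings (made up for illustration).
classify_alt 'A[chr2:321682['
classify_alt 'G.'
```

Symbolic alleles such as <DEL> fall into "other" here. Handling them would need additional cases.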

Mutations (Copy Number Variants)

purple.cnv.somatic.hg38.tsv

  • Copy number profile of all (contiguous) segments of a tumor sample

cohort.3sample.purple.hg38.tsv

  • Cohort segment file. Contains three samples with identical copy number profiles (purple.cnv.somatic.hg38.tsv triplicated)
