Description of the bug
Background
This shows the same error behaviour and outcome as a related issue: #440 (comment).
We have metadata from a large-scale plant ChIP-seq study called ChipHub. In some cases they run a ChIP-seq pipeline for samples with a one-to-many relationship between a treatment biological replicate and its control biological replicates:
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2
WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
In this minimal example, we accommodate the case where the sample and replicate values must be flattened (repeated) against each different control replicate specified by ChipHub for peak calling, i.e. we want to compare each treatment replicate against each different control replicate.
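The flattening described above can be sketched in Python. This is only an illustration under stated assumptions, not part of the pipeline: the helper name `flatten_treatment_rows` and the `FIELDS` list are hypothetical, and the values are copied from the minimal example in this report.

```python
import csv
import io

# Column order taken from the samplesheet header in this report.
FIELDS = ["sample", "fastq_1", "fastq_2", "replicate",
          "antibody", "control", "control_replicate"]


def flatten_treatment_rows(treatment, control_replicates):
    """Repeat one treatment row once per control replicate (hypothetical helper)."""
    rows = []
    for control_rep in control_replicates:
        row = dict(treatment)                 # copy so every repeat is independent
        row["control_replicate"] = control_rep
        rows.append(row)
    return rows


# One treatment biological replicate from the minimal example.
treatment = {
    "sample": "WT_BCATENIN_IP",
    "fastq_1": "BLA203A1_S27_L006_R1_001.fastq.gz",
    "fastq_2": "",
    "replicate": "1",
    "antibody": "BCATENIN",
    "control": "WT_INPUT",
}

# Two control biological replicates give two flattened treatment rows.
flattened = flatten_treatment_rows(treatment, ["1", "2"])

# Serialise the flattened rows back to samplesheet-style CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(flattened)
samplesheet_text = buffer.getvalue()
print(samplesheet_text, end="")
```

Each flattened row keeps the same sample, replicate, antibody and control, and only control_replicate changes between repeats.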
For clarity, we focus just on the treatment rows, annotated in comments as repeat_i=0 and repeat_i=1 respectively:
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1 // <-- repeat_i=0,
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2 // <-- repeat_i=1, but diff control rep
The second row is a repeat of the first, in the sense that we want to keep everything equal (sample, replicate, control, antibody) and differ only in which control biological replicate we compare against (e.g. for peak calling):
Matching columns:
sample,fastq_1,fastq_2,replicate,antibody,control
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT
Differing columns:
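The split between matching and differing columns can also be checked programmatically. The following is a minimal sketch, assuming the two treatment rows from this report are available as CSV text; the helper name `split_columns` is hypothetical and not part of any pipeline.

```python
import csv
import io

# The two repeated treatment rows from this report, including the header.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2
"""


def split_columns(row_a, row_b):
    """Return (matching, differing) column names for two rows (hypothetical helper)."""
    matching = [key for key in row_a if row_a[key] == row_b[key]]
    differing = [key for key in row_a if row_a[key] != row_b[key]]
    return matching, differing


rows = list(csv.DictReader(io.StringIO(SAMPLESHEET)))
matching, differing = split_columns(rows[0], rows[1])
print("matching:", matching)
print("differing:", differing)
```

For the two rows above, the only column reported as differing is control_replicate.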
Command used and terminal output
Relevant files
No response
System information
No response