-
Notifications
You must be signed in to change notification settings - Fork 83
DSL2 - Genotyping Input Updates #1126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
… name coll. + not correct wet-lab-wise)
… same file, but still)
…ol and consistent saving of files.
Current issue, Read Group assignment is done at least in some cases to be consistent across all libraries of a given sample -- but this breaks mpiluep calling which expects each input file to have a unique Read Group Sample ID. will check if it breaks everything to swap to libID read groups (possibly only in the case we are running genotyping on unmerged libs?) 🗡️ RG assigned:
RG not assigned (?):
|
|
Got a working implementation for redeclaring the Read Group ID's. Needs test with multiple ref genomes/multiple libs from same sample. Then on to ANGSD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Various small-ish changes.
The PR seems contains various changes from the nf-test conversion branch. Merging this in its current state would likely break testing for everything. Please either rebase/revert commits causing merging that history, OR we wait till #1063 is merged.
@TCLamnidis Please take another look -- I have merged in all the changes from dev. There were a few merge conflicts in the modules.config file which were due to naming changes when I generalized the sample merging post mapping to be able to be run separately for damage manipulation. The CI nf-core test failed on MALT due to insufficient memory. Not sure how this has behaved for others. |
Warning Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.2.1. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, please reply to my comment on the issue thread. I had my doubts on this altogether, but still reviewed the code so it's ready for further development once/if I am sold on the idea.
- Default value of genotyping_source should stay
null
, to throw an error unless explicitly set by the user. - Should pileupCaller + Bed + SNP file check be deactivated?
- Multiple TODOs are left in modules.config. Should those be addressed here?
- Harshil alignment in modules.config please!
genotyping_tool = null | ||
genotyping_source = null | ||
skip_bcftools_stats = false | ||
genotyping_source = 'raw' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
genotyping_source = 'raw' | |
genotyping_source = null |
Setting this to null
was done on purpose, because users would often activate bam trimming or rescaling and not realise their genotypes were still from the raw bams. This way, they get an error if they try genotyping without explicitly setting the source.
}, | ||
"genotyping_source": { | ||
"type": "string", | ||
"default": "raw", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"default": "raw", |
See my previous comment
"default": 2, | ||
"description": "Specify the ploidy of the reference organism.", | ||
"help_text": "Specify the desired ploidy value of your reference organism for genotyping with GATK or FreeBayes. E.g. if you want to allow heterozygous calls this value should be >= 2.\n\n> Modifies GATK UnifiedGenotyper parameter: `--sample_ploidy`\n> Modifies GATK HaplotypeCaller parameter: `--sample_ploidy`\n> Modifies FreeBayes parameter: `-p`", | ||
"help_text": "Specify the desired ploidy value of your reference organism for genotyping with GATK or FreeBayes. E.g. if you want to allow heterozygous calls this value should be >= 2.\n\n> Modifies GATK UnifiedGenotyper parameter: `--sample_ploidy`\n> Modifies GATK HaplotypeCaller parameter: `--sample-ploidy`\n> Modifies FreeBayes parameter: `-p`.\n\n Note: Note: ploidy in the input fasta list take priority", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"help_text": "Specify the desired ploidy value of your reference organism for genotyping with GATK or FreeBayes. E.g. if you want to allow heterozygous calls this value should be >= 2.\n\n> Modifies GATK UnifiedGenotyper parameter: `--sample_ploidy`\n> Modifies GATK HaplotypeCaller parameter: `--sample-ploidy`\n> Modifies FreeBayes parameter: `-p`.\n\n Note: Note: ploidy in the input fasta list take priority", | |
"help_text": "Specify the desired ploidy value of your reference organism for genotyping with GATK or FreeBayes. E.g. if you want to allow heterozygous calls this value should be >= 2.\n\n> Modifies GATK UnifiedGenotyper parameter: `--sample_ploidy`\n> Modifies GATK HaplotypeCaller parameter: `--sample-ploidy`\n> Modifies FreeBayes parameter: `-p`.\n\n Note: When a reference sheet is provided, the ploidy specified in the sheet is used instead.", |
if ( params.genotyping_source == 'pmd' && ! params.run_pmd_filtering ) { exit 1, ("[nf-core/eager] ERROR: --genotyping_source cannot be 'pmd' unless PMD-filtering is ran.") } | ||
if ( params.genotyping_source == 'rescaled' && ! params.run_mapdamage_rescaling ) { exit 1, ("[nf-core/eager] ERROR: --genotyping_source cannot be 'rescaled' unless aDNA damage rescaling is ran.") } | ||
if ( ( params.genotyping_source == 'rescaled' || params.genotyping_source == 'pmd_timmed' || params.genotyping_source == 'pmd' ) && ! params.genotyping_use_unmerged_libraries ) { log.warn("[nf-core/eager] WARNING: Combining multiple libraries with rescaled damage for genotyping may be inappropriate!") } | ||
// if ( params.fasta && params.run_genotyping && params.genotyping_tool == 'pileupcaller' && ! (params.genotyping_pileupcaller_bedfile || params.genotyping_pileupcaller_snpfile ) ) { exit 1, ("[nf-core/eager] ERROR: Genotyping with pileupcaller requires both '--genotyping_pileupcaller_bedfile' AND '--genotyping_pileupcaller_snpfile' to be provided.") } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be uncommented? Is there another check for this?
pileupCaller cannot run without both.
|
||
// SUBWORKFLOW: merge libraries for saving (in final bams) and (potentially) genotyping | ||
|
||
ch_for_merging = ch_rescaled_bams.mix(ch_pmd_filtered_bams).mix(ch_trimmed_bams) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will currently publish all final_bams, across all damage manipulations?
Main bulk of this adjustment relates to #1128.
Otherwise, this is a QOL improvement/update for post mapping inputs into genotyping and its interactions with damage manipulation. Ascurrently implemented, this is a bit inflexible and requires the user to have specific parameter combinations for saving damage_manipulation outputs.
goal for improvement:
PR checklist
scrape_software_versions.py
nf-core lint .
).nextflow run . -profile test,docker
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).