Extended Data Fig. 2: Flowchart of the bioinformatic data processing. | Nature

From: Whole-genome sequencing of patients with rare diseases in a national health system

Flowchart describing the processing of samples and variants. Beginning at the top left, all samples were checked for data quality (Extended Data Fig. 3). Quick kinship and sex checks were performed regularly to ensure consistency with reported sex and family information. Samples that failed quality control, samples with clearly discordant sex data and the sub-optimal replicates of repeated samples were removed before further analysis (pink boxes). Sex chromosome karyotypes, ethnicities and relatedness/family trees were computed on these filtered samples (orange boxes), and variants were re-called for those samples whose X/Y-chromosome ploidies differed from those automatically predicted by the quick checks. After variant normalization, variant calls were loaded into HBase and merged, and summary statistics were calculated, stratified by technical factors (read lengths of 100, 125 and 150 bp) and ancestry (for example, African) (green boxes). Variant-specific minimum overall pass rates were calculated and used to filter out inaccurately genotyped variants (Extended Data Fig. 4). Finally, variants were annotated in HBase with predicted consequence information and information from external databases, including allele frequencies (AF) (for example, gnomAD) (blue box).
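The filtering step on minimum overall pass rates can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions that the caption does not state: the pass rate is taken to be the fraction of genotype calls in a stratum (for example, a read-length batch) that pass quality filters, the minimum is taken across strata, and the threshold of 0.99 is purely hypothetical. The function names, variant identifiers and input layout are likewise invented for illustration.

```python
from collections import defaultdict


def min_overall_pass_rate(calls):
    """Return the lowest per-stratum pass rate for one variant.

    `calls` is a non-empty iterable of (stratum, passed) pairs, where
    `stratum` labels a batch (assumed here: read length) and `passed`
    is True if that genotype call passed quality filters.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for stratum, passed in calls:
        totals[stratum] += 1
        if passed:
            passes[stratum] += 1
    if not totals:
        raise ValueError("no genotype calls supplied for this variant")
    # Pass rate per stratum, then the minimum across strata.
    return min(passes[s] / totals[s] for s in totals)


def filter_variants(variant_calls, threshold=0.99):
    """Keep variants whose minimum pass rate meets a (hypothetical) threshold."""
    return [
        variant
        for variant, calls in variant_calls.items()
        if min_overall_pass_rate(calls) >= threshold
    ]


# Toy data: the first variant is called poorly in the 150 bp batch only.
example = {
    "chr1:100:A:G": [("100bp", True)] * 4 + [("150bp", True)] * 3 + [("150bp", False)],
    "chr2:200:C:T": [("100bp", True)] * 2 + [("125bp", True)] * 2,
}
kept = filter_variants(example, threshold=0.9)
```

Taking the minimum rather than a pooled rate means that a variant genotyped well overall but poorly in a single batch is still removed, which is one plausible reason for stratifying by technical factors before filtering.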
