
WO2019173991A1 - Malignant lymphoma marker and application thereof - Google Patents

Info

Publication number
WO2019173991A1
WO2019173991A1 PCT/CN2018/079061 CN2018079061W WO2019173991A1 WO 2019173991 A1 WO2019173991 A1 WO 2019173991A1 CN 2018079061 W CN2018079061 W CN 2018079061W WO 2019173991 A1 WO2019173991 A1 WO 2019173991A1
Authority
WO
WIPO (PCT)
Prior art keywords
mutation
sequencing
probe
candidate
sequence
Prior art date
Application number
PCT/CN2018/079061
Other languages
French (fr)
Chinese (zh)
Inventor
潘嫱
叶晓飞
苏红
刘栋兵
任伟成
吴逵
朱师达
Original Assignee
深圳华大生命科学研究院
潘嫱
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大生命科学研究院, 潘嫱 filed Critical 深圳华大生命科学研究院
Priority to PCT/CN2018/079061 priority Critical patent/WO2019173991A1/en
Priority to CN201880083693.1A priority patent/CN111655868A/en
Publication of WO2019173991A1 publication Critical patent/WO2019173991A1/en

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids

Definitions

  • The invention relates to the field of gene sequencing and medical detection, in particular to a malignant lymphoma marker and its application, and more specifically to a malignant lymphoma marker, a probe and a chip for detecting the marker, a method for constructing a sequencing library for malignant lymphoma detection from a sample to be tested, and a method and system for determining a gene mutation of a malignant lymphoma in a sample to be tested.
  • Malignant lymphoma is a systemic disease closely related to the functional status of the body's immune system. It differs both from other solid malignant tumors and from hematologic tumors. It comprises Hodgkin's lymphoma and the group of diseases known as non-Hodgkin's lymphoma. Its clinical manifestations are complex and vary with pathological type, stage, and site of invasion. Several FDA-approved molecularly targeted drugs are currently available for malignant lymphoma, such as Ibrutinib (targeting BTK) and Idelalisib (targeting PI3K delta), so accurate and timely detection of malignant lymphoma gene mutations is of great significance for clinical diagnosis and treatment.
  • An object of the present invention is to provide a malignant lymphoma marker, a probe and a chip for detecting the marker, a method for constructing a sequencing library of a sample to be tested for malignant lymphoma detection, and a method and system for determining gene mutations of malignant lymphoma in a sample to be tested.
  • the invention provides a malignant lymphoma marker comprising the genes in the following table:
  • The present invention selects 212 genes that are highly associated with malignant lymphoma as malignant lymphoma markers. Compared with techniques that detect all genes associated with multiple cancers at once, the present invention is more targeted, covers a narrower detection range, and has a lower detection cost, so efficiency can be significantly improved.
  • the malignant lymphoma is a diffuse large B-cell lymphoma.
  • the invention provides a probe for a malignant lymphoma.
  • The probe is designed for all exon regions of the markers described in the above table and for the exon-intron junction regions. The probe specifically recognizes at least a portion of the coding region of the above malignant lymphoma markers, and the probe satisfies at least one selected from the group consisting of:
  • the length of the probe is 75-85 bp, preferably 81 bp;
  • the probe specifically recognizes a sequence from 10 bp upstream to 10 bp downstream of the marker coding region described in the above Examples;
  • the melting temperature of the probe and the target sequence is 60-10 degrees Celsius, preferably 80 degrees Celsius;
  • the probe does not comprise a hairpin structure
  • the window sliding size when the probe is selected is 10 bp.
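  • The tiling implied by the criteria above (81 bp probes, a 10 bp sliding window, and 10 bp of flanking sequence on each side of the coding region) can be sketched as follows. This is a minimal illustration, not the inventors' design software: the function name `tile_probes`, the 0-based half-open coordinates, and the extra probe anchored at the region end are assumptions made for the example, and the melting temperature, hairpin, and genome-uniqueness checks are omitted.

```python
def tile_probes(sequence, region_start, region_end,
                probe_len=81, step=10, flank=10):
    """Return (start, probe) pairs tiled across a flanked target region.

    Coordinates are 0-based and half-open (an assumption for this sketch).
    The region is extended by `flank` bp on each side, then covered with
    probes of `probe_len` bp whose starts are `step` bp apart.
    """
    lo = max(0, region_start - flank)
    hi = min(len(sequence), region_end + flank)
    starts = list(range(lo, hi - probe_len + 1, step))
    # Assumption: add one last probe flush with the region end so the
    # tail of the region is not left uncovered by the fixed step.
    if starts and starts[-1] + probe_len < hi:
        starts.append(hi - probe_len)
    return [(s, sequence[s:s + probe_len]) for s in starts]


if __name__ == "__main__":
    toy_sequence = "ACGT" * 75  # 300 bp of made-up sequence
    for start, probe in tile_probes(toy_sequence, 100, 200):
        print(start, len(probe))
```

  • A region shorter than the probe length yields no probes in this sketch; a real design would need a separate rule for such short exons.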
  • the present invention provides a gene chip.
  • the gene chip comprises a probe and a support, the probe being located on the surface of the support, the probe being the probe described in the above embodiments.
  • the gene chip may further add the following technical features:
  • the gene chip is a liquid phase chip and the support is a microsphere containing different fluorescent labels.
  • The present invention provides a method for constructing a sequencing library of a sample to be tested for malignant lymphoma detection, comprising: enriching a target sequence of the sample to be tested, the target sequence being the malignant lymphoma marker described in the above table, wherein the enriched target sequence constitutes the sequencing library for malignant lymphoma detection.
  • the method may further add the following technical features:
  • the target sequence of the sample to be tested is subjected to hybridization capture using the probe described in the above embodiment or the gene chip described in the above embodiment, thereby achieving the enrichment.
  • the method further comprises: sequencing the sequencing library for malignant lymphoma detection to obtain a sequencing sequence.
  • the sequencing library for malignant lymphoma detection is sequenced using a BGISeq-500 sequencing platform.
  • the sequencing sequence has a sequencing depth of 400× or more, and the coverage of the sequencing sequence reaches 99% or more.
  • the raw data amount of the sequencing sequence is above 3Gb.
  • the present invention provides a method of determining a gene mutation of a malignant lymphoma in a sample to be tested. According to an embodiment of the invention, the method comprises:
  • the sequencing sequence is aligned to a reference genome, and mutation detection is performed to obtain candidate mutation data;
  • the potential mutation data is annotated to obtain target mutation data.
  • the above method for determining a gene mutation for a malignant lymphoma may further include the following technical features:
  • the reference genome is the human reference genome hg19.
  • the mutation detection is performed using VarScan software.
  • screening the candidate mutation data comprises: filtering out low-quality candidate mutations, low-coverage candidate mutations, candidate mutations located in repeat regions or at the ends of sequences, and candidate mutations having strand bias, wherein a low-quality candidate mutation refers to a candidate mutation having a base quality value of less than 20 or a mapping quality value of less than 30, and a low-coverage candidate mutation refers to a candidate mutation having a minimum support depth of less than 3.
  • the annotation is performed by using ANNOVAR software;
  • the polymorphic sites are filtered out by using a population mutation database;
  • the benign mutations are filtered out by using the pathogenic mutation database.
  • the population mutation database is selected from at least one of the 1000 Genomes Project database, the ExAC database, and the ESP6500 database.
  • the disease-causing mutation database is ClinVar.
  • the method further comprises:
  • prior to said mutation detection, the sequencing sequence is subjected to quality control to filter out low-quality sequences and adapter-contaminated sequences, and the filtered sequences are then aligned to the reference genome.
  • the present invention provides a system for determining a gene mutation of a malignant lymphoma in a sample to be tested.
  • the system comprises:
  • a target region library construction unit, wherein the target region library construction unit constructs a target region library using the marker described in the above embodiment as the target region;
  • a sequencing unit, wherein the sequencing unit is connected to the target region library construction unit and sequences the target region library to obtain a sequencing sequence;
  • a candidate mutation determining unit, wherein the candidate mutation determining unit is connected to the sequencing unit and is configured to align the sequencing sequence of the target region library to the reference genome and perform mutation detection to obtain candidate mutation data;
  • a potential mutation determining unit, wherein the potential mutation determining unit is connected to the candidate mutation determining unit and is configured to screen the candidate mutation data to obtain potential mutation data;
  • a target mutation determining unit, wherein the target mutation determining unit is connected to the potential mutation determining unit and is configured to annotate the potential mutation data to obtain target mutation data.
  • the system for determining a gene mutation for a malignant lymphoma may further include the following technical features:
  • the reference genome is a human reference genome hg.
  • the mutation detection is performed using VarScan software.
  • screening the candidate mutation data comprises: filtering out low-quality candidate mutations, low-coverage candidate mutations, candidate mutations located in repeat regions or at the ends of sequences, and candidate mutations having strand bias, wherein a low-quality candidate mutation refers to a candidate mutation having a base quality value of less than 20 or a mapping quality value of less than 30, and a low-coverage candidate mutation refers to a candidate mutation having a minimum support depth of less than 3.
  • the annotation is performed by using ANNOVAR software, the polymorphic sites are filtered out by using a population mutation database, and the benign mutations are filtered out by using the pathogenic mutation database.
  • the population mutation database is selected from at least one of the 1000 Genomes Project database, the ExAC database, and the ESP6500 database.
  • the disease-causing mutation database is ClinVar.
  • system further comprises:
  • a quality control unit, the quality control unit being coupled to the sequencing unit and configured to perform quality control on the sequencing sequence prior to the mutation detection, thereby filtering out low-quality sequences and adapter-contaminated sequences, after which the filtered sequences are aligned to the reference genome.
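  • The way the units of the system feed one another (quality control, candidate mutation determination, potential mutation determination, target mutation determination) can be pictured as a simple chain of functions. The sketch below is only a hypothetical illustration of that data flow: `run_pipeline` and its parameter names are invented for the example, and each unit is passed in as a plain callable rather than implemented.

```python
def run_pipeline(reads, quality_control, call_candidates, screen, annotate):
    """Chain the units in the order the text describes.

    Each argument after `reads` is a callable standing in for one unit:
    quality control -> candidate mutations -> potential mutations -> targets.
    """
    clean_reads = quality_control(reads)
    candidates = call_candidates(clean_reads)
    potential = screen(candidates)
    return annotate(potential)


if __name__ == "__main__":
    # Toy stand-ins for the units, operating on made-up integers.
    result = run_pipeline(
        [1, 2, 3, 4],
        quality_control=lambda reads: [r for r in reads if r > 1],
        call_candidates=lambda reads: [r * 10 for r in reads],
        screen=lambda cands: [c for c in cands if c != 30],
        annotate=lambda potential: {"targets": potential},
    )
    print(result)
```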
  • the invention provides the use of a combination of 212 markers in the above table for the preparation of a reagent for the detection and/or determination of a mutation in a malignant lymphoma gene.
  • the invention provides the use of the combination of 212 markers in the above table for the detection and/or determination of genetic mutations in malignant lymphoma.
  • The present invention enriches 212 target genes specific to malignant lymphoma and then uses high-throughput sequencing to detect and determine mutant genes associated with malignant lymphoma. It can quickly and effectively detect single-base substitutions, single-base or multi-base insertions or deletions, and large-fragment deletion/amplification mutations in the target sequences, enabling efficient and comprehensive detection of common malignant lymphoma gene mutations.
  • the method of detection by means of the BGISEQ-500 second-generation sequencing platform has the advantages of wide application range, high efficiency, comprehensiveness and easy operation, and realizes rapid and efficient determination of genes related to malignant lymphoma.
  • FIG. 1 is a schematic diagram of a system for determining genetic mutations in a malignant lymphoma, in accordance with an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a system for determining genetic mutations in a malignant lymphoma, in accordance with an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of obtaining a target mutation by analyzing a sequencing sequence according to an embodiment of the present invention.
  • the method for target sequence capture and high-throughput sequencing of malignant lymphoma genes as described in the present invention is designed based on the needs of the gene mutation detection technology for malignant lymphoma.
  • The present invention takes all exon regions and exon-intron junction regions of common malignant lymphoma genes (the 212 genes shown in Table 1) as target capture regions, designs a probe combination capable of simultaneously capturing all target sequence regions, and then customizes a liquid phase chip (produced by BGI). Combined with BGISEQ second-generation high-throughput sequencing technology and information analysis technology, all captured target sequences are sequenced and information on the different types of mutations is obtained.
  • The invention has the advantages of a wide application range, high efficiency, comprehensiveness, and ease of operation. It detects single-base substitutions, single-base or multi-base insertions or deletions, and large-fragment deletions/amplifications in the target sequence, and enables efficient, comprehensive detection of malignant lymphoma gene mutations.
  • Through research and analysis, the inventors collected and analyzed a large number of genes associated with malignant lymphoma and, according to their correlation and pathogenicity, finally determined a combination of 212 genes related to malignant lymphoma (see Table 1) as markers for identifying malignant lymphoma. These markers can also be enriched as target regions to effectively detect and/or identify gene mutations associated with malignant lymphoma, including but not limited to single-base substitutions, single-base or multi-base insertions or deletions, and large-fragment deletions/amplifications, enabling efficient and comprehensive detection of malignant lymphoma gene mutations. The sensitivity is over 93%.
  • Tables 2 and 3 list the names of the malignant lymphoma cancer genes and the corresponding malignant lymphoma names, respectively. Based on a series of theoretical studies and experimental verification work, the inventors discovered and demonstrated the correlation among the 212 genes in the above table and concluded that effective detection of malignant lymphoma can be achieved using this group of genes. Compared with using a single gene or other combinations of genes as markers, the test results are more accurate, reliable, and reproducible.
  • this group of genes is involved in important pathogenic signaling pathways of lymphoma, such as BCR, chromatin modification, apoptosis and cell cycle regulation, immunosuppression, and Notch.
  • This group of genes has broad and comprehensive advantages in the field of lymphoma cancer gene detection.
  • these genes are also listed separately in the literature with high impact factors, and, to date, no report has been made to use the combination of these 212 genes as a marker for malignant lymphoma.
  • The inventors were the first to discover that the KLF2, ZFP36L1, and TMSB4X genes are mutated at a significantly higher frequency in Asian populations than in Caucasians.
  • genes associated with malignant lymphoma are associated with diffuse large B-cell lymphoma, mantle cell lymphoma, follicular lymphoma, Burkitt's lymphoma, especially with diffuse large B-cell lymphoma.
  • Important in Table 1 refers to an important pathogenic signaling pathway present in lymphoma.
  • the inventors designed a probe and gene chip that can be used for malignant lymphoma.
  • the probe is designed with all exon regions and exon and intron junction regions of the 212 target cancer genes as the total target region, and the probe specifically recognizes the above 212 markers.
  • The probe satisfies at least one selected from the group consisting of: (1) the probe has a length of 75-85 bp, preferably 81 bp; (2) the probe specifically recognizes a sequence from 10 bp upstream to 10 bp downstream of the coding regions of the 212 markers; (3) for regions with GC content higher than 0.6 or lower than 0.3, the probe multiplier is greater than 2; (4) the melting temperature of the probe and the target sequence is 60-10 degrees Celsius, preferably 80 degrees Celsius; (5) the probe does not comprise a hairpin structure; (6) the probe matches at most 2 sites on the reference genome; (7) the window sliding size when the probe is selected is 10 bp.
  • The probe set for malignant lymphoma designed according to the above principles contains a total of 32779 probes, each of which has a length of 81 bp and contains a 16 bp tag sequence and a 15 bp tag sequence. The two tag sequences are GAAGCGAGGATCAACT (SEQ ID NO: 1) and CATTGCGTGAACCGA (SEQ ID NO: 2), which are the restriction site and the transcription site, respectively. Both ends are used to design PCR primers, and the transcription site is also used for transcription so that the product functions as an RNA probe.
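  • The GC-content thresholds mentioned in the probe criteria (0.3 and 0.6) and the stated tag lengths (16 bp and 15 bp) are easy to check with a few lines of code. The sketch below is illustrative only: `gc_fraction` and `is_extreme_gc` are hypothetical helper names, and how a flagged region is then handled is left open because the text does not specify it in detail.

```python
TAG_1 = "GAAGCGAGGATCAACT"  # SEQ ID NO: 1, stated to be 16 bp
TAG_2 = "CATTGCGTGAACCGA"   # SEQ ID NO: 2, stated to be 15 bp


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_extreme_gc(seq, low=0.3, high=0.6):
    """True if GC content is below `low` or above `high`."""
    gc = gc_fraction(seq)
    return gc < low or gc > high


if __name__ == "__main__":
    print(len(TAG_1), len(TAG_2))       # tag lengths
    print(gc_fraction(TAG_1))           # GC fraction of the first tag
    print(is_extreme_gc("ATATATATAT"))  # AT-only toy sequence
```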
  • the inventors have also devised a gene chip comprising a probe and a support, the probe being located on the surface of the support.
  • the gene chip may be designed as a liquid phase chip, and the support is a microsphere containing different fluorescent labels.
  • On this basis, a sequencing library for malignant lymphoma can be constructed and sequenced. Bioinformatics analysis of the library can effectively detect and/or identify gene mutations associated with malignant lymphoma, including but not limited to single-base substitutions, single-base or multi-base insertions or deletions, and large-fragment deletions/amplifications, enabling efficient and comprehensive detection of gene mutations in malignant lymphoma. Experiments have shown that the sensitivity is over 93%.
  • The method for determining a gene mutation of a malignant lymphoma in a sample to be tested comprises: enriching a target sequence of the sample to be tested, wherein the target sequence is the combination of 212 malignant lymphoma markers, and the enriched target sequence constitutes the sequencing library for malignant lymphoma detection; sequencing the sequencing library to obtain a sequencing sequence; aligning the sequencing sequence to a reference genome and performing mutation detection to obtain candidate mutation data; screening the candidate mutation data to obtain potential mutation data; and annotating the potential mutation data to obtain target mutation data.
  • the sample to be tested in the present invention may be derived from a tissue sample.
  • the method for determining a gene mutation of a malignant lymphoma in a sample to be tested can also be expressed as a method for detecting and/or determining a gene mutation for a malignant lymphoma, which is not a method for diagnosing a disease.
  • The mutation results detected by the present invention only indicate that the cancer tissue of the individual carries the corresponding cancer driver gene mutation; in practice, clinical findings must also be considered to confirm the individual's disease.
  • Target-region DNA enrichment methods based on multiplex PCR (such as Thermo Fisher Scientific's AmpliSeq technology) and target-region DNA enrichment methods based on probe hybridization (such as Agilent's SureSelect technology and NimbleGen's SeqCap EZ technology) can be used.
  • Illumina's HiSeq/MiSeq/NextSeq, Thermo Fisher Scientific's Ion Proton/Ion PGM, and BGI's BGISEQ-500 can be used.
  • the sequencing sequence is obtained by sequencing using a BGISeq-500 sequencing platform.
  • High-throughput sequencing using the sequencer independently developed by BGI offers stronger compatibility and better sequencing results.
  • the original data volume of the constructed sequencing library reaches 3 Gb or more
  • the target region has a sequencing depth of 400× or more
  • the target region coverage reaches 99% or more.
  • The sequencing depth refers to the ratio of the total number of bases (bp) obtained by sequencing to the genome size, and reflects the average number of times each base of the sequenced genome is read.
  • Sequencing coverage refers to the proportion of the entire genome that is covered by the sequences obtained by sequencing.
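  • The two definitions above translate directly into arithmetic, and the stated library thresholds (3 Gb of raw data, 400× depth, 99% coverage) can be checked the same way. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the per-base depth list is made-up toy data, and for a capture panel the "genome size" in the depth formula is taken to be the size of the target region.

```python
def mean_depth(total_bases, region_size):
    """Sequencing depth: total sequenced bases divided by region size."""
    return total_bases / region_size


def coverage_fraction(per_base_depth, min_depth=1):
    """Share of positions covered by at least `min_depth` reads."""
    covered = sum(1 for d in per_base_depth if d >= min_depth)
    return covered / len(per_base_depth)


def passes_library_qc(raw_bases, depth, coverage):
    """Check the thresholds quoted in the text: 3 Gb, 400x, 99%."""
    return raw_bases >= 3e9 and depth >= 400 and coverage >= 0.99


if __name__ == "__main__":
    print(mean_depth(4000, 10))             # 4000 bases over 10 positions
    print(coverage_fraction([5, 0, 3, 1]))  # one of four positions uncovered
    print(passes_library_qc(3.2e9, 450, 0.995))
```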
  • Screening the candidate mutation data includes removing low-quality candidate mutations, low-coverage candidate mutations, candidate mutations located in repeat regions or at the ends of sequences, and candidate mutations having strand bias.
  • A low-quality candidate mutation refers to a candidate mutation having a base quality value of less than 20 (base quality < 20) or a mapping quality value of less than 30 (mapping quality < 30), and a low-coverage candidate mutation refers to a candidate mutation having a minimum support depth of less than 3 (minimal support depth < 3).
  • Candidate mutations with strand bias refer to candidate mutations that occur only on one strand.
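  • The screening rules above can be expressed as a single predicate over each candidate mutation. The sketch below assumes a hypothetical `Candidate` record whose field names are invented for the example; it does not reproduce VarScan's output format or options. Strand bias is modeled exactly as the text defines it: support on only one strand.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    base_quality: float
    mapping_quality: float
    support_depth: int      # reads supporting the variant
    forward_support: int    # supporting reads on the forward strand
    reverse_support: int    # supporting reads on the reverse strand
    in_repeat: bool = False
    at_read_end: bool = False


def keep_candidate(c):
    """Return True if the candidate survives all screening rules."""
    if c.base_quality < 20 or c.mapping_quality < 30:
        return False  # low quality
    if c.support_depth < 3:
        return False  # low coverage
    if c.in_repeat or c.at_read_end:
        return False  # repeat region or end of the sequence
    if c.forward_support == 0 or c.reverse_support == 0:
        return False  # strand bias: seen on one strand only
    return True


if __name__ == "__main__":
    good = Candidate(35, 60, 8, 5, 3)
    biased = Candidate(35, 60, 8, 8, 0)
    print(keep_candidate(good), keep_candidate(biased))
```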
  • Based on the combination of 212 lymphoma-associated genes discovered by the inventors, and taking this combination of 212 specific genes as the target genes, the present invention provides a system for determining a gene mutation of a malignant lymphoma in a sample to be tested.
  • the system for determining a gene mutation of a malignant lymphoma in a sample to be tested according to the present invention can also be understood as a system for detecting a gene mutation of a malignant lymphoma in a sample to be tested, and is used for detecting and determining a malignant sample in a sample to be tested.
  • The present invention provides a system for determining a gene mutation of a malignant lymphoma in a sample to be tested. As shown in FIG. 1, the system comprises: a target region library construction unit, which constructs a target region library using the combination of 212 markers of the present invention as the target region; a sequencing unit, which is connected to the target region library construction unit and sequences the target region library to obtain a sequencing sequence; a candidate mutation determining unit, which is connected to the sequencing unit and is configured to align the sequencing sequence of the target region library to the reference genome and perform mutation detection to obtain candidate mutation data; a potential mutation determining unit, which is connected to the candidate mutation determining unit and is configured to screen the candidate mutation data to obtain potential mutation data; and a target mutation determining unit, which is connected to the potential mutation determining unit and is configured to annotate the potential mutation data to obtain target mutation data.
  • The system for determining a gene mutation of a malignant lymphoma in a sample to be tested may also be as shown in FIG. 2, the system comprising: a target region library construction unit, which constructs a target region library using the combination of 212 markers of the present invention as the target region; a sequencing unit, which is connected to the target region library construction unit and sequences the target region library; a quality control unit, which is coupled to the sequencing unit and is configured to perform quality control on the sequencing sequence before the mutation detection, thereby filtering out low-quality sequences and adapter-contaminated sequences, after which the filtered sequences are aligned to the reference genome; a candidate mutation determining unit, which is coupled to the quality control unit and is configured to align the filtered sequences to the reference genome and perform mutation detection to obtain candidate mutation data; a potential mutation determining unit, which is coupled to the candidate mutation determining unit and is configured to screen the candidate mutation data to obtain potential mutation data; and a target mutation determining unit, which is coupled to the potential mutation determining unit and is configured to annotate the potential mutation data to obtain target mutation data.
  • the length of the probe is 81 bp
  • the probe specifically recognizes a sequence from 10 bp upstream to 10 bp downstream of the coding regions of the 212 markers in Table 1;
  • the melting temperature of the probe and the target sequence is 60-10 degrees Celsius, preferably 80 degrees Celsius;
  • the probe does not comprise a hairpin structure
  • the window sliding size when the probe is selected is 10 bp.
  • The finally obtained target region probe set contains 32779 probes, each of which has a length of 81 bp and contains a 16 bp tag sequence and a 15 bp tag sequence; the two tag sequences are GAAGCGAGGATCAACT (SEQ ID NO: 1) and CATTGCGTGAACCGA (SEQ ID NO: 2).
  • the two tag sequences are respectively an enzyme cleavage site and a transcription site, and both ends are used to design PCR primers, and the transcription site is used for transcription and functions as an RNA probe.
  • KLF2 gene probe sequence (SEQ ID NO: 3)
  • KLF2 gene probe sequence (SEQ ID NO: 4)
  • KLF2 gene probe sequence (SEQ ID NO: 5)
  • KLF2 gene probe sequence (SEQ ID NO: 6)
  • KLF2 gene probe sequence (SEQ ID NO: 7)
  • ZFP36L1 gene probe sequence (SEQ ID NO: 8)
  • ZFP36L1 gene probe sequence (SEQ ID NO: 9)
  • ZFP36L1 gene probe sequence (SEQ ID NO: 10)
  • TMSB4X gene probe sequence (SEQ ID NO: 13)
  • TMSB4X gene probe sequence (SEQ ID NO: 14)
  • TMSB4X gene probe sequence (SEQ ID NO: 15)
  • TMSB4X gene probe sequence (SEQ ID NO: 16)
  • TMSB4X gene probe sequence (SEQ ID NO: 17)
  • The liquid phase chip is prepared using polystyrene microspheres with a diameter of about 5.6 μm, carboxyl groups on the surface, and red and orange dyes inside. According to the ratio of the two dyes, the microspheres can be divided into 100 kinds, each with its own number. Because of the difference in internal fluorescence ratio, each kind of microsphere has specific spectral characteristics and can be specifically recognized by a laser. Different probe molecules are coated on differently numbered microspheres to capture the target molecules in the sample; the target molecules are then bound to fluorescent reporter molecules, and the target molecules are detected by fluorescence detection.
  • the experimental samples used were 16 tissue samples clinically diagnosed as diffuse large B-cell lymphoma.
  • the specific experimental methods are as follows:
  • Genomic DNA was extracted from the diffuse large B-cell lymphoma tissue samples using the QIAGEN DNA Tissue and Blood mini kit, as described in the kit's extraction instructions.
  • The DNA concentration was measured with a fluorescence analyzer; the required concentration is greater than 5 ng/μL and the required volume is greater than 30 μL, and in principle the DNA yield of each sample is ≥ 2 μg. DNA integrity and the degree of degradation were then checked by electrophoresis, since severely degraded samples are not suitable for library construction. The electrophoresis conditions were: 1% agarose gel, electrophoresis voltage 4 V/cm, electrophoresis time 45 min. The results of agarose gel electrophoresis showed that the DNA of all samples was intact and substantially free of degradation.
  • 100 ng of genomic DNA was randomly fragmented by enzyme digestion using a DNA interrupter, with end repair and A-tailing performed simultaneously, followed by adapter ligation and purification and PCR amplification to obtain a pre-hybridization library. Fragments were screened using the Agilent 2100 Bioanalyzer to obtain fragments 150-500 bp in length. The PCR product was then subjected to target region hybridization capture using the liquid phase capture chip, and the target DNA was eluted from the probes with an elution reagent to obtain the desired target DNA, after which PCR amplification was performed. The resulting product was circularized to construct the target-region capture library, wherein the yield of the hybrid library obtained was greater than 160 ng.
  • The liquid phase capture chip used was prepared as in Example 1.
  • the library DNA after the quality control was subjected to sequencing on the basis of the operation instructions of BGISeq-500 sequencing.
  • The raw data amount obtained for each sample reached more than 3 Gb, the average sequencing depth of the target region reached 400×, and the target region coverage was over 99%.
  • the quality of the sequencing data of 16 samples is shown in Table 4 below.
  • Quality control is performed on the obtained reads, removing reads whose sequencing quality does not meet the requirements and reads contaminated by adapters, to obtain clean sequences (i.e., the filtered sequences).
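  • A read-level quality control step of this kind can be sketched as below. The text does not state the quality cutoff or the adapter sequence, so both are explicit assumptions here: the mean-quality threshold of 20 and the `adapter` argument are placeholders, the function names are hypothetical, and the quality string is assumed to use the common Phred+33 FASTQ encoding.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def is_clean_read(sequence, quality_string, adapter, min_mean_quality=20):
    """Keep a read only if it is adapter-free and of sufficient quality.

    `adapter` and `min_mean_quality` are placeholders chosen for this
    example; the source does not specify either value.
    """
    if adapter in sequence:
        return False  # adapter contamination
    return mean_phred(quality_string) >= min_mean_quality


if __name__ == "__main__":
    toy_adapter = "TTTTGGGG"  # made-up sequence, not a real adapter
    print(is_clean_read("ACGTACGT", "IIIIIIII", toy_adapter))
    print(is_clean_read("ACTTTTGGGGAC", "IIIIIIIIIIII", toy_adapter))
    print(is_clean_read("ACGTACGT", "########", toy_adapter))
```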
  • The filtered sequences were then aligned to the human reference genome hg19 (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/) using bwa (Burrows-Wheeler Aligner) software to obtain alignment results.
  • VarScan software was used to detect mutations and obtain candidate mutations. The candidate mutations were then initially filtered to remove low-quality sites (base quality < 20 or mapping quality < 30), low-coverage sites (minimal support depth < 3), sites located in repeat regions or at the ends of reads, and sites with strand bias, finally yielding a list of potential mutations.
  • The list of potential mutations is annotated with ANNOVAR software, and synonymous mutations are excluded. Population mutation databases (such as the 1000 Genomes Project database (http://www.1000genomes.org), the ExAC database, and the ESP6500 database) are then used to filter out polymorphic sites that are common in the population. Finally, a pathogenic mutation database (such as ClinVar) is used to filter out benign mutations, giving the final mutation result, that is, the target mutation data.
  • A synonymous mutation is a neutral mutation. Because the genetic code is degenerate, a base substitution in a synonymous mutation produces a new codon that encodes the same amino acid as the old one, so this type of mutation has no effect on pathogenicity.
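  • The annotation-based filtering described above (drop synonymous mutations, then common population polymorphisms, then benign mutations) can be sketched as a predicate over annotated variants. Everything about the record layout is an assumption made for illustration: the dictionary keys, the `"benign"` label, and in particular the 1% population-frequency cutoff, which the text does not state. It does not reproduce ANNOVAR's actual output columns.

```python
def is_target_mutation(variant, max_population_af=0.01):
    """Return True if an annotated variant survives the three filters.

    `variant` is a dict with hypothetical keys:
      effect        - e.g. "synonymous", "missense", "frameshift"
      population_af - allele frequency in a population database
      clinvar       - clinical significance label, or None if absent
    """
    if variant["effect"] == "synonymous":
        return False  # neutral, amino acid unchanged
    if variant.get("population_af", 0.0) > max_population_af:
        return False  # common polymorphism (cutoff is an assumption)
    if variant.get("clinvar") == "benign":
        return False  # reported as benign
    return True


if __name__ == "__main__":
    variants = [
        {"effect": "missense", "population_af": 0.0001, "clinvar": None},
        {"effect": "synonymous", "population_af": 0.0, "clinvar": None},
        {"effect": "missense", "population_af": 0.2, "clinvar": None},
        {"effect": "missense", "population_af": 0.0, "clinvar": "benign"},
    ]
    print([is_target_mutation(v) for v in variants])
```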
  • the experimental results showed that 163 cancer mutation sites were detected by analyzing and filtering the sequencing data of 16 samples.
  • This example uses the same samples as Example 2. Using the HiSeq 2000 sequencing platform and following its operation guide, a sequencing library corresponding to each sample was constructed according to the whole genome sequencing method, and the data were analyzed according to the same method as step 4 in Example 2.
  • Comparing the experimental results of Example 2 and Example 3, a total of 174 cancer mutations were detected in the 16 patients with diffuse large B-cell lymphoma using whole genome sequencing, and 163 of these 174 cancer mutations were detected using the capture method targeting the 212 genes related to malignant lymphoma. Relative to mutation detection by whole genome sequencing, mutation detection by capture of the 212 target genes therefore achieved an overall sensitivity of 93.7%. The detailed detection results for each sample are shown in Table 5 below, including SNP mutation sites and insertion/deletion variants (Indel mutations):
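  • The overall sensitivity quoted above follows directly from the two counts in the text: 163 mutations found by target capture out of the 174 found by whole genome sequencing. A short check, with `sensitivity` as a hypothetical helper name:

```python
def sensitivity(detected, reference_total):
    """Fraction of reference mutations that the test method also detected."""
    if reference_total <= 0:
        raise ValueError("reference_total must be positive")
    return detected / reference_total


if __name__ == "__main__":
    # 163 of the 174 whole-genome-sequencing mutations were recovered.
    print(round(sensitivity(163, 174) * 100, 1))  # prints 93.7
```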
  • the correlation between the minor allele frequency detected by target gene capture and the minor allele frequency detected by whole-genome sequencing was as high as 0.8186 (r², Pearson correlation coefficient), as shown in the figure.
  • the abscissa in Figure 4 represents the minor allele frequency obtained by whole-genome sequencing (MAF in WGS, minor allele frequency in whole-genome sequencing), and the ordinate represents the minor allele frequency obtained by target gene capture sequencing.
  • unless expressly specified and defined otherwise, the terms “mounted”, “coupled”, “connected”, “fixed” and the like are to be understood broadly: a connection may be fixed, detachable, or integral; it may be mechanical or electrical, or the elements may communicate with each other; it may be direct or indirect through an intermediate medium; and it may be an internal connection between two elements or an interaction between two elements. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
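The overall sensitivity quoted for the comparison of Example 2 and Example 3 can be reproduced from the two mutation counts. The sketch below is illustrative only: the function name `sensitivity` is an assumption, and whole-genome sequencing is treated as the reference set, as in the text.

```python
def sensitivity(detected_by_capture: int, detected_by_wgs: int) -> float:
    """Fraction of the reference (WGS) mutations also found by target capture."""
    if detected_by_wgs <= 0:
        raise ValueError("reference count must be positive")
    return detected_by_capture / detected_by_wgs


# Counts reported in the text: 163 of the 174 WGS mutations were recovered.
print(f"{sensitivity(163, 174):.1%}")  # 93.7%
```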

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided is a malignant lymphoma marker. The malignant lymphoma marker comprises 212 genes in total, such as AICDA and AKT1. Further provided is an application of the marker in the fields of gene sequencing and medical detection.

Description

Malignant lymphoma marker and application thereof
Technical field
The invention relates to the fields of gene sequencing and medical detection, specifically to a malignant lymphoma marker and its application, and in particular to a malignant lymphoma marker, a probe and a chip for detecting the marker, a method for constructing a malignant lymphoma detection sequencing library from a sample to be tested, and a method and system for determining gene mutations of malignant lymphoma in a sample to be tested.
Background
Malignant lymphoma is a class of systemic diseases closely related to the functional state of the body's immune system; it differs both from other solid malignant tumors and from hematological tumors. It comprises Hodgkin's lymphoma, a single disease, and non-Hodgkin's lymphoma, a group of diseases, and its clinical manifestations are complex and vary with pathological type, stage, and site of invasion. Several FDA-approved molecularly targeted drugs are currently indicated for malignant lymphoma, for example Ibrutinib (BTK) and Idelalisib (PI3K delta), so accurate and timely detection of gene mutations in malignant lymphoma is of great significance for clinical diagnosis and treatment.
However, the detection and determination of gene mutations associated with malignant lymphoma still need improvement.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art, and to improve the efficiency and sensitivity of detecting gene mutations in malignant lymphoma. To this end, one object of the invention is to provide a malignant lymphoma marker, a probe and a chip for detecting the marker, a method for constructing a malignant lymphoma detection sequencing library from a sample to be tested, and a method and system for determining gene mutations of malignant lymphoma in a sample to be tested.
According to one aspect of the invention, the invention provides a malignant lymphoma marker comprising the genes in the following table:
Figure PCTCN2018079061-appb-000001
Figure PCTCN2018079061-appb-000002
By selecting 212 genes strongly associated with malignant lymphoma as markers, the invention is more targeted than techniques that test all genes associated with multiple cancers at once: the detection range is smaller, the detection cost is lower, and efficiency is significantly improved while sensitivity is maintained.
According to an embodiment of the invention, the malignant lymphoma is diffuse large B-cell lymphoma.
According to another aspect of the invention, the invention provides a probe for malignant lymphoma. According to an embodiment of the invention, the probe is designed against all exon regions and exon-intron junction regions of the markers in the table above, the probe specifically recognizes at least a part of the coding region of the malignant lymphoma markers described above, and the probe satisfies at least one condition selected from the following:
(1) the length of the probe is 75-85 bp, preferably 81 bp;
(2) the probe specifically recognizes the sequence between 10 bp upstream and 10 bp downstream of the coding region of the markers described in the above embodiment;
(3) probes that specifically recognize regions with a GC content higher than 0.6 or lower than 0.3 have a multiplier greater than 2;
(4) the melting temperature of the probe with the target sequence is 60-10 degrees Celsius, preferably 80 degrees Celsius;
(5) the probe does not contain a hairpin structure;
(6) the probe matches at most 2 sites on the reference genome;
(7) the sliding window step used when selecting probes is 10 bp.
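Conditions (1), (3), and (7) can be pictured as a sliding-window tiling of a target region. The following is a minimal sketch under stated assumptions, not the design procedure actually used: the names `gc_content` and `tile_probes` are hypothetical, the multiplier is read as the number of copies of a probe, and its default of 3 is only one example of a value "greater than 2". Melting temperature, hairpin, and genome-uniqueness checks (conditions 4 to 6) are not modeled.

```python
from typing import Iterator, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def tile_probes(region: str, probe_len: int = 81, step: int = 10,
                extreme_gc_multiplier: int = 3) -> Iterator[Tuple[int, str, int]]:
    """Yield (start, probe, copies) for windows that slide `step` bp at a time.

    `copies` is 1 for ordinary windows and `extreme_gc_multiplier` for windows
    whose GC content is above 0.6 or below 0.3 (assumed reading of condition 3).
    """
    for start in range(0, len(region) - probe_len + 1, step):
        probe = region[start:start + probe_len]
        gc = gc_content(probe)
        copies = extreme_gc_multiplier if (gc > 0.6 or gc < 0.3) else 1
        yield start, probe, copies
```

For a 200 bp region this yields windows starting at 0, 10, ..., 110, so neighbouring probes overlap by 71 bp.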
According to yet another aspect of the invention, the invention provides a gene chip. According to an embodiment of the invention, the gene chip comprises probes and a support, the probes are located on the surface of the support, and the probes are the probes described in the above embodiment.
According to embodiments of the invention, the gene chip may further have the following additional technical features:
According to an embodiment of the invention, the gene chip is a liquid-phase chip and the support is microspheres carrying different fluorescent labels.
According to another aspect of the invention, the invention provides a method for constructing a malignant lymphoma detection sequencing library from a sample to be tested, comprising: enriching target sequences of the sample to be tested, the target sequences being the malignant lymphoma markers in the table above, wherein the enriched target sequences constitute the sequencing library for malignant lymphoma detection.
According to embodiments of the invention, the method may further have the following additional technical features:
According to an embodiment of the invention, in the method, the target sequences of the sample to be tested are captured by hybridization using the probe or the gene chip described in the above embodiments, thereby achieving the enrichment.
According to an embodiment of the invention, the method further comprises: sequencing the sequencing library for malignant lymphoma detection to obtain sequencing reads.
According to an embodiment of the invention, in the method, the sequencing library for malignant lymphoma detection is sequenced on the BGISeq-500 sequencing platform.
According to an embodiment of the invention, in the method, the sequencing depth of the sequencing reads is 400× or more and their coverage is 99% or more.
According to an embodiment of the invention, in the method, the raw data volume of the sequencing reads is 3 Gb or more.
According to another aspect of the invention, the invention provides a method for determining gene mutations of malignant lymphoma in a sample to be tested. According to an embodiment of the invention, the method comprises:
constructing a malignant lymphoma detection sequencing library of the sample to be tested according to the construction method described in the above embodiments;
sequencing the malignant lymphoma detection sequencing library to obtain sequencing reads;
aligning the sequencing reads to a reference genome and performing mutation detection to obtain candidate mutation data;
screening the candidate mutation data to obtain potential mutation data;
annotating the potential mutation data to obtain target mutation data.
According to embodiments of the invention, the above method for determining gene mutations of malignant lymphoma may further have the following additional technical features:
According to an embodiment of the invention, in the method, the reference genome is the human reference genome hg19.
According to an embodiment of the invention, in the method, the mutation detection is performed with VarScan software.
According to an embodiment of the invention, in the method, screening the candidate mutation data comprises: filtering out candidate mutations that are of low quality, have low coverage, are located in repeat regions or at the ends of reads, or show strand bias, wherein a low-quality candidate mutation is one with a base quality value below 20 or a mapping quality value below 30, and a low-coverage candidate mutation is one with fewer than 3 supporting reads.
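The screening thresholds above can be expressed as a simple predicate. This is a minimal sketch, not the filter actually implemented: the `Candidate` record and its field names are hypothetical, and the repeat-region, read-end, and strand-bias flags are assumed to have been computed upstream, since the text does not say how they are derived.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    base_quality: float      # quality value of the variant base
    mapping_quality: float   # alignment (mapping) quality value
    supporting_reads: int    # number of reads supporting the variant
    in_repeat_region: bool   # assumed to be precomputed
    at_read_end: bool        # assumed to be precomputed
    strand_biased: bool      # assumed to be precomputed


def passes_screen(c: Candidate) -> bool:
    """Return True if the candidate survives all screening criteria."""
    if c.base_quality < 20 or c.mapping_quality < 30:
        return False  # low quality
    if c.supporting_reads < 3:
        return False  # low coverage
    if c.in_repeat_region or c.at_read_end or c.strand_biased:
        return False
    return True


def screen(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep only candidates that pass; these are the potential mutations."""
    return [c for c in candidates if passes_screen(c)]
```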
According to an embodiment of the invention, in the method, the annotation is performed with ANNOVA software, polymorphic sites are filtered out using a population mutation database, and benign mutations are filtered out using a pathogenic mutation database.
According to an embodiment of the invention, in the method, the population mutation database is at least one selected from the 1000 Genomes database, the ExAC database, and the ESP6500 database.
According to an embodiment of the invention, in the method, the pathogenic mutation database is ClinVar.
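The annotation-stage filtering can be sketched as follows. This is an illustration under explicit assumptions rather than the patent's implementation: the `AnnotatedVariant` record, its field names, and the effect and significance labels are hypothetical, and the 1% population-frequency cutoff is an assumed value, because the text does not state what counts as a common polymorphism. The exclusion of synonymous mutations follows the examples elsewhere in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AnnotatedVariant:
    effect: str  # e.g. "synonymous" or "nonsynonymous" (labels assumed)
    # Allele frequency per population database, e.g. {"1000G": 0.002}
    population_freq: Dict[str, float] = field(default_factory=dict)
    clinical_significance: Optional[str] = None  # pathogenic-database label, if any


def is_target_mutation(v: AnnotatedVariant,
                       max_population_freq: float = 0.01) -> bool:
    """Apply the three annotation filters; the cutoff is an assumed value."""
    if v.effect == "synonymous":
        return False  # neutral change, excluded
    if any(f >= max_population_freq for f in v.population_freq.values()):
        return False  # common polymorphism in at least one database
    if (v.clinical_significance or "").lower() == "benign":
        return False  # reported as benign
    return True
```

A variant absent from every population database has an empty `population_freq` and therefore passes the polymorphism filter.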
According to an embodiment of the invention, the method further comprises:
before the mutation detection, performing quality control on the sequencing reads to filter out low-quality and adapter-contaminated reads, and then aligning the filtered reads to the reference genome.
According to yet another aspect of the invention, the invention provides a system for determining gene mutations of malignant lymphoma in a sample to be tested. According to an embodiment of the invention, the system comprises:
a target region library construction unit, which constructs a target region library using the markers described in the above embodiment as the target regions;
a sequencing unit, connected to the target region library construction unit, which sequences the target region library to obtain sequencing reads;
a candidate mutation determining unit, connected to the sequencing unit, which aligns the sequencing reads of the target region library to a reference genome and performs mutation detection to obtain candidate mutation data;
a potential mutation determining unit, connected to the candidate mutation determining unit, which screens the candidate mutation data to obtain potential mutation data;
a target mutation determining unit, connected to the potential mutation determining unit, which annotates the potential mutation data to obtain target mutation data.
According to embodiments of the invention, the system for determining gene mutations of malignant lymphoma may further have the following additional technical features:
According to an embodiment of the invention, in the system, the reference genome is the human reference genome hg.
According to an embodiment of the invention, in the system, the mutation detection is performed with VarScan software.
According to an embodiment of the invention, in the system, screening the candidate mutation data comprises: filtering out low-quality candidate mutations, low-coverage candidate mutations, candidate mutations located in repeat regions or at the ends of reads, and candidate mutations showing strand bias, wherein a low-quality candidate mutation is one with a base quality value below 20 or a mapping quality value below 30, and a low-coverage candidate mutation is one with fewer than 3 supporting reads.
According to an embodiment of the invention, in the system, the annotation is performed with ANNOVA software, polymorphic sites are filtered out using a population mutation database, and benign mutations are filtered out using a pathogenic mutation database.
According to an embodiment of the invention, in the system, the population mutation database is at least one selected from the 1000 Genomes database, the ExAC database, and the ESP6500 database.
According to an embodiment of the invention, in the system, the pathogenic mutation database is ClinVar.
According to an embodiment of the invention, the system further comprises:
a quality control unit, connected to the sequencing unit, which performs quality control on the sequencing reads before the mutation detection to filter out low-quality and adapter-contaminated reads, after which the filtered reads are aligned to the reference genome.
According to another aspect of the invention, the invention provides use of the combination of the 212 markers in the table above in the preparation of a reagent for detecting and/or determining gene mutations of malignant lymphoma.
According to yet another aspect of the invention, the invention provides use of the combination of the 212 markers in the table above in the field of detecting and/or determining gene mutations of malignant lymphoma.
The beneficial effects of the invention are as follows: DNA of 212 specific malignant lymphoma target genes is enriched and then, by means of high-throughput sequencing, used to detect and determine mutated genes associated with malignant lymphoma. The approach can quickly and effectively detect mutation types in the target sequences such as single-base substitutions, single-base/multi-base insertions or deletions, and large-fragment deletions/amplifications, enabling efficient and comprehensive detection of common malignant lymphoma gene mutations. In particular, detection on the BGISEQ-500 second-generation sequencing platform has the advantages of broad applicability, efficiency, comprehensiveness, and ease of operation, and achieves rapid and efficient determination of genes associated with malignant lymphoma.
Brief description of the drawings
Figure 1 is a schematic diagram of a system for determining gene mutations of malignant lymphoma according to an embodiment of the invention.
Figure 2 is a schematic diagram of a system for determining gene mutations of malignant lymphoma according to an embodiment of the invention.
Figure 3 is a schematic diagram of analyzing sequencing reads to obtain target mutations according to an embodiment of the invention.
Figure 4 is a schematic diagram of the consistency of the mutation frequencies obtained with the two detection methods according to an embodiment of the invention, in which the abscissa represents the minor allele frequency obtained by whole-genome sequencing (MAF in WGS) and the ordinate represents the minor allele frequency obtained by target gene capture sequencing (MAF in LC).
Detailed description
Embodiments of the invention are described in detail below, and examples of the embodiments are illustrated in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting it.
The method described in this invention, which combines target sequence capture of malignant lymphoma genes with high-throughput sequencing, was designed to meet the needs of malignant lymphoma gene mutation detection. The invention takes all exon regions and exon-intron junction regions of common malignant lymphoma genes (the 212 genes shown in Table 1) as the target capture regions, designs a probe set able to capture all target sequence regions simultaneously, and then customizes a liquid-phase chip (produced by BGI). Combined with BGI's BGISEQ second-generation high-throughput sequencing technology and information analysis technology, all captured target sequences are sequenced and analyzed for different types of mutation, in order to determine whether the target sample carries mutations in malignant lymphoma driver genes or targeted-drug genes and to guide the typing and medication of malignant lymphoma according to the nature of the mutations. The approach also allows malignant lymphoma gene mutation data to be accumulated rapidly, providing strong data support for industrialization. The invention has the advantages of broad applicability, efficiency, comprehensiveness, and ease of operation; it simultaneously detects mutation types in the target sequences such as single-base substitutions, single-base/multi-base insertions or deletions, and large-fragment deletions/amplifications, meeting the need for efficient and comprehensive detection of malignant lymphoma gene mutations.
Malignant lymphoma markers
Through investigation and analysis, the inventors collected and analyzed multiple genes associated with malignant lymphoma and, based on their relevance and pathogenicity, finally settled on a combination of 212 genes associated with malignant lymphoma (shown in Table 1) as markers for determining malignant lymphoma. Using these markers as target regions and enriching them makes it possible to effectively detect and/or determine gene mutations associated with malignant lymphoma, including but not limited to mutation types such as single-base substitutions, single-base/multi-base insertions or deletions, and large-fragment deletions/amplifications, so that malignant lymphoma gene mutations can be detected efficiently and comprehensively. Experiments confirmed that detection is fast and that the sensitivity exceeds 93%.
Table 1 Target gene combination
Figure PCTCN2018079061-appb-000003
Figure PCTCN2018079061-appb-000004
Table 2 and Table 3 list, respectively, the names of the malignant lymphoma cancer genes and the names of the corresponding malignant lymphomas. Based on a series of theoretical studies and experimental validation, the inventors found and demonstrated the correlation between the 212 genes in the table above and malignant lymphoma, and concluded that this gene set enables effective detection of malignant lymphoma and that, compared with using any single gene in the set or other gene combinations as markers, the detection results are more accurate, more reliable, and more reproducible.
It should be noted that this gene set covers the important pathogenic signaling pathways of lymphoma, for example BCR, chromatin modification, apoptosis and cell cycle regulation, immunosuppression, and Notch, which gives it broad and comprehensive coverage in the field of lymphoma cancer gene testing. These genes have each been reported separately in high-impact-factor literature, but to date there has been no report of using the combination of these 212 genes as a marker for malignant lymphoma. In addition, the KLF2, ZFP36L1, and TMSB4X genes (KLF2 and ZFP36L1 are important regulators of the NOTCH signaling pathway) are malignant-lymphoma-associated pathogenic genes that the inventors found, for the first time, to have a significantly higher mutation frequency in Asian populations than in Caucasian populations.
These malignant-lymphoma-associated genes are related to diffuse large B-cell lymphoma, mantle cell lymphoma, follicular lymphoma, and Burkitt's lymphoma, and especially to diffuse large B-cell lymphoma.
Table 2 List of genes tested
Figure PCTCN2018079061-appb-000005
Figure PCTCN2018079061-appb-000006
Figure PCTCN2018079061-appb-000007
Figure PCTCN2018079061-appb-000008
Note: “Important” in Table 1 means: present in an important pathogenic signaling pathway of lymphoma.
Table 3 Corresponding malignant lymphomas and list of references
Figure PCTCN2018079061-appb-000009
Figure PCTCN2018079061-appb-000010
Based on the combination of these 212 markers associated with malignant lymphoma, the inventors designed a probe and a gene chip for malignant lymphoma. The probes are designed with all exon regions and exon-intron junction regions of the 212 target cancer genes as the overall target region, the probe set specifically recognizes at least a part of the coding regions of the 212 markers, and the probes satisfy at least one condition selected from the following: (1) the length of the probe is 75-85 bp, preferably 81 bp; (2) the probe specifically recognizes the sequence between 10 bp upstream and 10 bp downstream of the coding regions of the 212 markers; (3) probes that specifically recognize regions with a GC content higher than 0.6 or lower than 0.3 have a multiplier greater than 2; (4) the melting temperature of the probe with the target sequence is 60-10 degrees Celsius, preferably 80 degrees Celsius; (5) the probe does not contain a hairpin structure; (6) the probe matches at most 2 sites on the reference genome; (7) the sliding window step used when selecting probes is 10 bp.
The probes for malignant lymphoma designed according to the above principles comprise 32,779 probes in total. Each probe sequence is 81 bp long and carries a 16 bp tag sequence at its front and a 15 bp tag sequence at its back, GAAGCGAGGATCAACT (SEQ ID NO: 1) and CATTGCGTGAACCGA (SEQ ID NO: 2) respectively. The two tag sequences are a restriction site and a transcription site; both ends are used for designing PCR primers, and the transcription site is additionally used for transcription, allowing the probe to be transcribed into an RNA probe.
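The tag layout can be illustrated with a small helper that attaches and removes the two flanking sequences. This is only a sketch: the function names are hypothetical, and it deliberately takes no position on whether the stated 81 bp probe length includes the 31 bp of tags, since the text can be read either way.

```python
FRONT_TAG = "GAAGCGAGGATCAACT"  # SEQ ID NO: 1, 16 bp
BACK_TAG = "CATTGCGTGAACCGA"    # SEQ ID NO: 2, 15 bp


def add_tags(core: str) -> str:
    """Flank a target-specific sequence with the front and back tags."""
    return FRONT_TAG + core.upper() + BACK_TAG


def strip_tags(oligo: str) -> str:
    """Return the target-specific part of a tagged oligo."""
    oligo = oligo.upper()
    too_short = len(oligo) < len(FRONT_TAG) + len(BACK_TAG)
    if too_short or not (oligo.startswith(FRONT_TAG) and oligo.endswith(BACK_TAG)):
        raise ValueError("oligo does not carry both tag sequences")
    return oligo[len(FRONT_TAG):len(oligo) - len(BACK_TAG)]
```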
On this basis, the inventors further designed a gene chip comprising probes and a support, with the probes located on the surface of the support. According to a specific embodiment of the invention, the gene chip may be designed as a liquid-phase chip, and the support is microspheres carrying different fluorescent labels.
Method for determining gene mutations of malignant lymphoma in a sample to be tested
The inventors found that using the combination of these 212 malignant-lymphoma-associated markers as target regions and enriching them yields a sequencing library for malignant lymphoma. Bioinformatic analysis of this library can then effectively detect and/or determine gene mutations associated with malignant lymphoma, including but not limited to mutation types such as single-base substitutions, single-base/multi-base insertions or deletions, and large-fragment deletions/amplifications, so that malignant lymphoma gene mutations can be detected efficiently and comprehensively. Experiments confirmed that detection is fast and that the sensitivity exceeds 93%. This enables comprehensive detection of cancer gene mutation sites in malignant lymphoma, with the technical advantages of high throughput, high sensitivity, strong specificity, high accuracy, and broad coverage, and effectively addresses problems such as the wide range of mutated regions and the uncertainty of mutation sites in malignant lymphoma cancer genes.
According to an embodiment of the invention, the method for determining gene mutations of malignant lymphoma in a sample to be tested comprises: enriching the target sequences of the sample to be tested, the target sequences being the combination of 212 malignant lymphoma markers, wherein the enriched target sequences constitute the sequencing library for malignant lymphoma detection; sequencing the library to obtain sequencing reads; aligning the reads to a reference genome and performing mutation detection to obtain candidate mutation data; screening the candidate mutation data to obtain potential mutation data; and annotating the potential mutation data to obtain target mutation data. In the invention, the sample to be tested may be derived from a tissue sample.
It should be noted that the method of the invention for determining gene mutations of malignant lymphoma in a sample to be tested may also be described as a method for detecting and/or determining gene mutations of malignant lymphoma. The method is not a method of diagnosing disease: a mutation result detected with the invention only indicates that the cancer tissue of the individual concerned carries consistent cancer driver gene mutations, and in practice the clinical findings must also be taken into account to confirm whether the individual has the disease.
本领域技术人员可以根据实际需要,借助于不同的技术对于目标区域的DNA序列进行富集,包括但是不限于基于多重PCR技术的目标区域DNA富集方法(如Thermo Fisher Scientific公司的AmpliSeq技术)和基于探针杂交技术的目标区域DNA富集方法(如Agilent公司的SureSelect技术,及Nimble公司的SeqCap EZ技术)。Those skilled in the art can enrich the DNA sequence of the target region by different techniques according to actual needs, including but not limited to the target region DNA enrichment method based on multiplex PCR technology (such as Thermo Fisher Scientific AmpliSeq technology) and Target region DNA enrichment methods based on probe hybridization techniques (such as Agilent's SureSelect technology, and Nimble's SeqCap EZ technology).
在借助于高通量测序技术对目标DNA进行测序的过程中,可以利用Illumina公司的Hiseq/Miseq/NextSeq,Thermo Fisher Scientific公司的Ion Proton/Ion PGM,以及华大基因的BGISEQ-500等二代测序平台,以及PacBio等三代测序平台。根据本发明的一种具体实施方式,利用BGISeq-500测序平台进行测序获得所述测序序列。采用华大自主研发的测序仪进行高通量测序,具有更强的兼容性,测序效果更好。In the process of sequencing target DNA by means of high-throughput sequencing technology, Illumina's Hiseq/Miseq/NextSeq, Thermo Fisher Scientific's Ion Proton/Ion PGM, and BGI SEQ-500 of the BGI gene can be used. Sequencing platform, and three generations of sequencing platforms such as PacBio. According to a specific embodiment of the present invention, the sequencing sequence is obtained by sequencing using a BGISeq-500 sequencing platform. High-throughput sequencing using a self-developed sequencer from Huada has stronger compatibility and better sequencing results.
According to a specific embodiment of the present invention, the raw data volume of the constructed sequencing library is 3 Gb or more, the sequencing depth of the target region is 400× or more, and the coverage of the target region is 99% or more. Sequencing depth is the ratio of the total number of sequenced bases (bp) to the genome size, and reflects the average number of times each base of the sequenced genome is read. Sequencing coverage is the proportion of the whole genome that is covered by the sequenced reads.
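The two definitions above can be illustrated with a minimal sketch. It assumes that a per-base depth array is already available for the region of interest; the function name `depth_and_coverage` and the toy values are hypothetical and are not part of the described method.

```python
def depth_and_coverage(per_base_depth):
    """Return (mean depth, fraction of positions covered by at least one read)."""
    n = len(per_base_depth)
    if n == 0:
        raise ValueError("region is empty")
    # Depth: total sequenced bases divided by the size of the region.
    mean_depth = sum(per_base_depth) / n
    # Coverage: share of positions that were sequenced at least once.
    coverage = sum(1 for d in per_base_depth if d > 0) / n
    return mean_depth, coverage


# Toy 10 bp region with one uncovered position (illustrative values only).
toy_depths = [400, 420, 380, 0, 410, 400, 390, 405, 395, 400]
mean_depth, coverage = depth_and_coverage(toy_depths)
print(mean_depth, coverage)
```

In practice the same calculation would be restricted to the target region of about 500 kb rather than applied to the whole genome.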
According to a specific embodiment of the present invention, screening the candidate mutation data comprises removing candidate mutations that are of low quality, of low coverage, located in repeat regions or at the ends of reads, or strand-biased. Low-quality candidate mutations are those with a base quality value below 20 (base quality<20) or a mapping quality value below 30 (mapping quality<30); low-coverage candidate mutations are those with a minimum number of supporting reads below 3 (minimal support depth<3). Strand-biased candidate mutations are those that occur on only one strand.
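The screening rules above can be sketched as a single predicate. This is an illustration under stated assumptions, not the implementation used in the examples: the `Candidate` record and its field names are hypothetical, and the repeat-region and read-end flags are assumed to have been computed upstream.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    base_quality: float      # base quality of the variant call
    mapping_quality: float   # mapping quality of the supporting reads
    support_fwd: int         # variant-supporting reads on the forward strand
    support_rev: int         # variant-supporting reads on the reverse strand
    in_repeat: bool = False      # assumed to be annotated upstream
    near_read_end: bool = False  # assumed to be annotated upstream


def passes_filters(c, min_bq=20, min_mq=30, min_support=3):
    """Apply the thresholds quoted in the text to one candidate mutation."""
    if c.base_quality < min_bq or c.mapping_quality < min_mq:
        return False  # low quality
    if c.support_fwd + c.support_rev < min_support:
        return False  # low coverage
    if c.in_repeat or c.near_read_end:
        return False  # repeat region or read end
    if c.support_fwd == 0 or c.support_rev == 0:
        return False  # strand bias: support on one strand only
    return True


candidates = [
    Candidate(30, 60, 2, 2),   # kept
    Candidate(15, 60, 2, 2),   # low base quality
    Candidate(30, 60, 4, 0),   # one strand only
    Candidate(30, 60, 1, 1),   # fewer than 3 supporting reads
]
potential = [c for c in candidates if passes_filters(c)]
print(len(potential))
```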
System for determining gene mutations of malignant lymphoma in a sample to be tested
The present invention is based on the combination of 212 genes that the inventors found to be associated with malignant lymphoma. Taking the combination of these 212 specific genes as the target genes, a system was designed for determining gene mutations of malignant lymphoma in a sample to be tested. This system may also be understood as a system for detecting gene mutations of malignant lymphoma in a sample to be tested, that is, a system used to detect and determine whether genes associated with malignant lymphoma in the sample are mutated. The system can detect mutation types in the target sequences such as single-base substitutions, single-base or multi-base insertions or deletions, and large-fragment deletions or amplifications, and thus supports efficient and comprehensive detection and determination of common malignant lymphoma gene mutations.
According to an embodiment of the present invention, the present invention provides a system for determining gene mutations of malignant lymphoma in a sample to be tested. As shown in FIG. 1, the system comprises: a target region library construction unit, which constructs a target region library using the combination of the 212 markers of the present invention as the target region; a sequencing unit, which is connected to the target region library construction unit and sequences the target region library to obtain sequencing reads; a candidate mutation determining unit, which is connected to the sequencing unit and is used to align the sequencing reads of the target region library to a reference genome and perform mutation detection to obtain candidate mutation data; a potential mutation determining unit, which is connected to the candidate mutation determining unit and is used to screen the candidate mutation data to obtain potential mutation data; and a target mutation determining unit, which is connected to the potential mutation determining unit and is used to annotate the potential mutation data to obtain target mutation data.
According to an embodiment of the present invention, the system for determining gene mutations of malignant lymphoma in a sample to be tested may also be as shown in FIG. 2, in which the system comprises: a target region library construction unit, which constructs a target region library using the combination of the 212 markers of the present invention as the target region; a sequencing unit, which is connected to the target region library construction unit and sequences the target region library; a quality control unit, which is connected to the sequencing unit and is used to perform quality control on the sequencing reads before mutation detection, filtering out low-quality and adapter-contaminated reads so that the filtered reads are then aligned to the reference genome; a candidate mutation determining unit, which is connected to the quality control unit and is used to align the filtered reads to the reference genome and perform mutation detection to obtain candidate mutation data; a potential mutation determining unit, which is connected to the candidate mutation determining unit and is used to screen the candidate mutation data to obtain potential mutation data; and a target mutation determining unit, which is connected to the potential mutation determining unit and is used to annotate the potential mutation data to obtain target mutation data.
The solution of the present invention is explained below with reference to examples. Those skilled in the art will understand that the following examples only illustrate the present invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not indicated in the examples, the techniques or conditions described in the literature in the art or in the product instructions were followed. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.
Example 1 Preparation of Probes and Chip
Taking the malignant lymphoma markers in Table 1, i.e., all exon regions and exon-intron junction regions of the 212 target genes, as the total target region (about 500 kb in total), probes were prepared according to the following design principles:
(1) the length of the probe is 81 bp;
(2) the probe specifically recognizes a sequence lying between 10 bp upstream and 10 bp downstream of the coding regions of the 212 markers in Table 1;
(3) for probes that specifically recognize regions with a GC content above 0.6 or below 0.3, the multiplier is greater than 2;
(4) the melting temperature of the probe with the target sequence is 60-10 degrees Celsius, preferably 80 degrees Celsius;
(5) the probe does not contain a hairpin structure;
(6) the probe matches at most 2 sites on the reference genome;
(7) the sliding step of the window used for probe selection is 10 bp.
Probes for the 212 target genes were thus obtained; this set of probes specifically recognizes at least a portion of the coding regions of the 212 markers. The final set of target-region probe sequences contains 32779 probes. Each probe sequence is 81 bp long and is flanked by a 16 bp tag sequence at the front and a 15 bp tag sequence at the back, GAAGCGAGGATCAACT (SEQ ID NO: 1) and CATTGCGTGAACCGA (SEQ ID NO: 2), respectively. These two tag sequences are a restriction site and a transcription site, respectively. Both ends are used for designing PCR primers, and the transcription site is additionally used for transcription, so that the probes can be transcribed into RNA probes.
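The window-based probe selection and tagging described above can be sketched as follows. This is a simplified illustration, not the inventors' design software: it covers only the 81 bp length, the 10 bp sliding step, the GC-dependent multiplier, and the two flanking tags. The concrete multiplier value of 3 (the text only says "greater than 2"), the baseline multiplier of 1, the reading that the tags flank an 81 bp core, and all function names are assumptions. The melting-temperature, hairpin, and genome-uniqueness checks are omitted.

```python
# Flanking tags quoted in the text (SEQ ID NO: 1 and SEQ ID NO: 2).
TAG_FRONT = "GAAGCGAGGATCAACT"
TAG_BACK = "CATTGCGTGAACCGA"

PROBE_LEN = 81             # design rule (1)
STEP = 10                  # design rule (7)
EXTREME_GC_MULTIPLIER = 3  # assumption: any value greater than 2


def gc_content(seq):
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tile_probes(target, probe_len=PROBE_LEN, step=STEP):
    """Slide a window over target; return (start, core, multiplier) tuples."""
    probes = []
    for start in range(0, len(target) - probe_len + 1, step):
        core = target[start:start + probe_len]
        gc = gc_content(core)
        # Design rule (3): GC-rich or GC-poor probes get a larger multiplier.
        multiplier = EXTREME_GC_MULTIPLIER if (gc > 0.6 or gc < 0.3) else 1
        probes.append((start, core, multiplier))
    return probes


def add_tags(core):
    """Attach the front and back tag sequences to a probe core."""
    return TAG_FRONT + core + TAG_BACK


# Synthetic 160 bp target (not a real gene sequence).
toy_target = "AT" * 30 + "GC" * 30 + "ACGT" * 10
toy_probes = tile_probes(toy_target)
print(len(toy_probes), len(add_tags(toy_probes[0][1])))
```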
Because the number of probes is very large, only a few probe sequences of individual genes are given here as examples, as follows:
KLF2 gene probe sequence (SEQ ID NO: 3)
Figure PCTCN2018079061-appb-000011
KLF2 gene probe sequence (SEQ ID NO: 4)
Figure PCTCN2018079061-appb-000012
KLF2 gene probe sequence (SEQ ID NO: 5)
Figure PCTCN2018079061-appb-000013
KLF2 gene probe sequence (SEQ ID NO: 6)
Figure PCTCN2018079061-appb-000014
KLF2 gene probe sequence (SEQ ID NO: 7)
Figure PCTCN2018079061-appb-000015
ZFP36L1 gene probe sequence (SEQ ID NO: 8)
Figure PCTCN2018079061-appb-000016
ZFP36L1 gene probe sequence (SEQ ID NO: 9)
Figure PCTCN2018079061-appb-000017
ZFP36L1 gene probe sequence (SEQ ID NO: 10)
Figure PCTCN2018079061-appb-000018
ZFP36L1 gene probe sequence (SEQ ID NO: 11)
Figure PCTCN2018079061-appb-000019
ZFP36L1 gene probe sequence (SEQ ID NO: 12)
Figure PCTCN2018079061-appb-000020
Figure PCTCN2018079061-appb-000021
TMSB4X gene probe sequence (SEQ ID NO: 13)
Figure PCTCN2018079061-appb-000022
TMSB4X gene probe sequence (SEQ ID NO: 14)
Figure PCTCN2018079061-appb-000023
TMSB4X gene probe sequence (SEQ ID NO: 15)
Figure PCTCN2018079061-appb-000024
TMSB4X gene probe sequence (SEQ ID NO: 16)
Figure PCTCN2018079061-appb-000025
TMSB4X gene probe sequence (SEQ ID NO: 17)
Figure PCTCN2018079061-appb-000026
Further, a liquid-phase capture chip was prepared from the probes for the 212 target genes obtained above and kept for later use. The liquid-phase chip is made from polystyrene microspheres about 5.6 μm in diameter, which carry carboxyl groups on their surface and contain two dyes, red and orange, inside. According to the ratio of the two dyes, the microspheres can be divided into 100 types, each with its own number. Because of its particular internal dye ratio, each type of microsphere has a characteristic spectral signature that can be specifically recognized by a laser. Microspheres with different numbers are coated with different probe molecules so that the target molecules in a sample can be captured; the target molecules then bind a fluorescent reporter molecule, and the target molecules are detected by fluorescence detection.
Example 2
In this example, 16 diffuse large B-cell lymphoma samples were analyzed by chip capture combined with high-throughput sequencing to detect SNP and indel mutations in the cancer genes of the 16 cases, in order to identify the cancer driver mutations in this batch of samples.
The experimental samples were 16 tissue samples clinically diagnosed as diffuse large B-cell lymphoma. The experimental procedure was as follows:
1. Genomic DNA extraction from cancer tissue
Genomic DNA was extracted from the diffuse large B-cell lymphoma tissue samples with the QIAGEN tissue and blood DNA extraction kit (QIAGEN DNA Tissue and Blood mini kit) according to the kit's extraction instructions. DNA concentration was measured with a Qubit 3.0 fluorometer; the concentration was required to be greater than 5 ng/μL and the volume greater than 30 μL, and in principle the DNA yield of each sample was to be ≥2 μg. DNA integrity and degree of degradation were then checked by electrophoresis, since severely degraded samples are unsuitable for library construction. The electrophoresis conditions were: 1% agarose gel, electrophoresis voltage 4 V/cm, electrophoresis time 45 min. The agarose gel electrophoresis results showed that the DNA of all samples was intact with essentially no degradation.
2. Library construction before sequencing
100 ng of genomic DNA was randomly fragmented by enzymatic digestion using a DNA fragmentation instrument, with end repair and A-tailing performed at the same time. Adapter ligation and purification followed, then PCR amplification, to obtain the pre-hybridization library, and a second round of fragment screening was performed with an Agilent 2100 Bioanalyzer to obtain fragments of 150-500 bp. The PCR product was then hybridized to the liquid-phase capture chip to capture the target region, and the target DNA was eluted from the probes with elution reagent to obtain the required target DNA, which was then PCR-amplified again. The resulting product was circularized to give the target-region capture library; the yield of the hybridized library was greater than 160 ng.
The liquid-phase capture chip used was the one prepared in Example 1.
3. High-throughput sequencing
Library DNA that passed quality control was sequenced according to the BGISEQ-500 operating instructions. For each sample, the raw sequencing data volume was 3 Gb or more, the average sequencing depth of the target region reached 400×, and the target region coverage was 99% or more. The sequencing data quality of the 16 samples is shown in Table 4 below.
Table 4 Basic information of the sequenced samples
Figure PCTCN2018079061-appb-000027
Figure PCTCN2018079061-appb-000028
4. Sequencing data filtering, alignment, and mutation analysis
After sequencing was completed, the output data were subjected to bioinformatics analysis by the following workflow (as shown in FIG. 3):
First, quality control (QC) was performed on the sequenced reads to remove reads of unacceptable sequencing quality and reads contaminated with sequencing adapters, giving clean reads (i.e., the filtered reads). The filtered reads were then aligned to the human reference genome hg19 (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/) with bwa (Burrows-Wheeler Aligner) to obtain the alignment result. Mutation detection was then performed with VarScan to obtain candidate mutations, and the candidate mutations were subjected to initial filtering to remove mutation sites that were of low quality (base quality<20 or mapping quality<30), of low coverage (minimal support depth<3), located in repeat regions or at the ends of reads, or strand-biased, finally giving the list of potential mutations.
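As a rough orientation, the alignment and calling steps above might be driven by command lines such as those assembled below. This is a sketch under several assumptions that the text does not state: the `bwa mem` subcommand, single-end input, samtools for sorting and pileup generation, the file names, and the mapping of the quoted thresholds onto the `-q`/`-Q` options of `samtools mpileup` and the `--min-reads2`/`--min-avg-qual` options of VarScan. The reference is assumed to have been indexed with `bwa index` beforehand. The function only builds the command strings and does not run them.

```python
def build_commands(reference, fastq, sample, varscan_jar="VarScan.jar"):
    """Return shell command strings for alignment, pileup, and variant calling."""
    bam = f"{sample}.sorted.bam"
    pileup = f"{sample}.mpileup"
    # Thresholds quoted in the text: base quality 20, mapping quality 30,
    # at least 3 variant-supporting reads.
    varscan_opts = "--min-reads2 3 --min-avg-qual 20 --output-vcf 1"
    return [
        # Align clean reads and sort the alignments.
        f"bwa mem {reference} {fastq} | samtools sort -o {bam} -",
        # Pileup restricted to well-mapped reads and confident bases.
        f"samtools mpileup -f {reference} -q 30 -Q 20 {bam} > {pileup}",
        # Call SNPs and indels with VarScan.
        f"java -jar {varscan_jar} mpileup2snp {pileup} {varscan_opts} > {sample}.snp.vcf",
        f"java -jar {varscan_jar} mpileup2indel {pileup} {varscan_opts} > {sample}.indel.vcf",
    ]


# Hypothetical file names for illustration.
commands = build_commands("hg19.fa", "sample01.clean.fq.gz", "sample01")
for cmd in commands:
    print(cmd)
```

Filtering of repeat-region, read-end, and strand-biased sites is not covered by these commands and would be applied to the resulting calls afterwards.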
The resulting list of potential mutations was annotated with ANNOVAR, and synonymous mutations were excluded. Population variant databases (such as the 1000 Genomes database (http://www.1000genomes.org), the ExAC database, and the ESP6500 database) were then used to filter out polymorphic sites common in the population. A pathogenic mutation database (such as ClinVar) was used to filter out benign mutations, giving the final mutation result, i.e., the target mutation data. A synonymous mutation is a neutral mutation: because the genetic code is degenerate, the base substitution produces a new codon, but the old and new codons encode the same amino acid, so these mutations have no effect on pathogenicity.
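The three annotation-based filters above can be sketched as one pass over annotated records. The record layout (keys `effect`, `pop_af`, `clinvar`), the variant identifiers, and the population-frequency cutoff of 0.01 are assumptions made for illustration; the text does not state the cutoff used to call a site a common polymorphism.

```python
def select_target_mutations(variants, max_pop_af=0.01):
    """Keep variants that survive the synonymous, population, and benign filters."""
    kept = []
    for v in variants:
        if v["effect"] == "synonymous":
            continue  # amino acid unchanged
        if v.get("pop_af", 0.0) > max_pop_af:
            continue  # common polymorphism (cutoff is an assumption)
        if v.get("clinvar", "").lower() == "benign":
            continue  # annotated as benign in the pathogenic-mutation database
        kept.append(v)
    return kept


# Toy annotated records (hypothetical identifiers, not real variants).
annotated = [
    {"id": "v1", "effect": "missense", "pop_af": 0.0, "clinvar": ""},
    {"id": "v2", "effect": "synonymous", "pop_af": 0.0, "clinvar": ""},
    {"id": "v3", "effect": "missense", "pop_af": 0.25, "clinvar": ""},
    {"id": "v4", "effect": "frameshift", "pop_af": 0.0, "clinvar": "Benign"},
]
targets = select_target_mutations(annotated)
print([v["id"] for v in targets])
```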
The results showed that, after analysis and filtering of the sequencing data of the 16 samples, a total of 163 cancer mutation sites were detected.
Example 3
This example used the same samples as Example 2. Using the HiSeq 2000 sequencing platform and its whole-genome sequencing method, a sequencing library was constructed for each sample according to the operating guide, and cancer mutations were determined by the same method as step 4 of Example 2. The results showed that whole-genome sequencing of these 16 diffuse large B-cell lymphoma patients detected a total of 174 cancer mutation sites relative to normal samples.
Comparing the results of Example 2 and Example 3: whole-genome sequencing detected a total of 174 cancer mutations in the 16 diffuse large B-cell lymphoma patients, and capture of the 212 target genes associated with malignant lymphoma detected 163 of these 174 cancer mutations. Relative to mutation detection by whole-genome sequencing, mutation detection by capture of the 212 target genes therefore has an overall sensitivity of 93.7%. The detailed detection results for each sample, including SNP mutation sites and insertion/deletion variants (indels), are shown in Table 5 below:
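The overall sensitivity quoted above follows directly from the two counts in the text, with the whole-genome sequencing result treated as the reference set. A short check (the function name is hypothetical):

```python
def sensitivity(detected, reference_total):
    """Fraction of reference mutations that the capture assay also detected."""
    if reference_total <= 0:
        raise ValueError("reference_total must be positive")
    return detected / reference_total


overall = sensitivity(163, 174)
print(f"{overall:.1%}")  # 93.7%
```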
Table 5 Detailed detection results for each sample
Figure PCTCN2018079061-appb-000029
In addition, the correlation between the minor allele frequencies detected by target gene capture and those detected by whole-genome sequencing was 0.8186 (r2, Pearson correlation coefficient), as shown in FIG. 4. In FIG. 4, the abscissa is the minor allele frequency obtained by whole-genome sequencing (MAF in WGS, Minor allele frequency in Whole-genome-sequencing), and the ordinate is the minor allele frequency obtained by target gene capture sequencing (MAF in LC, Minor allele frequency in Low-coverage whole-genome-sequencing). The correlation between the minor allele frequencies detected by the two approaches thus exceeds 80%, indicating good agreement.
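For reference, a Pearson correlation between two sets of allele frequencies can be computed as sketched below. The input values are toy numbers chosen for illustration, not the data behind FIG. 4, and the function name is hypothetical. The sketch returns r; squaring it gives r2.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy allele frequencies (illustrative only).
maf_wgs = [0.10, 0.20, 0.30, 0.40]
maf_capture = [0.10, 0.20, 0.30, 0.40]
r = pearson_r(maf_wgs, maf_capture)
r_squared = r ** 2
print(r, r_squared)
```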
In the present invention, unless otherwise expressly specified and limited, the terms "installed", "linked", "connected", "fixed", and the like are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical, electrical, or a communication link; it may be direct or indirect through an intermediate medium; and it may be an internal connection between two elements or an interaction between two elements, unless otherwise expressly limited. Those of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that these embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to these embodiments within the scope of the present invention.

Claims (30)

  1. A malignant lymphoma marker, characterized in that it comprises the genes in the following table:
    Figure PCTCN2018079061-appb-100001
    Figure PCTCN2018079061-appb-100002
  2. The marker according to claim 1, characterized in that the malignant lymphoma is diffuse large B-cell lymphoma.
  3. A probe, characterized in that the probe is designed against all exon regions and exon-intron junction regions of the marker according to claim 1 or 2, the probe specifically recognizes at least a portion of the coding region of the marker according to claim 1 or 2, and the probe satisfies at least one of the following conditions:
    (1) the length of the probe is 75-85 bp;
    (2) the probe specifically recognizes a sequence lying between 10 bp upstream and 10 bp downstream of the coding region of the marker according to claim 1;
    (3) for probes that specifically recognize regions with a GC content above 0.6 or below 0.3, the multiplier is greater than 2;
    (4) the melting temperature of the probe with the target sequence is 60-10 degrees Celsius, preferably 80 degrees Celsius;
    (5) the probe does not contain a hairpin structure;
    (6) the probe matches at most 2 sites on the reference genome;
    (7) the sliding step of the window used for probe selection is 10 bp.
  4. The probe according to claim 3, characterized in that the length of the probe in condition (1) is 81 bp.
  5. A gene chip, characterized in that the gene chip comprises a probe and a support, the probe is located on the surface of the support, and the probe is the probe according to claim 3 or 4.
  6. The gene chip according to claim 5, characterized in that the gene chip is a liquid-phase chip and the support is microspheres carrying different fluorescent labels.
  7. A method for constructing a sequencing library for malignant lymphoma detection from a sample to be tested, characterized in that it comprises:
    enriching the target sequences of the sample to be tested, the target sequences being the marker according to claim 1 or 2, wherein the enriched target sequences constitute the sequencing library for malignant lymphoma detection.
  8. The method according to claim 7, characterized in that the target sequences of the sample to be tested are captured by hybridization using the probe according to claim 3 or 4 or the gene chip according to claim 5 or 6, thereby achieving the enrichment.
  9. The method according to claim 7, characterized in that it further comprises: sequencing the sequencing library for malignant lymphoma detection to obtain sequencing reads.
  10. The method according to claim 9, characterized in that the sequencing library for malignant lymphoma detection is sequenced on the BGISEQ-500 sequencing platform.
  11. The method according to claim 9, characterized in that the sequencing depth of the sequencing reads is 400× or more and the coverage of the sequencing reads is 99% or more.
  12. The method according to claim 9, characterized in that the raw data volume of the sequencing reads is 3 Gb or more.
  13. A method for determining gene mutations of malignant lymphoma in a sample to be tested, characterized in that it comprises:
    constructing a sequencing library for malignant lymphoma detection from the sample to be tested by the method according to any one of claims 7-12;
    sequencing the sequencing library for malignant lymphoma detection to obtain sequencing reads;
    aligning the sequencing reads to a reference genome and performing mutation detection to obtain candidate mutation data;
    screening the candidate mutation data to obtain potential mutation data;
    annotating the potential mutation data to obtain target mutation data.
  14. 根据权利要求13所述的方法,其特征在于,所述参考基因组为人类参考基因组hg19。The method of claim 13 wherein said reference genome is human reference genome hg19.
  15. 根据权利要求13所述的方法,其特征在于,利用VarScan软件进行所述突变检测。The method of claim 13 wherein said mutation detection is performed using VarScan software.
  16. 根据权利要求13所述的方法,其特征在于,对所述候选突变数据进行筛选包括:过滤掉低质量、低覆盖度、位于重复区及序列两端的以及具有链偏向性的候选突变,The method according to claim 13, wherein the screening of the candidate mutation data comprises: filtering out low quality, low coverage, candidate mutations located at the ends of the repeat region and the sequence, and having chain bias,
    其中,所述低质量的候选突变是指碱基质量值小于20或比对质量值小于30的候选突变,所述低覆盖度的候选突变是指最小支持数小于3的候选突变。Wherein the low-mass candidate mutation refers to a candidate mutation having a base mass value of less than 20 or a pairwise mass value of less than 30, and the low-coverage candidate mutation refers to a candidate mutation having a minimum support number of less than 3.
  17. The method according to claim 13, wherein the annotation is performed using ANNOVA software, polymorphic sites are filtered out using a population mutation database, and benign mutations are filtered out using a pathogenic mutation database.
  18. The method according to claim 17, wherein the population mutation database is at least one selected from the 1000 Genomes database, the ExAc database, and the Esp6500 database.
  19. The method according to claim 17, wherein the pathogenic mutation database is ClinVar.
  20. The method according to any one of claims 13-19, further comprising:
    before the mutation detection, performing quality control on the sequencing reads to filter out low-quality and adapter-contaminated sequences, and then aligning the filtered sequences to the reference genome.
  21. A system for determining gene mutations of malignant lymphoma in a sample to be tested, comprising:
    a target region library construction unit, which constructs a target region library using the marker according to claim 1 or 2 as the target region;
    a sequencing unit connected to the target region library construction unit, the sequencing unit analyzing the target region library to obtain sequencing reads;
    a candidate mutation determining unit connected to the sequencing unit, the candidate mutation determining unit being configured to align the sequencing reads of the target region library to a reference genome and perform mutation detection to obtain candidate mutation data;
    a potential mutation determining unit connected to the candidate mutation determining unit, the potential mutation determining unit being configured to screen the candidate mutation data to obtain potential mutation data; and
    a target mutation determining unit connected to the potential mutation determining unit, the target mutation determining unit being configured to annotate the potential mutation data to obtain target mutation data.
  22. The system according to claim 21, wherein the reference genome is the human reference genome hg.
  23. The system according to claim 21, wherein the mutation detection is performed using VarScan software.
  24. The system according to claim 21, wherein screening the candidate mutation data comprises filtering out candidate mutations that are of low quality, have low coverage, are located in repeat regions or at the ends of sequences, or show strand bias,
    wherein a low-quality candidate mutation is a candidate mutation with a base quality value of less than 20 or an alignment quality value of less than 30, and a low-coverage candidate mutation is a candidate mutation with a minimum support count of less than 3.
  25. The system according to claim 21, wherein the annotation is performed using ANNOVA software, polymorphic sites are filtered out using a population mutation database, and benign mutations are filtered out using a pathogenic mutation database.
  26. The system according to claim 25, wherein the population mutation database is at least one selected from the 1000 Genomes database, the ExAc database, and the Esp6500 database.
  27. The system according to claim 25, wherein the pathogenic mutation database is ClinVar.
  28. The system according to any one of claims 21-27, further comprising:
    a quality control unit connected to the sequencing unit, the quality control unit being configured to perform quality control on the sequencing reads before the mutation detection, so as to filter out low-quality and adapter-contaminated sequences, after which the filtered sequences are aligned to the reference genome.
  29. Use of the marker according to claim 1 or 2 in the preparation of a reagent for detecting and/or determining gene mutations of malignant lymphoma.
  30. Use of the marker according to claim 1 or 2 in the field of detecting and/or determining gene mutations of malignant lymphoma.
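
The screening and annotation steps recited in claims 16-19 (and mirrored in claims 24-27) can be illustrated with a minimal Python sketch. It is not the applicants' implementation. The thresholds 20, 30 and 3 come from the claims. Everything else is an assumption made for illustration: the `CandidateMutation` record and its field names, the reading of "minimum support count" as a number of supporting reads, the 1% population-frequency cutoff standing in for "polymorphic site", and the boolean flag standing in for a ClinVar benign classification.

```python
from dataclasses import dataclass

# Thresholds stated in claims 16 and 24.
MIN_BASE_QUALITY = 20
MIN_ALIGNMENT_QUALITY = 30
MIN_SUPPORT_COUNT = 3


@dataclass
class CandidateMutation:
    """Hypothetical record for one candidate mutation (field names are assumptions)."""
    gene: str
    base_quality: float
    alignment_quality: float
    support_count: int            # assumed to mean reads supporting the variant
    in_repeat_region: bool = False
    at_sequence_end: bool = False
    strand_biased: bool = False
    population_frequency: float = 0.0  # assumed to come from a population database
    clinvar_benign: bool = False       # assumed to come from a ClinVar lookup


def is_potential(m: CandidateMutation) -> bool:
    """Screening step: drop low-quality, low-coverage, repeat-region,
    sequence-end and strand-biased candidates."""
    low_quality = (m.base_quality < MIN_BASE_QUALITY
                   or m.alignment_quality < MIN_ALIGNMENT_QUALITY)
    low_coverage = m.support_count < MIN_SUPPORT_COUNT
    return not (low_quality or low_coverage or m.in_repeat_region
                or m.at_sequence_end or m.strand_biased)


def is_target(m: CandidateMutation, max_population_frequency: float = 0.01) -> bool:
    """Annotation step: drop polymorphic sites and benign mutations.
    The 1% default cutoff is an assumption; the claims give no value."""
    return m.population_frequency < max_population_frequency and not m.clinvar_benign


def select_target_mutations(candidates, max_population_frequency: float = 0.01):
    """Candidate mutations -> potential mutations -> target mutations."""
    potential = [m for m in candidates if is_potential(m)]
    return [m for m in potential if is_target(m, max_population_frequency)]
```

Because the claims define low quality and low coverage with "less than", a candidate sitting exactly on a threshold (base quality 20, alignment quality 30, support count 3) is kept by this sketch.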
PCT/CN2018/079061 2018-03-14 2018-03-14 Malignant lymphoma marker and application thereof WO2019173991A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/079061 WO2019173991A1 (en) 2018-03-14 2018-03-14 Malignant lymphoma marker and application thereof
CN201880083693.1A CN111655868A (en) 2018-03-14 2018-03-14 Malignant lymphoma markers and their applications

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/079061 WO2019173991A1 (en) 2018-03-14 2018-03-14 Malignant lymphoma marker and application thereof

Publications (1)

Publication Number Publication Date
WO2019173991A1 true WO2019173991A1 (en) 2019-09-19

Family

ID=67908693

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079061 WO2019173991A1 (en) 2018-03-14 2018-03-14 Malignant lymphoma marker and application thereof

Country Status (2)

Country Link
CN (1) CN111655868A (en)
WO (1) WO2019173991A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101470112A (en) * 2007-12-28 2009-07-01 上海交通大学医学院附属瑞金医院 Molecular markers for treatment guidance and prognosis in diffuse large B-cell lymphoma
CN101470119A (en) * 2007-12-28 2009-07-01 上海交通大学医学院附属瑞金医院 Molecular Pathological Classification of Diffuse Large B-Cell Lymphoma and Its Application
CN103717755A (en) * 2011-08-08 2014-04-09 霍夫曼-拉罗奇有限公司 Predicting response to anti-cd20 therapy in dlbcl patients

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1829967A4 (en) * 2004-12-03 2008-04-23 Aichi Prefecture METHOD FOR DIAGNOSING MALIGNANT MALIGNANT AND ASSOCIATED PROGNOSTIC ESTIMATION
KR100689274B1 (en) * 2005-03-30 2007-03-08 김현기 Human primary cancer gene, protein encoded thereby
JPWO2006112483A1 (en) * 2005-04-19 2008-12-11 愛知県 Method for diagnosing disease type and prognosis of diffuse large B-cell lymphoma
WO2009082856A1 (en) * 2007-12-28 2009-07-09 Ruijin Hospital Affiliated To The Shanghai Jiao Tong University Medical School Molecule marker used for prognosticating diffuse large b cell lymphoma
EP2297349A1 (en) * 2008-06-04 2011-03-23 The Arizona Board Of Regents On Behalf Of The University Of Arizona Diffuse large b-cell lymphoma markers and uses therefor
WO2012031008A2 (en) * 2010-08-31 2012-03-08 The General Hospital Corporation Cancer-related biological materials in microvesicles
CN101968491A (en) * 2010-09-29 2011-02-09 上海生物芯片有限公司 Molecular pathological typing method and kit for diffuse large B cell lymphoma and application
CN102808019B (en) * 2011-06-03 2014-02-19 复旦大学附属肿瘤医院 Diffuse large B-cell lymphoma molecular marker detection method and its application
JP2016080672A (en) * 2014-10-17 2016-05-16 勇 廣▲瀬▼ Detection method of malignant lymphoma
CN105624272B (en) * 2014-10-29 2019-08-09 深圳华大基因科技有限公司 The construction method and device in genome presumptive area nucleic acid sequencing library
CN105603052B (en) * 2014-11-11 2021-03-19 武汉华大医学检验所有限公司 Probe and use thereof
CN105779572B (en) * 2014-12-22 2020-07-07 深圳华大基因研究院 Chip and method for capturing target sequence of tumor susceptibility gene and mutation detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MARA COMPAGNO: "Mutations of multiple genes cause deregulation of NF-kB in diffuse large B- cell lymphoma", NATURE, vol. 459, 7247, 4 June 2009 (2009-06-04), pages 717 - 721, XP055635287, ISSN: 0028-0836 *

Also Published As

Publication number Publication date
CN111655868A (en) 2020-09-11

Similar Documents

Publication Publication Date Title
AU2023202572B2 (en) Single-molecule sequencing of plasma DNA
CN108753967B (en) Gene set for liver cancer detection and panel detection design method thereof
CN105518151A (en) Identification and use of circulating nucleic acid tumor markers
US20250051757A1 (en) Non-unique barcodes in a genotyping assay
CN105734679B (en) Nucleic acid target sequence captures the preparation method of sequencing library
Lv et al. Detection of rare mutations in CtDNA using next generation sequencing
CN110343748A (en) Method based on high-throughput targeting sequencing analysis Tumor mutations load
CN105925665A (en) Kit, database establishment method, and method and system for detecting area target variation
CN117441027A (en) Heatrich-BS: thermal enrichment of CpG-rich regions for bisulfite sequencing
WO2019173991A1 (en) Malignant lymphoma marker and application thereof
CN116219016A (en) Thyroid nodule benign and malignant detection method, kit and application
CN117418003A (en) Marker, probe and application thereof
CN113948150B (en) JMML related gene methylation level evaluation method, model and construction method
KR102695246B1 (en) Simultaneous analytic method and system of genome and epigenome information
Yin Comprehensive Data Analysis Toolkit Development for a Low Input Bisulfite Sequencing
de Leng et al. Sequencing Approaches for Personalized Cancer Therapy Selection in Pathology
CN117512116A (en) A biomarker for cholangiocarcinoma detection and its application
CN118139987A (en) Compositions and methods for CFRNA and CFTNA targeted NGS sequencing
WO2023164713A1 (en) Probe sets for a liquid biopsy assay
WO2023058522A1 (en) Method for analyzing structural polymorphism, primer pair set, and method for designing primer pair set
CN119799898A (en) Kit for molecular typing diagnosis of kidney cancer and application thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18909912; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18909912; Country of ref document: EP; Kind code of ref document: A1)
