WO2019173991A1

WO2019173991A1 - Malignant lymphoma marker and application thereof

Info

Publication number: WO2019173991A1
Application number: PCT/CN2018/079061
Authority: WO
Inventors: 潘嫱; 叶晓飞; 苏红; 刘栋兵; 任伟成; 吴逵; 朱师达
Original assignee: 深圳华大生命科学研究院; 潘嫱
Priority date: 2018-03-14
Filing date: 2018-03-14
Publication date: 2019-09-19
Also published as: CN111655868A

Abstract

Provided is a malignant lymphoma marker. The malignant lymphoma marker comprises 212 genes in total, such as AICDA and AKT1. Further provided is an application of the marker in the fields of gene sequencing and medical detection.

Description

Malignant lymphoma marker and its application

Technical field

The invention relates to the fields of gene sequencing and medical detection, and in particular to a malignant lymphoma marker and its application. More specifically, it relates to a malignant lymphoma marker, a probe and a chip for detecting the marker, a method for constructing a sequencing library for malignant lymphoma detection from a sample to be tested, and a method and system for determining gene mutations of malignant lymphoma in a sample to be tested.

Background art

Malignant lymphoma is a systemic disease closely related to the functional status of the body's immune system. It differs both from other solid malignant tumors and from hematological tumors. It comprises Hodgkin's lymphoma, which is a single disease, and non-Hodgkin's lymphoma, which is a group of diseases. Its clinical manifestations are complex and vary with pathological type, stage, and site of invasion. Several FDA-approved molecularly targeted drugs are currently available for malignant lymphoma, such as ibrutinib (targeting BTK) and idelalisib (targeting PI3K delta), so accurate and timely detection of gene mutations in malignant lymphoma is of great significance for clinical diagnosis and treatment.

However, the detection and determination of genetic mutations associated with malignant lymphomas needs to be improved.

Summary of the invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art, and to improve the efficiency and sensitivity of detecting gene mutations in malignant lymphoma. To this end, an object of the present invention is to provide a malignant lymphoma marker, a probe and a chip for detecting the marker, a method for constructing a sequencing library for malignant lymphoma detection from a sample to be tested, and a method and system for determining gene mutations of malignant lymphoma in a sample to be tested.

According to an aspect of the invention, the invention provides a malignant lymphoma marker comprising the genes in the following table:

The present invention selects 212 genes that are highly associated with malignant lymphoma as malignant lymphoma markers. Compared with techniques that detect all genes associated with multiple cancers at once, the present invention is more targeted, has a smaller detection range and a lower detection cost, and can significantly improve sensitivity while also improving efficiency.

According to an embodiment of the invention, the malignant lymphoma is a diffuse large B-cell lymphoma.

According to another aspect of the invention, the invention provides a probe for malignant lymphoma. According to an embodiment of the invention, the probe is designed for all exon regions of the markers in the above table and for the exon-intron junction regions, the probe specifically recognizes at least a portion of the coding regions of the malignant lymphoma markers, and the probe satisfies at least one of the following:

(1) the length of the probe is 75-85 bp, preferably 81 bp;

(2) the probe specifically recognizes a sequence from 10 bp upstream to 10 bp downstream of the coding regions of the markers in the above table;

(3) for probes that specifically recognize regions with a GC content higher than 0.6 or lower than 0.3, the multiplier is greater than 2;

(4) the melting temperature of the probe with the target sequence is 60-100 degrees Celsius, preferably 80 degrees Celsius;

(5) the probe does not comprise a hairpin structure;

(6) the probe matches at most 2 sites on the reference genome;

(7) the sliding-window step used when selecting probes is 10 bp.
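Some of the selection criteria above (probe length, sliding-window step, and GC-dependent multiplier) can be illustrated with a minimal Python sketch. It assumes the target region is available as a plain DNA string. The function names `gc_content` and `tile_probes`, and the choice of 3 as the multiplier for GC-extreme windows, are hypothetical and are not taken from the patent. Checks for hairpin structure, melting temperature, and the number of matching sites on the reference genome are deliberately left out.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tile_probes(target: str, probe_len: int = 81, step: int = 10,
                extreme_multiplier: int = 3):
    """Slide a window of probe_len bases over target, advancing by step.

    Returns a list of (start, probe_sequence, multiplier) tuples.
    Windows with GC content above 0.6 or below 0.3 receive a multiplier
    greater than 2 (3 here, an assumed value); all others receive 1.
    """
    probes = []
    for start in range(0, len(target) - probe_len + 1, step):
        window = target[start:start + probe_len]
        gc = gc_content(window)
        multiplier = extreme_multiplier if (gc > 0.6 or gc < 0.3) else 1
        probes.append((start, window, multiplier))
    return probes
```

In this sketch a 200-base target yields 12 candidate windows (starts 0, 10, ..., 110). A real design would apply the remaining criteria to each candidate before accepting it.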

According to still another aspect of the present invention, the present invention provides a gene chip. According to an embodiment of the invention, the gene chip comprises a probe and a support, the probe being located on the surface of the support, the probe being the probe described in the above embodiments.

According to an embodiment of the present invention, the gene chip may further add the following technical features:

According to an embodiment of the invention, the gene chip is a liquid phase chip and the support is a microsphere containing different fluorescent labels.

According to another aspect of the present invention, the present invention provides a method for constructing a sequencing library for malignant lymphoma detection from a sample to be tested, comprising: enriching a target sequence of the sample to be tested, the target sequence being the malignant lymphoma marker described in the above table, wherein the enriched target sequence constitutes the sequencing library for malignant lymphoma detection.

According to an embodiment of the invention, the method may further add the following technical features:

According to an embodiment of the present invention, in the method, the target sequence of the sample to be tested is subjected to hybridization capture using the probe described in the above embodiment or the gene chip described in the above embodiment, thereby achieving the enrichment.

According to an embodiment of the invention, the method further comprises: sequencing the sequencing library for malignant lymphoma detection to obtain a sequencing sequence.

According to an embodiment of the invention, in the method, the sequencing library for malignant lymphoma detection is sequenced using a BGISeq-500 sequencing platform.

According to an embodiment of the present invention, in the method, the sequencing sequence has a sequencing depth of 400× or more, and the coverage of the sequencing sequence reaches 99% or more.

According to an embodiment of the invention, in the method, the raw data amount of the sequencing sequence is above 3Gb.

According to another aspect of the present invention, the present invention provides a method of determining a gene mutation of a malignant lymphoma in a sample to be tested. According to an embodiment of the invention, the method comprises:

Constructing a sequencing library of malignant lymphoma detection of the sample to be tested according to the construction method described in the above examples;

Sequencing the sequencing library of the malignant lymphoma detection to obtain a sequencing sequence;

The sequencing sequence is aligned to a reference genome, and mutation detection is performed to obtain candidate mutation data;

Screening the candidate mutation data to obtain potential mutation data;

The potential mutation data is annotated to obtain target mutation data.

According to an embodiment of the present invention, the above method for determining a gene mutation for a malignant lymphoma may further include the following technical features:

According to an embodiment of the invention, in the method, the reference genome is the human reference genome hg19.

According to an embodiment of the invention, in the method, the mutation detection is performed using VarScan software.

According to an embodiment of the present invention, in the method, screening the candidate mutation data comprises: filtering out low-quality candidate mutations, low-coverage candidate mutations, candidate mutations located in repeat regions or at either end of a sequencing read, and candidate mutations with strand bias, wherein a low-quality candidate mutation refers to a candidate mutation having a base quality value of less than 20 or a mapping quality value of less than 30, and a low-coverage candidate mutation refers to a candidate mutation having a minimum support number of less than 3.

According to an embodiment of the present invention, in the method, the annotation is performed using ANNOVAR software, polymorphic sites are filtered out using a population mutation database, and benign mutations are filtered out using a pathogenic mutation database.

According to an embodiment of the present invention, in the method, the population mutation database is selected from at least one of the 1000 Genomes Project database, the ExAC database, and the ESP6500 database.

According to an embodiment of the invention, in the method, the disease-causing mutation database is ClinVar.
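The two annotation-based filters can be sketched as follows. This is only an illustration: the record layout (`pop_freqs`, `clinvar`), the function name `keep_after_annotation`, and the 1% population-frequency cutoff are assumptions, since the text does not state a cutoff or a data format. In practice the annotations would come from the annotation software's output rather than hand-built dictionaries.

```python
def keep_after_annotation(variant: dict, max_pop_freq: float = 0.01) -> bool:
    """Return True if an annotated variant survives both filters.

    variant is a hypothetical record with two keys:
      "pop_freqs": dict mapping a population database name to an allele
                   frequency, or None when the variant is absent there
      "clinvar":   ClinVar clinical significance string, or None
    """
    freqs = [f for f in variant.get("pop_freqs", {}).values() if f is not None]
    # Treat variants that are common in any population database as polymorphisms.
    if any(f >= max_pop_freq for f in freqs):
        return False
    # Drop variants whose ClinVar significance marks them as (likely) benign.
    significance = (variant.get("clinvar") or "").lower()
    if "benign" in significance:
        return False
    return True
```

A variant absent from all population databases and without a ClinVar entry is kept, which matches the intent of removing only known polymorphisms and known benign mutations.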

According to an embodiment of the invention, the method further comprises:

Prior to said mutation detection, the sequencing sequences are subjected to quality control to filter out low-quality and adapter-contaminated sequences, and the filtered sequences are then aligned to the reference genome.
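A minimal sketch of such a read-level quality control step is given below. The thresholds (a base counts as low quality below Phred 20, and a read is discarded when more than half of its bases are low quality) and the function name `passes_qc` are assumptions for illustration; the text does not specify them. The adapter sequence is passed in by the caller, because the actual adapter depends on the library preparation kit.

```python
def passes_qc(seq: str, quals: list, adapter: str,
              min_qual: int = 20, max_low_frac: float = 0.5) -> bool:
    """Return True if a read should be kept for alignment.

    seq:     read bases
    quals:   per-base Phred quality scores, same length as seq
    adapter: adapter sequence to screen for (supplied by the caller)
    """
    # Discard reads that contain the adapter sequence.
    if adapter and adapter in seq:
        return False
    # Discard reads with no quality information.
    if not quals:
        return False
    # Discard reads in which too many bases are low quality.
    low = sum(1 for q in quals if q < min_qual)
    return low / len(quals) <= max_low_frac
```

Only reads for which `passes_qc` returns True would be passed on to alignment against the reference genome.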

According to still another aspect of the present invention, the present invention provides a system for determining a gene mutation of a malignant lymphoma in a sample to be tested. According to an embodiment of the invention, the system comprises:

a target region library construction unit, which constructs a target region library using the markers described in the above embodiment as the target region;

a sequencing unit, which is connected to the target region library construction unit and sequences the target region library to obtain sequencing sequences;

a candidate mutation determining unit, which is connected to the sequencing unit and is configured to align the sequencing sequences of the target region library to a reference genome and perform mutation detection to obtain candidate mutation data;

a potential mutation determining unit, which is connected to the candidate mutation determining unit and is configured to screen the candidate mutation data to obtain potential mutation data;

a target mutation determining unit, which is connected to the potential mutation determining unit and is configured to annotate the potential mutation data to obtain target mutation data.

According to an embodiment of the present invention, the system for determining a gene mutation for a malignant lymphoma may further include the following technical features:

According to an embodiment of the invention, in the system, the reference genome is the human reference genome hg19.

According to an embodiment of the invention, in the system, the mutation detection is performed using VarScan software.

According to an embodiment of the invention, in the system, screening the candidate mutation data comprises: filtering out low-quality candidate mutations, low-coverage candidate mutations, candidate mutations located in repeat regions or at either end of a sequencing read, and candidate mutations with strand bias, wherein a low-quality candidate mutation refers to a candidate mutation having a base quality value of less than 20 or a mapping quality value of less than 30, and a low-coverage candidate mutation refers to a candidate mutation having a minimum support number of less than 3.

According to an embodiment of the invention, in the system, the annotation is performed using ANNOVAR software, polymorphic sites are filtered out using a population mutation database, and benign mutations are filtered out using a pathogenic mutation database.

According to an embodiment of the invention, in the system, the population mutation database is selected from at least one of the 1000 Genomes Project database, the ExAC database, and the ESP6500 database.

According to an embodiment of the invention, in the system, the disease-causing mutation database is ClinVar.

According to an embodiment of the invention, the system further comprises:

a quality control unit, which is connected to the sequencing unit and is configured to perform quality control on the sequencing sequences before the mutation detection, thereby filtering out low-quality and adapter-contaminated sequences, after which the filtered sequences are aligned to the reference genome.

According to another aspect of the invention, the invention provides the use of a combination of 212 markers in the above table for the preparation of a reagent for the detection and/or determination of a mutation in a malignant lymphoma gene.

According to yet another aspect of the invention, the invention provides the use of the combination of 212 markers in the above table for the detection and/or determination of genetic mutations in malignant lymphoma.

The beneficial effects obtained by the present invention are as follows: the present invention enriches 212 target genes specific to malignant lymphoma and then uses high-throughput sequencing to detect and determine mutant genes associated with malignant lymphoma. It can quickly and effectively detect single base substitutions, single-base or multi-base insertions or deletions, and large fragment deletions or amplifications in the target sequences, enabling efficient and comprehensive detection of common malignant lymphoma gene mutations. In particular, detection using the BGISEQ-500 second-generation sequencing platform has the advantages of wide applicability, high efficiency, comprehensiveness, and ease of operation, and achieves rapid and efficient determination of genes related to malignant lymphoma.

Brief description of the drawings

FIG. 1 is a schematic diagram of a system for determining gene mutations of malignant lymphoma, in accordance with an embodiment of the present invention.

FIG. 2 is a schematic diagram of a system for determining gene mutations of malignant lymphoma, in accordance with an embodiment of the present invention.

FIG. 3 is a schematic diagram of obtaining target mutations by analyzing sequencing sequences, in accordance with an embodiment of the present invention.

FIG. 4 is a graph showing the consistency of the mutation frequencies obtained by two detection methods, in accordance with an embodiment of the present invention, wherein the abscissa represents the minor allele frequency obtained by whole genome sequencing (MAF in WGS) and the ordinate represents the minor allele frequency obtained by target gene capture sequencing (MAF in LC).

Detailed description

The embodiments of the present invention are described in detail below, and the examples of the embodiments are illustrated in the drawings, wherein the same or similar reference numerals are used to refer to the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are intended to be illustrative of the invention and are not to be construed as limiting.

The method for target sequence capture and high-throughput sequencing of malignant lymphoma genes described in the present invention is designed to meet the needs of gene mutation detection for malignant lymphoma. The present invention takes all exon regions and exon-intron junction regions of common malignant lymphoma genes (the 212 genes shown in Table 1) as the target capture regions and designs a probe set capable of capturing all target sequence regions simultaneously. A customized liquid phase chip (produced by BGI) is then combined with BGISEQ-500 second-generation high-throughput sequencing and information analysis to sequence all captured target sequences and analyze the different types of mutation information. This makes it possible to determine whether cancer driver gene mutations and targeted-drug gene mutations of malignant lymphoma are present in the target sample, to guide the classification and medication of malignant lymphoma according to the nature of the mutations, and to rapidly accumulate malignant lymphoma gene mutation data that provide strong data support for industrialization. The invention has the advantages of wide applicability, high efficiency, comprehensiveness, and ease of operation. It detects single base substitutions, single-base or multi-base insertions or deletions, and large fragment deletions or amplifications in the target sequences, enabling efficient and comprehensive detection of malignant lymphoma gene mutations.

Malignant lymphoma marker

Through research and analysis, the inventors collected and analyzed a large number of genes associated with malignant lymphoma and, based on their relevance and pathogenicity, finally determined a combination of 212 genes related to malignant lymphoma (see Table 1) as markers for identifying malignant lymphoma. These markers can also be used as target regions for enrichment, allowing gene mutations associated with malignant lymphoma to be detected and/or identified effectively, including but not limited to single base substitutions, single-base or multi-base insertions or deletions, and large fragment deletions or amplifications. This enables efficient and comprehensive detection of malignant lymphoma gene mutations, with a sensitivity of over 93%.

Table 1 Target gene combination

Tables 2 and 3 list the names of the malignant lymphoma cancer genes and the names of their corresponding malignant lymphomas, respectively. Based on a series of theoretical studies and experimental verification work, the inventors discovered and demonstrated the correlation between the 212 genes in the above table, and concluded that effective detection of malignant lymphoma can be achieved using this group of genes and that, compared with using a single gene or other gene combinations as markers, the test results are more accurate, reliable, and reproducible.

It should be noted that this group of genes is involved in important pathogenic signaling pathways of lymphoma, such as BCR, chromatin modification, apoptosis and cell cycle regulation, immunosuppression, and Notch. This group of genes therefore offers broad and comprehensive coverage in the field of lymphoma cancer gene detection. Moreover, these genes have each been reported individually in high-impact-factor literature, but to date no report has used the combination of these 212 genes as a marker for malignant lymphoma. In addition, for the KLF2, ZFP36L1, and TMSB4X genes (KLF2 and ZFP36L1 are important regulators of the NOTCH signaling pathway), the inventors are the first to find that the mutation frequency in Asian populations is significantly higher than in Caucasians.

These genes associated with malignant lymphoma are associated with diffuse large B-cell lymphoma, mantle cell lymphoma, follicular lymphoma, Burkitt's lymphoma, especially with diffuse large B-cell lymphoma.

Table 2 List of test genes

Remarks: “Important” in Table 1 refers to an important pathogenic signaling pathway present in lymphoma.

Table 3 Corresponding malignant lymphomas and list of references

Based on the discovery of these 212 markers associated with malignant lymphoma, the inventors designed a probe and a gene chip for malignant lymphoma. The probe is designed with all exon regions and exon-intron junction regions of the 212 target cancer genes as the total target region; the probe specifically recognizes at least a portion of the coding regions of the 212 markers, and the probe satisfies at least one of the following: (1) the probe has a length of 75-85 bp, preferably 81 bp; (2) the probe specifically recognizes a sequence from 10 bp upstream to 10 bp downstream of the coding regions of the 212 markers; (3) for probes that specifically recognize regions with a GC content higher than 0.6 or lower than 0.3, the multiplier is greater than 2; (4) the melting temperature of the probe with the target sequence is 60-100 degrees Celsius, preferably 80 degrees Celsius; (5) the probe does not comprise a hairpin structure; (6) the probe matches at most 2 sites on the reference genome; (7) the sliding-window step used when selecting probes is 10 bp.

The probe set for malignant lymphoma designed according to the above principles contains a total of 32779 probes, each 81 bp in length and each containing a 16 bp tag sequence and a 15 bp tag sequence. The two tag sequences are GAAGCGAGGATCAACT (SEQ ID NO: 1) and CATTGCGTGAACCGA (SEQ ID NO: 2), which serve as an enzyme cleavage site and a transcription site, respectively. Both ends are used to design PCR primers, and the transcription site is also used for transcription, so that the transcript functions as an RNA probe.

On this basis, the inventors have also devised a gene chip comprising a probe and a support, the probe being located on the surface of the support. According to a specific embodiment of the present invention, the gene chip may be designed as a liquid phase chip, and the support is a microsphere containing different fluorescent labels.

Method for determining gene mutation of malignant lymphoma in a sample to be tested

The inventors of the present invention found that by using the combination of these 212 markers associated with malignant lymphoma as the target region and enriching it, a sequencing library for malignant lymphoma can be constructed. Bioinformatics analysis of this sequencing library can then effectively detect and/or identify gene mutations associated with malignant lymphoma, including but not limited to single base substitutions, single-base or multi-base insertions or deletions, and large fragment deletions or amplifications, enabling efficient and comprehensive detection of gene mutations in malignant lymphoma. Experiments have shown that the sensitivity is over 93%. The method therefore achieves comprehensive detection of cancer gene mutation sites in malignant lymphoma, with the advantages of high detection throughput, high sensitivity, high specificity, high accuracy, and wide coverage, and effectively addresses problems such as the wide range of cancer gene mutations in malignant lymphoma and the uncertainty of mutation sites.

According to an embodiment of the present invention, the method for determining gene mutations of malignant lymphoma in a sample to be tested comprises: enriching a target sequence of the sample to be tested, wherein the target sequence is the combination of 212 malignant lymphoma markers and the enriched target sequence constitutes the sequencing library for malignant lymphoma detection; sequencing the sequencing library for malignant lymphoma detection to obtain sequencing sequences; aligning the sequencing sequences to a reference genome and performing mutation detection to obtain candidate mutation data; screening the candidate mutation data to obtain potential mutation data; and annotating the potential mutation data to obtain target mutation data. The sample to be tested in the present invention may be derived from a tissue sample.

It should be noted that the method for determining gene mutations of malignant lymphoma in a sample to be tested can also be described as a method for detecting and/or determining gene mutations for malignant lymphoma; it is not a method for diagnosing a disease. The mutation results detected by the present invention can only indicate that the cancer tissue of the individual concerned carries the corresponding cancer driver gene mutations; in practice, clinical findings are also needed to confirm the individual's disease status.

Those skilled in the art can enrich the DNA sequence of the target region by different techniques according to actual needs, including but not limited to target region DNA enrichment based on multiplex PCR (such as Thermo Fisher Scientific's AmpliSeq technology) and target region DNA enrichment based on probe hybridization (such as Agilent's SureSelect technology and NimbleGen's SeqCap EZ technology).

When sequencing the target DNA by high-throughput sequencing, second-generation sequencing platforms such as Illumina's HiSeq/MiSeq/NextSeq, Thermo Fisher Scientific's Ion Proton/Ion PGM, and BGI's BGISEQ-500 can be used, as can third-generation sequencing platforms such as PacBio. According to a specific embodiment of the present invention, the sequencing sequences are obtained using the BGISEQ-500 sequencing platform. High-throughput sequencing on BGI's self-developed sequencer offers better compatibility and better sequencing results.

According to a specific embodiment of the present invention, the raw data volume of the constructed sequencing library reaches 3 Gb or more, the sequencing depth of the target region is 400× or more, and the coverage of the target region reaches 99% or more. Sequencing depth refers to the ratio of the total number of bases (bp) obtained by sequencing to the size of the sequenced genome, and reflects the average number of times each base of the sequenced genome is read. Sequencing coverage refers to the proportion of the genome that is covered by the sequences obtained by sequencing.
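The two definitions can be made concrete with a short sketch. It assumes the per-base read depth over the target region is already available as a list of integers, and it interprets coverage as the fraction of target positions covered by at least one read. The function names are hypothetical; the 400× and 99% thresholds are the ones stated above.

```python
def depth_and_coverage(per_base_depth):
    """Return (mean depth, coverage) for a target region.

    mean depth = total sequenced bases over the region / region size
    coverage   = fraction of positions with at least one read
    """
    n = len(per_base_depth)
    if n == 0:
        raise ValueError("target region is empty")
    mean_depth = sum(per_base_depth) / n
    coverage = sum(1 for d in per_base_depth if d > 0) / n
    return mean_depth, coverage


def meets_library_spec(per_base_depth, min_depth=400, min_coverage=0.99):
    """Check the depth and coverage thresholds stated in the text."""
    mean_depth, coverage = depth_and_coverage(per_base_depth)
    return mean_depth >= min_depth and coverage >= min_coverage
```

For example, a 100-base region in which 99 positions are read 500 times and one position is never read has a mean depth of 495 and a coverage of 0.99, so it meets both thresholds.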

According to a specific embodiment of the present invention, screening the candidate mutation data includes removing low-quality candidate mutations, low-coverage candidate mutations, candidate mutations located in repeat regions or at either end of a sequencing read, and candidate mutations with strand bias. A low-quality candidate mutation is one with a base quality value of less than 20 (base quality < 20) or a mapping quality value of less than 30 (mapping quality < 30). A low-coverage candidate mutation is one with a minimum support number of less than 3 (minimal support depth < 3). A candidate mutation with strand bias is one that occurs on only one strand.
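The screening rules above can be sketched as a single predicate. The record fields (`base_quality`, `mapping_quality`, `fwd_support`, `rev_support`, `in_repeat`, `at_read_end`) and the function name are hypothetical; the numeric thresholds are the ones given in the text. How repeat regions and read-end positions are determined is outside this sketch and is represented by precomputed boolean flags.

```python
def passes_screening(c: dict) -> bool:
    """Return True if a candidate mutation survives all screening filters.

    c is a hypothetical record with the keys:
      base_quality, mapping_quality : quality values for the candidate
      fwd_support, rev_support      : supporting reads on each strand
      in_repeat, at_read_end        : precomputed boolean flags
    """
    # Low quality: base quality < 20 or mapping quality < 30.
    if c["base_quality"] < 20 or c["mapping_quality"] < 30:
        return False
    # Low coverage: fewer than 3 supporting reads in total.
    if c["fwd_support"] + c["rev_support"] < 3:
        return False
    # Strand bias: the mutation is seen on only one strand.
    if c["fwd_support"] == 0 or c["rev_support"] == 0:
        return False
    # Located in a repeat region or at the end of a read.
    if c["in_repeat"] or c["at_read_end"]:
        return False
    return True
```

Candidates for which the predicate returns True correspond to the potential mutation data that is passed on to annotation.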

System for determining genetic mutations in malignant lymphoma in a sample to be tested

Based on the combination of 212 lymphoma-associated genes discovered by the inventors, and taking this combination of 212 specific genes as the target genes, the present invention provides a system for determining gene mutations of malignant lymphoma in a sample to be tested. This system can also be understood as a system for detecting gene mutations of malignant lymphoma in a sample to be tested, that is, a system for detecting and determining mutations in malignant lymphoma-associated genes in the sample. The system can detect single base substitutions, single-base or multi-base insertions or deletions, and large fragment deletions or amplifications in the target sequences, enabling efficient and comprehensive detection and determination of common malignant lymphoma gene mutations.

According to an embodiment of the present invention, the present invention provides a system for determining gene mutations of malignant lymphoma in a sample to be tested. As shown in FIG. 1, the system comprises: a target region library construction unit, which constructs a target region library using the combination of 212 markers of the present invention as the target region; a sequencing unit, which is connected to the target region library construction unit and sequences the target region library to obtain sequencing sequences; a candidate mutation determining unit, which is connected to the sequencing unit and is configured to align the sequencing sequences of the target region library to a reference genome and perform mutation detection to obtain candidate mutation data; a potential mutation determining unit, which is connected to the candidate mutation determining unit and is configured to screen the candidate mutation data to obtain potential mutation data; and a target mutation determining unit, which is connected to the potential mutation determining unit and is configured to annotate the potential mutation data to obtain target mutation data.

According to an embodiment of the present invention, the system for determining gene mutations of malignant lymphoma in a sample to be tested may also be as shown in FIG. 2. The system comprises: a target region library construction unit, which constructs a target region library using the combination of 212 markers of the present invention as the target region; a sequencing unit, which is connected to the target region library construction unit and sequences the target region library; a quality control unit, which is connected to the sequencing unit and is configured to perform quality control on the sequencing sequences before the mutation detection, thereby filtering out low-quality and adapter-contaminated sequences, after which the filtered sequences are aligned to the reference genome; a candidate mutation determining unit, which is connected to the quality control unit and is configured to align the filtered sequences to the reference genome and perform mutation detection to obtain candidate mutation data; a potential mutation determining unit, which is connected to the candidate mutation determining unit and is configured to screen the candidate mutation data to obtain potential mutation data; and a target mutation determining unit, which is connected to the potential mutation determining unit and is configured to annotate the potential mutation data to obtain target mutation data.

The solution of the present invention will be explained below in conjunction with the examples. Those skilled in the art will appreciate that the following examples are merely illustrative of the invention and are not to be considered as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they are carried out according to the techniques or conditions described in the literature in the art or according to the product specifications. Reagents or instruments for which no manufacturer is indicated are conventional products that are commercially available.

Example 1 Preparation of Probes and Chips

Taking the malignant lymphoma markers in Table 1, that is, all exon regions and exon-intron junction regions of the 212 target genes, as the total target region (about 500 kb in total), probes were prepared according to the following probe design principles:

(1) the length of the probe is 81 bp;

(2) the probe specifically recognizes a sequence from 10 bp upstream to 10 bp downstream of the coding regions of the 212 markers in Table 1;

(5) the probe does not comprise a hairpin structure;

(6) the probe matches at most 2 sites on the reference genome;

(7) the sliding-window step used when selecting probes is 10 bp.

Thus, probes for the 212 target genes were obtained, which specifically recognize at least a portion of the coding regions of the 212 markers. The final target region probe set contains 32779 probes, each 81 bp in length and each containing a 16 bp tag sequence and a 15 bp tag sequence. The two tag sequences are GAAGCGAGGATCAACT (SEQ ID NO: 1) and CATTGCGTGAACCGA (SEQ ID NO: 2), which serve as an enzyme cleavage site and a transcription site, respectively. Both ends are used to design PCR primers, and the transcription site is also used for transcription, so that the transcript functions as an RNA probe.

Due to the large number of probes, only a few probe sequences of individual genes are given here as examples, as follows:

KLF2 gene probe sequence (SEQ ID NO: 3)

KLF2 gene probe sequence (SEQ ID NO: 4)

KLF2 gene probe sequence (SEQ ID NO: 5)

KLF2 gene probe sequence (SEQ ID NO: 6)

KLF2 gene probe sequence (SEQ ID NO: 7)

ZFP36L1 gene probe sequence (SEQ ID NO: 8)

ZFP36L1 gene probe sequence (SEQ ID NO: 9)

ZFP36L1 gene probe sequence (SEQ ID NO: 10)

ZFP36L1 gene probe sequence (SEQ ID NO: 11)

ZFP36L1 gene probe sequence (SEQ ID NO: 12)

TMSB4X gene probe sequence (SEQ ID NO: 13)

TMSB4X gene probe sequence (SEQ ID NO: 14)

TMSB4X gene probe sequence (SEQ ID NO: 15)

TMSB4X gene probe sequence (SEQ ID NO: 16)

TMSB4X gene probe sequence (SEQ ID NO: 17)

Further, using the probes for the 212 target genes obtained above, a liquid-phase capture chip was prepared for later use. The liquid-phase chip is prepared from polystyrene microspheres about 5.6 μm in diameter that carry carboxyl groups on the surface and contain red and orange dyes inside; according to the ratio of the two dyes, the microspheres can be divided into 100 kinds, each with its own number. Because of the difference in internal fluorescence ratio, each kind of microsphere has a specific spectral characteristic and can be specifically recognized by a laser. Different probe molecules are coated onto differently numbered microspheres to capture the target molecules in the sample, and each target molecule is then bound to a fluorescent reporter molecule. Detection of the target molecule is then achieved by fluorescence detection.

Example 2

This study was based on 16 diffuse large B-cell lymphoma cancer samples. Chip capture combined with high-throughput sequencing was used to detect and analyze SNP and Indel mutations in the cancer genes of the 16 diffuse large B-cell lymphoma samples, in order to identify the cancer driver mutations in this batch of samples.

Among them, the experimental samples used were 16 tissue samples clinically diagnosed as diffuse large B-cell lymphoma. The specific experimental methods are as follows:

1. Genomic DNA extraction from cancer tissues

Genomic DNA was extracted from the diffuse large B-cell lymphoma tissue samples using the QIAGEN DNA Tissue and Blood mini kit, following the kit's extraction instructions. The DNA concentration was measured with a fluorescence analyzer; the required concentration is greater than 5 ng/μL and the required volume is greater than 30 μL, and in principle the DNA yield of each sample should be ≥ 2 μg. The DNA and its degree of degradation were then checked by electrophoresis, since seriously degraded samples are not suitable for library construction. The electrophoresis conditions were: 1% agarose gel, electrophoresis voltage 4 V/cm, electrophoresis time 45 min. The results of agarose gel electrophoresis showed that the DNA of all samples was intact and substantially free of degradation.
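The acceptance thresholds stated above can be written as a small check. This is a minimal sketch, not part of the described workflow; the function name `dna_sample_passes_qc` and its parameters are hypothetical, and the electrophoresis-based degradation check is not modelled.

```python
MIN_CONC_NG_PER_UL = 5.0   # concentration must be greater than 5 ng/uL
MIN_VOLUME_UL = 30.0       # volume must be greater than 30 uL
MIN_YIELD_UG = 2.0         # total yield should in principle be >= 2 ug


def dna_sample_passes_qc(conc_ng_per_ul: float, volume_ul: float) -> bool:
    """Apply the concentration, volume, and yield thresholds from the text.

    Hypothetical helper for illustration; degradation is assessed separately
    by agarose gel electrophoresis and is not covered here.
    """
    yield_ug = conc_ng_per_ul * volume_ul / 1000.0  # ng -> ug
    return (
        conc_ng_per_ul > MIN_CONC_NG_PER_UL
        and volume_ul > MIN_VOLUME_UL
        and yield_ug >= MIN_YIELD_UG
    )


if __name__ == "__main__":
    print(dna_sample_passes_qc(80.0, 40.0))
    print(dna_sample_passes_qc(6.0, 35.0))
```

Note that a sample sitting just above the minimum concentration and volume carries only about 0.15 μg of DNA, so under these thresholds the 2 μg yield criterion is usually the limiting one.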

2. Library construction before sequencing

100 ng of genomic DNA was taken and randomly fragmented by enzymatic digestion using a DNA fragmentation enzyme, with end repair and A-tailing performed at the same time; this was followed by adapter ligation and purification and by PCR amplification to obtain a pre-hybridization library, which was checked on an Agilent 2100 Bioanalyzer. A second round of fragment selection was then performed to obtain fragments 150-500 bp in length. The PCR product was then subjected to target-region hybridization capture using the liquid-phase capture chip, and the target DNA was eluted from the probes with an elution reagent to obtain the desired target DNA, after which PCR amplification was performed. The resulting product was circularized to construct the target-region capture library; the yield of the hybridization library obtained was greater than 160 ng.

Among them, the liquid phase capture chip used was prepared as in Example 1.

3. High-throughput sequencing

The quality-controlled library DNA was sequenced according to the operating instructions of the BGISeq-500 sequencer. The raw data obtained for each sample exceeded 3 Gb, the average sequencing depth of the target region reached 400×, and the target region coverage was over 99%. The quality of the sequencing data of the 16 samples is shown in Table 4 below.

Table 4 Basic information of sequencing samples

4. Sequencing data filtering, alignment, and mutation analysis

After sequencing is completed, bioinformatic analysis is performed on the raw data output by the sequencer, and the workflow is as follows (as shown in FIG. 3):

First, quality control (QC) is performed on the raw reads to remove reads whose sequencing quality does not meet the requirements and reads contaminated with adapter sequence, yielding clean reads (i.e., the filtered sequences). The filtered sequences are then aligned to the human reference genome hg19 (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/) using the bwa (Burrows-Wheeler Aligner) software to obtain the alignment results. VarScan software is then used to detect mutations and obtain candidate mutations, and the candidate mutations undergo initial filtering, which removes mutation sites that are of low quality (base quality < 20 or mapping quality < 30), have low coverage (minimum supporting depth < 3), lie in repeat regions or at the ends of reads, or show strand bias, finally yielding a list of potential mutations.
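The initial filtering rules can be sketched as a simple predicate over candidate mutations. The record type and field names below are hypothetical and do not correspond to VarScan's output columns; how repeat regions, read ends, and strand bias are determined is not specified in the text, so they are represented here as precomputed flags.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_BASE_QUALITY = 20      # base quality < 20 is filtered out
MIN_MAPPING_QUALITY = 30   # mapping quality < 30 is filtered out
MIN_SUPPORT_DEPTH = 3      # supporting depth < 3 is filtered out


@dataclass
class CandidateMutation:
    """Hypothetical record; field names are illustrative assumptions."""
    base_quality: float
    mapping_quality: float
    support_depth: int
    in_repeat_region: bool
    at_read_end: bool
    strand_biased: bool


def is_potential_mutation(m: CandidateMutation) -> bool:
    """Return True if the candidate survives every initial filter."""
    if m.base_quality < MIN_BASE_QUALITY or m.mapping_quality < MIN_MAPPING_QUALITY:
        return False  # low quality
    if m.support_depth < MIN_SUPPORT_DEPTH:
        return False  # low coverage
    if m.in_repeat_region or m.at_read_end or m.strand_biased:
        return False  # positional or strand artefact
    return True


def initial_filter(candidates: Iterable[CandidateMutation]) -> List[CandidateMutation]:
    """Reduce candidate mutations to the list of potential mutations."""
    return [m for m in candidates if is_potential_mutation(m)]
```

Keeping each rule as a separate early return makes it easy to see which criterion rejected a site when the thresholds are tuned.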

The resulting list of potential mutations is annotated with ANNOVAR software, and the synonymous mutations in it are excluded. Population mutation databases (such as the 1000 Genomes Project database (http://www.1000genomes.org), the ExAC database and the ESP6500 database) are then used to filter out polymorphic sites that are common in the population, and a pathogenic mutation database (such as ClinVar) is used to filter out benign mutations, giving the final mutation result, that is, the target mutation data. A synonymous mutation is a neutral mutation: because the genetic code is degenerate, a base substitution may produce a new codon that encodes the same amino acid as the old codon, so this class of mutation does not have any effect on the pathogenic condition.
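The annotation-based filtering can be sketched as follows. This does not call ANNOVAR or query any database; it assumes each mutation has already been annotated into a dictionary with hypothetical keys (`effect`, `population_af`, `clinvar`). The text does not state what allele frequency counts as a "common" polymorphism, so the 1% cutoff below is purely an illustrative assumption.

```python
from typing import Dict, Iterable, List

# ASSUMPTION: the source gives no frequency cutoff; 0.01 is illustrative only.
MAX_POPULATION_AF = 0.01


def is_target_mutation(record: Dict) -> bool:
    """Apply the three annotation filters to one annotated mutation.

    Keys are hypothetical: 'effect' (e.g. 'synonymous', 'missense'),
    'population_af' (highest frequency across population databases, or None
    if absent from them), 'clinvar' (e.g. 'benign', 'pathogenic', or None).
    """
    if record.get("effect") == "synonymous":
        return False  # synonymous mutations are excluded
    af = record.get("population_af")
    if af is not None and af > MAX_POPULATION_AF:
        return False  # common polymorphic site in the population
    if record.get("clinvar") == "benign":
        return False  # benign according to the pathogenic mutation database
    return True


def select_target_mutations(records: Iterable[Dict]) -> List[Dict]:
    """Return the records that remain after all annotation filters."""
    return [r for r in records if is_target_mutation(r)]
```

A mutation absent from the population databases (`population_af` of `None`) is kept, on the reasoning that a site never observed in the population cannot be a common polymorphism.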

The experimental results showed that 163 cancer mutation sites were detected by analyzing and filtering the sequencing data of 16 samples.

Example 3

This example uses the same samples as Example 2. A sequencing library was constructed for each sample by the whole-genome sequencing method on the HiSeq 2000 sequencing platform according to its operation guide, and cancer mutations were determined by the same method as step 4 of Example 2. The results showed that whole-genome sequencing of the 16 patients with diffuse large B-cell lymphoma detected a total of 174 cancer mutation sites compared with normal samples.

Comparing the experimental results of Example 2 and Example 3, whole-genome sequencing detected a total of 174 cancer mutations across the 16 patients with diffuse large B-cell lymphoma, and the capture method targeting the 212 genes associated with malignant lymphoma detected 163 of these 174 cancer mutations. Thus, relative to mutation detection by whole-genome sequencing, mutation detection by capture of the 212 target genes associated with malignant lymphoma reached an overall sensitivity of 93.7%. The detailed results for each sample are shown in Table 5 below, including SNP mutation sites and insertion-deletion variants (Indel mutations):
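The overall sensitivity figure follows directly from the two counts reported above, taking the whole-genome sequencing result as the reference set. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def sensitivity(detected: int, reference_total: int) -> float:
    """Fraction of reference mutations that the capture method also detected."""
    if reference_total <= 0:
        raise ValueError("reference_total must be positive")
    if not 0 <= detected <= reference_total:
        raise ValueError("detected must lie between 0 and reference_total")
    return detected / reference_total


if __name__ == "__main__":
    # 163 of the 174 mutations found by whole-genome sequencing
    print(f"{sensitivity(163, 174):.1%}")  # prints 93.7%
```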

Table 5 Detailed test results for each sample

At the same time, the correlation between the minor allele frequency detected by target gene capture and the minor allele frequency detected by whole-genome sequencing was as high as 0.8186 (r2, Pearson correlation coefficient), as shown in FIG. 4. In Figure 4, the abscissa represents the minor allele frequency obtained by whole-genome sequencing (MAF in WGS, Minor allele frequency in Whole-genome-sequencing), and the ordinate represents the minor allele frequency obtained by target gene capture sequencing (MAF in LC, Minor allele frequency in Low-coverage whole-genome-sequencing). It can be seen that the correlation between the minor allele frequency detected by target gene capture and that detected by whole-genome sequencing is above 0.8, showing a good correlation.
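For reference, a Pearson correlation between paired allele frequencies can be computed as in the sketch below. The per-site frequencies behind Figure 4 are not given in the text, so the values in the usage example are made-up placeholders that only show the call pattern; they are not data from this study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT measurements from the study.
    maf_wgs = [0.10, 0.25, 0.32, 0.41, 0.48]
    maf_capture = [0.12, 0.22, 0.35, 0.38, 0.50]
    r = pearson_r(maf_wgs, maf_capture)
    print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

Reporting both r and r² avoids ambiguity, since the text labels the 0.8186 value with both "r2" and "Pearson correlation coefficient".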

In the present invention, unless explicitly stated and defined otherwise, the terms "installation", "connected", "fixed" and the like shall be understood broadly: a connection may be a fixed connection, a detachable connection, or an integral one; it may be a mechanical connection, an electrical connection, or a communication with each other; it may be a direct connection or an indirect connection through an intermediate medium; and it may be an internal connection of two elements or an interaction relationship between two elements. For those skilled in the art, the specific meanings of the above terms in the present invention can be understood on a case-by-case basis.

In the description of the present specification, a description with reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" and the like means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In the present specification, the schematic representation of the above terms is not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may join and combine the various embodiments or examples described in the specification, as well as the features of the various embodiments or examples.

Although the embodiments of the present invention have been shown and described, it is understood that the above-described embodiments are illustrative and are not to be construed as limiting the scope of the invention. The embodiments are subject to changes, modifications, substitutions and variations.

Claims

A marker of malignant lymphoma characterized by including the genes in the following table:
The marker according to claim 1, wherein the malignant lymphoma is a diffuse large B-cell lymphoma.
A probe, characterized in that the probe is designed against all of the exon regions and the exon-intron junction regions of the marker according to claim 1 or 2, the probe specifically recognizes at least a portion of the coding region of the marker according to claim 1 or 2, and the probe satisfies at least one condition selected from the following:

(1) the length of the probe is 75-85 bp;

(2) the probe specifically recognizes a sequence from 10 bp upstream to 10 bp downstream of the marker coding region of claim 1;

(3) for a region having a GC content higher than 0.6 or lower than 0.3, the probes that specifically recognize the region are provided at a multiple greater than 2;

(4) the melting temperature of the probe and the target sequence is 60-10 degrees Celsius, preferably 80 degrees Celsius;

(5) the probe does not comprise a hairpin structure;

(6) the probe matches at most 2 sites on the reference genome;

(7) the sliding-window step size used when selecting the probe is 10 bp.
The probe according to claim 3, wherein the probe has a length of 81 bp in the condition (1).
A gene chip, characterized in that the gene chip comprises a probe and a support, the probe being located on the surface of the support, and the probe is the probe according to claim 3 or 4.
The gene chip according to claim 5, wherein the gene chip is a liquid phase chip, and the support is a microsphere containing different fluorescent labels.
A method for constructing a sequencing library for malignant lymphoma detection from a sample to be tested, comprising:

enriching the target sequence of the sample to be tested, wherein the target sequence is the marker of claim 1 or 2, and the enriched target sequence constitutes the sequencing library for malignant lymphoma detection.
The method according to claim 7, wherein the target sequence of the sample to be tested is captured by hybridization using the probe according to claim 3 or 4 or the gene chip according to claim 5 or 6, thereby achieving the enrichment.
The method of claim 7, further comprising sequencing the sequencing library for malignant lymphoma detection to obtain a sequencing sequence.
The method of claim 9, wherein the sequencing library for malignant lymphoma detection is sequenced using a BGISeq-500 sequencing platform.
The method according to claim 9, wherein the sequencing sequence has a sequencing depth of 400× or more, and the coverage of the sequencing sequence reaches 99% or more.
The method of claim 9 wherein the raw data amount of said sequencing sequence is above 3 Gb.
A method for determining a gene mutation of a malignant lymphoma in a sample to be tested, characterized in that it comprises:

constructing a sequencing library for malignant lymphoma detection of the sample to be tested according to the method of any one of claims 7-12;

sequencing the sequencing library for malignant lymphoma detection to obtain a sequencing sequence;

aligning the sequencing sequence to a reference genome and performing mutation detection to obtain candidate mutation data;

screening the candidate mutation data to obtain potential mutation data; and

annotating the potential mutation data to obtain target mutation data.
The method of claim 13 wherein said reference genome is human reference genome hg19.
The method of claim 13 wherein said mutation detection is performed using VarScan software.
The method according to claim 13, wherein the screening of the candidate mutation data comprises: filtering out candidate mutations that are of low quality, have low coverage, are located in repeat regions or at the ends of sequences, or show strand bias,

wherein a low-quality candidate mutation refers to a candidate mutation having a base quality value of less than 20 or a mapping quality value of less than 30, and a low-coverage candidate mutation refers to a candidate mutation having a minimum support number of less than 3.
The method according to claim 13, wherein the annotation is performed using ANNOVAR software, polymorphic sites are filtered out using a population mutation database, and benign mutations are filtered out using a pathogenic mutation database.
The method according to claim 17, wherein said population mutation database is selected from at least one of the 1000 Genomes Project database, the ExAC database, and the ESP6500 database.
The method of claim 17 wherein said pathogenic mutation database is ClinVar.
The method of any of claims 13 to 19, further comprising:

prior to said mutation detection, performing quality control on the sequencing sequence to filter out low-quality and adapter-contaminated sequences, and then aligning the filtered sequences to the reference genome.
A system for determining a gene mutation of a malignant lymphoma in a sample to be tested, characterized in that it comprises:

a target region library construction unit, wherein the target region library construction unit constructs a target region library using the marker according to claim 1 or 2 as a target region;

a sequencing unit, wherein the sequencing unit is connected to the target region library construction unit, and the sequencing unit sequences the target region library to obtain a sequencing sequence;

a candidate mutation determining unit, wherein the candidate mutation determining unit is connected to the sequencing unit, and the candidate mutation determining unit is configured to align the sequencing sequence of the target region library to the reference genome and perform mutation detection to obtain candidate mutation data;

a potential mutation determining unit, wherein the potential mutation determining unit is connected to the candidate mutation determining unit, and the potential mutation determining unit is configured to screen the candidate mutation data to obtain potential mutation data; and

a target mutation determining unit, wherein the target mutation determining unit is connected to the potential mutation determining unit, and the target mutation determining unit is configured to annotate the potential mutation data to obtain target mutation data.
The system of claim 21 wherein said reference genome is human reference genome hg.
The system of claim 21 wherein said mutation detection is performed using VarScan software.
The system according to claim 21, wherein the screening of the candidate mutation data comprises: filtering out candidate mutations that are of low quality, have low coverage, are located in repeat regions or at the ends of sequences, or show strand bias,

wherein a low-quality candidate mutation refers to a candidate mutation having a base quality value of less than 20 or a mapping quality value of less than 30, and a low-coverage candidate mutation refers to a candidate mutation having a minimum support number of less than 3.
The system according to claim 21, wherein said annotation is performed using ANNOVAR software, polymorphic sites are filtered out using a population mutation database, and benign mutations are filtered out using a pathogenic mutation database.
The system according to claim 25, wherein said population mutation database is selected from at least one of the 1000 Genomes Project database, the ExAC database, and the ESP6500 database.
The system of claim 25 wherein said pathogenic mutation database is ClinVar.
A system according to any one of claims 21 to 27, further comprising:

a quality control unit, the quality control unit being connected to the sequencing unit, the quality control unit being configured to perform quality control on the sequencing sequence prior to the mutation detection, thereby filtering out low-quality and adapter-contaminated sequences, so that the filtered sequences are then aligned to the reference genome.
Use of the marker of claim 1 or 2 in the preparation of a reagent for detecting and/or determining a mutation in a malignant lymphoma gene.
Use of the marker of claim 1 or 2 in the field of detection and/or determination of a mutation in a malignant lymphoma gene.