WO2018097657A1

WO2018097657A1 - Genome sequencing method and genome editing identification method using chromatin DNA

Info

Publication number: WO2018097657A1
Application number: PCT/KR2017/013547
Authority: WO
Inventors: 김대식
Original assignee: Toolgen Inc; Seoul National University R&DB Foundation
Current assignee: Toolgen Inc; SNU R&DB Foundation
Priority date: 2016-11-25
Filing date: 2017-11-24
Publication date: 2018-05-31
Anticipated expiration: 2019-05-25
Also published as: KR102067810B1; KR20180059383A

Abstract

The present invention relates to a method for detecting cleavage sites and/or off-target sites of a target-specific nuclease in genomic DNA and, particularly, treats chromatin protein-containing genomic DNA in vitro with a target-specific nuclease so as to cleave a genome, and then subjects the same to whole genome sequencing so as to identify off-target sites through data analysis.

Description

[Specification]

[Title of the Invention]

Genome sequencing method and genome editing identification method using chromatin DNA

[Technical Field]

The present invention relates to a method for detecting the cleavage sites and/or off-target sites of a target-specific nuclease in genomic DNA. Specifically, genomic DNA that contains chromatin proteins is treated in vitro with a target-specific nuclease to cleave the genome, the cleaved DNA is subjected to whole genome sequencing, and off-target sites are identified through data analysis.

[Background Art]

With their development, RNA-guided engineered nucleases (RGENs) have been used for genome editing in a wide variety of animals and plants, including human cells.

For example, programmable nucleases such as ZFNs (zinc finger nucleases), TALENs (transcription activator-like effector nucleases), and RGENs (RNA-guided engineered nucleases) derived from the type II CRISPR/Cas (clustered regularly interspaced repeat/CRISPR-associated) prokaryotic adaptive immune system are widely used for genome editing in cultured cells and/or organisms. Genome editing with these programmable nucleases is a very useful technology that can serve many purposes in the life sciences, biotechnology, and medicine. For example, inducing targeted genetic modifications in stem cells or somatic cells has made gene/cell therapy possible for various genetic or acquired diseases. However, programmable nucleases can cleave not only the on-target site but also off-target sites that are homologous to it, thereby causing unwanted mutations.

As a representative example, an RGEN composed of the S. pyogenes-derived Cas9 protein and an sgRNA (small guide RNA) recognizes a 23-bp target DNA sequence consisting of a 20-bp (base pair) sequence that hybridizes with the sgRNA and the PAM (protospacer-adjacent motif) sequence 5'-NGG-3' recognized by Cas9, but it can still act when some nucleotides do not match. Furthermore, an RGEN can also cleave off-target DNA sequences that, compared with the sgRNA sequence, carry additional bases (DNA bulge) or lack one or more bases (RNA bulge). Similarly, ZFNs and TALENs can also cleave sequences that differ at some bases. This suggests that when programmable nucleases are applied to a genome, a considerable number of off-target sites may exist in addition to the on-target site.
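The recognition rule described above (a 20-nt protospacer followed by an NGG PAM, with some mismatches tolerated) can be illustrated with a minimal Python sketch. The function names, the `max_mismatches` threshold, and the idea of a plain string scan are assumptions made for illustration and are not taken from the specification; the sketch looks only at the forward strand and does not model DNA or RNA bulges.

```python
def pam_matches(pam: str) -> bool:
    """Return True if a 3-nt PAM fits 5'-NGG-3' (N = any base)."""
    return len(pam) == 3 and pam[1:] == "GG"


def count_mismatches(guide: str, site: str) -> int:
    """Count positions where the guide and the candidate protospacer differ."""
    return sum(1 for g, s in zip(guide, site) if g != s)


def scan_candidate_sites(genome: str, guide: str, max_mismatches: int = 3):
    """List (position, 23-nt site, mismatches) for forward-strand candidates.

    max_mismatches is a hypothetical cutoff chosen only for this example.
    """
    genome = genome.upper()
    guide = guide.upper()
    hits = []
    for i in range(len(genome) - 23 + 1):
        window = genome[i:i + 23]
        protospacer, pam = window[:20], window[20:]
        if not pam_matches(pam):
            continue  # Cas9 from S. pyogenes requires an NGG PAM
        mismatches = count_mismatches(guide, protospacer)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits
```

A site with zero mismatches corresponds to the on-target site; sites with one or more mismatches are candidate off-target sites that would still need experimental confirmation.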

Off-target DNA cleavage can cause mutations in unintended genes, such as proto-oncogenes and tumor suppressor genes, and/or increase genomic rearrangements such as translocations, deletions, and inversions. This is a serious problem for the use of programmable nucleases in research and medicine. Various strategies have been reported to reduce the off-target effects of programmable nucleases, but no programmable nuclease has yet been reported that acts specifically at the on-target site alone, without off-target effects, at the whole-genome level. To address this problem, a technique is needed that can confirm the specificity of programmable nucleases at the genome level.

[Detailed Description of the Invention]

[Technical Problem]

One example provides a method for sequencing genomic DNA, comprising:

(a) cleaving isolated genomic DNA with a target-specific nuclease; and

(b) performing whole genome sequencing (WGS) on the cleaved DNA,

wherein the isolated genomic DNA is chromatin DNA containing chromatin proteins.

Another example provides a method for detecting the cleavage site or off-target site of a target-specific nuclease, comprising:

(a) cleaving isolated genomic DNA with a target-specific nuclease;

(b) performing whole genome sequencing on the cleaved DNA; and

(c) identifying the cleaved position in the sequence reads obtained by the sequencing,

wherein the isolated genomic DNA is chromatin DNA containing chromatin proteins.

Another example provides a method of selecting a target site with few off-target sites, using the above method for detecting the cleavage site or off-target site of a target-specific nuclease.

[Technical Solution]

As part of a method for identifying, across the whole genome, the off-target cleavage sites of target-specific nucleases such as programmable nucleases (e.g., RNA-guided engineered nucleases, RGENs), this specification provides digested genome sequencing (Digenome-seq; an approach that compares the state before and after target-specific nuclease treatment to distinguish the cleaved positions). In particular, conventional digested genome sequencing is performed on DNA from which all cellular chromatin proteins have been removed, so it cannot take the actual chromatin state inside the cell into account. To compensate for this limitation, a digested genome sequencing method using chromatin DNA whose chromatin structure is preserved is proposed.

One example provides a method for sequencing genomic DNA, comprising:

(a) cleaving isolated genomic DNA with a target-specific nuclease; and

(b) performing whole genome sequencing on the cleaved DNA,

wherein the isolated genomic DNA is chromatin DNA containing chromatin proteins.

Another example provides a method for detecting the cleavage site or off-target site of a target-specific nuclease, comprising:

(a) cleaving isolated genomic DNA with a target-specific nuclease;

(b) performing whole genome sequencing on the cleaved DNA; and

(c) identifying the cleaved position in the sequence reads obtained by the sequencing,

wherein the isolated genomic DNA is chromatin DNA containing chromatin proteins.

The method for detecting the off-target site may further comprise, after step (c), a step of determining the identified cleaved position to be an off-target site when it is not the on-target site (step (d)).
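Step (d) amounts to a simple set comparison: any identified cleavage position that is not the intended target is labeled off-target. The following Python sketch shows one way to express this; the `(chromosome, position)` representation and the function name are hypothetical choices for illustration, not details given in the specification.

```python
from typing import Dict, Iterable, List, Set, Tuple

Site = Tuple[str, int]  # (chromosome, position); an assumed representation


def classify_cleavage_sites(
    cleaved: Iterable[Site], on_target: Set[Site]
) -> Dict[str, List[Site]]:
    """Split identified cleavage positions into on-target and off-target."""
    result: Dict[str, List[Site]] = {"on_target": [], "off_target": []}
    for site in cleaved:
        key = "on_target" if site in on_target else "off_target"
        result[key].append(site)
    return result
```

In practice a small positional tolerance might be needed when matching coordinates; that refinement is left out here to keep the sketch minimal.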

Another example provides a method for confirming the editing efficiency and/or accuracy of a target-specific nuclease, comprising:

(a) cleaving isolated genomic DNA with a target-specific nuclease;

(b) performing whole genome sequencing on the cleaved DNA; and

(c) identifying the cleaved position in the sequence reads obtained by the sequencing,

wherein the isolated genomic DNA is chromatin DNA containing chromatin proteins.

The method for confirming the editing efficiency of the target-specific nuclease may further comprise, after step (c), a step of determining the identified cleaved position to be an off-target site when it is not the on-target site (step (d)), and a step of measuring the degree of cleavage at the off-target sites (the number of off-target sites and/or the cleavage frequency at the off-target sites) and comparing it with that of a comparison subject (step (d-1)). In this case, the lower the degree of cleavage at off-target sites, the higher the editing efficiency and/or accuracy can be judged to be. The comparison subject may be a target-specific nuclease for a target sequence of any target DNA; in one example, it may be any one selected from commonly used or already known target-specific nucleases (e.g., combinations of an RGEN and a guide RNA).

As used herein, digested genome sequencing (Digenome-seq) refers to sequencing of a genome cleaved by a nuclease, and it can be applied to in vitro nuclease-digested whole-genome sequencing for analyzing the genome-wide off-target effects of nucleases in cells. Cleavage by the nuclease generates sequence reads that share an identical 5' end at the cleavage site, and these can be identified computationally with an appropriate program (e.g., the Digenome program). In one example, digested genome sequencing may be defined as comprising steps (a) and (b), or steps (a), (b), and (c), of the genomic DNA sequencing method or the method for detecting off-target sites of a nuclease described above. In other words, steps (a) and (b), or steps (a), (b), and (c), may be performed by digested genome sequencing.
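The idea that cleavage produces reads sharing an identical 5' end can be sketched in Python as a simple pile-up count. This is not the Digenome program and does not reproduce its scoring; the read tuple layout (0-based, half-open coordinates with a strand flag), the function name, and the `min_reads_per_strand` threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Read = Tuple[str, int, int, str]  # (chrom, start, end, strand), 0-based half-open


def find_cleavage_candidates(
    reads: Iterable[Read], min_reads_per_strand: int = 2
) -> List[Tuple[str, int]]:
    """Return positions where forward reads start and reverse reads end together.

    At a blunt cut between positions p-1 and p, forward-strand reads begin at p
    and reverse-strand reads (whose 5' end is their rightmost base) end at p.
    """
    forward_starts: Counter = Counter()
    reverse_ends: Counter = Counter()
    for chrom, start, end, strand in reads:
        if strand == "+":
            forward_starts[(chrom, start)] += 1
        else:
            reverse_ends[(chrom, end)] += 1
    return sorted(
        site
        for site in forward_starts
        if forward_starts[site] >= min_reads_per_strand
        and reverse_ends[site] >= min_reads_per_strand
    )
```

Randomly sheared DNA gives read ends scattered across positions, whereas a nuclease cut concentrates them at one coordinate on both strands, which is the signal this count looks for.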

Genome editing and/or gene editing is a technology that can introduce targeted mutations into the genomic sequence of animal and plant cells, including human cells. It can be carried out in various forms, such as knocking out or knocking in a specific gene, or introducing a mutation into a non-coding DNA sequence that does not produce a protein. The method proposed herein detects the off-target sites of a target-specific nuclease used in such genome editing and/or gene editing, and it can be useful for developing target-specific nuclease systems that act specifically at the on-target site alone.

Step (a) is a step of cleaving genomic DNA isolated from an organism or cell with a target-specific nuclease; that is, the isolated genomic DNA is cleaved in vitro with a nuclease that acts specifically on a particular target. Even when a nuclease is designed to be target-specific, it may, depending on its specificity, cleave other sites, namely off-target sites. As a result, in step (a) the target-specific nuclease cleaves the positions at which it can be active on the genomic DNA, that is, the on-target site or a number of off-target sites, yielding genomic DNA fragments cleaved at specific positions.

The isolated genomic DNA may be isolated from non-transformed cells (wild-type cells) and/or from cells transformed to express a target-specific nuclease or to have nuclease activity. It can be used regardless of its origin, according to the purpose of detecting the off-target sites of the target-specific nuclease.

The isolated genomic DNA is characterized in that it comprises chromatin DNA. As used herein, chromatin DNA means DNA from which non-DNA chromatin components of the cell (or nucleus), such as histones, non-histone proteins, and RNA, have not been removed; that is, DNA in a form that includes one or more of these non-DNA chromatin components. The chromatin DNA may be obtained by removing the cell membrane, centrifuging to obtain a cytoplasmic layer and a chromatin DNA layer, and then removing the cytoplasmic layer or taking the chromatin DNA layer (see FIG. 1). The cell membrane may be removed by treating the cells with a conventional lysis buffer, but the method is not limited thereto. The isolated genomic DNA described herein may be understood to exclude DNA that has been separated alone from chromatin. For example, it may be chromatin DNA obtained by removing the cytoplasm from a cell lysate or by taking the chromatin DNA layer produced by centrifugation, or it may comprise chromatin DNA together with cytoplasm.

As used herein, a target-specific nuclease, also called a programmable nuclease, refers collectively to any form of nuclease (e.g., endonuclease) that can recognize and cleave a specific position on the genomic DNA of interest.

For example, the target-specific nuclease may be one or more selected from all nucleases that recognize a specific sequence of a target gene and have nucleotide-cleaving activity, and can thereby cause indels (insertions and/or deletions) in the target gene.

For example, the target-specific nuclease may be one or more selected from the group consisting of the following, but is not limited thereto:

a TALEN (transcription activator-like effector nuclease), in which a TAL effector (transcription activator-like effector) domain derived from a plant pathogenic gene, which is a domain that recognizes a specific target sequence on the genome, is fused to a cleavage domain;

a zinc-finger nuclease;

a meganuclease;

an RGEN (RNA-guided engineered nuclease; e.g., a Cas protein (e.g., Cas9), Cpf1, etc.) derived from CRISPR, a microbial immune system; and

an Ago homolog (DNA-guided endonuclease).

The target-specific nuclease can recognize a specific nucleotide sequence in the genome of prokaryotic cells and/or animal and plant cells (e.g., eukaryotic cells), including human cells, and cause a double-strand break (DSB). The double-strand break cuts the DNA double helix and can produce a blunt end or a cohesive end. DSBs can be efficiently repaired within the cell by homologous recombination or non-homologous end-joining (NHEJ), and a desired mutation can be introduced at the target site during this process.

The meganuclease may be, but is not limited to, a naturally occurring meganuclease. Naturally occurring meganucleases recognize 15-40 base-pair cleavage sites and are commonly grouped into four families: the LAGLIDADG family, the GIY-YIG family, the His-Cys box family, and the HNH family. Exemplary meganucleases include I-SceI, I-CeuI, PI-PspI, PI-SceI, I-SceIV, I-CsmI, I-PanI, I-SceII, I-PpoI, I-SceIII, I-CreI, I-TevI, I-TevII, and I-TevIII.

Site-specific genome modification has been promoted in plants, yeast, Drosophila, mammalian cells, and mice using naturally occurring meganucleases, mainly DNA-binding domains derived from the LAGLIDADG family. However, this approach has been limited to modifying either homologous genes in which the meganuclease target sequence is conserved (Monet et al. (1999) Biochem. Biophys. Res. Commun. 255:88-93) or pre-engineered genomes into which a target sequence has been introduced. Attempts have therefore been made to engineer meganucleases so that they exhibit novel binding specificity at medically or biotechnologically relevant sites. In addition, naturally occurring or engineered DNA-binding domains derived from meganucleases have been operably linked to cleavage domains derived from heterologous nucleases (e.g., FokI).

The ZFN comprises a zinc-finger protein engineered to bind to a target site in a selected gene, and a cleavage domain or cleavage half-domain. The ZFN may be an artificial restriction enzyme comprising a zinc-finger DNA-binding domain and a DNA cleavage domain. Here, the zinc-finger DNA-binding domain may be engineered to bind to a selected sequence. For example, Beerli et al. (2002) Nature Biotechnol. 20:135-141; Pabo et al. (2001) Ann. Rev. Biochem. 70:313-340; Isalan et al. (2001) Nature Biotechnol. 19:656-660; Segal et al. (2001) Curr. Opin. Biotechnol. 12:632-637; and Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416 may be incorporated herein by reference. Compared with naturally occurring zinc-finger proteins, engineered zinc-finger binding domains can have novel binding specificities. Engineering methods include, but are not limited to, rational design and various types of selection. Rational design includes, for example, the use of databases containing triplet (or quadruplet) nucleotide sequences and individual zinc-finger amino acid sequences, in which each triplet or quadruplet nucleotide sequence is associated with one or more amino acid sequences of zinc fingers that bind to that particular triplet or quadruplet sequence.

The selection of target sequences and the design and construction of fusion proteins (and the polynucleotides encoding them) are known to those skilled in the art and are described in detail in US Patent Application Publications 2005/0064474 and 2006/0188987, the entire contents of which are incorporated herein by reference. In addition, as disclosed in these and other references in the art, zinc-finger domains and/or multi-finger zinc-finger proteins may be linked together by any suitable linker sequence, for example a linker of 5 or more amino acids in length. For examples of linker sequences of 6 or more amino acids in length, see US Patent Nos. 6,479,626; 6,903,185; and 7,153,949. The proteins described herein may include any combination of suitable linkers between the individual zinc fingers of the protein.

In addition, a nuclease such as a ZFN comprises a nuclease-active portion (a cleavage domain or cleavage half-domain). As is well known, the cleavage domain may be heterologous to the DNA-binding domain, as in the case of a zinc-finger DNA-binding domain combined with a cleavage domain from a different nuclease. A heterologous cleavage domain can be obtained from any endonuclease or exonuclease. Exemplary endonucleases from which a cleavage domain can be derived include, but are not limited to, restriction endonucleases and meganucleases.

Similarly, a cleavage half-domain can be derived from any nuclease, or portion thereof, that requires dimerization for cleavage activity, as set forth above. When the fusion proteins comprise cleavage half-domains, two fusion proteins are generally required for cleavage. Alternatively, a single protein comprising two cleavage half-domains may be used. The two cleavage half-domains may be derived from the same endonuclease (or functional fragments thereof), or each cleavage half-domain may be derived from a different endonuclease (or functional fragments thereof). In addition, the target sites of the two fusion proteins are preferably arranged so that, when the two fusion proteins bind to their respective target sites, the cleavage half-domains are positioned in a spatial orientation relative to each other that allows them to form a functional cleavage domain, for example by dimerization. Thus, in one embodiment, the neighboring edges of the target sites are separated by 3-8 nucleotides or by 14-18 nucleotides. However, any integer number of nucleotides or nucleotide pairs may be interposed between the two target sites (e.g., 2 to 50 nucleotide pairs or more). In general, the cleavage site lies between the target sites.
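The spacing condition in the embodiment above (neighboring edges of the two target sites separated by 3-8 or 14-18 nucleotides) can be checked with a few lines of Python. The function name and the coordinate convention (0-based, with the left site ending at an exclusive index) are hypothetical and chosen only for this sketch.

```python
from typing import Sequence, Tuple


def spacer_in_embodiment_range(
    left_site_end: int,
    right_site_start: int,
    ranges: Sequence[Tuple[int, int]] = ((3, 8), (14, 18)),
) -> bool:
    """Check whether the gap between two half-sites falls in a listed range.

    left_site_end is the exclusive end index of the left target site and
    right_site_start is the start index of the right target site, so their
    difference is the number of intervening nucleotides.
    """
    gap = right_site_start - left_site_end
    return any(low <= gap <= high for low, high in ranges)
```

Because the text also allows any integer number of intervening nucleotides, a result of `False` only means the pair falls outside this particular embodiment, not that cleavage is impossible.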

Restriction endonucleases (restriction enzymes) are present in many species and can bind to DNA in a sequence-specific manner (at a target site) and cleave the DNA at or near the binding site. Some restriction enzymes (e.g., Type IIS) cleave DNA at sites removed from the recognition site and have separable binding and cleavage domains. For example, the Type IIS enzyme FokI catalyzes double-strand cleavage of DNA at 9 nucleotides from its recognition site on one strand and 13 nucleotides from its recognition site on the other strand. Thus, in one embodiment, the fusion protein comprises a cleavage domain (or cleavage half-domain) from at least one Type IIS restriction enzyme and one or more zinc-finger binding domains (which may or may not be engineered).
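The 9/13-nucleotide offset described for FokI can be made concrete with a short Python sketch that locates recognition sites and computes where each strand would be cut. The recognition sequence GGATG is not stated in this specification and is supplied here from general knowledge of FokI; the function name and return format are hypothetical, and only sites in the forward orientation are scanned.

```python
from typing import List, Tuple


def foki_cut_positions(
    seq: str, site: str = "GGATG", top_offset: int = 9, bottom_offset: int = 13
) -> List[Tuple[int, int, int]]:
    """Return (site_start, top_cut, bottom_cut) for each forward-strand site.

    A cut index c means the strand is cut between bases c-1 and c, expressed
    in top-strand coordinates. Sites too close to the end are skipped.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        site_end = start + len(site)
        top_cut = site_end + top_offset
        bottom_cut = site_end + bottom_offset
        if bottom_cut <= len(seq):
            cuts.append((start, top_cut, bottom_cut))
        start = seq.find(site, start + 1)
    return cuts
```

The difference between the two offsets (13 - 9 = 4) corresponds to a 4-nt staggered end, which illustrates how cleavage away from the recognition site lets the cleavage domain be separated from the binding domain.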

"TALEN" refers to a nuclease capable of recognizing and cleaving a target region of DNA. A TALEN is a fusion protein comprising a TALE domain and a nucleotide cleavage domain. In the present invention, the terms "TAL effector nuclease" and "TALEN" are interchangeable. TAL effectors are known as proteins that Xanthomonas bacteria secrete through their type III secretion system when they infect various plant species. These proteins can bind to promoter sequences in the host plant and activate the expression of plant genes that aid bacterial infection. They recognize plant DNA sequences through a central repeat domain consisting of a variable number of repeats of 34 or fewer amino acids. TALEs are therefore considered a potential new platform for genome engineering tools. However, to construct a functional TALEN with genome-editing activity, a few key parameters that had not previously been known must be defined: i) the minimal DNA-binding domain of TALE, ii) the length of the spacer between the two half-sites that constitute one target region, and iii) the linker or fusion junction connecting the FokI nuclease domain to dTALE.

The TALE domain of the present invention refers to a protein domain that binds to nucleotides in a sequence-specific manner through one or more TALE-repeat modules. The TALE domain includes at least one TALE-repeat module, more specifically 1 to 30 TALE-repeat modules, but is not limited thereto. In the present invention, the terms "TAL effector domain" and "TALE domain" are interchangeable. The TALE domain may include half of a TALE-repeat module. With regard to TALENs, the entire contents of International Patent Publication WO/2012/093833 and US Patent Publication 2013-0217131 are incorporated herein by reference.

In one embodiment, the target-specific nuclease may be one or more selected from the group consisting of nucleases (for example, endonucleases) involved in type II and/or type V CRISPR systems, such as a Cas protein (for example, a Cas9 protein (CRISPR (clustered regularly interspaced short palindromic repeats) associated protein 9)) and a Cpf1 protein (CRISPR from Prevotella and Francisella 1). In this case, the target-specific nuclease may further comprise a target DNA-specific guide RNA that guides it to the target site of the genomic DNA. The guide RNA may be transcribed in vitro, for example from an oligonucleotide duplex or a plasmid template, but is not limited thereto. The target-specific nuclease may form a ribonucleic acid-protein complex bound to the guide RNA (RNA-guided engineered nuclease) and act in the form of a ribonucleoprotein (RNP).

The Cas protein is the major protein component of the CRISPR/Cas system and is a protein capable of forming an activated endonuclease or nickase.

Information on Cas proteins or genes can be obtained from known databases such as GenBank of the National Center for Biotechnology Information (NCBI). For example, the Cas protein may be one or more selected from the group consisting of the following, but is not limited thereto:

a Cas protein, such as a Cas9 protein (for example, SwissProt Accession number Q99ZW2 (NP_269215.1)), derived from Streptococcus sp., for example Streptococcus pyogenes;

a Cas protein, such as a Cas9 protein, derived from the genus Campylobacter, for example Campylobacter jejuni;

a Cas protein, such as a Cas9 protein, derived from the genus Streptococcus, for example Streptococcus thermophilus or Streptococcus aureus;

a Cas protein, such as a Cas9 protein, derived from Neisseria meningitidis;

a Cas protein, such as a Cas9 protein, derived from the genus Pasteurella, for example Pasteurella multocida; and

a Cas protein, such as a Cas9 protein, derived from the genus Francisella, for example Francisella novicida.

The Cpf1 protein is an endonuclease of a new CRISPR system that is distinct from the CRISPR/Cas system described above. It is relatively small compared with Cas9, does not require a tracrRNA, and can act with a single guide RNA. It also recognizes thymine-rich PAM (protospacer-adjacent motif) sequences and cuts double-stranded DNA to produce cohesive ends (cohesive double-strand breaks).

For example, the Cpf1 protein may be derived from the genus Candidatus, the genus Lachnospira, the genus Butyrivibrio, Peregrinibacteria, the genus Acidaminococcus, the genus Porphyromonas, the genus Prevotella, the genus Francisella, Candidatus Methanoplasma, or the genus Eubacterium. For example, it may be derived from a microorganism such as Parcubacteria bacterium (GWC2011_GWC2_44_17), Lachnospiraceae bacterium (MC2017), Butyrivibrio proteoclasticus, Peregrinibacteria bacterium (GW2011_GWA_33_10), Acidaminococcus sp. (BV3L6), Porphyromonas macacae, Lachnospiraceae bacterium (ND2006), Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi (237), Smithella sp. (SC_K08D17), Leptospira inadai, Lachnospiraceae bacterium (MA2020), Francisella novicida (U112), Candidatus Methanoplasma termitum, Candidatus Paceibacter, or Eubacterium eligens, but is not limited thereto.

The target-specific endonuclease may be isolated from a microorganism, or may be artificially or non-naturally produced (non-naturally occurring), for example by a recombinant or synthetic method. In one example, the target-specific endonuclease (for example, Cas9, Cpf1, etc.) may be a recombinant protein produced from recombinant DNA. Recombinant DNA (rDNA) refers to a DNA molecule artificially made by genetic recombination methods such as molecular cloning so as to include heterologous or homologous genetic material obtained from various organisms. For example, when the recombinant DNA is expressed in an appropriate organism to produce the target-specific endonuclease (in vivo or in vitro), the recombinant DNA may have a nucleotide sequence reconstructed by selecting, from among the codons encoding the protein to be produced, codons optimized for expression in that organism.

The inactivated target-specific endonuclease refers to a target-specific endonuclease that has lost the endonuclease activity of cleaving double-stranded DNA. For example, it may be one or more selected from an inactivated target-specific endonuclease that has lost endonuclease activity but has nickase activity, and an inactivated target-specific endonuclease that has lost both endonuclease activity and nickase activity. When the inactivated target-specific endonuclease has nickase activity, a nick is introduced in the strand in which cytosine is converted to uracil or in the opposite strand (for example, the opposite strand), either simultaneously with the conversion of cytosine to uracil or sequentially in any order (for example, the nick is introduced between the third and fourth nucleotides in the 5' direction from the PAM sequence). Such a modification (mutation) of the target-specific endonuclease may include a Cas9 mutation in which at least a catalytic aspartate residue (for example, the aspartic acid at position 10 (D10) of the Cas9 protein derived from Streptococcus pyogenes) is substituted with any other amino acid. The other amino acid may be alanine, but is not limited thereto.

As used herein, the "other amino acid" means an amino acid selected from alanine, isoleucine, leucine, methionine, phenylalanine, proline, tryptophan, valine, asparagine, cysteine, glutamine, glycine, serine, threonine, tyrosine, aspartic acid, glutamic acid, arginine, histidine, lysine, and all known variants of these amino acids, excluding the amino acid that the wild-type protein has at the mutated position. In one example, when the inactivated target-specific endonuclease is a modified Cas9 protein, the modified Cas9 protein may be one or more selected from the group consisting of: a modified Cas9 in which a mutation at position D10 (for example, substitution with another amino acid) has been introduced into the Cas9 protein derived from Streptococcus pyogenes (for example, SwissProt Accession number Q99ZW2 (NP_269215.1)), so that endonuclease activity is lost and nickase activity is retained; and a modified Cas9 protein in which both a mutation at position D10 and a mutation at position H840 (for example, substitution with another amino acid) have been introduced into the Cas9 protein derived from Streptococcus pyogenes, so that both endonuclease activity and nickase activity are lost. For example, the mutation at position D10 of the Cas9 protein may be a D10A mutation (a mutation in which D, the 10th amino acid of the Cas9 protein, is substituted with A; mutations introduced into Cas9 are written in the same way hereinafter), and the mutation at position H840 may be an H840A mutation.

As used herein, unless otherwise stated, "nuclease" means a "target-specific nuclease (endonuclease)" as described above, such as Cas9 or Cpf1.

The nuclease may be isolated from a microorganism, or may be artificially or non-naturally produced (non-naturally occurring), for example by a recombinant or synthetic method. In one example, the nuclease (for example, Cas9, Cpf1, etc.) may be a recombinant protein produced from recombinant DNA. Recombinant DNA (rDNA) refers to a DNA molecule artificially made by genetic recombination methods such as molecular cloning so as to include heterologous or homologous genetic material obtained from various organisms. For example, when the recombinant DNA is expressed in an appropriate organism to produce the protein (endonuclease) (in vivo or in vitro), the recombinant DNA may have a nucleotide sequence reconstructed by selecting, from among the codons encoding the protein to be produced, codons optimized for expression in that organism.

The nuclease may be used in the form of a protein, a nucleic acid molecule encoding the protein, a ribonucleoprotein bound to a guide RNA, a nucleic acid molecule encoding the ribonucleoprotein, or a recombinant vector comprising the nucleic acid molecule. The nuclease or the nucleic acid molecule encoding it may be in a form that can be delivered into, act in, and/or be expressed in the nucleus.

The nuclease may be in a form that is easily introduced into cells. For example, the nuclease may be linked to a cell-penetrating peptide and/or a protein transduction domain. The protein transduction domain may be poly-arginine or the HIV-derived TAT protein, but is not limited thereto. Because various cell-penetrating peptides and protein transduction domains other than these examples are known in the art, a person skilled in the art may apply other examples without being limited to those described above.

In addition, the nuclease or the nucleic acid molecule encoding it may further comprise a nuclear localization signal (NLS) sequence or a sequence encoding it. Accordingly, an expression cassette comprising the nucleic acid molecule encoding the nuclease may include a regulatory sequence, such as a promoter sequence for expressing the nuclease, and may additionally include an NLS sequence. NLS sequences are well known in the art.

The nuclease or the nucleic acid molecule encoding it may be linked to a tag for isolation and/or purification, or to a nucleic acid sequence encoding the tag. For example, the tag may be appropriately selected from the group consisting of small peptide tags such as a His tag, a Flag tag, and an S tag, a GST (glutathione S-transferase) tag, an MBP (maltose binding protein) tag, and the like, but is not limited thereto.

In the present invention, the term "guide RNA" means an RNA specific to the target DNA (for example, an RNA capable of hybridizing with a target site of the DNA), which binds to a nuclease such as a Cas protein or Cpf1 and guides it to the target DNA.

The guide RNA may be appropriately selected according to the type of nuclease with which it forms a complex and/or the microorganism from which that nuclease is derived. For example, the guide RNA may be one or more selected from the group consisting of:

a CRISPR RNA (crRNA) comprising a region capable of hybridizing with the DNA target site;

a trans-activating crRNA (tracrRNA) comprising a region that interacts with an endonuclease such as a Cas protein or Cpf1; and

a single guide RNA (sgRNA) in which the main regions of the crRNA and the tracrRNA (for example, the hybridizing region of the crRNA and the interacting region of the tracrRNA) are fused.

Specifically, the guide RNA may be a dual RNA comprising a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA), or a single guide RNA (sgRNA) comprising the main regions of the crRNA and the tracrRNA.

The sgRNA may include a portion having a sequence complementary to a sequence in the target DNA (also called the spacer region, target DNA recognition sequence, base pairing region, etc.) and a hairpin structure for Cas protein binding. More specifically, it may include a portion having a sequence complementary to a sequence in the target DNA, a hairpin structure for Cas protein binding, and a terminator sequence. These elements may be arranged sequentially in the 5' to 3' direction, but are not limited thereto. Any form of guide RNA may be used in the present invention as long as it includes the main portions of the crRNA and tracrRNA and a portion complementary to the target DNA.

For example, for target gene editing the Cas9 protein requires two guide RNAs: a CRISPR RNA (crRNA) having a nucleotide sequence capable of hybridizing with the target sequence region of the target gene, and a trans-activating crRNA (tracrRNA) that interacts with the Cas9 protein. The crRNA and tracrRNA may be used in the form of a double-stranded crRNA:tracrRNA complex in which they are bound to each other, or in the form of a single guide RNA (sgRNA) in which they are connected through a linker. In one example, when a Cas9 protein derived from Streptococcus pyogenes is used, the sgRNA may be one in which part or all of the crRNA, including at least the hybridizable nucleotide sequence of the crRNA, and part or all of the tracrRNA, including at least the region of the tracrRNA that interacts with the Cas9 protein, form a hairpin structure (stem-loop structure) through a nucleotide linker (the nucleotide linker may correspond to the loop).

The guide RNA, specifically the crRNA or sgRNA, includes a sequence complementary to a sequence in the target DNA, and may include one or more, for example 1 to 10, 1 to 5, or 1 to 3, additional nucleotides at the upstream region of the crRNA or sgRNA, specifically at the 5' end of the sgRNA or of the crRNA of the dual RNA. The additional nucleotides may be guanine (G), but are not limited thereto.

In another example, when the nuclease is Cpf1, the guide RNA may comprise a crRNA and may be appropriately selected according to the type of Cpf1 protein with which it forms a complex and/or the microorganism from which that protein is derived. The specific sequence of the guide RNA can be appropriately selected according to the type of nuclease (Cas9 protein or Cpf1), that is, the microorganism from which it is derived, as would be readily understood by a person of ordinary skill in the art to which this invention belongs.

In one example, when a Cas9 protein derived from Streptococcus pyogenes is used as the target-specific endonuclease, the crRNA can be represented by the following Formula 1:

5'-(N_cas9)_l-(GUUUUAGAGCUA)-(X_cas9)_m-3' (Formula 1)

In Formula 1,

N_cas9 is the targeting sequence, that is, a region determined according to the sequence of the target site of the target gene (a sequence capable of hybridizing with the sequence of the target site), and l is the number of nucleotides in the targeting sequence, which may be an integer from 17 to 23 or from 18 to 22, for example 20;

the region comprising the 12 consecutive nucleotides (GUUUUAGAGCUA) (SEQ ID NO: 1) located adjacent to the 3' side of the targeting sequence is the essential part of the crRNA; and

X_cas9 is a region comprising m nucleotides located at the 3' end of the crRNA (that is, adjacent to the 3' side of the essential part of the crRNA), where m may be an integer from 8 to 12, for example 11, and the m nucleotides may be the same as or different from one another and may each be independently selected from the group consisting of A, U, C, and G.

In one example, X_cas9 may comprise UGCUGUUUUG (SEQ ID NO: 2), but is not limited thereto.
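Formula 1 can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the helper name `build_crrna` and the example targeting sequence are hypothetical and do not appear in the specification; only SEQ ID NO: 1, SEQ ID NO: 2, and the length ranges for l and m are taken from the text.

```python
# Illustrative sketch of Formula 1: 5'-(N_cas9)_l-(GUUUUAGAGCUA)-(X_cas9)_m-3'.
# The function name and the example input are hypothetical.
CRRNA_ESSENTIAL = "GUUUUAGAGCUA"  # SEQ ID NO: 1, essential part of the crRNA
X_CAS9_EXAMPLE = "UGCUGUUUUG"     # SEQ ID NO: 2, one example of X_cas9
RNA_BASES = set("ACGU")


def build_crrna(targeting: str, x_region: str = X_CAS9_EXAMPLE) -> str:
    """Concatenate targeting sequence, essential part, and X_cas9 (5' to 3')."""
    targeting = targeting.upper()
    x_region = x_region.upper()
    if not set(targeting) <= RNA_BASES or not set(x_region) <= RNA_BASES:
        raise ValueError("sequences must contain only A, C, G, U")
    if not 17 <= len(targeting) <= 23:  # l is an integer from 17 to 23
        raise ValueError("targeting sequence must be 17 to 23 nt")
    if not 8 <= len(x_region) <= 12:    # m is an integer from 8 to 12
        raise ValueError("X_cas9 region must be 8 to 12 nt")
    return targeting + CRRNA_ESSENTIAL + x_region


if __name__ == "__main__":
    # Arbitrary 20-nt targeting sequence, chosen only for demonstration.
    print(build_crrna("GACGUACGUUAGCCAUGCAU"))
```

The length checks simply mirror the ranges stated for l and m; a real design tool would also consider the target genome, which this sketch does not.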

In addition, the tracrRNA can be represented by the following Formula 2:

5'-(Y_cas9)_p-(UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC)-3' (Formula 2)

In Formula 2,

the region represented by the 60 nucleotides (UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC) (SEQ ID NO: 3) is the essential part of the tracrRNA; and

Y_cas9 is a region comprising p nucleotides located adjacent to the 5' end of the essential part of the tracrRNA, where p may be an integer from 6 to 20, for example from 8 to 19, and the p nucleotides may be the same as or different from one another and may each be independently selected from the group consisting of A, U, C, and G.

In addition, the sgRNA may be one in which a crRNA portion, comprising the targeting sequence and the essential part of the crRNA, and a tracrRNA portion, comprising the essential part (60 nucleotides) of the tracrRNA, form a hairpin structure (stem-loop structure) through an oligonucleotide linker (the oligonucleotide linker corresponds to the loop). More specifically, the sgRNA may have a hairpin structure in which, in a double-stranded RNA molecule formed by binding of the crRNA portion (comprising the targeting sequence and essential part of the crRNA) to the tracrRNA portion (comprising the essential part of the tracrRNA), the 3' end of the crRNA portion and the 5' end of the tracrRNA portion are connected through the oligonucleotide linker.

In one example, the sgRNA can be represented by the following Formula 3:

5'-(N_cas9)_l-(GUUUUAGAGCUA)-(oligonucleotide linker)-(UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC)-3' (Formula 3)

In Formula 3, (N_cas9)_l is the targeting sequence, as described above for Formula 1.

The oligonucleotide linker included in the sgRNA may comprise 3 to 5 nucleotides, for example 4 nucleotides, and the nucleotides may be the same as or different from one another and may each be independently selected from the group consisting of A, U, C, and G.

The crRNA or sgRNA may further comprise 1 to 3 guanines (G) at the 5' end (that is, at the 5' end of the targeting sequence region of the crRNA).

The tracrRNA or sgRNA may further comprise a termination region containing 5 to 7 uracils (U) at the 3' end of the essential part (60 nt) of the tracrRNA.
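The sgRNA layout of Formula 3, together with the optional 5' guanines and 3' uracil termination region, can be sketched as follows. This is a minimal illustration, not the applicant's procedure: the function `build_sgrna`, its parameter names, and the default linker `GAAA` are assumptions introduced here (the text only requires a linker of 3 to 5 nucleotides chosen from A, U, C, and G).

```python
# Illustrative sketch of Formula 3 with the optional 5' G and 3' U additions.
# Function name, parameter names, and the "GAAA" default linker are hypothetical.
CRRNA_ESSENTIAL = "GUUUUAGAGCUA"  # SEQ ID NO: 1
TRACR_ESSENTIAL = (               # SEQ ID NO: 3 (60 nt)
    "UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
)
RNA_BASES = set("ACGU")


def build_sgrna(targeting: str, linker: str = "GAAA",
                extra_g: int = 0, terminator_u: int = 0) -> str:
    """Assemble 5'-[G]-targeting-essential crRNA-linker-essential tracrRNA-[U]-3'."""
    targeting, linker = targeting.upper(), linker.upper()
    if not set(targeting) <= RNA_BASES or not set(linker) <= RNA_BASES:
        raise ValueError("sequences must contain only A, C, G, U")
    if not 17 <= len(targeting) <= 23:
        raise ValueError("targeting sequence must be 17 to 23 nt")
    if not 3 <= len(linker) <= 5:
        raise ValueError("linker must be 3 to 5 nt")
    if not 0 <= extra_g <= 3:
        raise ValueError("at most 3 additional 5' guanines")
    if terminator_u != 0 and not 5 <= terminator_u <= 7:
        raise ValueError("termination region must contain 5 to 7 uracils")
    return ("G" * extra_g + targeting + CRRNA_ESSENTIAL + linker
            + TRACR_ESSENTIAL + "U" * terminator_u)


if __name__ == "__main__":
    # Arbitrary targeting sequence used only for demonstration.
    print(build_sgrna("GACGUACGUUAGCCAUGCAU", extra_g=1, terminator_u=6))
```

Passing `extra_g=0` and `terminator_u=0` yields exactly the Formula 3 core; the two optional arguments correspond to the additions described in the two preceding paragraphs.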

The target sequence of the guide RNA may be a contiguous nucleic acid sequence of about 17 to about 23, or about 18 to about 22, for example 20, nucleotides located adjacent to the 5' side of a PAM (protospacer adjacent motif) sequence on the target DNA (for S. pyogenes Cas9, 5'-NGG-3', where N is A, T, G, or C).

The targeting sequence of the guide RNA, which is capable of hybridizing with the target sequence of the guide RNA, means a nucleotide sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity with the nucleotide sequence of the strand complementary to the DNA strand on which the target sequence is located (that is, the DNA strand on which the PAM sequence (5'-NGG-3', where N is A, T, G, or C) is located), and it is capable of complementary binding to the nucleotide sequence of that complementary strand.

In this specification, the nucleic acid sequence of a target site is given as the sequence of the strand on which the PAM sequence is located, of the two DNA strands at the corresponding region of the target gene. Because the DNA strand to which the guide RNA actually binds is the strand complementary to the PAM-containing strand, the targeting sequence included in the guide RNA has the same nucleic acid sequence as the target site, except that T is replaced by U because it is RNA. Accordingly, in this specification, the targeting sequence of the guide RNA and the sequence of the target site (or the sequence of the cleavage site) are represented by the same nucleic acid sequence, except that T and U are interchanged.
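The relationship between the PAM, the target sequence, and the guide RNA targeting sequence can be sketched in Python. The sketch assumes the S. pyogenes Cas9 PAM (5'-NGG-3') and a 20-nt target sequence, and for brevity scans only the strand that is passed in; the function name `find_spcas9_sites` and the example DNA are hypothetical.

```python
# Illustrative sketch: find 20-nt target sequences immediately 5' of an NGG PAM
# on the given strand, and derive the guide RNA targeting sequence (T -> U).
# Function name and example sequence are hypothetical; only one strand is scanned.
DNA_BASES = set("ACGT")


def find_spcas9_sites(dna: str, length: int = 20):
    """Return (start, target_site_dna, targeting_rna) for each NGG-adjacent site."""
    dna = dna.upper()
    hits = []
    for start in range(len(dna) - length - 3 + 1):
        target = dna[start:start + length]
        pam = dna[start + length:start + length + 3]
        if pam[1:] == "GG" and set(target + pam) <= DNA_BASES:
            # The targeting sequence equals the PAM-strand sequence with T -> U.
            hits.append((start, target, target.replace("T", "U")))
    return hits


if __name__ == "__main__":
    example = "AAGACGTACGTTAGCCATGCATTGGAA"  # arbitrary demonstration sequence
    for start, target, rna in find_spcas9_sites(example):
        print(start, target, rna)
```

A complete search would also scan the reverse complement of the input, since a PAM may lie on either strand; that step is omitted here to keep the example short.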

The guide RNA may be used (or included in the composition) in the form of RNA, or in the form of a plasmid comprising DNA encoding it.

In the present invention, the term "on-target site" means a site at which a mutation (cleavage, insertion, and/or deletion) is to be introduced using the target-specific nuclease. It can be chosen freely according to the purpose, and may be located within the coding sequence of a particular gene or in a non-coding DNA sequence that does not produce a protein.

Because the target-specific nuclease has sequence specificity, it acts at the on-target site; depending on the target sequence, however, it may also act at off-target sites as a side effect.

As used herein, an off-target site refers to a site whose sequence is not identical to the target sequence of the target-specific nuclease but at which the target-specific nuclease is nevertheless active; that is, a site other than the on-target site that is cleaved by the target-specific nuclease. In one example, the term may cover not only actual off-target sites of a particular target-specific nuclease but also sites that are likely to be off-target sites. The off-target site may be, without limitation, any site other than the on-target site that is cleaved by the target-specific nuclease in vitro. Activity of a programmable nuclease at sites other than the on-target site can have various causes. For example, the nuclease may act on an off-target sequence that has high sequence homology with the on-target site and contains nucleotide mismatches relative to the target sequence designed for the on-target site. The off-target site may be, without limitation, a site having one or more nucleotide mismatches with the target sequence.
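The notion of a site having one or more nucleotide mismatches with the target sequence can be made concrete with a small Python sketch. It only illustrates how mismatches are counted between equal-length sequences; it is not the detection method of the invention, which identifies cleaved sites by sequencing rather than by homology. The function names and the `max_mismatches` threshold of 3 are hypothetical choices made for the example.

```python
# Illustrative sketch: count nucleotide mismatches between the on-target
# sequence and candidate sites of the same length.
# Function names and the max_mismatches default are hypothetical.
def count_mismatches(on_target: str, candidate: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(on_target) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(on_target.upper(), candidate.upper()))


def similar_sites(on_target: str, candidates, max_mismatches: int = 3):
    """Keep candidates that differ from the on-target at 1..max_mismatches positions."""
    return [c for c in candidates
            if 1 <= count_mismatches(on_target, c) <= max_mismatches]


if __name__ == "__main__":
    on_target = "GACGTACGTTAGCCATGCAT"  # arbitrary demonstration sequence
    candidates = [
        on_target,               # 0 mismatches: the on-target site itself
        "GACGTACGTTAGCCATGCAA",  # 1 mismatch
        "TTTTTACGTTAGCCATGCAT",  # 4 mismatches
    ]
    print(similar_sites(on_target, candidates))
```

The on-target sequence itself is excluded because an off-target site is defined as a site other than the on-target site.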

Off-target activity can cause unwanted mutations of genes in the genome and can therefore be a serious problem when using target-specific nucleases. Accurately detecting and analyzing off-target sites can thus be as important as measuring activity at the on-target site, and can be useful for developing target-specific nucleases that act only at the on-target site without off-target effects.

For the purposes of the present invention, because the nuclease can have nuclease activity both in vivo and in vitro, it can be used in vitro to detect off-target sites in genomic DNA, and when applied in vivo it can be expected to be active at the same sites as the off-target sites so detected.

Step (b) is a step of performing whole genome sequencing (WGS) on the DNA cleaved in step (a). Unlike indirect methods, which search for sequences homologous to the on-target sequence and predict them to be off-target sites, this step is performed to detect, at the whole-genome level, the off-target sites that are actually cleaved by the target-specific nuclease.

In the present invention, the term "whole genome sequencing (WGS)" means a method of reading the genome at multiple-fold coverage, such as 10×, 20×, or 40×, by full-length genome sequencing using next-generation sequencing. "Next-generation sequencing" means a technology that fragments the full-length genome in a chip-based and PCR-based paired-end format and sequences the fragments at ultra-high speed on the basis of a chemical reaction (hybridization).
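As a rough illustration of what 10×, 20×, or 40× coverage implies, the following Python sketch estimates the number of reads required for a given fold coverage. It uses the standard relation average coverage = (number of reads × read length) / genome size. The genome size and read length in the example are assumptions chosen for illustration and are not values taken from this specification.

```python
import math


def reads_needed(coverage, genome_size_bp, read_length_bp):
    """Return the number of reads that gives the requested average fold coverage."""
    if coverage <= 0 or genome_size_bp <= 0 or read_length_bp <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(coverage * genome_size_bp / read_length_bp)


# Assumed values for illustration only: a 3.1 Gb genome and 150-nt reads.
GENOME_SIZE_BP = 3_100_000_000
READ_LENGTH_BP = 150

for fold in (10, 20, 40):
    print(f"{fold}x coverage: about {reads_needed(fold, GENOME_SIZE_BP, READ_LENGTH_BP):,} reads")
```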

Step (c) is a step of determining the positions at which the DNA was cleaved from the sequence reads obtained by the whole genome sequencing; by analyzing the sequencing data, the on-target and off-target sites of the target-specific nuclease can be detected easily. Determining a specific cleaved position from the sequence reads can be done by various approaches, and the present invention provides several reasonable methods for doing so. These are, however, merely examples falling within the technical idea of the present invention, and the scope of the present invention is not limited by these methods.

For example, as one way of determining the cleaved position, when the sequence reads obtained by whole genome sequencing are aligned according to their genomic position using an analysis program (e.g., BWA/GATK or ISAAC), a position at which the 5' ends are vertically aligned may indicate a position at which the DNA was cleaved. As used herein, the term "vertical alignment" refers to an arrangement in which, when whole genome sequencing results are analyzed with a program such as BWA/GATK or ISAAC, the 5' ends of two or more sequence reads start at the same genomic position (nucleotide position) for each of the adjacent Watson and Crick strands. This arises because the DNA fragments that were cut by the target-specific nuclease, and therefore share the same 5' end, are each sequenced.

That is, when a target-specific nuclease is active at on-target and off-target sites and cleaves them, aligning the sequence reads shows the commonly cleaved sites as vertically aligned, because each read begins with its 5' end at that position; at uncleaved sites there is no shared 5' end, so the reads are arranged in a staggered manner. A vertically aligned position can therefore be regarded as a site cleaved by the target-specific nuclease, which in turn may indicate an on-target or off-target site of the nuclease.
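The idea of vertical alignment can be sketched in Python as a simple count of read 5' ends per genomic position. This is a minimal illustration, not the analysis pipeline of the specification: the read representation `(start, end, strand)` with 1-based inclusive coordinates, the function names, and the example reads are all hypothetical, and in practice the coordinates would come from an aligner's output. For a forward-strand read the 5' end is taken as its leftmost aligned base, and for a reverse-strand read as its rightmost aligned base.

```python
from collections import Counter


def five_prime_counts(reads):
    """Count read 5' ends per position, separately for each strand.

    reads: iterable of (start, end, strand) with 1-based inclusive
    coordinates and strand '+' or '-' (hypothetical representation).
    """
    fwd, rev = Counter(), Counter()
    for start, end, strand in reads:
        if strand == "+":
            fwd[start] += 1   # 5' end of a forward read is its leftmost base
        elif strand == "-":
            rev[end] += 1     # 5' end of a reverse read is its rightmost base
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return fwd, rev


def vertically_aligned(counts, min_reads=2):
    """Positions at which at least min_reads reads share the same 5' end."""
    return sorted(pos for pos, n in counts.items() if n >= min_reads)


# Hypothetical reads around a blunt cut between positions 99 and 100.
reads = [
    (100, 249, "+"), (100, 251, "+"), (100, 230, "+"),  # shared 5' end at 100
    (37, 186, "+"), (62, 211, "+"),                     # staggered reads
    (5, 99, "-"), (20, 99, "-"),                        # shared 5' end at 99
    (60, 140, "-"),                                     # staggered read
]
fwd, rev = five_prime_counts(reads)
print(vertically_aligned(fwd), vertically_aligned(rev))
```

In this toy input, the forward reads stack at position 100 and the reverse reads at position 99, while the remaining reads start at scattered positions.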

The term "alignment" means mapping the sequence reads to a reference genome and then arranging the bases that have the same genomic position so that they line up at each position. Accordingly, any computer program may be used as long as it can align the sequence reads in this way; it may be a program already known in the art or one selected from programs written for the purpose. In one embodiment the alignment was performed using ISAAC, but the invention is not limited thereto.

From the alignment, the positions at which the DNA was cleaved by the target-specific nuclease can be determined, for example by finding positions at which the 5' ends are vertically aligned as described above; if a cleaved position is not the on-target site, it can be judged to be an off-target site. In other words, a sequence identical to the base sequence designed as the target of the target-specific nuclease is the on-target site, and a sequence not identical to it can be regarded as an off-target site. This follows directly from the definition of an off-target site given above. The off-target site may in particular consist of a sequence homologous to the on-target sequence, specifically a sequence having one or more nucleotide mismatches with the target sequence, and more specifically a sequence having 1 to 6, 1 to 5, 1 to 4, 1 to 3, 1 to 2, or 1 nucleotide mismatch with the on-target site; it is not particularly limited thereto, however, and any site that the target-specific nuclease can cleave falls within the scope of the present invention. Here, the on-target site may be a 15 to 30 nucleotide sequence complementary to the guide RNA, and may additionally include a sequence recognized by the target-specific nuclease (e.g., in the case of Cas9, the PAM sequence recognized by Cas9).

In another example, besides finding positions at which the 5' ends are vertically aligned, a position showing a double-peak pattern in a 5' end plot can be judged to be an off-target site if it is not the on-target site. When, for each position in the genomic DNA, the number of nucleotides forming a 5' end at that position (that is, the number of reads whose 5' end falls there) is counted and plotted, a double-peak pattern appears at particular positions, because the two peaks arise from the two strands of the double-stranded DNA cleaved by the target-specific nuclease.
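A double peak in a 5' end plot can be searched for with a short Python sketch such as the one below. It is only an illustration under stated assumptions: the per-strand 5' end counts are given as plain dictionaries, the two peaks are assumed to lie on adjacent bases (a reverse-strand peak immediately before a forward-strand peak, as expected for a blunt cut) within a small tolerance `window`, and the minimum peak height is an arbitrary parameter. None of these names or thresholds comes from the specification.

```python
def double_peaks(fwd_counts, rev_counts, min_height=2, window=1):
    """Find (reverse_peak, forward_peak) position pairs forming a double peak.

    fwd_counts / rev_counts: dicts mapping genomic position to the number
    of forward / reverse reads whose 5' end falls at that position.
    A forward peak at position p is paired with a reverse peak near p - 1.
    """
    sites = []
    for pos in sorted(fwd_counts):
        if fwd_counts[pos] < min_height:
            continue
        for offset in range(-window, window + 1):
            rev_pos = pos - 1 + offset
            if rev_counts.get(rev_pos, 0) >= min_height:
                sites.append((rev_pos, pos))
                break  # report at most one partner per forward peak
    return sites


# Hypothetical 5' end counts: a cut between positions 99 and 100,
# plus isolated single-strand peaks that should not be reported.
fwd_counts = {37: 1, 100: 5, 300: 4}
rev_counts = {99: 6, 500: 3}
print(double_peaks(fwd_counts, rev_counts))
```

Requiring a peak on both strands is what distinguishes a nuclease cut from a position where reads happen to stack on one strand only.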

In one embodiment, genomic DNA was cleaved with a target-specific nuclease (e.g., an RGEN), subjected to whole genome sequencing, and aligned with ISAAC; the reads were vertically aligned at cleaved positions and staggered at uncleaved positions, and when the data were displayed as a 5' end plot, a distinctive double-peak pattern appeared at the cleavage positions.

Furthermore, as a specific but non-limiting example, a position at which two or more sequence reads corresponding to each of the Watson strand and the Crick strand are vertically aligned can be judged to be an off-target site. In addition, a position at which 20% or more of the sequence reads are vertically aligned, and at which the number of sequence reads sharing the same 5' end is 10 or more on each of the Watson and Crick strands, can be judged to be an off-target site, that is, a cleaved site.
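The two read-count criteria in the preceding paragraph can be written as simple predicates. The Python sketch below is an illustration only; the function names are hypothetical, and the way the "20% or more" fraction is computed (reads sharing the 5' end divided by the sequencing depth on the same strand) is an assumption, since the specification does not spell out the denominator.

```python
def passes_basic_rule(fwd_n, rev_n, min_reads=2):
    """Two or more vertically aligned reads on each of the two strands."""
    return fwd_n >= min_reads and rev_n >= min_reads


def passes_strict_rule(fwd_n, rev_n, fwd_depth, rev_depth,
                       min_fraction=0.20, min_reads=10):
    """At least min_reads reads share a 5' end on each strand, and they
    make up at least min_fraction of the reads covering the position.

    The per-strand depth used as the denominator is an assumption.
    """
    if fwd_depth <= 0 or rev_depth <= 0:
        return False
    return (fwd_n >= min_reads and rev_n >= min_reads
            and fwd_n / fwd_depth >= min_fraction
            and rev_n / rev_depth >= min_fraction)


# Hypothetical counts: 12 forward and 15 reverse reads share a 5' end
# at a position covered by 40 forward and 50 reverse reads.
print(passes_basic_rule(12, 15), passes_strict_rule(12, 15, 40, 50))
```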

The identification (detection) of off-target sites can be carried out by treating genomic DNA with the target-specific nuclease in vitro. It can then be checked whether off-target effects actually occur in vivo at the off-target sites identified (detected) by the method. This is merely an additional validation process, however; it is not a step that necessarily accompanies the scope of the present invention, only a step that may be performed additionally as needed. As used herein, the term "off-target effect" may be a concept distinct from an off-target site. As explained above, an off-target site in the present invention means a site, other than the on-target site, at which the target-specific nuclease can act, that is, a site cleaved by the target-specific nuclease, whereas an off-target effect means the occurrence of indels (insertions and/or deletions) caused by the target-specific nuclease at an off-target site in a cell.

The term "indel" collectively refers to mutations in which some bases are inserted into and/or deleted from the base sequence of DNA. An off-target site at which such an indel has been caused by the target-specific nuclease is called an off-target indel site. In sum, the off-target sites of this specification can be understood as encompassing off-target indel sites; it is sufficient that the target-specific nuclease could be active at the site, and an indel caused by the programmable nuclease need not have been confirmed there. In the present invention, an off-target site may also be called a candidate off-target site, and an off-target indel site a validated off-target site.

Specifically, the validation process may consist of, but is not limited to, isolating genomic DNA from cells in which the target-specific nuclease for the off-target site has been expressed, and checking for indels at the off-target site of that DNA to confirm the off-target effect at that site. The off-target effect may be confirmed by an indel detection method known in the art, such as a T7E1 assay, a mutation detection assay using the Cel-I enzyme, or targeted deep sequencing. The step of confirming the off-target effect may be a direct check of whether an indel occurred at the off-target site. Even if no indel is found in this in vivo validation, however, indels occurring at a frequency below the detection limit have not been ruled out, so the validation should be regarded strictly as an auxiliary means.

Identifying vertically aligned positions, or double peaks in the 5' end plot, as described above is sufficient to detect off-target sites with high reproducibility, but some sites having a non-uniform cleavage pattern or a low sequencing depth may be missed. The present specification therefore provides, on the basis of the alignment pattern of the sequence reads, a formula for calculating a DNA cleavage score for each nucleotide position i (that is, each nucleotide position on the genomic DNA); see FIG. 4. The score at position i is calculated from the following quantities:

the number of forward sequence reads starting at position i;

the number of reverse sequence reads starting at position i;

the sequencing depth at position i; and

C, an arbitrary constant.

In the formula, the number of sequence reads means the number of nucleotide reads, and the sequencing depth means the number of sequencing reads at a specific position.

In addition, the formula makes it possible to detect many additional sites that were not detected by the existing Digenome-seq, and to filter out false-positive sites easily. The value of C in the formula can be any constant chosen by a person skilled in the art and is not limited by the embodiments of the present invention. In one example, C may be 1 to 1000, 1 to 500, 1 to 100, 1 to 50, 1 to 10, 1 to 5, or 1 to 3, but is not limited thereto. In particular, though without limitation, a given (cleaved) position may be judged to be an off-target site when the score calculated for its sequence with C set to 1 is 2.5 or more, or is 0.1 or more and the sequence has homology to the on-target sequence (e.g., it has 10 or fewer mismatches with the on-target sequence and includes a PAM (5'-NGN-3' or 5'-NNG-3')). The score criteria may, however, be adjusted or changed as appropriate by a person skilled in the art according to the purpose.
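The example thresholds in the preceding paragraph (a score of 2.5 or more, or a score of 0.1 or more together with homology to the on-target sequence) can be expressed as a small decision function. The Python sketch below is illustrative only and assumes that the DNA cleavage score has already been computed elsewhere; the function names and the example sequences are hypothetical and are not sequences from the specification. Mismatches are counted position by position between equal-length sequences, and the PAM check accepts any 3-nt sequence of the form NGN or NNG.

```python
def count_mismatches(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


def has_relaxed_pam(pam):
    """True for a 3-nt PAM of the form 5'-NGN-3' or 5'-NNG-3'."""
    pam = pam.upper()
    return len(pam) == 3 and (pam[1] == "G" or pam[2] == "G")


def is_off_target_candidate(score, site_seq, pam, on_target_seq,
                            high_cutoff=2.5, low_cutoff=0.1,
                            max_mismatches=10):
    """Apply the two example score criteria (score computed with C = 1)."""
    if score >= high_cutoff:
        return True
    homologous = (count_mismatches(site_seq, on_target_seq) <= max_mismatches
                  and has_relaxed_pam(pam))
    return score >= low_cutoff and homologous


# Hypothetical 20-nt sequences differing at one position.
ON_TARGET = "GACCTTGACCATGGTACCAT"
SITE = "GACCTTGACCATGGTACGAT"
print(is_off_target_candidate(0.5, SITE, "AGG", ON_TARGET))
```

Keeping the cutoffs as parameters reflects the statement that the score criteria may be adjusted according to the purpose.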

In one example, the Digenome-seq method provided herein may be performed using a plurality of target-specific nucleases (e.g., target-specific nucleases comprising a number of guide RNAs with different target sites), which is referred to herein as "multiplex Digenome-seq". In this case, the target-specific nucleases may be a mixture of target-specific nucleases for two or more, specifically 2 to 100, targets, but are not limited thereto.

In multiplex Digenome-seq, the genomic DNA is cleaved by each of the target-specific nucleases, so it is important to determine which programmable nuclease produced each cleavage site. This can be achieved by classifying the off-target sites according to their edit distance from each on-target site, on the premise that the base sequence of an off-target site is homologous to that of its on-target site. In this way the on-target and off-target sites of each programmable nuclease can be clearly distinguished.
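The classification by edit distance can be illustrated with the following Python sketch, which assigns each cleaved sequence to the guide whose on-target sequence is closest. It is a minimal example under stated assumptions: the edit distance is taken to be the Levenshtein distance (substitutions, insertions, and deletions each cost 1), the cutoff `max_distance` above which a site is left unassigned is arbitrary, and the guide names and sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def assign_to_guide(site_seq, targets, max_distance=6):
    """Return (guide_name, distance) for the closest on-target sequence.

    guide_name is None when no target lies within max_distance
    (the cutoff value is an assumption for illustration).
    """
    best_name, best_dist = None, None
    for name, target_seq in targets.items():
        dist = edit_distance(site_seq, target_seq)
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist is None or best_dist > max_distance:
        return None, best_dist
    return best_name, best_dist


# Hypothetical on-target sequences for two guides.
targets = {
    "guideA": "GACCTTGACCATGGTACCAT",
    "guideB": "TTGACGGATCCAGTCAATGC",
}
print(assign_to_guide("GACCTTGACCATGGTACGAT", targets))
```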

In a specific embodiment of the present invention, it was confirmed for RGENs (RNA-guided engineered nucleases) targeting specific sites that off-target effects can be minimized by selecting as the RGEN target a site for which, among the off-target sites detected by Digenome-seq across the whole genome, there are 13,000 or fewer homologous sites having 6 or fewer nucleotide mismatches with the on-target site and no homologous sites having 2 or fewer nucleotide mismatches. This is one example of the process of establishing preferred criteria for selecting target sites using the Digenome-seq of the present invention, and Digenome-seq is expected to make it possible to minimize the off-target effects of programmable nucleases.
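The selection criterion described above can be sketched as a simple check on a table of homologous-site counts. The Python example below is an illustration only: it assumes that the number of homologous sites has already been tabulated per mismatch count with the on-target site itself excluded, and the function name, parameter names, and example counts are hypothetical rather than data from the specification.

```python
def passes_target_selection(homologous_site_counts,
                            max_sites_le6=13_000, strict_mismatches=2):
    """Check the example criterion for choosing an RGEN target site.

    homologous_site_counts: dict mapping a mismatch count to the number
    of homologous sites with exactly that many mismatches (on-target
    site excluded; this input layout is an assumption).
    """
    n_le6 = sum(n for mm, n in homologous_site_counts.items() if mm <= 6)
    n_close = sum(n for mm, n in homologous_site_counts.items()
                  if mm <= strict_mismatches)
    return n_le6 <= max_sites_le6 and n_close == 0


# Hypothetical tabulations for three candidate target sites.
candidate_ok = {3: 20, 4: 300, 5: 2_500, 6: 9_000}   # 11,820 sites, none within 2 mismatches
candidate_close = {2: 1, 3: 20}                      # one site within 2 mismatches
candidate_many = {5: 6_000, 6: 8_000}                # 14,000 sites with 6 or fewer mismatches
for counts in (candidate_ok, candidate_close, candidate_many):
    print(passes_target_selection(counts))
```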

Meanwhile, it was confirmed that, among sites homologous to the on-target sequence, the proportion detected by Digenome-seq decreases as the number of nucleotide mismatches increases. This is because, in selecting an RGEN target site, a target is relatively more specific the fewer nucleotide sequences in the genome are homologous to the target sequence, and in particular the fewer nucleotide sequences are highly homologous to it. An RGEN target site selected in this way may have minimized off-target effects.

Another example provides a method of selecting, using the above method of detecting cleavage sites or off-target sites of a target-specific nuclease, a target site with few off-target sites and/or a target-specific nuclease that targets that target site.

The selection method may comprise:

(b) performing whole genome sequencing on the cleaved DNA;

(c) identifying the cleaved positions in the sequence reads obtained by the sequencing;

(d) judging an identified cleaved position to be an off-target site if it is not the on-target site; and

(e) analyzing the off-target site data thus determined and, when fewer off-target sites are found than for a comparison subject, selecting the target site targeted by the target-specific nuclease and/or the target-specific nuclease that targets it.

In one example, when the target-specific nuclease is directed to the target site by a guide RNA, the selected target site and/or target-specific nuclease may be characterized as comprising a sequence hybridizable with the target DNA sequence at the target site of the guide RNA.

The comparison subject may be a target-specific nuclease for the target sequence of any target DNA and, in one example, may be any one selected from commonly used or already known target-specific nucleases (e.g., combinations of an RGEN and a guide RNA).

【Effects of the Invention】

The Digenome-seq of the present invention can detect the off-target sites of programmable nucleases at the genome level with high reproducibility, and can therefore be used to construct, and to carry out research on, programmable nucleases with high target specificity.

【Brief Description of Drawings】

FIG. 1 schematically shows the digested genome sequencing (Digenome-seq) method using chromatin DNA.

FIGS. 2a to 2c compare the results of digested genome sequencing using chromatin DNA and chromatin-free DNA:

FIG. 2a is a graph showing DNA cleavage scores when the whole genome was sequenced without treating chromatin-protein-free DNA (isolated DNA) with the RNA-guided nuclease (top), when the whole genome was sequenced after treating chromatin-protein-free DNA (isolated DNA) with the RNA-guided nuclease (middle), and when the whole genome was sequenced after treating chromatin DNA (containing chromatin proteins) with the RNA-guided nuclease (bottom). The dark portion of each bar represents the total number of sites whose DNA cleavage score, as measured by digested genome sequencing, exceeds the given value (0.0001 to 10), and the light portion represents the number of those sites at which a sequence bound by Cas9 (NNG or NGN) is present and 10 bp or more of the sequence match the 20-bp on-target sequence;

FIG. 2b is a Venn diagram comparing the number of off-target sites obtained from digested genome sequencing of DNA without chromatin proteins with the number obtained from digested genome sequencing of DNA containing chromatin proteins; and

FIG. 2c is a graph comparing the correlation between the DNA cleavage scores of off-target sites obtained from digested genome sequencing of DNA without chromatin proteins and the cleavage scores of off-target sites obtained from digested genome sequencing of DNA containing chromatin proteins.

FIGS. 3a to 3c show the results of digested genome sequencing analysis performed with chromatin-containing DNA:

FIG. 3a is a graph showing the mutation rate in cells and the DNA cleavage score obtained when digested genome sequencing was performed with DNA containing chromatin proteins;

FIG. 3b is a graph showing the mutation rate in cells and the DNA cleavage score obtained when digested genome sequencing was performed with DNA without chromatin proteins; and

FIG. 3c is a graph showing the correlation between the mutation rates (indel frequencies) and the DNA cleavage scores shown in FIGS. 3a and 3b.

FIG. 4 shows the in vitro DNA cleavage scoring system for Digenome-seq analysis.

FIG. 5a is a Venn diagram showing the off-target sites identified by Digenome 1.0 using HBB-specific Cas9 in HeLa cells.

FIG. 5b is a graph showing the correlation between the cleavage scores of the nuclear pellet and those of native chromatin at the off-target sites identified by Digenome 1.0 using HBB-specific Cas9 in HeLa cells.

FIG. 6a is a Venn diagram showing the off-target sites identified by Digenome 2.0 using HBB-specific Cas9 in HeLa cells.

FIG. 6b is a graph showing the correlation between the cleavage scores of off-target sites in the nuclear pellet and in native chromatin identified by Digenome 2.0 using HBB-specific Cas9 in HeLa cells.

FIG. 7a is a Venn diagram showing the off-target sites identified by Digenome 2.0 using HBB-specific Cas9 in HEK293T cells.

FIG. 7b is a graph showing the correlation between the cleavage scores of the nuclear pellet and those of native chromatin at the off-target sites identified by Digenome 2.0 using HBB-specific Cas9 in HEK293T cells.

FIG. 8a is a Venn diagram showing the in vitro cleavage sites (off-target sites) identified by performing Digenome 1.0 with HBB-specific Cas9 on the nuclear pellet and on chromatin-free DNA of HeLa cells.

FIG. 8b is a graph showing the correlation between the cleavage scores of off-target sites in the nuclear pellet and in chromatin-free DNA identified by Digenome 1.0 using HBB-specific Cas9 in HeLa cells.

FIG. 9a is a Venn diagram showing the in vitro cleavage sites (off-target sites) identified by performing Digenome 2.0 with HBB-specific Cas9 on the nuclear pellet and on chromatin-free DNA of HeLa cells.

FIG. 9b shows the sequence logo obtained with WebLogo from the DNA sequences at the Digenome-captured sites (DNA cleavage score > 0.1).

FIG. 9c is a graph showing the correlation between the cleavage scores of off-target sites in the nuclear pellet and in chromatin-free DNA identified by Digenome 2.0 using HBB-specific Cas9 in HeLa cells.

FIG. 9d is a graph showing the correlation between the DNA cleavage scores obtained by performing Digenome-seq with the HBB target sequence on chromatin DNA and on chromatin-free DNA and the indel frequencies.

FIG. 9e is a graph showing the R² values obtained by performing Digenome-seq with the HBB target sequence on chromatin DNA and on chromatin-free DNA.

【Mode for Carrying Out the Invention】

The present invention is described in more detail through the following examples. These examples serve only to illustrate the present invention more specifically, and it will be apparent to those of ordinary skill in the art to which the present invention pertains that the scope of the present invention is not limited by them.

Example 1: In vitro cleavage of chromatin DNA using an endonuclease and sequencing of the cleaved DNA

For use in genomic DNA cleavage and sequencing, HeLa (ATCC, CCL-2), HEK293T (ATCC, CRL-11268), and K562 (ATCC, CCL-243) cells and T cells (isolated from donor blood) were prepared. Each of the prepared cell types (5×10⁵ cells each) was treated with lysis buffer (1× PBS, 0.4% NP-40 (CAS 9016-45-9), and 3 mM MgCl₂) to remove the cell membrane, and then centrifuged at 500 × g for 5 minutes to separate the cytoplasmic fraction (supernatant) from the chromatin DNA fraction.

After removal of the cytoplasmic layer, 300 nM Cas9 protein (derived from Streptococcus pyogenes; SwissProt accession number Q99ZW2; SEQ ID NO: 4) and 900 nM guide RNA were added in reaction buffer (100 mM NaCl, 50 mM Tris-HCl, 10 mM MgCl₂, 100 μg/ml BSA, pH 7.9) and incubated at 37 °C for 8 hours. The reaction product, in which DNA had been cleaved by the Cas9 and guide RNA treatment, was treated with RNase A (50 μg/mL) to remove the guide RNA, and whole genomic DNA was purified using a DNeasy Tissue kit (Qiagen).

1 μg of the isolated whole genomic DNA was fragmented to about 400-500 bp using a Covaris system (Life Technologies), and the DNA overhangs generated by fragmentation were removed using End Repair Mix. To construct a sequencing library, the fragmented DNA was ligated to adapters (DNA fragments that attach to the fragmented DNA; a kit provided by Illumina was used), and whole genome sequencing was then performed using a HiSeq X Ten Sequencer (Macrogen).

The procedure described in this example is shown schematically in FIG. 1.

The guide RNA used above has the following nucleotide sequence: 5'-(target sequence)-(GUUUUAGAGCUA; SEQ ID NO: 1)-(nucleotide linker)-(UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC; SEQ ID NO: 3)-3'

(The target sequence is obtained from the "DNA sequence at target site" (the sequence shown in a square box or marked on-target) or the "Target sequence" (on-target sequence) listed in Tables 1 to 8 below by removing the 3'-terminal "NGG" (N is any nucleotide having the base A, T, G, or C) and converting "T" to "U" in the remaining nucleotide sequence; the nucleotide linker has the nucleotide sequence GAAA.) In the examples below, the complex of Cas9 protein and guide RNA is referred to as the genetic scissors.
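The guide RNA layout described above can be illustrated with a short Python sketch. It assumes a 23-nt DNA target site that ends in an NGG PAM. The function name `build_guide_rna` is hypothetical, and the SEQ ID NO: 3 segment is written as in the widely used S. pyogenes single-guide scaffold, which is an assumption wherever the rendering of that sequence in this text is unclear.

```python
# Sketch only: assembles a guide RNA from a DNA target site, following the
# 5'-(target sequence)-(SEQ ID NO: 1)-(linker)-(SEQ ID NO: 3)-3' layout.

CRRNA_PART = "GUUUUAGAGCUA"  # SEQ ID NO: 1
LINKER = "GAAA"              # nucleotide linker
# SEQ ID NO: 3; assumed to match the common S. pyogenes sgRNA scaffold.
TRACRRNA_PART = "UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"


def build_guide_rna(target_site_dna: str) -> str:
    """Return the guide RNA for a DNA target site that ends in an NGG PAM."""
    site = target_site_dna.upper()
    if len(site) < 4 or not site.endswith("GG"):
        raise ValueError("expected a target site ending in a 3' NGG PAM")
    if set(site) - set("ACGT"):
        raise ValueError("target site must contain only A, C, G, T")
    protospacer = site[:-3]                      # drop the 3' NGG
    target_rna = protospacer.replace("T", "U")   # DNA letters -> RNA letters
    return target_rna + CRRNA_PART + LINKER + TRACRRNA_PART


if __name__ == "__main__":
    # HBB on-target site as listed in Table 2.
    print(build_guide_rna("CTTGCCCCACAGGGCAGTAACGG"))
```

For the HBB on-target site, the first 20 nucleotides of the result are the target sequence with U in place of T, followed by the fixed scaffold.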

For comparison, genome cleavage and sequencing were performed in the same manner on chromatin-free DNA obtained by purifying DNA from the chromatin DNA obtained above. The chromatin-free DNA was prepared by purifying genomic DNA from each of the prepared cell types using a DNeasy Tissue kit (Qiagen) according to the manufacturer's instructions.

Example 2: Comparison of Digenome-seq results obtained with chromatin DNA and chromatin-free DNA

Using the whole genome DNA sequencing results obtained in Example 1 for the four cell types (HeLa, HEK293T, K562, and T cells), off-target sites were determined according to the DNA cleavage scoring system (digested-genome sequencing, Digenome-seq).

The DNA cleavage score was calculated according to the following formula; for further details, see Kim, D., Kim, S., Kim, S., Park, J. & Kim, J.S. Genome-wide target specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome Research 26, 406-415 (2016):

Score at position i = (equation not reproduced in this text; see the cited reference)

F_i: number of forward sequence reads starting at position i

R_i: number of reverse sequence reads starting at position i

D_i: sequencing depth at position i

C: arbitrary constant

In the above formula, the number of sequence reads means the number of nucleotide sequence reads, and the sequencing depth means the number of sequencing reads at a given position. The value of C was set to 1.
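The scoring step can be sketched in Python. The sketch assumes the form of the score published in the cited Kim et al. (2016) paper, as best recalled here: two sums over a = 1..5 that pair forward read starts at position i with reverse read starts at positions i-4+a, and reverse read starts at position i-1 with forward read starts at positions i-3+a. That form, the function name `cleavage_score`, and the rule of skipping positions where no read starts are assumptions and should be checked against the cited paper.

```python
from typing import Mapping


def cleavage_score(i: int,
                   fwd: Mapping[int, int],
                   rev: Mapping[int, int],
                   depth: Mapping[int, int],
                   c: float = 1.0) -> float:
    """Assumed form of the DNA cleavage score at position i.

    fwd[p]   number of forward reads starting at p   (F_p)
    rev[p]   number of reverse reads starting at p   (R_p)
    depth[p] sequencing depth at p                   (D_p)
    c        arbitrary constant C (set to 1 in this document)
    """
    def term(n_a: int, d_a: int, n_b: int, d_b: int) -> float:
        # Assumption: a pair contributes only if reads start at both positions.
        if n_a < 1 or n_b < 1 or d_a <= 0 or d_b <= 0:
            return 0.0
        return (c * (n_a - 1) / d_a) * (c * (n_b - 1) / d_b) * (n_a + n_b - 2)

    score = 0.0
    for a in range(1, 6):
        j = i - 4 + a  # reverse-read start paired with the forward start at i
        score += term(fwd.get(i, 0), depth.get(i, 0),
                      rev.get(j, 0), depth.get(j, 0))
        k = i - 3 + a  # forward-read start paired with the reverse start at i-1
        score += term(rev.get(i - 1, 0), depth.get(i - 1, 0),
                      fwd.get(k, 0), depth.get(k, 0))
    return score


if __name__ == "__main__":
    # Hypothetical counts: 11 forward reads start at 100, 11 reverse reads at 99.
    print(cleavage_score(100, {100: 11}, {99: 11}, {99: 20, 100: 20}))
```

With the hypothetical counts in the usage line, each of the two sums contributes one non-zero term, so a clean blunt cut between positions 99 and 100 is counted from both strands.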

First, the DNA cleavage scores calculated by the above method were compared between whole genome sequencing of the genome (cytoplasm + chromatin) obtained from the cell lysate without any treatment and whole genome sequencing of the reaction products obtained by cleaving, with the genetic scissors, DNA either containing chromatin or depleted of chromatin. Of these, the results corresponding to the conditions of Table 1 (HeLa cells and the target sequence of Table 1 (HBB target sequence)) are shown in FIGS. 2a to 2c.

First, the total number of sites exceeding a given DNA cleavage score (0.0001 to 10) was determined, together with the number of those sites at which a sequence bound by the Cas9 protein (NNG or NGN; N is any nucleotide having the base A, T(U), G, or C) is present and at least 10 bp match the 20-bp target-site sequence; the results are shown in FIG. 2a. FIG. 2a is a graph comparing the number of sites as a function of DNA cleavage score for chromatin-free DNA subjected to whole genome sequencing without genetic-scissors treatment (top), chromatin-free DNA subjected to whole genome sequencing after genetic-scissors treatment (middle), and chromatin DNA treated with the genetic scissors (bottom). In FIG. 2a, the total number of sites exceeding each DNA cleavage score threshold (0.0001 to 10) in the Digenome-seq results is plotted as dark bars, and the number of those sites that carry a Cas9-binding sequence (NNG or NGN) and match the 20-bp target-site sequence at 10 bp or more is plotted as light bars.

When the DNA cleavage score threshold was set to 0.1 or higher, no site from the whole genome sequencing of the genome without Cas9 treatment both carried a Cas9-binding sequence (NNG or NGN) and matched the 20-bp target-site sequence at 10 bp or more; the cutoff value of the DNA cleavage score was therefore set to 0.1.
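The reasoning behind the cutoff can be sketched as follows: among a set of candidate thresholds, take the lowest one at which the untreated control no longer yields any site that has both a PAM and sufficient homology to the target. The function names and the control data below are hypothetical and serve only to illustrate the selection rule.

```python
from typing import Iterable, Optional, Sequence, Tuple

# Each site is (cleavage_score, is_homologous), where is_homologous is True
# when the site has an NNG/NGN PAM and matches the target at 10 bp or more.
Site = Tuple[float, bool]


def count_sites(sites: Iterable[Site], threshold: float) -> Tuple[int, int]:
    """Return (all sites at or above threshold, homologous sites among them)."""
    above = [s for s in sites if s[0] >= threshold]
    return len(above), sum(1 for s in above if s[1])


def choose_cutoff(control_sites: Sequence[Site],
                  thresholds: Iterable[float]) -> Optional[float]:
    """Lowest threshold at which the untreated control has no homologous site."""
    for t in sorted(thresholds):
        _, homologous = count_sites(control_sites, t)
        if homologous == 0:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical untreated-control sites, for illustration only.
    control = [(0.05, True), (0.02, True), (0.5, False), (0.001, False)]
    print(choose_cutoff(control, [0.0001, 0.001, 0.01, 0.1, 1, 10]))
```

The threshold grid in the usage line mirrors the score range (0.0001 to 10) examined in FIG. 2a.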

By this criterion, Digenome-seq on chromatin DNA identified 44 off-target cleavage sites, all of which were included among the 119 off-target sites identified by Digenome-seq on chromatin-free DNA (see FIG. 2b). FIG. 2b is a Venn diagram comparing the numbers of off-target sites obtained by Digenome-seq on chromatin-free DNA and on chromatin DNA. FIG. 2b shows that performing Digenome-seq with chromatin DNA markedly reduces the number of off-target sites compared with chromatin-free DNA, enabling more accurate Digenome-seq.

In addition, FIG. 2c shows the correlation between the DNA cleavage scores of off-target sites obtained by Digenome-seq on chromatin-free DNA and those obtained by Digenome-seq on chromatin DNA. As shown in FIG. 2c, there is almost no correlation between the DNA cleavage scores obtained with chromatin DNA and with chromatin-free DNA. Furthermore, Digenome-seq using chromatin DNA was performed on various cells besides HeLa cells, namely HEK293T, K562, and T cells, demonstrating that the method is also applicable to these cells (Table 1 (the target sequences are, in order, SEQ ID NOs: 5 to 48, of which SEQ ID NO: 5 is the HBB on-target sequence and the rest are off-target sequences) and Table 2 (the target sequences, excluding the on-target sequence shown in a square box, are, in order, SEQ ID NOs: 49 to 84); in Tables 1 and 2, nucleotides shown in lowercase are mismatched nucleotides).

[Table 2]

HEK293T

Chr    Location    DNA Cleavage Score    DNA sequence at target sites
chr11  5248215    14.63  CTTGCCCCACAGGGCAGTAACGG (on-target, boxed)
chr9   104595883  8.17   tcaGCCCCACAGGGCAGTAAGGG
chr17  8370253    3.65   tTgctCCCACAGGGCAGTAAACG
chr12  124803834  2.96   gcTGCCCCACAGGGCAGcAAAGG
chr14  94585327   1.82   aTgGCCCCACAaGGCAGaAATGG
chr12  93549202   1.01   aTTGCCCCACgGGGCAGTgACGG
chr2   121715240  0.88   gTgtCCCCACAGGGCAGgAAAGG
chr12  27234755   0.70   gaTGCCtCACAGGaCAGgAAGGG
chrX   75006257   0.46   gTgGCCCCACAGGGCAGgaATGG
chr14  36889538   0.40   gTTatCCCACAGGaCAGTgAGGG
chr6   23709579   0.29   gaaGCCCtACAGGGCAGcAATCG
chr7   97874171   0.28   CcctCCCCACAGGGCAGTcATGG
chr10  72286450   0.23   CaaGCCCCACAGGGCAGacAGGG
chr8   24931381   0.19   agTGCCaCACAcaGCAGTAAGGG
chr9   134994964  0.12   C TGCCCCtCAGGGCAGctAAGG

K562

Chr    Location    DNA Cleavage Score    DNA sequence at target sites
chr11  5248215    16.29  CTTGCCCCACAGGGCAGTAACGG (on-target, boxed)
chr9   104595883  9. 1   tcaGCCCCACAGGGCAGTAAGGG
chr17  8370253    3.80   tTgctCCCACAGGGCAGTAAACG
chr12  124803834  2.65   gcTGCCCCACAGGGCAGcAAAGG
chrX   75006257   2.62   gTgGCCCCACAGGGCAGgaATGG
chr14  94585327   1.04   aTgGCCCCACAaGGCAGaAATGG
chr2   121715240  0.56   gTgtCCCCACAGGGCAGgAAAGG
chr22  17230623   0.30   tg GCCCCACAGaGCAcTAAGGG
chr12  93549202   0.28   aTTGCCCCACgGGGCAGTgACGG
chfi   17346702   0.15   gglcCCCacagGGtCAGTAAGGG
chr7   97874171   0.1    CcctCCCCACAGGGCAGTcATGG
chr6   23709579   0.12   gaaGCCCtACAGGGCAGcAATGG

T cell

Chr    Location    DNA Cleavage Score    DNA sequence at target sites
chr11  5248215    15.29  CTTGCCCCACAGGGCAGTAACGG (on-target, boxed)
chr9   104595883  5.64   tcaGCCCCACAGGGCAGTAAGGG
chr12  124803834  3.78   gcTGCCCCACAGGGCAGCAAAGG
chr17  8370253    3.27   CTgctCCCACAGGGCAGTAAACG
chr14  94585327   3.03   aTgGCCCCACAaGGCAGaAATGG
chr12  93549202   1.94   aTTGCCCCACgGGGCAGTgACGG
chr22  17230623   1.52   tg GCCCCACAGaGCAcTAAGGG
chrX   75006257   1.48   gTgGCCCCACAGGGCAGgAATGG
chr6   23709579   1.23   gaaGCCCtACAGGGCAGcaATGG
chr2   121715240  1.06   gTgtCCCCACAGGGCAGgAAAGG
chr19  50010010   0.50   aT GCCCCcCAGG CAGTAgGGG
chr5   131423385  0.23   tcTGCCCCACAGGcCAGgAAGGG

Example 3: Digenome-seq analysis using chromatin DNA

Digenome-seq using chromatin DNA was carried out with seven different genetic scissors (each comprising a guide RNA with a different target sequence), and mutations (indel frequencies) in cells (HeLa cells) were verified. The results are shown in Table 3 (the target sequences are, in order, SEQ ID NOs: 85 to 115), Table 4 (the target sequences are, in order, SEQ ID NOs: 116 to 151), Table 5 (the target sequences are, in order, SEQ ID NOs: 152 to 164), and Table 6 (the target sequences are, in order, SEQ ID NOs: 165 to 181) (in Tables 3 to 6, nucleotides shown in lowercase are mismatched nucleotides).

[Tables 3 to 6: contents not recoverable from the source text; the only surviving label is FANCF]

The mutation rates at the off-target sites identified by the Digenome-seq results shown in Tables 3 to 6 were measured in cells and compared with the DNA cleavage scores at those sites. The comparison is shown in FIG. 3a. For comparison, the same test was performed on chromatin-free DNA, and the results are shown in FIG. 3b; the correlations shown in FIGS. 3a and 3b (the closer the mean is to 1, the higher the correlation) are compared in FIG. 3c. As shown in FIGS. 3a to 3c, when Digenome-seq was performed with chromatin DNA, the correlation between the DNA cleavage score and the mutation rate at a given site (mean R² = 0.62) was significantly higher than when Digenome-seq was performed with chromatin-free DNA (mean R² = 0.20).

Example 4: Cleavage test of chromatin DNA (nuclear pellet and native chromatin DNA)

The preceding examples confirmed that chromatin plays an important role in Cas9 nuclease activity. The following tests were therefore performed to show that the Digenome-seq (digested-genome sequencing) method using native chromatin DNA has an advantage in reflecting the chromatin state when profiling genome-wide Cas9 off-targets.

To identify genome-wide Cas9 off-target sites, Digenome-seq was performed using chromatin DNA. Following Example 1 and using the target sequence of Table 7 below (the boxed sequence (HBB target sequence)), nuclear pellets and native chromatin isolated from HeLa cells or HEK293T cells were incubated with a ribonucleoprotein comprising preassembled Cas9 protein (300 nM) and sgRNA (900 nM), and whole-genome sequencing (WGS) was performed after DNA purification (see FIG. 1). The nuclear pellet, which retains the nuclear membrane after removal of the cytoplasm, was prepared by treating the cells with lysis buffer (1× PBS, 0.4% NP-40, 1 mM EDTA), centrifuging at 500 × g for 5 minutes, and removing the cytoplasmic layer (supernatant) (identical to the chromatin DNA used in Examples 1 to 3). Native chromatin, from which the nuclear membrane has been removed, was prepared by treating the cells with lysis buffer (1× PBS, 0.4% NP-40, 1 mM EDTA), centrifuging at 500 × g for 5 minutes, removing the cytoplasmic layer (supernatant), and then treating with Nuc-lysis solution (10 mM EDTA, 0.5 mM EGTA, 0.1% Triton X-100) and removing the nucleoplasm (supernatant).

After sequence alignment to the human reference genome (hg19), a straight (vertical) alignment of reads induced by Cas9-mediated DNA digestion at the target site was observed using the Integrative Genomics Viewer (IGV). Cas9 off-target sites were identified using the DNA cleavage score used in a previous Digenome-seq study (Kim, D., Kim, S., Kim, S., Park, J. & Kim, J.S. Genome-wide target specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome Research 26, 406-415 (2016)). The identification was performed by Digenome 1.0 (sites with a DNA cleavage score of 2.5 or higher are taken as off-target candidate sites) and by Digenome 2.0 (sites with a DNA cleavage score of 0.1 or higher, 10 or fewer mismatches, and a PAM (5'-NGN-3' or 5'-NNG-3') are taken as off-target sites).
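The two selection rules can be written down as a small Python sketch. It assumes each candidate is a 23-nt site (20-nt protospacer plus 3-nt PAM) compared against the 23-nt on-target site; the function names are hypothetical, and only the thresholds (2.5, 0.1, 10 mismatches) and PAM patterns come from the text.

```python
def count_mismatches(site: str, on_target: str) -> int:
    """Mismatches over the 20-nt protospacer, ignoring letter case."""
    a, b = site.upper()[:20], on_target.upper()[:20]
    if len(a) != 20 or len(b) != 20:
        raise ValueError("expected 23-nt sites (20-nt protospacer + PAM)")
    return sum(1 for x, y in zip(a, b) if x != y)


def has_relaxed_pam(site: str) -> bool:
    """True if the last three nucleotides fit 5'-NGN-3' or 5'-NNG-3'."""
    pam = site.upper()[-3:]
    return len(pam) == 3 and (pam[1] == "G" or pam[2] == "G")


def is_digenome1_candidate(score: float) -> bool:
    """Digenome 1.0: DNA cleavage score of 2.5 or higher."""
    return score >= 2.5


def is_digenome2_site(score: float, site: str, on_target: str) -> bool:
    """Digenome 2.0: score >= 0.1, at most 10 mismatches, NGN or NNG PAM."""
    return (score >= 0.1
            and count_mismatches(site, on_target) <= 10
            and has_relaxed_pam(site))


if __name__ == "__main__":
    hbb = "CTTGCCCCACAGGGCAGTAACGG"
    # chr9:104595883 in HEK293T, score 8.17 (Table 2).
    site = "tcaGCCCCACAGGGCAGTAAGGG"
    print(count_mismatches(site, hbb), is_digenome2_site(8.17, site, hbb))
```

Note that "10 or fewer mismatches" over a 20-nt target is the same condition as "10 bp or more matching" used in Example 2.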

The results are shown in FIG. 5a (Venn diagram of the off-target sites obtained by Digenome 1.0 using HBB-specific Cas9 in HeLa cells); FIG. 5b (graph of the correlation between the nuclear-pellet cleavage scores and the native-chromatin cleavage scores at the off-target sites obtained by Digenome 1.0 using HBB-specific Cas9 in HeLa cells); FIG. 6a (Venn diagram of the off-target sites obtained by Digenome 2.0 using HBB-specific Cas9 in HeLa cells); FIG. 6b (graph of the correlation between the cleavage scores of off-target sites in nuclear pellets and in native chromatin obtained by Digenome 2.0 using HBB-specific Cas9 in HeLa cells); FIG. 7a (Venn diagram of the off-target sites obtained by Digenome 2.0 using HBB-specific Cas9 in HEK293T cells); and FIG. 7b (graph of the correlation between the nuclear-pellet cleavage scores and the native-chromatin cleavage scores at the off-target sites obtained by Digenome 2.0 using HBB-specific Cas9 in HEK293T cells).

As shown in FIG. 5a, 15 and 11 in vitro cleavage sites were observed in the nuclear pellet and native chromatin of HeLa cells, respectively, by Digenome 1.0 (cutoff: 2.5) using HBB-specific Cas9. Of the 11 in vitro cleavage sites identified in native chromatin, 10 (91%; 10/11) overlapped with 10 of the 15 cleavage sites identified in the nuclear pellet. In addition, as shown in FIG. 5b, the DNA cleavage scores of the in vitro cleavage sites identified in native chromatin of HeLa cells correlated highly with those of the in vitro cleavage sites identified in the nuclear pellet (R² = 0.97). These results show that Digenome-seq using nuclear pellets or native chromatin is highly reproducible.

As shown in FIGS. 6a and 6b, 44 and 37 in vitro cleavage sites were observed in the nuclear pellet and native chromatin of HeLa cells, respectively, by Digenome 2.0 (cutoff: 1.0) using HBB-specific Cas9; 34 of these overlapped, and a high correlation was observed between them (R² = 0.97).

As shown in FIGS. 7a and 7b, 12 and 7 in vitro cleavage sites were observed in the nuclear pellet and native chromatin of HEK293T cells, respectively, by Digenome 2.0 (cutoff: 1.0) using HBB-specific Cas9; 6 of these overlapped, and a high correlation was observed between them (R² = 0.88).
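The overlap figures reported above (for example 10 of 11 native-chromatin sites shared with the nuclear pellet, 91%) are plain set arithmetic over site coordinates. A minimal sketch follows, with a hypothetical function name and placeholder coordinates chosen only to reproduce the reported counts of 15, 11, and 10.

```python
from typing import Dict, Hashable, Iterable


def overlap_summary(sites_a: Iterable[Hashable],
                    sites_b: Iterable[Hashable]) -> Dict[str, float]:
    """Counts for two site sets and the share of set B also found in set A."""
    a, b = set(sites_a), set(sites_b)
    shared = a & b
    return {
        "n_a": len(a),
        "n_b": len(b),
        "n_shared": len(shared),
        "fraction_of_b_shared": len(shared) / len(b) if b else 0.0,
    }


if __name__ == "__main__":
    # Placeholder (chromosome, position) keys, not real coordinates.
    nuclear_pellet = {("chrN", k) for k in range(15)}
    native_chromatin = {("chrN", k) for k in range(10)} | {("chrN", 100)}
    print(overlap_summary(nuclear_pellet, native_chromatin))
```

Keying sites by (chromosome, position) assumes both runs report the same coordinate for a given cut; in practice a small position tolerance may be needed.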

These results show that Digenome-seq using nuclear pellets and Digenome-seq using native chromatin are highly correlated with each other, and that Digenome-seq using nuclear pellets identified more in vitro cleavage sites than Digenome-seq using native chromatin; nuclear pellets were therefore used for Digenome-seq in the subsequent tests (Table 7). To confirm that Digenome-seq using chromatin DNA is generally applicable to multiple cell types, the method was applied to various cells such as K562 and primary T cells, and the results are shown in Table 7 below together with the results for HeLa cells and HEK293T cells above (in Table 7 (HBB on-target sequence (square box): SEQ ID NO: 5; the other sequences: SEQ ID NOs: 6 to 84, in order), nucleotides shown in lowercase are mismatched nucleotides).

[Table 7]

Chr    Location    DNA sequence at target sites
chr9   134994964  CcTGCCCCACAGGGCAa t ATGG
chr2   43013308   CcTGCCCa g AaGGCAGTAAGGG
chr8   41296595   ca GTCCCACJ^G tCAGc^TGG
chr22  35537395   a QCX:CX:¾CAGGGg¾GaAATGG

HEK293T

Chr    Location    DNA Cleavage Score    DNA sequence at target sites
chr11  5248215    14.63  CTTGCCCCACAGGGCAGTAACGG
chr9   104595883  8.     tcaGCCCCACAGGGCAGTAAGGG
chr17  8370253    3.     cTgctCCCACAGGGCAGTAAACG
chr12  124803834  2.     gcTGCCCCACAGGGCAGcAAAGG
chr14  94585327   1.     aTgGCCCCACAaGGCAGaAATGG
chr12  93549202   1.     aTTGCCCCACgGGGCAGTgACGG

As Table 7 shows, most of the in vitro cleavage sites identified in each cell type overlap one another.

Digenome-seq (Digenome 1.0) with HBB-specific Cas9 was performed on nuclear pellets and chromatin-free DNA of HeLa cells, and the in vitro cleavage sites (all cleavage sites, including the on-target site) were determined and compared; the results are shown in FIGS. 8a and 8b. As shown in FIG. 8a, Digenome-seq with Digenome 1.0 on nuclear pellets and chromatin-free DNA revealed 15 and 48 in vitro cleavage sites, respectively, and most of the in vitro cleavage sites observed in nuclear pellets (93%) overlapped with those observed by Digenome-seq using chromatin-free DNA.

Digenome-seq (Digenome 2.0) was performed on nuclear pellets and chromatin-free DNA of HeLa cells, and the in vitro cleavage sites (all cleavage sites, including the on-target site) were determined and compared; the results are shown in FIGS. 9a to 9c. As shown in FIGS. 9a and 9b, Digenome-seq with Digenome 2.0 on nuclear pellets and chromatin-free DNA revealed 44 and 97 in vitro cleavage sites, respectively, and most of the in vitro cleavage sites observed in nuclear pellets (98%) overlapped with those observed by Digenome-seq using chromatin-free DNA. As shown in FIGS. 8b and 9c, the DNA cleavage scores identified in nuclear pellets and in chromatin-free DNA showed almost no correlation with each other (R² = 0.22 for Digenome 1.0; R² = 0.19 for Digenome 2.0).

Example 5: Correlation between DNA cleavage score and indel frequency

Digenome-seq (using the HBB target sequence of Table 7 above) was performed on chromatin DNA (see Example 1) and chromatin-free DNA (see Example 1), and the correlation between DNA cleavage score and indel frequency was examined; the results are shown in FIG. 9d. The DNA cleavage scores and indel frequencies of the off-target candidate sites obtained from Digenome-seq using chromatin-free DNA showed a low correlation (R² = 0.10; see FIG. 9e), whereas those of the off-target candidate sites obtained from Digenome-seq using chromatin DNA showed a relatively high correlation (R² = 0.72) (see FIG. 9d).
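An R² value of this kind can be computed from paired per-site values as in the sketch below. It assumes that R² denotes the squared Pearson correlation between cleavage score and indel frequency (equivalently, the coefficient of determination of a simple linear fit with intercept); the text does not state whether any transformation such as a log scale was applied first. The function name and the input values are hypothetical.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical cleavage scores and indel frequencies (%), illustration only.
    scores = [1.0, 2.0, 3.0, 4.0]
    indels = [2.0, 4.0, 6.0, 8.0]
    print(r_squared(scores, indels))
```

Values near 1 indicate that the cleavage score tracks the indel frequency closely; values near 0 indicate little linear relationship.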

Next, Digenome-seq was performed using nuclear pellets for the HBB target sequence (Table 7) and other target sequences (see Table 8), the DNA cleavage scores and indel frequencies were measured, and R² values were calculated to determine the correlation between DNA cleavage score and indel frequency. The results are shown in Table 7 (HBB target results), Table 8 (the target sequences are, in order, SEQ ID NOs: 85 to 181), and FIG. 9e (results for the HBB, VEGFA1, HEK3, EMX1, and FANCF targets) (in Table 8, nucleotides shown in lowercase are mismatched nucleotides).

Claims

[Claim]

[Claim 1]

A method of detecting off-target sites of a target-specific nuclease, comprising:

(a) cleaving isolated genomic DNA with a target-specific nuclease;

(b) performing whole genome sequencing on the cleaved DNA; and

(c) determining the cleaved positions in the sequence reads obtained by the sequencing,

wherein the isolated genomic DNA comprises chromatin DNA.

[Claim 2]

The method of claim 1, further comprising determining the cleaved position to be an off-target site if it is not the on-target site.

[Claim 3]

The method of claim 1, wherein the cleaved position is a position where the 5 ′ end is vertically aligned by aligning the obtained sequence data, or a position showing a double peak pattern in the 5 ′ end plot.

[Claim 4]

The method of claim 1, wherein the genomic DNA is isolated from cells that express or do not express the target-specific nuclease.

[Claim 5]

The method of claim 3, wherein the alignment is performed using BWA / GATK or ISAAC after mapping the sequence data to a reference genome.

[Claim 6]

The method of claim 3, further comprising determining, as an off-target site, a position at which two or more sequence reads corresponding to each of the Watson strand and the Crick strand are vertically aligned.

[Claim 7]

The method of claim 3, further comprising determining, as an off-target site, a position at which at least 20% of the sequence reads are vertically aligned and the number of sequence reads having the same 5' end on each of the Watson strand and the Crick strand is at least 10.

[Claim 8]

The method of claim 1, wherein the isolated genomic DNA is isolated from cells expressing the target-specific nuclease, and the method further comprises confirming an off-target effect by checking for insertions and deletions (indels) at the off-target site of the DNA.

[Claim 9]

The method of claim 8, wherein the indels are confirmed by performing, on the off-target site, a mutation detection assay using T7E1 or the Cel-I enzyme, or targeted deep sequencing.

[Claim 10]

The method of claim 1, wherein the off-target site has at least one nucleotide mismatch with the on-target site.

[Claim 11]

The method of claim 1, wherein the off-target site has 1 to 6 nucleotide mismatches with the on-target site.

[Claim 12]

The method of claim 1, wherein step (c) is performed by calculating a cleavage score for each cleaved position by applying the following formula:

$$
\text{Score at position } i = \sum_{a=1}^{5} \frac{C(F_i - 1)}{D_i} \times \frac{C(R_{i-4+a} - 1)}{D_{i-4+a}} \times (F_i + R_{i-4+a} - 2) + \sum_{a=1}^{5} \frac{C(R_{i-1} - 1)}{D_{i-1}} \times \frac{C(F_{i-3+a} - 1)}{D_{i-3+a}} \times (R_{i-1} + F_{i-3+a} - 2)
$$

$F_i$: number of forward sequence reads starting at position $i$

$R_i$: number of reverse sequence reads starting at position $i$

$D_i$: sequencing depth at position $i$

$C$: arbitrary constant
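For illustration only, a per-position cleavage score of this kind can be sketched as below. The sketch assumes the score is the sum, over a = 1..5, of two products: one pairing forward reads starting at position i with reverse reads starting at i−4+a, and one pairing reverse reads starting at i−1 with forward reads starting at i−3+a, each count reduced by 1, scaled by C, and normalized by the sequencing depth at its position. The index offsets, the dictionary-based inputs, and the name `cleavage_score` are assumptions of this sketch, not the applicant's implementation.

```python
def cleavage_score(i, F, R, D, C=1.0):
    """Cleavage score at position i under the assumed formula.

    F, R: dicts mapping a position to the number of forward / reverse
          sequence reads whose 5' end starts at that position.
    D:    dict mapping a position to the sequencing depth there.
    C:    arbitrary constant.
    Positions missing from a dict count as 0.
    """
    def term(n_a, d_a, n_b, d_b):
        # Skip pairs in which either position has no coverage, which
        # also avoids division by zero.
        if d_a <= 0 or d_b <= 0:
            return 0.0
        # Taken literally, a covered position with zero starting reads gives
        # (0 - 1) < 0; how the original pipeline treats that case is not
        # stated in the source, so the formula is applied as written.
        return (C * (n_a - 1) / d_a) * (C * (n_b - 1) / d_b) * (n_a + n_b - 2)

    total = 0.0
    for a in range(1, 6):
        j = i - 4 + a  # reverse-read positions paired with forward reads at i
        total += term(F.get(i, 0), D.get(i, 0), R.get(j, 0), D.get(j, 0))
        k = i - 3 + a  # forward-read positions paired with reverse reads at i-1
        total += term(R.get(i - 1, 0), D.get(i - 1, 0), F.get(k, 0), D.get(k, 0))
    return total


# Hypothetical toy counts: 11 forward reads start at 100 and 11 reverse reads
# start at 99; one read starts at each neighbouring position; depth is 20.
F = {98: 1, 99: 1, 100: 11, 101: 1, 102: 1}
R = {97: 1, 98: 1, 99: 11, 100: 1, 101: 1}
D = {p: 20 for p in range(97, 103)}
print(cleavage_score(100, F, R, D, C=1.0))
```

Because C multiplies both depth-normalized factors in each product, doubling C quadruples the score, so any score cutoff is meaningful only for a stated value of C.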

[Claim 13]

The method of claim 12, wherein, when the constant C in the formula is 1, a cleaved position is determined to be an off-target site if its calculated score is 2.5 or more, or if its score is 0.1 or more and the site includes a PAM sequence and has 10 or fewer mismatches with the target sequence.
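For illustration only, the two-tier decision rule of this claim (with C = 1) can be written as a simple predicate. The function and parameter names are hypothetical, and how the mismatch count and PAM presence are obtained for a site is outside this sketch.

```python
def is_off_target_candidate(score, mismatches, has_pam,
                            high_cutoff=2.5, low_cutoff=0.1, max_mismatches=10):
    """Return True if a cleaved position passes the two-tier rule.

    Tier 1: the score alone is at least `high_cutoff`.
    Tier 2: the score is at least `low_cutoff`, the site contains a PAM
            sequence, and it has at most `max_mismatches` mismatches with
            the target sequence.
    Cutoffs assume scores computed with the constant C set to 1.
    """
    if score >= high_cutoff:
        return True
    return score >= low_cutoff and has_pam and mismatches <= max_mismatches


# Hypothetical sites: (score, mismatches, has_pam)
for site in [(3.0, 12, False), (0.4, 5, True), (0.4, 5, False)]:
    print(site, is_off_target_candidate(*site))
```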

[Claim 14]

The method of claim 1, wherein the target-specific nuclease is a combination of target-specific nucleases directed to two or more target sequences.

[Claim 15]

The method of claim 1, wherein the target-specific nuclease is a combination of target-specific nucleases directed to 2 to 100 target sequences.

[Claim 16]

The method of claim 14, further comprising classifying the off-target sites according to their edit distance to the on-target site.
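For illustration only, classification by edit distance can be sketched as follows. The sketch assumes that, when several target sequences are used together, each detected site is assigned to the target sequence with the smallest Levenshtein distance. That assignment rule, the function names, and the short toy sequences are assumptions, not taken from this document.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def assign_to_nearest_target(site_seq, targets):
    """Return (target_name, distance) for the target closest to site_seq.

    `targets` maps a hypothetical target name to its sequence. Ties are
    resolved in favour of the first target in dictionary order.
    """
    return min(((name, edit_distance(site_seq, seq)) for name, seq in targets.items()),
               key=lambda pair: pair[1])


# Toy placeholder sequences, not real target sites.
targets = {"target_1": "ACGTACGT", "target_2": "TTTTTTTT"}
print(assign_to_nearest_target("ACGTACGA", targets))
```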

[Claim 17]

The method of claim 1, wherein the target-specific nuclease is selected from the group consisting of a meganuclease, a ZFN (zinc finger nuclease), a TALEN (transcription activator-like effector nuclease), and an RGEN (RNA-guided engineered nuclease).

[Claim 18]

The method of claim 17, wherein the RGEN is a Cas9 protein or a Cpf1 protein.

[Claim 19]

The method of claim 18, wherein the RGEN further comprises a guide RNA that is capable of localizing to the target sequence of the target gene.

[Claim 20]

The method of any one of claims 1 to 19, wherein the chromatin DNA is DNA comprising one or more non-DNA chromatin components selected from the group consisting of histones, non-histone proteins, and RNA.