WO1997004409A1 - File search device - Google Patents
- Publication number
- WO1997004409A1 (PCT/JP1996/001954)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- character
- character string
- similar
- image data
- search
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5846—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
Definitions
- The present invention relates to a file search device that captures information described in documents and the like into a computer as computer data, and organizes and utilizes that information using the functions of the computer.
- A conventional file search system scans documents with an image scanner, treats the image data obtained by one scan as one item of image data, prompts the user to enter a search keyword for each item of image data or for each group of a predetermined number of items, and stores the image data together with the search keyword.
- In this conventional file search device, when searching for the image of a given document, the user inputs a search keyword, and the search means provided in the file search device finds the desired image data by searching for image data whose search keyword is identical or partially identical to the input keyword.
- Searching for partially identical search keywords means, for example, that image data stored with the search keyword "@ Inc." can also be found with the keyword "@@".
- A file search device incorporating a character recognition device has also been proposed for the purpose of reducing the amount of data to be stored.
- When capturing the information described in documents and the like, the file search device incorporating this character recognition device automatically discriminates the text portions of the document from photographs, pictures, and the like, and applies the character recognition device to the text portions. Photographs, pictures, and the like are stored as image data and text is stored as character data, so that the information can be stored with a small storage capacity. In this file search device as well, a search keyword for retrieving the image data and character data is input when they are stored.
- However, the conventional file search device could not respond to a request to search using an unanticipated keyword. For this reason, it was difficult to make flexible use of the information already stored.
- In addition, the conventional file search device that converts part of the information into character data and stores it, as described above, required the user to confirm or correct the result of the conversion by the character recognition device when capturing information. Confirming and correcting the character recognition made storing information time-consuming and laborious. Also, if information was stored with incorrect character recognition, there was a risk that the original information would be lost.
- Accordingly, an object of the present invention is to provide a file search device that can easily store information described in documents and the like, and that is easy and reliable to search.
- A file search device according to the present invention comprises:
- image input means for inputting characters, symbols, and graphics as image data;
- storage means for storing the image data read by the image input means as image data;
- range designating means for designating a predetermined portion of the image data displayed by the display means as a range; and
- character recognition means for converting an array of pixels within the range specified by the range designating means into character data.
- The similar character string generation section assigns ranks to character strings similar to a predetermined character string according to their similarity probability.
- The search means gives the character string input by the user the first rank, gives the similar character strings generated by the similar character string generation section lower ranks, and searches for the character strings in order of rank.
- The similar character string generation section refers to a correspondence file of predetermined characters and characters similar to them, and generates similar character strings.
- The similar character string generation section searches for a character having a shape similar to the input character and generates similar character strings according to a shape similarity rule.
- The similar character string generation section searches for a character similar to the input character and generates similar character strings according to a character deformation rule based on printing and reading.
- Also, in the file search device of the present invention, the image in the range specified by the range specifying means is converted into a character string by the character recognition means, the converted character string is set as the character string to search for, and a search means is provided that searches for this character string in the character data obtained by range specification and conversion by the character recognition means.
- The range specifying means specifies the same range in all of the image data when a range is specified in one item of image data.
- FIG. 1 is a block diagram showing an example of the configuration of a file search device according to the present invention and the flow of its processing.
- FIG. 2 is a diagram showing an example of a screen by the display means of the file search device of the present invention.
- FIG. 3 is an explanatory diagram showing a search process by the search means of the file search device of the present invention.
- FIG. 1 shows a configuration of a file search device according to an embodiment of the present invention and a flow of processing thereof.
- The file search device of this embodiment includes an image input means 1, a storage means 2, a display means 3, a range specifying means 4, a character recognizing means 5, a search means 6, and an editing means 7.
- The image input means 1 may take various forms. Anything that can input the information described in documents and the like as image data will do, such as the image scanner 1a or a cable 1b connected to another computer or a network; it may also be a multifunction machine (not shown) combining a fax and a copier.
- The display means 3 of this embodiment is separate from the display device 8, such as a monitor.
- The display means 3 is described below as a control means that transmits image data to the display device 8 and performs display control.
- However, the display means may include a display device.
- The range specifying means 4, the search means 6, and the editing means 7 are independent of the input device 9, such as a keyboard and a mouse, and are described below as control means for performing range specification, search, and editing.
- However, each may include input means such as a keyboard.
- The processing flow of the file search device having the above configuration is described below.
- First, all information described in documents and the like is captured as image data.
- A document or the like is placed on the scanning surface of the image scanner 1a or the like, and all the characters, figures, photographs, and so on written on it are read optically by the image scanner 1a and stored in the image data file 10 of the storage means 2 as image data (data recording the pixel array).
- The image data obtained by one scan of the image scanner 1a is stored as one item of image data.
- Information that has already been converted into image data may be input to the image data file 10 via the cable 1b.
- The display means 3 extracts the image data from the image data file 10 and displays it on the display device 8.
- One item of image data is displayed as one page, and the pages are displayed in a file format with headings according to a predetermined classification.
- A desired portion of the image data can be opened quickly by clicking a heading with a mouse or the like.
- The display means 3 provides functions such as "high-speed page turning", "enlargement / reduction / rotation", "browsing", "marking", and "comment".
- Next, the portion of the image data to be searched is specified by the range specifying means 4.
- The user specifies a search range frame 11, as shown in FIG. 2, on the image data using an input means such as a mouse while watching the display device 8. This is because, in a standard document such as a form, the title for example is written at a predetermined position; if the title contains the keyword to be searched for, enclosing only that part with the search range frame 11 allows an efficient search with a small amount of searching.
- The range specifying means 4 can also enclose the entire image data with the search range frame 11 so that all of the image data is searched.
- The range specifying means 4 can specify the same range in all of the image data when a range is specified in one item of image data.
- For example, the title portions of all forms can be made searchable by enclosing the title portion of one form with the search range frame 11. This function is particularly effective when searching an image data file 10 that stores only image data of standard documents.
- The specified search range is stored in the range specification file 12 of the storage means 2.
- The array of pixels in the portion specified by the range specifying means 4 is converted into character data by the character recognizing means 5.
- The character recognizing means 5 extracts the image data from the image data file 10 with reference to the range specification file 12, and converts the array of pixels within the specified search range into character data while referring to the dictionary file 13.
- The converted character data is stored in the character data file 14. This converted character data forms the set of character strings to be searched.
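- The flow from the image data file 10 through the range specification file 12 to the character data file 14 can be sketched roughly as follows. This is only an illustrative sketch, not the device's actual implementation: the names `build_character_data` and `toy_recognize`, the `(top, left, bottom, right)` range layout, and the use of dictionaries as "files" are all assumptions made for the example, and the toy recognizer merely stands in for a real character recognizer with a dictionary file.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]            # pixel array: rows of pixel values
Range = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def build_character_data(
    image_file: Dict[str, Image],
    search_range: Range,
    recognize: Callable[[Image], str],
) -> Dict[str, str]:
    """Apply one stored range to every image and recognize only that region."""
    top, left, bottom, right = search_range
    character_data: Dict[str, str] = {}
    for image_id, image in image_file.items():
        # Cut out the pixels inside the specified range.
        region = [row[left:right] for row in image[top:bottom]]
        # The recognition result is stored as-is, without checking or correction.
        character_data[image_id] = recognize(region)
    return character_data


def toy_recognize(region: Image) -> str:
    """Hypothetical stand-in recognizer: treats each pixel value as a code point."""
    return "".join(chr(pixel) for row in region for pixel in row)


# Two tiny "scanned pages"; only the first two pixels of the first row are in range.
image_file = {
    "n1": [[72, 105, 0], [0, 0, 0]],
    "n2": [[79, 75, 0], [0, 0, 0]],
}
character_data = build_character_data(image_file, (0, 0, 1, 2), toy_recognize)
print(character_data)
```

- Because the range is specified once and reused for every image, only the title-like region of each page is recognized, which keeps the set of character strings to be searched small.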
- The search means 6 prompts the user to input a character string to search for. From the set of character strings in the character data file 14 described above, it searches for the input character string and for character strings similar to it.
- FIG. 3 shows the flow of the search by the search means 6.
- The search means 6 of the present embodiment is characterized in that it searches not only for the input character string but also for character strings similar to it. This is described below with reference to a specific example.
- The search means 6 of the present device has a similar character string generation section 15 that generates character strings similar to the input character string. For example, if the character "middle" is input, the similar character string generation section 15 selects characters similar to it, such as "cow", "noon", and "instep", and generates similar character strings.
- In the first similar character selection method, a correspondence file of predetermined characters and their similar characters is prepared in advance, and similar characters are selected by referring to this correspondence file. For example, for "middle", the characters "noon", "cow", "ka", and so on are stored in the correspondence file in advance as characters liable to be misrecognized, and when the character "middle" is input, "noon", "cow", and so on are selected.
- In the second similar character selection method, a character having a shape similar to the input character is selected using a character shape rule that judges characters based on their outline, line density, and the like. For example, when the character "middle" is input, "noon", "cow", "instep", and so on, whose shapes are similar to it, are selected according to the similarity rule. If the character reading rules can be shared with the character recognition means 5, they are shared.
- In the third similar character selection method, characters similar to the input character are selected according to a character deformation rule prepared from a large number of examples of characters deformed by printing and reading. For example, the numeral "1" is sometimes misrecognized as the English letter "i" or "l" (el), or as the symbol "(J, etc. In this case, "i", "l" (el), and "CJ are selected as similar characters.
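- As a rough illustration of how similar character strings might be generated from a correspondence table, the following sketch substitutes each character with its listed look-alikes and enumerates the combinations. It is an assumed approach, not the method of the similar character string generation section 15 itself: the names `CORRESPONDENCE` and `similar_strings` are hypothetical, the "1" entry follows the example in the text, and the "0" entry is an invented extra added only for illustration.

```python
from itertools import product
from typing import Dict, List

# Correspondence table: character -> characters it may be misread as.
# "1" -> "i", "l" follows the example in the text; "0" -> "O" is a
# hypothetical entry added only for this illustration.
CORRESPONDENCE: Dict[str, List[str]] = {
    "1": ["i", "l"],
    "0": ["O"],
}


def similar_strings(query: str, table: Dict[str, List[str]] = CORRESPONDENCE) -> List[str]:
    """Return every string obtained by swapping characters for similar ones."""
    # For each position, the original character comes first, then its look-alikes.
    options = [[ch] + table.get(ch, []) for ch in query]
    candidates = ["".join(chars) for chars in product(*options)]
    # The unmodified query is not a "similar" string, so drop it.
    return [candidate for candidate in candidates if candidate != query]


variants = similar_strings("10")
print(variants)
```

- Note that the number of candidates grows multiplicatively with the length of the input, so a practical system would presumably limit the substitutions it tries.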
- For example, when "interim settlement" is input, the search means 6 of the present apparatus treats not only "interim settlement" itself but also character strings similar to it as character strings to search for.
- These character strings are matched one by one against the character strings in the character data file 14, and identical character strings are found.
- The search means 6 preferably holds, as a probability value, the likelihood that a given character is misrecognized, and ranks the similar character strings accordingly.
- A character string that matches the input character string as it is is given the highest priority, and the search then proceeds through the similar character strings in descending order of their likelihood of misrecognition; the results are also displayed in descending order of that likelihood. As shown in FIG. 3, the search results are displayed as n1, n2, n3, ... and the corresponding character string is highlighted.
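- The ranking described above can be sketched as follows, under stated assumptions. The probability values in `MISREAD_PROB` are invented for illustration, the score of a candidate is taken to be the product of per-character probabilities (one possible choice, not one given in the text), and the function names are hypothetical. The exact input string is always ranked first, as the description requires.

```python
from itertools import product
from math import prod
from typing import Dict, List, Tuple

# Hypothetical probabilities that a character is misread as another one.
MISREAD_PROB: Dict[str, Dict[str, float]] = {
    "1": {"l": 0.2, "i": 0.1},
}


def ranked_candidates(query: str) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs with the exact query first."""
    per_char = []
    for ch in query:
        alternatives = MISREAD_PROB.get(ch, {})
        keep = 1.0 - sum(alternatives.values())  # chance the character is read correctly
        per_char.append([(ch, keep)] + list(alternatives.items()))
    scored = [
        ("".join(c for c, _ in combo), prod(p for _, p in combo))
        for combo in product(*per_char)
    ]
    # Exact match gets top priority; the rest follow by descending score.
    scored.sort(key=lambda item: (item[0] != query, -item[1]))
    return scored


def search(query: str, character_data: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return (rank, image id, matched string) in rank order."""
    hits = []
    for rank, (candidate, _score) in enumerate(ranked_candidates(query), start=1):
        for image_id, text in character_data.items():
            if text == candidate:
                hits.append((rank, image_id, candidate))
    return hits


character_data = {"n1": "item l", "n2": "item 1", "n3": "item i", "n4": "other"}
results = search("item 1", character_data)
print(results)
```

- Here the image whose recognized text matches the input exactly is listed first, followed by the images whose text matches the more likely and then the less likely misreading.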
- In this way, character recognition is performed only on the portion of the image data that contains the search keyword, and image data containing a character string corresponding to the character string input for the search is detected without regard to whether the result of the character recognition is correct.
- The image data containing the corresponding character string is thus detected.
- In the above description, the search means 6 searches for the entire input character string and for character strings similar to it. However, the present invention is not limited to this, and the search means 6 may search for a part of the character string input for the search or of a similar character string.
- With the present file search device, it is also possible to perform a search by a method different from the above-described method in which the user inputs the character string to search for.
- This search method focuses on a given character string in given image data and searches for image data that contains a character string identical to it.
- This different search method is described below.
- Up to the formation of the set of character strings 14 to be searched, this search is exactly the same as the above-described search by inputting a character string.
- The character string to search for is obtained by specifying a range with the range specifying means 4 and converting it into a character string with the character recognizing means 5.
- In other words, the character string recognized by the character recognizing means 5 is used as it is as the character string to search for, even if it has been recognized incorrectly.
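- A minimal sketch of this second search method follows, assuming the recognized strings are held in a dictionary keyed by image id. The function name and the sample data are hypothetical, and excluding the source image from its own results is a choice made for the example rather than something the text specifies.

```python
from typing import Dict, List


def find_images_with_same_string(source_id: str, character_data: Dict[str, str]) -> List[str]:
    """Find other images whose recognized string equals that of the source image."""
    # The recognized string is used as-is, even if recognition got it wrong.
    target = character_data[source_id]
    return [
        image_id
        for image_id, text in character_data.items()
        if text == target and image_id != source_id
    ]


# "lnvoice" is a deliberately misrecognized string in this invented data.
character_data = {"n1": "lnvoice", "n2": "Receipt", "n3": "lnvoice"}
matches = find_images_with_same_string("n1", character_data)
print(matches)
```

- In this invented example the misrecognized string still finds its counterpart, because both images were recognized the same way; whether that holds in practice depends on how consistently the recognizer errs.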
- With the present file search device, a given pixel array in the image data can also be converted into character data by the character recognizing means 5 and used for editing, for example by copying it into a word-processor document.
- The editing means 7 of the present apparatus has a range specified by the range specifying means 4 while the image data displayed by the display means 3 is referred to, and has the character recognizing means 5 convert that range into character data.
- This character data is stored in the editing data file 17 and can be used for editing text, for example in a word processor.
- A given range of the image data can also be cut out by the range specifying means 4 and stored as it is, as image data, in the editing data file 17 so that it can be incorporated into a document, for example in a word processor. This makes it possible to make use of the information stored as image data and to create new information from existing information as needed.
- As described above, the file search device of the present invention can store various document information as it is, in the form of image data, by means of the image input means, without needing to perform character recognition at that point. The information in documents can therefore be stored first, and stored quickly.
- A range to be searched is specified by the range specifying means, and character recognition is performed by the character recognizing means on the array of pixels within that range.
- The result of character recognition becomes the character strings to be searched, without its correctness being checked or corrected.
- The character string input for the search and character strings similar to it are set by the search means as the character strings to search for, and character strings corresponding to these are detected from the character strings to be searched.
- A given portion of the image data can be cut out as necessary and not only used as image data but also converted into character data by the character recognition means and edited by the editing means, so the stored information can be utilized easily.
- Industrial applicability
- The file search device of the present invention can be applied as a database device for image data.
Landscapes
- Engineering & Computer Science (AREA)
- Library & Information Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Processing Or Creating Images (AREA)
- Character Discrimination (AREA)
Abstract
The invention relates to a file search device. The device is provided with image input means (1) for inputting characters, symbols, and graphics as image data. It also includes storage means (2) that stores the image data input with the image input means (1), display means (3) that displays the image data in the form of a file in which image data is shown on a page, and range designating means (4) for designating a specified part of the image data displayed on the display means (3) as a range. It further includes character recognition means (5) that converts the arrangement of picture elements in the range specified by the range designating means (4) into character data, and editing means (7) that edits the character data converted by the character recognition means (5) as character data.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP7/181850 | 1995-07-18 | ||
JP7181850A JPH0934903A (ja) | 1995-07-18 | 1995-07-18 | ファイル検索装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997004409A1 true WO1997004409A1 (fr) | 1997-02-06 |
Family
ID=16107922
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP1996/001954 WO1997004409A1 (fr) | 1995-07-18 | 1996-07-12 | File search device |
Country Status (3)
Country | Link |
---|---|
JP (1) | JPH0934903A (fr) |
CN (1) | CN1165571A (fr) |
WO (1) | WO1997004409A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
USRE42413E1 (en) | 2000-04-27 | 2011-05-31 | Bayard Chimney Rock Llc | Web search engine with graphic snapshots |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11191112A (ja) * | 1997-12-25 | 1999-07-13 | Ebara Corp | テンプレートによる文字抽出方法 |
JP2001034627A (ja) * | 1999-07-19 | 2001-02-09 | Hitachi Ltd | レセプト点検方法およびシステム並びにレセプト点検プログラムを格納した記憶媒体 |
JP3669626B2 (ja) * | 2000-06-06 | 2005-07-13 | 松下電器産業株式会社 | 検索装置、記録媒体およびプログラム |
US6944344B2 (en) | 2000-06-06 | 2005-09-13 | Matsushita Electric Industrial Co., Ltd. | Document search and retrieval apparatus, recording medium and program |
CN100370459C (zh) * | 2005-12-08 | 2008-02-20 | 华为技术有限公司 | 一种减少分页数据检索时间的方法及装置 |
KR20150006740A (ko) * | 2013-07-09 | 2015-01-19 | 류중하 | 문자에 대한 기호 이미지 구성 방법, 및 기호 이미지에 대한 대응되는 문자의 분석 방법 |
JP2014026660A (ja) * | 2013-09-12 | 2014-02-06 | Toppan Printing Co Ltd | データ生成装置およびデータ生成方法 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS63280374A (ja) * | 1987-05-13 | 1988-11-17 | Hitachi Ltd | 情報の検索・表示方法 |
JPH0512345A (ja) * | 1991-06-28 | 1993-01-22 | Toshiba Corp | 画像記憶装置 |
JPH06162098A (ja) * | 1992-11-24 | 1994-06-10 | Fujitsu Ltd | 類義語生成処理方法 |
JPH07121547A (ja) * | 1993-10-21 | 1995-05-12 | Matsushita Electric Ind Co Ltd | 情報検索装置 |
JPH07152774A (ja) * | 1993-11-30 | 1995-06-16 | Hitachi Ltd | 文書検索方法および装置 |
-
1995
- 1995-07-18 JP JP7181850A patent/JPH0934903A/ja active Pending
-
1996
- 1996-07-12 WO PCT/JP1996/001954 patent/WO1997004409A1/fr active Application Filing
- 1996-07-12 CN CN96190752A patent/CN1165571A/zh active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
USRE42413E1 (en) | 2000-04-27 | 2011-05-31 | Bayard Chimney Rock Llc | Web search engine with graphic snapshots |
USRE46967E1 (en) | 2000-04-27 | 2018-07-24 | Mineral Lassen Llc | System, apparatus, method, and computer program product for indexing a file |
Also Published As
Publication number | Publication date |
---|---|
JPH0934903A (ja) | 1997-02-07 |
CN1165571A (zh) | 1997-11-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3298676B2 (ja) | 知覚可能情報セグメントのアクセス方法 | |
US6353840B2 (en) | User-defined search template for extracting information from documents | |
JP4118349B2 (ja) | 文書選択等の方法及び文書サーバ | |
US5781914A (en) | Converting documents, with links to other electronic information, between hardcopy and electronic formats | |
US5963966A (en) | Automated capture of technical documents for electronic review and distribution | |
US7081975B2 (en) | Information input device | |
JP4260790B2 (ja) | ファイリング・検索装置およびファイリング・検索方法 | |
US20060206305A1 (en) | Translation system, translation method, and program | |
CN101178725A (zh) | 用于信息检索的设备、方法和计算机程序产品 | |
JP2007299422A (ja) | 情報処理装置および文書の探索方法 | |
US9881001B2 (en) | Image processing device, image processing method and non-transitory computer readable recording medium | |
Baird | Difficult and urgent open problems in document image analysis for libraries | |
US20060217958A1 (en) | Electronic device and recording medium | |
US9672438B2 (en) | Text parsing in complex graphical images | |
WO1997004409A1 (fr) | File search device | |
JP2023007268A (ja) | 特許用文章生成装置、特許用文章生成方法、および特許用文章生成プログラム | |
JP7651962B2 (ja) | 情報処理装置、情報処理システム、情報処理方法、及びプログラム | |
JP2024003769A (ja) | 文字認識システム、コンピュータによる文字の認識方法、および文字検索システム | |
JPH08180068A (ja) | 電子ファイリング装置 | |
JP3979288B2 (ja) | 文書検索装置および文書検索プログラム | |
JPH1021043A (ja) | アイコン生成方法、ドキュメント検索方法及びドキュメント・サーバー | |
Alzuru et al. | Quality-Aware Human-Machine Text Extraction for Biocollections using Ensembles of OCRs | |
US20050256868A1 (en) | Document search system | |
JP2007011683A (ja) | 文書管理支援装置 | |
JP2001022773A (ja) | イメージ文書のキーワード抽出方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| WWE | Wipo information: entry into national phase | Ref document number: 96190752.5; Country of ref document: CN |
| AK | Designated states | Kind code of ref document: A1; Designated state(s): CN GB SG |