WO2009063329A1 - A color-based computerized method for automatic document indexing - Google Patents
- Publication number: WO2009063329A1
- Application number: PCT/IB2008/003857
- Authority: WO, WIPO (PCT)
- Prior art keywords: document, tag, label, text, applying
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/1444—Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
- G06V30/1448—Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields based on markings or identifiers characterising the document or the area
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/18105—Extraction of features or characteristics of the image related to colour
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Document Processing Apparatus (AREA)
- Processing Or Creating Images (AREA)
Abstract
A color-based computerized method for automatic electronic document indexing is disclosed. The method includes associating a color with at least one indexing criterion to create a tag, applying the color to text on a document, identifying the colored portions of the document, detecting text on the colored portions, comparing the detected text to a label associated with the tag, if the detected text matches a label associated with the tag, applying the tag and the label to the document, and filing the tagged and labeled document in a database or a specified folder. If the detected text does not match a label associated with the tag, optional procedures are applied.
Description
A COLOR-BASED COMPUTERIZED METHOD FOR AUTOMATIC DOCUMENT INDEXING
Related applications
[0001] This application claims the benefit of priority to Provisional Application Serial No. 60/996,452, which was filed on November 16, 2007, and which is incorporated by reference in its entirety.
Field
[0002] The application relates to a color-based method for automatic electronic document indexing.
Background
[0003] Data is transmitted via two main routes: hard copy documents (books, mail, fax, newspapers, etc.) and electronic documents (internet, CD, email). Data indexing, categorizing, tagging, and filing are burdensome and time-consuming tasks.
[0004] Nowadays, it is more common to scan documents and file them on a computer. Although scanning of hard copy documents allows for a later stage of manual electronic indexing and tagging, there is a need for a quick, simple and versatile automatic electronic method for data indexing and tagging for both electronic and scanned documents.
Summary
[0005] The present application provides a method to categorize and tag documents and data using a combination of color and text. The method includes associating one or more colors with one or more indexing criteria to create a tag (e.g. 'Customer name' or 'date of document' etc), applying the one or more colors to text on a document, identifying the colored portions of the document, detecting text on the colored portions, comparing the detected text to a label (e.g. a specific name of customer) associated with the tag, if the detected text matches a label associated with the tag, applying the tag and the label to the document, and filing the tagged and labeled document in a database or a specified folder.
[0006] A computer readable medium is also described. The medium has instructions stored thereon executable by a process for carrying out the method including associating one or more colors with one or more indexing criteria to create a tag, applying the one or more colors to text on a document, identifying the colored portions of the document, detecting text on the colored portions, comparing the detected text to a label associated with the tag, if the detected text matches a label associated with the tag, applying the tag and the label to the document, and filing the tagged and labeled document in a database or a specified folder.
Brief Description of the Drawings
[0007] Exemplary embodiments of the invention are described herein with reference to the drawings, in which:
Fig. 1 shows a flow chart of the method of the present application; and
Fig. 2 shows another flow chart of the method of the present application.
Detailed Description
[0008] A software-based method for computerized indexing is provided for the automatic indexing of electronic documents. The indexing method includes detecting colors and text in a document, tagging the document based on the colors, and filing the document into databases based on the detected text associated with the tags applied.
[0009] Text/color combinations may be obtained in different ways, including, but not limited to: the text's background being color marked/highlighted manually, the text letters being specific colors or combination of colors, the text being defined by colored circles, lines or patterns, and possibly including a combination of colors (e.g. a blue line between green lines), or any combination of the above.
[00010] The color may be applied by marking text on a document, either by using a colored marker or other dedicated tool on the document, or on an electronic document by using the highlighting feature or other dedicated marking features.
[00011] In one embodiment, only the color may be used to define specific texts for tagging. In such a case, the color does not have a meaning other than for creating general tags.
[00012] In another embodiment, the colors may have a meaning according to a predetermined code. For example, the color red may represent the 'customer name' title in a database. Any text having a red background or other red marking will be categorized under 'customer name' in the tag database. The color yellow in the text background, or other yellow marking, may represent the document 'date', etc.
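The predetermined color code described above can be sketched as a small lookup. This is only an illustration, not the application's implementation: the hue centers, the saturation and brightness thresholds, and the names `COLOR_CODE`, `HUE_CENTERS`, `classify_color` and `criterion_for_pixel` are all assumptions introduced here.

```python
import colorsys

# Hypothetical color code: each coded color stands for one indexing criterion.
COLOR_CODE = {
    "red": "customer name",
    "yellow": "date",
    "blue": "case number",
}

# Assumed hue centers in degrees for the coded colors.
HUE_CENTERS = {"red": 0.0, "yellow": 60.0, "blue": 240.0}


def classify_color(rgb, min_saturation=0.4, min_value=0.3, max_hue_distance=25.0):
    """Return the coded color nearest to an (R, G, B) pixel, or None."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if s < min_saturation or v < min_value:
        return None  # too grey or too dark to count as a color marking
    hue = h * 360.0
    best, best_distance = None, max_hue_distance
    for name, center in HUE_CENTERS.items():
        distance = abs(hue - center)
        distance = min(distance, 360.0 - distance)  # hue wraps around at 360
        if distance <= best_distance:
            best, best_distance = name, distance
    return best


def criterion_for_pixel(rgb):
    """Map a pixel to its indexing criterion via the color code, or None."""
    name = classify_color(rgb)
    return COLOR_CODE.get(name) if name else None
```

Under these assumed thresholds, a pure red pixel maps to 'customer name', while white paper and black print map to no criterion at all.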
[00013] Referring to Figure 1, a user creates a color/indexing criterion category or relationship. For example, when the criterion is a customer name, the user can mark the customer name in the text of the document with the associated color, such as red. This color/indexing criterion pair is then created as a tag and saved either in the document, under the file properties, or in a separate database. Additional color/indexing criterion tags can also be created and saved, such as yellow being associated with the date of the document or blue being associated with a case number. Tags may also include labels associated with the tags. For example, labels associated with the customer name/red tag may include names of individual customers, such as Procter & Gamble, Johnson & Johnson, and Kimberly-Clark. Labels associated with the date of the document may be specific dates, such as 11/11/09.
[00014] For example, one tag may represent customer names plus the color red. The labels associated with this tag may include the specific names of customers (such as Procter & Gamble, Johnson & Johnson, and Kimberly-Clark). Another tag may represent the date plus the color yellow. Labels associated with this tag may include the specific dates (such as 11/11/09).
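The tag and label relationship described in paragraphs [00013] and [00014] could be modeled roughly as follows. This is a minimal sketch: the `Tag` and `TagStore` names and the in-memory dictionary standing in for the "separate database" are assumptions, not part of the application.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """A color/indexing criterion pair plus the labels known for it."""
    color: str
    criterion: str
    labels: set = field(default_factory=set)


class TagStore:
    """Hypothetical in-memory stand-in for the separate tag database."""

    def __init__(self):
        self._by_color = {}

    def create_tag(self, color, criterion, labels=()):
        # One tag per color: creating a tag for an existing color replaces it.
        tag = Tag(color, criterion, set(labels))
        self._by_color[color] = tag
        return tag

    def tag_for_color(self, color):
        """Return the tag associated with a color, or None if none exists."""
        return self._by_color.get(color)


store = TagStore()
store.create_tag("red", "customer name", ["Johnson & Johnson"])
store.create_tag("yellow", "date", ["11/11/09"])
```

Keying the store by color reflects the direction of lookup in the method: a colored region is found first, and its color then selects the tag whose labels are compared against the detected text.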
[00015] Referring to Figure 2, when a document is received, either by email or in paper form and scanned, it is marked according to the color/indexing criterion tag. For example, each mention of the customer name on the document may be highlighted in red. The color marking may be performed with a marker or highlighter or other dedicated tool, on a paper document, or with the highlighting tool or other dedicated feature on an electronic document.
[00016] The colored portions of the document are then identified, and a method is used to detect the text in the colored portions. For example, Optical Character Recognition (OCR) may be performed on the colored portions to detect text. Alternatively, any known method may be performed to detect text. When text is detected, that text is compared to a list of labels associated with the tag. If the detected text matches a label associated with that tag, then the tag and label are applied to the document, and the document is filed in a database or a specified folder. If the detected text is similar to a label associated with the tag but does not match a label exactly, such as in the case of a typo, two options may apply. In one option, the system applies document analysis methods such as text recognition, semantic analysis, logo recognition, graphic recognition or others, in order to compare the document in question with existing documents associated with a similar label and tag. If the comparison results in a strong match, the system automatically applies the similar label and associated tag to the document in question. In another option, a user is prompted to apply a similar existing label associated with the tag. If the user decides to use this label, the label is applied.
[00017] If the user does not want to apply the similar label, a new label is created and associated with the tag, and the new label is applied to the document. The document is then filed in the database or specified folder. In this example, the document would be filed within the customer name folder, under the specific customer name.
[00018] If the detected text does not match a label associated with the tag and is not similar, such as in the case where the name does not exist in the customers list, a new label is created and associated with the tag, and the new label is applied to the document. The document is then filed in the database or specified folder.
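The three outcomes of paragraphs [00016] to [00018] (exact match, similar label, new label) can be sketched as below. Several things here are assumptions rather than the application's method: similarity is approximated with the standard library's `difflib` string matching, the `cutoff` of 0.8 is arbitrary, and the `confirm` callback is a hypothetical stand-in for either the document-analysis comparison or the user prompt.

```python
import difflib


def resolve_label(detected, labels, cutoff=0.8, confirm=lambda detected, candidate: True):
    """Return (label, status) for detected text; status is 'exact', 'similar' or 'new'.

    `labels` is the mutable set of labels for one tag; a new label is added to it.
    """
    text = detected.strip()
    if text in labels:
        return text, "exact"
    # Assumed similarity test: closest known label by difflib ratio.
    close = difflib.get_close_matches(text, sorted(labels), n=1, cutoff=cutoff)
    if close and confirm(text, close[0]):
        return close[0], "similar"
    # No match, or the similar label was rejected: create a new label.
    labels.add(text)
    return text, "new"


customer_labels = {"Johnson & Johnson", "Kimberly-Clark"}
exact = resolve_label("Johnson & Johnson", customer_labels)
typo = resolve_label("Jonson & Johnson", customer_labels)
unknown = resolve_label("Acme Corp", customer_labels)  # "Acme Corp" is a made-up name
```

Passing a `confirm` function that returns `False` models the user declining the suggested label, in which case the detected text is stored as a new label.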
[00019] In yet another embodiment, the color code may reflect hierarchies specified by the user. Such hierarchies determine the importance of the specific text and are created by marking more than one text in different colors in a single document. For instance, the customer name is associated with the color red, date is associated with the color yellow, and case number is associated with the color blue. Each color is given a predetermined location within the hierarchy, and the document can be filed as follows: customer/case number/date. Alternatively, the document can be filed as: date/customer/case number, etc. The filing of a document in accordance with more than one criterion may be executed simultaneously according to different filing colors.
[00020] While certain features and embodiments of the present application have been described in detail herein, it is to be understood that the application encompasses all modifications and enhancements.
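The hierarchical filing of paragraph [00019] can be illustrated by building a folder path from the labels in the user's chosen order. This is a sketch under assumptions: the `filing_path` function, the `archive` root folder, the case number `A-42`, and the replacement of `/` inside labels (needed for dates such as 11/11/09) are all hypothetical.

```python
from pathlib import PurePosixPath


def filing_path(root, hierarchy, labels_by_criterion):
    """Build a folder path by ordering the document's labels per the hierarchy."""
    parts = []
    for criterion in hierarchy:
        if criterion not in labels_by_criterion:
            raise KeyError(f"document has no label for criterion {criterion!r}")
        # A '/' inside a label would split the folder name, so replace it.
        parts.append(labels_by_criterion[criterion].replace("/", "-"))
    return PurePosixPath(root, *parts)


document_labels = {
    "customer name": "Johnson & Johnson",
    "date": "11/11/09",
    "case number": "A-42",  # made-up case number
}
by_customer = filing_path("archive", ["customer name", "case number", "date"], document_labels)
by_date = filing_path("archive", ["date", "customer name", "case number"], document_labels)
```

The same set of labels yields a different folder for each hierarchy, which corresponds to the alternative filings customer/case number/date and date/customer/case number.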
Claims
1. A method for color-based electronic indexing comprising: associating one or more colors with one or more indexing criteria to create tags; applying the one or more colors to text on a document; identifying the colored portions of the document; detecting text on the colored portions; comparing the detected text to a label associated with the tag; if the detected text matches a label associated with the tag, applying the tag and the label to the document; and filing the tagged and labeled document in a database or a specified folder.
2. The method of claim 1 further comprising: if the detected text is similar to a label associated with the tag but does not match the label exactly, applying document analysis methods to compare the document with existing documents associated with a similar label and tag; if the comparison results in a match:
(i) automatically applying the similar label and associated tag to the document; or
(ii) prompting a user to use the similar label or to create a new label associated with the tag.
3. The method of claim 2 further comprising: if the user chooses to apply the similar label, applying the tag and the similar label to the document; and if the user chooses not to apply the similar label, creating a new label associated with the tag and applying the tag and new label to the document.
4. The method of any one of claims 1 to 3 further comprising: if the detected text does not match a label associated with the tag and is not similar, creating a new label associated with the tag and applying the new label to the document.
5. The method of claims 2, 3 or 4 further comprising saving the new label in a separate database.
6. The method of any one of claims 1 to 5 wherein the document is scanned into a computer.
7. The method of any one of claims 1 to 6 wherein the colored portions are created by at least one of the following: using a highlighting or marking tool; using text letters of specific colors or combination of colors; highlighting text by colored shapes, lines or patterns; or a combination of one or more of the above.
8. The method of any one of claims 1 to 7 further comprising: associating additional separate colors with two or more indexing criteria to create additional tags; assigning a priority to each color to create a hierarchy for indexing; filing the document according to the hierarchy in a database or a specified folder.
9. The method of any one of claims 1 to 8 wherein detecting text on the colored portions includes performing optical character recognition (OCR) on the colored portions.
10. A computer readable medium having instructions stored thereon executable by a process for carrying out the method comprising: associating one or more colors with one or more indexing criteria to create tags; applying the one or more colors to text on a document; identifying the colored portions of the document; detecting text on the colored portions; comparing the detected text to a label associated with the tag; if the detected text matches a label associated with the tag, applying the tag and the label to the document; and filing the tagged and labeled document in a database or a specified folder.
11. The computer readable medium of claim 10 further comprising: if the detected text is similar to a label associated with the tag but does not match the label exactly, applying document analysis methods to compare the document with existing documents associated with a similar label and tag; if the comparison results in a match:
(i) automatically applying the similar label and associated tag to the document; or
(ii) prompting a user to use the similar label or to create a new label associated with the tag.
12. The computer readable medium of claim 11 further comprising: if the user chooses to apply the similar label, applying the tag and the similar label to the document; and if the user chooses not to apply the similar label, creating a new label associated with the tag and applying the tag and new label to the document.
13. The computer readable medium of any one of claims 10 to 12 further comprising: if the detected text does not match a label associated with the tag and is not similar, creating a new label associated with the tag and applying the new label to the document.
14. The computer readable medium of claims 11, 12, or 13 further comprising saving the new label in a separate database.
15. The computer readable medium of any one of claims 10 to 14 wherein the document is scanned into a computer.
16. The computer readable medium of any one of claims 10 to 15 wherein the colored portions are created by at least one of the following: using a highlighting or marking tool; using text letters of specific colors or combination of colors; highlighting text by colored shapes, lines or patterns; or a combination of one or more of the above.
17. The computer readable medium of any one of claims 10 to 16 further comprising: associating additional separate colors with two or more indexing criteria to create additional tags; assigning a priority to each color to create a hierarchy for indexing; filing the document according to the hierarchy in a database or a specified folder.
18. The computer readable medium of any one of claims 10 to 17 wherein detecting text on the colored portions includes performing optical character recognition (OCR) on the colored portions.
19. A method for color-based electronic indexing comprising: utilizing color marking to define text on a document; reading the defined text; and indexing the document based on the defined text.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US99645207P | 2007-11-16 | 2007-11-16 | |
US60/996,452 | 2007-11-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009063329A1 (en) | 2009-05-22 |
Family
ID=40551357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2008/003857 WO2009063329A1 (en) | 2007-11-16 | 2008-11-17 | A color-based computerized method for automatic document indexing |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2009063329A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0251237A2 (en) * | 1986-06-30 | 1988-01-07 | Wang Laboratories Inc. | Digital imaging file processing system |
US5579407A (en) * | 1992-04-21 | 1996-11-26 | Murez; James D. | Optical character classification |
- 2008-11-17: WO application PCT/IB2008/003857 (WO2009063329A1), Application Filing
Non-Patent Citations (1)
Title |
---|
HANDSCHUH S ET AL: "CREAM: CREAting Metadata for the Semantic Web", COMPUTER NETWORKS, ELSEVIER SCIENCE PUBLISHERS B.V., AMSTERDAM, NL, vol. 42, no. 5, 5 August 2003 (2003-08-05), pages 579 - 598, XP004433788, ISSN: 1389-1286 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3716147A1 (en) * | 2019-03-25 | 2020-09-30 | Toshiba TEC Kabushiki Kaisha | Image processing method and image processing apparatus |
CN111738901A (en) * | 2019-03-25 | 2020-10-02 | 东芝泰格有限公司 | Storage medium and image processing device |
US11328448B2 (en) | 2019-03-25 | 2022-05-10 | Toshiba Tec Kabushiki Kaisha | Image processing method and image processing apparatus |
CN111738901B (en) * | 2019-03-25 | 2025-03-11 | 东芝泰格有限公司 | Storage medium and image processing device |
WO2022189899A1 (en) * | 2021-03-12 | 2022-09-15 | Ricoh Company, Ltd. | Information processing system, processing method, and recording medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9141691B2 (en) | Method for automatically indexing documents | |
US5159180A (en) | Litigation support system and method | |
US5926565A (en) | Computer method for processing records with images and multiple fonts | |
Papadopoulos et al. | The IMPACT dataset of historical document images | |
CN100414549C (en) | Image search system, image search method, and storage medium | |
EP2354966A2 (en) | System and method for visual document comparison using localized two-dimensional visual fingerprints | |
US6023528A (en) | Non-edit multiple image font processing of records | |
US9736331B2 (en) | Device, system and method for identifying sections of documents | |
CN114117171A (en) | Intelligent project file collecting method and system based on energized thinking | |
US20040044958A1 (en) | Systems and methods for inserting a metadata tag in a document | |
US20110197120A1 (en) | Document Flagging And Indexing System | |
JP6504514B1 (en) | Document classification system and method and accounting system and method. | |
WO2007069058A2 (en) | Specification wizard | |
JP2018190064A (en) | Accounting processing system | |
WO2009063329A1 (en) | A color-based computerized method for automatic document indexing | |
JP2007241355A (en) | Image processor and image processing program | |
CN112445911A (en) | Workflow assistance apparatus, system, method, and storage medium | |
WO2019119030A1 (en) | Image analysis | |
McCarthy et al. | Early modern Oxford bindings in twenty‐first century markup | |
CN109766726B (en) | Document batch signature method for realizing accurate positioning based on Word | |
JP2005165978A (en) | Business form ocr program, method and device thereof | |
US8155449B2 (en) | Method for comparing computer-generated drawings | |
CN114328804A (en) | A method and system for retrieving key words containing text and pictures | |
CN1187684C (en) | Method for auto-extracting marked data content in electronic file | |
Carpallo Bautista et al. | Proposal for Adapting the Cataloging of Bindings to the MARC Format |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 08850331; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 08850331; Country of ref document: EP; Kind code of ref document: A1 |