WO1992001998A1 - Method and apparatus for automatic text separation using automatic electronic filtering of multiple drop-out colors for optical character recognition of preprinted forms - Google Patents
- Publication number
- WO1992001998A1 (PCT/US1991/005040)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- color
- pixel
- grey
- scale
- processing
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/46—Colour picture communication systems
- H04N1/56—Processing of colour picture signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/143—Sensing or illuminating at different wavelengths
Abstract
Method and apparatus for the separation of text from previously printed material based on the assumption that text used to fill out business forms can be categorized as 'carbon based' and that such text would pass as black regardless of any color filters employed. Text can be separated from any pre-printed color information by applying an 'all-color filter'.
Description
METHOD AND APPARATUS FOR AUTOMATIC TEXT SEPARATION USING AUTOMATIC ELECTRONIC FILTERING OF MULTIPLE DROP-OUT COLORS FOR OPTICAL CHARACTER RECOGNITION OF PREPRINTED FORMS
Technical Field of the Invention
The invention relates to the automatic selection and detection of a drop-out color using a color electronic scanner and more particularly, allows the Optical Character Recognition (OCR) system to adjust the filtering parameters automatically based on the form itself, rather than matching the form to the optical filter.
Background of the Invention
Optical Character Recognition (OCR) is a useful technique for processing business forms. Machine reading systems can replace several data-entry operators and reduce the expense of data capture.
In general, the first step of the OCR process is electronic scanning of the document and converting all of the information to a digital bit-map. Once the image is captured in an electronic format, the information to be read is separated from the background information: boxes and guide text must be ignored and the filled-out text should be read. Once this separation is accomplished, the electronic image of the text is processed by the OCR algorithm, where the characters of interest are converted to ASCII data.
Almost all OCR systems processing business forms employ the technique of a "drop-out color". By printing documents in a predetermined color (usually a pastel color) and employing an optical filter of the same color in the electronic scanner, the filled-out text on the document can be separated from the printed form. The color filter causes the scanner to ignore information printed in that color (to the electronic scanner, the form color appears as being equivalent to the white background of the paper). However, since the filled-out text typically is typed or printed in black (or other dark color), this information is captured by the scanner as black. Hence, the pre-printed form is converted to a white background and the filled-out text can be processed readily by an OCR algorithm.
Use of the optical filter works well in this application, but it limits the customer to a very specific color on the form (one that precisely matches the characteristics of the optical filter installed in the scanner). Additional drop-out colors can be included in the scanner by adding additional optical filters. Accordingly, the processing of a particular form would require selecting the proper optical filter and mechanically inserting it prior to processing the form.
However, slight variations in the printing process can produce variability in the actual color of the printed form, thereby reducing the "drop-out" effect. Such changes can cause noise to be added (the scanner sees the pre-printed form information as black instead of white), which may result in the OCR algorithm producing erroneous results. Changing optical filters to accommodate these slight variations in printing is not practical, since this would require a large inventory of filters, each with slightly different characteristics. Therefore, the only practical way to control this problem is to tightly control the printing process to ensure a uniform drop-out color. As a result, OCR form reading systems presently in use are generally "closed loop", which means the forms processing firm (such as an insurance carrier) must maintain control over the printing of the forms, because forms created by outside establishments may not read properly due to color variations.
With the present invention, the scanner would separate all images into the three primary colors: red, green and blue. A black and white rendition of the image can be produced simply by adding the three color components. By independently processing the red, green and blue signals, it is possible to segregate color information from the common black and white information, so that the apparatus filters all colors, leaving only the high contrast text for OCR reading.
Disclosure of the Invention
In the present invention, three digital channels are multiplied by appropriate coefficients to ensure uniform color and amplitude response among all pixels. Once the three signals have been corrected for uniformity, they are processed as independent video signals to create three binary representations of the image. By combining these signals in such a way as to preserve only the information common to all three channels, an "all color" filter is created which can separate "black" text from any color pre-printed information. In effect, the three outputs represent document images using all possible combinations of drop-out colors in color space.
Brief Description of the Drawings
Figure 1 illustrates the configuration of a solid state charge coupled device that can be used for color scanning;
Figure 2 illustrates a block diagram of the circuit used for electronic color filtering in accordance with the invention; and
Figures 3A-B illustrate a flow chart that is used in conjunction with white calibration.
Modes of Carrying Out the Invention
Figure 1 illustrates the type of electronic scanner used to generate a programmable drop-out color. This scanner would separate all images into the three primary colors: red, green, and blue. A black and white rendition of the image (as a typical electronic scanner would produce today) can be produced simply by adding the three color components. The electronic scanner intended for use in the present apparatus is based on a "contact type" CCD (Charge Coupled Device) 10 currently available as Model TCD126C, made by Toshiba. The CCD is actually several CCD arrays on a single substrate and has a horizontal resolution of 1200 Pixels/inch and spans 12 inches. Because most OCR algorithms can read accurately with scan resolutions of 200 to 400 Pixels/inch, the added resolution can be used for color detection. Such detection is accomplished by masking adjacent pixels with appropriate red, green and blue optical filters, with the spectral content of these filters being based on the spectral characteristics of the CCD device itself. As shown in Fig. 1, three adjacent cells 12, 14, and 16 form a single "super-pixel" 18, with cells 12, 14, and 16 being masked by red, green and blue optical filters 20, 22, and 24 respectively. If each cell (pixel) corresponds to 1/1200 inch, each three-cell super-pixel spans 1/400 inch and the effective resolution of the CCD device would be 400 Pixels/inch. The scanner produces a three channel output of red 26, green 28, and blue 30 video signals.
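The super-pixel arrangement can be illustrated with a short Python sketch. It is not part of the disclosure: the function names, the assumption that raw cells arrive interleaved in R, G, B order, and the 8-bit sample values are all hypothetical and chosen only to show the 1200 to 400 Pixels/inch arithmetic.

```python
# Illustrative sketch only; names and sample values are assumptions.
CELLS_PER_INCH = 1200       # native CCD resolution stated in the description
CELLS_PER_SUPER_PIXEL = 3   # one red, one green and one blue masked cell


def effective_resolution(cells_per_inch=CELLS_PER_INCH):
    """Super-pixels per inch when three adjacent cells form one super-pixel."""
    return cells_per_inch // CELLS_PER_SUPER_PIXEL


def split_super_pixels(cells):
    """De-interleave [R0, G0, B0, R1, G1, B1, ...] into (reds, greens, blues).

    Assumes the cells of each super-pixel arrive in R, G, B order.
    """
    if len(cells) % CELLS_PER_SUPER_PIXEL != 0:
        raise ValueError("scan line must contain whole super-pixels")
    return cells[0::3], cells[1::3], cells[2::3]


# Two super-pixels with made-up 8-bit cell readings.
raw_line = [250, 248, 251, 40, 200, 210]
reds, greens, blues = split_super_pixels(raw_line)
print(effective_resolution())
print(reds, greens, blues)
```

Each of the three returned lists then corresponds to one of the video channels 26, 28 and 30, at one value per super-pixel.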
Figure 2 illustrates a block diagram for use in automatic text separation for OCR reading as well as full image capture. The color scanner 10 outputs three video signals per pixel (Red 26, Green 28, and Blue 30) in a segmented fashion for each scan line. The R, G, B signals are converted to a grey-scale digital representation by respective A/D converters 32, 34 and 36. Each pixel's Red, Green, and Blue component is then fed to multipliers 38, 40 and 42 respectively.
The Microprocessor and RAM Storage Subsystem 52 monitors each pixel within a scan line to ensure proper correlation between pixel video data and calibration coefficients 46, 48, and 50, which are sent to the corresponding multipliers 38, 40, and 42 for their respective color channels. The output of these multipliers is in the form of a segmented bit stream of Calibrated Red 56, Green 57, and Blue 58 pixels which can be used as a grey-scale color image. This calibrated color information is also fed to summing junction 59 where the three color components for each pixel are added to form a grey-scale black and white image as its output. In addition, the calibrated Red 56, Green 57, and Blue 58 video data is fed back to Microprocessor and RAM Storage Subsystem 52 for diagnostic purposes. The calibrated Red 56, Green 57, and Blue 58 video data is also processed by respective Threshold Circuits 41, 43 and 45 which create 1 bit/pixel video data for Red, Green and Blue. Threshold Circuits 41, 43 and 45 may be in the form of a simple comparator or be as elaborate as an M x N convolution filter with adaptive thresholding. The output of each threshold circuit 41, 43 and 45 is binary, where a "1" corresponds to a "dark" pixel and a "0" corresponds to a "light" pixel.
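The signal path just described (multiply by a gain, sum for a grey-scale image, threshold each channel) might be sketched in Python as follows. This is a software illustration of what the description presents as hardware; the function names, the fixed threshold level of 128 and the sample pixel values are assumptions, and only the simple-comparator form of the threshold circuit is shown.

```python
# Illustrative sketch only; names, threshold level and values are assumptions.

def calibrate(values, gains):
    """Multiplier stage: scale each pixel's value by its stored gain."""
    return [v * g for v, g in zip(values, gains)]


def grey_scale(red, green, blue):
    """Summing junction: add the three calibrated components per pixel."""
    return [r + g + b for r, g, b in zip(red, green, blue)]


def threshold(values, level=128):
    """Simple comparator: 1 marks a "dark" pixel, 0 a "light" pixel."""
    return [1 if v < level else 0 for v in values]


# Three pixels on one scan line: blank paper, black typed text, red form ink.
unity = [1.0, 1.0, 1.0]                       # gains before any calibration
red = calibrate([250, 30, 230], unity)
green = calibrate([250, 30, 60], unity)
blue = calibrate([250, 30, 60], unity)

grey = grey_scale(red, green, blue)
red_bits, green_bits, blue_bits = threshold(red), threshold(green), threshold(blue)
print(grey)
print(red_bits, green_bits, blue_bits)
```

In this example the red ink is "light" in the red channel but "dark" in the green and blue channels, which is the per-channel disagreement the later combining stage relies on.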
All three of these binary signals (corresponding to Red, Green and Blue values for each pixel) are sent to an AND gate 63. If any of the three color components of a given pixel is "light" or "0", the output of the AND gate will be a "0", corresponding to "white". In the event all three color components of a pixel are "1" or "dark", the output of the AND gate will be a "1", corresponding to "black".
The output of AND gate 63 can be considered a "text" output for typical forms employing a pastel drop-out color. Color background information is filtered out and only typed text information is passed on to an OCR algorithm. For example, a form printed with non-carbon red ink and filled out with a typewriter using a carbon-based ribbon easily could be processed using this invention. Unlike a conventional scanning system which would produce an image of all printed material (pre-printed red and typed black), the present invention system would produce an image only of the typed text, ignoring the pre-printed red.
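The combining step performed by AND gate 63 can be modeled as a per-pixel bitwise AND of the three binary channels. The Python sketch below is illustrative only; the function name and the five sample pixels are assumptions.

```python
# Illustrative sketch only; the function name and sample bits are assumptions.

def all_color_filter(red_bits, green_bits, blue_bits):
    """Model of AND gate 63: output 1 ("black") only when all three
    binary channels mark the pixel as dark, otherwise 0 ("white")."""
    return [r & g & b for r, g, b in zip(red_bits, green_bits, blue_bits)]


# Pixels: paper, black typed text, red ink, green ink, blue ink.
# A colored ink reads "light" (0) in its own channel and "dark" (1) elsewhere.
red_bits = [0, 1, 0, 1, 1]
green_bits = [0, 1, 1, 0, 1]
blue_bits = [0, 1, 1, 1, 0]

text_bits = all_color_filter(red_bits, green_bits, blue_bits)
print(text_bits)
```

Only the typed-text pixel survives, because it is the only one that every channel reports as dark.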
By closely matching an optical red filter with the color of ink used for the pre-printed information and using the filter with a conventional scanning system, similar results can be achieved. However, such a red optical filter could not drop out a green ink. Advantageously, the present invention can filter out any non-carbon ink, thereby providing greater flexibility. The user could use inter-mixed documents of different colors without worrying about changing filters.
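The contrast with a single optical filter can be made concrete with a small Python comparison. It rests on a simplifying assumption that is not in the source: a red optical filter is modeled as a scanner that sees only the red channel. The pixel values, the threshold level and all names are likewise hypothetical.

```python
# Illustrative sketch only. Modeling a red optical filter as "red channel
# only" is a simplification; values and names are assumptions.
LEVEL = 128  # assumed threshold on an 8-bit scale

PIXELS = {
    "paper": (250, 250, 250),
    "typed text": (30, 30, 30),
    "red ink": (230, 60, 60),
    "green ink": (60, 200, 60),
}


def red_filter_only(rgb, level=LEVEL):
    """Conventional scanner behind a red filter: judge darkness from red alone."""
    r, _, _ = rgb
    return 1 if r < level else 0


def all_color(rgb, level=LEVEL):
    """All-color filter: dark only if every channel is below the threshold."""
    return 1 if all(c < level for c in rgb) else 0


for name, rgb in PIXELS.items():
    print(f"{name:10s} red filter: {red_filter_only(rgb)}  all-color: {all_color(rgb)}")
```

Under these assumptions the red filter drops the red ink but still reports the green ink as dark, while the all-color rule drops both and keeps only the typed text.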
Accordingly, any drop-out colors used on a particular form can be automatically determined and suppressed by making some assumptions about the spectral content of the filled-out text. Most text used to fill out business forms can be categorized as "carbon based". This category includes most typewriter ribbons, pens or pencils. Such text would pass as black regardless of any color filter employed, and the text can be separated from any pre-printed color information by applying an "all-color filter".
White Calibration
White calibration can be used to optimize scanner performance by compensating for any spectral anomalies or sensitivity variations on a pixel by pixel basis. The white calibration method discussed here is the preferred method for assuring uniform response from the scanner, since the compensation can be done just prior to running, thereby compensating for differences due to age or wear. Feeding a white (blank) sheet of paper through the color scanner exercises all three color signals simultaneously. Because a white sheet of paper has a known and predictable spectral curve, the color gain coefficients can be programmed in such a manner as to allow the scanner to mimic this ideal response.
Figures 3A and B show a flow chart for implementing white calibration. Step 80 requires microprocessor and RAM storage subsystem 52 (Fig. 2) to set all of the red, green, and blue gain coefficients to a value of 1 and then in step 82 set all of the pixel accumulators (located in memory within microprocessor and RAM storage subsystem 52) to 0. In step 84, an operator feeds a white piece of paper through the color scanner in order to calibrate the response. In step 86 the beginning of the page is detected and the calibration process begins. Color scanner 10 outputs a sequential three color data stream (R, G, B) as it scans each horizontal line of the white document. This information is digitized by A/D converters 32, 34 and 36, one for each color channel. The digitized signals are sent to multipliers 38, 40 and 42 respectively. Because microprocessor 52 had previously set all gains to a value of 1, the output of each multiplier is equivalent to the R, G, B values of each pixel. Microprocessor 52 captures this sequential line of grey scale color information in step 88 within its own memory (RAM) and then adds each pixel's red, green, and blue values to the appropriate accumulator in accordance with step 90. The microprocessor maintains separate accumulators for R, G, and B values for each pixel (total number of accumulators = 3 x number of horizontal pixels). This accumulation process continues until the end of the page is detected in step 92. The total number of lines processed is maintained by a line counter in step 94. Once the scanning of the page has been completed, microprocessor 52 calculates the average red, green, and blue values for each pixel in step 96 by dividing each accumulator value by the line count (number of lines captured). This information corresponds to the average color response for each horizontal pixel.
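The accumulation and averaging of steps 82 through 96 can be sketched in Python. This is a minimal model under stated assumptions: each scan line is represented as a list of (r, g, b) tuples, one per horizontal pixel, and the function name and sample readings are hypothetical.

```python
# Illustrative sketch only; the data layout and names are assumptions.

def average_white_response(scan_lines):
    """Average the R, G, B reading of each horizontal pixel over a white page.

    scan_lines: iterable of lines, each a list of (r, g, b) tuples.
    Returns one (avg_r, avg_g, avg_b) tuple per horizontal pixel.
    """
    accumulators = None   # step 82: three accumulators per pixel, starting at 0
    line_count = 0        # step 94: number of lines captured
    for line in scan_lines:                      # steps 88 and 92
        if accumulators is None:
            accumulators = [[0, 0, 0] for _ in line]
        for acc, (r, g, b) in zip(accumulators, line):   # step 90
            acc[0] += r
            acc[1] += g
            acc[2] += b
        line_count += 1
    if line_count == 0:
        raise ValueError("no scan lines captured")
    # Step 96: divide each accumulator by the line count.
    return [tuple(total / line_count for total in acc) for acc in accumulators]


# Two scan lines of a white sheet, two horizontal pixels each (made-up values).
white_page = [
    [(200, 220, 240), (250, 250, 250)],
    [(210, 230, 250), (250, 250, 250)],
]
averages = average_white_response(white_page)
print(averages)
```

Averaging over many lines, rather than using a single line, reduces the influence of noise or a speck on the calibration sheet in any one line.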
Once this color response is known, red, green, and blue coefficients can be calculated for each pixel in step 98. This is done in order to "normalize" the response, so that each pixel responds in a similar fashion given a similar input. The gain coefficients are calculated by dividing the average R, G, B response of each pixel into the ideal or optimum R, G, B response (that is, gain = ideal response / average response). The optimum response is based on the ideal R, G, B values for a "white" input. Once the gain coefficients are calculated (3 per pixel), they are stored in accordance with step 100 in a dual-ported memory (not shown, but part of microprocessor and RAM storage subsystem 52), thereby completing the white calibration process. Once calibrated, the apparatus (Fig. 2) is capable of compensating for any color or gain anomalies by multiplying each pixel's red, green, and blue video value by an image compensating coefficient. During operation, color scanner 10 outputs red, green, and blue signals for each horizontal pixel sequentially, and each color signal is digitized by A/D converters 32, 34 and 36. The digital grey scale color information for each pixel is then sent to multiplier circuits 38, 40 and 42 respectively. Microprocessor and RAM storage subsystem 52 recalls the unique R, G, B gain coefficients for each pixel in the horizontal scan and simultaneously presents these coefficients to the 3 multipliers, thereby multiplying each pixel's red, green, and blue values by their corresponding gain coefficient. The outputs of these multipliers represent the normalized red, green, and blue values for each pixel. By running calibration, storing the unique color gain coefficients for each pixel, and subsequently using the gain coefficients to normalize the R, G, B response for each pixel, the output of color scanner 10 is balanced for a correct and uniform spectral response.
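The gain calculation of step 98 and its use during normal scanning can be sketched in Python. The ideal white response of 255 per channel, the error raised for a pixel with no response, and all names and sample values are assumptions made for illustration; the description does not specify them.

```python
# Illustrative sketch only; the ideal value and all names are assumptions.
IDEAL_WHITE = (255, 255, 255)  # assumed ideal R, G, B response to white


def gain_coefficients(averages, ideal=IDEAL_WHITE):
    """Step 98: gain = ideal response / average response, per pixel and color."""
    gains = []
    for avg in averages:
        if any(a <= 0 for a in avg):
            # Hypothetical handling: a pixel that saw nothing cannot be normalized.
            raise ValueError("pixel gave no response to white")
        gains.append(tuple(i / a for i, a in zip(ideal, avg)))
    return gains


def normalize_line(line, gains):
    """Operation: multiply each pixel's R, G, B values by its stored gains."""
    return [
        tuple(v * g for v, g in zip(pixel, gain))
        for pixel, gain in zip(line, gains)
    ]


# One horizontal pixel that is weak in red and blue (made-up averages).
gains = gain_coefficients([(204.0, 255.0, 170.0)])
white_again = normalize_line([(204, 255, 170)], gains)
sample = normalize_line([(102, 100, 80)], gains)
print(gains)
print(white_again)
print(sample)
```

Applying the gains to the pixel's own white reading returns the ideal white value, which is the sense in which the response is normalized.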
While the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the foregoing description. Accordingly, it is intended to embrace all such alternatives, modifications and variations as fall within the spirit and broad scope of the appended claims.
Advantages and Industrial Applicability
The present invention is useful for processing business forms in conjunction with optical character recognition systems as a way of separating text information on forms by automatic filtering of color information. This scanner system would separate all images into three primary colors: red, green and blue. A black and white rendition of the image can be produced simply by adding the three color components. The invention is advantageous in eliminating the drop-out color variability problem associated with mechanical filter insertion. This variability can be caused by the color of the ink used on the forms varying from one printing batch to another, such that the mechanical filter is ineffective in removing the pre-printed information on a form printed with out-of-tolerance ink. Additionally, the present invention allows one to intermix documents of different color within a batch, as well as single documents having various drop-out colors (a form with red and blue preprinted information, for example). Without the use of the present invention, it would be impossible to accomplish this using mechanical filter insertion, as practiced in the prior art.
Claims
1. An apparatus for processing a three component color signal generated by a color scanner after said signal components have been converted to a grey-scale digital format by an analog to digital converter associated with each color component on a pixel by pixel basis, said apparatus characterized by: processing means for each of the color video signals to create three binary, 1 bit per pixel video signals, each corresponding to the red, green and blue components of a pixel; and means for combining said color video signals so as to preserve black information only when all of the binary color signals identify the pixel as being black.
2. An apparatus as set forth in Claim 1 wherein the combining means takes the form of an AND gate.
3. An apparatus as set forth in Claim 2 wherein the output of said combining means is a 1 bit per pixel segmented scan line that is suitable for OCR reading.
4. An apparatus as set forth in Claim 1 that further includes means for compensating each pixel within a scan line for amplitude and color response as represented by three color components.
5. An apparatus for processing a color form that was filled out using a carbon based ink, said apparatus characterized by: a color scanner having a plurality of grey-scale color outputs generating scan line having segmented pixels with red, green and blue components; means for converting each output to a grey-scale digital format on a pixel by pixel basis; memory means for storing at least one scan line of color grey-scale information; means for compensating each pixel within a scan line for amplitude and color response as represented by the three color components; means for processing each of the color video signals to create three binary 1 bit per pixel video signals, each corresponding to the red, green and blue components of a pixel; and means for combining said binary color information such as to preserve black information, associated with carbon based inks, when all of the said binary color outputs indicate the pixel as being black.
6. An apparatus as set forth in Claim 5 wherein the compensating means takes the form of a digital multiplier.
7. A method of processing a color form that was filled out using a carbon based ink, said method characterized by the steps of: scanning the color form and producing at least two grey-scale color outputs having segmented pixels for each color component; converting each output to a grey-scale digital format on a pixel by pixel basis; storing at least one scan line of each color grey-scale component; compensating for each pixel within a scan line for amplitude and color responses as represented by each color component; processing each of the color video signals to create three binary 1 bit per pixel video signals, each corresponding to the color components for each pixel; and comparing said binary color information so as to preserve black information, associated with carbon based inks, when all of the color components indicate the pixel as being black.
8. A method of processing a color form as set forth in Claim 7 wherein the combining step means AND'ing all of said outputs.
9. A method of processing a color form as set forth in Claim 7 wherein the combined output is suitable for OCR reading.
10. A method of processing a color form as set forth in Claim 7 wherein the sum of the grey-scale color outputs produce a grey-scale black/white image.
11. A method of processing a color form as set forth in Claim 7 wherein the grey-scale color output signals produce a grey-scale color image.
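The combining step recited in Claims 1 to 3 and 7 to 8 can be sketched in software as follows. This is an illustrative model only: the fixed `THRESHOLD`, the polarity (1 meaning "black"), and the function names are assumptions, since this section does not specify how each grey-scale channel is reduced to 1 bit per pixel.

```python
THRESHOLD = 128  # hypothetical fixed threshold for 8-bit samples


def binarize(value, threshold=THRESHOLD):
    """Reduce one grey-scale sample to 1 bit: 1 = dark ("black"), 0 = light."""
    return 1 if value < threshold else 0


def segment_line(scan_line, threshold=THRESHOLD):
    """AND the three binary channels so only pixels dark in R, G and B survive."""
    bits = []
    for r, g, b in scan_line:
        bits.append(
            binarize(r, threshold)
            & binarize(g, threshold)
            & binarize(b, threshold)
        )
    return bits


# Black carbon-based ink, red form ink, blue form ink, white paper.
line = [(20, 20, 20), (230, 30, 30), (30, 30, 220), (250, 250, 250)]
bits = segment_line(line)
```

In this model a colored form ink reflects strongly in at least one channel, so that channel's bit is 0 and the AND removes the pixel, while carbon-based ink is dark in all three channels and is preserved. Red and blue preprinted information drop out in the same pass, with no filter selection.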
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US55729490A | 1990-07-24 | 1990-07-24 | |
US557,294 | 1990-07-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1992001998A1 (en) | 1992-02-06 |
Family
ID=24224826
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1991/005040 WO1992001998A1 (en) | 1990-07-24 | 1991-07-18 | Method and apparatus for automatic text separation using automatic electronic filtering of multiple drop-out colors for optical character recognition of preprinted forms |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0491923A1 (en) |
JP (1) | JPH05501778A (en) |
WO (1) | WO1992001998A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5287442B2 (en) * | 2009-04-07 | 2013-09-11 | 三菱電機株式会社 | Image reading device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0375090A2 (en) * | 1988-12-21 | 1990-06-27 | Recognition International Inc. | Document processing system |
-
1991
- 1991-07-18 JP JP3512695A patent/JPH05501778A/en active Pending
- 1991-07-18 WO PCT/US1991/005040 patent/WO1992001998A1/en not_active Application Discontinuation
- 1991-07-18 EP EP91913232A patent/EP0491923A1/en not_active Withdrawn
Non-Patent Citations (4)
Title |
---|
PATENT ABSTRACTS OF JAPAN vol. 14, no. 437 (E-098)19 September 1990 & JP,A,2 170 674 ( MINOLTA CAMERA CO LTD ) 2 July 1990 see abstract * |
PATENT ABSTRACTS OF JAPAN vol. 6, no. 245 (P-159)3 December 1982 & JP,A,57 143 683 ( TOKYO SHIBAURA DENKI KK ) 4 September 1982 see abstract * |
PATENT ABSTRACTS OF JAPAN vol. 9, no. 9 (P-327)(1732) 16 January 1985 & JP,A,59 158 481 ( NIPPON DENKI KK ) 7 September 1984 see abstract * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7936488B2 (en) | 2008-02-15 | 2011-05-03 | Mitsubishi Electric Corporation | Image reading apparatus |
WO2013009530A1 (en) * | 2011-07-08 | 2013-01-17 | Qualcomm Incorporated | Parallel processing method and apparatus for determining text information from an image |
US9202127B2 (en) | 2011-07-08 | 2015-12-01 | Qualcomm Incorporated | Parallel processing method and apparatus for determining text information from an image |
Also Published As
Publication number | Publication date |
---|---|
JPH05501778A (en) | 1993-04-02 |
EP0491923A1 (en) | 1992-07-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AK | Designated states | Kind code of ref document: A1; Designated state(s): JP |
 | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE |
 | WWE | Wipo information: entry into national phase | Ref document number: 1991913232; Country of ref document: EP |
 | WWP | Wipo information: published in national office | Ref document number: 1991913232; Country of ref document: EP |
 | WWW | Wipo information: withdrawn in national office | Ref document number: 1991913232; Country of ref document: EP |