WO2008134000A1

WO2008134000A1 - Image segmentation and enhancement

Info

Publication number: WO2008134000A1
Application number: PCT/US2008/005366
Authority: WO
Inventors: Jian Fan
Original assignee: Hewlett-Packard Development Company, L.P.
Priority date: 2007-04-27
Filing date: 2008-04-24
Publication date: 2008-11-06
Also published as: US20080267497A1; US8417033B2; CN101689300A; JP2010525486A; DE112008001052T5; CN101689300B

Abstract

Methods, apparatus, and machine-readable media for segmenting and enhancing images are described. In one aspect, gradient magnitude values at respective pixels of a given image (16) are determined. The gradient magnitude values are thresholded with a global threshold to produce thresholded gradient magnitude values (20). The pixels are segmented into respective groups in accordance with a watershed transform of the thresholded magnitude values (20). A classification record (18) is generated. The classification record (18) labels as background pixels ones of the pixels segmented into one of the groups determined to be largest in size and labels as non-background pixels ones of the pixels segmented into any of the groups except the largest group.

Description

IMAGE SEGMENTATION AND ENHANCEMENT

BACKGROUND Image segmentation typically involves separating object regions of an image from background regions of the image. Many different approaches for segmenting an image have been proposed, including thresholding, region growing, and watershed transform based image segmentation processes. The segmentation results of such processes may be used for a wide variety of different applications, including object extraction for object description or recognition. In general, noise reduces the accuracy with which an image segmentation process can segment objects from background regions. Text-like objects in digital images that are captured by camera-equipped handheld devices (e.g., digital cameras, cellular telephones, and personal digital assistants) often are degraded by nonuniform illumination and blur. The presence of these artifacts significantly degrades the overall appearance quality of the reproduced digital images. In addition, such degradation adversely affects OCR (optical character recognition) accuracy. What are needed are apparatus and methods that are capable of segmenting and enhancing document images in ways that are robust to text font size, blur level and noise.

SUMMARY In one aspect, the invention features a method in accordance with which gradient magnitude values at respective pixels of a given image are determined. The gradient magnitude values are thresholded with a global threshold to produce thresholded gradient magnitude values. The pixels are segmented into respective groups in accordance with a watershed transform of the thresholded magnitude values. A classification record is generated. The classification record labels as background pixels ones of the pixels segmented into one of the groups determined to be largest in size and labels as non-background pixels ones of the pixels segmented into any of the groups except the largest group. The invention also features an apparatus and a machine readable medium storing machine-readable instructions causing a machine to implement the method described above. Other features and advantages of the invention will become apparent from the following description, including the drawings and the claims .

DESCRIPTION OF DRAWINGS FIG. 1 is a block diagram of an embodiment of an image processing system. FIG. 2 is an example of an image of nonuniformly illuminated text. FIG. 3 is a flow diagram of an embodiment of an image processing method. FIG. 4 is an example of an image composed of gradient magnitude values derived from the image of FIG. 2 in accordance with an embodiment of the method of FIG. 3. FIG. 5A is a diagrammatic view of an array of devised gradient magnitude values at respective pixels of an illustrative image. FIG. 5B is a diagrammatic view of an array of labels assigned to the pixels of the image shown in FIG. 5A in accordance with a watershed transform based image segmentation process. FIG. 6A is an example of an image containing text. FIG. 6B is a grayscale image showing different labels that were assigned to the pixels of the image of FIG. 6A in accordance with an embodiment of the segmentation process in the method shown in FIG. 3. FIG. 6C is an example of a classification record in the form of a binary segmentation map generated from the grayscale image of FIG. 6B in accordance with an embodiment of the classification record generation process in the method shown in FIG. 3. FIG. 7 is an example of a classification record in the form of a binary segmentation map in which black pixels represent object pixels detected in the image of FIG. 2 and 75 white pixels represent background pixels detected in the

76 image of FIG. 2.

77 FIG. 8 is a block diagram of an embodiment of the image

78 processing system shown in FIG. 1.

79 FIG. 9 is an example of an image composed of illuminant

80 values estimated for the pixels of the image of FIG. 2 in

81 accordance with an embodiment of the invention.

82 FIG. 10 is an example of an illumination-corrected

83 image derived from the image of FIG. 2 based on the

84 illuminant values shown in FIG. 9 in accordance with an

85 embodiment of the invention.

86 FIG. 11 is an example of a sharpened image derived from

87 the image of FIG. 10 in accordance with an embodiment of the

88 invention.

89 FIG. 12 is a block diagram of an embodiment of an

90 apparatus incorporating an embodiment of the image

91 processing system of FIG. 1.

92 FIG. 13 is a block diagram of an embodiment of an

93 apparatus incorporating an embodiment of the image

94 processing system of FIG. 1.

95 DETAILED DESCRIPTION

96 In the following description, like reference numbers

97 are used to identify like elements. Furthermore, the

98 drawings are intended to illustrate major features of

99 exemplary embodiments in a diagrammatic manner. The

100 drawings are not intended to depict every feature of actual

101 embodiments nor relative dimensions of the depicted

102 elements, and are not drawn to scale.

103 I . INTRODUCTION

104 The embodiments that are described in detail below are

105 capable of segmenting and enhancing images in ways that are

106 robust to blur level and noise. These embodiments

107 incorporate global thresholding prior to watershed transform

108 based image segmentation in ways that achieve improved noise

109 resistant results, especially for images containing text, no The global thresholding eliminates or breaks noise in structures in the images before performing the watershed

112 transform based image segmentations. Some embodiments use

113 the segmentation result to enhance the document images in

114 various ways, including correcting for nonuniform

115 illumination, darkening target object regions, and lie sharpening target object regions. Implementations of

117 these embodiments are particularly useful for enhancing text

118 in ways that are robust to text font size, blur level and

119 noise.

120 II. OVERVIEW

121 FIG. 1 shows an embodiment of an image processing

122 system 10 that includes a preprocessing module 12 and a

123 segmentation module 14. The image processing system

124 produces from an image 16 a classification record 18 that

125 labels the pixels of the image 16 either as background

126 pixels or non-background pixels. In this process, the

127 preprocessing module 12 processes the image 16 to produce an

128 intermediate image 20, which has characteristics that

129 improve the accuracy with which the segmentation module 14

130 can distinguish target object regions from background

131 regions in the image 16.

132 The image 16 may correspond to any type of digital

133 image, including an original image (e.g., a video keyframe,

134 a still image, or a scanned image) that was captured by an

135 image sensor (e.g., a digital video camera, a digital still

136 image camera, or an optical scanner) or a processed (e.g.,

137 sub-sampled, filtered, reformatted, enhanced or otherwise

138 modified) version of such an original image. FIG. 2 shows

139 an example 22 of the image 16 that contains nonuniformly

140 illuminated text. In the following detailed description,

141 the exemplary image 22 and the various image data derived

142 therefrom are used for illustrative purposes only to explain

143 one or more aspects of one or more embodiments of the

144 invention.

145 In general, the classification record 18 may be used

146 for a wide variety of different purposes, including image

147 enhancement, object detection, object tracking, object 148 description, and object recognition. Some of the

149 embodiments of the invention that are described in detail

150 below use the classification record 18 to perform one or

151 more of the following image enhancement operations on the

152 image 16: reducing the effects of nonuniform illumination;

153 darkening and sharpening text-like objects.

154 III. SEGMENTING AN IMAGE INTO BACKGROUND REGIONS AND TARGET

155 OBJECT REGIONS

156 A. OVERVIEW

157 FIG. 3 shows an embodiment of a method that is

158 implemented by the image processing system 10. In

159 accordance with this method, the preprocessing module 12

160 determines gradient magnitude values at respective pixels of

161 the image 16 (FIG. 3, block 24) . The preprocessing module

162 12 thresholds the gradient magnitude values with a global

163 threshold to produce thresholded gradient magnitude values

164 (FIG. 3, block 26) . The segmentation module 14 segments the

165 pixels of the image 16 into groups in accordance with a

166 watershed transform of the thresholded gradient magnitude

167 values (FIG. 3, block 28). The segmentation module 14

168 generates the classification record 18. The classification

169 record 18 labels as background pixels ones of the pixels

170 segmented into one of the groups determined to be largest in

171 size and labels as non-background pixels ones of the pixels

172 segmented into any of the groups except the largest group

173 (FIG. 3, block 30) .

174 B. DETERMINING GRADIENT MAGNITUDE VALUES

175 As explained above, the preprocessing module 12

176 determines gradient magnitude values at respective pixels of

177 the image 16 (FIG. 3, block 24). In some embodiments, the

178 preprocessing module 12 denoises the pixel values of the

179 image 16 before determining the gradient magnitude values.

180 For this purpose any type of denoising filter may be used,

181 including a Gaussian smoothing filter and a bilateral

182 smoothing filter. In other embodiments, the preprocessing 183 module 12 determines the gradient magnitude values directly

184 from pixel values of the image 16.

185 In general, the preprocessing module 12 may use any

186 type of gradient filter or operator to determine the

187 gradient magnitude values. If the image 16 is a grayscale

188 image, the preprocessing module 12 may determine the

189 gradient magnitude values using, for example, a basic

190 derivative filter, a Prewitt gradient filter, a Sobel

191 gradient filter, a Gaussian gradient filter, or another type

192 of morphological gradient filter. If the image 16 is a

193 color image, the preprocessing module 12 may convert the

194 image 16 into a grayscale image and apply a gradient filter

195 of the type listed above to the grayscale values to

196 determine the gradient magnitudes. Alternatively, the

197 preprocessing module 12 may convert the color image into a

198 YCrCb color image and apply a gradient filter of the type

199 listed above to the luminance (Y) values to determine the

200 gradient magnitudes. In some embodiments, the preprocessing

201 module 12 computes each of the gradient magnitude values

202 from multiple color space components (e.g., red, green, and

203 blue components) of the color image. For example, in some

204 of these embodiments, the preprocessing module 12 determines

205 the magnitudes of color gradients in the color image in

206 accordance with the color gradient operator described in

207 Silvano DiZenzo, "A Note on the Gradient of a Multi-

208 Image, " Computer Vision, Graphics, and Image Processing,

209 vol. 33, pages 116-125 (1986). FIG. 4 depicts an example of

210 an image 32 that is composed of color gradient magnitude

211 values that were derived from a color version of the image

212 22 (see FIG. 2) in accordance with such a color gradient

213 operator.

214 C. GLOBAL THRESHOLDING GRADIENT MAGNITUDE VALUES

215 As explained above, the preprocessing module 12

216 thresholds the gradient magnitude values with a global

217 threshold to produce thresholded gradient magnitude values

218 (FIG. 3, block 26). This global thresholding process

219 eliminates or breaks noise structures in the images before 220 performing the watershed transform based image segmentation.

221 In this way, the problems of over-segmentation and

222 inaccurate segmentation results due to such noise structures

223 may be reduced. The preprocessing module 12 typically uses

224 an empirically determined global threshold to threshold the

225 gradient magnitude values. In some embodiments, the

226 preprocessing module 12 thresholds the gradient magnitude

227 values with a global threshold ( τ_GL0BAL ) that is determined in

228 accordance with equation (1) :

229 τ_amΛλL ( 1 )

230 where k is a real number, g^^ is the maximum gradient

231 magnitude value, and τ_mN is an empirically determined

232 minimum global threshold value. In one exemplary

233 embodiment, the range of gradient magnitude values is from 0

234 \ to 255, k = 0.1 and T_110N =S.

235 The resulting thresholded gradient magnitude values,

236 which correspond to the intermediate image 20 (see FIG. 1) ,

237 are passed to the segmentation module 14 for segmentation

238 processing.

239 D. SEGMENTING THRESHOLDED GRADIENT MAGNITUDE VALUES

240 As explained above, the segmentation module 14 segments

241 the pixels of the image 16 into groups in accordance with a

242 watershed transform of the thresholded gradient magnitude

243 values (FIG. 3, block 28).

244 In the course of computing the watershed transform of

245 the gradient magnitude values, the segmentation module 14

246 identifies basins and watersheds in the thresholded

247 magnitude values, assigns respective basin labels to those

248 pixels corresponding to ones of the identified basins,

249 assigns a unique shared label to those pixels corresponding

250 to the watersheds, and performs a connected components

251 analysis on the assigned labels. The segmentation module 14 252 may compute the watershed transform in accordance with any

253 one of a wide variety of different methods. In some

254 embodiments, the basins are found first and the watersheds

255 may be found by taking a set complement whereas, in other

256 embodiments, the image is partitioned completely into basins

257 and the watersheds may be found by boundary detection (see,

258 e.g., J. B. T.M. Roerdink et al . , "The Watershed Transform:

259 Definitions, Algorithms and Parallelization Strategies,

260 Fundamenta Informaticae, vol. 41, pages 187-228 (2001)). In

261 some embodiments, the segmentation module 14 computes the

262 watershed transform of the thresholded gradient magnitude

263 values in accordance with the watershed calculation method

264 described in Luc Vincent et al . , "Watersheds in Digital

265 Spaces: An Efficient Algorithm Based on Immersion

266 Simulations, " IEEE Transactions on Pattern Analysis and

267 Machine Intelligence, vol. 13, no. 6 (June 1991).

268 In general, the segmentation module 14 may perform any

269 one of a wide variety of different connected components

270 analyses on the assigned labels. For example, in one

271 connected component labeling approach, the labels assigned

272 to the pixels are examined, pixel -by-pixel in order to

273 identify connected pixel regions (or "blobs", which are

274 regions of adjacent pixels that are assigned the same

275 label) . For each given pixel, the label assigned to the

276 given pixel is compared to the labels assigned to the

277 neighboring pixels. The label assigned to the given pixel is

278 changed or unchanged based on the labels assigned to the

279 neighboring pixels. The number of neighbors examined and the

280 rules for determining whether to keep the originally

281 assigned label or to re-classify the given pixel depends on

282 the measure of connectivity being used (e.g., 4 -connectivity

283 or 8-connectivity) .

284 FIG. 5A shows a diagrammatic view of an array 34 of

285 devised gradient magnitude values at respective pixels of an

286 illustrative image. FIG. 5B shows a diagrammatic view of an

287 array of labels assigned to the pixels of the image shown in

288 FIG. 5B in accordance with a watershed transform based image

289 segmentation process and a connected component re-labeling

290 process based on 4 -connectivity. In FIG. 5B, the labels Bl 291 and B2 identify respective basins and the label W identifies

292 the watershed pixels that were detected in the array 34.

293 FIG. 6A shows an example of an image 36 containing text

294 (i.e., the word "advantage") and FIG. 6B shows a grayscale

295 (image 38 of the resulting (numbered) labels that were

296 assigned to the pixels of the image 36 in accordance with an

297 embodiment of the segmentation process of block 28 of FIG.

298 3.

299 In some embodiments, after the pixel connectivity

300 analysis has been performed, the watershed pixels are merged

301 with the neighboring region with the largest label number to

302 produce a segmentation of the pixels of the image 16 into a

303 final set of identified groups.

304 E. GENERATING A CLASSIFICATION RECORD

305 As explained above, the segmentation module 14

306 generates the classification record 18, which labels as

307 background pixels ones of the pixels segmented into one of

308 the identified groups determined to be largest in size and

309 labels as non-background pixels ones of the pixels segmented

310 into any of the identified groups except the largest group

311 (FIG. 3, block 30) . The largest group may be identified in

312 a variety of different ways. In some embodiments, the

313 largest grouped is determined by selecting the group having

314 the largest number of pixels.

315 In some embodiments, the segmentation module 14 records

316 in the classification record 18 a first binary value (e.g.,

317 "1 " or "white" ) for each of the pixels segmented into the

318 largest group and second binary value (e.g., "0" or

319 "black" ) for each of the pixels segmented into any of the

320 groups except the largest group. For example, FIG. 6C shows

321 an example of a classification record generated for the

322 image 36 (see FIG. 6A) in the form of a binary segmentation

323 map 40 that was generated from the grayscale image 38 (see

324 FIG. 6B) in accordance with an embodiment of the

325 classification record generation process of block 30 in FIG.

326 3. In the binary segmentation map 40, the black pixels

327 represent object pixels that were detected in the image 36 328 (see FIG. 6A) and the white pixels represent background

329 pixels that were detected in the image 36.

330 Referring back to FIG. 6B, the watershed transform

331 based segmentation performed in block 28 of FIG. 3 tends to

332 over-segment the text characters appearing in images 22, 36.

333 As shown in FIG. 6C, however, the background pixels in these

334 images 22, 36 readily can be identified as the largest

335 connected component in the pixel labels assigned by the

336 watershed transform segmentation process in spite of such

337 over-segmentation.

338 FIG. 7 shows an example of a graphical representation

339 of a classification record that was generated for the image

340 22 (see FIG. 2) in the form of a binary segmentation map 42

341 in which black pixels represent object pixels detected in

342 the image 22 and white pixels represent background pixels

343 detected in the image 22.

344 IV. ENHANCING AN IMAGE BASED ON ITS ASSOCIATED

345 CLASSIFICATION RECORD

346 A. OVERVIEW

347 As explained above, the classification record 18 may be

348 used for a wide variety of different purposes, including

349 image enhancement, object detection, object tracking, object

350 description, and object recognition.

351 FIG. 8 shows an embodiment 44 of the image processing

352 system 10 that additionally includes an image enhancement

353 module 46. In some embodiments, the image enhancement

354 module 46 produces an enhanced image 48 by performing one or

355 more of the following image enhancement operations on the

356 image 16 based on the classification record 18: reducing

357 the effects of nonuniform illumination; darkening target

358 object regions; and sharpening target object regions.

359 B. • ILLUMINATION CORRECTION

360 In some embodiments, the image enhancement module 46 is

361 operable to produce the enhanced image 48 by correcting for

362 nonuniform illumination in the image 16. 363 In some embodiments, the illumination correction is

364 based on the following image formation model:

365 l{x,y)=R(x,y) L{x,y) (2)

366 where l(x, y) is the measured intensity value, R(x,y) the

367 surface reflectivity value, and L{x,y) is the illuminant

368 value at pixel (x,y) of the image 16, respectively.

369 In accordance with this model, the illuminant values of.

370 background pixels (as indicated by the classification record

371 18) are assumed to be proportional to the luminance values

372 of the pixels. If the image 16 is a grayscale image, the

373 estimated illuminant values L{x,y) for the background pixels

374 are the grayscales values of the background pixels (x,y) .

375 If the image 16 is a color image, the estimated illuminant

376 values L{x,y) for the background pixels are obtained, for

377 example, by converting the image 16 into a grayscale color

378 space or the YCrCb color space and setting the estimated

379 luminant values L[x,y) to the grayscale values or the

380 luminance values (Y) of the background pixels (x,y) in the

381 converted image. The illuminant values for the non- 382 background pixels may be estimated from the estimated

383 illuminant values of the neighboring background pixels in a

384 variety of different ways, including using interpolation

385 methods and image impainting methods.

386 FIG. 9 depicts an example of an image 50 that is

387 composed of illuminant values that were estimated for the

388 pixels of the image 22 (see FIG. 2) in accordance with the

389 method described above.

390 In some embodiments, the illumination-corrected pixel

391 values E(x,y) of the enhanced image 48 are estimated from

392 ratios of spatially corresponding ones of the pixel values

393 of the image 16 to respective tone values that are

394 determined from the estimated illuminant values in

395 accordance with equation (3) : 396 E{x,y) = R{x,y) = s (3 )

397 where s is a scale factor, l{x,y) is the value of pixel (x,y)

398 in the image 16, t{x,y) is the illuminant value estimated for

399 pixel (x,y) , and T\L(x,y)j is a function that maps the

400 estimated illuminant value to a respective tone value. In

401 one exemplary embodiment in which pixel values range from 0

402 to 255, the scale factor s is set to 255. The tone mappings

403 corresponding to the function T\L[x,y)) typically are stored

404 in a lookup table (LUT) .

405 In some embodiments, the tone mapping function T\L{x,y))

406 maps the estimated illuminant values to themselves (i.e.,

407 T\L(x,y)~ L(x,y))) . In these embodiments, the resulting

408 enhanced image 48 corresponds to an illumination corrected

409 version of the original image 16. In other embodiments, the

410 tone mapping function T\L(x,y)j includes at least one other

411 image enhancement (e.g., selective darkening and selective

412 sharpening) as described in detail below.

413 C. SELECTIVE DARKENING

414 In some embodiments, the tone mapping function

415 incorporates an unsharp-masking-like contrast enhancement

416 that is applied to the object region (i.e., non-background

417 region) that are identified in the classification record 18.

418 In some of these embodiments, the tone mapping function that

419 is used for the object region pixels is defined in equation

420 (4) as follows:

421 (4)

422 where s=255 for 8-bit images, b - t^r(l - 1)' ^r and t = l/s is the

423 normalized mean luminance value of the image. In these

424 embodiments, in response to determinations that the 425 corresponding estimated illuminant values are below a

426 illuminant threshold value, the image enhancement module 46

427 sets pixel values of the enhanced image darker than

428 spatially corresponding ones of the pixel values of the

429 given image. In addition, in response to determinations

430 that the corresponding estimated illuminant values are above

431 the illuminant threshold value, the image enhancement module

432 46 sets pixel values of the enhanced image lighter than

433 spatially corresponding ones of the pixel values of the

434 given image .

435 In other ones of these embodiments, the tone mapping

436 function that is used for the non-background (i.e., object

437 region) pixels is defined in equation (5) as follows:

438

440 FIG. 10 shows an example of an illumination-corrected

441 image 52 that is derived from the image 22 (FIG. 2) based on

442 the illuminant values shown in FIG. 9 and the tone mapping

443 function defined in equation (4) .

444 ^" D. SELECTIVE SHARPENING

445 In some embodiments, selective sharpening is achieved

446 by applying unsharp masking selectively to target object

447 regions (e.g., text regions) that are identified in the

448 classification record 18. In some of these embodiments, the

449 pixel values of the object regions ( E_0BJECT(x,y)) of the

450 enhanced image 48 are computed by the selective filter

451 defined in equation (6) , which incorporates an unsharp

452 masking element in the illumination correction filter

453 defined in equation (3) :

/ x

454 E_OBJECT {x,y) ( 6 )

455 where a is an empirically determined parameter value that

456 dictates the amount of sharpening.

457 In some embodiments, the pixel values of the object

458 regions ( E₀'_BJECT(x,y) ) of the enhanced image 48 are computed by

459 applying the selective filter defined in equation (7) to the

460 pixel values (E_0BJECT(x,y)) generated by the selective

461 sharpening filter defined in equation (6) .

462 E'{x,y)={β+\)-E_0BJECT{x,y)-β-G[E_0BJECT] (7)

463 where G[] represents a Gaussian smoothing filter and the

464 parameter β represents the amount of sharpening. In some

465 embodiments, the size (w) of the Gaussian kernel and the

466 amount of sharpening β are determined from equations (8)

467 and (9) , respectively:

470 where [w_mm,w_max] is an empirically determined parameter value

471 range for the window size, [/?_rain,/?_max] is an empirically

472 determined parameter value range for the amount of

473 sharpening, and [§_£,£„] is the low and high thresholds of the

474 sharpness, g_mm is the maximum gradient magnitude value

475 determined in block 24 in the method shown in FIG. 3. In

476 some embodiments, the Gaussian smoothing filter G[] in

477 equation (7) may be replaced by a different type of

478 smoothing filter (e.g., an averaging filter) .

479 FIG. 11 depicts an example of a selectively sharpened

480 image 54 that was derived from the image 52 (see FIG. 9) in 481 accordance with the selective sharpening methods defined in

482 equations (6) - (9) .

483 V. EXEMPLARY ARCHITECTURES OF THE IMAGE PROCESSING SYSTEM

484 A. OVERVIEW

485 Embodiments of the image processing system 10

486 (including the embodiment 44 shown in FIG. 8) may be

487 implemented by one or more discrete modules (or data

488 processing components) that are not limited to any

489 particular hardware, firmware, or software configuration.

490 In the illustrated embodiments, the modules may be

491 implemented in any computing or data processing environment,

492 including in digital electronic circuitry (e.g., an

493 application-specific integrated circuit, such as a digital

494 signal processor (DSP)) or in computer hardware, firmware,

495 device driver, or software. In some embodiments, the

496 functionalities of the modules are combined into a single

497 data processing component. In some embodiments, the

498 respective functionalities of each of one or more of the

499 modules are performed by a respective set of multiple data

500 processing components.

501 In some implementations, process instructions (e.g.,

502 machine -readable code, such as computer software) for

503 implementing the methods that are executed by the

504 embodiments of the image processing system 10, as well as

505 the data it generates, are stored in one or more machine-

506 readable media. Storage devices suitable for tangibly

507 embodying these instructions and data include all forms of

508 non-volatile computer-readable memory, including, for

509 example, semiconductor memory devices, such as EPROM,

510 EEPROM, and flash memory devices, magnetic disks such as

511 internal hard disks and removable hard disks, magneto-

512 optical disks, DVD-ROM/RAM, and CD-ROM/RAM.

513 In general, embodiments of the image processing system

514 10 may be implemented in any one of a wide variety of

515 electronic devices, including desktop and workstation

516 computers, video recording devices (e.g., VCRs and DVRs),

517 cable or satellite set-top boxes capable of decoding and 518 playing paid video programming, and digital camera devices.

519 Due to its efficient use of processing and memory resources,

520 some embodiments of the image processing system 10 may be

521 implemented with relatively small and inexpensive components

522 that have modest processing power and modest memory

523 capacity. As a result, these embodiments are highly

524 suitable for incorporation in compact camera environments

525 that have significant size, processing, and memory

526 constraints, including but not limited to handheld

527 electronic devices (e.g., a mobile telephone, a cordless

528 telephone, a portable memory device such as a smart card, a

529 personal digital assistant (PDA) , a solid state digital

530 audio player, a CD player, an MCD player, a game controller,

531 a pager, and a miniature still image or video camera) , pc

532 cameras, and other embedded environments.

533 B. A FIRST EXEMPLARY IMAGE PROCESSING SYSTEM

534 ARCHITECTURE

535 FIG. 12 shows an embodiment of a computer system 60

536 that incorporates any of the embodiments of the image

537 processing system 10 described herein. The computer system

538 60 includes a processing unit 62 (CPU), a system memory 64,

539 and a system bus 66 that couples processing unit 62 to the

540 various components of the computer system 60. The

541 processing unit 62 typically includes one or more

542 processors, each of which may be in the form of any one of

543 various commercially available processors. The system

544 memory 64 typically includes a read only memory (ROM) that

545 stores a basic input/output system (BIOS) that contains

546 start-up routines for the computer system 60 and a random

547 access memory (RAM) . The system bus 66 may be a memory bus,

548 a peripheral bus or a local bus, and may be compatible with

549 any of a variety of bus protocols, including PCI, VESA,

550 MicroChannel, ISA, and EISA. The computer system 60 also

551 includes a persistent storage memory 68 (e.g., a hard drive,

552 a floppy drive, a CD ROM drive, magnetic tape drives, flash

553 memory devices, and digital video disks) that is connected

554 to the system bus 66 and contains one or more computer-

555 readable media disks that provide non-volatile or persistent 556 storage for data, data structures and computer-executable

557 instructions.

558 A user may interact (e.g., enter commands or data) with

559 the computer 60 using one or more input devices 150 (e.g., a

560 keyboard, a computer mouse, a microphone, joystick, and

561 touch pad) . Information may be presented through a

562 graphical user interface (GUI) that is displayed to the user

563 on a display monitor 72, which is controlled by a display

564 controller 74. The computer system 60 also typically

565 includes peripheral output devices, such as speakers and a

566 printer. One or more remote computers may be connected to

567 the computer system 140 through a network interface card

568 (NIC) 76.

569 As shown in FIG. 12, the system memory 64 also stores

570 the image processing system 10, a GUI driver 78, and a

571 database 80 containing image files corresponding to the

572 image 16 and the enhanced image 48, intermediate processing

573 data, and output data. In some embodiments, the image

574 processing system 10 interfaces with the GUI driver 78 and

575 the user input 70 to control the creation of the

576 classification record 18 and the enhanced image 48. In some

577 embodiments, the computer system 60 additionally includes a

578 graphics application program that is configured to render

579 image data on the display monitor 72 and to perform various

580 image processing operations on the images 16, 48.

581 C. A SECOND EXEMPLARY IMAGE PROCESSING SYSTEM

582 ARCHITECTURE

583 FIG. 13 shows an embodiment of a digital camera system

584 82 that incorporates any of the embodiments of the image

585 processing system 10 described herein. The digital camera

586 system 82 may be configured to capture one or both of still

587 images and video image frames . The digital camera system 82

588 includes an image sensor 84 (e.g., a charge coupled device

589 (CCD) or a complementary metal -oxide -semiconductor (CMOS)

590 image sensor), a sensor controller 86, a memory 88, a frame

591 buffer 90, a microprocessor 92, an ASIC (application-

592 specific integrated circuit) 94, a DSP (digital signal

593 processor) 96, an I/O (input/output) adapter 98, and a 594 storage medium 100. In general, the image processing system

595 10 may be implemented by one or more of hardware and

596 firmware components. In the illustrated embodiment, the

597 image processing system 10 is implemented in firmware, which

598 is loaded into memory 88. The storage medium 100 may be

599 implemented by any type of image storage technology,

600 including a compact flash memory card and a digital video

601 tape cassette. The image data stored in the storage medium

602 100 may be transferred to a storage device (e.g., a hard

603 disk drive, a floppy disk drive, a CD-ROM drive, or a non- 604 volatile data storage device) of an external processing

605 system (e.g., a computer or workstation) via the I/O

606 subsystem 98.

607 The microprocessor 92 choreographs the operation of the

608 digital camera system 82. In some embodiments, the

609 microprocessor 92 is programmed with a mode of operation in

610 which a respective classification record 18 is computed for

611 one or more of the captured images. In some embodiments, a

612 respective enhanced image 48 is computed for one or more of

613 the captured images based on their corresponding

614 classification records 18.

615 VI. CONCLUSION

616 The embodiments that are described in detail herein are

617 capable of segmenting and enhancing images in ways that are

618 -robust to noise. These embodiments incorporate global

619 thresholding prior to watershed transform based image

620 segmentation in ways that achieve improved noise resistant

621 results, especially for images containing text. The global

622 thresholding eliminates or breaks noise structures in the

623 images before performing the watershed transform based image

624 segmentations. These embodiments also apply to the

625 watershed transform based segmentation results a unique

626 background segmentation method, which enables background

627 regions of image containing text to be efficiently segmented

628 without placing significant demand on processing and memory

629 resources. Some embodiments use the improved segmentation

630 results to enhance the images in various ways, including 631 correcting for nonuniform illumination, darkening target

632 object regions, and sharpening target object regions. The

633 improved segmentation results not only improve the

634 localization of such enhancements to target object regions,

635 but also improve the quality of the parameter values used to

636 implement such enhancements.

637 Other embodiments are within the scope of the claims.

Claims

WHAT IS CLAIMED IS:

638 1. A method, comprising:

639 determining gradient magnitude values at respective

640 pixels of a given image (16) ;

641 thresholding the gradient magnitude values with a

642 global threshold to produce thresholded gradient magnitude

643 values (20) ;

644 segmenting the pixels into respective groups in

645 accordance with a watershed transform of the thresholded

646 magnitude values (20); and

647 generating a classification record (18) labeling as

648 background pixels ones of the pixels segmented into one of

649 the groups determined to be largest in size and labeling as

650 non-background pixels ones of the pixels segmented into any

651 of the groups except the largest group.

1 2. The method of claim 1, further comprising, before

2 the determining, deriving the given image (16) from a

3 denoising of an upstream image.

1 3. The method of claim 1, further comprising

2 producing an enhanced image (48) from the pixel values of

3 the given image (16) and the classification record (18) ,

4 wherein the producing comprises estimating respective

5 illuminant values for the pixels of the given image (16) ,

6 including those pixels labeled as non-background pixels,

7 from the values of those pixels of the given image labeled

8 as background pixels.

1 4. The method of claim 5, wherein the producing

2 comprises computing pixel values of the enhanced image (48)

3 from ratios of spatially corresponding ones of the pixel

4 values of the given image (16) to respective tone values

5 determined from the estimated illuminant values.

1 5. The method of claim 6, wherein the computing

2 comprises

3 in response to determinations that the corresponding

4 estimated illuminant values are below a illuminant threshold value, setting pixel values of the enhanced image (48) darker than spatially corresponding ones of the pixel values of the given image (16) , and in response to determinations that the corresponding estimated illuminant values are above the illuminant threshold value, setting pixel values of the enhanced image (48) lighter than spatially corresponding ones of the pixel values of the given image (16) .

6. The method of claim 6, wherein the producing comprises sharpening the values of ones of the pixels labeled as non-background pixels to produce values of spatially corresponding ones of the pixels of the enhanced image (48) .

7. An apparatus, comprising: a preprocessing module (12) operable to determine gradient magnitude values at respective pixels of a given image (16) and to threshold the gradient magnitude values with a global threshold to produce thresholded gradient magnitude values (20) ; and a segmentation module (14) operable to segment the pixels into respective groups in accordance with a watershed transform of the thresholded magnitude values (20) , and generate a classification record (18) labeling as background pixels ones of- the pixels segmented into one of the groups determined to be largest in size and labeling as non- background pixels ones of the pixels segmented into any of the groups except the largest group.

8. The apparatus of claim 17, further comprising an image enhancement module (46) operable to produce an enhanced image (48) from the pixel values of the given image (16) and the classification record (18) .

9. The apparatus of claim 19, wherein the image enhancement module (48) is operable to estimate respective illuminant values for the pixels of the given image (16) , including those pixels labeled as non-background pixels, from the values of those pixels of the given image (16) labeled as background pixels.

10. The apparatus of claim 19, wherein the image enhancement module (48) is operable to sharpen the values of ones of the pixels labeled as non-background pixels to produce values of spatially corresponding ones of the pixels of the enhanced image (48) .