Monday, April 30, 2012

Improving the Training

No pictures this week (but I do have 210 words, or about 21% of a picture). In my initial attempts to reproduce the previous results, I had simplified the training of the classifier for various reasons. This week I have been focused on making the training equivalent to the scheme used in the prior work. Specifically:

1. I added the step in which known negative images are sampled and patches are collected to serve as the "background" or "NOT a character" class during training.
2. I'm still working on restoring the "hard negative" cases. In this step, we run the detector against known negative images and record the false positives. The false-positive patches are saved as additional training samples for the "NOT a character" class. Finally, the ferns are retrained using good character training images for each character (class), and both "easy" and "hard" negative images for the "NOT a character" class.
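
The hard-negative step described in item 2 can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the MATLAB code from the prior work: `detect`, `crop`, the `threshold` argument, and the `NOT_A_CHARACTER` label are hypothetical names standing in for the fern detector, patch extraction, the selection threshold, and the background class.

```python
def mine_hard_negatives(negative_images, detect, crop, threshold):
    """Collect false-positive patches from images known to contain no characters.

    detect(image) is assumed to return a list of (box, score) pairs and
    crop(image, box) is assumed to return the patch under that box.
    """
    hard_negatives = []
    for image in negative_images:
        for box, score in detect(image):
            # Any detection on a known-negative image is a false positive.
            if score >= threshold:
                hard_negatives.append(crop(image, box))
    return hard_negatives


def build_training_set(positives, easy_negatives, hard_negatives,
                       background_label="NOT_A_CHARACTER"):
    """Combine per-character patches with both kinds of negative patches.

    positives maps a character label to its list of training patches.
    Returns a flat list of (patch, label) pairs ready for retraining.
    """
    samples = []
    for label, patches in positives.items():
        samples.extend((patch, label) for patch in patches)
    samples.extend((patch, background_label) for patch in easy_negatives)
    samples.extend((patch, background_label) for patch in hard_negatives)
    return samples
```

The retraining itself is left out on purpose; the point is only that "easy" and "hard" negatives end up in the same background class.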

The minor challenge in this step is determining the threshold used to select candidate bounding boxes. In the prior work, this value was determined by manual experimentation. I'm hoping to develop a routine to systematize the determination of this value, perhaps by optimizing the F-score to a user-specified bias for recall vs. precision.
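
One way such a routine might look is a sweep over candidate thresholds that keeps the one maximizing the F-beta score, where beta > 1 biases toward recall and beta < 1 toward precision. The sketch below is only an illustration of that idea. It assumes a validation set where each candidate box has already been marked as a true or false detection, and that each true detection matches a distinct ground-truth character. The names `scored` and `num_ground_truth` are hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(scored, num_ground_truth, beta=1.0):
    """Pick the score threshold that maximizes F-beta.

    scored is a list of (score, is_true_positive) pairs for candidate boxes.
    A box is kept when its score is >= the threshold.
    Returns (threshold, f_beta_value), or (None, -1.0) if scored is empty.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    best = (None, -1.0)
    true_positives = 0
    for i, (score, is_true_positive) in enumerate(ranked):
        if is_true_positive:
            true_positives += 1
        # With tied scores, evaluate only once the whole tie group is included.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        kept = i + 1
        precision = true_positives / kept
        recall = true_positives / num_ground_truth if num_ground_truth else 0.0
        value = f_beta(precision, recall, beta)
        if value > best[1]:
            best = (score, value)
    return best
```

Calling `best_threshold(scored, num_ground_truth, beta=2.0)` would favor recall, while `beta=0.5` would favor precision.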

Sunday, April 22, 2012

Detecting Multiple Hanzi in a Single Image

The following image shows the results of running the character detector on the sign13.jpg image with the classifier trained for 5 of the characters present in the image (向, 前, 小, 大, 文). Note that it fails to detect 文. Also, this set of bounding boxes is selected using a hand-picked threshold value. The next image shows that with another threshold value, the detector returns a lot of false detections, and a lot of noise (extra bounding boxes at different scales) for true detections.

So, it's apparent that I need to figure out both a) how to "tune" the detector and b) how to select the "best" candidate bounding box.

threshold hand-picked for good results
results with poorly chosen threshold
results with previous "poorly chosen" threshold, and NMS applied
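
For reference, non-maximum suppression (NMS) in its common greedy form can be sketched as below. This is a generic version, not necessarily the variant used in the prior work: it assumes boxes are `(x, y, width, height)` tuples, ignores character labels, and uses a hypothetical `overlap` parameter on intersection-over-union.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, overlap=0.5):
    """Greedy NMS over a list of (box, score) pairs.

    Visit detections from highest to lowest score and keep one only if it
    does not overlap an already-kept box by more than `overlap`.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= overlap for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

This removes the extra boxes at nearby scales around a true detection, but it does nothing about isolated false detections, which still depend on the threshold.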


Friday, April 20, 2012

Visualizing the HOGs

The histograms of oriented gradients are computed over the entire image by dividing the image into cells that are small relative to the image size. So you end up with a grid of HOGs covering the entire image. Variables in this step include: cell size, overlap between adjacent cells, number of spatial bins, number of orientation bins. The following image shows a visualization of the HOGs computed for the sign13.jpg image using the default values.
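
To make the "grid of HOGs" idea concrete, here is a deliberately simplified Python sketch. It is not the toolbox implementation: it uses non-overlapping cells, unsigned orientations, no interpolation between bins, and no block normalization, and the defaults `cell_size=8` and `num_bins=9` are arbitrary choices for illustration, not the defaults referred to above.

```python
import math


def hog_cells(image, cell_size=8, num_bins=9):
    """Compute one orientation histogram per cell.

    image is a 2D list of grayscale values (a list of rows).
    Returns grid[cell_row][cell_col], each a list of num_bins magnitudes.
    """
    height, width = len(image), len(image[0])
    rows, cols = height // cell_size, width // cell_size
    grid = [[[0.0] * num_bins for _ in range(cols)] for _ in range(rows)]
    bin_width = 180.0 / num_bins
    for y in range(rows * cell_size):
        for x in range(cols * cell_size):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            bin_index = min(int(angle / bin_width), num_bins - 1)
            # Each pixel votes for its orientation bin, weighted by magnitude.
            grid[y // cell_size][x // cell_size][bin_index] += magnitude
    return grid
```

Each entry of the returned grid corresponds to one of the small glyphs in a HOG visualization: the bins with the largest values indicate the dominant edge orientations in that cell.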

Friday, April 13, 2012

It's alive


To borrow from Shelley, "it's alive". The following image shows the first glimmer of the basic character detection working. The caveats are too numerous to enumerate, but after a week of wrestling with Matlab this is a small victory.

Sunday, April 8, 2012

Synthetic Character Training Images

I haven't figured out how to get MATLAB to display Unicode strings, despite a significant amount of research. I gave up and wrote a Java program to create the synthetic training images used with the Ferns classifier. The following image is an example.


Most Common Chinese Characters

One obvious challenge in recognizing Chinese characters (hanzi) and words in images is the tremendous number of characters in written Chinese. For now, I will be relying on work by Professor Jun Da at Middle Tennessee State University. In particular, the character frequency lists I'm using are here.

The frequency lists allow me to very simply use the top n most frequently occurring hanzi. One key desired outcome of this project is to see if I can characterize the performance of the recognizer in terms of n.
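
Selecting the top n characters could be as simple as the following Python sketch. The file layout is an assumption: each data line is taken to be tab-separated, with the rank in the first column and the character in the second, and anything else (headers, blank lines) is skipped. The function name `top_n_hanzi` is hypothetical.

```python
def top_n_hanzi(lines, n):
    """Return the n most frequent characters from a ranked frequency list.

    lines is an iterable of text lines, assumed to look like
    "<rank>\t<character>\t...". Lines that do not start with a number
    are ignored.
    """
    ranked = []
    for line in lines:
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].strip().isdigit():
            continue  # skip headers, comments, and blank lines
        ranked.append((int(fields[0]), fields[1].strip()))
    ranked.sort()  # order by rank, in case the file is not already sorted
    return [character for _, character in ranked[:n]]
```

Training and evaluating the recognizer for increasing values of n would then give the performance-versus-n curve this project is after.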

Wednesday, April 4, 2012

References

Histograms of Oriented Gradients is used for feature detection. This is the primary paper on HOGs. This presentation discusses HOGs.

A nice presentation on Random Forests and Ferns.

This post collects the references I've looked at for this project. So far, I'm only skimming the abstracts.

Video Character Recognition Through Hierarchical Classification
Text Detection in Natural Scene Images by Stroke Gabor Words
Enhanced Active Contour Method for Locating Text
A New Feature Optimization Method Based on Two-directional 2DLDA For Handwritten Chinese Character Recognition
Efficient Cut-off Threshold Estimation for Word Spotting Applications

Sample Images

The following images give an intuitive understanding of the problem.

A busy Chinese city street
Fumin Lu street sign in the French Concession

Fumin Lu street sign close up
A neighborhood tire dealer
Tire dealer sign close up

A sign at a tourist attraction
Circular text is beyond the scope of this project!

Tuesday, April 3, 2012

Tool Chain Setup and GrOCR code (Plex v1.02)

I had to wrestle with my Matlab environment for quite a while to get the Plex code running at all. It relies on 2 libraries (LIBSVM and Piotr Dollar's Matlab Toolbox). Neither would install or build for me without errors. I still do not have it running on OS X, but I've gotten Kai's "quick demo" working on Windows 7. I'm using Matlab R2010a. The next step is to see if I can get the walkthrough of the evaluation code working.

[Update - 4/8/2012] The problem on Mac OS X turned out to be an assumption by Matlab about the target version of the OS (e.g. 10.5 vs. 10.6) that editing a config file seemed to resolve. I also have been able to verify Kai's code on my Windows 7 + Matlab R2010a (32-bit Student Version) box to the point of running the demoSVT() script successfully.

Motivation - GrOCR at UCSD

I try to keep an eye on the website for the UCSD Computer Vision group. In early 2011, I became aware of Kai Wang's GrOCR project. During the summer of 2011, I lived in Shanghai for 3 months to begin studying Mandarin. As an aside, if you want to study Mandarin in Shanghai I highly recommend John Pasden's company AllSet Learning. I continue to study Mandarin at UCSD. My fascination with Chinese characters (hanzi) intersected with my fascination with Kai's work, and this project was conceived.

Monday, April 2, 2012

Abstract

In this project I'm trying to recognize Chinese characters in scene text in unconstrained images. My initial approach is to apply the techniques of Wang, Babenko, and Belongie described in End-to-end Scene Text Recognition and Word Spotting in the Wild.

The project proposal is here.