Recognizing Chinese Characters in Scene Text: Tuning the HOGs

The following images show the Chinese character (pinyin: yi1), the HOG (histogram of oriented gradients) for that character with spatial bin size (sBin) = 6, and the HOG for that character with sBin = 8. There is detail evident with sBin = 6 that is not visible with sBin = 8. I think we can conclude that the choice of sBin matters for capturing certain visual details.

yi4

HOG: sBin = 6, oBin = 8

HOG: sBin = 8, oBin = 8

Note a side effect of reducing sBin to 6: the top and bottom rows of HOG cells appear to contain no information.
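To make the role of sBin concrete, here is a minimal sketch of per-cell orientation histograms in Python with NumPy. It is a simplified stand-in, not the HOG code used for the images above: it skips block normalization and interpolation, and the function name `hog_cells`, the 48x48 test image, and the synthetic horizontal stroke are all assumptions made for illustration.

```python
import numpy as np

def hog_cells(image, s_bin, o_bin=8):
    """Unnormalized per-cell orientation histograms (simplified HOG sketch).

    s_bin: side length of each square cell in pixels (the spatial bin size).
    o_bin: number of unsigned orientation bins covering [0, pi).
    """
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)                 # gradients along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation
    # Clamp so an angle that rounds up to pi still lands in the last bin.
    bin_index = np.minimum((angle / np.pi * o_bin).astype(int), o_bin - 1)

    rows, cols = image.shape[0] // s_bin, image.shape[1] // s_bin
    hist = np.zeros((rows, cols, o_bin))
    for r in range(rows):
        for c in range(cols):
            cell = (slice(r * s_bin, (r + 1) * s_bin),
                    slice(c * s_bin, (c + 1) * s_bin))
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            hist[r, c] = np.bincount(bin_index[cell].ravel(),
                                     weights=magnitude[cell].ravel(),
                                     minlength=o_bin)
    return hist

# Hypothetical test image: one horizontal stroke on a flat background.
img = np.zeros((48, 48))
img[22:26, 4:44] = 1.0

fine = hog_cells(img, s_bin=6)    # 8 x 8 grid of cells
coarse = hog_cells(img, s_bin=8)  # 6 x 6 grid of cells
print(fine.shape, coarse.shape)
```

On the same 48x48 image, sBin = 6 yields an 8x8 grid of cells while sBin = 8 yields a 6x6 grid, so each cell pools gradients over a smaller patch and finer structure survives. In this sketch, any cell that lies entirely in a flat margin comes out with an all-zero histogram. Whether that is what produces the empty top and bottom rows in the images above is not established here.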

With sBin = 6, 1000 known negative images, and 1000 hard negatives, the detector's performance is shown in the following graph. Note that there is a range of thresholds that gives perfect precision and recall! Before getting too excited, look at the following image showing the detection results.
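As a reminder of what a "range of thresholds with perfect precision and recall" means, here is a small sketch. The scores and labels are made up for illustration and are not the detector's actual outputs, and the helper names `precision_recall` and `perfect_threshold_range` are hypothetical. Such a range exists exactly when every positive scores higher than every negative.

```python
import numpy as np

def precision_recall(scores, labels, threshold):
    """Precision and recall when a detection fires for score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    predicted = scores >= threshold
    tp = int(np.sum(predicted & labels))
    fp = int(np.sum(predicted & ~labels))
    fn = int(np.sum(~predicted & labels))
    # Convention assumed here: an empty denominator counts as a perfect score.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall

def perfect_threshold_range(scores, labels):
    """Return (low, high) so that any t with low < t <= high gives
    precision == recall == 1, or None if no such threshold exists."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    if not labels.any() or labels.all():
        return None  # need at least one positive and one negative
    lowest_positive = float(scores[labels].min())
    highest_negative = float(scores[~labels].max())
    if lowest_positive > highest_negative:
        return (highest_negative, lowest_positive)
    return None

# Made-up scores: two true detections, two false candidates.
scores = [0.9, 0.8, 0.2, -0.5]
labels = [True, True, False, False]
print(perfect_threshold_range(scores, labels))
print(precision_recall(scores, labels, threshold=0.5))
```

A wide gap between the lowest positive score and the highest negative score on one sample says nothing about where that gap sits on another data set, which is worth keeping in mind when comparing thresholds across data sets.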

So, although the detector has fired on the horizontal lines, there are questions about the resolution/fidelity of the bounding boxes, scale, and sensitivity. Why do the bounding boxes exhibit so much variability in their size and location relative to the actual detected pixels? Why does the detector fire at multiple scales? Why is the selection threshold that works for this sample so different from the one that works for the other data sets? At the threshold that works for this example, the previous data set would have extremely poor precision.
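One possible contributor to the box variability, assuming the detector slides a fixed-size HOG template over an image pyramid (a common setup, though not confirmed for this detector), is the mapping from cell coordinates back to pixels. The sketch below shows that mapping. The function name `window_to_box` and the 8x8-cell template are assumptions for illustration.

```python
def window_to_box(row, col, scale, s_bin, template_cells):
    """Map a detection window back to original-image pixel coordinates.

    row, col: top-left HOG cell of the window on one pyramid level.
    scale: resize factor of that level relative to the original (0.5 = half size).
    s_bin: spatial bin size in pixels on the resized image.
    template_cells: (rows, cols) of the detector template, in cells.
    Returns (x, y, width, height) in original-image pixels.
    """
    pixels_per_cell = s_bin / scale   # one cell step, measured in original pixels
    t_rows, t_cols = template_cells
    return (col * pixels_per_cell, row * pixels_per_cell,
            t_cols * pixels_per_cell, t_rows * pixels_per_cell)

# Same cell position on two hypothetical pyramid levels.
print(window_to_box(2, 3, scale=1.0, s_bin=6, template_cells=(8, 8)))
print(window_to_box(2, 3, scale=0.5, s_bin=6, template_cells=(8, 8)))
```

Under these assumptions, a box can only move in steps of sBin / scale original pixels, and its size is fixed by the pyramid level rather than by the extent of the stroke. A long horizontal line can also match the template at several levels, which would show up as detections at multiple scales.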

Saturday, May 19, 2012
