The project final report is here.
Recognizing Chinese Characters in Scene Text
Wednesday, June 6, 2012
Tuesday, June 5, 2012
Monday, June 4, 2012
Trying a New Technique for Generating Features
I'm trying to implement the unsupervised learning algorithm in the following paper:
Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning
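As a first step, here is a rough sketch of the pipeline that paper describes (my own simplification, not the paper's actual code, and plain k-means standing in for their spherical k-means variant): sample random patches, normalize and whiten them, then learn a feature dictionary by clustering.

```python
import numpy as np

def learn_patch_features(images, patch_size=8, n_features=64, n_patches=10000, seed=0):
    """Sketch of unsupervised patch-based feature learning:
    extract random patches, normalize, ZCA-whiten, then k-means."""
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        y = rng.integers(img.shape[0] - patch_size)
        x = rng.integers(img.shape[1] - patch_size)
        patches.append(img[y:y + patch_size, x:x + patch_size].ravel())
    X = np.asarray(patches, dtype=float)
    # Per-patch brightness/contrast normalization
    X = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-8)
    # ZCA whitening with a small regularizer on the eigenvalues
    cov = np.cov(X, rowvar=False)
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + 0.1)) @ U.T
    Xw = X @ W
    # Plain k-means (the paper uses a spherical k-means variant)
    centroids = Xw[rng.choice(len(Xw), n_features, replace=False)]
    for _ in range(10):
        d = ((Xw ** 2).sum(1, keepdims=True)
             - 2 * Xw @ centroids.T
             + (centroids ** 2).sum(1))
        assign = d.argmin(1)
        for k in range(n_features):
            members = Xw[assign == k]
            if len(members):
                centroids[k] = members.mean(0)
    return centroids, W
```

The learned centroids then act as a filter bank: responses of image patches to the centroids become the features fed to the classifier.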
Wednesday, May 23, 2012
Geometric Shapes Test
I created a simple dataset featuring three classes shown below. I assumed that the detector would perform fine for circles and squares. I was curious primarily about how it would perform in distinguishing squares from nested squares.
The following images show some representative detection results.
As expected, the detector performs well at distinguishing circles from squares (though without perfect recall even in this simple case; see the third image). Also somewhat as expected, it does not perform well at distinguishing squares from nested squares.
[Figure: geometric shapes class exemplars]
[Figure: geometric shapes test example image]
[Figures: representative detection results]
I'm curious which characteristics explain why the nested squares are detected as squares in most cases, but as a nested square in the one case.
The figures below show the HOGs for each of the 3 classes at two spatial bin sizes.
[Figure: HOGs for each class (Circle, Square, Nested Squares) at sBin = 8 and sBin = 6]
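For reference, the HOG computation can be approximated with a minimal sketch (hypothetical code, not the actual feature extractor used in these runs). It shows how sBin sets the cell grid over which orientation histograms are pooled, which is why a smaller sBin preserves finer detail.

```python
import numpy as np

def hog(img, s_bin=8, n_orient=8):
    """Minimal HOG sketch: unsigned gradient-orientation histograms pooled
    over s_bin x s_bin cells. No block normalization; just enough to see
    how s_bin controls the cell grid."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientations in [0, pi)
    bins = np.minimum((ang / np.pi * n_orient).astype(int), n_orient - 1)
    ny, nx = img.shape[0] // s_bin, img.shape[1] // s_bin
    H = np.zeros((ny, nx, n_orient))
    for cy in range(ny):
        for cx in range(nx):
            m = mag[cy * s_bin:(cy + 1) * s_bin, cx * s_bin:(cx + 1) * s_bin]
            b = bins[cy * s_bin:(cy + 1) * s_bin, cx * s_bin:(cx + 1) * s_bin]
            for o in range(n_orient):
                H[cy, cx, o] = m[b == o].sum()
    return H

def square(size=48, lo=8, hi=40):
    """Synthetic square outline like the test shapes above."""
    img = np.zeros((size, size))
    img[lo:hi, lo] = img[lo:hi, hi] = 1.0        # vertical edges
    img[lo, lo:hi + 1] = img[hi, lo:hi + 1] = 1.0  # horizontal edges
    return img
```

On a 48x48 window, sBin = 8 gives a 6x6 cell grid while sBin = 6 gives 8x8, so the smaller bin size has more cells over which to resolve structure.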
As one would expect, the HOG for a circle is visually very distinct from those of the square and nested squares. The HOGs for the square and nested squares are much more similar, but still visually distinguishable. My intuition is that the differences should matter statistically given enough samples, so this may be evidence that ~1000 training examples is not sufficient for the random ferns classifier. On the other hand, nested squares are in fact two squares at slightly different scales, so one could argue that for a scale-insensitive detector the correct behavior would be to detect two squares plus a single nested square, leaving some downstream component to distinguish the two detected squares from the single nested square.
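For context, a minimal random-ferns classifier along these lines might look as follows. This is a sketch under my own assumptions, not the detector's actual implementation: binary features here are random pairwise comparisons of feature-vector entries, and the ferns are combined semi-naive-Bayes style.

```python
import numpy as np

class RandomFerns:
    """Sketch of a random-fern classifier: each fern packs a few random
    binary comparisons into a code, and per-class code distributions are
    combined as a product (sum of logs) across ferns."""

    def __init__(self, n_ferns=16, depth=6, seed=0):
        self.n_ferns, self.depth = n_ferns, depth
        self.rng = np.random.default_rng(seed)

    def _codes(self, X):
        # Each fern evaluates `depth` random feature-pair comparisons and
        # packs the resulting bits into one integer code per fern.
        bits = X[:, self.i] > X[:, self.j]                    # (n, n_ferns, depth)
        return (bits * (1 << np.arange(self.depth))).sum(-1)  # (n, n_ferns)

    def fit(self, X, y):
        y = np.asarray(y)
        d = X.shape[1]
        self.i = self.rng.integers(d, size=(self.n_ferns, self.depth))
        self.j = self.rng.integers(d, size=(self.n_ferns, self.depth))
        self.classes = np.unique(y)
        codes = self._codes(X)
        n_codes = 2 ** self.depth
        self.logp = np.zeros((len(self.classes), self.n_ferns, n_codes))
        for ci, c in enumerate(self.classes):
            cc = codes[y == c]
            for f in range(self.n_ferns):
                counts = np.bincount(cc[:, f], minlength=n_codes) + 1.0  # Laplace smoothing
                self.logp[ci, f] = np.log(counts / counts.sum())
        return self

    def predict(self, X):
        codes = self._codes(X)                                 # (n, n_ferns)
        # per_fern[c, s, f] = log P(code of fern f in sample s | class c)
        per_fern = self.logp[:, np.arange(self.n_ferns), codes]
        return self.classes[per_fern.sum(-1).argmax(0)]
```

With only ~1000 examples per class, each fern's 2^depth code histogram is sparsely populated, which is one concrete way limited training data could blur the square vs. nested-square distinction.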
Saturday, May 19, 2012
Tuning the HOGs
The following images show the Chinese character (pinyin: yi1), the HOG for that character with spatial bin size (sBin) = 6, and the HOG for that character with sBin = 8. There is detail evident at sBin = 6 that is not visible at sBin = 8. I think we can conclude that sBin is important in capturing certain visual details.
Note one side effect of reducing sBin to 6: the top and bottom rows of HOG cells appear to contain no information.
[Figures: the character, its HOG at sBin = 6, and its HOG at sBin = 8]
With sBin = 6, 1000 known negative images, and 1000 hard negatives, the detector performance is given in the following graph. Note that there is a range of thresholds that gives perfect precision and recall! Before getting too excited, though, look at the following image showing the detection results.
So, although the detector has fired on the horizontal lines, there are questions about the resolution/fidelity of the bounding boxes, about scale, and about sensitivity. Why do the bounding boxes exhibit so much variability in size and location relative to the actual detected pixels? Why does the detector fire at multiple scales? And why is the selection threshold that works for this sample so different from the other data sets? At the threshold that works for this example, the previous data set would have extremely poor precision.
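For the record, the precision/recall/F-score sweeps in these charts amount to thresholding detection scores against ground-truth labels. A small helper (hypothetical, not the actual evaluation code) makes the computation explicit:

```python
def pr_curve(scores, labels, thresholds):
    """Precision, recall, and F-score at each selection threshold.
    scores: detector responses; labels: 1 for true characters, 0 otherwise."""
    out = []
    for t in thresholds:
        pred = [s >= t for s in scores]
        tp = sum(p and l for p, l in zip(pred, labels))
        fp = sum(p and not l for p, l in zip(pred, labels))
        fn = sum((not p) and l for p, l in zip(pred, labels))
        # Convention: precision is 1.0 when nothing fires at this threshold
        prec = tp / (tp + fp) if tp + fp else 1.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out.append((t, prec, rec, f))
    return out
```

Sweeping `thresholds` over the detector's score range and taking the row with the highest F-score reproduces the "best F-score at threshold = N" numbers reported below.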
Wednesday, May 16, 2012
Increasing the Number of Trained Characters
In the previous runs, we had trained 11 characters. In this run, we increased the number of trained characters to 25 by adding 14 of the 100 most common Chinese characters to the 11 used in the prior runs.
The chart below shows precision, recall, and F-score for different values of selection threshold with the target window size 48x48. The best F-score is .57 at threshold = 160. This is a slightly lower max F-score than the case where only 11 characters are trained.
Tuesday, May 15, 2012
Increasing the Number of Orientation Bins
I increased the number of HOG orientation bins to 12 from 8.
The chart below shows precision, recall, and F-score for different values of selection threshold with the target window size 48x48. The best F-score is .61 at threshold = 150. This is a slightly lower max F-score than using the original 8 orientation bins.