Recognizing Chinese Characters in Scene Text: Increasing the Size of the Target Window

The detector slides a target window over the image being searched. At each window position it computes a feature vector and classifies it with the random ferns classifier. Here the feature vector is the set of HOGs (histograms of oriented gradients) computed for sub-windows of the target window, concatenated into a single vector.
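To make the feature extraction concrete, here is a minimal pure-Python sketch of the sliding window and the concatenated per-sub-window histograms. It is not the original code: the 8-pixel cell size, 9 orientation bins, 8-pixel stride, and the names `cell_histogram`, `window_features`, and `sliding_windows` are all assumptions for illustration. It also skips the block normalization a full HOG would normally apply, and it leaves out the random ferns classifier entirely.

```python
import math


def cell_histogram(img, top, left, size, bins=9):
    """Unsigned gradient-orientation histogram for one square sub-window."""
    hist = [0.0] * bins
    h, w = len(img), len(img[0])
    for y in range(top, top + size):
        for x in range(left, left + size):
            # Central differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % math.pi  # fold direction into [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    return hist


def window_features(img, top, left, window=48, cell=8, bins=9):
    """Concatenate the histograms of every sub-window in one target window."""
    feats = []
    for cy in range(top, top + window, cell):
        for cx in range(left, left + window, cell):
            feats.extend(cell_histogram(img, cy, cx, cell, bins))
    return feats


def sliding_windows(img, window=48, stride=8):
    """Yield (top, left, feature_vector) for each window position."""
    h, w = len(img), len(img[0])
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            yield top, left, window_features(img, top, left, window)
```

Here `img` is a grayscale image stored as a list of rows of numbers. Each yielded feature vector is what would be handed to the classifier.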

The original code used a 48x48 pixel window size. One hypothesis is that a larger window size might preserve more discriminative information. By increasing the number of points at which gradients are computed, the computed HOG may exhibit higher fidelity. I'm not convinced of this, but it's an easy thing to check empirically.

The chart below shows precision, recall, and F-score for different values of the selection threshold with a 48x48 target window (the size used in the original paper). The best F-score is 0.69 at threshold = 150.

I changed the window size to 72x72, a 50% increase along each axis and a 125% increase in area (2.25 times the original). Without changing the HOG parameters, this makes the feature vector 2.25 times as large. A feature vector with 2.25 times as many bits describing a region 2.25 times as large nets out to the same bits per unit area, so I remain dubious of this hypothesis.
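The scaling argument can be checked with a few lines of arithmetic. This sketch assumes non-overlapping 8x8 sub-windows with 9 orientation bins each. Those values and the function name `feature_length` are hypothetical, since the actual HOG parameters are not given here. The ratio only depends on the parameters staying fixed between the two window sizes.

```python
def feature_length(window, cell=8, bins=9):
    """Length of the concatenated HOG vector for a square window.

    Assumes non-overlapping cell x cell sub-windows (hypothetical parameters).
    """
    cells_per_side = window // cell
    return cells_per_side * cells_per_side * bins


small, large = 48, 72
ratio_area = (large * large) / (small * small)
ratio_features = feature_length(large) / feature_length(small)
density_small = feature_length(small) / (small * small)
density_large = feature_length(large) / (large * large)

print(f"area ratio: {ratio_area}")
print(f"feature ratio: {ratio_features}")
print(f"features per pixel: {density_small} vs {density_large}")
```

Under these assumptions both ratios come out to 2.25 and the number of features per pixel is identical for the two window sizes.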

The chart below shows precision, recall, and F-score for different values of the selection threshold with a 72x72 target window. The best F-score is 0.69 at threshold = 110.

So although the behavior of the detector is clearly changed by differing window sizes (that is, the precision-recall curve changes), the basic shape of the precision, recall, and F-score graphs is similar, and the maximum F-score is the same for both.
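For reference, the threshold sweep behind charts like these can be sketched as follows. This is an illustration rather than the code used to produce the numbers above. It assumes a detection is accepted when its classifier score is at least the selection threshold, and that "F-score" means the balanced F1 measure. The function names and the `scores` / `labels` inputs are hypothetical.

```python
def precision_recall_f(scores, labels, threshold):
    """Precision, recall, and F1 when accepting scores >= threshold.

    scores: classifier score per candidate window.
    labels: True where the candidate really is a character.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def best_threshold(scores, labels, thresholds):
    """Return the threshold with the highest F1 among the candidates."""
    return max(thresholds,
               key=lambda t: precision_recall_f(scores, labels, t)[2])
```

Running `best_threshold` once per window size over the same list of candidate thresholds gives the peak of each F-score curve, which is the number being compared between the two window sizes.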

Tuesday, May 15, 2012
