The original code used a 48x48 pixel window. One hypothesis is that a larger window might preserve more discriminative information: computing gradients at more points could give the HOG descriptor higher fidelity. I'm not convinced of this, but it's an easy thing to check empirically.
The chart below shows precision, recall, and F-score for different values of the selection threshold with a 48x48 target window (the value used in the original paper). The best F-score is 0.69 at threshold = 150.
I changed the window size to 72x72, a 50% increase along each axis and a 125% increase in area (2.25 times the original). Without changing the HOG parameters, the feature vector grows by roughly the same factor. A feature vector about 2.25 times longer describing a region 2.25 times larger nets out to about the same bits per unit area, so I remain dubious of this hypothesis.
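The feature-vector arithmetic can be checked with a short sketch. The function name `hog_feature_length` and the default parameters here (8-pixel cells, 9 orientation bins, 2x2-cell blocks stepped one cell at a time) are assumptions for illustration, not the settings used in the original code. With plain per-cell histograms (`block=1`) the length scales exactly with area; with overlapping blocks it grows somewhat faster, because the number of block positions does not scale linearly with the number of cells.

```python
def hog_feature_length(window, cell=8, block=2, stride=1, bins=9):
    """Length of a HOG descriptor for a square window.

    All parameter defaults are illustrative assumptions:
    window and cell are in pixels, block and stride are in cells.
    """
    cells = window // cell                    # cells along one axis
    blocks = (cells - block) // stride + 1    # block positions along one axis
    return blocks * blocks * block * block * bins


for window in (48, 72):
    per_cell = hog_feature_length(window, block=1)   # no block grouping
    overlapping = hog_feature_length(window)         # 2x2 blocks, stride 1
    print(window, per_cell, overlapping)

# Growth factor going from 48x48 to 72x72 under each assumption.
print(hog_feature_length(72, block=1) / hog_feature_length(48, block=1))
print(hog_feature_length(72) / hog_feature_length(48))
```

Under these assumed parameters the per-cell descriptor grows by exactly the area ratio, while the overlapping-block descriptor grows a little more, so the density of features per unit area stays in the same ballpark either way.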
The chart below shows precision, recall, and F-score for different values of the selection threshold with a 72x72 target window. The best F-score is 0.69 at threshold = 110.
So although the behavior of the detector is clearly changed by differing window sizes (that is, the precision-recall curve changes), the basic shape of the precision, recall, and F-score graphs is similar, and the maximum F-score is the same for both.
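For reference, a threshold sweep of this kind can be sketched as follows. This is a minimal illustration, not the code used to produce the charts: the function names are hypothetical, and it assumes each candidate window has a numeric score and a ground-truth label, and that a window is selected when its score is at or above the threshold. If the detector's score is a distance rather than a similarity, the comparison would need to be flipped.

```python
def precision_recall_f(scores, labels, threshold):
    """Precision, recall, and F-score at a single threshold.

    Assumes a candidate is selected when score >= threshold
    (an assumption; flip the comparison for distance-like scores).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def best_threshold(scores, labels, thresholds):
    """Return the threshold with the highest F-score."""
    return max(thresholds,
               key=lambda t: precision_recall_f(scores, labels, t)[2])
```

Running `best_threshold` once per window size over the same set of thresholds gives the two operating points to compare.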

