Wednesday, May 9, 2012

Challenges

Characters with Simple Geometry

Some characters strongly resemble very basic features found in most images. For example, the characters , , , all have very basic geometry. We will be on the lookout for techniques that can succeed on characters like these.

Scale Variation

Many of the ideographs and pictographs that make up Chinese characters (汉字) are characters in their own right and also appear as sub-components of other characters. This reuse occurs at multiple scales. For example,

and

illustrate the issue at a single scale, while the characters

,, 寸 and 豆腐

illustrate the issue at multiple scales. My initial approach to this problem is to determine whether the detected bounding boxes actually nest the way they would with "perfect" recognition, and then whether a specific application of non-maximal suppression can correctly discard the nested character elements in favor of the larger character's bounding box.
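To make the nesting idea concrete, here is a minimal sketch of one way such a suppression step could look. It is not the method used in these experiments. Standard non-maximal suppression ranks boxes by score and compares them by overlap ratio; this variant instead ranks by area and discards any box that lies almost entirely inside a larger box that has already been kept. The Box type, the suppress_nested function, the 0.9 containment threshold, and all coordinates and scores are assumptions made up for illustration.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """A hypothetical detection: label, corner coordinates, detector score."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float
    score: float


def area(b: Box) -> float:
    return max(0.0, b.x1 - b.x0) * max(0.0, b.y1 - b.y0)


def containment(inner: Box, outer: Box) -> float:
    """Fraction of inner's area that falls inside outer (0.0 to 1.0)."""
    w = min(inner.x1, outer.x1) - max(inner.x0, outer.x0)
    h = min(inner.y1, outer.y1) - max(inner.y0, outer.y0)
    if w <= 0 or h <= 0:
        return 0.0
    a = area(inner)
    return (w * h) / a if a > 0 else 0.0


def suppress_nested(boxes: List[Box], min_containment: float = 0.9) -> List[Box]:
    """Keep larger boxes; drop boxes nested inside an already-kept box.

    Assumption: the larger enclosing character is always preferred,
    regardless of score. A real pipeline would likely threshold on
    score before this step.
    """
    kept: List[Box] = []
    for b in sorted(boxes, key=area, reverse=True):
        if all(containment(b, k) < min_containment for k in kept):
            kept.append(b)
    return kept


# Made-up detections: 豆 and 腐 side by side, plus a 寸 detected inside 腐.
detections = [
    Box("豆", 0, 0, 100, 100, 0.80),
    Box("腐", 110, 0, 210, 100, 0.75),
    Box("寸", 160, 10, 205, 55, 0.90),
]
result = suppress_nested(detections)
print([b.label for b in result])
```

In this sketch the nested 寸 is discarded even though its score is the highest, because it sits entirely inside the box for 腐. Adjacent characters such as 豆 and 腐 do not contain one another, so both survive.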

General Training Sets

The experiments I've run so far have used tiny training sets. A key result of this work should be a characterization of how the technique performs with training sets of several hundred to several thousand characters.
