The classification is surprisingly simple: k-nearest neighbors on a 27-dimensional feature vector extracted from each drawing.
The features:
- Stroke count
- Point density across 6 horizontal and 6 vertical bands (where is the ink?)
- Direction histogram across 8 compass directions (which way are strokes going?)
- Aspect ratio and total stroke length
- First stroke start position, last stroke end position
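
As a rough illustration of how those features add up to 27 values (1 + 12 + 8 + 2 + 4), here is a minimal sketch. The function name `extract_features`, the input layout (a list of strokes, each a list of `(x, y)` points), the assumption that coordinates are already scaled to the unit square, and the exact normalization of each feature are all guesses made for illustration, not the project's actual code.

```python
import math


def extract_features(strokes):
    """Hypothetical sketch: strokes is a list of strokes, each a list of
    (x, y) points assumed to be pre-scaled into [0, 1] x [0, 1]."""
    points = [p for stroke in strokes for p in stroke]
    if not points:
        raise ValueError("drawing has no points")
    n = len(points)

    # 1 value: stroke count
    features = [float(len(strokes))]

    # 6 + 6 values: fraction of points falling in each horizontal/vertical band
    rows = [0.0] * 6
    cols = [0.0] * 6
    for x, y in points:
        rows[min(int(y * 6), 5)] += 1.0 / n
        cols[min(int(x * 6), 5)] += 1.0 / n
    features += rows + cols

    # 8 values: share of stroke length heading in each compass direction
    directions = [0.0] * 8
    total_length = 0.0
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            seg = math.hypot(x1 - x0, y1 - y0)
            if seg == 0.0:
                continue
            angle = math.atan2(y1 - y0, x1 - x0)
            # Snap the angle to the nearest multiple of 45 degrees (bins 0..7)
            directions[int(round(angle / (math.pi / 4))) % 8] += seg
            total_length += seg
    if total_length > 0.0:
        directions = [d / total_length for d in directions]
    features += directions

    # 2 values: bounding-box aspect ratio and total stroke length
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    features += [width / height if height > 0.0 else 0.0, total_length]

    # 4 values: first stroke start (x, y) and last stroke end (x, y)
    features += [*strokes[0][0], *strokes[-1][-1]]
    return features
```

Calling it on a two-stroke plus sign, `extract_features([[(0.0, 0.5), (1.0, 0.5)], [(0.5, 0.0), (0.5, 1.0)]])`, returns a list of 27 floats.
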
The training set is ~64k hand-drawn samples from the original Detexify project. Each sample gets preprocessed and converted to this 27D vector. Classification is then just finding the k nearest training samples by Euclidean distance and returning the most common symbols among them.
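
The lookup step can be sketched in a few lines. This is a brute-force version under assumed names: `classify`, `training_set` (a list of `(feature_vector, symbol)` pairs), and the defaults `k=5` and `top_n=3` are illustrative choices, not values taken from the project.

```python
import heapq
import math
from collections import Counter


def classify(query, training_set, k=5, top_n=3):
    """Hypothetical sketch: return up to top_n symbols, most frequent first,
    among the k training samples closest to query.

    training_set is a list of (feature_vector, symbol) pairs.
    """
    # Brute force: measure the Euclidean distance to every training sample
    neighbors = heapq.nsmallest(
        k, training_set, key=lambda item: math.dist(query, item[0])
    )
    # Majority vote among the k nearest neighbors
    votes = Counter(symbol for _, symbol in neighbors)
    return [symbol for symbol, _ in votes.most_common(top_n)]
```

With ~64k samples of 27 dimensions each, a linear scan like this is a few million floating-point operations per query, so an index structure is optional rather than required.
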