Show HN: Fast handwritten LaTeX symbol recognition (Rust/WASM)

3 points | by captures 10 hours ago

1 comment

captures 10 hours ago
The classification is surprisingly simple - k-nearest neighbors on a 27-dimensional feature vector extracted from each drawing.
The features:
- Stroke count
- Point density across 6 horizontal and 6 vertical bands (where is the ink?)
- Direction histogram across 8 compass directions (which way are strokes going?)
- Aspect ratio and total stroke length
- First stroke start position, last stroke end position
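For anyone who wants to see how those features could add up to 27 numbers, here is a rough Rust sketch. It is not the project's actual code. The slot layout (1 + 6 + 6 + 8 + 2 + 4), the bounding-box normalisation, the length-weighted direction histogram, and the names `Stroke`, `extract_features`, `BANDS` and `DIRS` are all assumptions made for illustration.

```rust
use std::f32::consts::PI;

type Point = (f32, f32);
type Stroke = Vec<Point>;

const BANDS: usize = 6; // bands per axis
const DIRS: usize = 8; // compass directions
const DIR_OFF: usize = 1 + 2 * BANDS; // first direction slot (13)
const DIM: usize = DIR_OFF + DIRS + 2 + 4; // 27 in total

/// Hypothetical feature extractor; the slot order is an assumption.
fn extract_features(strokes: &[Stroke]) -> [f32; DIM] {
    let mut f = [0.0f32; DIM];
    f[0] = strokes.len() as f32; // stroke count

    let points: Vec<Point> = strokes.iter().flatten().copied().collect();
    if points.is_empty() {
        return f;
    }

    // Bounding box of all ink.
    let (mut min_x, mut max_x) = (f32::MAX, f32::MIN);
    let (mut min_y, mut max_y) = (f32::MAX, f32::MIN);
    for &(x, y) in &points {
        min_x = min_x.min(x);
        max_x = max_x.max(x);
        min_y = min_y.min(y);
        max_y = max_y.max(y);
    }
    let (w, h) = (max_x - min_x, max_y - min_y);

    // Position relative to the bounding box, in [0, 1].
    let rel = |v: f32, lo: f32, extent: f32| -> f32 {
        if extent > 0.0 { (v - lo) / extent } else { 0.5 }
    };
    let band = |r: f32| -> usize { ((r * BANDS as f32) as usize).min(BANDS - 1) };

    // Point density: share of points per horizontal band (binned by y)
    // and per vertical band (binned by x).
    let share = 1.0 / points.len() as f32;
    for &(x, y) in &points {
        f[1 + band(rel(y, min_y, h))] += share;
        f[1 + BANDS + band(rel(x, min_x, w))] += share;
    }

    // Direction histogram, weighted by segment length, plus total length.
    let mut total_len = 0.0f32;
    for stroke in strokes {
        for seg in stroke.windows(2) {
            let (dx, dy) = (seg[1].0 - seg[0].0, seg[1].1 - seg[0].1);
            let len = dx.hypot(dy);
            if len == 0.0 {
                continue;
            }
            let sector = (dy.atan2(dx) / (2.0 * PI / DIRS as f32)).round() as i32;
            f[DIR_OFF + sector.rem_euclid(DIRS as i32) as usize] += len;
            total_len += len;
        }
    }
    if total_len > 0.0 {
        for v in &mut f[DIR_OFF..DIR_OFF + DIRS] {
            *v /= total_len;
        }
    }

    f[DIR_OFF + DIRS] = if h > 0.0 { w / h } else { 0.0 }; // aspect ratio
    f[DIR_OFF + DIRS + 1] = total_len; // total stroke length

    // Start of the first stroke and end of the last stroke.
    let (first, last) = (points[0], points[points.len() - 1]);
    f[DIM - 4] = rel(first.0, min_x, w);
    f[DIM - 3] = rel(first.1, min_y, h);
    f[DIM - 2] = rel(last.0, min_x, w);
    f[DIM - 1] = rel(last.1, min_y, h);
    f
}
```

In this sketch the stroke count and total length are left unscaled, so they would dominate a Euclidean distance unless the vectors are rescaled first. How the real preprocessing handles that is not stated above.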
The training set is ~64k hand-drawn samples from the original Detexify project. Each sample gets preprocessed and converted to this 27D vector. Classification is then just finding the k nearest training samples by Euclidean distance and returning the most common symbols among them.
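The classification step described above can be sketched as a brute-force scan followed by a vote. This is a minimal illustration, not the project's implementation: the `Sample` struct, the `classify` function, the alphabetical tie-break and the choice of `k` are hypothetical.

```rust
use std::collections::HashMap;

const DIM: usize = 27;

/// One labelled training sample (hypothetical layout).
struct Sample {
    features: [f32; DIM],
    symbol: &'static str,
}

/// Squared Euclidean distance. The square root is skipped because it
/// does not change which neighbours are nearest.
fn dist_sq(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Returns the symbols found among the k nearest training samples,
/// most frequent first, each with its vote count.
fn classify(train: &[Sample], query: &[f32; DIM], k: usize) -> Vec<(&'static str, usize)> {
    // Score every training sample against the query and sort by distance.
    let mut scored: Vec<(f32, &'static str)> = train
        .iter()
        .map(|s| (dist_sq(&s.features, query), s.symbol))
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));

    // Count votes among the k nearest.
    let mut votes: HashMap<&'static str, usize> = HashMap::new();
    for (_, sym) in scored.iter().take(k) {
        *votes.entry(*sym).or_insert(0) += 1;
    }

    // Most votes first; ties broken alphabetically so output is deterministic.
    let mut ranked: Vec<(&'static str, usize)> = votes.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    ranked
}
```

With ~64k samples of 27 floats each, a full scan is only about 1.7 million multiply-adds per query, which is why a plain linear search is a plausible choice here. Sorting the whole list is the lazy option; a partial selection of the k smallest would do less work.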