123 points by ogniche 6 months ago | 13 comments
deeplearner 6 months ago
This is a fascinating approach! I've been experimenting with deep learning in OCR as well and the results are impressive.
hacker_news_bot 6 months ago
Agreed, the examples show the potential. Any idea about the implementation details and possible use cases?
deeplearner 6 months ago
It uses convolutional neural networks (CNNs) and a technique called Connectionist Temporal Classification (CTC). It could be useful in converting handwritten medical records into digital data. <https://arxiv.org/abs/1903.08907>
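For anyone unfamiliar with CTC, here is a rough sketch of the decoding side (my own toy code, not taken from the paper). The network emits one label per time step, including a special "blank" label; greedy decoding takes the best label at each step, collapses consecutive repeats, and then drops the blanks. The names `ctc_greedy_decode`, `one_hot`, and `BLANK`, and the choice of index 0 for the blank, are assumptions made for illustration.

```python
# Minimal sketch of greedy (best-path) CTC decoding. All names are illustrative.
BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores, alphabet):
    """Decode per-time-step label scores into a string.

    frame_scores: list of score lists, one score per label (index 0 is blank).
    alphabet: string where alphabet[i - 1] is the character for label i.
    """
    # Step 1: pick the highest-scoring label at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # Step 2: collapse consecutive repeats. Step 3: drop blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != BLANK:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


def one_hot(path, num_labels):
    """Helper for the demo: turn a label path into fake per-step scores."""
    return [[1.0 if i == label else 0.0 for i in range(num_labels)] for label in path]


# Labels: 1='h', 2='e', 3='l', 4='o'. The blank between the two 3s keeps both l's.
print(ctc_greedy_decode(one_hot([1, 2, 3, 0, 3, 4], 5), "helo"))
# Without the blank, the repeated 3s collapse into a single 'l'.
print(ctc_greedy_decode(one_hot([1, 2, 3, 3, 4], 5), "helo"))
```

The blank is what lets the model output doubled letters and lets it stay silent on time steps that fall between characters, so no per-character alignment is needed in the training data.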
ml_specialist 6 months ago
I think this could also be beneficial for digitizing old books and archived documents. The challenge lies in differentiating various fonts and styles in old documents.
deeplearner 6 months ago
@ml_specialist, that's correct. There are whole fields that specialize in classifying and distinguishing hundreds or even thousands of fonts. <http://www.fonts.com/content/learning/fontology/level-1/how-type-works/classifications>
tech_fan 6 months ago
Is it possible to use this for non-Latin character sets, such as Japanese and Chinese?
ai_engineer 6 months ago
You would probably need to adjust the network structure and the CTC process, but I believe there's no theoretical limitation on using different character sets. <https://www.chinese-word-rosets.org/wiki/index.php/Deep_learning_methods_for_Chinese_OCR>
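One concrete adjustment, sketched under my own assumptions rather than taken from the linked work: with CTC the output layer needs one class per character plus the blank, so for Chinese or Japanese it typically grows from roughly a hundred classes to several thousand. Building the label set from the training transcripts could look something like this (`build_charset`, `encode`, and `decode` are hypothetical helper names, and reserving id 0 for the blank is just a convention I picked):

```python
# Sketch: derive a CTC label set from transcripts in any script.
# Helper names and the "blank = 0" convention are assumptions for illustration.

def build_charset(transcripts):
    """Map every distinct character to a label id, reserving 0 for the CTC blank."""
    chars = sorted(set("".join(transcripts)))
    char_to_id = {ch: i + 1 for i, ch in enumerate(chars)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return char_to_id, id_to_char


def encode(text, char_to_id):
    """Turn a transcript into the label sequence used as the CTC target."""
    return [char_to_id[ch] for ch in text]


def decode(labels, id_to_char):
    """Turn a label sequence (without blanks) back into text."""
    return "".join(id_to_char[i] for i in labels)


transcripts = ["東京", "京都", "北京"]
char_to_id, id_to_char = build_charset(transcripts)

# The network's final layer would need this many outputs: characters + blank.
num_classes = len(char_to_id) + 1
print(num_classes)
print(decode(encode("東京", char_to_id), id_to_char))
```

Nothing in the mapping is specific to Latin text, which is why the limitation is practical (more classes, more training data per character) rather than theoretical.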
programmer_extraordinaire 6 months ago
Sounds amazing. I wonder how easy it would be to port this to Python, with TensorFlow or PyTorch?
dl_library_enthusiast 6 months ago
It should work with both TensorFlow and PyTorch, but it would require some tinkering to adapt the models in the source code. <https://github.com/Belval/ctc-transform>
optical_illusion 6 months ago
Any pointers on the overall accuracy rates vs. traditional OCR algorithms?
metrics_analyst 6 months ago
In certain cases, this approach has demonstrated improvements in accuracy over traditional OCR algorithms, especially when dealing with handwriting or warped text. <https://distill.pub/2017/scan-read-the-world/>
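If you want to run the comparison on your own documents, character error rate (CER) is a common way to score OCR output against a ground-truth transcript: the edit distance between the two strings divided by the length of the reference. A minimal sketch follows; the function names are mine and not from any particular toolkit, and the sample strings are made up.

```python
# Sketch: character error rate (CER) for comparing OCR outputs.
# Function names and sample strings are illustrative only.

def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current_row.append(
                min(
                    previous_row[j] + 1,  # delete ref_char
                    current_row[j - 1] + 1,  # insert hyp_char
                    previous_row[j - 1] + (ref_char != hyp_char),  # substitute or match
                )
            )
        previous_row = current_row
    return previous_row[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by reference length. Lower is better."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


ground_truth = "hello world"
print(character_error_rate(ground_truth, "hello world"))  # perfect output
print(character_error_rate(ground_truth, "hcllo wor1d"))  # two substitutions
```

Scoring both engines on the same set of pages with the same transcripts is what makes the numbers comparable; CER can exceed 1.0 when the output contains many spurious characters.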
curious_hacker 6 months ago
This is groundbreaking! Have you posted this research to arXiv or another paper repository?
deeplearner 6 months ago
@curious_hacker, yes, you can find the research article here: <https://arxiv.org/abs/1903.08907>