Script Identification Using Bag-of-Words with Entropy-weighted Patches

May 24, 2017·
Jan Zdenek, Hideki Nakayama
Abstract
The increasing interest in scene text reading in multilingual environments raises the need to recognize and distinguish between different writing systems. In this paper, we propose a novel method for script identification that uses convolutional features in the traditional bag-of-words model, in combination with weighting by means of intra-cluster information entropy. This approach exploits the expressive representations of convolutional neural networks, which have displayed outstanding performance in many text analysis and recognition tasks in recent years, the discriminative power of script-characteristic features, and the generalization ability of the bag-of-words model. The proposed method is evaluated on two public benchmark datasets for script identification. The experiments demonstrate that our method outperforms the baseline and yields competitive results.
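The abstract does not spell out the weighting formula, so the following is only a minimal sketch of one way entropy-weighted bag-of-words could look, not the authors' implementation. It assumes that patch features have already been extracted by a CNN and clustered into a codebook, that each training patch carries the script label of its source image, and that a codeword's weight is one minus the normalized entropy of the script labels in its cluster. The function names `cluster_entropy_weights` and `weighted_bow` and all parameters are hypothetical.

```python
import numpy as np


def cluster_entropy_weights(assignments, labels, n_clusters, n_classes):
    """Assumed weighting: 1 - normalized entropy of script labels per cluster.

    assignments: cluster index of each training patch, shape (n_patches,)
    labels:      script label of each training patch, shape (n_patches,)
    n_classes:   number of scripts, must be at least 2
    """
    # Count how many patches of each script fall into each cluster.
    counts = np.zeros((n_clusters, n_classes))
    np.add.at(counts, (assignments, labels), 1.0)

    totals = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

    # Shannon entropy per cluster, treating 0 * log(0) as 0.
    log_probs = np.log(probs, out=np.zeros_like(probs), where=probs > 0)
    entropy = -(probs * log_probs).sum(axis=1)

    # Pure clusters get weight 1, uniformly mixed clusters get weight 0.
    weights = 1.0 - entropy / np.log(n_classes)
    weights[totals[:, 0] == 0] = 0.0  # empty clusters carry no evidence
    return weights


def weighted_bow(patch_features, codebook, weights):
    """Build an L1-normalized, entropy-weighted histogram for one image."""
    # Assign each patch to its nearest codeword (squared Euclidean distance).
    diffs = patch_features[:, None, :] - codebook[None, :, :]
    nearest = (diffs ** 2).sum(axis=2).argmin(axis=1)

    hist = np.bincount(nearest, weights=weights[nearest], minlength=len(codebook))
    total = hist.sum()
    return hist / total if total > 0 else hist


if __name__ == "__main__":
    # Toy data: cluster 0 is pure, cluster 1 is evenly mixed, cluster 2 is empty.
    assignments = np.array([0, 0, 0, 1, 1, 1, 1])
    labels = np.array([0, 0, 0, 0, 1, 0, 1])
    w = cluster_entropy_weights(assignments, labels, n_clusters=3, n_classes=2)

    codebook = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])
    patches = np.array([[0.1, 0.0], [9.0, 10.0], [0.0, 0.2]])
    print(w, weighted_bow(patches, codebook, w))
```

Under this assumed scheme, patches that land in script-specific clusters dominate the histogram, while patches in clusters shared evenly by all scripts contribute little or nothing. The resulting vector would then be passed to any standard classifier.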
Type
Publication
Japanese Society for Artificial Intelligence Annual Conference (JSAI 2017)
publications_non-peer-review
Authors
Jan Zdenek
Research Scientist
Jan is a research scientist at CyberAgent, where he works on artificial intelligence and computer vision with a focus on image generation and editing. He received his PhD in Information Science and Technology from the University of Tokyo, where his research centered on image generation. Prior to that, he received his Master’s degree in Creative Informatics from the University of Tokyo, and his Bachelor’s degree in Computer and Information Science from the Czech Technical University in Prague. Born and raised in the Czech Republic, he currently works in Japan.