Script Identification Using Bag-of-Words with Entropy-weighted Patches
May 24, 2017
Jan Zdenek
Hideki Nakayama
Abstract
The increasing interest in scene text reading in multilingual environments raises the need to recognize and distinguish between different writing systems. In this paper, we propose a novel method for script identification that uses convolutional features in the traditional bag-of-words model, combined with weighting by intra-cluster information entropy. This approach exploits the expressive representations of convolutional neural networks, which have displayed outstanding performance in many text analysis and recognition tasks in recent years, the discriminative power of script-characteristic features, and the generalization ability of the bag-of-words model. The proposed method is evaluated on two public benchmark datasets for script identification. The experiments demonstrate that our method outperforms the baseline and yields competitive results.
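The abstract does not spell out the weighting formula, so the following is only a minimal sketch of one plausible reading of "weighting by intra-cluster information entropy". It assumes each training patch has already been assigned to a visual-word cluster and carries a script label, and it assumes the weight of a cluster is one minus its normalized label entropy. The function names, the normalization, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def cluster_entropy_weights(assignments, labels, num_classes):
    """Assumed weighting: 1 - H / log(num_classes) per cluster.

    H is the Shannon entropy of the script labels of the training patches
    that fall into the cluster. Pure clusters get weight 1, clusters with
    uniformly mixed scripts get weight 0. Requires num_classes >= 2.
    """
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(assignments, labels):
        per_cluster[cluster][label] += 1

    weights = {}
    for cluster, counts in per_cluster.items():
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
        weights[cluster] = 1.0 - entropy / math.log(num_classes)
    return weights


def weighted_histogram(patch_clusters, weights, num_clusters):
    """Bag-of-words histogram in which each patch votes with its cluster weight."""
    hist = [0.0] * num_clusters
    for cluster in patch_clusters:
        hist[cluster] += weights.get(cluster, 0.0)
    norm = sum(hist)
    # L1-normalize; leave an all-zero histogram unchanged.
    return [h / norm for h in hist] if norm > 0 else hist


# Toy data (hypothetical): cluster 0 holds only Latin patches,
# cluster 1 is an even mix of Latin and Arabic patches.
train_clusters = [0, 0, 0, 1, 1, 1, 1]
train_labels = ["latin", "latin", "latin", "latin", "arabic", "latin", "arabic"]
weights = cluster_entropy_weights(train_clusters, train_labels, num_classes=2)
hist = weighted_histogram([0, 0, 1], weights, num_clusters=2)
```

Under this assumed scheme, patches that land in script-specific clusters dominate the image descriptor, while patches in clusters shared across scripts contribute little.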
Publication
Japanese Society for Artificial Intelligence Annual Conference (JSAI 2017)

Authors
Research Scientist
Jan is a research scientist at CyberAgent, where he works on artificial intelligence and computer vision
with a focus on image generation and editing. He received his PhD in Information Science and Technology
from the University of Tokyo, where his research centered on image generation. Prior to that, he received
his Master’s degree in Creative Informatics from the University of Tokyo, and his Bachelor’s degree
in Computer and Information Science from the Czech Technical University in Prague.
Born and raised in the Czech Republic, he currently works in Japan.