Script Identification Using Bag-of-Words with Entropy-weighted Patches

May 24, 2017

Jan Zdenek

Hideki Nakayama


Abstract

The increasing interest in scene text reading in multilingual environments raises the need to recognize and distinguish between different writing systems. In this paper, we propose a novel method for script identification that uses convolutional features within the traditional bag-of-words model, combined with weighting based on intra-cluster information entropy. This approach exploits the expressive representations of convolutional neural networks, which have displayed outstanding performance in many text analysis and recognition tasks in recent years, the discriminative power of script-characteristic features, and the generalization ability of the bag-of-words model. The proposed method is evaluated on two public benchmark datasets for script identification. The experiments demonstrate that our method outperforms the baseline and yields competitive results.
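The abstract does not spell out the exact formulation, so the following is only a minimal sketch of what entropy-weighted bag-of-words could look like, not the authors' implementation. It assumes that patch features (for example, CNN activations) are already extracted as NumPy arrays, that a codebook of cluster centers has been learned beforehand (for instance with k-means), and that each codeword is weighted by one minus the normalized entropy of the script labels of the training patches assigned to it. The function names and the specific weighting formula are hypothetical choices made for illustration.

```python
import numpy as np


def assign_to_codewords(features, codebook):
    """Return the index of the nearest codeword for each patch feature."""
    # Squared Euclidean distance between every patch and every codeword.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def cluster_entropy_weights(assignments, labels, n_clusters, n_classes):
    """Weight each codeword by 1 - normalized entropy of its script labels.

    Assumption: a cluster whose patches come from a single script gets
    weight 1, and a cluster shared evenly by all scripts gets weight 0.
    Requires n_classes >= 2.
    """
    counts = np.zeros((n_clusters, n_classes))
    np.add.at(counts, (assignments, labels), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    # Empty clusters fall back to a uniform distribution (weight 0).
    probs = np.divide(
        counts,
        totals,
        out=np.full_like(counts, 1.0 / n_classes),
        where=totals > 0,
    )
    with np.errstate(divide="ignore", invalid="ignore"):
        log_probs = np.where(probs > 0, np.log(probs), 0.0)
    entropy = -(probs * log_probs).sum(axis=1)
    return 1.0 - entropy / np.log(n_classes)


def weighted_histogram(assignments, weights):
    """Build an L1-normalized bag-of-words histogram scaled by codeword weights."""
    hist = np.bincount(assignments, minlength=len(weights)).astype(float)
    hist *= weights
    total = hist.sum()
    return hist / total if total > 0 else hist
```

In this sketch, the weights would be computed once from labeled training patches, and each test image would then be described by `weighted_histogram(assign_to_codewords(patch_features, codebook), weights)` before being passed to a classifier of choice.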

Type

Conference paper

Publication

Japanese Society for Artificial Intelligence Annual Conference (JSAI 2017)

Last updated on May 24, 2017

Non-Peer-Reviewed

Authors

Jan Zdenek

Research Scientist

Jan is a research scientist at CyberAgent, where he works on artificial intelligence and computer vision with a focus on image generation and editing. He received his PhD in Information Science and Technology from the University of Tokyo, where his research centered on image generation. Prior to that, he received his Master’s degree in Creative Informatics from the University of Tokyo, and his Bachelor’s degree in Computer and Information Science from the Czech Technical University in Prague. Born and raised in the Czech Republic, he currently works in Japan.

Hideki Nakayama
