Bag of Local Convolutional Triplets for Script Identification in Scene Text
Abstract
The increasing interest in scene text reading in multilingual environments raises the need to recognize and distinguish between different writing systems. In this paper, we propose a novel method for script identification in scene text using triplets of local convolutional features in combination with the traditional bag-of-visual-words model. Feature triplets are formed by combining descriptors that a convolutional neural network extracts from local patches of the input images. This approach allows us to generate a more descriptive codeword dictionary for the bag-of-visual-words model, because the low discriminative power of a weak descriptor is compensated by the other descriptors in its triplet. The proposed method is evaluated on two public benchmark datasets for scene text script identification and a public dataset for script identification in video captions. The experiments demonstrate that our method outperforms the baseline and yields competitive results on all three datasets.
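The abstract describes the pipeline only at a high level, so the following is a minimal NumPy sketch of the general idea, not the authors' implementation. Several assumptions are made for illustration. Random vectors stand in for the CNN patch descriptors. A triplet is taken to be three horizontally adjacent patches whose descriptors are concatenated; the paper's actual triplet construction may differ. The codebook is built with plain k-means. The function names `make_triplets`, `kmeans` and `bovw_histogram` are hypothetical.

```python
import numpy as np


def make_triplets(descriptors: np.ndarray) -> np.ndarray:
    """Concatenate each run of three neighbouring patch descriptors.

    Hypothetical assumption: rows are patch descriptors ordered left to right
    along the text line, and a triplet is three consecutive patches.
    Input shape (n, d) -> output shape (n - 2, 3 * d).
    """
    if descriptors.shape[0] < 3:
        raise ValueError("need at least three local descriptors")
    return np.hstack([descriptors[:-2], descriptors[1:-1], descriptors[2:]])


def assign(x: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest codeword (squared Euclidean distance) for each row."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means used here to build the codeword dictionary."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def bovw_histogram(triplets: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """L1-normalised histogram of codeword occurrences for one image."""
    labels = assign(triplets, centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Stand-in for CNN patch descriptors: 3 images, 10 patches each, 8-dim.
    train = [rng.normal(size=(10, 8)) for _ in range(3)]
    all_triplets = np.vstack([make_triplets(d) for d in train])  # shape (24, 24)
    codebook = kmeans(all_triplets, k=5)
    h = bovw_histogram(make_triplets(train[0]), codebook)
    print(h.shape, round(h.sum(), 6))  # (5,) 1.0
```

The resulting per-image histogram would then be passed to a classifier to predict the script; that stage is not shown. Concatenating descriptors into triplets is one simple way to let neighbouring patches contribute jointly to each codeword, which reflects the motivation stated in the abstract.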
Publication: The 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017)

Authors
Research Scientist
Jan is a research scientist at CyberAgent, where he works on artificial intelligence and computer vision
with a focus on image generation and editing. He received his PhD in Information Science and Technology
from the University of Tokyo, where his research centered on image generation. Prior to that, he received
his Master’s degree in Creative Informatics from the University of Tokyo, and his Bachelor’s degree
in Computer and Information Science from the Czech Technical University in Prague.
Born and raised in the Czech Republic, he currently works in Japan.