Bag of Local Convolutional Triplets for Script Identification in Scene Text

Abstract

The increasing interest in scene text reading in multilingual environments raises the need to recognize and distinguish between different writing systems. In this paper, we propose a novel method for script identification in scene text using triplets of local convolutional features in combination with the traditional bag-of-visual-words model. Feature triplets are created by making combinations of descriptors extracted from local patches of the input images using a convolutional neural network. This approach allows us to generate a more descriptive codeword dictionary for the bag-of-visual-words model, as the low discriminative power of weak descriptors is enhanced by other descriptors in a triplet. The proposed method is evaluated on two public benchmark datasets for scene text script identification and a public dataset for script identification in video captions. The experiments demonstrate that our method outperforms the baseline and yields competitive results on all three datasets.

Publication
The 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017).