Recent Trends and Techniques in Text Detection and Text Localization in a Natural Scene: A Survey

Vijay Prasad, Pranab Das


Text information extraction from natural scene images is a growing area of research. Because text in natural scene images generally carries valuable information, detecting and recognizing scene text is considered essential for a variety of advanced computer vision applications. Considerable effort has gone into extracting text regions from scene images effectively and reliably. Since most text recognition applications demand robust algorithms for detecting and localizing text in a given scene image, researchers focus mainly on two important stages: text detection and text localization. This paper reviews various techniques for text detection and text localization.








The ADBU Journal of Engineering Technology (AJET), ISSN: 2348-7305

This journal is published under the terms of the Creative Commons Attribution (CC-BY) license.
