Character Segmentation Technique for Printed and Handwritten Devanagari Script without Extraction of Shirorekha

Ambadas Balu Shinde, Yogesh Hari Dandawate

Abstract


In India, a large body of literature is available in the Devanagari script, and Devanagari is widely used for correspondence and documentation. The accuracy of character segmentation of Devanagari text largely determines the accuracy of the whole OCR process. In this paper, we propose a character segmentation technique that retains the Shirorekha (the horizontal header line joining the characters of a word) and works for both printed and handwritten Marathi text. We also discuss the methods used to pre-process the document images, such as binarization and skew detection and correction. Characters are separated by taking the vertical projection of each segmented word and comparing the pixel count of each column with an automatically calculated threshold. With the proposed technique, we achieved 100% accuracy in line and word segmentation, and the character segmentation results are encouraging. No standard dataset of Marathi characters with upper and lower modifiers is available, and this lack of standard datasets with modifiers is a significant obstacle to using deep learning networks for recognizing Devanagari characters. The segmented characters, which retain the Shirorekha, can be used directly to develop a deep learning based OCR system.
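
The abstract names binarization and skew detection and correction as pre-processing steps but does not say which methods are used. As a hedged illustration only, the sketch below swaps in two standard techniques: Otsu's global threshold for binarization, and a brute-force projection-profile search for the skew angle, followed by a small-angle vertical shear for correction (an approximation of rotation that is reasonable for skews of a few degrees). All function names and parameters (`max_angle`, `step`) are hypothetical, and the code assumes an 8-bit grayscale image with dark ink on light paper.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's global threshold for an 8-bit grayscale image (values 0..255)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                      # weight of the class "<= t"
    mu = np.cumsum(prob * np.arange(prob.size))  # cumulative mean of that class
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance; set to 0 where one of the two classes is empty.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Dark ink -> 1, light paper -> 0."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)


def estimate_skew(binary: np.ndarray, max_angle: float = 5.0, step: float = 0.1) -> float:
    """Return the angle (degrees) whose row projection of ink pixels is sharpest.

    Positive angles mean text lines drift toward larger row indices from left to right.
    """
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return 0.0  # blank image: nothing to measure
    n = int(round(2 * max_angle / step)) + 1
    best_angle, best_score = 0.0, -1.0
    for angle in np.linspace(-max_angle, max_angle, n):
        theta = np.deg2rad(angle)
        # Row coordinate of every ink pixel after undoing a skew of `angle`.
        r = ys * np.cos(theta) - xs * np.sin(theta)
        profile = np.bincount(np.round(r - r.min()).astype(int))
        # Aligned text lines concentrate ink in few rows -> large sum of squares.
        score = float(np.sum(profile.astype(float) ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle


def deskew_by_shear(binary: np.ndarray, angle: float) -> np.ndarray:
    """Small-angle correction: shift column x up by round(x * tan(angle)).

    This is a vertical shear, not a true rotation; the output is padded so no ink is lost.
    """
    h, w = binary.shape
    shifts = np.round(np.arange(w) * np.tan(np.deg2rad(angle))).astype(int)
    pad = int(np.abs(shifts).max())
    out = np.zeros((h + 2 * pad, w), dtype=binary.dtype)
    for x in range(w):
        top = pad - shifts[x]
        out[top:top + h, x] = binary[:, x]
    return out
```

A page would be processed as `binary = binarize(gray)` followed by `deskew_by_shear(binary, estimate_skew(binary))`. The exhaustive search costs one histogram per candidate angle, which is acceptable for a ±5° window at 0.1° resolution; Hough or Radon transform based estimators are common faster alternatives.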

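The projection-based segmentation summarized in the abstract can be sketched as follows, assuming a binarized, deskewed image with ink = 1. Lines come from the horizontal projection and words from the vertical projection of each line. For characters, the abstract states only that the column counts of a word's vertical projection are compared with an automatically calculated threshold; the rule used here is a hypothetical stand-in, not the authors' method: rows holding at least 80% of the fullest row's ink are taken to be the Shirorekha, and its thickness becomes the threshold, so a column is a cut point when it contains no more ink than the header line alone. The names `min_gap`, `ratio`, and all helper functions are hypothetical.

```python
import numpy as np


def runs(mask: np.ndarray) -> list:
    """(start, end) index pairs, end exclusive, of consecutive True entries."""
    padded = np.concatenate(([False], mask.astype(bool), [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))


def segment_lines(page: np.ndarray) -> list:
    """Horizontal projection: a text line is a run of rows that contain ink."""
    return runs(page.sum(axis=1) > 0)


def segment_words(line: np.ndarray, min_gap: int = 3) -> list:
    """Vertical projection of one line; blank gaps narrower than min_gap are bridged."""
    words = []
    for start, end in runs(line.sum(axis=0) > 0):
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)  # gap too narrow: same word
        else:
            words.append((start, end))
    return words


def shirorekha_thickness(word: np.ndarray, ratio: float = 0.8) -> int:
    """Hypothetical automatic threshold: rows whose ink is within `ratio` of the fullest row."""
    rows = word.sum(axis=1)
    if rows.max() == 0:
        return 0
    return int(np.count_nonzero(rows >= ratio * rows.max()))


def segment_characters(word: np.ndarray) -> list:
    """Cut a word wherever a column holds no more ink than the Shirorekha alone.

    The header line is not removed: each slice word[:, s:e] keeps its part of it.
    """
    threshold = shirorekha_thickness(word)
    return runs(word.sum(axis=0) > threshold)
```

Because the cut rule only looks at column counts, it would likely over-segment a character that has a column containing nothing but the header line, and it cannot separate touching handwritten characters; these are limitations of this sketch, not claims about the paper's results.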

The ADBU Journal of Engineering Technology (AJET), ISSN: 2348-7305

This journal is published under the terms of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/).
