An In-depth Study on POS Tagging for Assamese Language

Dr. Karabi Kherkatary Boro, Uzzal Sharma

Abstract


An automatic part-of-speech (POS) tagger is an essential component of most Natural Language Processing (NLP) work, and tagging is one of the important steps in processing natural language. POS tagging poses various challenges, most of which are language-dependent. Assamese is a morphologically rich language with free word order, which makes these challenges more severe. This paper first discusses the basic concept of a POS tagger and its importance in NLP. It then briefly describes the overall characteristics of the Assamese language and the challenges they may raise for POS tagging. Finally, it discusses the POS tagging techniques commonly used for Assamese.




------------------------------------------------------------------------------------------------------------------------

The ADBU Journal of Engineering Technology (AJET), ISSN: 2348-7305

This journal is published under the terms of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/).