Sentiment Analysis of Assamese Text Reviews: Supervised Machine Learning Approach with Combined n-gram and TF-IDF Feature

Chandana Dev, Amrita Ganguly

Abstract


Sentiment analysis (SA) is a challenging application of natural language processing (NLP) for many Indian languages, yet research on sentiment categorization of Assamese text remains limited. This paper investigates sentiment categorization of Assamese textual data using a dataset created by translating Bengali resources into Assamese with Google Translate. The study applies several supervised machine learning (ML) methods, namely Decision Tree, K-Nearest Neighbour, Multinomial Naive Bayes, Logistic Regression, and Support Vector Machine, combined with n-gram and Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction. The experimental results show that Multinomial Naive Bayes and Support Vector Machine achieve over 80% accuracy in classifying sentiment in Assamese text, and that the unigram model outperforms higher-order n-gram models on both datasets. The proposed model is shown to be an effective tool for domain-independent sentiment classification of Assamese text.
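The pipeline summarized above (n-gram features weighted by TF-IDF, then fed to a supervised classifier) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The whitespace tokenizer, the smoothed IDF formula log((1 + N) / (1 + df)) + 1, the use of Multinomial Naive Bayes with TF-IDF weights as fractional counts, the helper names (`ngrams`, `fit_idf`, `tfidf`, `train_mnb`, `predict`), and the toy English training sentences are all assumptions chosen for illustration; the same functions would apply unchanged to whitespace-separated Assamese tokens.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Word n-grams of exactly order n, using simple whitespace tokenization (assumed)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fit_idf(docs, n):
    """Smoothed IDF per n-gram: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    df = Counter()
    for d in docs:
        df.update(set(ngrams(d, n)))
    N = len(docs)
    return {t: math.log((1 + N) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf, n):
    """TF-IDF vector as a dict. TF is the n-gram count divided by the
    document's total n-gram count; n-grams unseen in training are dropped."""
    counts = Counter(ngrams(doc, n))
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def train_mnb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing, treating TF-IDF
    weights as fractional term counts."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    mass = {c: defaultdict(float) for c in classes}
    vocab = set()
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            mass[y][t] += w
            vocab.add(t)
    log_lik = {}
    for c in classes:
        denom = sum(mass[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((mass[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik


def predict(model, vec):
    """Return the class with the highest log-posterior score."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(w * log_lik[c][t] for t, w in vec.items() if t in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy data for illustration only (not from the paper's dataset).
train_docs = ["good movie", "very good story", "bad movie", "very bad acting"]
train_labels = ["pos", "pos", "neg", "neg"]

n = 1  # unigram features; set n = 2 or 3 to try higher-order n-grams
idf = fit_idf(train_docs, n)
model = train_mnb([tfidf(d, idf, n) for d in train_docs], train_labels)
print(predict(model, tfidf("good story", idf, n)))  # pos
```

Changing `n` switches between unigram, bigram, and trigram features, which is the kind of comparison the abstract reports. On a small corpus, higher-order n-grams are much sparser, which is one plausible reason unigrams can perform better.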

Keywords


Assamese; Machine Learning; n-gram; NLP; Sentiment Analysis; TF-IDF.



Copyright (c) 2023 Chandana Dev, Amrita Ganguly

This work is licensed under a Creative Commons Attribution 4.0 International License.



ADBU Journal of Electrical and Electronics Engineering (AJEEE), ISSN: 2582-0257, is an international, peer-reviewed, open-access online journal in the English language that publishes scientific articles contributing novel experimental and theoretical work in all areas of Electrical and Electronics Engineering and its applications.
