Fuzzy document classification using ontology based approach for term weighting

Aijazahamed Qazi

Abstract


With the rapid growth of web corpora, document classification has become a vital issue in information retrieval. Term weighting increases classification accuracy for documents represented in the vector space model. This paper proposes ontoTf-idf, a term weighting method based on the semantic similarity between a term and the group label. The performance of the traditional Term Frequency-Inverse Document Frequency (Tf-idf) method and the ontoTf-idf method is compared on the WebKB and Reuters-21578 benchmark datasets. The efficiency of the ontoTf-idf method is validated with the kNN (k-nearest neighbor) and Fuzzy kNN classifiers on both datasets. In these experiments, the proposed ontoTf-idf method outperforms the Tf-idf method. Distance and similarity measures such as Euclidean distance, cosine similarity, Manhattan distance, and the Jaccard coefficient are applied with the Fuzzy kNN classifier on both datasets.
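To make the pipeline described in the abstract concrete, the following is a minimal Python sketch of its building blocks: Tf-idf weighting, four distance measures, and a fuzzy kNN rule. It is not the authors' implementation. The abstract does not give the ontoTf-idf formula, so the optional `sim` argument (a per-term score multiplied into the Tf-idf weight) is purely a hypothetical stand-in for a term-to-label semantic similarity. The weighted form of the Jaccard measure, the fuzzifier `m`, the crisp training memberships, and all function names and toy data are likewise assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs, sim=None):
    """Return (vocab, vectors) for a list of token lists.
    sim: optional {term: score}; a HYPOTHETICAL term-to-label similarity
    multiplied into each weight (the source does not specify this formula)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    vocab = sorted(df)
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = []
        for t in vocab:
            w = (tf[t] / len(d)) * math.log(n / df[t])
            if sim is not None:
                w *= sim.get(t, 1.0)  # terms without a score are left unchanged
            vec.append(w)
        vectors.append(vec)
    return vocab, vectors

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; zero vectors are treated as maximally distant
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def jaccard_distance(a, b):
    # Assumption: weighted Jaccard on non-negative vectors (sum of min / sum of max)
    den = sum(max(x, y) for x, y in zip(a, b))
    if den == 0:
        return 0.0
    return 1.0 - sum(min(x, y) for x, y in zip(a, b)) / den

def fuzzy_knn(train_x, train_y, x, k=3, m=2.0, dist=euclidean, eps=1e-9):
    """Class memberships for x from its k nearest neighbors, weighted by
    inverse distance raised to 2/(m-1); training labels are treated as crisp."""
    neighbors = sorted(
        ((dist(v, x), y) for v, y in zip(train_x, train_y)),
        key=lambda p: p[0],
    )[:k]
    weights = [((d + eps) ** (-2.0 / (m - 1.0)), y) for d, y in neighbors]
    total = sum(w for w, _ in weights)
    memberships = {}
    for w, y in weights:
        memberships[y] = memberships.get(y, 0.0) + w / total
    return memberships

# Toy usage: weight the training documents and the query in one vocabulary.
docs = [["course", "exam", "lecture"], ["course", "homework"],
        ["faculty", "research", "lab"], ["research", "paper"]]
labels = ["course", "course", "faculty", "faculty"]
query = ["research", "lab", "exam"]
_, vecs = tf_idf(docs + [query])
print(fuzzy_knn(vecs[:-1], labels, vecs[-1], k=3, dist=cosine_distance))
```

Unlike plain kNN, which returns only the majority label, the fuzzy rule returns a membership degree per class that sums to 1, so borderline documents are visible as such. Swapping the `dist` argument is enough to compare the distance measures on the same weighted vectors.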






------------------------------------------------------------------------------------------------------------------------

The ADBU Journal of Engineering Technology (AJET), ISSN: 2348-7305

This journal is published under the terms of the Creative Commons Attribution (CC-BY) (http://creativecommons.org/licenses/)
