Park, Kwangil and Hong, June Seok and Kim, Wooju (2020) A Methodology Combining Cosine Similarity with Classifier for Text Classification. Applied Artificial Intelligence, 34 (5). pp. 396-411. ISSN 0883-9514
A Methodology Combining Cosine Similarity with Classifier for Text Classification.pdf - Published Version
Abstract
Text Classification has received significant attention in recent years because of the proliferation of digital documents and is widely used in various applications such as filtering and recommendation. Consequently, many approaches, including those based on statistical theory, machine learning, and classifier performance improvement, have been proposed for improving text classification performance. Among these approaches, centroid-based classifier, multinomial naïve bayesian (MNB), support vector machines (SVM), convolutional neural network (CNN) are commonly used. In this paper, we introduce a cosine similarity-based methodology for improving performance. The methodology combines cosine similarity (between a test document and fixed categories) with conventional classifiers such as MNB, SVM, and CNN to improve the accuracy of the classifiers, and then we call the conventional classifiers with cosine similarity as enhanced classifiers. We applied the enhanced classifiers to famous datasets – 20NG, R8, R52, Cade12, and WebKB – and evaluated the performance of the enhanced classifiers in terms of the confusion matrix’s accuracy; we obtained outstanding results in that the enhanced classifiers show significant increases in accuracy. Moreover, through experiments, we identified which of two considered knowledge representation techniques (word count and term frequency-inverse document frequency (TFIDF)) is more suitable in terms of classifier performance.
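The combination the abstract describes can be illustrated with a minimal sketch: build a centroid (mean term-count vector) per category, compute the cosine similarity between a test document and each centroid, and blend that score with a base classifier's category probabilities. The weighting parameter `alpha` and the externally supplied probabilities `clf_probs` are illustrative assumptions, not the paper's exact combination rule.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    # cosine of the angle between two sparse term-count vectors
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def category_centroids(docs_by_cat):
    # mean term-count vector per category (word-count representation)
    centroids = {}
    for cat, docs in docs_by_cat.items():
        total = Counter()
        for doc in docs:
            total.update(doc.split())
        centroids[cat] = {t: c / len(docs) for t, c in total.items()}
    return centroids

def enhanced_scores(doc, centroids, clf_probs, alpha=0.5):
    # blend base-classifier probabilities with cosine similarity;
    # alpha and the linear blend are assumptions for illustration
    vec = Counter(doc.split())
    return {
        cat: alpha * clf_probs.get(cat, 0.0)
             + (1 - alpha) * cosine_similarity(vec, centroid)
        for cat, centroid in centroids.items()
    }

# toy usage with a hypothetical base classifier's output
train = {
    "sports": ["ball game team", "team win game"],
    "tech": ["code software bug", "software release code"],
}
cents = category_centroids(train)
scores = enhanced_scores("team game score", cents, {"sports": 0.6, "tech": 0.4})
best = max(scores, key=scores.get)
```

In this toy run the cosine term reinforces the classifier's preference, so `best` is `"sports"`; in the paper's setting the same idea is applied on top of MNB, SVM, and CNN outputs.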
| Item Type | Article |
|---|---|
| Subjects | East Asian Archive > Computer Science |
| Depositing User | Unnamed user with email support@eastasianarchive.com |
| Date Deposited | 19 Jun 2023 09:39 |
| Last Modified | 20 Jul 2024 09:49 |
| URI | http://library.eprintdigipress.com/id/eprint/1086 |