A Methodology Combining Cosine Similarity with Classifier for Text Classification

Park, Kwangil and Hong, June Seok and Kim, Wooju (2020) A Methodology Combining Cosine Similarity with Classifier for Text Classification. Applied Artificial Intelligence, 34 (5). pp. 396-411. ISSN 0883-9514


Abstract

Text classification has received significant attention in recent years because of the proliferation of digital documents, and it is widely used in applications such as filtering and recommendation. Consequently, many approaches, including those based on statistical theory, machine learning, and classifier performance improvement, have been proposed for improving text classification performance. Among these approaches, the centroid-based classifier, multinomial naïve Bayes (MNB), support vector machines (SVM), and convolutional neural networks (CNN) are commonly used. In this paper, we introduce a cosine similarity-based methodology for improving performance. The methodology combines cosine similarity (between a test document and fixed categories) with conventional classifiers such as MNB, SVM, and CNN to improve their accuracy; we refer to the conventional classifiers combined with cosine similarity as enhanced classifiers. We applied the enhanced classifiers to well-known benchmark datasets (20NG, R8, R52, Cade12, and WebKB) and evaluated them by accuracy computed from the confusion matrix; the enhanced classifiers showed significant increases in accuracy. Moreover, through experiments, we identified which of the two knowledge representation techniques considered, word count and term frequency-inverse document frequency (TFIDF), is more suitable in terms of classifier performance.
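To make the idea in the abstract concrete, the following is a minimal sketch of combining a cosine similarity score (between a test document and each category) with the output of a conventional classifier. The abstract does not state the exact combination rule, so everything below is an assumption for illustration: the function names (`word_counts`, `cosine`, `category_vectors`, `enhanced_scores`) are hypothetical, each category is represented by the summed word counts of its training documents, and the classifier probability is simply multiplied by the cosine similarity and renormalized. The classifier output `clf_probs` is a hand-written stand-in for what an MNB, SVM, or CNN model might produce.

```python
import math
from collections import Counter


def word_counts(text):
    """Word-count representation of a document (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def category_vectors(train_docs):
    """ASSUMPTION: one vector per category, the sum of its documents' word counts."""
    cats = {}
    for text, label in train_docs:
        cats.setdefault(label, Counter()).update(word_counts(text))
    return cats


def enhanced_scores(text, cats, clf_probs):
    """ASSUMPTION: multiply each classifier probability by the document-category
    cosine similarity, then renormalize. This is an illustrative rule, not
    necessarily the one used in the paper."""
    doc = word_counts(text)
    raw = {c: clf_probs.get(c, 0.0) * cosine(doc, vec) for c, vec in cats.items()}
    total = sum(raw.values())
    return {c: (s / total if total else 0.0) for c, s in raw.items()}


# Toy data, invented for this example only.
train = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes", "ham"),
]
cats = category_vectors(train)

# Hypothetical output of a conventional classifier that cannot decide.
clf_probs = {"spam": 0.5, "ham": 0.5}

scores = enhanced_scores("buy cheap pills", cats, clf_probs)
pred = max(scores, key=scores.get)
print(pred, scores)
```

In this toy case the stand-in classifier is undecided, and the cosine similarity to the category vectors breaks the tie. Swapping `word_counts` for a TFIDF weighting would be the natural way to compare the two representations mentioned in the abstract.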

Item Type: Article
Subjects: East Asian Archive > Computer Science
Depositing User: Unnamed user with email support@eastasianarchive.com
Date Deposited: 19 Jun 2023 09:39
Last Modified: 20 Jul 2024 09:49
URI: http://library.eprintdigipress.com/id/eprint/1086
