A comparative study of machine learning algorithms and the prompting approach using GPT-3.5 turbo for question categorization
Abstract
The study focuses on text categorization, comparing the effectiveness of traditional Machine Learning (ML) models with Large Language Models (LLMs) such as GPT-3.5 turbo. The literature review traces the historical progress of text categorization from early ML algorithms to LLMs, which determine contextual features automatically and thereby simplify the process. The goal of the research is to evaluate whether LLMs used with a prompt-based approach can outperform traditional ML models in text categorization. A dataset of 55,235 questions in nine categories was used, and categorization effectiveness was measured by the F1 score. Traditional ML models such as Logistic Regression and Random Forest were used for categorization, while the curie, davinci, and GPT-3.5 turbo models were used for LLM-based categorization. The study found that traditional ML models provided better categorization (F1 score of 88%), whereas LLMs, particularly GPT-3.5 turbo, offered competitive but inferior results without prior training (F1 score of 72%). The discussion highlights the advantages of LLMs, such as their suitability for scenarios with no historical training data and their ease of use. It also cites disadvantages, such as higher costs for large data volumes and potential instability of API operation. In conclusion, the study recommends LLMs for certain applications, such as new applications or those with limited categorization needs, while traditional ML models remain more suitable for scenarios that require high accuracy or involve processing sensitive data.
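Both approaches in the study are compared by F1 score over nine categories. The abstract does not state the averaging scheme, so the sketch below assumes a macro-averaged F1 (per-class F1, then an unweighted mean over classes). The helper `normalize_label` and the example labels are hypothetical: they only illustrate how free-text LLM answers might be mapped onto a fixed category set before scoring, and are not the authors' method.

```python
def normalize_label(raw, categories):
    """Hypothetical helper: map a free-text model answer onto a known category.

    Strips whitespace and trailing periods and lowercases the answer;
    anything not in `categories` becomes "unknown".
    """
    cleaned = raw.strip().strip(".").lower()
    return cleaned if cleaned in categories else "unknown"


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 (assumption: the paper's averaging may differ).

    Labels are the union of true and predicted labels, so off-list
    predictions such as "unknown" count as their own class and lower the score.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one example is required")

    # Per-class true positives, false positives and false negatives.
    tp, fp, fn = {}, {}, {}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] = tp.get(t, 0) + 1
        else:
            fp[p] = fp.get(p, 0) + 1
            fn[t] = fn.get(t, 0) + 1

    f1_scores = []
    for label in labels:
        t_pos = tp.get(label, 0)
        p_denom = t_pos + fp.get(label, 0)
        r_denom = t_pos + fn.get(label, 0)
        precision = t_pos / p_denom if p_denom else 0.0
        recall = t_pos / r_denom if r_denom else 0.0
        if precision + recall == 0:
            f1_scores.append(0.0)
        else:
            f1_scores.append(2 * precision * recall / (precision + recall))

    # Unweighted mean: every class counts equally, regardless of its size.
    return sum(f1_scores) / len(f1_scores)
```

For example, with true labels `["billing", "billing", "tech", "tech"]` and predictions `["billing", "tech", "tech", "tech"]`, the per-class F1 values are 2/3 and 4/5, giving a macro F1 of 11/15 (about 0.733). Keeping predicted-only labels as classes is a deliberate choice here: an LLM answer that matches no category is penalized rather than silently dropped.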
Publication type
Text
Text publication type
Article
ISSN
2367-4512
Bibliographic citation
Mitsa, O., Voloshchuk, Y., Levchuk, O., Petsko, V.: A Comparative Study of Machine Learning Algorithms and the Prompting Approach Using GPT-3.5 Turbo for Text Categorization. Lecture Notes on Data Engineering and Communications Technologies, vol. 242, pp. 156-167 (2025).