
Development of automatic classification system for textual feedback in Russian language using machine learning

Authors: Kuznetsov T.A., Gavrilenkov S.I.
Published in issue: #5(70)/2022
DOI: 10.18698/2541-8009-2022-5-794


Category: Informatics, Computer Engineering and Control | Chapter: System Analysis, Control, and Information Processing, Statistics

Keywords: machine learning, Industry 4.0, natural language processing, textual feedback classification, automation of textual feedback processing, sentiment analysis of textual feedback, text data vectorization, naive Bayes classifier
Published: 24.06.2022

In today’s highly competitive environment, enterprises can increase their flexibility and profitability by conducting analytical studies of textual customer feedback. One of the primary objectives of these studies is to determine the sentiment class of each piece of feedback in order to understand the consumer’s overall evaluation of the product. This paper considers the task of classifying textual feedback into sentiment classes using machine learning techniques. Text data vectorization methods are studied and applied to solve this task. A comparative analysis of three sentiment classification algorithms is carried out: random forest, support vector machine, and naive Bayes classifier. The algorithm that performs best on the quality evaluation metrics of the classification model is selected. The result is a textual feedback classification subsystem that automates text mining as part of enterprise product research.
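As a rough illustration of the pipeline the abstract describes (vectorize the text, train a classifier, score it on a quality metric), the sketch below implements one of the three compared algorithms, a multinomial naive Bayes classifier, in plain Python. It assumes a simple bag-of-words vectorization (lowercased word counts) and macro-averaged F1 as the quality metric. The abstract does not say which vectorization methods or metrics the authors used, so both are assumptions. The tokenizer regex, the names `MultinomialNaiveBayes` and `macro_f1`, the class labels, and the four-review training set are hypothetical toy choices, not the authors' corpus or code.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Bag-of-words tokenizer (assumed): lowercase, keep runs of Cyrillic/Latin letters."""
    # 'ё' lies outside the а-я code-point range, so it is listed explicitly.
    return re.findall(r"[а-яёa-z]+", text.lower())


class MultinomialNaiveBayes:
    """Multinomial naive Bayes over word counts with Laplace (additive) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[str]) -> "MultinomialNaiveBayes":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        # Class prior P(c) estimated from document frequencies.
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        return self

    def _log_likelihood(self, word: str, c: str) -> float:
        # Smoothed P(word | c) = (count + alpha) / (total words in c + alpha * |V|).
        denom = self.totals[c] + self.alpha * len(self.vocab)
        return math.log((self.word_counts[c][word] + self.alpha) / denom)

    def predict_one(self, text: str) -> str:
        # Words never seen in training are ignored; repeated words count repeatedly.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)

    def predict(self, texts: list[str]) -> list[str]:
        return [self.predict_one(t) for t in texts]


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores (assumed quality metric)."""
    f1_scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy reviews; not the authors' data.
train_texts = [
    "отличный товар, очень доволен",
    "качество отличное, рекомендую",
    "ужасное качество, не рекомендую",
    "товар сломался, очень плохо",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_texts = [
    "отличный товар рекомендую",
    "ужасное качество, плохо",
    "очень доволен",
    "не рекомендую, сломался",
]
test_labels = ["positive", "negative", "positive", "negative"]

model = MultinomialNaiveBayes(alpha=1.0).fit(train_texts, train_labels)
predictions = model.predict(test_texts)
print(predictions, macro_f1(test_labels, predictions))
```

Random forest and support vector machine baselines would usually be trained on the same feature vectors with a library such as scikit-learn and compared on the same held-out metric. Naive Bayes is written out by hand here only because it fits in a few dozen lines and makes the smoothing parameter `alpha` explicit.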

