
Collecting and preparing text data for natural language processing tasks

Authors: Ladontsev A.F.
Published in issue: #6(59)/2021
DOI: 10.18698/2541-8009-2021-6-708


Category: Informatics, Computer Engineering and Control | Chapter: Automation, Control of Technological Processes, and Industrial Control

Keywords: computational linguistics, natural languages, processing, sentiment analysis, machine learning, programming language, Python
Published: 07.07.2021

The computer representation and analysis of natural language is a topical research area amid the digitalization of society. The article describes one possible approach to collecting and preparing data for training a text sentiment classifier with supervised machine learning methods. As practical material, we selected and analyzed Internet users' reviews of foreign literature together with the ratings attached to them. The result is a variable holding the review texts and a variable holding the corresponding ratings, which allows this data to be preprocessed further and used to train an automatic sentiment recognition model.
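The end state described in the abstract, one variable with review texts and one with the matching ratings, can be sketched roughly as follows. This is a minimal illustration and not the author's code: it assumes the collected reviews were exported to CSV, and the column names `review_text` and `rating`, the helper `load_reviews`, and the sample rows are all hypothetical placeholders.

```python
import csv
import io

# Hypothetical stand-in for a file exported by a scraping tool.
# Column names and rows are illustrative placeholders, not data from the study.
SAMPLE_CSV = """review_text,rating
"Great book, could not put it down.",5
"Too long and confusing.",2
"",4
"An average read.",not-a-number
"""


def load_reviews(source):
    """Return (texts, ratings), skipping rows with empty text or a non-integer rating."""
    texts, ratings = [], []
    for row in csv.DictReader(source):
        text = (row.get("review_text") or "").strip()
        raw_rating = (row.get("rating") or "").strip()
        # Keep only rows usable as labeled examples for supervised learning.
        if not text or not raw_rating.isdigit():
            continue
        texts.append(text)
        ratings.append(int(raw_rating))
    return texts, ratings


# In practice `source` would be an open file; StringIO keeps the sketch self-contained.
texts, ratings = load_reviews(io.StringIO(SAMPLE_CSV))
print(len(texts), ratings)
```

Keeping the texts and ratings in two parallel lists means the i-th rating always labels the i-th review, which is the shape most supervised classifiers expect for features and targets.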

