Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks; Future Internet; Vol. 13, iss. 1

Bibliographic details
Parent link:	Future Internet Vol. 13, iss. 1.— 2021.— [16 p.]
Corporate author:	Национальный исследовательский Томский политехнический университет, Инженерная школа информационных технологий и робототехники, Отделение автоматизации и робототехники
Other authors:	Romanov A. S. Aleksandr Sergeevich, Kurtukova A. V. Anna Vladimirovna, Shelupanov A. A. Aleksandr Aleksandrovich, Fedotova A. M. Anastasia Mikhaylovna, Goncharov V. I. Valery Ivanovich
Summary:	Title screen. The article explores approaches to determining the author of a natural language text and the advantages and disadvantages of these approaches. The problem is important because of the active digitalization of society and the shift of most everyday activities online. Authorship identification methods are particularly useful for information security and forensics. For example, such methods can be used to identify the authors of suicide notes and other texts subjected to forensic examination. Another area of application is plagiarism detection, which is relevant both to intellectual property protection in the digital space and to the educational process. The article describes identifying the author of a Russian-language text using a support vector machine (SVM) and deep neural network architectures (long short-term memory (LSTM), convolutional neural networks (CNN) with attention, Transformer). The results show that all the considered algorithms are suitable for solving the authorship identification problem, but SVM shows the best accuracy: its average accuracy reaches 96%. This is due to carefully chosen parameters and feature space, which includes statistical and semantic features (including those extracted by aspect analysis). Deep neural networks are inferior to SVM in accuracy and reach only 93%. The study also evaluates the impact of attacks on the method on the models' accuracy. Experiments show that the SVM-based methods are not robust to deliberate text anonymization. In comparison, the loss in accuracy of deep neural networks does not exceed 20%. The Transformer architecture is the most effective for anonymized texts and achieves 81% accuracy.
Language:	English
Published:	2021
Subjects:	electronic resource; works of TPU scientists; authorship; text mining; machine learning
Online access:	http://dx.doi.org/10.3390/fi13010003
Format:	Electronic resource; Chapter
KOHA link:	https://koha.lib.tpu.ru/cgi-bin/koha/opac-detail.pl?biblionumber=665057

MARC


LEADER	00000naa0a2200000 4500
001	665057
005	20250127141352.0
035			\|a (RuTPU)RU\TPU\network\36256
035			\|a RU\TPU\network\36201
090			\|a 665057
100			\|a 20210702d2021 k\|\|y0rusy50 ba
101	0		\|a eng
102			\|a CH
135			\|a drnn ---uucaa
181		0	\|a i
182		0	\|a b
200	1		\|a Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks \|f A. S. Romanov, A. V. Kurtukova, A. A. Shelupanov [et al.]
203			\|a Text \|c electronic
300			\|a Title screen
320			\|a [References: 16 tit.]
330			\|a The article explores approaches to determining the author of a natural language text and the advantages and disadvantages of these approaches. The problem is important because of the active digitalization of society and the shift of most everyday activities online. Authorship identification methods are particularly useful for information security and forensics. For example, such methods can be used to identify the authors of suicide notes and other texts subjected to forensic examination. Another area of application is plagiarism detection, which is relevant both to intellectual property protection in the digital space and to the educational process. The article describes identifying the author of a Russian-language text using a support vector machine (SVM) and deep neural network architectures (long short-term memory (LSTM), convolutional neural networks (CNN) with attention, Transformer). The results show that all the considered algorithms are suitable for solving the authorship identification problem, but SVM shows the best accuracy: its average accuracy reaches 96%. This is due to carefully chosen parameters and feature space, which includes statistical and semantic features (including those extracted by aspect analysis). Deep neural networks are inferior to SVM in accuracy and reach only 93%. The study also evaluates the impact of attacks on the method on the models' accuracy. Experiments show that the SVM-based methods are not robust to deliberate text anonymization. In comparison, the loss in accuracy of deep neural networks does not exceed 20%. The Transformer architecture is the most effective for anonymized texts and achieves 81% accuracy.
461			\|t Future Internet
463			\|t Vol. 13, iss. 1 \|v [16 p.] \|d 2021
610	1		\|a электронный ресурс
610	1		\|a труды учёных ТПУ
610	1		\|a authorship
610	1		\|a text mining
610	1		\|a machine learning
701		1	\|a Romanov \|b A. S. \|g Aleksandr Sergeevich
701		1	\|a Kurtukova \|b A. V. \|g Anna Vladimirovna
701		1	\|a Shelupanov \|b A. A. \|g Aleksandr Aleksandrovich
701		1	\|a Fedotova \|b A. M. \|g Anastasia Mikhaylovna
701		1	\|a Goncharov \|b V. I. \|c radio technician, specialist in the field of informatics and computer technology \|c Professor of Tomsk Polytechnic University, Doctor of technical sciences \|f 1937- \|g Valery Ivanovich \|3 (RuTPU)RU\TPU\pers\31330 \|9 15502
712	0	2	\|a Национальный исследовательский Томский политехнический университет \|b Инженерная школа информационных технологий и робототехники \|b Отделение автоматизации и робототехники \|3 (RuTPU)RU\TPU\col\23553
801		2	\|a RU \|b 63413507 \|c 20220701 \|g RCR
856	4		\|u http://dx.doi.org/10.3390/fi13010003
942			\|c CF