Development of the combined method of identification of near duplicates in electronic scientific works

Лізунов, Петро Петрович; Білощицький, Андрій Олександрович; Кучанський, Олександр Юрійович; Андрашко, Юрій Васильович; Білощицька, Світлана; Сербін, Олег

Please use this identifier to cite or link to this item: https://dspace.uzhnu.edu.ua/jspui/handle/lib/42600

Title:	Development of the combined method of identification of near duplicates in electronic scientific works
Authors:	Лізунов, Петро Петрович; Білощицький, Андрій Олександрович; Кучанський, Олександр Юрійович; Андрашко, Юрій Васильович; Білощицька, Світлана; Сербін, Олег
Keywords:	near-duplicate, electronic scientific paper, antiplagiarism system, locality-sensitive hashing
Date of publication:	2021
Publisher:	Eastern-European Journal of Enterprise Technologies
Citation:	Lizunov P., Biloshchytskyi A., Kuchansky A., Andrashko Y., Biloshchytska S., Serbin O. Development of the combined method of identification of near duplicates in electronic scientific works. Eastern-European Journal of Enterprise Technologies. 2021. Vol. 4/4 (112). P. 57–63. DOI: https://doi.org/10.15587/1729-4061.2021.238318
Abstract:	Methods for identifying near-duplicates in electronic scientific papers that contain content of a particular type, for example, text data, mathematical formulas, or numerical data, were described. For text data, the method of locality-sensitive hashing with computation of the Hamming distance between the elements of the indices of electronic scientific papers was formalized. If the Hamming distance exceeds a fixed numerical threshold, a scientific paper contains a near-duplicate. For numerical data, sub-sequences are formed for each scientific paper, and the proximity between papers is determined as the Euclidean distance between the vectors composed of the numbers in these sub-sequences. To compare mathematical formulas, a method that compares formulas against a sample is used, and the names of variables are compared. To identify near-duplicates in graphic information, two approaches are distinguished: finding key points in the image and applying locality-sensitive hashing to individual pixels of the image. Since scientific papers often include objects such as schemes and diagrams, their captions are examined separately using methods for comparing text information. A combined method for identifying near-duplicates in electronic scientific papers, which combines the methods for identifying near-duplicates in different types of data, was proposed. To implement the combined method, an information-analytical system that processes scientific materials according to their content type was devised. This makes it possible to identify near-duplicates with high quality and to detect, as broadly as possible, potential abuses and plagiarism in electronic scientific papers: scientific articles, dissertations, monographs, conference materials, etc.
Type:	Text
Publication type:	Article
URI (Uniform Resource Identifier):	https://dspace.uzhnu.edu.ua/jspui/handle/lib/42600
ISSN:	1729-3774
Appears in collections:	Scientific publications of the Department of System Analysis and Optimization Theory
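The text and numerical comparisons described in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: SimHash (one well-known locality-sensitive hashing scheme for text) is swapped in for the paper's LSH variant, tokens are lower-cased words hashed with BLAKE2b, and the numerical "sub-sequences" are assumed to be sliding windows of k consecutive numbers, with proximity taken as the smallest Euclidean distance between any pair of windows. The names `simhash`, `hamming`, `extract_numbers`, `min_window_distance` and the parameter `k` are hypothetical.

```python
import hashlib
import math
import re

BITS = 64  # fingerprint width; matches the 8-byte BLAKE2b digest below

# Numbers not preceded by a letter, digit, or dot, so the "1" in "F1" is skipped.
NUMBER_RE = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?!\w)")


def simhash(text: str) -> int:
    """SimHash fingerprint of a text, built from lower-cased word tokens."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set where the votes for that position are positive.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def extract_numbers(text: str) -> list[float]:
    """All standalone integers and decimals in the text, in order of appearance."""
    return [float(m) for m in NUMBER_RE.findall(text)]


def min_window_distance(a: list[float], b: list[float], k: int = 3) -> float:
    """Smallest Euclidean distance between any length-k sub-sequences of a and b."""
    if len(a) < k or len(b) < k:
        return math.inf  # not enough numbers to form a sub-sequence
    windows_a = [a[i:i + k] for i in range(len(a) - k + 1)]
    windows_b = [b[i:i + k] for i in range(len(b) - k + 1)]
    return min(math.dist(u, v) for u in windows_a for v in windows_b)


if __name__ == "__main__":
    t1 = "Locality-sensitive hashing maps similar texts to similar fingerprints."
    t2 = "Locality-sensitive hashing maps similar papers to similar fingerprints."
    print("Hamming distance:", hamming(simhash(t1), simhash(t2)))

    n1 = extract_numbers("Accuracy was 0.91 and recall 0.87 after 12 epochs.")
    n2 = extract_numbers("We report 0.91 and 0.87 after 12 epochs.")
    print("Numeric distance:", min_window_distance(n1, n2))  # prints "Numeric distance: 0.0"
```

In SimHash, texts that share most of their tokens tend to receive fingerprints with few differing bits, and identical texts always give a Hamming distance of 0. The abstract does not state the threshold value, so in a sketch like this it would have to be chosen empirically for the corpus being checked.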

Files in this item:

File	Description	Size	Format
238318-Article Text-549440-2-10-20210901.pdf		184.97 kB	Adobe PDF


All items in the electronic resources archive are protected by copyright, with all rights reserved.