Combined methods for identification near-duplicates in electronic scientific papers

Abstract

The monograph develops combined methods for detecting near-duplicates in text data, graphic images, mathematical formulas, and tables, which reveal how completely the scientific results of dissertation research are described. A method for identifying context-dependent values and indexing text data has been developed, along with recommendations for its use in a software package for detecting near-duplicates in electronic documents. Modifications of the N-gram analysis method have been developed to measure the degree of similarity between electronic documents with different types of content, and between their fragments. Methods and schemes have been developed for adapting near-duplicate detection algorithms to the specifics of the documents, in particular the presence of different types of data in the content: graphical data, mathematical formulas, tables, etc. The monograph is intended for researchers, teachers, postgraduates, and students of higher technical educational institutions.

Description

Approved for publication by the Scientific and Methodological Council of Astana IT University (Minutes No. 5 dated November 25, 2021).

Publication type

Text

Text publication type

Monograph

Bibliographic description

Combined methods for identification near-duplicates in electronic scientific papers : monograph / [Lizunov P., Biloshchytskyi A., Kuchansky A., Andrashko Yu. et al.] ; reviewers : S. Seilov, O. Mulesa ; Astana IT University. - Nur-Sultan : [Astana IT University], 2021. - 168 p. : table, figure. - Text in English. - Bibliography: p. 155-167. - ISBN 978-601-08-1675-6.
