Combined methods for identification near-duplicates in electronic scientific papers
Date
Journal title
ISSN number
Volume title
Publisher
Abstract
The monograph is devoted to the development of combined methods for detecting near-duplicates in text data, graphic images, mathematical formulas, and tables, which together make up the description of the scientific results of dissertation research. A method for identifying context-dependent values and indexing text data has been developed, along with recommendations for its use in a software package for detecting near-duplicates in electronic documents. Modifications of the N-gram analysis method have been developed to measure the degree of similarity between electronic documents with different types of content, and between their fragments. Methods and schemes have been developed for adapting near-duplicate detection algorithms to the specifics of documents, in particular the presence of various types of data in the content: graphical data, mathematical formulas, tables, etc.
The monograph is intended for researchers, teachers, postgraduates, and students of higher technical educational institutions.
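As a rough illustration of the basic N-gram idea mentioned in the abstract, the sketch below compares two texts by the overlap of their word N-grams. It is not the monograph's modified method: the function names `word_ngrams` and `ngram_similarity`, the whitespace tokenization, the default `n=3`, and the choice of the Jaccard coefficient as the similarity measure are all assumptions made for this example.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams in a text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_similarity(text_a, text_b, n=3):
    """Jaccard coefficient of the two n-gram sets, a value between 0.0 and 1.0."""
    grams_a = word_ngrams(text_a, n)
    grams_b = word_ngrams(text_b, n)
    if not grams_a and not grams_b:
        # Both texts are shorter than n words, so there is nothing to compare.
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

A score close to 1.0 would suggest that two fragments are near-duplicates. Where to place the threshold depends on the corpus and is not specified here.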
Description
Approved for publication by the Scientific and Methodological Council of Astana IT University (Minutes No. 5 dated November 25, 2021).
Publication type
Text
Text publication type
Monograph
ISSN
Keywords
Bibliographic description
Combined methods for identification near-duplicates in electronic scientific papers : monograph / [Lizunov P., Biloshchytskyi A., Kuchansky A., Andrashko Yu. et al.] ; reviewers : S. Seilov, O. Mulesa ; Astana IT University. - Nur-Sultan : [Astana IT University], 2021. - 168 p. : table, figure. - Text in English. - Bibliography: p. 155-167. - ISBN 978-601-08-1675-6.