
Cross-language text alignment: A proposed two-level matching scheme for plagiarism detection

Year 2020
Author Roostaee M.
Abstract The exponential growth of documents in various languages across the web, along with the availability of numerous editing and translation tools, has made cross-language plagiarism detection a challenging problem. Given its importance, the present study focuses on the task of cross-language text alignment, also known as detailed analysis, which operates on the output of the source retrieval step of cross-language plagiarism detection systems. The paper proposes a two-level matching approach that considers both syntactic and semantic information in order to accurately align plagiarized fragments from the source and suspicious documents. At the first level, a vector space model that employs a dictionary based on multilingual word embeddings and a local weighting technique is used to extract a minimal set of highly promising candidate fragment pairs, rather than considering all possible pairs of fragments. This step also includes a dynamic expansion technique that covers more candidate pairs in order to improve the system's recall. It is followed by a more precise algorithm that examines the candidate pairs at the sentence level using a graph-of-words representation of text. As a result of modelling both the words and their relationships, an acceptable increase in the system's precision, which is the goal of the second level, is also observed. To identify evidence of plagiarism, i.e. potential cases of unauthorized text reuse, the algorithm searches for maximum cliques in the match graph of the source and suspicious texts. With this two-level investigation, the approach is capable of discriminating true plagiarism cases from original text. Experimental results on several datasets, including PAN-PC-11, PAN-PC-12, and SemEval-2017, show that the proposed cross-language text alignment approach significantly outperforms state-of-the-art models and can be fed into an expert system for further improvement of cross-language plagiarism detection.
The source code is publicly available on GitHub for the purposes of reproducible research. © 2020 Elsevier Ltd
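
To make the first-level idea in the abstract more concrete, the following is a minimal sketch of candidate fragment pair extraction with a vector space model. It is not the author's implementation. Everything specific here is an assumption for illustration: the toy Spanish-to-English dictionary `TOY_DICT` stands in for the dictionary built from multilingual word embeddings, log-scaled term frequency stands in for the unspecified local weighting technique, the threshold of 0.5 is arbitrary, and the dynamic expansion step is omitted.

```python
import math
from collections import Counter

# Hypothetical toy dictionary (Spanish -> English). The paper derives its
# dictionary from multilingual word embeddings; that step is not reproduced.
TOY_DICT = {
    "el": "the", "gato": "cat", "negro": "black", "duerme": "sleeps",
    "perro": "dog", "come": "eats",
}


def translate(tokens, dictionary):
    """Map each token through the dictionary, dropping unknown words."""
    return [dictionary[t] for t in tokens if t in dictionary]


def tf_vector(tokens):
    """Assumed local weighting: log-scaled term frequency in one fragment."""
    counts = Counter(tokens)
    return {term: 1.0 + math.log(c) for term, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_pairs(suspicious, source, dictionary, threshold=0.5):
    """Return (suspicious_index, source_index, score) for pairs whose
    cosine similarity reaches the threshold (an arbitrary cut-off here)."""
    src_vecs = [tf_vector(s.lower().split()) for s in source]
    pairs = []
    for i, frag in enumerate(suspicious):
        vec = tf_vector(translate(frag.lower().split(), dictionary))
        for j, src_vec in enumerate(src_vecs):
            score = cosine(vec, src_vec)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


suspicious = ["El gato negro duerme", "El perro come"]
source = ["the dog eats", "the black cat sleeps", "stock prices rose"]
pairs = candidate_pairs(suspicious, source, TOY_DICT)
for i, j, score in pairs:
    print(f"suspicious[{i}] ~ source[{j}] (cosine {score:.2f})")
```

In this toy run only two of the six possible fragment pairs pass the threshold, which illustrates why a cheap first pass can shrink the set of pairs handed to the more expensive sentence-level graph-of-words comparison and maximum clique search described in the abstract.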
DOI 10.1016/j.eswa.2020.113718
Journal Expert Systems with Applications
Source Scopus

Meta Tags

Source from CARI Engine
Provinces : Papua
Cities : KEEROM
Districts :
Hazards :
Sub DM Phase : Early Warning, Hazard Assessment
Sub Aspects :

Citing Articles

Source from Semantic Scholar
Candidate entity generation in lexical semantics
The Utilisation of TF-IDF and Cosine Similarity for Automating Course Waivers in Academic Institutions
SentiVol-GA: a volatility-scaled genetic fusion of predictive models and financial sentiment for adaptive stock forecasting
E-BERT: A Deep Learning and Local Alignment-Based Approach for Paraphrased Plagiarism Detection
Plagiarism types and detection methods: a systematic survey of algorithms in text analysis
A New Classification Model Using a Decision Tree Generated from Hyperplanes in Dimensional Space
Second-Order Text Matching Algorithm for Agricultural Text
Machine learning model for chatGPT usage detection in students’ answers to open-ended questions: Case of Lithuanian language
An effective text plagiarism detection system based on feature selection and SVM techniques
A Review on diverse algorithms used in the context of Plagiarism Detection
A Deep Learning Approach to Detect Plagiarism in Bengali Textual Content using Similarity Algorithms
A Simple and Effective Method of Cross-Lingual Plagiarism Detection
Automatic Plagiarism Detection Using Natural Language Processing
Important Arguments Nomination Based on Fuzzy Labeling for Recognizing Plagiarized Semantic Text
Cross-lingual sentence embedding for mining low-resources parallel sentences
Improving plagiarism detection in text document using hybrid weighted similarity
Transformer-Based Multilingual Language Models in Cross-Lingual Plagiarism Detection
Reliable plagiarism detection system based on deep learning approaches
Do Language Models Plagiarize?
An external plagiarism detection system based on part-of-speech (POS) tag n-grams and word embedding
Citation Worthiness Identification for Fine-Grained Citation Recommendation Systems
Duplicate product record detection engine for e-commerce platforms
Plagiarism detection and prevention: a primer for researchers
A Systematic Review of Multilingual Plagiarism Detection: Approaches and Research Challenges
A simple and efficient text matching model based on deep interaction
Hierarchical ensemble framework for detecting paraphrased near duplicates in scientific abstracts

References Articles

Source from Semantic Scholar