Plagiarism is described as the reuse of someone else's previous ideas, work or even words without sufficient attribution to the source. This paper presents a method to detect external plagiarism using the integration of semantic relations between words and their syntactic composition. The problem with the available methods is that they fail to capture the meaning in comparison between a source document sentence and a suspicious document sentence, when two sentences have same surface text (the words are the same) or they are a paraphrase of each other. Therefore it causes inaccurate or unnecessary matching results. However, this method can improve the performance of plagiarism detection because it is able to avoid selecting the source text sentence whose similarity with suspicious text sentence is high but its meaning is different. It is executed by computing the semantic and syntactic similarity of the sentence-to-sentence. Besides, the proposed method expands the words in sentences to tackle the problem of information limit. It bridges the lexical gaps for semantically similar contexts that are expressed in a different wording. This method is also capable to identify various kinds of plagiarism such as the exact copied text, paraphrasing, transformation of sentences and changing of word structure in the sentences. As a result, the experimental results have displayed that the proposed method is able to improve the performance compared with the participating systems in PAN-PC-11. The experimental results also displayed that the proposed method demonstrates better performance as compared to other existing techniques on PAN-PC-10 and PAN-PC-11 datasets.
- Automatic plagiarism detection
- Text reuse
- String matching
- Semantic analysis