Last year I took a course on Machine Learning where the final assignment consisted of writing a review of five papers about one of the topics covered in the course. I chose combining Hidden Markov Models and neural networks for intrusion detection as my subject. The subject was quite rare and difficult to search for because there are different ways to describe the method, but eventually I found eleven papers on it. I decided to read them all and select the five best ones to review. As I had other subjects at the time and the deadline was quite relaxed, I didn’t do the whole assignment at once, so I often read just one paper at a time.
When I was reading one of the papers (X), I thought it seemed strangely familiar. There were quite a few spelling errors and sentence structures which I had seen before, but I assumed such errors could be common for people whose native language is not English. When I got to the results section, I noticed that the performance of the proposed method was compared to the (presumably) current state-of-the-art method. Paper X didn’t actually mention that this method was state-of-the-art, so I decided to see if other papers mentioned it. Indeed, one of the papers (Y) also compared its results to this traditional method! Both paper X and paper Y reported their method to be better than the “state-of-the-art”. Of course, I wanted to know which method was the best: X or Y. This is where things got really interesting, because X and Y reported EXACTLY the same performance.
I started comparing the rest of X and Y, only to conclude they were completely identical down to the spelling errors, but had different titles. “Oh, that explains it, these are just different versions of the same paper!” Unfortunately, the papers had different authors from different educational institutions. Then I thought, “Oh, X is the real paper and Y was never published”. Wrong again: both papers were published at conferences and in journals.
Last year, I tried googling the authors of each paper together with “plagiarism”, but I didn’t find any results. I forgot about the whole thing until now, when I noticed both papers were still on my USB drive. I decided to see whether I could find both papers with one query: I googled an exact sentence from the abstract. Yes, both papers were there. But it gets even better!
There was also a THIRD paper. Although it was “better disguised” and somewhat different from X and Y, large parts were still completely identical! And yet again, the paper had different authors from different educational institutions and was published at a different conference.
I wonder who these people are and why they copy existing research so obviously. Maybe it’s just one big experiment to show the world how easy it is to get away with plagiarism.