(Moral) Hazards of Scanning for Plagiarists: Evidence from Shoplifting

Nearly everything we buy nowadays is electronically scanned to ensure that we paid for the items in our bags—and under our coats.  Stores began using sensor tags and security screens in the early 1970s.  According to the New York Times, the market for anti-theft systems grew rapidly because they were viewed as “more reliable and less expensive” than having employees watch customers.

Students are being scanned as well to make sure that the words in their papers were not swiped from other sources.  Scanning papers began a decade ago when anti-plagiarism software was created to compare the phrases of student papers with other sources.  The leading anti-plagiarism software is Turnitin, which compares student papers with academic journals, Internet web pages and its library of previously submitted papers.  On its home page, Turnitin quotes an instructor as saying, “I used to spend hours on Google searching for unusual wording when I suspected that the paper was not written by the student. Now, I can search quickly with Turnitin!”

Scanning store customers and student papers are touted as substitutes for labor, so that clerks and instructors can spend less time guarding against thievery and more time doing what they do best, serving customers and teaching students.  Sounds great; sounds efficient; sounds easy!

In the years before sensor tags and security screens, the battle against shoplifters was waged with security guards and convex mirrors.  An expert on store security—quoted in Shoplifting: A Social History—argues that hiring security guards may have actually increased shoplifting because other employees were likely to think, “Pete is here, so I don’t have to watch out for shoplifters.”

Today, store clerks may think, “our stuff is tagged, so I don’t have to watch out for shoplifters.”  Indeed, stores have begun to question whether the substitution of security systems for labor has gone too far, reenlisting labor by having employees greet customers.  On a recent shopping trip, my daughter Emma and I were greeted by a handsome teenager at Abercrombie, a well-dressed woman at J. Crew and an elderly guy at Wal-Mart.  Sure, the first two were modeling clothes.  (I hope the last guy wasn’t modeling them for me!)  But, their real job is to make eye contact with customers to deter shoplifters.

Similarly, teachers might think, “I’m using Turnitin, so I don’t have to watch out for plagiarists.”  The instructor quoted on Turnitin’s website certainly thinks so, implicitly arguing that Turnitin is a perfect substitute for her own investigations using Google.  Not surprisingly, Turnitin encourages this belief.  On its website—right next to her quote—Turnitin advertises that it has crawled and indexed “14+ billion web pages.”  Choosing between Turnitin and instructor investigations seems like a no-brainer.

But wait, how many web pages are there on the Internet?

A few years ago, Google announced that it had crawled and indexed a trillion web pages.  That makes Turnitin’s crawlers look puny: they have searched and indexed only 1.4 percent as many pages as Google’s.

I wanted to test Turnitin but needed a suspicious manuscript.  I had one in my hands—Shoplifting: A Social History.  I suspected Kerry Segrave of plagiarism when I heard echoes of his book while reading New York Times articles he cites.    He cites a lot of them—35 in the first 14 pages!   To investigate my suspicions, I created a document containing the 14 pages stripped of direct quotations and another one containing the New York Times articles.  I began by searching for identical phrases (of at least 6 words) in the two documents using the open source software Copyfind, which highlighted the matches it found in each document and produced the metric that 15 percent of the early pages of Shoplifting were taken verbatim from the New York Times.  (Here is the document that highlights the matches.)
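
For readers curious how a verbatim-overlap figure like this can be computed, here is a minimal Python sketch. It is not Copyfind's actual code; the function names, the tokenization rule, and the sample sentences are assumptions made for illustration. It marks every word of the suspect text that falls inside a run of at least 6 consecutive words also found in the source, then reports the share of marked words.

```python
import re

def words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def shared_fraction(suspect, source, n=6):
    """Share of the suspect's words that lie inside an n-word phrase also found in the source."""
    s_words = words(suspect)
    src_words = words(source)
    # Every n-word phrase that appears in the source.
    source_phrases = {tuple(src_words[i:i + n]) for i in range(len(src_words) - n + 1)}
    covered = [False] * len(s_words)
    for i in range(len(s_words) - n + 1):
        if tuple(s_words[i:i + n]) in source_phrases:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(s_words) if s_words else 0.0

# Made-up sentences, used only to exercise the function.
source = "Police said the thief hid the goods in the cellar of the store."
copied = "The thief hid the goods in the cellar of the store yesterday"
cloaked = "The thief hid the items in the basement of the shop"

print(shared_fraction(copied, source))   # most of the words are covered
print(shared_fraction(cloaked, source))  # no 6-word phrase survives the rewording
```

The second call hints at the limitation discussed next: swapping a few words is enough to break every 6-word match.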

But this measure captures only the most flagrant form of plagiarism, where passages are copied from one document and pasted unchanged into another.  Just as shoplifters slip the goods they steal under coats or into pocketbooks, most plagiarists tinker with the passages they copy before claiming them as their own.  In other words, they cloak their thefts by scrambling the passages and right-clicking on words to find synonyms.  This isn’t writing; it is copying, cloaking and pasting; and it’s plagiarism.

Kerry Segrave is a right-clicker, changing “cellar of store” to “basement of shop.”  Similarly, he changes goods to items, articles to goods, accomplice to confederate, neighborhood to area, and women to females.  He is also a scrambler, changing “accidentally fallen” to “fallen accidentally”; “only with” to “with only”; and “Leon and Klein” to “Klein and Leon.”  And he scrambles phrases within sentences; in other words, the phrases of his sentences are sometimes scrambled.
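
A looser comparison can see through both tricks. The Python sketch below is only an illustration under stated assumptions: the synonym table is hypothetical and holds just a few of the substitutions listed above, the two passages are stitched together from the quoted fragments, and word-set overlap stands in for whatever judgment a human reader applies.

```python
import re

# Hypothetical synonym table, built only from substitutions listed above.
CANONICAL = {
    "basement": "cellar",
    "shop": "store",
    "confederate": "accomplice",
    "area": "neighborhood",
    "females": "women",
}

def normalize(text):
    """Lowercase, split into words, and map known synonyms to one canonical word."""
    return [CANONICAL.get(w, w) for w in re.findall(r"[a-z']+", text.lower())]

def overlap(a, b):
    """Jaccard overlap of the two passages' word sets, ignoring word order."""
    set_a, set_b = set(normalize(a)), set(normalize(b))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

# Illustrative passages assembled from the fragments quoted above.
original = "accidentally fallen in the cellar of store"
cloaked = "fallen accidentally in the basement of shop"

print(original == cloaked)         # False: an exact comparison sees two different strings
print(overlap(original, cloaked))  # 1.0: same words once synonyms and order are ignored
```

Ignoring word order entirely is a blunt choice that would also flag innocent passages sharing common vocabulary, so a score like this is a prompt for a closer look, not a verdict.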

I spent hours comparing the two documents, matching phrases and highlighting the ones that were copied, cloaked and pasted into Shoplifting. My estimate is that 32 percent of the early pages of Shoplifting are taken nearly verbatim from the New York Times. (Here is the document that highlights the matching phrases.)

To test Turnitin’s crawlers, I uploaded the document containing the New York Times articles to my website a few months ago.  Google now matches many of the plagiarized phrases from Shoplifting to the New York Times articles on my website and some of the phrases to articles in the archives of the paper. Google also matches them to Shoplifting itself, which has been scanned into Google Books.

Turnitin fails to match the plagiarized phrases to any of these sources.  I e-mailed Turnitin’s help desk, essentially asking, “What’s going on?  Why can’t Turnitin find these things?”

A few hours later, a guy at Turnitin’s product support sent me a detailed answer that boils down to three basic points—the Internet is a big place and it takes our crawlers time to scan it; we can’t scan the New York Times because it requires a subscription; and, we can’t scan images of text like those used by Google Books.  In other words, our crawlers are puny compared to Google’s.

I decided to give Turnitin a little help, so I submitted the document containing the New York Times articles as a student paper, causing the file to be catalogued in Turnitin’s library of student papers.  This enabled Turnitin to find the file and then to compare Shoplifting with the New York Times articles.  It produced an originality report that highlighted matched phrases and concluded that 25 percent of the phrases of Shoplifting were very similar to those of the newspaper articles. (Here is the document that highlights the matching phrases.)

Nearly all the passages highlighted by Turnitin are also highlighted by me.  I highlight a few more because my algorithm (embedded in my brain) casts a wider net than the one used by Turnitin.  The differences are relatively minor, though: both present compelling evidence that Shoplifting is an example of Wordlifting.

But Turnitin needed my help to find the original sources of the plagiarized phrases, making it a poor substitute for instructors who are willing to “spend hours on Google searching for unusual wording.”  It needs the help of instructors who are willing to investigate suspicious papers; otherwise, greater reliance on Turnitin could lead to more plagiarism.

There are other ways that instructors may change their behavior if they believe that anti-plagiarism software insures them against the risk that their students are plagiarizing.  Economists give a fancy name for changes in behavior induced by insurance—it’s called moral hazard.

One instructor told me that he used to devote an hour to discussing plagiarism with his class—what it is; why it’s wrong; and, where students go when they get caught.  Now, he just tells them that he uses Turnitin and lets them infer that plagiarizing is not worth the penalty.  He lauded the change, saying it saved him valuable class time.

Relying on students to weigh the benefits and costs of plagiarism in this way assumes that they are good stewards of their future selves.  Just as some shoplifters may give too much weight to the thrill of shoplifting, some students may give too much weight to starting their weekends early.

Instructors may also change the way they write their essay assignments.  One of the best ways to suppress plagiarism is to come up with creative assignments that are literally one-of-a-kind.  For example, I like to rip mine from the headlines by asking my students to write op-eds on current legislative proposals. If I felt insured against plagiarism, I might not spend hours looking for unusual proposals and instead tell students to write their essays on any topic they found interesting.

Instructors, like all human beings, look for excuses to avoid doing things they don’t want to do.  Grading essays is hard—often discouraging—work, so instructors look for excuses to avoid assigning them.  One plausible excuse is that plagiarism is rampant, making in-class exams better measures of students’ performance.  Anti-plagiarism software may make this excuse less credible, nudging some instructors to assign more essays.  Hence, moral hazard can work in the opposite direction, something akin to moral security.  Feeling insured against plagiarism, instructors may decide to do the right thing and assign more essays.

Turnitin is also being used to teach the wrong lessons concerning how to write well.  Searching Google, I found syllabi of instructors who use Turnitin to teach students how to paraphrase well.  In particular, they ask students to check the originality reports of their rough drafts and make any necessary changes to improve their paraphrasing of sources prior to submitting the essays to be graded.  In the hands of a skilled instructor, it might teach students how to paraphrase well.  But, I think it is more likely to teach students how to right-click words and scramble phrases to get acceptable scores on Turnitin.

I want to teach my students how to write well, not simply paraphrase well.  I also fear that copying, cloaking and pasting is endemic.  Hence, I would not allow my students to use originality reports to revise their drafts.

But I would have no choice because Turnitin offers another product called WriteCheck that allows students to “check [their] work against the same database as Turnitin.”  I signed up and submitted the early pages of Shoplifting.  WriteCheck matched many of Shoplifting’s phrases to those of the New York Times articles in its library of student papers.  Remember, I submitted them as a student paper to help Turnitin find them; now WriteCheck has them too!  WriteCheck warned me that “a significant amount of this paper is unoriginal” and advised me to revise it.  After a few hours of right-clicking and scrambling, I resubmitted it and WriteCheck said it was okay, being cleansed of easily recognizable plagiarism.

Turnitin is playing both sides of the fence, helping instructors identify plagiarists while helping plagiarists avoid detection.  It is akin to selling security systems to stores while allowing shoplifters to test whether putting tagged goods into bags lined with aluminum thwarts the detectors.

I am not a Luddite.  I use an online homework system in many of my courses and I plan to experiment with student response systems.  And, I think that anti-plagiarism software is a useful tool, but should be used as a complement to, not a substitute for, instructor investigations of suspicious language, class conversations on plagiarism, and creative essay assignments.

This fall, I plan to say to people, “I’m using anti-plagiarism software, but I’m still watching out for plagiarists.”

24 Responses to (Moral) Hazards of Scanning for Plagiarists: Evidence from Shoplifting

  1. Pingback: Turnitin: Arming both sides in the Plagiarism War — Marginal Revolution

  2. Pingback: Wired Campus - The Chronicle of Higher Education

  3. Jason Bennett says:

    I wonder about the Turnitin – Google comparison. Although Google clearly does string matching, its results are also weighted by an algorithm that takes into account the relative popularity of a web page – that is, how many times it is linked from other websites. Obviously, it should rank a direct phrase match highly, but what about those matches that are less clear and not linked anywhere? There’s a lot that goes into a search algorithm, and what makes a good page to index and prioritize for Turnitin is probably quite different from that for Google. I’m just saying that the page count may not be that useful a comparison.

  4. Mike Tsinido says:

    Having used Turnitin before, I can definitely say that most of the points made here are valid. However, I have also observed that teachers who don’t enable originality reports or tell students to use the WriteCheck system will get fewer cases of plagiarized papers. This is because students are afraid that the computer system is smarter than an actual human reviewer–and will usually write on their own rather than copying material from elsewhere. It’s much easier to trick a person than a piece of software, or so one might think.

    In addition, I don’t think most students would spend a few hours re-arranging the words in their plagiarized content to evade systems like Turnitin, because that’s almost as much effort as just writing something from scratch.

    What would really be interesting is a system like Turnitin, sponsored by Google, that checks for plagiarism or violations of copyright license terms on places like Wikipedia, or various blogs. Google would make a good deal of money buying Turnitin, I think, and marketing an expanded version using their vast resources.

    I’ve never been comfortable with the fact that Turnitin archives all essay submissions, many of which I presume contain personal details that someone might not want others to read. If their database of articles was compromised, that would be very embarrassing indeed.

  5. God is just. If you cheat in school or on a resume, you might land yourself in a job over your head. Actually, even accepting affirmative action seems foolish. Integration, however, can be good if it prevents arrogant big fish in little ponds.

  6. Everyone whose writing builds on other sources (i.e. all students and all scholars) struggles with finding the fair balance between giving credit to sources and taking credit for their own words and ideas. Students in the sciences initially find this very difficult, perhaps because they do little writing. Rather than my using Turnitin in an adversarial way (I can catch them cheating!), I have them use Turnitin as a tool to help them learn what’s appropriate, encouraging them to check their drafts on Turnitin before they submit the final versions to me. I agree that it’s a pretty crude tool, but it’s better than nothing.

  7. Thanks for throwing this out there. I know I’ve had trouble with turnitin.com in the past not catching things, while at the same time putting up far too many false positives. I suppose it’s still best to go with your gut and the smell test. I had no idea it wouldn’t read through an internet paywall. I wonder what the fees paid actually cover.

  8. Alan says:

    I recognize three problems with plagiarism: (1) that the plagiarizer cheats themself by not gaining the experience that writing a paper of their own ought to give them, (2) that the true author does not get credit for their words, and (3) other students may be unfairly penalized by the false competition.

    The second case may be important in instances where the plagiarizer takes rewards intended for the true author, but in the case of student papers plagiarized from a published author this is seldom significant. The third case can be highly detrimental to other students who make the commitment to learn the material for themselves but cannot get recognition for it because they are unknowingly competing against experienced professionals. The first case is by far the most important, because a student cheats themself of the education they are paying for.

    It seems to me that we need to start paying less attention to grades and more attention to people. With rampant grade inflation, we should already be aware that good grades no longer mean much – and reducing the usefulness of competition for grades can do a great deal to encourage those students who are actually interested in learning to not be afraid to learn instead of competing for grades.

  9. Rozenbury says:

    Turnitin has a history of using copyrighted material for profit without permission. This is unsurprising.

  10. Bill Glassman says:

    An interesting analysis, and while Harrington makes some good points, he never addresses the most serious moral hazards of using Turnitin: first, it takes away a student’s right to control use of their creative works (by compelling inclusion in their database). Second, and more importantly, it breaks the moral bond of trust and respect which I consider fundamental to any effective teaching relationship. Imagine going into a store, and being told as you enter that “we put electronic tags on our goods, to make sure you can’t steal from us”. If they did so, I suspect many customers would turn around and leave. Yet to start a course by saying “I require you to submit all your work to Turnitin, to make sure you don’t plagiarize”, is to say “I don’t trust you to take your role in this course seriously.” Research has shown that one of the most influential determinants of cheating is the degree of respect that students have for a teacher–they are significantly less likely to cheat in courses where they respect the teacher than when they don’t. During my university teaching career, I refused to use Turnitin, and I’m saddened by the shifts in the academic world that make many teachers feel they have no option–and that even Harrington fails to see the moral issue of a broken relationship between teachers and students.

  11. Amoeba says:

    Almost my entire life I studied science (real science like math, physics, chemistry, algorithms) and never had to write an “essay” filled with BS about . It was easy for a professor to give unique assignment for every student that vary in few parameters (like design reactor control system for exothermic reaction of … ) and never repeat next year assignments. That is till students found … Mathcad (it was back in 1990s) and 2 day assignment turned into 5 minute task with symbolic algebra system. Some started accusing students in cheating with computers but some simply changed assignment so that it would take same 2 days WITH algebra system on computer. How about finding an original subjects for essays? Writing a blog post that will trigger heated discussion and participating in it with arguments. In other words how about evolving education at same speed world evolves.

  12. Pingback: Moral hazards of plagarism detection | Stephen Tyler

  13. Mick Hayes says:

    This is a very interesting article and does raise some good points about Turnitin and plagiarism prevention / detection. The central point that Turnitin is just a tool is well made and is supported by my experience (for whatever that is worth).

    For me, Turnitin works quite well as part of whole set of actions both during teaching and after the coursework is submitted, although to rely on it totally would be foolhardy. If combined with taught sessions on academic writing, plagiarism and referencing before students begin writing and with due diligence during the marking stages (as suggested in the original post), Turnitin can cut down a great deal on the grunt work associated with tracking down sources and is particularly useful at spotting collusion and the theft of another student’s work.

    What is important is recognising that as well as the different types of plagiarism discussed in the original post (I like the term cloaking and pasting, by the way), there are different reasons for plagiarism, which require different approaches. The accidental plagiarist can be educated using Turnitin as a way to demonstrate what should be referenced or what is not a paraphrase (obviously, as the original post states, there is a need to counsel against simple rearrangement). A subset of these includes those students whose previous study culture (either in a different country or in a different type of institution) either encourages or does not punish copying. Again counselling and plagiarism can deter this.

    Of those deliberately attempting to deceive, many are driven to it by panic having left writing the work too late, and these by definition are more likely to simply cut and paste with minimal changes and are unlikely to be able to submit to Turnitin early enough to use the information it provides to hide their plagiarism.

    Those who are for various reasons more coldblooded about their plagiarism are most likely to employ counter measures, such as rearrangement or cloaking and pasting. These are more difficult to spot naturally, but I have found Turnitin useful in this, in that sometimes a short phrase will be spotted (or a string of references that the student has lifted) and similarities can be seen in the text around the source shown on the originality report and the student work, even if the phrases are different.

    One final point at the end of this ridiculously long response, is that quite often the use of synonyms can catch students out, particularly if they do not understand what they are writing, since Word will not always suggest synonyms that are appropriate (such as a student of mine at the end of a long paragraph lifted wholesale, who changed “bog down”, as in slow down, to “marsh down”). Turnitin can spot individual changed words in a quote, so if a marker spots such a change being made once in a Turnitin report they know to keep an eye out for others, but this is really the sort of thing that an engaged marker should be seeing for themselves.

    Apologies for the length of the reply, but this is an issue I feel passionately about and I am also trying to avoid doing a particularly boring piece of work! Maybe I should just copy it from somewhere….?

  14. Ron Bannon says:

    Schools, for the most part, are staffed by adjuncts who care little and are paid even less. In turn, getting a degree, even if it is hard earned, has little to no value. I’m afraid that all this talk about plagiarism is of no practical use.

  15. Will says:

    How about finding an original subjects for essays?

    In theory, this is a great approach. In practice, things get a lot more complicated.

    1) Coming up with a topic that no one on the Internet has ever written about anywhere is so difficult as to be practically impossible. The best you can hope for is to find a topic that hasn’t been written about much — an obscure topic.

    2) Then, when you’ve found a sufficiently obscure topic, your students will have a hard time finding sources discussing that topic. This helps cut down on plagiarism, but at the price of making the task MUCH harder for the honest students. If they can’t find existing materials discussing the topic, students must either turn in a paper with inadequate support for their argument, or conduct original research. Conducting original research would be great, but it’s both hugely time consuming and extremely difficult for an inexperienced researcher. Most undergraduates don’t have the dedication it takes to pull that off, and practically none of them have the time between balancing 5-6 classes and busy social lives.

    3) Perversely, when the writing prompt is too hard because the students can’t find enough sources on the topic, it actually encourages plagiarism. Students who despair because they just can’t find enough about the topic are much more likely to break down and paste in chunks of whatever they did manage to find.

    4) Lastly, picking an obscure topic to discourage plagiarism undermines any sense of participation in a conversation. First and foremost, writing is about communication with other people. When there are few or no other people writing on a topic, there’s no one to communicate with. Writing teachers have a hard enough time fighting the I-am-a-robot-now-I-will-write-a-five-paragraph-essay mentality as it is. The more obscure the topic, the more artificial the assignment; and giving artificial assignments makes it much harder to get the students to put serious thought into what they’re writing.

    • noone says:

      “a topic that no one on the Internet has ever written about anywhere”
      Especially if you are teaching classics, history, philosophy, all of which are well trodden paths. Not that that is a bad thing.

  16. David Wees says:

    This is a very minor point, and one which strengthens part of your argument in one sense.
    14 billion is 1.4% of 1 trillion, not 14%. This means that TurnItIn.com actually indexes an even smaller percentage of the Web.

    This number is a bit misleading, as much of the web is comments on YouTube or Porn, neither of which are worth indexing by TurnItIn.com

    That aside, I can imagine students taking the time to download a paper, and mix it up, and check it for originality. It takes much less time and effort to reword someone else’s paper than it does to come up with your own original thought, or to do proper research in advance.

    The alternative to relying on TurnItIn.com (besides good discussions with students about plagiarism in advance of assignments) is to use assignments which are too specific to your course, and to the particular group of students you have, to be plagiarized.

  17. Empty Vessel says:

    So, rearranging words is plagiarism. Rearranging ideas constitutes creation of original work.

    Nothing like a bright line to understand a distinction, eh?

  18. Jennifer says:

    “In addition, I don’t think most students would spend a few hours re-arranging the words in their plagiarized content to evade systems like Turnitin, because that’s almost as much effort as just writing something from scratch.” –Mike

    Unfortunately, while it is rare, it does happen. As does simply changing the characters’ names and the location, changing sources every other line, and the issues described in the article. It’s sad because as you say, the original assignment would have required less effort, but some still do it anyway.

  19. Pingback: Turn it in, or check first? | ededu.net

  20. Pingback: Anti-plagiarism tool Turnitin can be a plagiarist’s best friend | Ebooks on Crack

  21. Janne says:

    I’m not sure just how valid your comparison really is. After all, you already had reason to believe the book was plagiarized, and had a very good guess as to the source. If you had no reason to suspect the book had lifted material from the NYTimes, and you didn’t have a subscription to the newspaper to actually see the original articles, would you have fared much better than Turnitin?

    Also, I think it is reasonable that people can check their writings against the database. Intentional plagiarizing is only one way to match others writings after all. You may have read a source once, and particular turns of phrase, a specific example, or a particular analysis may have got stuck in your mind. A year later the same ideas pop up again, with the original source long forgotten. Better to be able to check, lest you get accused of cheating.

  22. Eulises says:

    The article taught me well on how plagiarism works in many different ways. Turnitin is a Web site that teaches how to write plagiarized work on different articles, and how to do it. Shoplifting is also Plagiarism; it’s like copying and pasting, except you’re just stealing it. I like how the author compared Shop lifting to Shoplifting.

  23. Sam Deeks says:

    I got pretty good at detecting plagiarism at the UK universities where I lectured in the early days of Google. The thing that always caught my eye was passages that were too confident to have been written by one of my students. Other lecturers (2nd markers) would berate students for what they thought were ‘unsubstantiated claims’. I on the other hand, knew that the reason that these passages were brimming with ‘unsubstantiated’ confidence was because they’d most likely come from a far more (and rightfully) authoritative source.

    Even back then, Google would find stuff incredibly quickly and effectively. The real problem we had (and probably still have except I quit academia in 2004) was the total unwillingness of the Faculty to ever punish the students I proved were plagiarists. Reason? Couldn’t afford to lose anyone – either from ‘failure’ of a year and certainly not for plagiarism.

    It’s inevitable that someone would try to cash in on this with software. But just like with online reputation management, I’ve also found nothing beats experience, an awareness of the underlying psychology of plagiarism and the basics of Google itself.

    When all’s said and done, the real tragedy was that – at least in the Arts where I taught – students were not only plagiarising their written work to the point that it was a pointless exercise, but also plagiarising their practical work. I remember the look of self-disgust growing on the face of one student as he got steadily drunk at the Degree Show as it dawned on him just how meaningless his 2.1 Degree award was. Tragic.