Tuesday, January 14, 2014

What is self-plagiarism? (2)

As the discussion about what exactly constitutes self-plagiarism, acceptable re-use, and 'good publication ethics' in general continues, it seems a good idea, in the hope of separating opinion from fact, to collect some more authoritative sources on the topic.
  • The Committee on Publication Ethics (COPE) has always struck me as mostly concerned with combating predatory publishing houses (and they do great work in that arena), but they have also set up guidelines for authors. On the topic of self-plagiarism and multiple publication they say:

    Multiple publications arising from a single research project should be clearly identified as such and the primary publication should be referenced. Translations and adaptations for different audiences should be clearly identified as such, should acknowledge the original source, and should respect relevant copyright conventions and permission requirements. (art. 4.6)
  • Ron Ritzen pointed me to a 2009 editorial in The Lancet, which gives (in my opinion) a balanced view of self-plagiarism. Its central statement is "Deception is the key issue in all forms of self-plagiarism (...)", meaning that whether a similarity constitutes self-plagiarism depends on whether deception is involved.
  • Although individual researchers should not be considered authorities on the topic of self-plagiarism, Nijkamp himself is probably gaining a vast amount of knowledge on it quickly. Since he is also a central figure in this discussion, his defense is worth noting. As regards the charge of self-plagiarism, the defense rests on two arguments: first, the constraints of language and topic (see below), and second, the existence of 'halffabrikaten'. ('Halffabrikaten' refers to intermediate products as they pass from one manufacturer to another, such as components or basic materials like metal sheets or polymer pellets; is there a good English word for that?) According to Nijkamp's defense, 'halffabrikaten' are a common component of current publication culture, and as a matter of course they will overlap non-trivially with other 'halffabrikaten' and with the 'final products'.
One question to which I have not found a single answer is the very mechanistic (and mathematical?) question:
  • How does one distinguish, in concrete situations, between duplication that results from the limits of language and the constraints of the topic, and duplication that indicates genuine copying of creative work?

    At one end of the spectrum, any two English articles, really any two articles in the same language, will share at least 90% of their words. Consider also: This sentence has been written before. I mean it: the sentence "This sentence has been written before" has been written before, and Google alone finds five instances. Does my writing it here constitute plagiarism? I can honestly vouch that I did not copy the sentence: I first wrote it (i.e. created it myself), then copy-pasted it into a Google search box, and indeed five hits came up. There are many obvious situations like this one, in which multi-word similarities are perfectly compatible with honest creation.

    At the other end, someone who copies a multi-page article lock, stock, and barrel (maybe even the biographical details? :-) can obviously not claim to have created it themselves.

    (The remark about the bio has a barb to it: Frank van Harmelen pointed out that the Volkskrant list of Nijkamp's 'similarities' includes a biographical description of Nijkamp himself. Is it self-plagiarism to describe your own career twice in the same words? It is hard to think of a better demonstration of the limitations of similarity-detection software.)
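To make the 'mechanistic' question a little more concrete, here is a minimal Python sketch of one simple way to measure textual overlap: the fraction of distinct word n-grams (runs of n consecutive words) in one text that also occur in another. This is an illustrative assumption on my part, not a description of how any actual similarity-detection software works, and the function names `words`, `ngrams` and `containment` are hypothetical. With n = 1 the measure only counts shared vocabulary, which is high for almost any pair of texts; longer runs are much less likely to match by accident, although the six-word sentence above shows that they can.

```python
import re


def words(text: str) -> list[str]:
    """Split text into lower-case word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of all runs of n consecutive tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(a: str, b: str, n: int) -> float:
    """Fraction of the distinct word n-grams of `a` that also occur in `b`."""
    grams_a = ngrams(words(a), n)
    if not grams_a:
        return 0.0  # `a` has fewer than n words: nothing to compare
    return len(grams_a & ngrams(words(b), n)) / len(grams_a)


if __name__ == "__main__":
    original = "the cat sat on the mat"
    reworded = "the mat sat on the cat"
    for n in (1, 3):
        print(n, containment(original, reworded, n))
```

The two toy sentences use exactly the same words, so the single-word score is 1.0, yet only one of the four three-word runs of the first sentence reappears in the second, giving 0.25. Which n to use, and where to draw the line on such scores, is precisely the part I cannot answer.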
As always, all comments are welcome.

Tuesday, January 07, 2014

What is self-plagiarism?

Tonight's NRC reports on the newspaper's own investigation into plagiarism and self-plagiarism by Peter Nijkamp, a renowned and highly respected Dutch economist. According to NRC, the investigation uncovered a number of cases of plagiarism and self-plagiarism; the paper gives a few examples in its article at nrc.nl.

Reading the article left me wondering about a few things. To start with, I realized that some of my own work would count as self-plagiarism under the definition NRC uses, and would therefore, according to the 'code of conduct' they quote, be 'scientific misconduct'. That realization really stopped me in my tracks: I believe I have always been very conscientious about my scientific-ethical standards. More on this below.

The article also got me thinking about the more abstract question: What exactly is self-plagiarism?

NRC defines self-plagiarism as follows (in translation):
'Scientists must carefully cite the sources they use, even when it concerns their own work. The location of the original text and the name of the author must be stated, and text taken over verbatim must be placed in quotation marks. If an author uses his own texts for a new publication but does not mention this, that is self-plagiarism.'

According to NRC, self-plagiarism is a violation of the VSNU code of conduct (in translation): 'These rules also apply when an author reuses his own texts in a new publication. If that is not done, it constitutes (self-)plagiarism. The Netherlands Code of Conduct for Scientific Practice (Nederlandse Gedragscode Wetenschapsbeoefening) of the Association of Universities in the Netherlands (Vereniging van Universiteiten) states that scientists must carefully observe the rules for citation and authorship, despite increasing pressure to perform.'

Actually, on the topic of citation ('bronvermelding'), the VSNU code of conduct states (in translation): 'Correct citation makes it clear that one is not adorning oneself with borrowed plumes. This also applies to information taken from the internet.' This is about crediting other people's work, and does not seem to condemn self-plagiarism. Not condemning something is not the same as allowing it, of course, but nowhere in the code of conduct could I find any mention of self-plagiarism, or anything at all that seems to relate to re-use of one's own work.

Let's take a wider view, and include some other sources and arguments. 
  1. For instance, Dutch copyright law gives the creator an inalienable right to 'say the same thing again', i.e. to plagiarize oneself. Because this right is inalienable, you retain the right to copy from your own creations whatever contract you sign, as long as Dutch copyright law applies. In other words, Elsevier (for instance) cannot prevent you from publishing your Elsevier article a second time somewhere else. (Whether this is a good idea is a different question, but copyright law cannot be used to prevent or punish it.) (Update: see below)
  2. The ESF has a Code of Conduct for Scientific Integrity, which doesn't mention self-plagiarism or re-use of own material.
  3. SIAM, the Society for Industrial and Applied Mathematics, has a section on duplicate publication that proposes a 'reasonable person' criterion: a few sentences are okay, a whole paper is not.
  4. The Office of Research Integrity (ORI) unsurprisingly has an extensive discussion of plagiarism and self-plagiarism. I find it rather interesting that the ORI adopts the principle that “… the essence of self-plagiarism is [that] the author attempts to deceive the reader”.
  5. The source of the ORI quote is this piece by Hexham, which discusses self-plagiarism at length, and makes various (in my eyes) reasonable points, such as 'The extent of re-cycling is also an indication of self-plagiarism. Academics are expected to republish revised versions of their Ph.D. thesis. They also often develop different aspects of an argument in several papers that require the repetition of certain key passages. This is not self-plagiarism if the work develops new insights. It is self-plagiarism if the argument, examples, evidence, and conclusion remain the same without the development of new ideas or presentation of additional evidence. In other words it is self-plagiarism when two works only differ in their appearance.'
  6. I couldn't find any guidelines from the NSF, although they must exist, since Google turns up various integrity investigations of NSF researchers.
Putting the various views together, I'm not sure I agree with the NRC's wholesale condemnation of self-plagiarism. I feel the ORI's point of view strikes a good balance: the central criterion is whether the re-use leads to deception of the reader. I would even include the possibility that the deception is unintentional, i.e. that the author did not intend to deceive the reader, even though a typical reader might well be deceived.

And what about my own re-use of material? I'm still not quite sure whether it would fall under the NRC definition. (For those interested: I'm talking about these two papers, which are summarized and reformulated in this, this, and this paper.) But I do know that I was above-board about the re-use: each of the editors of these three proceedings knew that this was non-original work and agreed to it. Given my preference for the ORI's point of view, I now see that the real question is: did we manage not to deceive the reader?

Your comments are welcome ...

(There's also an interesting related issue about re-using material in grant proposals. Prodigal Academic has a nice discussion on this topic.)

(Update: the Free University is starting a wholesale investigation of Nijkamp's work)

(Update: I was wrong about the inalienable right to say the same thing again. Interestingly enough, it turns out that this right is restricted to painters. To be precise:

Unless otherwise agreed, the maker of a painting remains entitled, notwithstanding the transfer of his copyright, to produce identical paintings (article 24 of the Auteurswet 1912, the Dutch Copyright Act, as in force on 13 January 2014; in translation).

Another article may be important in this context:

The incidental incorporation of a work of literature, science or art, as a component of subordinate significance ('van ondergeschikte betekenis') in another work, is not regarded as an infringement of the copyright in that work (article 18a; in translation).


What exactly counts as 'ondergeschikt' (subordinate) will pay the salaries of many lawyers :-)