Revisiting Hurlbert 1984

In a paper published in Ecological Monographs in 1984, Stuart Hurlbert examined 176 experimental studies in ecology and found that 27% suffered from ‘pseudoreplication’ – the use of inferential statistics to test for treatment effects in situations in which treatments were not replicated or the replicates were not independent. When only studies that used inferential statistics were considered, the figure was as high as 48%. To overcome this problem, Hurlbert recommended that treatments should always be interspersed in experiments, even if that meant not fully randomizing the assignment of treatments, especially in small experiments. Thirty-two years after the paper was published, I spoke to Stuart Hurlbert about how he got interested in this topic, his memories of working on this paper, and what we have learnt since about how to deal with pseudoreplication in ecological experiments.

Citation: Hurlbert, S. H. (1984). Pseudoreplication and the design of ecological field experiments. Ecological Monographs, 54(2), 187-211.

Date of interview: 2 December 2016 (via Skype)

Hari Sridhar: This was one of the first papers we were asked to read when I was doing my Master’s in Wildlife Science at the Wildlife Institute of India. I remember really enjoying it at that time.

Stuart Hurlbert: One reason that I think people find that paper easy to understand is that when I wrote it, my own understanding was only about a year ahead of the students’. I was pretty naïve at that time.

HS: What was your motivation to write this paper at this point in time in your career?

SH: Well, when I was a graduate student at Cornell, I was doing my doctoral research on salamander migrations, and my doctoral research was not experimental. I had a couple of semesters of statistics there, but I was not interested in learning more than I needed to analyze my own data. Then I went for a post-doc at the University of California, Riverside, working with Mir Mulla, an economic entomologist interested in mosquito control. The university had developed, both in Riverside and in a couple of other locations, sets of experimental ponds for doing experiments with treatment replication on the effects of pesticides on aquatic wildlife – plankton, insects, ducks, and so on. I did some experiments of that sort as a post-doc, and I also started looking at the effects of fish predation on the plankton communities using these same ponds. That four-year post-doc, because it was so different from my graduate work, was almost like a second PhD, and it got me into all sorts of new things: aquatic ecology, pesticides, and using experimental systems. And when I was researching other studies that had been done on pesticide-wildlife interactions or pesticide-aquatic system interactions, one thing that struck me was that many of these studies did not seem well designed. Although I’d never had a course in experimental design, it was intuitively obvious that if you’ve got two ponds and you treat one with pesticide and keep the other as a control, you’re doing something wrong there. You don’t have the sort of treatment replication that you need. And that’s sort of a common-sense thing that, I think, a lot of non-statisticians would understand. Some years later, in 1980, a group of ecologists at Florida State University – Larry Abele, Dan Simberloff and Donald Strong – decided to hold a symposium on community ecology, and they invited me to give a paper at the symposium on any topic I wanted to talk about.

At that point, I said, “Well, this will be a good chance for me to put together all of my notes about these studies with unreplicated treatments and do a review paper on that issue.” So that’s how the paper started. I prepared a manuscript that I brought with me to this symposium, and gave a presentation at the symposium that got a lot of interest from people. It was a manuscript that I had written without ever having read a book on experimental design. And it listed, by name, all of the studies in which I had found this error I was calling “pseudoreplication”. I think I reviewed 101 papers and found 48% of them with pseudoreplication.

Two people who really helped me at that symposium made a very gentle suggestion. One was Tony Underwood, a marine ecologist in Australia, and one was Bill Platt, who is, I think, a forest ecologist in Florida. They said they thought the manuscript was pretty good, but that maybe I should read a couple of books on experimental design before I did the next draft! So it actually took me quite a long time, and the paper was completely revised and much longer by the time it was submitted. Abele, Simberloff and Strong had told us they were going to publish the proceedings of the symposium. So, after the symposium and after we had revised our papers, everybody submitted them, and eventually most of the symposium papers came out as a book edited by those three and Anne Thistle, titled “Ecological Communities: Conceptual Issues and the Evidence”. They’d given us some instructions about the length of the manuscript, which I had sort of ignored: my manuscript was about 90 pages long! So when they got it, they said, “Gee, this is really good stuff, but we just don’t have room for it. This would be, you know, a third of the book.” They opined I should not have trouble finding some other place to publish it. So I went off to Ecological Monographs and got very good service from Nelson Hairston Sr., who was the editor of Ecological Monographs at that time. He basically accepted it within two months with no changes. It’s just one of these accidents of fate. If they had published it in the book, I’m sure it would have gotten much less attention than it did by coming out in Ecological Monographs. So that was a case where being rejected had a very positive outcome.

HS: Did you say the manuscript you submitted for the symposium book was 90 pages long?

SH: Yeah, the manuscript was about 90 pages, right.

HS: Did you shorten it to submit it to Ecological Monographs?

SH: No no, there was no shortening. That was 90 pages, double spaced. I think the published version is about 30 pages.

HS: You mentioned a couple of people who appreciated it and gave you comments at the meeting. What was the general response when you gave the talk? Was this something that people took to immediately or was it something that they found controversial?

SH: I don’t recall too much, because there were a lot of talks, you know, and we had the standard 10 minutes for discussion after each talk. I don’t recall much of what the comments were at the meeting. I do recall that when Nelson Hairston accepted the paper for Ecological Monographs (he had been at the symposium and given a paper on his experimental work with salamanders), he wanted to add his own little note to the paper, saying how he himself had heard this talk at the meeting, and how he and many other people in the audience were shrinking back in their chairs as they heard my talk, recognizing that, in fact, papers they had recently published had the same problem in them. I don’t recall actually sensing people shrinking back in their chairs, but he was referring to his own experimental work. One of the ironies is that – and I wrote this up in one of the commentaries that I’ve published about that paper – when I was doing my survey of the literature for that paper, some of Hairston’s papers actually fell into the set of journals and issues that I was using as my source for experimental papers. And I remember looking at his papers and thinking that his statistical analyses weren’t tremendously complicated, but they were fairly complicated relative to my understanding of statistics at that time. So I looked at them and said, “Oh, this, you know, I’m not quite sure. But yeah, I guess these are okay.” But in retrospect, as I commented in later publications, most of his experimental salamander papers have pseudoreplication in them. And I’m not sure what the fate of my manuscript would have been if I had listed him, the editor, as a pseudoreplicator. Ironically, he had two different people review my manuscript. One gave it a very positive review and said, “Fine.” The other said, “Well, this is very nice, but, basically, everybody already knows this stuff. So have Hurlbert reduce this to a letter to the Bulletin of the Ecological Society of America.” Reducing 90 pages to a letter! Hairston simply ignored that and accepted the manuscript, essentially, as it was. Some years later, when I was giving a talk in North Carolina, where Hairston had taught, I found out that the person who recommended reducing my manuscript to a letter was the same person Hairston had relied on for advice on statistical analysis. Very interesting interconnections there.

HS: Was there some truth to what the second reviewer said, i.e. had people earlier hinted at the problems you were talking about, or was it completely fresh?

SH: So, for example, for people who work in, say, agriculture, their journals would never accept the sort of papers that the ecological journals were accepting. Agricultural journals would never accept a study on the effects of fertilizer on wheat yield where you had one field with wheat and fertilizer and one field just with wheat. This is the most elementary aspect of experimental design there is.

HS: Why was this not the case in ecology? I mean, by this point, experimental ecology had already been around for many decades.

SH: Well, most scientists, ecologists and otherwise, use statistics, but even most experimental biologists of any sort have never had a course in experimental design. So what they know, what they learn and don’t learn, depends entirely on the textbooks or reference books they use, and the best books are all written by statisticians. You know, probably the best book on experimental design, actually, up until recent decades, was one by DR Cox, a British statistician who’s now in his 90s and still writing. It is titled “Planning of Experiments.” It’s very basic sort of stuff but still a very good reference work. But the books that biologists have used on statistics very often are books that were written by non-statisticians. Sokal & Rohlf is a classic example. I think Rohlf has a degree in statistics, but Sokal did not. The same is true in other fields. Most of the statistics texts in psychology are written by people who have a PhD in psychology, took a few courses in statistics in grad school, and think they can write a book. I’ve reviewed a number of these books, including, most recently, the latest edition of Sokal and Rohlf, just a couple of years ago. I also reviewed a book by statistician George Casella a couple of years ago. And these are all books that actually promote pseudoreplication via some of their examples. So that’s one big part of the problem.

HS: In the introduction, you say, “The citing of particular studies is critical to the hoped-for effectiveness of this essay. To forego mention of specific negative examples would be to forego a powerful pedagogic technique. Past reviews have been too polite and even apologetic…” When you decided to use these experiments to illustrate the problems you were discussing, were you worried about how the authors of these examples would react?

SH: No, that never was a concern to me. I have a whole library of review papers that other people have written, criticizing statistical malpractice of one sort or another, in one subject field or another. And one of the things that bothered me was that, because they didn’t cite specific studies, you just couldn’t get a grasp of exactly what they were complaining about. Also, the advantage of naming the papers is tremendous from a pedagogical point of view, because, first of all, everybody who’s cited is going to know about it within a few days of the paper coming out. And then there are the co-authors, and then there are the editors and reviewers who said this paper should be published. Actually, in one of my recent papers on pseudofactorialism… did I ever send that to you?

HS: Yes.

SH: Okay. In that review I found 95 papers that had analyses constituting pseudofactorialism. Sixty of those papers were examined in detail. More than 500 persons were associated with those 60, when all the responsible authors, reviewers and editors involved are tallied. When I’ve reviewed other authors’ manuscripts assessing the frequency of pseudoreplication, my suggestion in each case was that the authors actually list all the papers, make a neat table showing the papers that were problems, and calculate the frequencies of the types of errors in the different papers. Some people have accepted that suggestion and taken an extra six months to do a revision, and others haven’t. The editors seem not to push them if they don’t want to do that.

In later papers I wrote on other statistical topics that reviewed a large number of cases and listed particular papers, I always sent my draft manuscript to all the people I was critiquing. I did not send the manuscript of my 1984 pseudoreplication paper to all the pseudoreplicators it listed, but I have done that sort of thing ever since. And it’s a very useful though time-consuming exercise.

HS: Was “pseudoreplication” always the obvious choice for what you were describing, or did you consider other terms?

SH: Yeah, that was the immediate one that came to me. I’m not sure exactly at what point. That term actually has been used in a couple of other contexts, but very rarely in the statistical literature. And in these other contexts, it’s a word that somebody throws out in a particular paper, but without giving it a concrete definition. You understand it in the context of that paper, but you don’t see it as a general term. I think that’s another reason that paper had impact. I could have written exactly the same paper, just talking in terms of papers using the wrong error term and describing the error in conventional statistical language without giving it a name. But I think giving it a name helps people. It gives them a handle. And once they understand, you know, exactly what the definition is, they can communicate with other people without having to write paragraphs. They just say, “This is pseudoreplication here.”

HS: The other phrase you use in the paper that has really caught on is “demonic intrusion”. Could you tell us a little more about how you came up with that?

SH: Yeah, most people who have cited that are actually misreading the paper. The idea was that non-demonic intrusion is just another name for chance events that can affect any experiment. Non-demonic intrusion you have to anticipate, and that’s why you have replication, why you have controls, why you use blocking, and so on. So I figured, “Well, demonic intrusion just refers to situations where, in fact, some entity, human or otherwise, actually is trying to ruin your experiment and doing things secretly, and there is no way you can know whether that’s happening in your experiment.” Demons are mostly shape-shifters and you can never tell who they are or where they are.

HS: The other striking aspect of this paper is the writing style, which is extremely easy to understand and even humorous at times. Was this a voice you tried to adopt consciously for this paper, or was this the way you wrote all the time?

SH: I think that’s the way I’ve written all the time, more or less. Part of it is just using the advice you can get from a good journalism professor, or a good professor of creative writing or something. Scientists tend to think they can just be completely dry and very straightforward, that they don’t need to use the techniques that professional writers of different types, novelists, journalists and so on, use to get the reader’s attention and to keep things clear: putting the big picture up front and then going into some details, that sort of thing. I was just fortunate in who I had as writing teachers over the years. It’s a matter of, sort of, anticipating the reader’s reaction: what do you need to do to help the reader understand, without talking down to anybody when you’re doing that?

HS: I want to go over the names of the people you acknowledge to get an idea of who these people were and how they helped. Can we do that?

SH: Sure.

HS: C. Chang

SH: Cecily Chang was a graduate student who was working as my assistant, not on statistical stuff, when I was writing this paper.

HS: BD Collier

SH: That was Boyd Collier. He was a colleague of mine at San Diego State. He was an insect ecologist who taught our basic biostatistics courses.

HS: CF Cooper

SH: CF Cooper, a fair bit older than me. He was a big-picture ecosystem ecologist kind of guy, an ecosystem ecologist back before ecosystem ecology was fashionable.

HS: PG Fairweather

SH: PG Fairweather is an Australian ecologist, one of Tony Underwood’s students.

HS: Were all of these people at the meeting, or did you send them a draft of the manuscript?

SH: A few of the people listed there – Platt, Underwood – were at the meeting, as was Hairston, the Ecological Monographs editor, but most were not.

HS: DA Farris

SH: DA Farris. He was another colleague of mine who taught the basic statistics courses.

HS: D Wise

SH: David Wise. He’s a very good spider ecologist and wrote a nice book on spider ecology. He talks a lot about pseudoreplication in that book. That was some years later. He was one of the people strongly influenced by my paper early on, I think.

HS: PH Zedler

SH: Paul Zedler. He was another one of my colleagues at San Diego State, who occasionally taught statistics courses.

HS: JF Box

SH: Yeah, that’s Joan Fisher Box, who was RA Fisher’s oldest daughter, who was married to… what’s his first name… a professor of statistics at the University of Wisconsin, a pretty well-known statistician… George Box.

HS: I wanted to ask you a little bit about Fisher, because you mention him in the paper and you are critical of some of his ideas: the problems you discuss with focusing on randomization and ignoring interspersion, or focusing too much on alpha. Did that come up when you were reading up for this topic? Can you tell us a little about that?

SH: So, I’m going to be kind of vague on the sequence of things. After I’d prepared this manuscript and given a talk at the symposium, I asked my department if I could teach a course on experimental design, which had never been taught in our biology department. It was taught in our statistics department, but at a very theoretical level that would not have been useful to biologists, I think. So I started teaching a course on experimental design, about once every two years for a few decades. Most of the statistics I know I’ve learned since I started teaching that course. So this was a sort of self-training, by way of manuscript writing and then by having to prepare lectures on topics in addition to those covered in the manuscript. The very first time I taught it, I think I used Fisher’s 1935 text, “The Design of Experiments”, as the textbook. It wasn’t a very good choice. And then I think the next time I used Cox’s “Planning of Experiments.” So I was still spending years getting into the statistical literature after I had published that paper. Most of the papers I’ve written since then had their origin in my developing lectures on different topics for that course.

HS: Was discussing Fisher’s ideas part of the motivation from the beginning to write this paper?

SH: I suspect… I don’t even think I have a copy of my very first draft of this, the one I presented at the symposium. I suspect the stuff on Fisher and the other books I’ve listed in there were added after the symposium. I remember Bill Platt said I should read the book by Gill (“Design and Analysis of Experiments in the Animal and Medical Sciences”). I don’t see that cited in the manuscript. But I did a lot more reading in statistics between the first and second draft of that manuscript than I did before the first draft.

HS: You cite a 1934 paper titled “Statistics in agricultural research” for a quote you use in your conclusions “Damn the duplicate plot; give me one plot and I know where I am (Wishart 1934:56)”…

SH: Right, that’s one of those conclusions where you’ve come to the end of the paper and you open a bottle of wine and you say, “All right, how can I wrap this up?” And I remembered these statements that had been floating around in my head, so I thought I would not really do a summary but just insert a little bit of humour at the end there with a pub metaphor.

HS: How was this paper received when it was published? Was there a lot of discussion around it?

SH: Yeah, there was. I wasn’t really involved with much of that. Well, I got a lot of feedback from people I had cited, and nobody was particularly disturbed that I had caught this error in their paper. I think one advantage was that if you publish a critique of one particular paper, you can expect the authors to be really unhappy with that focused attention. But if you’re listing, you know, maybe almost 100 authors, including all the co-authors, then people start looking at the list of who else is in the pot with them. They say, “Oh, there are three different former presidents of the Ecological Society of America, and there are five current members of the editorial board of Ecology. Hey, this is a pretty good group of people to be associated with.”

I never went to a lot of professional meetings, partly simply because I often didn’t have the money. But for the next couple of years, people would come back from the Ecological Society meetings and the Limnology and Oceanography meetings and say, “Hey, you know, I was in this session about such and such a topic, and somebody brought up your paper and there was a big discussion and some arguments and so on.” So it got a lot of attention very quickly. I think somebody told me that at the first ESA meeting after the paper came out, Robert Paine, a marine ecologist at the University of Washington, was giving one of the plenary talks or some big talk, and he made very positive reference to the paper. For the first few years after it came out, I kept count for a while, and I had over 2000 requests for reprints. This was at a time when you didn’t send PDFs, so I had to keep getting more money from my dean to pay the cost of running off, you know, another couple hundred copies every once in a while.

HS: Was there any pushback against the ideas around the time it came out?

SH: No, not around the time it came out. But there’s been a little bit since then, which you may have seen. Did I send you the stuff? My response to Oksanen? And then some psychologists at the University of California, Davis, whom I had to rebut. Those are the two major things. There have been an awful lot of papers and even books where people have simply used the term incorrectly; it’s impossible to correct all those and probably not worth trying. No, I didn’t get any meaningful negative feedback on that, really. I’ve written another half dozen or so papers on pseudoreplication, often with other people. But most authors, when they mention or discuss pseudoreplication, even now cite only the 1984 paper. Apparently they’ve never seen any of the later papers, which clarified some things that were not so clear in the first paper.

HS: Has your thinking on the ideas expressed in this paper changed substantially from the time you published the 1984 paper?

SH: No. Somebody who was interviewing me a while ago asked what effect I thought the paper had had on the quality of statistical analysis, in the ecological literature in particular. And I said, you know, all I can tell you is that pseudoreplication seems to me to be as common now as it was back then. That’s based not on my own recent surveys but on a number of papers I’ve reviewed in recent years. One was by somebody looking at all of the experimental studies on the effects of logging on tropical forest biodiversity. Another was a review of all the experimental studies using microcosms and mesocosms to look at the effects of acidification on marine organisms. Both those papers found – I forget the exact numbers – that something in the range of 40 to 60% of the studies had pseudoreplication in them. I suppose I could argue that if it hadn’t been for my paper, it would be 100%. But obviously, you know, reviewers and editors collectively still aren’t sufficiently up on this pretty elementary sort of thing to pick out the problem when it is present in a manuscript.

HS: At the end of your paper, you make some clear suggestions for statisticians and for editors. I wanted you to reflect on how well these two groups of people have taken up your suggestions.

SH: Well, let’s see, I’ve got this here in front of me. Yeah, I would say the statisticians have not shaped up at all. And I think that applies to the professional statisticians who teach statistics as well as to other scientists who also often teach stat courses – biologists, psychologists, and so on. The stat books, including some of those most commonly used by biologists, often completely ignore design. That’s a major flaw in Sokal & Rohlf, probably the single biggest of the many flaws in that book. I’m not sure if I published this or just commented on it, but there is a basic problem in the interaction between statisticians and people in other disciplines: statisticians often have a professional interest and professional incentives to develop fancy new methods for very special-case situations, and have no incentive to try to help people stop making simple errors in papers. They think that’s a little bit below them. And they’re not interested; they don’t enjoy reading critical reviews of statistical practice, like the sorts of things I’ve written. The editors often have a hard time because the statisticians are saying, “Well, yeah, these ecologists are making all these stupid mistakes, and that’s what Hurlbert’s talking about in this paper, but you know, they should clean up their own mess. You don’t need statisticians in order to avoid pseudoreplication.” Or something of that sort. So the books remain not very good. And I’ve always argued, from the very beginning, that from the very first courses in any statistics curriculum, design and analysis should be taught together. Most stat courses teach you tremendous amounts of analysis before they talk in any explicit way about design. And in the case of experiments, the basic principles and terminology of experimental design apply universally. I mean, you could use the exact same terminology and language in medical research, industrial research, ecological research, psychological research, but authors of stat textbooks don’t do that. Instead, each discipline abandons the classical terms of experimental design and invents new terminology just for itself, which doesn’t help things very much.

But in principle you could say: the first course in statistics should be “Design and Analysis of Experiments.” The second course might be “Design and Analysis of Experiments – Part Two”, because there is a very finite number of terms and concepts that apply to comparative or manipulative experiments. But once you get beyond manipulative experiments, there’s such a tremendous range of areas where statistics is used. There is no conventional or standard finite set of principles and study designs that covers everything from epidemiologists trying to work out the cause of a cluster of cancer cases in a particular region, to mapping vegetation, to estimating the volume of an oil deposit, to carrying out an opinion survey. There’s just this tremendous, almost infinite, range of study design types for observational studies. You can’t cover more than a tiny fraction of those in one or two courses. On the other hand, a course just on the design and analysis of manipulative experiments can cover, fairly efficiently, all the basic aspects of statistical analysis and how to avoid the commonest sorts of errors that researchers make in experimental studies.

And then, if you want to earn your black belt in statistics, you go on and take a course in epidemiology, or in one of the specialized branches – public opinion surveys and so on. Basically, most of the terminology was pretty well developed by the Brits, you know, by 1950. Of course, there are some fields – maybe astronomy, high-energy physics, public health, and sociology – that do very little experimental work, so they’re not so much interested in experimental design. But for most disciplines, I think it’s clear what would really work well. I once taught all the basic terminology and concepts of experimental design in a one-hour lecture to a group of high school students. We didn’t get into any mathematics at all. The concepts and utility of randomization, replication, blocking, and so on are easily understood without reference to mathematics. That’s the way to introduce them.

HS: If you were asked to pick one textbook to suggest to students to read, which one would it be?

SH: The Brits still do the best stuff in some ways. They are gradually being corrupted by the Americans. But look at some of the people who’ve inherited the mantle from people like Yates and Fisher and Cox: Roger Mead wrote the best book on experimental design (“The Design of Experiments”). It was published in 1988, I think. A basic text. There’s another book, which I think is in its third edition, called “Statistical Methods in Agriculture and Experimental Biology” by Roger Mead, Robert Curnow, and Anne Hasted. That book does present design right along with the introduction to statistics. That may be the best one I can cite off-hand. I think professional statisticians just want to get into the math, and they’ve got limited patience for dealing with, sticking with, some well-defined set of terms, or for focusing on conceptual frameworks at any length. They figure they can do that in one page at the beginning of a book and that’s all the people they are trying to teach need.

HS: Have you ever considered writing a book about experimental design and statistics?

SH: I have a folder with that idea, a manila folder with pieces of paper in it that I started 20 years ago or so. I have plenty of material on hand – extensive reading notes, my lecture notes, a large reprint collection, and my own published papers. But my feeling has been that there are other people out there who are probably in a better position to write the book. I have minimal incentive to increase my workload. Officially, I’ve been retired for 10 years, and I’m still working 60 hours a week and involved in a lot of things. So long as there’s another whole category of statistical malpractice in need of a good review article (and there are several such categories), a review article on one of those might be a more useful contribution to enhancing the quality of the next generation of statistics books.

HS: And are there such topics you are working on currently?

SH: Well, I’m not working on them currently, but there are ones I have big folders on. One is on the use of log transformation in data analysis and what you do when you have zero values. There’s a lot of literature on that and a lot of incorrect statements in textbooks on that topic. I’ve got a paper sort of outlined on that. At one point I was trying to convince my son to do a joint paper with me on the paranoia over correcting for spatial autocorrelation in some types of studies, so I could make him do most of the work. He’s an ecologist at the University of North Carolina. There are a number of other ripe topics. One is the misuse of repeated-measures analysis of variance. Another is following up on our campaign to get editors to disallow fixing of alpha and use of the phrase “statistically significant” (“Coup de grace for a tough old bull: ‘statistically significant’ expires”).

HS: One of the suggestions you make to editors is, “Be liberal in accepting good papers that refrain from using inferential statistics when these cannot validly be applied,” and you note that “it is often easier to get a paper published if one uses erroneous statistical analysis than if one uses no statistical analysis at all.” Today, the emphasis on statistics, and on using more and more complicated statistics, is actually increasing. Do you see that changing, and your suggestion of not expecting statistics at all in some scenarios becoming more accepted among editors?

SH: Well, there are some studies, and I singled out two in my original pseudoreplication paper, that appropriately get by with minimal use of inferential statistics. One was a Canadian study on eutrophication using whole lakes as experimental units, where there was one control lake and one phosphorus-treated lake. Another study, by Gene Likens’s group, used two small watersheds in New Hampshire. They cut down all the trees on one, kept the other as a control, and with stream gauges and water sampling they monitored changes in discharge and water chemistry over the following years. At some point before I submitted the manuscript, somebody said to me, “Oh, you know, how about these kinds of studies? These seem like good studies.” So the last section I added to that manuscript was a short one saying, “Yes, these were good studies, they did not constitute pseudoreplication, and they did not pretend to apply formal inferential statistical tests.” And, actually, I think that section proved important to the favourable reception of the paper by a lot of people who were or are working with large-scale systems. So from such studies, yeah, you can learn a lot. But these are studies that have very intensive measurements going on, usually both before and after manipulation; you’re looking at a lot of variables, and you’re trying to work out the actual mechanisms whereby different effects are produced. Sometimes this involves mini-experimental and observational studies embedded in the larger experiment. If you work in large-scale systems where treatment replication is difficult or impossible, these are things you can do. And those papers did not try to play any fancy games with statistics. They showed, I think, error bars and that sort of thing, and didn’t pretend, as other papers have, that there was some magic recipe for dealing with the fact that you had no replication.

It’s not a matter of expecting or “not expecting statistics at all in some scenarios becoming more accepted among editors.” The methods that can be validly and usefully applied will be dictated by the design of the study.  Whether the design of the study is appropriate or sufficiently good will be a subjective decision for editors and reviewers to make, taking real world limitations into account. It was one thing for them to accept a study involving two lakes, one treated and one control. It would be a very different matter for them to accept a study involving two aquaria, one treated and one control. The weak design of the latter would be regarded as inexcusable.

I think two things have happened. One is that there has been some increase in the variety and complexity of statistical methods being used. Another is that with more and more papers being submitted, journals are trying to make space for everybody and demanding greater condensation of methods sections. This is a bad combination. Nowadays, it’s very common for all sorts of critical details of the methodology to be omitted, and sometimes even critical details of the results, so the paper itself sometimes approaches being a glorified abstract. Now, if I’m starting to read a paper in some area of interest to me, before I spend a lot of time on it, I want to make sure that the methodology seems at least reasonable at first glance. If you can’t tell that, or if you have to go to the supplementary information to find out, it makes it harder to detect errors. That’s probably true for reviewers in general. How often will they bother to look at the supplementary information for a paper they’re reviewing?

All of this is on top of the more general problem of many editors and reviewers lacking a high level of competence in statistics themselves. There is no quick fix for that problem. But when an editor asks me to review a manuscript, I’ll often say, “Look, I’ll review this manuscript, but I’d like later on to see all the other reviews that will be submitted on this manuscript and also have my review circulated to the other reviewers.” A lot of journals do that as a matter of course now. And that is the very best way, I think, to educate people who already think they’re pretty educated on these matters. If you’ve ever been in the position of an editor getting three reviews on a manuscript, you know how completely different from each other they can be. Even if all recommend acceptance or all recommend rejection, they’ll often offer completely different suggestions or criticisms. Over time, any journal’s stable of reviewers can learn a lot from each other without a lot of extra work on anybody’s part.

HS: I just noticed that one name I missed in the Acknowledgments was Lincoln Brower, to whom you dedicated this paper.

SH: Lincoln Brower was the ecology professor at Amherst College when I was an undergraduate. I did an Honors thesis on bird feeding behaviour. That was the first time I did experiments of any sort. Brower was famous for studies of monarch butterfly migration and butterfly mimicry, those areas. So he basically introduced me to experimental work when I was just an undergraduate. He had me read Fisher’s 1925 book, “Statistical Methods for Research Workers,” which he’d used in his own work. He himself had done a post-doc at Cambridge, so he knew Fisher.

HS: In addition to your own review of the literature, you also provide results from a review that students did. Were they your students?

SH: Those were all students who were in some of the experimental design courses I taught. They were very happy to see their names in print. In that course, I had a big independent project that each student had to carry out. They had to pick 25 experimental papers in some field of interest to them (they could be from a particular journal, or they could be on a particular topic), and they were supposed to determine for each paper the specific experimental design used, the specific analyses that were applied to the results, and whether those two things matched up. They were supposed to check each paper for pseudoreplication and a number of other possible errors. That was tremendously empowering to the students. At the beginning of the semester, they were saying, “You mean I’m supposed to look at all these papers, these articles published on glossy paper by these well-known scientists, and you are expecting me to find problems in them? These are good scientists! I’m just a second-year graduate student.” But actually, you know, once they learned the most elementary things about experimental design and a few other things, they discovered they could find serious errors in papers by people who were supposed to be top people in the field. Then the students had to write tactful, clear letters to the authors of at least three of the error-containing papers, and see if they could extract confessions of error. It led to some mostly good-natured, humorous correspondence. Via such a project, students become much more confident of their own abilities and much more critical of the scientific literature in general. It’s a great teaching tool, I think.

HS: What kind of impact did this paper have on your career and research trajectory? Did “pseudoreplication” sort of get attached to your name after this?

SH: Yeah. In fact, although my area of specialization really is lake ecology, and I’ve published a number of papers, some influential, on lake ecology, I’m mostly known for this paper. The year after the paper came out, the American Statistical Association gave it the George W. Snedecor Award for the best paper in biometry in 1984, so that was a real plus. Then, a few years after that, some scientists I didn’t even know nominated me to be a fellow of the American Association for the Advancement of Science, so that was another big thing that happened. I’d already been promoted to full professor by the time that paper came out, so it didn’t really have any impact on my position in the university. But it definitely was the single most influential paper in prompting the awards I got later.

HS: What kind of impact did it have on your research trajectory?

SH: Basically, most of my research after my post-doc has been non-experimental, except for that done with my students. In early years, I always tried to get my students to do experimental studies, partly just as a good way to learn about experimental design and analysis, and also because you get some very interesting results doing experiments on systems that people have never done experiments with. So I’ve had students do microcosm experiments on effects of fish predation, effects of pesticides, effects of invertebrate predators, and effects of increasing salinity levels. But my own work, over the years, was mostly fieldwork in the central Andes, studying salt lakes and flamingos in Chile, Bolivia, Peru and Argentina. Later came basic observational studies on the Salton Sea, the largest lake in California, a giant salt lake about two hours east of San Diego. Our studies of the Salton Sea required pretty minimal use of statistics, actually. And for my students who did microcosm experiments, I never let them specify alpha or use the phrase “statistically significant” when they wrote their theses. That’s another fetish I developed. Yeah, so I was getting my highs, my emotional highs in experimentation, simply by helping my students through their experimental studies, and then trotting off into the field by myself or with my other sets of colleagues.

HS: Over the last 30 years you’ve occasionally come back to focusing on what you see as problems in statistics and in design and writing about it – a variety of different things. Would you say that’s just an interest on the side, i.e. when you notice something you write about it?

SH: So, as I approached retirement, I had dozens of folders with my lecture notes and reading notes on different topics in statistics and experimental design. As I had developed those lectures, I realized there was a real opening here to write critical reviews on additional controversial aspects of methodology. And so, often with Celia Lombardi, an Argentine animal behaviorist I’d started collaborating with in the late 1980s, I’ve worked through almost all those folders; I just have a couple left. And it all really came out of my developing a course on a subject new to me.

HS: Have you reread the 1984 paper since it was published?

SH: No, no. I’ve got a problem. I have too many stacks of paper. I had to move my university office home about a year ago, and that meant adding several eight-foot-tall bookcases to various places in my house and trying to find a place to set up a new desk. Now I’ve got a new desk, but I don’t have any work tables with it yet. So I have enough trouble just filing papers, without going back and looking at all this old stuff. When I do, I say, “How did the editors let me get away with that?” They were pretty liberal!

HS: When writing the more recent papers on replication, have you ever gone back to check what you said in the original one?

SH: Yeah, sometimes I’ve had to do that. But not too much, because several of these recent papers on pseudoreplication were on one particular article or another that I and my colleague Celia Lombardi were critiquing, and there it was a matter of trying to write something short and focused and in civil enough language to get it published.

HS: Would you count this as a favourite among the papers you’ve written?

SH: Yeah, definitely. Yeah. Yeah. The other paper that I’m proud of is one describing an aquatic microcosm experiment. Actually, it started out as a class project for an aquatic ecology course I was teaching during my first semester at SDSU. Published conspicuously in Science in 1971, it showed how adding (or removing) a top predator could not only alter the populations of species at lower trophic levels but also affect the chemical and physical properties of ecosystems. It discussed the possibility of reducing the magnitudes of algal blooms via manipulation of fish assemblages. The paper initiated a boom in “trophic cascade” research at many institutes and universities in Europe, South America, and the US. But it has been cited only a few hundred times. I didn’t invent the term trophic cascade – that’s one reason why the paper’s not cited. That metaphor was first used by Bob Paine about 10 years later. Then the first big review article on trophic cascades in aquatic systems was published by Steve Carpenter and colleagues in 1985. Neither it nor their later book cited our mosquitofish paper, yet the conclusions of their 1985 review were identical to those we had reached 14 years earlier. Carpenter made clear in other articles his strong belief that microcosm and mesocosm studies were of “limited relevance” and not capable of providing insights into the functioning of natural lake ecosystems. It thus was a tad ironic that, after his 10 years of whole lake studies on trophic cascades, his 1985 review paper essentially paraphrased the insights gained earlier from our two-meter diameter microcosms on the roof of the SDSU Biology building. Well, sorry for this lengthy excursion, but your question said “among the papers you’ve written” and its low number of citations notwithstanding, I’d say the mosquitofish paper was the second most influential one I’ve written. 
Another interesting connection is that Carpenter and his gang did take my 1984 paper very seriously, since their whole-lake experiments involved no replication of treatments. The pseudoreplication paper suddenly increased the scrutiny of all such studies claiming to have identified treatment effects. So Carpenter and colleagues did publish some papers presenting fancy statistical approaches that they claimed, erroneously to my mind, resolved the “no treatment replication” problem.

HS: What do you like about the 1984 paper?

SH: Well, I learned a tremendous amount in writing it. I spent over two years writing that paper and doing that research. And I enjoyed testing the limits of the editors to see how much humour or salacious metaphor or whatever I could put in the paper. I had the sense that the paper was going to be influential simply because previous critiques that had tried to assess the frequency of statistical problems in the literature never told the authors of the papers they were criticizing that they were being criticized. They didn’t cite them! So I thought that aspect alone would get people’s attention quickly. It was a long, long process, although not quite as long as some more recent papers, which have taken almost 20 years to finish.

HS: When you wrote this paper, did you anticipate at all how important it would become in the field? And do you have a sense of what it mostly gets cited for?

SH: Well, as happened to you, I’ve heard from lots of people, both students and professors, that this paper is required reading for beginning grad students, especially in various courses in the environmental sciences. So a lot of people are being forced to read this whether they want to or not. And the students seem to like the paper, I guess because it’s more understandable than their statistics books. And, you know, it’s irreverent. Students love iconoclasm. This is a little bit iconoclastic, both in terms of the people it lists as pseudoreplicators and what it says about some of the old-time statisticians. I didn’t anticipate, I guess, how widely this paper would end up being used. I mean, even in the education literature, in the psychological literature, in the medical literature, this paper is being cited more often. Some of those fields are just discovering the paper. I’ve seen a number of recent papers that are actually talking about pseudoreplication in clinical research. The biomedical statisticians have long known about this sort of problem, and they’ve used a whole bunch of different names for it, for example “unit of analysis error”, but these other labels are never given a clear definition. There have been so many of them, and they just never gave people something to hold on to and say, “Okay, yes, there is this particular error, and now we have a label for it: pseudoreplication.” So I didn’t expect it; I probably didn’t even think about it having an impact on all the other disciplines.

HS: What would you say to a student who’s about to read this paper today, 32 years after it was published? Would you guide their reading in some way? Would you add any caveats they should keep in mind when reading it?

SH: Yeah, a few. For example, on the definition of replication: I think it was in a 1993 paper on statistical errors in zooplankton research, which I wrote with Mike White, one of my grad students, that we came up with a clearer definition of pseudoreplication, and we did that by using the term “evaluation unit”, which a statistician by the name of Scott Urquhart actually proposed. An evaluation unit is a particular entity within an experimental unit on which you make some individual measurement. Often, you have multiple evaluation units being measured within one experimental unit. So that term, evaluation unit, in conjunction with experimental unit, is useful. The other caveat I would probably add is that in the 1984 paper I used the term “replicate” fairly freely, and I’ve pointed out in print elsewhere that that can be very confusing. It’s no clearer than when somebody asks you, “What’s your N?” So I’ve suggested that replicate should only be used as a modifier and not as a noun. You can talk about replicated treatments, replicate samples, replicate blocks, etc., but don’t use it on its own, because in most experiments you have replication at multiple levels within the experiment. Those would be the two major warnings. I suppose I would also give students who are going to read that paper a copy of my 2009 paper on the ancient black art, because there I make a point of giving specific definitions to all the basic concepts of design. Yeah, I think it is still interesting for people to read the 1984 paper. But I think the next thing to read after that would be the 2009 paper, because I used that opportunity to respond to criticism and, in a way, to actually write my mini-book on experimental design.
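To make the experimental unit versus evaluation unit distinction concrete, here is a minimal Python sketch using invented numbers (the tanks, values, and function names are hypothetical and are not taken from any study discussed here). Each treatment has three experimental units (tanks), and four evaluation units (samples) are measured in each tank. The valid comparison collapses the samples to one mean per tank, so N is the number of independently treated tanks; the pseudoreplicated version treats all twelve samples per treatment as independent replicates. A pooled-variance t test is assumed purely for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Invented data for illustration only:
# treatment -> experimental unit (tank) -> evaluation units (samples within that tank)
DATA = {
    "control": {"tank1": [4.1, 3.9, 4.3, 4.0],
                "tank2": [5.0, 5.2, 4.8, 5.1],
                "tank3": [3.5, 3.7, 3.6, 3.4]},
    "treated": {"tank4": [5.9, 6.1, 6.0, 5.8],
                "tank5": [4.4, 4.6, 4.5, 4.3],
                "tank6": [6.8, 7.0, 6.9, 6.7]},
}


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(b) - mean(a)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


def experimental_unit_means(treatment):
    """Collapse the evaluation units in each tank to a single tank mean."""
    return [mean(samples) for samples in DATA[treatment].values()]


def all_evaluation_units(treatment):
    """Flatten every sample, ignoring which tank it came from (the pseudoreplication error)."""
    return [x for samples in DATA[treatment].values() for x in samples]


# Valid analysis: N = 3 tanks per treatment.
t_valid, df_valid = pooled_t(experimental_unit_means("control"),
                             experimental_unit_means("treated"))

# Pseudoreplicated analysis: N = 12 samples per treatment, wrongly treated as independent.
t_pseudo, df_pseudo = pooled_t(all_evaluation_units("control"),
                               all_evaluation_units("treated"))

print(f"valid:            t = {t_valid:.2f}, df = {df_valid}")    # df = 4
print(f"pseudoreplicated: t = {t_pseudo:.2f}, df = {df_pseudo}")  # df = 22
```

With these made-up numbers, the pseudoreplicated analysis reports 22 degrees of freedom instead of 4 and a noticeably larger t statistic, even though the difference between treatment means is identical in both. A t test on tank means is only one simple valid option; a mixed model with tank as a random effect is another, not shown here.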
