In a paper published in Ecology in 1993, Pierre Legendre highlighted the problem of “spatial autocorrelation” in the statistical analysis of ecological data and proposed solutions to it. The methods proposed by Legendre have had a lasting impact on the analysis of data from ecological field research. Twenty-five years after the paper was published, I asked Pierre Legendre about his motivation for writing this paper, its impact on the statistical analysis of ecological data, and what we have learnt since about dealing with the problem of spatial autocorrelation.
Citation: Legendre, P. (1993). Spatial autocorrelation: trouble or new paradigm? Ecology, 74(6), 1659-1673.
Date of interview: Questions sent by email on 16 January 2018; responses received on 8 July 2018.
(1) Hari Sridhar: I would like to start by asking you about your motivation to write this paper. From looking at your publication profile I notice that you became interested in the topic of autocorrelation a few years before this paper (your first paper with the word autocorrelation in the title was in 1988). How did you get interested in this topic?
Pierre Legendre: In the year 1970, important hydroelectric developments took place in northern Québec, with the impoundment of very large reservoirs by the state hydroelectricity company, Hydro-Québec, and its subsidiary, the Société d’Énergie de la Baie James, in a 410 000 km² territory called Radissonie, located between 49° and 56° N. The Société asked university-based ecologists to collect data and analyse the ecology of this poorly studied area. I agreed to carry out several studies, which provided opportunities to train graduate students in the analysis and interpretation of ecological data, using data collected by the Société’s bureau of ecologists. In one of these studies, we analysed 173 lakes across the territory using physical and geomorphological data and related them to the presence or absence of 13 fish species (Legendre et al. 1980). Trying to understand the differences among lakes, I examined maps of the geomorphology and glacial history of the area and realized that the spatial variation we were observing in the fish communities and water quality variables was correlated with the larger-scale geomorphological and glacial history variables. My interest in the spatial variation of communities, and in the spatially correlated processes that cause it, started there.
(2) HS: This paper (“Spatial autocorrelation: trouble or new paradigm?”) is part of a special issue of the journal Ecology in 1993. Could you tell us how your contribution to this special issue came about? Did you write this paper specifically in response to an invitation to contribute to this issue?
PL: The countdown to the publication of this paper began when my colleague Catherine Potvin, a forest ecologist and biostatistician at McGill University, came to visit me at the Université de Montréal on 14 November 1990. She had accepted the invitation of the Ecological Society of America (ESA) to act as special editor, with Joseph Travis, of a Special Feature entitled Statistical methods: an upgrade for ecologists, which ended up containing five papers. Dr. Potvin invited me to write a contribution on spatial autocorrelation for the Special Feature.
I had been interested in the spatial structure of communities for a long time, from a biogeographic perspective. I was already interested in spatial statistical analysis and had contributed to the development of statistical methods to address these kinds of questions.
- I had developed a form of spatially-constrained clustering, as well as indices of community dispersal direction, in a biogeographic study published in 1984 (Legendre and Legendre 1984).
- During a NATO Advanced Study Institute (NATO ASI) that Louis Legendre and I had organised at the Roscoff Marine Station (France) in 1986, several methods of spatial analysis were discussed: spatial point pattern analysis, spatial correlograms, Mantel tests, and constrained clustering of spatially correlated data. The ensuing book of proceedings reflected this new trend (Legendre and Legendre 1987).
- With Marie-Josée Fortin, now Canada Research Chair in Spatial Ecology at the University of Toronto, we had written a first paper describing early methods of spatial analysis and their application to univariate and multivariate data (Legendre & Fortin 1989). This paper has been a favourite among ecologists, and highly cited.
- I had obtained a two-year Killam Research Fellowship of the Canada Council in 1989-1991 to develop methods of spatial analysis of community composition data.
- We (in the lab) had received the Canoco program (ter Braak 1988) for multivariate ordination of community data in 1989. Daniel Borcard and I used it to learn canonical ordination methods (RDA and CCA), from which we developed variation partitioning (Borcard et al. 1992). In that paper, we used a polynomial function of the geographic coordinates of the study sites to approximate the geographic structure in spatial modelling. We knew the limitations of this polynomial, but we had nothing better to offer at the time. That part of the method was later replaced by PCNM (principal coordinates of neighbour matrices, now called MEM) eigenfunction analysis (Borcard & Legendre 2002).
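A polynomial function of geographic coordinates of the kind mentioned in the last item can be sketched as follows. This is an illustration, not the code used by Borcard et al. (1992): the third-degree polynomial (nine terms), the centring of the coordinates, the function name `trend_surface_terms` and the site coordinates are all assumptions made for the example. The resulting columns would serve as spatial explanatory variables in a canonical analysis such as RDA or CCA, a step not shown here.

```python
def trend_surface_terms(coords, degree=3):
    """Return, for each site, the monomials cx**a * cy**b with 1 <= a + b <= degree,
    where cx and cy are the centred x and y coordinates (hypothetical helper)."""
    n = len(coords)
    mx = sum(x for x, _ in coords) / n
    my = sum(y for _, y in coords) / n
    rows = []
    for x, y in coords:
        cx, cy = x - mx, y - my  # centring reduces collinearity among the terms
        row = []
        for d in range(1, degree + 1):       # total degree of the monomial
            for b in range(d + 1):           # power of cy; power of cx is d - b
                row.append(cx ** (d - b) * cy ** b)
        rows.append(row)
    return rows


# Hypothetical site coordinates (x, y)
sites = [(0.0, 0.0), (2.0, 1.0), (4.0, 3.0), (1.0, 5.0), (3.0, 4.0)]
X_spatial = trend_surface_terms(sites)
print(len(X_spatial), len(X_spatial[0]))  # 5 sites, 9 polynomial terms
```

For degree 3 the terms come out in the order x, y, x², xy, y², x³, x²y, xy², y³.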
All in all, I was ready at that point to write a synthesis on the subject. I felt a need to mark a pause in the development of spatial analysis methods for community composition data and to take stock of what had been accomplished up to that point. After discussions with Catherine Potvin, we decided to split the Special Feature topic into two papers, one to be written by Pierre Dutilleul, who was then a postdoctoral researcher in my lab, and the other by me. Pierre Dutilleul is now Professor in the Department of Plant Science at McGill University and Associate Member of the Department of Mathematics and Statistics of that university.
In the following weeks, Pierre Dutilleul and I had daily interactions about what we were writing, and discussed how to minimize the overlap between the two papers. In about 5 weeks, I had produced a first draft. I sent the paper to several colleagues, asking them to revise specific sections, and gave copies to graduate students. Eight colleagues and graduate students sent me remarks and comments. Then, on 16 January 1991, I sent the completed paper to Catherine Potvin for scientific and editorial comments.
Editing a special feature takes time. The series of five papers was submitted to the ESA in December 1992. My paper was accepted on 15 January 1993, and the Special Feature was published in the September 1993 issue of Ecology.
(3) HS: Stepping back a bit, how did you get interested in statistics in relation to ecology? I notice that this has been your main research interest right from the start of your career.
PL: In junior college, my main academic interest was mathematics. I had also developed a strong interest in the “natural sciences” before the term “ecology” became a buzzword in college curricula. When the time came to choose a field of study at university, I chose biology, with the sad impression that I was leaving mathematics behind for good. I had not heard then that it was possible to combine biology and mathematics. It was during my M.Sc. studies that I discovered that mathematics, and especially statistics, could be used to address biological questions. I was then working on a hybridization problem in freshwater fish, and, through reading, I found that some people had applied discriminant analysis, a multivariate form of statistical analysis, to this kind of question. I decided to follow that route. I taught myself the basics of multivariate statistics and discriminant analysis, and found help in the Faculty of Engineering, where a programmer helped me run my analyses on a computer, as things were done at that time. Computers were then used through punched cards: I had to prepare my data and computer runs on special sheets of paper, which were transcribed by a keypuncher and fed into the computer through a card reader by a certified programmer. Biologists could not get anywhere near the computer at the time.
(4) HS: In the first sentence of your paper you define “spatial autocorrelation”: “Spatial autocorrelation may be loosely defined as the property of random variables taking values, at pairs of locations a certain distance apart, that are more similar (positive autocorrelation) or less similar (negative autocorrelation) than expected for randomly associated pairs of observations.”
(4.1) Today, 25 years later, would you use the same definition? (4.2) Also, was the phrase “spatial autocorrelation” coined by you?
PL: (4.1) That definition is still valid. It applies to what we now call “spatial correlation” (see below) or “spatial autocorrelation sensu lato”.
(4.2) The expression “spatial autocorrelation” has been used by statistical geographers for decades. The phenomenon was first described by Francis Galton in 1889 (comment on Tylor 1889). An intuitive description was later given by Tobler (1970) in his first law of geography: “everything is related to everything else, but near things are more related than distant things”.
The first formal definition may be due to Cliff and Ord (1973), who defined it as follows: “If the presence of some quantity in a county (sampling unit) makes its presence in neighbouring counties (sampling units) more or less likely, we say that the phenomenon exhibits spatial autocorrelation”. Sokal & Oden (1978) used that expression and described it in their first paper studying spatial autocorrelation in biology. They wrote: “In investigating these data [described in previous sentences] it is important to discover whether the observed value of the variable at one locality is dependent on the values at neighbouring localities. If such dependence exists, the variable is said to exhibit spatial autocorrelation.” Personally, I learned a lot about spatial autocorrelation when I visited Professor Sokal’s lab at the State University of New York at Stony Brook during a sabbatical leave in the fall of 1985. During that semester, he invited me to join the weekly discussions on spatial analysis that he held with Neal Oden.
Actually, that expression has generated much confusion in the literature. Strictly speaking, and for statisticians, spatial autocorrelation refers to the correlation that remains in the residuals of a response variable of interest after the effects of all possible extrinsic explanatory variables have been taken into account. It thus refers to a type of similarity among the study sites that does not result from the action of explanatory variables, e.g. environmental ones, but from some process generated by the variable under study itself. A more general term is “spatial correlation”, which refers to the similarity among values of a variable that can be due to any type of process, intrinsic or extrinsic to the variable under study. But then, computer programs and functions entitled “spatial autocorrelation analysis”, which compute spatial correlation and display it as a function of the distance among sites, were made available, and users of these functions, including ecologists, were led to believe that the “spatial autocorrelation” values produced by these programs represented the process of “spatial autocorrelation” (sensu stricto) of the statisticians. That is not necessarily the case. In the latest edition of the Numerical Ecology textbook (Legendre & Legendre 2012), we clearly distinguished the general phenomenon of “spatial correlation” that can be observed in ecological data from the specific processes that produce it: “spatial autocorrelation” sensu stricto on the one hand, and “induced spatial dependence”, the spatial correlation due to extrinsic forcing variables, on the other.
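The distinction drawn in the answer above can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not Legendre’s own method or software: spatial correlation is measured with Moran’s I computed within a distance class (binary weights), and the data are hypothetical, namely ten equally spaced sites on a transect, an environmental gradient `env`, and a response `y` that follows that gradient plus an alternating site-level deviation. The raw response shows positive spatial correlation at lag 1 (neighbouring values more similar than expected; under the null hypothesis E(I) = -1/(n-1)). Once the effect of `env` is removed by simple linear regression, that positive correlation disappears from the residuals, which here even show negative correlation because of the alternating deviation. In the terms used above, the correlation of the raw data is induced spatial dependence, not spatial autocorrelation sensu stricto.

```python
# Illustrative sketch only: hypothetical transect data, Moran's I by distance class.

def morans_i(values, positions, lag):
    """Moran's I for pairs of sites exactly `lag` distance units apart (binary weights)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num, w_sum = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j and abs(positions[i] - positions[j]) == lag:
                num += z[i] * z[j]
                w_sum += 1
    if w_sum == 0:
        raise ValueError("no pairs of sites in this distance class")
    return (n / w_sum) * num / sum(zi * zi for zi in z)


def ols_residuals(y, x):
    """Residuals of the simple linear regression y = a + b * x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


positions = list(range(10))              # 10 sites, 1 distance unit apart
env = [float(p) for p in positions]      # hypothetical environmental gradient
# Hypothetical response: forced by env, plus an alternating site-level deviation
y = [2.0 * e + 0.5 * (-1) ** i for i, e in enumerate(env)]
resid = ols_residuals(y, env)

expected = -1 / (len(y) - 1)             # E(I) under the null hypothesis
print("E(I) =", round(expected, 3))
for lag in (1, 2, 3):
    print(lag, round(morans_i(y, positions, lag), 3),
          round(morans_i(resid, positions, lag), 3))
```

Plotting such values against distance classes gives the spatial correlogram mentioned earlier in the interview.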
(5) HS: Today, accounting for spatial autocorrelation in ecological data is standard practice. Could you give us a sense of how ecological data were analysed at that time, with regard to this idea? What I’m asking is: how novel was what you were proposing in this paper? Were there precursors of this idea in the writings of others before you?
PL: I believe it is in this paper that I formulated for the first time the hypothesis that spatial correlation observed in ecological data can be produced by two entirely different processes: either by the physical forcing of environmental variables, a process called induced spatial dependence in later works (e.g. Legendre & Legendre 2012), or by community-generated mechanisms, such as neutral processes (e.g. ecological drift and limited dispersal) or interactions among species, which generate true spatial autocorrelation (sensu stricto) among the observations. Actual ecological data may result from the two types of processes acting jointly, possibly creating structures at different spatial scales. A third process generating spatial correlation in ecological data is historical dynamics; see Borcard & Legendre (1994, Table 3), reproduced in Legendre & Legendre (2012, Table 14.1). Results of historical dynamics in a non-ecological context are shown in Legendre & Legendre (2012, Plate 14.1, p. 906).
I was among the first ecologists (with Simon Levin) to develop the paradigm of the importance of spatial structures for ecological theory. This key message of the paper has revolutionized the way ecologists approach ecosystems. Prior to that paper, ecologists and statisticians mostly considered spatial relationships among observations in ecosystems to be a statistical nuisance. In this paper, I explained that these structures are a fundamental property of ecosystems, one that creates the conditions for their functioning as ecological and thermodynamic machines. This idea is now widely accepted among researchers, which is probably why this paper has been so highly cited by ecologists (3361 citations on 3 July 2018). In 2015, as part of the Centennial celebration of the Ecological Society of America, the Society put together a list of 115 papers, published in its five scientific journals since their first issues, that the Society’s officers considered to have made notable contributions to the development of ideas in ecology; this paper was one of them.
The novel aspects of the paper were thus the following: (1) to describe two different kinds of processes that can generate spatial correlation in data, and (2) to state that spatial correlation is not merely a nuisance for the statistical analysis of ecological data, but a property that is fundamental for the working of ecosystems and that should be studied for its own sake.
(6) HS: How long did it take you to write this paper? When and where did you do most of the writing?
PL: About 5 weeks. See my answer to question 2.
(7) HS: At the time when you were developing these ideas and writing this paper, did you have a peer-group with whom you were discussing ideas, in your university or elsewhere?
PL: See my answer to question 2. Pierre Dutilleul was the immediate collaborator with whom I discussed the ideas as I was writing the paper. In addition, I sent the paper to a group of colleagues whom I asked to revise specific sections, and gave copies to the graduate students in my lab. Eight of these people sent me comments, which I used to produce an updated version of the paper.
(8) HS: What kind of attention did this paper receive when it was published?
PL: See my answer to question 5.
(9) HS: What kind of impact did this paper have on your career? In what ways, if any, did it influence the future course of your research?
PL: Being the author of highly cited papers helped me, of course, in obtaining research grants and, later, scientific honours and distinctions.
This paper has been a foundation stone of my research. Working out the logic of studying spatial structures of ecological data led to the formulation of the two hypotheses for the generation of spatial structures in community composition data, and later, to the development of spatial eigenfunction modelling.
This paper, together with other highly cited papers, gave my ideas exposure in the scientific community. It resulted in invitations to give keynote lectures at conferences and facilitated invitations to carry out research with research groups abroad. One such opportunity was a 1.5-month workshop to which I was invited by the National Institute of Water and Atmospheric Research (NIWA) of New Zealand, held in their lab in Hamilton. During that workshop, we, a group of local and foreign researchers, designed field experiments and carried them out on the benthos in the sediments of a marine strand. This resulted in a special issue of the Journal of Experimental Marine Biology and Ecology (Volume 216), edited by Simon Thrush and published in 1997, which described the results of the work carried out during that workshop (13 papers).
(10) HS: This paper has been cited over 3000 times. Do you have a sense of what it mostly gets cited for?
PL: Ecologists cite this paper in support of their interest in studying spatial ecological structures. Before that paper, spatial structures were seen merely as a nuisance for statistical analysis and not as a proper field for interesting ecological studies. I believe this paper turned the tables by making ecologists realize that spatial structures are a fundamental property of ecosystems, one that creates the conditions for their functioning as ecological and thermodynamic machines, so that they are worth studying for their own sake. This is what I explained in my answer to question 5 above.
(11) HS: Today, 25 years after it was published, if you were to write a paper on this topic, what would be its main takeaway? How far do you think we have come in the last 25 years?
PL: Twenty years after the 1993 paper, we wrote another article (Legendre & De Cáceres 2013) that I consider just as important as the 1993 synthesis and which, in my mind, follows from the principles and ideas presented in 1993. It was also a homage to Robert Whittaker, who gave ecology the concepts of alpha, beta and gamma diversity. In the 2013 paper, we showed, on the one hand, that beta diversity can be computed as the variance of the community composition data matrix, obtained either from raw data transformed in one of several appropriate ways or from an appropriate dissimilarity matrix, and that this variance can be partitioned into site contributions (LCBD, Local Contribution to Beta Diversity, indices) and species contributions (SCBD, Species Contribution to Beta Diversity, indices) to spatial or temporal beta diversity. On the other hand, the paper showed that measuring beta diversity as the variance of the community composition matrix links the ecological concept of beta diversity to most of the methods of multivariate analysis that ecologists compute daily on their data, such as: simple and canonical ordinations, which decompose the total sum of squares (SSTotal) of the community matrix into ordination axes; multivariate analysis of variance by RDA (redundancy analysis), which decomposes SSTotal among the classes of a factor, or among two or more factors and their interactions; variation partitioning, which decomposes SSTotal with respect to two or more matrices of explanatory variables; and spatial eigenfunction modelling, which decomposes SSTotal among different spatial or temporal scales. The latter method helps elucidate the processes behind multi-scale ecological patterns, which was the problem raised by Simon Levin in his 1992 paper.
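As a rough illustration of the computation described above, the sketch below computes total beta diversity as the variance of a community composition matrix and partitions its total sum of squares into LCBD and SCBD indices. It assumes the Hellinger transformation (one of the transformations that can be used for this purpose) and a small hypothetical abundance table; the dissimilarity-matrix route and any significance testing are left out. It is a sketch for illustration, not the authors’ implementation.

```python
import math


def hellinger(Y):
    """Hellinger transformation: square root of the relative abundances in each row (site)."""
    out = []
    for row in Y:
        total = sum(row)
        if total == 0:
            raise ValueError("each site must contain at least one individual")
        out.append([math.sqrt(v / total) for v in row])
    return out


def beta_diversity(Y):
    """Total beta diversity as the variance of the transformed community matrix,
    partitioned into LCBD (one value per site) and SCBD (one value per species)."""
    Z = hellinger(Y)
    n, p = len(Z), len(Z[0])
    col_means = [sum(Z[i][j] for i in range(n)) / n for j in range(p)]
    # Squared deviations of each transformed abundance from its species (column) mean
    S = [[(Z[i][j] - col_means[j]) ** 2 for j in range(p)] for i in range(n)]
    ss_total = sum(sum(row) for row in S)      # SSTotal of the community matrix
    bd_total = ss_total / (n - 1)              # variance = total beta diversity
    lcbd = [sum(S[i]) / ss_total for i in range(n)]
    scbd = [sum(S[i][j] for i in range(n)) / ss_total for j in range(p)]
    return bd_total, lcbd, scbd


# Hypothetical abundances: 4 sites (rows) x 3 species (columns)
Y = [[5, 0, 1],
     [3, 2, 0],
     [0, 4, 4],
     [1, 1, 6]]
bd, lcbd, scbd = beta_diversity(Y)
print(round(bd, 3), [round(v, 3) for v in lcbd], [round(v, 3) for v in scbd])
```

By construction, the LCBD values sum to 1 over sites and the SCBD values sum to 1 over species. With the Hellinger transformation, total beta diversity lies between 0 and 1. The sketch divides by SSTotal, so it fails if all sites have identical composition.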
This view, which unifies theoretical and numerical ecology, could only emerge after these statistical developments had come about and had been tested by ecologists in a large number of studies involving field data. In turn, these statistical developments were made possible because they were grounded in theoretical foundations elaborated in the 1993 paper, which claimed (a) that spatial structures are a fundamental property of ecosystems, one that creates the conditions for their functioning as ecological and thermodynamic machines, and (b) that spatial variation can be generated by different processes in ecosystems, in particular induced spatial dependence and community-generated processes.
(12) HS: If you were to remake Table 3 of the paper today what would be the major changes?
PL: Most of the software in Table 3 of the paper is obsolete in 2018. It has been replaced by functions in R and Matlab. Only Canoco is still around; it is the mother of all modern ordination packages, the one against which developers of new software compare the results of their code to make sure it works correctly.
(13) HS: What would you add to the “FURTHER READING AND APPLICATIONS” section if you wrote the paper today?
PL: Students of ecology should know that the theories and ideas in their field have not existed for all eternity. To understand how these ideas appeared, ecologists should read the foundation papers where they were presented and debated.
They could start by reading the important papers for which you have published interviews on your Web page, https://reflectionsonpaperspast.wordpress.com/. Then they could expand their reading by choosing, for example, from the list of 115 notable papers published by the ESA in the first century of its existence (1915-2015). The Web address of that list is provided in my answer to question 5.
The same goes for ideas, concepts and methods in the field of numerical ecology. The earliest editions of the Numerical Ecology book (Legendre & Legendre; two editions in French, in 1979 and 1984, and the first English edition in 1983) described the statistical bases of multivariate numerical ecology: dissimilarity measures, linear models for single response variables or response data matrices, methods of ordination, and a small incursion into canonical ordination. Canoco added the asymmetric canonical ordination methods (RDA and CCA) to the ecologists’ statistical toolbox. These were the foundations on which Daniel Borcard and I developed the methods of variation partitioning and spatial eigenfunction modelling, which make it possible to detect spatial structures at different spatial scales and associate these structures with different ecological processes.
This evolution of ideas and methods did not proceed from a grand plan elaborated in 1993. We built it one small brick at a time, as we tried to find ways of applying the concepts depicted in the 1993 paper to real ecological data. This is a good illustration of the interaction between theoretical development and confrontation with field data in the natural sciences. Other colleagues joined in and developed special methods of analysis based on spatial eigenfunctions, in particular Helene Wagner (2004), Stéphane Dray (2006), Pedro Peres-Neto (2010) and Guillaume Guénard (2010, 2018).
Regarding the material presented in the 1993 Ecology paper that we are discussing in this interview, developments in the theory of spatial correlation and spatial eigenfunction analysis are presented in Chapters 1 and 14 of the latest edition of the Numerical Ecology textbook (2012). These chapters form a highly expanded version of some aspects of the 1993 paper and could be considered “further reading” after it.
The story of how ideas and software developed in numerical ecology was recently told in an encyclopaedia article (Legendre 2018).
(14) HS: In the paper you say “Two approaches are proposed; in the raw-data approach, the spatial structure takes the form of a polynomial of the x and y geographic coordinates of the sampling stations; in the matrix approach, the spatial structure is introduced in the form of a geographic distance matrix among locations. These two approaches are compared in the concluding section.”
Could you reflect on the adoption of these two methods by the research community?
PL: I had long been worried about the distance approach to spatial analysis. I had suggested that postdoctoral researchers in our lab investigate this question using numerical simulations, but they were always interested in more pressing questions. In 2002, we saw that variation partitioning on distance matrices was spreading in the literature in studies of beta diversity. So, Daniel Borcard and I decided to get to work and study how variation partitioning should be done for the assessment of beta diversity.
Looking back through my files, I found a letter that we wrote in April 2002 to Don Strong, the Editor-in-Chief of Ecology/Ecological Monographs, explaining the ecological and statistical problem that we wanted to clarify and asking whether he would consider such a paper for publication in Ecology or Ecological Monographs. Don encouraged us to proceed, of course without any promise as to the outcome of our submission.
Pedro Peres-Neto joined us and we produced a first paper (Legendre et al. 2005) comparing variation partitioning of “raw data” by canonical analysis and spatial eigenfunction modelling with the “distance approach” using partial Mantel tests. The performance of partial Mantel tests was appalling, which led us to make strong recommendations firmly discouraging ecologists and geneticists from using the Mantel approach for spatial analysis and variation partitioning, and recommending the use of canonical analysis in all cases.
This paper led to a barrage of objections in the literature, because in many labs, Mantel tests were considered the standard way of detecting spatial structures. It took a long time, and two more strong papers (Legendre & Fortin 2010, Legendre et al. 2015), plus seminars that I gave in many universities and research centres, to convince at least some ecologists to stop using Mantel tests for spatial analysis and beta diversity assessment. The objective was worth the effort that we invested in this cause; this is how science can make progress.
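For readers unfamiliar with the “matrix approach” discussed in question 14, the sketch below shows a simple (not partial) Mantel test: the Pearson correlation between the upper-triangular entries of a geographic distance matrix and a community dissimilarity matrix, tested by permuting the sites of one matrix. The data, the function names and the choice of 999 permutations are assumptions made for illustration. It is included only to make the approach concrete; as explained above, Legendre and colleagues recommend canonical analysis rather than Mantel tests for spatial analysis and variation partitioning.

```python
import math
import random


def upper_triangle(D, order=None):
    """Entries above the diagonal of square matrix D, optionally after permuting
    its rows and columns together according to `order`."""
    n = len(D)
    idx = order if order is not None else list(range(n))
    return [D[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mantel(D_geo, D_comm, n_perm=999, seed=1):
    """Simple Mantel test with a one-tailed (upper) permutation p-value."""
    x = upper_triangle(D_geo)
    r_obs = pearson(x, upper_triangle(D_comm))
    rng = random.Random(seed)
    order = list(range(len(D_comm)))
    count = 1  # the observed statistic is counted in the reference distribution
    for _ in range(n_perm):
        rng.shuffle(order)  # permute site labels of the community matrix
        if pearson(x, upper_triangle(D_comm, order)) >= r_obs:
            count += 1
    return r_obs, count / (n_perm + 1)


coords = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical site positions
D_geo = [[abs(a - b) for b in coords] for a in coords]
scores = [1.0, 1.5, 2.5, 2.0, 4.0, 5.5]          # hypothetical community scores
D_comm = [[abs(a - b) for b in scores] for a in scores]
r, p = mantel(D_geo, D_comm)
print(round(r, 3), p)
```

Note that the statistic is computed on pairwise distances, not on the site-by-species data themselves. This is the difference between the matrix approach and the raw-data approach described in the question.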
(15) HS: Towards the end of the paper you say that comparative studies of the raw-data approach and the matrix approach are needed in cases where both methods are equally suitable. To what extent have such comparative studies happened?
PL: See my answer to question 14: we produced these studies in our Mantel papers (Legendre et al. 2005, Legendre & Fortin 2010, Legendre et al. 2015).
(16) HS: Would you count this paper as a favourite, among all the papers you have written?
PL: More than a favourite. I first wrote this paper for myself. Writing it turned out to be a self-inflicted exercise in the Socratic method of maieutics (midwifery of the mind). Working on this paper provided a magic opportunity to put my ideas in order and fill the (many) gaps in the reasoning, until a fairly logical line of thought emerged, which led to the main conclusion of the paper: offering ecologists a new paradigm, namely that spatial structures provide a key to understanding how ecosystems are structured and function.
(17) HS: What would you say to a student who is about to read this paper today? What should he or she take away from this paper written 25 years ago? Would you point him or her to other papers they should read along with this? Would you add any caveats?
PL: I described the main messages of the 1993 paper in my answer to question 5. Then, students of science often wonder: how did the ideas in this paper emerge? Authors seldom narrate how their ideas arose. Your finely dissected questions offered me the opportunity to narrate how the ideas in the 1993 paper emerged and how that paper led to further ideas and a comprehensive body of methods to understand ecological communities and ecosystems. Notice also the part that chance and unplanned events played in the maturation of concepts and ideas recounted here.
One final message to students is: keep your mind open and seize opportunities to reorient your research line when such opportunities present themselves. In particular, when writing a paper, proving that your starting hypothesis was correct is no great achievement, whereas finding that it was wrong may suggest a change of paradigm that may benefit your research and that of others.
 Levin (1992) had written that the problem of pattern and scale (spatial, temporal and organizational) is the central problem in ecology and that the key to prediction and understanding lies in the elucidation of mechanisms underlying the observed patterns.