The MolEcol Toolbox: Species Distribution Modelling

Where on Earth are species?

Understanding the spatial distribution of species is a critical component of many different aspects of biological studies. Particularly for conservation, the biogeography of regions is a determining factor for designating and managing biodiversity hotspots and management units. Likewise, understanding the biogeographical mechanisms that have shaped modern biodiversity may allow us to understand how species will change under future climate change scenarios, and how their distributions will shift (and have shifted).

Typically, the maximum distribution of species is based on their ecological tolerances: that is, the most extreme environments they can tolerate and proliferate within. Of course, there are a huge number of other factors on top of just natural environment which can shape species distributions, particularly related to human-induced environmental changes (or introducing new species as invasive pests, which we seem to be good at). But exactly where species are and why they occur there are intrinsically linked to the adaptive characteristics of species relative to their environment.

Species distribution modelling

The connection of a species distribution with innate environmental tolerances is the background for a type of analysis we call species distribution modelling (SDM) or environmental niche modelling (ENM). Species distribution modelling seeks to correlate the locations where a species occurs with the local environment around those sites to predict where the species should occur. This is an effective tool for trying to understand the distribution of species that might be tricky to study so thoroughly in the wild; either because they are hard to catch, live in very remote areas, or because they are highly threatened. There are a number of different algorithms and data types that will work with SDM, and there is always ongoing debate about ‘best practices’ in modelling techniques.

SDM method.jpg
The generalised pipeline of SDM, taken from Svenning et al. (2011). By correlating species occurrence data (bottom left) with environmental data (top left), we can develop a model that describes how the species is distributed based on environmental limitations (top right). From here, we can choose to validate the model with other methods (top and bottom centre) or see how the distribution might change with different environmental changes (e.g. bottom right).

A basic how-to on running SDM

The first major component that is needed for SDM is the occurrence data. Some methods will work with presence-only data: that is, a map of GPS coordinates which describes where that species has been found. Others work with presence-absence data, which may require including sites of known non-occurrence. This is an important aspect as the non-occurrence sites define the environment beyond the tolerance threshold of the species: however, it’s very likely that we haven’t sampled every location where they occur, and there will be some GPS coordinates where our species appears to be absent but actually occurs. There are some different analytical techniques which can account for uneven sampling across the real distribution of the species, but they can get very technical.

Edited_koala_data.jpg
An example of species (occurrence only) locality data (with >72,000 records) for the koala (Phascolarctos cinereus) across Australia, taken from the Atlas of Living Australia. Carefully checking the locality data is important, as visual inspection clearly shows records where koalas are not native: they might have been recorded from an introduced individual, given incorrect GPS coordinates or incorrectly identified (red circles).
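
As a rough illustration of that checking step, here is a minimal Python sketch that flags records falling outside a bounding box for manual review. The box limits, the `Record` type and the three coordinates are all made up for the example (they are not taken from the Atlas of Living Australia, and a real check would use a proper range polygon rather than a rectangle).

```python
from typing import NamedTuple


class Record(NamedTuple):
    lat: float  # decimal degrees, negative = south
    lon: float  # decimal degrees, positive = east


# Hypothetical, deliberately crude rectangle standing in for a native range.
NATIVE_BOX = {"lat_min": -39.5, "lat_max": -15.0, "lon_min": 138.0, "lon_max": 154.0}


def in_box(rec, box):
    """True if the record's coordinates fall inside the rectangle."""
    return (box["lat_min"] <= rec.lat <= box["lat_max"]
            and box["lon_min"] <= rec.lon <= box["lon_max"])


def split_records(records, box):
    """Separate records into (kept, flagged-for-review) lists."""
    kept = [r for r in records if in_box(r, box)]
    flagged = [r for r in records if not in_box(r, box)]
    return kept, flagged


# Made-up records: two inside the box, one far to the west of it.
records = [Record(-27.5, 153.0), Record(-34.9, 138.6), Record(-31.95, 115.86)]
kept, flagged = split_records(records, NATIVE_BOX)
```

Flagged records are not deleted automatically here: an out-of-range point could be a typo, a misidentification or a genuinely introduced animal, and deciding which still needs a human.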

The second major component is our environmental data. Typically, we want to include environmental data for the types of variables that are likely to constrain the distribution of our species: often temperature and precipitation variables are included, as these two largely predict habitat types. However, it can also be important to include non-climatic variables such as topography (e.g. elevation, slope) in our model to help constrain our predictions to a more reasonable area. It is also important to test for correlation between our variables, as using many variables which are highly correlated may ‘overfit’ the model and underestimate the range of the distribution by placing an unrealistic number of restrictions on the model.

Enviro_maps.jpg
An example of some of the environmental data/maps we might choose to include in a species distribution model, obtained from the Atlas of Living Australia. A) Mean annual temperature. B) Mean annual precipitation. C) Elevation. D) Weighted distance to nearest waterbody (e.g. rivers, lakes, streams).
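
The correlation check mentioned above can be sketched in a few lines of Python. This is only an illustration: the layer names, the five values per layer and the 0.7 cut-off are assumptions for the example (the cut-off is a rule of thumb, not a fixed standard), and real layers would be sampled from the raster maps rather than typed in.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(layers, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = sorted(layers)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(layers[a], layers[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical values of three variables sampled at the same five sites.
layers = {
    "mean_temp": [18.0, 20.0, 22.0, 24.0, 26.0],
    "max_temp": [25.0, 27.5, 29.0, 31.5, 33.0],
    "precip": [900.0, 400.0, 1200.0, 300.0, 800.0],
}
flagged = correlated_pairs(layers)
```

Here only the two temperature layers are flagged, so we would keep whichever one makes more biological sense for the species and drop the other.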

Our SDM analysis of choice (e.g. MaxEnt) will then use various algorithms to build a model which best correlates where the species occurs with the environmental variables at those sites. The model tries to create a set of environmental conditions that best encapsulate the occurrence sites whilst excluding the non-occurrence sites from the prediction. From the final model, we can evaluate how strong the effect of each of our variables is on the distribution of the species, and also how well our overall model predicts the locality data.
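
To make the idea of ‘a set of environmental conditions that encapsulate the occurrence sites’ concrete, here is a minimal Python sketch of a simple climate-envelope model (in the spirit of BIOCLIM-style approaches). To be clear, this is not how MaxEnt works internally; it is a much cruder stand-in, and the variable names and site values are invented for the example.

```python
def fit_envelope(occurrence_env):
    """Record the (min, max) of each variable across all occurrence sites."""
    variables = occurrence_env[0].keys()
    return {
        v: (min(site[v] for site in occurrence_env),
            max(site[v] for site in occurrence_env))
        for v in variables
    }


def is_suitable(envelope, cell):
    """A cell is predicted suitable if every variable lies within the envelope."""
    return all(lo <= cell[v] <= hi for v, (lo, hi) in envelope.items())


# Hypothetical environmental values at three sites where the species was found.
occurrence_env = [
    {"mean_temp": 16.0, "precip": 800.0},
    {"mean_temp": 21.5, "precip": 1100.0},
    {"mean_temp": 18.0, "precip": 650.0},
]
envelope = fit_envelope(occurrence_env)

inside = is_suitable(envelope, {"mean_temp": 19.0, "precip": 900.0})
too_hot = is_suitable(envelope, {"mean_temp": 27.0, "precip": 900.0})
```

An envelope like this uses presence-only data and treats every variable as equally important, which is exactly the kind of simplification that more sophisticated algorithms try to improve on.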

Projecting our SDM into the past and the future

One reason to use SDM is the ability to project distributions onto alternative environments based on the correlative model. For example, if we have historic data (say, from the last glacial maximum, 21,000 years ago), we can use our predictions of how the species responds to climatic variables and compare that to the environment back then to see how the distribution would have shifted. Similarly, if we have predictions for future climates based on climate change models, we can try and predict how species distributions may shift in the future (an important part of conservation management, naturally).
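
Projection can be sketched the same way: take the tolerances estimated from the present day and apply them to a different climate layer. In this minimal Python example the tolerance ranges, the four grid cells and both climate grids are hypothetical numbers chosen for illustration, and the ‘model’ is a bare tolerance range rather than a fitted SDM.

```python
# Hypothetical tolerance ranges, assumed to have been estimated from modern data.
TOLERANCES = {"mean_temp": (12.0, 24.0), "precip": (600.0, 1500.0)}


def suitable(cell_env):
    """True if every variable in the cell falls within the tolerance range."""
    return all(lo <= cell_env[v] <= hi for v, (lo, hi) in TOLERANCES.items())


def project(grid):
    """Return the set of cell IDs predicted suitable under a given climate grid."""
    return {cell_id for cell_id, env in grid.items() if suitable(env)}


# The same four cells under two made-up climates.
present = {
    "c1": {"mean_temp": 14.0, "precip": 900.0},
    "c2": {"mean_temp": 20.0, "precip": 800.0},
    "c3": {"mean_temp": 26.0, "precip": 700.0},
    "c4": {"mean_temp": 18.0, "precip": 500.0},
}
glacial = {  # cooler and drier everywhere (illustrative values only)
    "c1": {"mean_temp": 9.0, "precip": 800.0},
    "c2": {"mean_temp": 15.0, "precip": 700.0},
    "c3": {"mean_temp": 21.0, "precip": 600.0},
    "c4": {"mean_temp": 13.0, "precip": 400.0},
}

now, then = project(present), project(glacial)
stable = now & then      # suitable in both periods
only_now = now - then    # gained since the glacial period
only_then = then - now   # lost since the glacial period
```

The key assumption hiding in this sketch is that the species’ tolerances have not changed between the two time points, which is worth keeping in mind for any projection.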

 

Correct LGM projection example.png
An example of projecting a species distribution model back in time (in this case, to the Last Glacial Maximum 21,000 years ago), taken from Pelletier et al. (2016). On the left is the contemporary distribution of each species; on the right the historic projection. The study focused on three different species of American salamanders and how they had evolved and responded to historic climate change. This figure clearly shows how the distributions of the species have changed over time, particularly how the top two species have significantly reduced in distribution in modern times.

 

Species distribution modelling continues to be a useful tool for conservation and evolution studies, and improvements in analytical algorithms, available environmental data and increased sampling of species will similarly improve SDM. Particularly, improvements in environmental projections from both the distant past and future will improve our ability to understand and predict how species will change, and have changed, with climatic changes.

Surviving the Real-World Apocalypse

The changing world

Climate change seems to be the centrepiece of a large amount of scientific research and media attention, and rightly so: it has the capacity to affect every living organism on the planet. It’s our duty as curators and residents of Earth to be responsible for our influences on the global environmental stage. While a significant part of this involves determining causes and solutions to our contributions to climate change, we also need to know how extensive the effects will be: for example, how can we predict how well species will do in the future?

Predicting the effect of climate change on all of the world’s biodiversity is an immense task. Climate change itself is a complicated system, and causes diverse, interconnected and complex alterations to both global and local climate. Adding on top of this, though, is that climate affects different species in different ways; where some species might be sensitive to some climatic variables (such as rainfall, available sunlight, seasonality), others may be more tolerant to the same factors. But all living things share some requirements, so surely there must be some consistency in their responses to climate change, right?

Apocalypse 2
Lucky for Mr Fish here, he’s responding to a (very dramatic) climate change much, much better than his bird counterpart.

How predictable are species responses to climate change?

Well, evidence would surprisingly suggest not. Many species, even closely related ones, can show very different responses to the exact same climatic pressures or biogeographical events. There are a number of different traits that might affect a species’ ability to adapt, particularly their adaptive genetic diversity (which underpins ‘adaptive potential’). Thus, we need good information on a variety of genetic, physiological and life history traits to be able to make predictions about how likely a species is to adapt and respond to future (and current) climate changes.

Although this can be hard to study in species of high extinction risk (getting a good number of samples is always an issue…), traditional phylogeographic methods might help us to make some comparisons. See, although the modern Earth is rapidly changing (undoubtedly influenced by human society), the climate of the globe has always varied to some degree. There has always been some tumultuousness in the climate, and specific Earth history events like volcanic eruptions, sea-level changes, or glaciation periods (‘ice ages’) have had diverse effects on organisms globally.

Using comparative phylogeography to predict species responses

One tool for looking at how different species have, in the past, responded to the same biogeographical force is the domain of ‘comparative phylogeography’. Phylogeography itself is something we have discussed before: the ‘comparative’ aspect simply means comparing (with complex statistical methods) these patterns across different and often unrelated species to see how universal (‘congruent’) or unique (‘incongruent’) these patterns are among species. The more broadly we look at the species community in the region, the more we can observe widespread effects of any given environmental or geographical event: if we only look at fish, for example, we might not be able to infer what response mammals, birds or invertebrates have had to our given event. Sometimes this still meets the scale we wish to focus on; other times, we want to see how all the species of an area have been affected.

Actual island diagram
A (very busy) example of different species responses to a single environmental event. In this example, we have three species (a fish, a lizard, and a bird) all living on the same island. In the middle of the island, there is a small mountain range (A). At this point in time, all three species are connected across the whole island; fish can travel via lakes and wetlands (green arrows), lizards can travel across the land (blue arrow) and birds can fly anywhere. However, as the mountain range grows with tectonic movements, the waterways are altered and the north and south are disconnected (B). The fish species is now split into two evolutionarily separate groups (green and gold), while lizards and birds are not. As the range expands further, however, the dispersal route for lizards is cut off, causing them to eventually also become separated into blue and black groups (C). Birds, however, have no problems flying over the mountain range and remain one unified and connected orange group over time (D). Thus, each species has a different response to the formation of the mountain range.
Evol history of island diagram
The phylogenetic history of the three different species in the above example. As you can see, each lineage has a slightly different pattern; birds show no divergences at all, whereas the timing of the lizard and fish N/S splits is different (i.e. temporally incongruent).
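
A very crude way to picture temporal congruence is to ask whether the estimated divergence times of two species overlap. The Python sketch below does just that; the interval values are invented for the island example, and real comparative studies rely on far more formal statistical frameworks than a simple overlap check.

```python
def intervals_overlap(a, b):
    """True if two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def temporally_congruent(splits, species_a, species_b):
    """Congruent only if both species have a split and the intervals overlap."""
    a, b = splits[species_a], splits[species_b]
    if a is None or b is None:
        return False
    return intervals_overlap(a, b)


# Hypothetical north/south divergence times as (lower, upper) bounds in
# millions of years; None means no divergence was detected.
splits = {
    "fish": (1.8, 2.4),
    "lizard": (0.6, 1.1),
    "bird": None,
}

fish_vs_lizard = temporally_congruent(splits, "fish", "lizard")
fish_vs_bird = temporally_congruent(splits, "fish", "bird")
```

With these made-up numbers the fish and lizard splits do not overlap in time, matching the ‘temporally incongruent’ pattern in the diagram.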

Typically, comparative phylogeographic studies have looked at the neutral components of species’ evolution (as is the realm of traditional phylogeography). This includes studying the size of populations over time, how well connected they are and were, what their spatial patterns are and how these relate to the environment. Comparing all of these patterns across species can allow us to start painting a fuller picture of the history of biota in a region. In this way, we can start to see exactly which species have shown what responses and start to relate these to the characteristics that allowed them to respond in that certain way (and including adaptation in our studies). So, what kinds of traits are important?

What traits matter? Who wins?

Often, we find that the life history traits of an organism better dictate how it will respond to a certain pressure than other factors such as phylogeny (e.g. one group does not always do better than another). Instead, individual species with certain physical characteristics might handle the pressure better than others. For example, a fish, bird and snake that are all able to tolerate higher temperatures than other fish, birds or snakes in that region are more likely to survive a drought. In this case, none of the groups (fish, birds or snakes) inherently do better than the other two groups. Thus, it can be hard to predict how a large swathe of species will respond to any given environmental change, unless we understand the physical characteristics of every species.

Climate change risk flowchart
A generalised framework of various factors, and their interactions, on the vulnerability of species under current and future climate changes by Williams et al. 2018. The schematic includes genetic, ecological, physical and environmental factors and how these can interact with one another to alleviate or exacerbate the risk of extinction.

We can also see that other physiological or ecological traits, such as climatic preferences and tolerance thresholds, can be critical for adapting to climatic pressures. Naturally, the genetic diversity of species is also an important component underlying their ability to adapt to these new selective pressures and to survive into the future. Trying to incorporate all of these factors into a projected model can be difficult, but with more data of higher quality we can start to make more refined predictions. Understanding how particular traits influence how well a species may adapt to a changing climate, as well as knowing what traits different species have, might just be the key to predicting who wins and who dies in the real-world Game of Thrones.

Evolution and the space-time continuum

Evolution travelling in time

As I’ve mentioned a few times before, evolution is a constant force that changes and flows over time. While sometimes it’s more convenient to think of evolution as a series of rather discrete events (a species pops up here, a population separates here, etc.), it’s really a more continual process. The context and strength of evolutionary forces, such as natural selection, change as species and the environment they inhabit also change. This is important to remember in evolutionary studies because although we might think of more recent and immediate causes of the evolutionary changes we see, they might actually reflect much more historic patterns. For example, the extremely low contemporary levels of genetic diversity in cheetahs are likely largely due to a severe reduction in their numbers during the last ice age, ~12 thousand years ago (that’s not to say that modern human issues haven’t also been seriously detrimental to them). Similarly, we can see how the low genetic diversity of a small population colonising a new area can have long term effects on their genetic variation: this is called ‘founder effect’. Because of this, we often have to consider the temporal aspect of a species’ evolution.

Founder effect diagram
An example of founder effect. Each circle represents a single organism; the different colours are an indicator of how much genetic diversity that individual possesses (more colours = more variation). We start with a single population; one (A) or two (B) individuals go on a vacation and decide to stay on a new island. Even after the population has become established and grows over time, it takes a long time for new diversity to arise. This is because of the small original population size and genetic diversity; this is called founder effect. The more genetic diversity in the settled population (e.g. B vs A), the faster new diversity arises and the weaker the founder effect.
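
To put a rough number on this, standard population genetics theory says that a population of N diploid individuals is expected to retain a fraction 1 - 1/(2N) of its heterozygosity each generation. The Python sketch below applies that to the one-founder and two-founder scenarios; it assumes an idealised population (random mating, no mutation, no selection), which a real island colonisation certainly isn’t, and the function name is just my own label.

```python
def retained_heterozygosity(n_founders, generations=1):
    """Expected fraction of heterozygosity retained after a number of
    generations at a constant size of n_founders diploid individuals."""
    return (1 - 1 / (2 * n_founders)) ** generations


one_founder = retained_heterozygosity(1)    # scenario A in the diagram
two_founders = retained_heterozygosity(2)   # scenario B in the diagram

# Staying small for longer compounds the loss: ten generations at N = 2.
prolonged = retained_heterozygosity(2, generations=10)
```

Under these assumptions a single founder is expected to carry half of the original heterozygosity and two founders three quarters, which is why the founder effect is weaker in B than in A.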

Evolution travelling across space

If the environmental context of species and populations is also important for determining the evolutionary pathways of organisms, then we must also consider the spatial context. Because of this, we also need to look at where evolution is happening in the world; what kinds of geographic, climatic, hydrological or geological patterns are shaping and influencing the evolution of species? These patterns can influence both neutral and adaptive processes by shaping exactly how populations or species exist in nature; how connected they are, how many populations they can sustain, how large those populations can sustainably become, and what kinds of selective pressures those populations are under.

Allopatry diagram
An example of how the environment (in this case, geology) can have both neutral and adaptive effects. Let’s say we start with one big population of cats (N = 9; A), which is distributed over a single large area (the green box). However, a sudden geological event causes a mountain range to uplift, splitting the population in two (B). Because of the reduced population size and the (likely) randomness of which individuals are on each side, we expect some impact of genetic drift. Thus, this is the neutral influence. Over time, these two separated regions might change climatically (C), with one becoming much more arid and dry (right) and the other more wet and shady (left). Because of the difference of the selective environment, the two populations might adapt differently. This is the adaptive influence. 
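
The neutral part of that story (genetic drift after the split) can be sketched with a tiny Wright-Fisher-style simulation in Python. The split of nine cats into groups of four and five, the starting allele frequency of 0.5, the twenty generations and the random seed are all arbitrary choices for illustration.

```python
import random


def drift(p0, n_individuals, generations, rng):
    """Simulate neutral drift of one allele's frequency in a diploid
    population of constant size, returning the frequency each generation."""
    copies = 2 * n_individuals  # two gene copies per diploid individual
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random
        # from the current generation's allele frequencies.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


rng = random.Random(42)  # fixed seed so the run is repeatable
left = drift(0.5, 4, 20, rng)   # hypothetical left-hand population of 4 cats
right = drift(0.5, 5, 20, rng)  # hypothetical right-hand population of 5 cats
```

Because no selection is involved, any difference that builds up between `left` and `right` is purely down to chance, and with populations this small the allele can easily be lost or fixed on either side of the mountains.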

Evolution along the space-time continuum

Given that the environment also changes over time (and can do so very rapidly, as we’ve seen recently), the interaction of the spatial and temporal aspects of evolution is critical in understanding the true evolutionary history of species. As we know, the selective environment is what determines what is, and isn’t, adaptive (or maladaptive), so we can easily imagine how a change in the environment could push changes in species. Even from a neutral perspective, geography is important to consider since it can directly determine which populations are or aren’t connected, how many populations there are in total or how big populations can sustainably get. It’s always important to consider how evolution travels along the space-time continuum.

Genetics TARDIS
“Postgraduate Student Who” doesn’t quite have the same ring to it, unfortunately.

Phylogeography

The field of evolutionary science most concerned with these two factors and how they influence evolution is known as ‘phylogeography’, which I’ve briefly mentioned in previous posts. In essence, phylogeographers are interested in how the general environment (e.g. geology, hydrology, climate, etc.) has influenced the distribution of genealogical lineages. That’s a bit of a mouthful and seems a bit complicated, but the genealogical part is important; phylogeography has a keen basis in evolutionary genetics theory and analysis, and explicitly uses genetic data to test patterns of historic evolution. Simply testing the association between broad species or populations and their environment, without the genetic background, falls under the umbrella field of ‘biogeography’. Semantics, but important.

Birds phylogeo
Some example phylogeographic models created by Zamudio et al. (2016). For each model, there’s a demonstrated relationship between genealogical lineages (left) and the geographic patterns (right), with the colours of the birds indicating some trait (let’s pretend they’re actually super colourful, as birds are). As you can see, depending on which model you look at, you will see a different evolutionary pattern; for example, one model shows that specific lineages that are geographically isolated from one another have each evolved their own colour. This contrasts with another model in which each colour appears to have evolved once in each region based on the genetic history.

For phylogeography, the genetic history of populations or species gives the more accurate overview of their history; it allows us to test when populations or species became separated, which were most closely related, and whether patterns are similar or different across other taxonomic groups. Predominantly, phylogeography is based on neutral genetic variation, as using adaptive variation can confound the patterns we are testing. Additionally, since neutral variation changes over time in a generally predictable, mathematical format (see this post to see what I mean), we can make testable models of various phylogeographic patterns and see how well our genetic data makes sense under each model. For example, we could make a couple different models of how many historic populations there were and see which one makes the most sense for our data (with a statistical basis, of course). This wouldn’t work with genes under selection since they (by their nature) wouldn’t fit a standard ‘neutral’ model.
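
One common way to give that comparison a statistical basis is an information criterion such as AIC, which rewards a model for fitting the data but penalises it for extra parameters. The Python sketch below is only an illustration of the idea: the three model names, their log-likelihoods and their parameter counts are invented, and real phylogeographic software may well use a different criterion altogether.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better
    trade-off between fit and model complexity."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (log-likelihood, number of parameters) for three models
# of how many historic populations there were.
models = {
    "one_population": (-1250.0, 2),
    "two_populations": (-1235.0, 4),
    "three_populations": (-1234.5, 6),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
best = min(scores, key=scores.get)
```

In this made-up case the three-population model fits very slightly better than the two-population one, but not by enough to justify its extra parameters, so the two-population model comes out on top.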

Coalescent
If it looks mathematically complicated, it’s because it is. This is an example of the coalescent from Brito & Edwards, 2008: a method that maps genes back in time (the different lines) to see where the different variants meet at a common ancestor. These genes are nested within the history of the species as a whole (the ‘tubes’), with many different variables accounted for in the model.
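
For a taste of the maths in its simplest form: under the basic neutral coalescent with a constant population of N diploid individuals, k gene lineages are expected to wait 4N / (k(k - 1)) generations before two of them meet at a common ancestor. The Python sketch below adds those waiting times up to get the expected time back to the common ancestor of the whole sample. It ignores everything that makes the model in the figure complicated (population splits, migration, changing sizes), and the population size of 1,000 is an arbitrary example value.

```python
def expected_waiting_time(k, n_e):
    """Expected generations until k lineages coalesce into k - 1,
    for a constant diploid population of size n_e."""
    return 4 * n_e / (k * (k - 1))


def expected_tmrca(n_samples, n_e):
    """Expected generations back to the most recent common ancestor
    of n_samples gene copies: the sum of all waiting times."""
    return sum(expected_waiting_time(k, n_e) for k in range(2, n_samples + 1))


pair = expected_tmrca(2, 1000)   # two gene copies
ten = expected_tmrca(10, 1000)   # ten gene copies
```

Notice that sampling ten gene copies instead of two does not push the expected common ancestor back five times further: most of the waiting happens while only the last couple of lineages remain.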

That said, there are plenty of interesting scientific questions within phylogeography that look at exploring the adaptive variation of historic populations or species and how this has influenced their evolution. Although this can’t inherently be built into the same models as the neutral patterns, looking at candidate genes that we think are important for evolution and seeing how their distributions and patterns relate to the overall phylogeographic history of the species is one way of investigating historic adaptive evolution. For example, we might track changes in adaptive genes by seeing which populations have which variants of the gene and referring to our phylogeographic history to see how and when these variants arose. This can help us understand how phylogeographic patterns have influenced the adaptive evolution of different populations or species, or inversely, how adaptive traits might have influenced the geographic distribution of species or populations.

Where did you come from and where will you go?

Phylogeographic studies can tell us a lot about the history of a species, and particularly how that relates to the history of the Earth. All organisms share an intimate relationship with their environment, both over time and space, and keeping this in mind is key for understanding the true evolutionary history of life on Earth.