Bringing alleles back together: applications of coalescent theory

Coalescent theory

A recurring analytical method, both within The G-CAT and the broader ecological genetic literature, is based on coalescent theory. This rests on the mathematical notion that mutations within genes (leading to new alleles) can be traced backwards in time, to the point where the mutation initially occurred. Given that this is a retrospective view, instead of describing these mutation moments as ‘divergence’ events (as would be typical for phylogenetics), they appear as moments where lineages come back together, i.e. coalesce.

There are a number of applications of coalescent theory, and it is a particularly fitting approach for understanding the demographic (neutral) history of populations and species.

Mathematics of the coalescent

Before we can explore the multitude of applications of the coalescent, we need to understand the fundamental underlying model. The initial coalescent model was described in the 1980s, built upon by a number of different ecologists, geneticists and mathematicians. However, John Kingman is often credited with the formation of the original coalescent model, and Kingman’s coalescent is considered the most basic, primal form of the coalescent model.

From a mathematical perspective, the coalescent model is actually (relatively) simple. If we sampled a single gene from two different individuals (for simplicity’s sake, we’ll say they are haploid and only have one copy per gene), we can statistically measure the probability of these alleles merging back in time (coalescing) at any given generation. This is the same probability that the two samples share an ancestor (think of a much, much shorter version of sharing an evolutionary ancestor with a chimpanzee).

Normally, if we were trying to pick the parents of our two samples, the number of potential parents would be the size of the ancestral population (since any individual in the previous generation has equal probability of being their parent). But from a genetic perspective, this is based on the genetic (effective) population size (Ne), multiplied by 2 because, in a diploid species, each individual carries two copies per gene (one paternal and one maternal). Therefore, the number of potential parental gene copies is 2Ne.

Constant Ne and coalescent prob
A graph of the probability of a coalescent event (i.e. two alleles sharing an ancestor) in the immediately preceding generation (i.e. parents) relative to the size of the population. As one might expect, with larger population sizes there is a lower chance of sharing an ancestor in the immediately prior generation, as the pool of ‘potential parents’ increases.

If we have an idealised population, with large Ne, random mating and no natural selection on our alleles, the probability that their ancestor is in the immediately preceding generation (i.e. they share a parent) is 1/(2Ne). Conversely, the probability they don’t share a parent is 1 − 1/(2Ne). If we add a temporal component (i.e. the number of generations, t), the probability that our alleles first coalesce exactly t generations ago is (1 − 1/(2Ne))^(t−1) × 1/(2Ne): they fail to coalesce for t − 1 generations and then coalesce in the next one.
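As a rough sketch of the formula above, the probability that two alleles first coalesce exactly t generations ago can be computed directly. The function name and the example Ne values are ours, chosen purely for illustration, and the same idealised assumptions (constant Ne, random mating, no selection) apply.

```python
def coalescence_probability(ne, t):
    """Probability that two alleles first coalesce exactly t generations ago.

    Assumes an idealised population of constant effective size ne
    (so 2 * ne gene copies), random mating and no selection.
    """
    if ne <= 0 or t < 1:
        raise ValueError("ne must be positive and t must be at least 1")
    p = 1.0 / (2.0 * ne)              # chance of sharing a parent in one generation
    return (1.0 - p) ** (t - 1) * p   # no coalescence for t - 1 generations, then one


if __name__ == "__main__":
    # Illustrative (hypothetical) effective population sizes.
    for ne in (10, 100, 1000):
        probs = [coalescence_probability(ne, t) for t in (1, 10, 100)]
        print(ne, ["%.5f" % x for x in probs])
```

Because this is a geometric distribution, the average waiting time until two alleles coalesce is 2Ne generations, which is why larger populations push coalescent events further back in time.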

Variable Ne and coalescent probs
The probability of two alleles sharing a coalescent event back in time under different population sizes. Similar to above, there is a higher probability of an earlier coalescent event in smaller populations as the reduced number of ancestors means that alleles are more likely to ‘share’ an ancestor. However, under all population size scenarios, the probability of first coalescing in any particular generation declines the further back in time we go.

Although this might seem mathematically complicated, the coalescent model provides us with a scenario of how we would expect different mutations to coalesce back in time if those idealised conditions are true. However, biology is rarely convenient and it’s unlikely that our study populations follow these patterns perfectly. Studying how our empirical data vary from these expectations, however, allows us to infer some interesting things about the history of populations and species.

Testing changes in Ne and bottlenecks

One of the more common applications of the coalescent is in determining historical changes in the effective population size of species, particularly in trying to detect genetic bottleneck events. This is based on the idea that alleles are likely to coalesce at different rates under scenarios of genetic bottlenecks, as the reduced number of individuals (and also genetic diversity) associated with bottlenecks changes the frequency of alleles and coalescence rates.

For a set of k different alleles, the rate of coalescence is k(k – 1)/(4Ne). Thus, the coalescence rate is intrinsically linked to the effective population size, Ne, and so to the amount of genetic variation the population can hold. During genetic bottlenecks, the severely reduced Ne gives the appearance of the coalescence rate speeding up. This is because genetic drift culls many alleles during the bottleneck, so only a few (usually common) alleles make it through, with new mutations and the spread of the surviving alleles occurring after the bottleneck. This can be a little hard to think of, so the diagram below demonstrates how this appears.

Bottleneck test figure.jpg
A diagram of how the coalescent can be used to detect bottlenecks in a single population (centre). In this example, we have a contemporary population in which we are tracing the coalescence of two main alleles (red and green, respectively). Each circle represents a single individual (we are assuming only one allele per individual for simplicity, but for most animals there are up to two). Looking forward in time, you’ll notice that some red alleles go extinct just before the bottleneck: they are lost during the reduction in Ne. Because of this, if we measure the rate of coalescence (right), it is much higher during the bottleneck than before or after it. Another way this could be visualised is to generate gene trees for the alleles (left): populations that underwent a bottleneck will typically have many shorter branches and a long root, as many branches will be ‘lost’ by extinction (the dashed lines, which are not normally seen in a tree).

This makes sense from a theoretical perspective as well, since strong genetic bottlenecks mean that most alleles are lost. Thus, the alleles that we do have are much more likely to coalesce shortly after the bottleneck, with very few alleles that coalesce before the bottleneck event. These alleles are ones that have managed to survive the purge of the bottleneck, and are often few compared to the overarching patterns across the genome.
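To put some numbers on the k(k – 1)/(4Ne) rate, here is a minimal sketch comparing the expected wait between coalescent events before and during a bottleneck. The Ne values (10,000 and 100) and the function names are hypothetical and picked only to make the contrast obvious.

```python
def coalescence_rate(k, ne):
    """Per-generation rate at which any pair among k lineages coalesces.

    There are k(k - 1)/2 possible pairs, each coalescing with probability
    1 / (2 * ne) per generation, giving k(k - 1) / (4 * ne).
    """
    if k < 2:
        return 0.0   # a single lineage has nothing to coalesce with
    return k * (k - 1) / (4.0 * ne)


def expected_wait(k, ne):
    """Expected number of generations until the next coalescent event."""
    return 1.0 / coalescence_rate(k, ne)


if __name__ == "__main__":
    k = 10
    # Hypothetical effective population sizes for illustration only.
    for label, ne in (("before bottleneck", 10000), ("during bottleneck", 100)):
        print(label, coalescence_rate(k, ne), expected_wait(k, ne))
```

With the same number of lineages, a hundred-fold drop in Ne makes coalescent events a hundred times more frequent, which is the burst of coalescence that bottleneck tests look for.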

Testing migration (gene flow) across lineages

Another demographic factor we may wish to test is whether gene flow has occurred across our populations historically. Although there are plenty of allele frequency methods that can estimate contemporary gene flow (i.e. within a few generations), coalescent analyses can detect patterns of gene flow reaching further back in time.

In simple terms, this is based on the idea that if gene flow has occurred across populations, then some alleles will have been transferred from one population to another. Because of this, we would expect that transferred alleles coalesce with alleles of the source population more recently than the divergence time of the two populations. Thus, models that include a migration rate often add it as a parameter specifying the probability that any given allele coalesces with an allele in another population or species (the backwards version of a migration or introgression event). Again, this might be difficult to conceptualise so there’s a handy diagram below.

Migration rate test figure
A similar model of coalescence as above, but testing for migration rate (gene flow) in two recently diverged populations (right). In this example, when we trace two alleles (red and green) back in time, we notice that some individuals in Population 1 coalesce more recently with individuals of Population 2 than other individuals of Population 1 (e.g. for the red allele), and vice versa for the green allele. This can also be represented with gene trees (left), with dashed lines representing individuals from Population 2 and whole lines representing individuals from Population 1. This incomplete split between the two populations is the result of migration transferring genes from one population to the other after their initial divergence (also called ‘introgression’ or ‘horizontal gene transfer’).
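The effect of migration on coalescence times can be sketched with a toy backwards-in-time simulation. This is our own simplified two-population model (equal Ne, symmetric migration probability m per lineage per generation), not the method of any particular program, and all names and parameter values are hypothetical.

```python
import random


def mean_coalescence_time(ne, m, replicates=300, seed=1):
    """Mean generations until two alleles, sampled from different
    populations, coalesce in a toy two-population model.

    Each generation back in time, every lineage moves to the other
    population with probability m; two lineages in the same population
    coalesce with probability 1 / (2 * ne).
    """
    if m <= 0:
        raise ValueError("m must be positive, or the lineages never meet")
    rng = random.Random(seed)
    p_coal = 1.0 / (2.0 * ne)
    total = 0
    for _ in range(replicates):
        pops = [0, 1]              # one lineage starts in each population
        t = 0
        while True:
            t += 1
            # Backwards-in-time migration step for each lineage.
            for i in range(2):
                if rng.random() < m:
                    pops[i] = 1 - pops[i]
            # Coalescence is only possible within the same population.
            if pops[0] == pops[1] and rng.random() < p_coal:
                break
        total += t
    return total / replicates


if __name__ == "__main__":
    for m in (0.05, 0.0005):
        print(m, mean_coalescence_time(ne=50, m=m))
```

Higher migration lets lineages from different populations find each other sooner, so they coalesce more recently on average. Real analyses add population divergence on top of this, which is what lets them separate recent gene flow from shared ancestry.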

Testing divergence time

In a similar vein, the coalescent can also be used to test how long ago the two contemporary populations diverged. Similar to gene flow, this is often included as an additional parameter on top of the coalescent model in terms of the number of generations ago. To convert this to a meaningful time estimate (e.g. in terms of thousands or millions of years ago), we need to include a mutation rate (the number of mutations per base pair of sequence per generation) and a generation time for the study species (how many years apart different generations are: for humans, we would typically say ~20-30 years).
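The conversion to years is simple arithmetic, sketched below. We assume here that the divergence estimate is reported in mutation-scaled units (expected mutations per site), which is a common convention but not a universal one, so check what your program actually outputs. The function name and the example values are hypothetical.

```python
def divergence_in_years(tau, mutation_rate, generation_time):
    """Convert a mutation-scaled divergence estimate into years.

    tau             : divergence in expected mutations per site (assumed scaling)
    mutation_rate   : mutations per site per generation
    generation_time : years per generation
    """
    if mutation_rate <= 0 or generation_time <= 0:
        raise ValueError("mutation_rate and generation_time must be positive")
    generations = tau / mutation_rate      # mutation units -> generations
    return generations * generation_time   # generations -> years


if __name__ == "__main__":
    # Hypothetical inputs: tau = 0.001, 1e-8 mutations per site per
    # generation, and a 25-year generation time.
    print(divergence_in_years(0.001, 1e-8, 25))
```

Note that any uncertainty in the mutation rate or generation time carries straight through to the final estimate, so these values deserve as much scrutiny as the coalescent model itself.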

Divergence time test figure.jpg
An example of using the coalescent to test the divergence time between two populations, this time using three different alleles (red, green and yellow). Tracing back the coalescence of each allele reveals different times (in terms of which generation the coalescence occurs in) depending on the allele (right). As above, we can look at this through gene trees (left), showing variation in how far back the two populations (again indicated with bold and dashed lines respectively) split. The blue box indicates the range of times (i.e. a confidence interval) around which divergence occurred: with many more alleles, this can be refined by using an ‘average’ and later related to time in years with a generation time.

The basic model of testing divergence time with the coalescent is relatively simple, and not all that different to phylogenetic methods. Where in phylogenetics we relate the length of the different branches in the tree to the amount of time that has occurred since the divergence of those branches, with the coalescent we base these on coalescent events, with more coalescent events occurring around the time of divergence. One important difference in the two methods is that coalescent events might not directly coincide with divergence time (in fact, we expect many do not) as some alleles will separate prior to divergence, and some will lag behind and start to diverge after the divergence event.

The complex nature of the coalescent

While each of these individual concepts may seem (depending on how well you handle maths!) relatively simple, one critical issue is the interactive nature of the different factors. Gene flow, divergence time and population size changes will all simultaneously impact the distribution and frequency of alleles and thus the coalescent method. Because of this, we often use complex programs to employ the coalescent, which test and balance the relative contributions of each of these factors to some extent. Although the coalescent is a complex beast, improvements in the methodology and the programs that use it will continue to improve our ability to infer evolutionary history with coalescent theory.

What’s the (allele) frequency, Kenneth?

Allele frequency

A number of times before on The G-CAT, we’ve discussed the idea of using the frequency of different genetic variants (alleles) within a particular population or species to test a number of different questions about evolution, ecology and conservation. These are all based on the central notion that certain forces of nature will alter the distribution and frequency of alleles within and across populations, and that these patterns are somewhat predictable in how they change.

One particular distinction we need to make early here is the difference between allele frequency and allele identity. In these analyses, we are often working with the same alleles (i.e. particular variants) across our populations; it’s just that each of these populations may possess these particular alleles in different frequencies. For example, one population may have an allele (let’s call it Allele A) very rarely – maybe only 10% of individuals in that population possess it – but in another population it’s very common and perhaps 80% of individuals have it. This is a different level of differentiation than comparing how different alleles mutate (as in the coalescent) or how these mutations accumulate over time (like in many phylogenetic-based analyses).

Allele freq vs identity figure.jpg
An example of the difference between allele frequency and identity. In this example (and many of the figures that follow in this post), the circles denote different populations, within which there are individuals which possess either an A allele (blue) or a B allele. Left: If we compare Populations 1 and 2, we can see that they both have A and B alleles. However, these alleles vary in their frequency within each population, with an equal balance of A and B in Pop 1 and a much higher frequency of B in Pop 2. Right: However, when we compare Pop 3 and 4, we can see that not only do they vary in frequencies, they vary in the presence of alleles, with one allele in each population but not the other.
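For a concrete feel of the distinction, here is a small sketch that tallies allele frequencies from sampled allele copies. The four samples are made up to loosely mirror the figure (the exact counts are our assumption): Pops 1 and 2 share both alleles at different frequencies, while Pops 3 and 4 differ in which allele is present at all.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: frequency} for a list of sampled allele copies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical samples of 10 allele copies per population.
pop1 = ["A"] * 5 + ["B"] * 5    # equal balance of A and B
pop2 = ["A"] * 2 + ["B"] * 8    # same alleles, B much more common
pop3 = ["A"] * 10               # only A present
pop4 = ["B"] * 10               # only B present

for name, pop in (("Pop 1", pop1), ("Pop 2", pop2),
                  ("Pop 3", pop3), ("Pop 4", pop4)):
    print(name, allele_frequencies(pop))
```

Pops 1 and 2 return the same set of keys with different values (a frequency difference), whereas Pops 3 and 4 return different keys altogether (an identity difference).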

Non-adaptive (neutral) uses

Testing neutral structure

Arguably one of the most standard uses of allele frequency data is the determination of population structure, one which more avid The G-CAT readers will be familiar with. This is based on the idea that populations that are isolated from one another are less likely to share alleles (and thus have similar frequencies of those alleles) than populations that are connected. This is because gene flow across two populations helps to homogenise the frequency of alleles within those populations, by either diluting common alleles or spreading rarer ones (in general). There are a number of programs that use allele frequency data to assess population structure, but one of the most common ones is STRUCTURE.

Gene flow homogeneity figure
An example of how gene flow across populations homogenises allele frequencies. We start with two initial populations (Pop 1 and Pop 2 from above), which have very different allele frequencies. Hybridising individuals across the two populations means some alleles move from Pop 1 and Pop 2 into the hybrid population: which alleles move is random (the smaller circles). Because of this, the resultant hybrid population has an allele frequency somewhere in between the two source populations: think of it like mixing red and blue cordial and getting a purple drink.
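The cordial-mixing idea can be sketched numerically. Below, a fraction m of each population is swapped with the other every generation; this is a deliberately simple model of our own (symmetric migration, no drift, selection or mutation), and the starting frequencies and migration rate are hypothetical.

```python
def mix_frequencies(p1, p2, m, generations):
    """Track one allele's frequency in two populations exchanging migrants.

    Each generation a fraction m of each population is replaced by
    migrants from the other (symmetric migration, no drift or selection).
    Returns a list of (p1, p2) tuples, starting with the initial values.
    """
    history = [(p1, p2)]
    for _ in range(generations):
        p1, p2 = (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1
        history.append((p1, p2))
    return history


if __name__ == "__main__":
    # Hypothetical start: allele at 10% in Pop 1 and 80% in Pop 2.
    for gen, (a, b) in enumerate(mix_frequencies(0.1, 0.8, 0.1, 20)):
        if gen % 5 == 0:
            print(gen, round(a, 3), round(b, 3))
```

The gap between the two populations shrinks by a factor of (1 − 2m) each generation while the average frequency stays put, so both populations converge on the ‘purple’ midpoint.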

Simple YPP structure figure.jpg
An example of a Structure plot which long-term The G-CAT readers may be familiar with. This is taken from Brauer et al. (2013), where the authors studied the population structure of the Yarra pygmy perch. Each small column represents a single individual, with the colours representing how well the alleles of that individual fit a particular genetic population (each population has one colour). The numbers and broader columns refer to different ‘localities’ (different from populations) where individuals were sourced. This shows clear strong population structure across the 4 main groups, except for in Locality 6 where there is a mixture of Eastern and Merri/Curdies alleles.
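STRUCTURE itself uses a Bayesian clustering model that is well beyond a short snippet, so as a stand-in here is a much simpler summary of population structure: Wright’s F_ST for a single two-allele locus, computed from the allele frequency in each of two populations. This is a swapped-in illustration rather than what STRUCTURE does, and it assumes equal population sizes.

```python
def fst_biallelic(p1, p2):
    """Wright's F_ST for one biallelic locus and two equally sized populations.

    p1 and p2 are the frequencies of the same allele in each population.
    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of
    the pooled populations and H_S is the mean within-population value.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0   # locus is monomorphic overall: nothing to differentiate
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_biallelic(0.5, 0.5))   # identical frequencies
    print(fst_biallelic(0.1, 0.8))   # strongly different frequencies
    print(fst_biallelic(1.0, 0.0))   # fixed for different alleles
```

Values near 0 indicate populations with similar allele frequencies (e.g. connected by gene flow), while values approaching 1 indicate strongly isolated populations.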

Determining genetic bottlenecks and demographic change

Other neutral aspects of population identity and history can be studied using allele frequency data. One big component of understanding population history in particular is determining how the population size has changed over time, and relating this to bottleneck events or expansion periods. Although there are a number of different approaches to this, which span many types of analyses (e.g. also coalescent methods), allele frequency data is particularly suited to determining changes in the recent past (hundreds of generations, as opposed to thousands of generations ago). This is because we expect that, during a bottleneck event, it is statistically more likely for rare alleles (i.e. those with low frequency) in the population to be lost due to strong genetic drift: because of this, the population coming out of the bottleneck event should have an excess of more frequent alleles compared to a non-bottlenecked population. We can determine if this is the case with tests such as the heterozygosity excess, M-ratio or mode shift tests.

Genetic drift and allele freq figure
A diagram of how allele frequencies change in genetic bottlenecks due to genetic drift. Left: Large circles again denote a population (although across different sequential times), with smaller circles denoting which alleles survive into the next generation (indicated by the coloured arrows). We start with an initial ‘large’ population of 8, which is reduced down to 4 and 2 in respective future times. Each time the population contracts, only a select number of alleles (or individuals) ‘survive’: assuming no natural selection is in process, this is totally random from the available gene pool. Right: We can see that over time, the frequencies of alleles A and B shift dramatically, leading to the ‘extinction’ of Allele B due to genetic drift. This is because it is the less frequent allele of the two, and in the smaller population size has much less chance of randomly ‘surviving’ the purge of the genetic bottleneck.
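Why rare alleles are the first casualties of a bottleneck can be shown with a quick sketch. The first function gives the chance that an allele is missed entirely when the next generation is a random draw of gene copies; the second follows one allele through shrinking population sizes, as in the figure (8, then 4, then 2). Function names, the seed and the starting frequencies are our own illustrative choices, and we assume purely random (neutral) sampling.

```python
import random


def loss_probability(p, n_copies):
    """Chance an allele at frequency p is absent from n_copies random draws."""
    return (1.0 - p) ** n_copies


def simulate_bottleneck(p, sizes, seed=42):
    """Follow one allele's frequency through successive population sizes.

    sizes lists the number of gene copies in each successive generation;
    each generation is a random sample (with replacement) of the previous
    one, i.e. pure genetic drift with no selection.
    """
    rng = random.Random(seed)
    trajectory = [p]
    for n in sizes:
        copies = sum(1 for _ in range(n) if rng.random() < p)
        p = copies / n
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    print(loss_probability(0.1, 4), loss_probability(0.5, 4))
    print(simulate_bottleneck(0.25, [8, 4, 2]))
```

An allele at 10% frequency is far more likely to vanish in a draw of four copies than one at 50%, and once an allele’s frequency hits zero in the simulation it can never return without new mutation or migration.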

Adaptive (selective) uses

Testing different types of selection

We’ve also previously discussed how different types of natural selection can alter the distribution of allele frequency within a population. There are a number of different predictions we can make based on the selective force and the overall population. For understanding particular alleles that are under strong selective pressure (i.e. are either strongly adaptive or maladaptive), we often test for alleles which have a frequency that strongly deviates from the ‘neutral’ background pattern of the population. These are called ‘outlier loci’, and the fact that their frequency differs so much from the average across the genome is attributed to natural selection placing strong pressure on either maintaining or removing that allele.
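The logic of an outlier scan can be sketched very crudely: summarise each locus by how differentiated it is between populations (here a per-locus F_ST-style value, where 0 means identical frequencies and 1 means completely different), then flag loci sitting far from the genome-wide average. Real outlier programs use explicit neutral models rather than a plain z-score, so treat this purely as an illustration; the locus names, values and threshold are all hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(fst_by_locus, z_threshold=3.0):
    """Return loci whose value lies more than z_threshold SDs from the mean.

    fst_by_locus maps a locus name to its differentiation value.
    """
    values = list(fst_by_locus.values())
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return []   # every locus looks the same: no outliers
    return [locus for locus, v in fst_by_locus.items()
            if abs(v - mu) / sd > z_threshold]


if __name__ == "__main__":
    # Hypothetical data: 20 'neutral' loci with low differentiation and
    # one locus with very high differentiation.
    data = {"locus_%d" % i: 0.04 + 0.001 * i for i in range(20)}
    data["locus_selected"] = 0.9
    print(flag_outliers(data))
```

The design choice to compare each locus against the rest of the genome is the key part: the genome-wide pattern stands in for the ‘neutral’ background that demographic history alone would produce.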

Other selective tests are based on the idea of correlating the frequency of alleles with a particular selective environmental pressure, such as temperature or precipitation. In this case, we expect that alleles under selection will vary in relation to the environmental variable. For example, if a particular allele confers a selective benefit under hotter temperatures, we would expect that allele to be more common in populations that occur in hotter climates and rarer in populations that occur in colder climates. This is referred to as a ‘genotype-environment association test’ and is a good way to detect polygenic selection (i.e. when alleles at multiple loci contribute to a change in a single phenotypic trait).

Genotype by environment figure.jpg
An example of how the frequency of alleles might vary under natural selection in correlation to the environment. In this example, the blue allele A is adaptive and under positive selection in the more intense environment, and thus increases in frequency at higher values. In contrast, the red allele B is maladaptive in these environments and decreases in frequency. For comparison, the black allele shows how the frequency of a neutral (neither adaptive nor maladaptive) allele doesn’t vary with the environment, as it plays no role in natural selection.
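At its simplest, a genotype-environment association boils down to a correlation between allele frequency and an environmental variable across populations. The sketch below uses a plain Pearson correlation on made-up data for five sites; real analyses usually also correct for neutral population structure, which this does not attempt.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0   # no variation, so no association can be measured
    return cov / sqrt(vx * vy)


# Hypothetical data: mean temperature (degrees C) at five sites and the
# frequency of three alleles in the population at each site.
temperature = [10, 14, 18, 22, 26]
allele_a = [0.10, 0.25, 0.45, 0.70, 0.90]   # more common at hotter sites
allele_b = [0.85, 0.70, 0.50, 0.30, 0.10]   # rarer at hotter sites
neutral = [0.51, 0.49, 0.50, 0.49, 0.51]    # no trend with temperature

for name, freqs in (("A", allele_a), ("B", allele_b), ("neutral", neutral)):
    print(name, round(pearson_r(temperature, freqs), 3))
```

Alleles A and B show strong positive and negative correlations respectively, matching the blue and red lines in the figure, while the neutral allele shows essentially none.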

Taxonomic (species identity) uses

At one end of the spectrum of allele frequencies, we can also test for what we call ‘fixed differences’ between populations. An allele is considered ‘fixed’ if it is the only allele for that locus in the population (i.e. has a frequency of 1), whilst the alternative allele (which may exist in other populations) has a frequency of 0. Expanding on this, ‘fixed differences’ occur when one population has Allele A fixed and another population has Allele B fixed: thus, the two populations have as different allele frequencies (for that one locus, anyway) as possible.

Fixed differences are sometimes used as a type of diagnostic trait for species. This means that each ‘species’ has genetic variants that are not shared at all with its closest relatives, and that these variants are so strongly under selection that there is no diversity at those loci. Often, fixed differences are considered a level above populations that differ by allelic frequency only, as these alleles are considered ‘diagnostic’ for each species.

Fixed differences figure.jpg
An example of the difference between fixed differences and allelic frequency differences. In this example, we have 5 cats from 3 different species, sequencing a particular target gene. Within this gene, there are three possible alleles: T, A or G respectively. You’ll quickly notice that one allele is both unique to Species A and is present in all cats of that species (i.e. is fixed). This is a fixed difference between Species A and the other two. The other two alleles (including G), however, are present in both Species B and C, and thus are not fixed differences even if they have different frequencies.
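Checking for fixed differences is easy to sketch in code. The first check follows the strict definition above (each population fixed for a different allele); the second is a looser ‘diagnostic allele’ check like the one in the figure, where an allele is fixed in one species and absent from the others. Function names and the placeholder alleles X, Y and Z are ours and do not correspond to the alleles in the figure.

```python
def fixed_allele(alleles):
    """Return the allele if the sample carries only one, otherwise None."""
    unique = set(alleles)
    return next(iter(unique)) if len(unique) == 1 else None


def is_fixed_difference(pop_a, pop_b):
    """True if each population is fixed for a different allele at this locus."""
    a, b = fixed_allele(pop_a), fixed_allele(pop_b)
    return a is not None and b is not None and a != b


def is_diagnostic(allele, focal, others):
    """True if allele is fixed in the focal sample and absent from all others."""
    return fixed_allele(focal) == allele and all(allele not in pop for pop in others)


if __name__ == "__main__":
    # Hypothetical samples of five individuals per species.
    species_a = ["X"] * 5
    species_b = ["Y", "Y", "Y", "Z", "Z"]
    species_c = ["Y", "Z", "Z", "Z", "Z"]
    print(is_diagnostic("X", species_a, [species_b, species_c]))
    print(is_fixed_difference(species_b, species_c))
```

Here X is diagnostic for the first species, while the other two species differ only in the frequency of Y and Z and so show no fixed difference.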

Intrapopulation (relatedness) uses

Allele frequency-based methods are even used in determining relatedness between individuals. While it might seem intuitive to just check whether individuals share the same alleles (and are thus related), it can be hard to distinguish between whether they are genetically similar due to direct inheritance or whether the entire population is just ‘naturally’ similar, especially at a particular locus. This is the distinction between ‘identical-by-descent’, where alleles that are similar across individuals have recently been inherited from a shared ancestor (e.g. a parent or grandparent), and ‘identical-by-state’, where alleles are similar just by chance. The latter doesn’t contribute to or determine relatedness, as all individuals (whether they are directly related or not) within a population may be similar.

To distinguish between the two, we often use the overall frequency of alleles in a population as a basis for determining how likely two individuals are to share an allele by random chance. If alleles which are relatively rare in the overall population are shared by two individuals, we expect that this similarity is due to family structure rather than population history. By factoring this into our relatedness estimates we can get a more accurate overview of how likely two individuals are to be related using genetic information.
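The reasoning about rare alleles can be made concrete with a small calculation: how often would two unrelated individuals both carry a given allele purely by chance? The sketch below assumes diploid individuals in a randomly mating population, and the frequencies used are hypothetical. It is only the ‘by chance’ baseline, not a full relatedness estimator.

```python
def chance_both_carry(p):
    """Probability two unrelated diploid individuals both carry an allele.

    Assumes random mating, so each individual carries at least one copy
    of an allele at population frequency p with probability
    1 - (1 - p) ** 2.
    """
    carrier = 1.0 - (1.0 - p) ** 2
    return carrier ** 2


if __name__ == "__main__":
    for p in (0.05, 0.8):
        print(p, chance_both_carry(p))
```

Two unrelated individuals will both carry a common allele (80% frequency) most of the time, so sharing it says little about relatedness. Sharing a rare allele (5% frequency) by chance happens in under 1% of pairs, which is why rare shared alleles carry so much more weight in relatedness estimates.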

The wild world of allele frequency

Despite appearances, this is just a brief foray into the many applications of allele frequency data in evolution, ecology and conservation studies. There are a plethora of different programs and methods that can utilise this information to address a variety of scientific questions and refine our investigations.

You’re perfect, you’re beautiful, you look like a model (species)

What is a ‘model’?

There are quite literally millions of species on Earth, ranging from the smallest of microbes to the largest of mammals. In fact, there are so many that we don’t actually have a good count on the sheer number of species and can only estimate it based on the species we actually know about. Unsurprisingly, then, the number of species vastly outweighs the number of people that research them, especially considering the sheer volumes of different aspects of species, evolution, conservation and their changes we could possibly study.

Species on Earth estimate figure
Some estimates of the number of eukaryotic species (i.e. not including things like bacteria), with the number of known species in blue and the predicted number of total species on Earth in purple. Source: Census of Marine Life.

This is partly where the concept of a ‘model’ comes into it: it’s much easier to pick a particular species to study as a target, and use the information from it to apply to other scenarios. Most people would be familiar with the concept based on medical research: the ‘lab rat’ (or mouse). The common house mouse (Mus musculus) and the brown rat (Rattus norvegicus) are some of the most widely used models for understanding the impact of particular biochemical compounds on physiology and are often used as the testing phase of medical developments before human trials.

So, why are mice used as a ‘model’? What actually constitutes a ‘model’, rather than just a ‘relatively-well-researched species’? Well, there are a number of traits that might make certain species ideal subjects for understanding key concepts in evolution, biology, medicine and ecology. For example, mice are often used in medical research given their (relatively) similar genetic, physiological and behavioural characteristics to humans. They’re also relatively short-lived and readily breed, making them ideal to observe the more long-term effects of medical drugs or intergenerational impacts. Other species used as models primarily in medicine include nematodes (Caenorhabditis elegans), pigs (Sus scrofa domesticus), and guinea pigs (Cavia porcellus).

The diversity of models

There are a wide variety and number of different model species, based on the type of research most relevant to them (and how well it can be applied to other species). Even with evolution and conservation-based research, which can often focus on more obscure or cryptic species, there are several key species that have widely been applied as models for our understanding of the evolutionary process. Let’s take a look at a few examples for evolution and conservation.

Drosophila

It would be remiss of me to not mention one of the most significant contributors to our understanding of the genetic underpinning of adaptation and speciation, the humble fruit fly (Drosophila melanogaster, among other species). The ability to rapidly produce new generations (with large numbers of offspring and a very short generation time), a small fully-sequenced genome, and physiological variation mean that observing both phenotypic and genotypic changes over generations due to ‘natural’ (or ‘experimental’) selection is possible. In fact, Drosophila spp. were key in demonstrating the formation of a new species under laboratory conditions, providing empirical evidence for the process of natural selection leading to speciation (despite some creationist claims that this has never happened).

Drosophila speciation experiment
A simplified summary of the speciation experiment in Drosophila, starting with a single species and resulting in two reproductively isolated species based on mating and food preference. Source: Ilmari Karonen, adapted from here.

Darwin’s finches

The original model of evolution could be argued to be Darwin’s finches, as they formed part of the empirical basis of Charles Darwin’s work on the theory of evolution by natural selection. This is because the different species demonstrate very distinct and obvious changes in morphology related to a particular diet (i.e. the physical consequences of natural selection), spread across an archipelago in a clear demonstration of a natural experiment. Thus, they remain the original example of adaptive radiation and are fundamental components of the theory of evolution by natural selection. However, surprisingly, Darwin’s finches are somewhat overshadowed in modern research by other species in terms of the amount of available data.

Darwin's finches drawings
Some of Darwin’s early drawings of the morphological differences in Galapagos finch beaks, which led to the formulation of the theory of evolution by natural selection.

Zebra finches

Even as far as birds go, one species clearly outshines the rest in terms of research. The zebra finch is one of the most highly researched vertebrate species, particularly as a model of song learning and behaviour in birds but also as a genetic model. The zebra finch was the second bird to ever have its full genome sequenced (the first being the chicken), and its genome remains one of the more detailed and annotated genomes in birds. Because of this, the zebra finch genome is often used as a reference for other studies on the genetics of bird species, especially when trying to understand the function of genetic changes or genes under selection.

Zebra finches.jpg
A pair of (very cute) model zebra finches. Source: Michael Lawton via Smithsonian.com.

Fishes

Fish are (perhaps surprisingly) also relatively well researched in terms of evolutionary studies, largely due to their ancient origins and highly diverse nature, with many different species across the globe. They also often demonstrate very rapid and strong bouts of divergence, such as the cichlid fish species of African lakes which demonstrate how new species can rapidly form when introduced to new and variable environments. The cichlids have become the poster child of adaptive radiation in fishes much in the same way that Darwin’s finches highlighted this trend in birds. Another group of fish species used as a model for similar aspects of speciation, adaptive divergence and rapid evolutionary change are the three-spine and nine-spine stickleback species, which inhabit a variety of marine, estuarine and freshwater environments. Thus, studies on the genetic changes across these different morphotypes are key to understanding how adaptation to new environments occurs in nature (particularly the relatively common transition into different water types in fishes).

cichlid diversity figure
The sheer diversity of species and form makes African cichlids an ideal model for testing hypotheses and theories about the process of evolution and adaptive radiation. Figure sourced from Brawand et al. (2014) in Nature.

Zebrafish

More similar to the medical context of lab rats is the zebrafish (ironically, zebras themselves are not considered a model species). Zebrafish are often used as models for understanding embryology and the development of the body in early formation given the rapid speed at which embryonic development occurs and the transparent body of embryos (which makes it easier to detect morphological changes during embryogenesis).

Zebrafish embryo
The transparent nature of zebrafish embryos makes them ideal for studying the development of organisms in early stages. Source: yourgenome.org.

Using information from model species for non-models

While the relevance of information collected from model species to other non-model species depends on the similarity in traits of the two species, our understanding of broad concepts such as evolutionary process, biochemical pathways and physiological developments has significantly improved due to model species. Applying theories and concepts from better understood organisms to less researched ones allows us to produce better research much faster by cutting out some of the initial investigative work on the underlying processes. Thus, model species remain fundamental to medical advancement and evolutionary theory.

That said, in an ideal world all species would have the same level of research and resources as our model species. In this sense, we must continue to strive to understand and research the diversity of life on Earth, to better understand the world in which we live. Full genomes are progressively being sequenced for more and more species, and there are a number of excellent projects that are aiming to sequence at least one genome for all species of different taxonomic groups (e.g. birds, bats, fish). As the data improves for our non-model species, our understanding of evolution, conservation management and medical research will similarly improve.

Lost in a forest of (gene) trees

Using genetics to understand species history

The idea of using the genetic sequences of living organisms to understand the evolutionary history of species is a concept much repeated on The G-CAT. And it’s a fundamental one in phylogenetics, taxonomy and evolutionary biology. Often, we try to analyse the genetic differences between individuals, populations and species in a tree-like manner, with close tips being similar and more distantly separated branches being more divergent. However, this runs on one very key assumption: that the patterns we observe in our study genes match the overall patterns of species evolution. But this isn’t always true, and before we can delve into that we have to understand the difference between a ‘gene tree’ and a ‘species tree’.

A gene tree or a species tree?

Our typical view of a phylogenetic tree is actually one of a ‘gene tree’, where we analyse how a particular gene (or set of genes) has changed over time between different individuals (within and across populations or species) based on our understanding of mutation and common ancestry.

However, a phylogenetic tree based on a single gene only demonstrates the history of that gene. What we assume in most cases is that the history of that gene matches the history of the species: that branches in the genetic tree mirror when different splits in species occurred throughout history.

The easiest way to conceptualise gene trees and species trees is to think of individual gene trees that are nested within an overarching species tree. In this sense, individual gene trees can vary from one another (substantially, even) but by looking at the overall trends of many genes we can see how the genome of the species has changed over time.

Gene tree incongruence figure
A (potentially familiar) depiction of individual gene trees (coloured lines) within the broader species tree (defined by the black boundaries). As you might be able to tell, the branching patterns of the different genes are not the same, and don’t always match the overarching species tree.

Gene tree incongruence

Different genes may have different patterns for a number of reasons. Changes in the genetic sequences of organisms over time don’t happen equally across the entire genome, and very specific parts of the genome can evolve in entirely different directions, or at entirely different rates, than the rest of the genome. Let’s take a look at a few ways we could have conflicting gene trees in our studies.

Incomplete lineage sorting

One of the most prolific, but more complicated, ways gene trees can vary from their overarching species tree is due to what we call ‘incomplete lineage sorting’. This is based on the idea that species and the genes that define them are constantly evolving over time, and that because of this, different genes are at different stages of divergence between populations and species. If we imagine a set of three related populations which have all descended from a single ancestral population, we can start to see how incomplete lineage sorting could occur. Our ancestral population likely has some genetic diversity, containing multiple alleles of the same locus. In a true phylogenetic tree, we would expect these different alleles to ‘sort’ into the different descendent populations, such that one population might have one of the alleles, a second the other, and so on, without them sharing the different alleles between them.

If this separation into new populations has been recent, or if gene flow has occurred between the populations since this event, then we might find that each descendent population has a mixture of the different alleles, and that not enough time has passed to clearly separate the populations. For alleles to sort cleanly, enough time needs to pass for new mutations to arise and for genetic drift to push different populations towards different allele frequencies: if the split is too recent, then it can be hard to accurately distinguish between populations. This can be difficult to interpret (see below figure for a visualisation of this), but there’s a great description of incomplete lineage sorting here.

ILS_adaptedfigure
A demonstration of incomplete lineage sorting, generously adapted from a talk by fellow MELFU postdocs Dr Yuma (Jonathon) Sandoval-Castillo and Dr Catherine Attard. On the left is a depiction of a single gene coalescent tree over time: circles represent a single individual at a particular point in time (row) with the colours representing different alleles of that same gene. The tree shows how new mutations occur (colour changes along the branches) and spread throughout the descendent populations. In this example, we have three recently separated species, with a good number of different alleles. However, when we study these alleles in tree form (the phylogeny on the right), we see that the branches themselves don’t correlate well with the boundaries of the species. For example, the teal allele found within Species C is actually more similar to Species B alleles (purple and blue) than to any other Species C alleles, based on the order and patterns of these mutations.
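To put a rough number on how often incomplete lineage sorting misleads us, here is a minimal sketch for the simplest case of three species related as ((A,B),C), with one gene copy sampled from each. It assumes the standard neutral coalescent with no gene flow, and measures the time between the two species splits (T) in coalescent units, i.e. generations divided by 2Ne. Under those assumptions the chance that a gene tree disagrees with the species tree is (2/3)e^(-T). The function names are made up for this illustration.

```python
import math
import random


def expected_discordance(T):
    """Probability that a gene tree disagrees with a three-species tree.

    T is the length of the internal branch in coalescent units
    (generations divided by 2Ne), an assumption of this sketch.
    """
    return (2.0 / 3.0) * math.exp(-T)


def simulate_discordance(T, n_loci=100_000, seed=1):
    """Estimate the same probability by simulating independent loci."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(n_loci):
        # Waiting time for the A and B lineages to coalesce in their
        # shared ancestral population (rate 1 in coalescent units).
        if rng.expovariate(1.0) < T:
            continue  # sorted in time: gene tree matches species tree
        # Otherwise all three lineages reach the root population unsorted,
        # and the first pair to coalesce is effectively chosen at random.
        if rng.choice(("AB", "AC", "BC")) != "AB":
            discordant += 1
    return discordant / n_loci


if __name__ == "__main__":
    for T in (0.1, 1.0, 5.0):
        print(T, expected_discordance(T), simulate_discordance(T))
```

Short internal branches (recent, rapid splits or large populations) give values close to two-thirds, which is why recently separated species are the hardest to untangle.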

Hybridisation and horizontal transfer

Another way individual genes may become incongruent with other genes is through another phenomenon we’ve discussed before: hybridisation (or more specifically, introgression). When two individuals from different species breed together to form a ‘hybrid’, they join together what were once two separate gene pools. Thus, the hybrid offspring has (if it’s a first generation hybrid, anyway) 50% of genes from Species A and 50% of genes from Species B. In terms of our phylogenetic analysis, if we picked one gene randomly from the hybrid, we have a 50% chance of picking a gene that reflects the evolutionary history of Species A, and a 50% chance of picking a gene that reflects the evolutionary history of Species B. This would change how our outputs look significantly: if we pick a Species A gene, our ‘hybrid’ will look (genetically) very, very similar to Species A. If we pick a Species B gene, our ‘hybrid’ will look like a Species B individual instead. Naturally, this can really stuff up our interpretations of species boundaries, distributions and identities.

Hybridisation_figure
An example of hybridisation leading to gene tree incongruence with our favourite colourful fish. A) We have a hybridisation event between a red fish (Species A) and a green fish (Species B), resulting in a hybrid species (‘Species’ H). The red fish genome is indicated by the yellow DNA, the green fish genomes by the blue DNA, and the hybrid orange fish has a mixture of these two. B) If we sampled one set of genes in the hybrid, we might select a gene that originated from the red fish, showing that the hybrid is identical (or very similar) to Species A. D) Conversely, if we sampled a gene originating from the green fish, the resultant phylogeny might show that the hybrid is the same as Species B. C) If we consider these two patterns in combination, we see the true pattern of species formation, which is not a clear dichotomous tree but rather a mixture of the two sets of trees.

Paralogous genes

More confusingly, we can even have events where a single gene duplicates within a genome. This is relatively rare, although it can have huge effects: for example, salmon have massive genomes as the entire thing was duplicated! Each version of the gene can take on very different forms, functions, and evolve in entirely different ways. We call these duplicated variants paralogous genes: genes that look the same (in terms of sequence), but are totally different genes.

This can have a profound impact as paralogous genes are difficult to detect: if there has been a gene duplication early in the evolutionary history of our phylogenetic tree, then many (or all) of our study samples will have two copies of said gene. Since they look similar in sequence, there’s every possibility that we pick Variant 1 in some species and Variant 2 in other species. Being unable to tell them apart, we can have some very weird and abstract results within our tree. Most importantly, different samples with the same duplicated variant will seem more similar to one another (e.g. have evolved from a common ancestor more recently) than they will to any sample of the other variant (even if they came from the exact same species)!

Paralogy_figure.jpg
An example of how paralogous genes can confound species trees. We start with a single (purple) gene: at a particular point in time, this gene duplicates into a red and a blue form. Each of these genes then evolves and spreads into four separate descendent species (A, B, C and D) but not in entirely the same way. However, since both the red and blue genetic sequences are similar, if we took a single gene from each species we might (somewhat randomly) sequence either the red or the blue copy. The different phylogenetic trees on the right demonstrate how different combinations of red and blue genes give very different patterns, since all blue copies will be more related to other blue genes than to the red gene of the same species. E.g. a blue A and a blue C are more similar than a blue A and a red A.

Overcoming incongruence with genomics

Although a tricky conundrum in phylogenetics and evolutionary genetics broadly, gene tree incongruence can largely be overcome by using more loci. As the random changes at any one locus have a smaller effect on the larger total set of loci, the general and broad patterns of evolutionary history can become clearer. Indeed, understanding how many loci are affected by what kind of process can itself become informative: large numbers of introgressed loci can indicate whether hybridisation was recent, strong, or biased towards one species over another, for example. As with many things, the genomic era appears poised to address the many analytical issues and complexities of working with genetic data.
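The benefit of more loci can be illustrated with a small simulation. This is only a sketch under strong assumptions: three species, incongruence caused purely by incomplete lineage sorting under the neutral coalescent, independent loci, and a simple 'most common gene tree wins' rule (real species tree methods are more sophisticated). The internal branch length T is in coalescent units, and the function name is hypothetical.

```python
import math
import random


def species_tree_recovery(n_loci, T=0.5, replicates=2000, seed=7):
    """Fraction of replicate studies in which the species-tree topology
    is the single most common gene tree among n_loci sampled loci."""
    rng = random.Random(seed)
    # Each of the two conflicting topologies has probability exp(-T) / 3.
    p_alt = math.exp(-T) / 3.0
    p_match = 1.0 - 2.0 * p_alt
    recovered = 0
    for _ in range(replicates):
        counts = [0, 0, 0]  # [matches species tree, alternative 1, alternative 2]
        for _ in range(n_loci):
            u = rng.random()
            if u < p_match:
                counts[0] += 1
            elif u < p_match + p_alt:
                counts[1] += 1
            else:
                counts[2] += 1
        # Ties are counted as a failure to recover the species tree.
        if counts[0] > counts[1] and counts[0] > counts[2]:
            recovered += 1
    return recovered / replicates


if __name__ == "__main__":
    for n in (1, 5, 20, 50):
        print(n, species_tree_recovery(n))
```

With a single locus the answer is wrong a large share of the time, while a few dozen loci recover the correct topology in almost every replicate under these settings.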

 

Hotter and colder: how historic glacial cycles have shaped modern diversity

A tale as old as time

Since evolution is a constant process, occurring over both temporal and spatial scales, the impact of evolutionary history for current and future species cannot be overstated. The various forces of evolution through natural selection have strong, lasting impacts on the evolution of organisms, which is exemplified within the genetic make-up of all species. Phylogeography is the domain of research which intrinsically links this genetic information to historical selective environment (and changes) to understand historic distributions, evolutionary history, and even identify biodiversity hotspots.

The Ice Age(s)

Although there are a huge number of both historic and contemporary climatic factors that have influenced the evolution of species, one particularly important time period is referred to as the Pleistocene glacial cycles. The Pleistocene epoch spans from ~2.6 million years ago until ~12,000 years ago, and is a time of significant changes in the evolution of many species still around today (particularly for vertebrates). This is because the Pleistocene largely consisted of several successive glacial periods: at times, the climate was significantly cooler, glaciers were more widespread and sea-levels were lower (due to more water being locked up in ice). These periods were then followed by ‘interglacial periods’, where much of the globe warmed, ice caps melted and sea-levels rose. Sometimes, this natural pattern is argued as explaining 100% of recent climate change: don’t be fooled, however, as Pleistocene cycles were never as dramatic or irreversible as modern, anthropogenically-driven climate change.

Annotated glacial cycles.jpg
The general pattern of glacial and interglacial periods over the last 1 million years, adapted from Oceanbites.

The glacial cycles of the Pleistocene had a number of impacts on a plethora of species on Earth. For many of these species, these glacial-interglacial periods resulted in what we call ‘glacial refugia’ and ‘interglacial expansion’: at the peak of glacial periods, many species’ distributions contracted to small patches of suitable habitat, like tiny islands in a freezing ocean. As the globe warmed during interglacial periods, these habitats started to spread and with them the inhabiting species. While it’s expected that this likely happened many times throughout the Pleistocene, the most clearly observed cycle would be the most recent one: referred to as the Last Glacial Maximum (LGM), at ~21,000 years ago. Thus, a quick dive into the literature shows that it is rife with phylogeographic examples of expansions and contractions related to the LGM.

glacial refugia example figure.jpg
An example of how phylogeographic analysis can find glacial refugia in species, in this case the montane caddisfly Thremma gallicum from Macher et al. (2017). The colours refer to the two datasets they used (blue = ddRADseq; red = mtDNA) and the arrows demonstrate migration pathways in the interglacial period following the LGM.

The glacial impact on genetic diversity

Why does any of this matter? Didn’t it all happen in the past? Well, that leads us back to the original point in this post: forces of evolution leave distinct impacts on the genetic architecture of species. In regards to glacial refugia, a clear pattern is often observed: populations occurring approximately in line with the refugia have maintained greater genetic diversity over time, whilst those in more unstable or unsuitable regions show much more reduced genetic diversity. And this makes sense: many of those populations likely went extinct during glaciation, and only within the last 20,000 or so years have been recolonised from nearby refugia. Accounting for genetic drift due to founder effect, it’s easy to see how this would cause genetic diversity to plummet.
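The drop in diversity after recolonisation can be sketched with a textbook approximation: in an idealised population of constant effective size Ne, expected heterozygosity shrinks by a factor of (1 - 1/(2Ne)) each generation through drift alone. The code below assumes exactly that (no mutation, migration or selection), and the population sizes and starting diversity are invented for illustration.

```python
def expected_heterozygosity(h0, n_effective, generations):
    """Expected heterozygosity after drift in a population of constant
    effective size (idealised model: no mutation, migration or selection)."""
    return h0 * (1.0 - 1.0 / (2.0 * n_effective)) ** generations


if __name__ == "__main__":
    h0 = 0.5  # hypothetical starting heterozygosity
    # A large, stable refugial population vs. a small founding group.
    for label, ne in (("refugium", 5000), ("founders", 20)):
        print(label, round(expected_heterozygosity(h0, ne, 100), 3))
```

Even over the same number of generations, the small founding group loses most of its variation while the large refugial population barely changes.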

Case study: the charismatic cheetah

And this loss of genetic diversity isn’t just a hypothetical, or an interesting note in evolution. It can have dire impacts for the survivability of species. Take for example, the very charismatic cheetah. Like many large, apex predator species, the cheetah in the modern day is endangered and at risk of extinction to a variety of threats, and although many of these are linked to modern activity (such as being killed to protect farms or habitat clearing), some of these go back much further in history.

Believe it or not, the cheetah as a species actually originated from an ancestor in the Americas: they’re closely related to other American big cats such as the puma/cougar. During the Miocene (5 – 8 million years ago), however, the ancestor of the modern cheetah migrated a very long way to Africa, diverging from its shared ancestor with jaguarundi and cougars. Subsequent migrations into Africa and Asia (where only the Iranian subspecies remains) during the Pleistocene, dated at ~100,000 and ~12,000 years ago, have been shown through whole genome analysis to have resulted in significant reductions in the genetic diversity of the cheetah. This timing correlates with the extinction of the cheetah and puma within North America, and the worldwide extinction of many large mammals including mammoths, dire wolves and sabre-tooth tigers.

cheetah bottleneck.jpg
The demographic history of the African cheetah population, based on whole genomes in Dobrynin et al. (2015). In this figure, ‘Eastern’ refers to a Tanzanian population whilst ‘southern’ refers to a Namibian population (and as such doesn’t depict bottlenecks elsewhere in the cheetah e.g. Iran). The initial population underwent a severe genetic bottleneck ~12,000 years ago, likely due to glaciation.

What does this mean for the cheetah? Well, the cheetah has one of the lowest amounts of genetic variation for any living mammal. It’s even lower than the Tasmanian Devil, a species with such notoriously low genetic diversity that a rampant face cancer (Devil Facial Tumour Disease) is transmissible simply because their immune system can’t recognise the transferred cancer cells as being different to the host animal. Similarly, for the cheetah, it’s possible to do reciprocal skin transplants without the likelihood of organ rejection simply because their immune system is incapable of determining the difference between foreign and host tissue cells.

cheetah diversity 2.jpg
Examples of the incredibly low genetic diversity in cheetah, both from Dobrynin et al. (2015). A) shows the relative level of genetic diversity in cheetah compared to many other species, being lower than Tasmanian Devils and significantly lower than humans and domestic cats. D) shows the overall variation across the genome of a domestic cat (top), the inbred Abyssinian cat (middle) and the cheetah (bottom). Highly variable regions are indicated in red, whilst low variability regions are indicated in green. As you can see, the entirety of the cheetah genome has incredibly low genetic variation, even compared to another cat species considered to have low genetic variation (the Abyssinian).

Inference for the future

Understanding the impact of the historic environment on the evolution and genetic diversity of living species is not just important for understanding how species became what they are today. It also helps us understand how species might change in the future, by providing the natural experimental evidence of evolution in a changing climate.

 

Rescuing the damselfish in distress: rescue or depression?

Conservation management

Managing and conserving threatened and endangered species in the wild is a difficult process. There are a large number of possible threats and outcomes, and it’s often not clear which of these (or how many of these) are at play at any one given time. Thankfully, there are also a large number of possible conservation tools that we might be able to use to protect, bolster and restore species at risk.

Using genetics in conservation

Naturally, we’re going to take a look at the more genetics-orientated aspects of conservation management. We’ve discussed many times the various angles and approaches we can take using large-scale genetic data, some of which include:
• studying the evolutionary history and adaptive potential of species
• developing breeding programs using estimates of relatedness to increase genetic diversity
• identifying and describing new species for government legislation
• identifying biodiversity hotspots and focus areas for conservation
• identifying population boundaries for effective management/translocations

Genetics flowchart.jpg
An example of just some of the conservation applications of genetics research that we’ve talked about previously on The G-CAT.

This last point is a particularly interesting one, and an area of conservation research where genetics is used very often. Most definitions of a ‘population’ within a species rely on using genetic data and analysis (such as Fst) to provide a statistical value of how different groups of organisms are within said species. Ignoring some of the philosophical issues with the concept of a population versus a species due to the ‘speciation continuum’ (read more about that here), populations are often interpreted as a way to cluster the range of a species into separate units for conservation management. In fact, the most commonly referred to terms for population structure and levels are evolutionarily-significant units (ESUs), which are defined as a single genetically connected group of organisms that share an evolutionary history that is distinct from other populations; and management units (MUs), which may not have the same degree of separation but are still definably different with enough genetic data.

Hierarchy of structure.jpg
A diagram of the hierarchy of structure within a species. Remember that ESUs, by definition, should be evolutionarily different from one another (i.e. adaptively divergent) whilst MUs are not necessarily divergent to the same degree.
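For readers who haven’t met Fst before, here is a minimal sketch of the idea behind it for a single gene with two alleles: compare the diversity expected if all groups were one big interbreeding population (HT) with the average diversity within each group (HS), so that Fst = (HT - HS) / HT. This simple version assumes equally sized groups and known allele frequencies; the estimators used in real studies correct for sample size, and the function name and frequencies here are invented.

```python
def fst_biallelic(allele_freqs):
    """Fst for one biallelic locus from per-population allele frequencies.

    Assumes equally weighted populations and known (not estimated)
    frequencies; a teaching sketch rather than a research estimator.
    """
    if not allele_freqs:
        raise ValueError("need at least one population")
    p_bar = sum(allele_freqs) / len(allele_freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # no variation anywhere, so no measurable structure
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / len(allele_freqs)
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_biallelic([0.5, 0.5]))  # identical frequencies
    print(fst_biallelic([0.8, 0.2]))  # moderately different
    print(fst_biallelic([1.0, 0.0]))  # fixed for different alleles
```

Values near 0 suggest the groups are effectively one population, while values near 1 indicate groups that share almost no variation.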

This can lead to a particular paradigm of conservation management: keeping everything separate and pure is ‘best practice’. The logic is that, as these different groups have evolved slightly differently from one another (although there is often a lot of grey area about ‘differently enough’), mixing these groups together is a bad idea. Particularly, this is relevant when we consider translocations (“it’s never acceptable to move an organism from one ESU into another”) and captive breeding programs (“it’s never acceptable to breed two organisms together from different ESUs”). So, why not? Why does it matter if they’re a little different?

Outbreeding depression

Well, the classic reasoning is based on a concept called ‘outbreeding depression’. We’ve mentioned outbreeding depression before, and it is a key concept kept in mind when developing conservation programs. The simplest explanation for outbreeding depression is that evolution, through the strict process of natural selection, has pushed particular populations to evolve certain genetic variants for a certain selective pressure. These can vary across populations, and it may mean that populations are locally adapted to a specific set of environmental conditions, with the specific set of genetic variants that best allow them to do this.

However, when you mix in the genetic variants that have evolved in a different population, by introducing a foreign individual and allowing them to breed, you essentially ‘tarnish’ the ‘pure’ gene pool of that population with what could be very bad (maladaptive) genes. The hybrid offspring of ‘native’ and this foreign individual will be less adaptive than their ‘pure native’ counterparts, and the overall adaptiveness of the population will decrease as those new variants spread (depending on the number introduced, and how negative those variants are).

Outbreeding depression example figure.jpg
An example of how outbreeding depression can affect a species. The original red fish population is not doing well: it is of conservation concern, and has very little genetic diversity (only the blue gene in this example). So, we decide to introduce new genetic diversity by adding in green fish, which have the orange gene. However, the mixture of the two genes and the maladaptive nature of the orange gene actually makes the situation worse, with the offspring showing less fitness than their preceding generations.

You might be familiar with inbreeding depression, which is based on the loss of genetic diversity from having individuals that are too similar breeding together to produce very genetically ‘weak’ offspring through inbreeding. Outbreeding depression could be thought of as the opposite extreme: breeding individuals that are too different introduces too many ‘bad’ alleles into the population, diluting the ‘good’ alleles.

Inbreeding vs outbreeding figure.jpg
An overly simplistic representation of how inbreeding and outbreeding depression can reduce overall fitness of a species. In inbreeding depression, the lack of genetic diversity due to related individuals breeding with one another makes them at risk of being unable to adapt to new pressures. Contrastingly, adding in new genes from external populations which aren’t fit for the target population can also reduce overall fitness by ‘diluting’ natural, adaptive allele frequencies in the population.

Genetic rescue

It might sound awfully purist to only preserve the local genetic diversity, and to assume that any new variants could be bad and tarnish the gene pool. And, surprisingly enough, this is an area of great debate within conservation genetics.

The counterpart to the outbreeding depression concerns is the idea of genetic rescue. For populations with already severely depleted gene pools, lacking the genetic variation to be able to adapt to new pressures (such as contemporary climate change), the situation seems incredibly dire. One way to introduce new variation, which might be the basis of new adaptation, is to bring in individuals from another population of the same species: this can provide the necessary genetic diversity to help that population bounce back.

Genetic rescue example figure.jpg
An example of genetic rescue. This circumstance is identical to the one above, with the key difference being in the fitness of the introduced gene. The orange gene in this example is actually beneficial to the target population: by providing a new, adaptive allele for natural selection to act upon, overall fitness is increased for the red fish population.

The balance

So, what’s the balance between the two? Is introducing new genetic variation a bad idea, and going to lead to outbreeding depression; or a good idea, and lead to genetic rescue? Of course, many of the details surrounding the translocation of new genetic material are important: how different are the populations? How different are the environments (i.e. natural selection) between them? How well will the target population take up new individuals and genes?

Overall, however, the more recent and well-supported conclusion is that fears regarding outbreeding depression are often strongly exaggerated. Firstly, bad alleles that have been introduced into a population can be rapidly purged by natural selection, and a strongly maladaptive allele is unlikely to spread throughout the population. Secondly, given the lack of genetic diversity in the target population, most populations that need genetic rescue are already so poorly adapted (due to genetic drift and a lack of available adaptive alleles) that introducing new variants is unlikely to make the situation much worse.

Purging and genetic rescue figure.jpg
An example of how introducing maladaptive alleles might not necessarily lead to decreased fitness. In this example, we again start with our low diversity red fish population, with only one allele (AA). To help boost genetic diversity, we introduce orange fish (with the TT allele) and green fish (with the GG allele) into the population. However, the TT allele is not very adaptive in this new environment, and individuals with the TT gene quickly die out (i.e. are ‘purged’). Individuals with the GG gene, however, do well, and continue to integrate into the red population. Over time, these two variants will mix together as the two populations hybridise and overall fitness will increase for the population.
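How quickly purging happens can be sketched with the simplest possible selection model. The code below assumes a very large population (so drift is ignored), a single gene, and that carriers of the introduced maladaptive allele have relative fitness 1 - s compared with everyone else. The starting frequency and the value of s are made up for illustration, and the function name is hypothetical.

```python
def purge_trajectory(p0, s, generations):
    """Frequency over time of an introduced allele whose carriers have
    relative fitness 1 - s (deterministic model: no drift, one locus)."""
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Average fitness of the whole population this generation.
        mean_fitness = p * (1.0 - s) + (1.0 - p)
        # Carriers contribute proportionally fewer offspring.
        p = p * (1.0 - s) / mean_fitness
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    # Introduced at 20% frequency, with a 30% fitness cost.
    trajectory = purge_trajectory(0.2, 0.3, 20)
    for generation in (0, 5, 10, 20):
        print(generation, round(trajectory[generation], 4))
```

Under these assumptions a strongly maladaptive allele all but disappears within a couple of dozen generations, whereas a mildly maladaptive one (small s) would linger for much longer.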

That said, outbreeding depression is not an entirely trivial concept and there are always limitations in genetic rescue procedures. For example, it would be considered a bad idea to mix two different species together and make hybrids, since the difference between two species, compared to two populations, can be a lot stronger and not necessarily a very ‘natural’ process (whereas populations can mix and disjoin relatively regularly).

The reality of conservation management

Conservation science is, at its core, a crisis discipline. It exists solely as an emergency response to the rapid extinction of species and loss of biodiversity across the globe. The time spent trying to evaluate the risk of outbreeding depression – instead of immediately developing genetic rescue programs – can cause species to tick over to the afterlife before we get a clear answer. Although careful consideration and analysis is a requirement of any good conservation program, preventing action due to almost paranoid fear is not a luxury endangered species can afford.

Origination of adaptation: the old and the new (genes)

Adaptation is arguably the most critical biological process in the evolution of species. The process of evolution by natural selection is the cornerstone of evolutionary biology (and indeed, all of contemporary biology!) and adaptation remains fundamental to the process. We know that adaptation is based on the idea that some genetic variants are ‘better’ adapted than others, and thus are unequally shared across a population. But where does this genetic variation come from?

The accumulation of new genetic variation

The classic way for new genetic variants to appear is often thought of as mutation: changes in a single base in the DNA are caused by various external processes such as chemical, physical or environmental influences (such as the sci-fi classics like UV rays or toxic chemicals). Although these forms of mutations happen very rarely and certainly don’t have the same effects comic books would lead you to believe, new mutations can occur relatively rapidly depending on the characteristics of the species. However, the most common way for new mutations to occur is actually part of the DNA replication process: copying DNA is not always perfect and even though the relevant proteins essentially run a spellcheck, sometimes the copy is not 100% perfect and new mutations occur.

Adaptation of mutation figure
An example of how adaptation can occur from a new mutation. In this example, we have one gene (TTXTT), with initially only one allele (variant), TTATT. In the second generation (row), a mutation occurs in one individual which creates a new, second allele: TTGTT. This allele is favoured over the TTATT allele, and in the next generation its frequency increases as the alternative allele frequency decreases (the pattern is shown in the frequency values on the right side).

It is important to remember that only mutations that are present in the reproductive cells (sperm and eggs) can be inherited and passed on, and thus be a source for adaptation. Mutations in other tissues of the body, such as within the skin, are not spread across the entire body of the subject and thus aren’t passed on to offspring.

Standing genetic variation

Alternatively, genetic variation might already be present within a species or population. This is more likely if population sizes are large and populations are well connected and interbreeding. We refer to this diverse initial gene pool as ‘standing genetic variation’: that is, the amount of genetic variation within the population or species before the selective pressure requiring adaptation. Standing genetic variation can be thought of as the ‘diversity of choices’ for natural selection to act upon: the variants are readily available, and if a good choice exists it will be favoured by natural selection and become more widespread within the population or species (i.e. evolve).

Adaptation of standing variation figure.jpg
A slightly more complex example of how adaptation can occur from standing variation, this time with two different genes. One codes for fur colour, with two different alleles: GCATA codes for orange fur, and GCGTA codes for grey fur. The other gene codes for ear tufts, with TTCCT coding for tufts and TCCCT coding for no tufts. Natural selection favours both orange fur and tufted ears, and cats with these traits reproduce more frequently than those without (see graph below). These cats probably look familiar.
Graph of standing variation.jpg
The frequency of all four alleles (i.e. either allele for both genes) over the generations in the above figure. Clearly, we can see how adaptation rapidly favours orange fur and tufted ears over grey fur and non-tufted ears with the shifts in frequencies over the different alleles.

We’ve discussed standing genetic variation before on The G-CAT, but often in a different light (and phrasing). For example, when we’ve talked about founder effect: that is, when a population is formed from only a few different individuals which causes it to be very genetically depauperate. In populations under strong founder effect, there is very little standing genetic variation for natural selection to act upon. This has long been an enigma for many pest species: how have they managed to proliferate so widely when they often originate from so few individuals and lack genetic diversity?

Adaptive variation

Adaptation may not require new genetic variants to be generated from mutation. If there are a large number of alleles within the gene pool to start with, then natural selection may favour one of those variants over others and allow adaptation to start immediately. Compared to the rate at which new mutations occur, are potentially corrected for in DNA repair, are potentially erased by genetic drift, and then put under selective pressure, adaptation from standing genetic variation can occur very quickly.

Rate of adaptation figure.jpg
A rough example of the speed of adaptation depending on how the adaptive allele originated: whether it was already present (in the form of standing variation), or whether it was created by a new mutation. As one would expect, there is a significant lag in adaptation in the mutation scenario, based on the time it takes for said adaptive mutation to be created through relatively random processes. Thus, a positively selected allele from standing variation can allow a species to adapt much faster than waiting for a positive mutation to occur.
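The size of that lag can be roughed out with a classic back-of-the-envelope approximation. Assuming a diploid population of effective size Ne, a per-generation mutation rate mu towards the specific beneficial variant, and a small selective advantage s, about 2 × Ne × mu new copies arise each generation and each escapes loss by drift with probability of roughly 2s. The expected wait is then about 1 / (4 × Ne × mu × s) generations. The parameter values below are illustrative guesses, not measurements from any species.

```python
def expected_wait_for_new_mutation(n_effective, mu, s):
    """Approximate generations until a beneficial mutation both arises
    and survives drift (assumes small s and a large population)."""
    new_copies_per_generation = 2.0 * n_effective * mu
    chance_of_surviving_drift = 2.0 * s
    return 1.0 / (new_copies_per_generation * chance_of_surviving_drift)


if __name__ == "__main__":
    # Hypothetical values: mutation rate per site per generation, 1% advantage.
    for ne in (1_000, 10_000, 1_000_000):
        wait = expected_wait_for_new_mutation(ne, mu=1e-8, s=0.01)
        print(ne, round(wait), "generations before adaptation can even begin")
```

An allele already present as standing variation skips this wait entirely, which is the lag shown in the figure above. The wait also shrinks as population size grows, another reason large populations tend to adapt more readily.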

Conserving genetic variation

Given the adaptive potential provided by maintaining a good amount of standing genetic variation, it is imperative to conserve genetic diversity within populations in conservation efforts. This is why we often equate genetic diversity with ‘adaptive potential’ of species, although the exact amount of genetic diversity required for adaptive potential depends on a large number of other factors. Clearly, in some instances species show the ability to adapt to new pressures or novel environments even without a large amount of standing genetic variation.

It is important to remember that standing genetic variation consists of two types: neutral genetic diversity, which is not necessarily under selection at the time, and adaptive genetic diversity, which is directly under selection (although this can be either for or against the given variant). However, currently neutral genetic variants may become adaptive variants in the future if selective pressures change: although those different variants aren’t necessarily beneficial or detrimental at the moment, that may change in the future. Thus, conserving both types of genetic diversity is important for the survivability and longevity of populations under conservation programs.

Other types of adaptation

Although genetic diversity is clearly critically important for adaptive potential, alternative mechanisms for adaptation also exist. One of these relies less on the actual genetic variants being different, but rather how individual genes are used. This can happen in a few different ways, but most commonly this is through alternative splicing: when a gene is being ‘read’ and a protein is produced, different parts of the gene can be used (and in different order) to make a completely different protein.

Alternate splicing figure.jpg
An extreme example of alternate splicing of one gene. We start with a single gene, composed of 5 (A-E) main gene elements (exons). Different environmental pressures (like fire risk, flooding, cold weather or predators, for example) cause the organism to use different combinations of these exons to make different proteins (right side; A-D). Actual alternate splicing is not usually this straightforward (one gene doesn’t conveniently split into four forms depending on the threat), but the process is generally the same.

Believe it or not, we’ve sort of discussed the effects of alternative splicing before. Phenotypic plasticity occurs when a single organism can have very different physiological traits depending on the environment: even though the genes are the same, they are utilised in different ways to make a different body shape. This is how some species can look incredibly different when they live in different places even if they’re genetically very similar. That said, for the vast majority of species maintaining good levels of genetic diversity is critical for the survivability of said species.

It takes (at least) two: coevolution and species interactions

The environmental context of adaptation

We’ve talked many times before about how species evolve in response to some kind of environmental pressure, which favours (or disfavours) certain traits within that species. Over time, this drives changes in the frequencies of species traits and alters the overall average phenotype of that species (sometimes slowly, sometimes rapidly).

While we usually talk about the environment in terms of abiotic conditions such as temperature or climate, biotic factors are equally important: that is, the parts of the environment which are themselves also alive. Because of this, changes in one species can have profound repercussions on other species linked within the ecosystem. Thus, the evolution of one species is intrinsically linked to the evolution of other relevant species within the ecosystem: often, these connected evolutionary pathways battle with one another as each one changes. Let’s take a look at a few different examples of how evolution of one species may impact the evolution of another.

Predator-prey coevolution

One of the most obvious ways the evolution of two different species can interact is in predator and prey relationships. Naturally, prey species evolve to be able to defend themselves from predators in various ways, such as crypsis (e.g. camouflage), toxicity or behavioural changes (such as nocturnalism or group herding). Contrastingly, predators will evolve new and improved methods for detecting and hunting prey, such as enhanced senses, venom and stealth (through soft-padded feet, for example).

There are millions of possible examples of predator-prey coevolution that could be used here, based on the continual drive for one species to get the upper hand over the other. But one that comes to mind is of a creature that I learnt about while on holiday in Scandinavia: the pine marten, and how it affects squirrels.

38542167_10216809232693743_2189871337374220288_o.jpg
This photo is one that I took whilst on a lunch break at a bakery in the Norwegian mountains, of a small critter running among the rocks by the lakeside. Not sure exactly what species it was, I asked the tour director who excitedly told me that it was a pine marten. After doing a bit of research on them (and trying to figure out what the difference between a pine marten, a stoat, and a weasel is), I’ve discovered that it’s actually more likely to be a stoat than a pine marten, based on size and colour. But pine martens are still an intriguing species in their own right (and also found in Norway, so the confusion is understandable).

The pine marten is a species in the mustelid family, along with otters, weasels, stoats, and wolverines. Like many mustelids, they are carnivorous mammals which feed on a variety of different prey items like rodents, small birds and insects. Among the more abundant animals that they prey upon are squirrels: both red squirrels and grey squirrels are potential food for the cute yet savage pine marten.

However, within the distribution of pine martens (across much of Europe), red squirrels are the native species and grey squirrels are invasive, originating from North America. Because of the long-lasting relationship between red squirrels and pine martens, they’ve co-evolved: most notably, by red squirrels changing to a mostly arboreal lifestyle and avoiding the ground as much as possible. Grey squirrels, however, have not had the evolutionary history to learn this lesson and are easy food for a smart pine marten. Thus, in regions where pine martens have been conserved or reintroduced, they are actively controlling the invasive grey squirrel population, which in turn boosts the native red squirrel population by reduction of competition. The coevolutionary link between red squirrels and pine martens is critical for combating the invasive species.

Martens and squirrels figure.jpg
The relationship between pine marten abundance and the abundance of both red (native) and grey (invasive) squirrels. On the left, without pine martens the invasive species runs rampant, outcompeting the native species. However, as pine martens increase in the ecosystem, the grey squirrels are predated on much more than the red squirrels due to their naivety, leading to the ‘natural’ balance on the right.
Martens and squirrels stats.jpg
A diagram of how the abundance of squirrels changes relative to the number of pine martens. The invasive grey squirrels are significantly depleted by pine marten presence, which in turn allows the native red squirrels to increase in population size after being freed from competition.

Host-parasite coevolution

In a similar vein to predator and prey coevolution, pathogenic species and their unfortunate hosts also undergo a sort of ‘arms race’. Parasites must keep evolving new ways to infect and transmit to hosts as the hosts evolve new methods of resisting and avoiding the infecting species. This spiralling battle of evolutionary forces is dubbed the ‘Red Queen hypothesis’, formulated in 1973 by Leigh Van Valen and used to describe many other forms of coevolution. The name comes from Lewis Carroll’s Through the Looking-Glass, and one quote in particular:

‘Now, here, you see, it takes all the running you can do, to keep in the same place’.

The quote references how species must continually adapt and respond to the evolution of other species just to keep existing and prevent extinction. Species that remain static and stop evolving will inevitably go extinct as the world around them changes.

Mimicry

Plenty of other strange and unique mechanisms of coevolution exist within nature. One of them is mimicry, the process by which one species attempts to look like another to protect itself. The most iconic group known for this is butterflies: many species, although they may be evolutionarily very different, share similar colouration patterns and body shapes as mimics. Depending on the nature of the copy, mimicry can be classified into two broad categories. In either case, the initial ‘reference’ species is toxic or unpalatable to predators and uses a type of colour signal to communicate this: think of the bright yellow colours of bees and wasps or the red of ladybirds. Where the two categories change is in the nature of the ‘mimic’ species.

Müllerian mimicry

If the mimic is also toxic or unpalatable, we call this Müllerian mimicry (after Johann Friedrich Theodor Müller). By sharing the same colouration patterns and both being toxic, the two mimicking species boost the potential for the signal to be learnt by predators. If a predator eats either species, it will associate that colour pattern with toxicity and neither species is as likely to be preyed upon in the future. In this sense, it is a cooperative coevolutionary relationship between the two physically similar species.

Mullerian mimicry figure
A (somewhat familiar) example of Müllerian mimicry with two species of butterflies, the monarch and the viceroy. Although this has traditionally been thought of as a textbook case of Batesian mimicry (see below), the toxicity of both species likely makes it a scenario of Müllerian mimicry instead. Since both butterflies share the same pattern and both are toxic, it sends a strong signal to predators such as wasps to avoid them both.

Batesian mimicry

In contrast, the mimic might not actually be toxic or unpalatable, and is simply copying a toxic species. This is referred to as Batesian mimicry (after Henry Walter Bates), and involves a mimic species relying on the association of colour and toxicity to have been learnt by predators through the ‘reference’ species. Although the mimic is not toxic, it is essentially piggy-backing on the hard evolutionary work that has already been done by the actually toxic species. In this case, the coevolutionary relationship is more parasitic as the mimic benefits from the ‘reference’ but the favour is not returned.

Batesian mimicry figure
An example of Batesian mimicry, with hoverflies and wasps. Hoverflies are not at all toxic, and are generally harmless; however, by mimicking the clear bright yellow warning systems of more dangerous species like wasps and bees, they avoid being eaten by predators such as birds.

Coevolution of species and the importance of species interactions

There are countless other species interactions which could drive coevolutionary relationships in nature. These can include various forms of symbiosis, or the response of different species to ecosystem engineers: that is, species that can change and shape the environment around them (such as corals in reef systems). Understanding how a species evolves within its environment thus needs to consider how many other local species are also evolving and responding in their own ways.

Moving right along: dispersal and population structure

The impact of species traits on evolution

Although we often focus on the genetic traits of species in molecular ecology studies, the physiological (or phenotypic) traits are equally important in shaping their evolution. These different traits are not only the result themselves of evolutionary forces but may further drive and shape evolution into the future by changing how an organism interacts with the environment.

There are a massive number of potential traits we could focus on, each of which could have a large number of different (and interacting) impacts on evolution. One that is often considered, and highly relevant for genetic studies, is the influence of dispersal capability.

Dispersal

Dispersal is essentially the process of an organism migrating to a new habitat, to the point that ‘dispersal’ and ‘migration’ are used almost interchangeably. Often, however, we regard dispersal as a migration event that actually has genetic consequences; particularly, if new populations are formed or if organisms move from one population to another. This can differ from straight migration in that animals that migrate might not necessarily breed (and thus pass on genes) into a new region during their migration; thus, evidence of those organisms will not genetically proliferate into the future through offspring.

Naturally, the ability of organisms to disperse is highly variable across the tree of life and reliant on a number of other physiological factors. Marine mammals, for example, can disperse extremely far throughout their lifetimes, whereas some very localised species like some insects may not move very far within their lifetime at all. The movement of organisms directly facilitates the movement of genetic material, and thus has significant impacts on the evolution and genetic diversity of species and populations.

Dispersal vs pop structure
The (simplistic) relationship between dispersal capability and one aspect of population genetics, population structure (measured as Fst). As organisms are more capable of dispersing longer distances (or more frequently), the barriers between populations become weaker.

Highly dispersive species

At one end of the dispersal spectrum, we have highly dispersive species. These can move extremely long distances and thus mix genetic material from a wide range of habitats and places into one mostly-cohesive population. Because of this, highly dispersive species often have strong colonising abilities and can migrate into a range of different habitats by tolerating a wide range of conditions. For example, a single whale might hang around Antarctica for part of the year but move to the tropics during other times. Thus, this single whale must be able to tolerate both ends of the temperature spectrum.

As these individuals occupy large ranges, localised impacts are unlikely to critically affect their full distribution. Individual organisms that are occupying an unpleasant space can easily move to a more favourable habitat (provided that one exists). Furthermore, with a large population (which is more likely with highly dispersive species), genetic drift is substantially weaker and natural selection (generally) has a higher amount of genetic diversity to work with. This is, of course, assuming that dispersal leads to a large overall population, which might not be the case for species that are threatened (such as the cheetah).
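To put a rough number on ‘weaker drift’: a standard population genetics result is that, under drift alone, expected heterozygosity declines by a factor of (1 - 1/(2Ne)) every generation. The sketch below simply applies that formula to two hypothetical effective population sizes. The function name and the values chosen (a starting heterozygosity of 0.5, Ne of 50 versus 5,000, 100 generations) are illustrative assumptions, and mutation, migration and selection are all ignored.

```python
def expected_heterozygosity(h0, ne, generations):
    """Expected heterozygosity after drift alone: H_t = H_0 * (1 - 1/(2*Ne))**t."""
    return h0 * (1 - 1 / (2 * ne)) ** generations


for ne in (50, 5000):  # hypothetical small and large effective population sizes
    h = expected_heterozygosity(h0=0.5, ne=ne, generations=100)
    print(f"Ne = {ne}: expected heterozygosity after 100 generations = {h:.3f}")
```

The small population loses most of its starting diversity over the same time span in which the large population barely changes.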

Highly dispersive animals often fit the “island model” of Wright, where individual subpopulations all have equal proportions of migrants from all other subpopulations. In reality, this is rare (or unreasonable) due to environmental or physiological limitations of species; distance, for example, is not factored into the basic island model.

Island model
The Wright island model of population structure. In this example, different independent populations are labelled in the bold letters, with dispersal pathways demonstrated by the different arrows. In the island model, dispersal is equally likely between all populations (including from B to D in this example, even though there aren’t any arrows showing it). Naturally, this is not overly realistic and so the island model is used mostly as a neutral, base model.
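Part of the reason the island model makes a handy base model is that it gives a simple expected relationship between dispersal and population structure. A commonly used approximation for diploid populations at equilibrium is Fst ≈ 1 / (4Nem + 1), where Ne is the effective size of each subpopulation and m is the proportion of migrants per generation. The sketch below just evaluates that approximation; the function name and the values of Ne and m are hypothetical, and the formula itself assumes neutral markers, many equally sized populations and low migration rates.

```python
def island_model_fst(ne, m):
    """Approximate equilibrium Fst under Wright's island model: 1 / (4*Ne*m + 1)."""
    return 1 / (4 * ne * m + 1)


ne = 1000  # hypothetical effective size of each subpopulation
for m in (0.0001, 0.001, 0.01, 0.1):  # hypothetical migration rates per generation
    fst = island_model_fst(ne, m)
    print(f"m = {m}: Nem = {ne * m:g} migrants per generation, Fst = {fst:.3f}")
```

Under this approximation it is the number of migrants per generation (Nem) that matters rather than the migration rate alone, and population structure drops off steeply once more than a handful of migrants are exchanged.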

Intermediately dispersing species

A large number of species, however, are likely to occupy a more intermediate range of dispersal ability. These species might be able to migrate to neighbouring populations, or across a large proportion of their geographic range, but individuals from one end of the range are still somewhat isolated from individuals at the other end.

This often leads to some effect of population structure; different portions of the geographic range are genetically segregated from one another depending on how much gene flow (i.e. dispersal) occurs between populations. In the simplest scenario, this can lead to what we call isolation-by-distance. Rather than forming totally independent populations, gene flow occurs across short ranges between adjacent ‘populations’. This causes a gradient of genetic differentiation, with one end of the range being clearly genetically different to the other end, with a gradual slope throughout the range. We see this often in marine invertebrates, for example, which might have somewhat localised dispersal but still occupy a large range by following oceanographic currents.

River IDB network
An example of how an isolation-by-distance population network might come about. In this example, we have a series of populations (the different pie charts) spread throughout a river system (that blue thing). The different pie charts represent how much of the genetics of that population matches one end of the river: either the blue end (left) or red end (right). Populations can easily disperse into adjacent populations (the green arrows) but less so to further populations. This leads to gradual changes across the length of the river, with the far ends of the river clearly genetically distinct from the opposite end but relatively similar to neighbouring populations.
River IDB pop structure.jpg
The genetic representation of the above isolation-by-distance example. Each column represents a single population (in the previous figure, a pie chart), with the different colours also representing the relative genetic identity of that population. As you can see, moving from Population 1 to 10 leads to a gradient (decreasing) in blue genes but increase in red genes. The inverse can be said moving in the opposite direction. That said, comparing Population 1 and Population 10 shows that they’re clearly different, although there is no clear cut-off point across the range of other populations.
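One way to see how such a gradient can arise is a rough ‘stepping-stone’ sketch, where each population only swaps genes with its immediate neighbours. The version below is a minimal illustration under some loud assumptions: the function name stepping_stone, the migration rate m and the number of generations are all hypothetical, the two river-end populations are held fixed as pure ‘blue’ and pure ‘red’, and drift and selection are ignored entirely.

```python
def stepping_stone(n_pops=10, m=0.1, generations=2000):
    """Proportion of 'blue' ancestry in each population along a one-dimensional river.

    Each interior population keeps (1 - m) of its own genes every generation and
    receives m/2 from each of its two neighbours. The end populations are held
    fixed (an assumption made purely to keep the sketch simple).
    """
    blue = [0.5] * n_pops
    blue[0], blue[-1] = 1.0, 0.0  # left end all blue, right end all red
    for _ in range(generations):
        updated = blue[:]
        for i in range(1, n_pops - 1):
            updated[i] = (1 - m) * blue[i] + (m / 2) * (blue[i - 1] + blue[i + 1])
        blue = updated
    return blue


for pop, proportion in enumerate(stepping_stone(), start=1):
    print(f"Population {pop}: {proportion:.2f} blue, {1 - proportion:.2f} red")
```

Neighbouring populations end up only slightly different from one another, while the two ends of the river stay completely distinct, much like the bar chart above.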

Medium dispersal capabilities are also often a requirement for forming ‘metapopulations’. In this population arrangement, several semi-independent populations are present within the geographic range of the species. Each of these is subject to its own local environmental pressures and demographic dynamics, and because of this may go locally extinct at any given time. However, dispersal connections between many of these populations lead to recolonization and gene flow patterns, allowing for extinction-dispersal dynamics to sustain the overall metapopulation. Generally, this would require greater levels of dispersal than those typically found within isolation-by-distance species, as individuals must traverse uninhabitable regions relatively frequently to recolonise locally extinct habitat.

Metapopulation structure.jpg
An example of metapopulation dynamics. Different subpopulations (lettered circles) are connected via dispersal (arrows). These different subpopulations can be different sizes and are mostly independent of one another, meaning that a single subpopulation can go locally extinct (the red X) without collapsing the entire system. The different dispersal pathways mean that one population can recolonise extinct habitat and essentially ‘rebirth’ other subpopulations (the green arrows).
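The balance between local extinction and recolonisation can be sketched with the classic Levins metapopulation model, which isn’t described in this post but is a common starting point: the fraction of occupied habitat patches p changes as dp/dt = c·p·(1 - p) - e·p, where c is the colonisation rate and e is the local extinction rate. The code below is a minimal sketch using simple Euler steps; the function name and all parameter values are hypothetical.

```python
def levins_occupancy(p0, colonisation, extinction, steps=5000, dt=0.01):
    """Fraction of occupied patches under dp/dt = c*p*(1 - p) - e*p (Euler steps)."""
    p = p0
    for _ in range(steps):
        p += dt * (colonisation * p * (1 - p) - extinction * p)
    return p


# Recolonisation outpaces local extinction: the metapopulation persists
print(levins_occupancy(p0=0.1, colonisation=0.5, extinction=0.1))

# Local extinction outpaces recolonisation: the whole system collapses
print(levins_occupancy(p0=0.1, colonisation=0.1, extinction=0.5))
```

In this model the metapopulation settles at an occupancy of 1 - e/c when colonisation exceeds extinction, and heads to zero otherwise, which is why dispersal between patches needs to be frequent enough to keep up with local extinctions.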

Weakly dispersing species

At the far opposite end of the dispersal ability spectrum, we have low dispersal species. These are often localised, endemic species that for various reasons might be unable to travel very far at all; for some, they may spend their entire adult life in a sedentary form. The lack of dispersal leads to very strong levels of population structure, and individual populations often accumulate genetic differences relatively quickly due to genetic drift or local adaptation.

Species with low dispersal capabilities are often at risk of local extinction and are unable to easily recolonise these habitats after the event has ended. Their movement is often restricted to rare environmental events such as flooding that carry individuals long distances despite their physiological limitations. Because of this, low dispersal species are often at greater risk of total extinction and extinction vortices than their higher dispersing counterparts.

Accounting for dispersal in population genetics

Incorporating biological and physiological aspects of our study taxa is important for interpreting the evolutionary context of species. Dispersal ability is but one of many characteristics that can influence the ability of species to respond to selective pressures, and the context in which this natural selection occurs. Thus, understanding all aspects of an organism is important in building the full picture of their evolution and future prospects.

An identity crisis: using genomics to determine species identities

This is the fourth (and final) part of the miniseries on the genetics and process of speciation.

In last week’s post, we looked at how we can use genetic tools to understand and study the process of speciation, and particularly the transition from populations to species along the speciation continuum. Following on from that, the question of “how many species do I have?” can be further examined using genetic data. Sometimes, it’s entirely necessary to look at this question using genetics (and genomics).

Cryptic species

A concept that I’ve mentioned briefly previously is that of ‘cryptic species’. These are species which are identifiable by their large genetic differences, but appear the same based on morphological, behavioural or ecological characteristics. Cryptic species often arise when a single species has become fragmented into several different populations which have been isolated for a long time from one another. Although they may diverge genetically, this doesn’t necessarily always translate to changes in their morphology, ecology or behaviour, particularly if these are strongly selected for under similar environmental conditions. Thus, we need to use genetic methods to be able to detect and understand these species, as well as later classify and describe them.

Cryptic species fish
An example of cryptic species. All four fish in this figure are morphologically identical to one another, but they differ in their underlying genetic variation (indicated by the different colours of DNA). Thus, from looking at these fish alone we would not perceive any differences, but their genetic make-up might suggest that there is more than one species…
Cryptic species heatmap example
The level of genetic differentiation between the fish in the above example. The phylogenies on the left and top of the figure demonstrate the evolutionary relationships of these four fish. The matrix shows a heatmap of the level of differences between different pairwise comparisons of all four fish: red squares indicate zero genetic differences (such as when comparing a fish to itself; the middle diagonal) whilst yellow squares indicate increasingly higher levels of genetic differentiation (with bright yellow = all differences). By comparing the different fish together, we can see that Fish 1 and 2, and Fish 3 and 4, are relatively genetically similar to one another (red-deep orange). However, other comparisons show high levels of genetic differences (e.g. 1 vs 3 and 1 vs 4). Based on this information, we might suggest that Fish 1 and 2 belong to one cryptic species (A) and Fish 3 and 4 belong to a second cryptic species (B).

Genetic tools to study species: the ‘Barcode of Life’

A classically employed method that uses DNA to detect and determine species is referred to as the ‘Barcode of Life’. This uses a very specific fragment of DNA from the mitochondria of the cell: the cytochrome c oxidase I gene, CO1. The barcode region is a 648 base pair fragment of this gene and is found pretty well universally: this, and the fact that CO1 evolves slowly enough to be compared across very different animals (while still differing between species), make it an ideal candidate for easily testing the identity of new species. Additionally, mitochondrial DNA tends to be a bit more resilient than its nuclear counterpart; thus, small or degraded tissue samples can still be sequenced for CO1, making it amenable to wildlife forensics cases. Generally, two sequences will be considered as belonging to different species if they are a certain percentage different from one another.

Annotated mitogenome
The full (annotated) mitochondrial genome of humans, with the different genes within it labelled. The CO1 gene is labelled with the red arrow (sometimes also referred to as COX1) whilst blue arrows point to other genes often used in phylogenetic or taxonomic studies, depending on the group or species in question.

Despite the apparent benefits of CO1, there are of course a few drawbacks. Most of these revolve around the mitochondrial genome itself. Firstly, because mitochondria are passed on from mother to offspring (and not at all from the father), the mitochondrial genome reflects the genetic history of only one sex of the species. Secondly, the actual cut-off for species using CO1 barcoding is highly contentious and possibly not as universal as previously suggested. Levels of sequence divergence of CO1 between species that have been previously determined to be separate (through other means) have varied anywhere from 2% to 12%. The actual translation between CO1 sequence divergence and species identity is not all that clear.
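To make the percentage-difference rule (and the cut-off problem) a bit more concrete, here is a minimal sketch that counts the proportion of differing sites between two aligned sequences and compares it to a threshold. The function names are hypothetical and the two 20-base sequences are made up for illustration; they are not real CO1 data, and real barcoding analyses use the full barcode region and more sophisticated distance measures.

```python
def percent_divergence(seq_a, seq_b):
    """Percentage of aligned sites that differ between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100 * differences / len(seq_a)


def same_species(seq_a, seq_b, threshold_percent):
    """Apply a simple barcode-style cut-off (the threshold itself is contentious)."""
    return percent_divergence(seq_a, seq_b) < threshold_percent


# Toy sequences (made up for illustration): one difference across 20 sites
fish_1 = "ATGGCCTTAGCATTCGGAAC"
fish_2 = "ATGGCTTTAGCATTCGGAAC"

print("Divergence:", percent_divergence(fish_1, fish_2), "%")
for threshold in (2, 12):
    print(f"{threshold}% cut-off: same species?", same_species(fish_1, fish_2, threshold))
```

The same pair of sequences is split into two species under a 2% cut-off but lumped into one under a 12% cut-off, which is exactly why the choice of threshold matters so much.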

Gene tree – species tree incongruences

One particularly confounding aspect of defining species based on a single gene, and with using phylogenetic-based methods, is that the history of that gene might not actually be reflective of the history of the species. This can be a little confusing to think about but essentially leads to what we call “gene tree – species tree incongruence”. Different evolutionary events cause different effects on the underlying genetic diversity of a species (or group of species): while these may be predictable from the genetic sequence, different parts of the genome might not be equally affected by the same exact process.

A classic example of this is hybridisation. If we have two initial species, which then hybridise with one another, we expect our resultant hybrids to be approximately made of 50% Species A DNA and 50% Species B DNA (if this is the first generation of hybrids formed; it gets a little more complicated further down the track). This means that, within the DNA sequence of the hybrid, 50% of it will reflect the history of Species A and the other 50% will reflect the history of Species B, which could differ dramatically. If we randomly sample a single gene in the hybrid, we will have no idea if that gene belongs to the genealogy of Species A or Species B, and thus we might make incorrect inferences about the history of the hybrid species.

Gene tree incongruence figure
A diagram of gene tree – species tree incongruence. Each individual coloured line represents a single gene as we trace it back through time; these are mostly bound within the limits of species divergences (the black borders). For many genes (such as the blue ones), the genes resemble the pattern of species divergences very well, albeit with some minor differences in how long ago the splits happened (at the top of the branches). However, the red genes contrast with this pattern, with clear movement across species (from and into B): this represents genes that have been transferred by hybridisation. The green line represents a gene affected by what we call incomplete lineage sorting; that is, we cannot trace it back far enough to determine exactly how/when it initially diverged and so there are still two separate green lines at the very top of the figure. You can think of each line as a separate phylogenetic tree, with the overarching species tree as the average pattern of all of the genes.

There are a number of other processes that could similarly alter our interpretations of evolutionary history based on analysing the genetic make-up of the species. The best way to handle this is simply to sample more genes: this way, the effect of variation of evolutionary history in individual genes is likely to be overpowered by the average over the entire gene pool. We interpret this as a set of individual gene trees contained within a species tree: although one gene might vary from another, the overall picture is clearer when considering all genes together.
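A very rough way to see why sampling more genes helps is to treat each gene as an independent ‘vote’. The sketch below assumes every gene tree matches the species tree with the same probability (p_concordant, a hypothetical value) and asks how often a simple majority of the sampled genes gets it right. This is only an illustration of the averaging idea: real genes are not fully independent, and actual species tree methods are far more sophisticated than a majority vote.

```python
from math import comb


def prob_majority_concordant(n_genes, p_concordant):
    """Probability that more than half of n independent genes match the species tree."""
    needed = n_genes // 2 + 1
    return sum(
        comb(n_genes, k) * p_concordant**k * (1 - p_concordant) ** (n_genes - k)
        for k in range(needed, n_genes + 1)
    )


# Hypothetical scenario: each gene tree matches the species tree 70% of the time
for n_genes in (1, 11, 101):
    probability = prob_majority_concordant(n_genes, 0.7)
    print(f"{n_genes} genes: majority matches the species tree with probability {probability:.4f}")
```

With a single gene we are right only as often as that one gene happens to be concordant, whereas with many genes the odd misleading gene tree gets drowned out.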

Species delimitation

In earlier posts on The G-CAT, I’ve discussed the biogeographical patterns unveiled by my Honours research. Another key component of that paper involved using statistical modelling to determine whether cryptic species were present within the pygmy perches. I didn’t exactly elaborate on that in that section (mostly for simplicity), but this type of analysis is referred to as ‘species delimitation’. To try and simplify complicated analyses, species delimitation methods evaluate possible numbers and combinations of species within a particular dataset and provide a statistical value for which configuration of species is most supported. One program that employs species delimitation is Bayesian Phylogenetics and Phylogeography (BPP): to do this, it uses a plethora of information from the genetics of the individuals within the dataset. These include how long ago the different populations/species separated; which populations/species are most related to one another; and a pre-set maximum number of species (BPP will try to combine these in estimations, but not split them due to computational constraints). This all sounds very complex (and to a degree it is), but this allows the program to give you a statistical value for what is a species and what isn’t based on the genetics and statistical modelling.

Vittata cryptic species
The cryptic species of pygmy perches identified within my research paper. This represents part of the main phylogenetic tree result, with the estimates of divergence times from other analyses included. The pictures indicate the physiology of the different ‘species’: Nannoperca pygmaea is morphologically different to the other species of Nannoperca vittata. Species delimitation analysis suggested all four of these were genetically independent species; at the very least, it is clear that there must be at least 2 species of Nannoperca vittata since one is more closely related to N. pygmaea than to the other N. vittata species. Photo credits: N. vittata = Chris Lamin; N. pygmaea = David Morgan.

The end result of a BPP run is usually reported as a species tree (i.e. a phylogenetic tree describing species relationships) and statistical support for the delimitation of species (0-1 for each species). Because of the way the statistical component of BPP works, it has been found to give extremely high support for species identities. This has been criticised as BPP can, at times, provide high statistical support for genetically isolated lineages (i.e. divergent populations) which are not actually species.

Improving species identities with integrative taxonomy

Due to this particular drawback, and the often complex nature of species identity, using solely genetic information such as species delimitation to define species is extremely rare. Instead, we use a combination of different analytical techniques which can include genetic-based evaluations to more robustly assign and describe species. In my own paper example, we suggested that up to three ‘species’ of N. vittata that were determined as cryptic species by BPP could potentially exist pending further analyses. We did not describe or name any of the species, as this would require a deeper delve into the exact nature and identity of these species.

As genetic data and analytical techniques improve into the future, it seems likely that our ability to detect and determine species boundaries will also improve. However, the additional support provided by alternative aspects such as ecology, behaviour and morphology will undoubtedly be useful in the progress of taxonomy.