Unravelling the evolutionary history of organisms – one of the main goals of phylogenetic research – remains a challenging prospect due to a number of theoretical and analytical aspects. Particularly, trying to reconstruct evolutionary patterns based on current genetic data (the most common way phylogenetic trees are estimated) is prone to the erroneous influence of some secondary factors. One of these is referred to as ‘incomplete lineage sorting’, which can have a major effect on how phylogenetic relationships are estimated and the statistical confidence we may have around these patterns. Today, we’re going to take a look at incomplete lineage sorting (shortened to ILS for brevity herein) using a game-based analogy – a Pachinko machine. Or, if you’d rather, the same general analogy also works for those creepy clown carnival games, but I prefer the less frightening alternative.
To expand on this, we’re going to look at a few different models of how the spatial distribution of populations influences their divergence, and particularly how these factor into different processes of speciation.
What comes first, ecological or genetic divergence?
The order in which these two processes occur has been debated for some time, and different aspects of species and the environment can influence how (or if) these processes occur.
Different spatial models of speciation
Generally, when we consider the spatial models for speciation we divide these into distinct categories based on the physical distance of populations from one another. Although there is naturally a lot of grey area (as there is with almost everything in biological science), these broad concepts help us to define and determine how speciation is occurring in the wild.
A step closer in bringing populations geographically together in speciation are “parapatry” and “peripatry”. Parapatric populations are geographically close but not overlapping: generally, the edges of their distributions touch without overlapping one another. A good analogy would be countries that share a common border. Parapatry can occur when a species is distributed across a broad area, but some form of narrow barrier cleaves the distribution in two: this can be the case across particular environmental gradients where the two extremes are preferred over the middle.
This can be tricky to visualise, so let’s invent an example. Say we have a tropical island, which is occupied by one bird species. This bird prefers to eat the large native fruit of the island, although there is another fruit tree which produces smaller fruits. However, there’s only so much space and eventually there are too many birds for the number of large fruit trees available. So, some birds are pushed to eat the smaller fruit, and adapt to a different diet, changing physiology over time to better acquire their new food and obtain nutrients. This shift in ecological niche causes the two populations to become genetically separated, as small-fruit-eating birds interact more with other small-fruit-eating birds than with large-fruit-eating birds. Over time, these divergences in genetics and ecology cause the two populations to form reproductively isolated species despite occupying the same island.
As you can see, the processes and context driving speciation are complex to unravel and many factors play a role in the transition from population to species. Understanding the factors that drive the formation of new species is critical to understanding not just how evolution works, but also how new diversity is generated and maintained across the globe (and how that might change in the future).
One distinction we need to make early on is the difference between allele frequency and allele identity. In these analyses, we are often working with the same alleles (i.e. particular variants) across our populations; it’s just that each population may carry these alleles at different frequencies. For example, one population may have an allele (let’s call it Allele A) very rarely (maybe only 10% of individuals in that population possess it) but in another population it’s very common and perhaps 80% of individuals have it. This is a different level of differentiation from comparing how different alleles mutate (as in the coalescent) or how these mutations accumulate over time (as in many phylogenetic-based analyses).
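To make the frequency idea concrete, here’s a minimal Python sketch, assuming diploid individuals and a single biallelic locus where each genotype is coded as the number of copies of Allele A an individual carries (0, 1 or 2). The genotype lists are made-up data, and the frequency is counted over gene copies rather than over individuals, which is the usual convention.

```python
from typing import Sequence


def allele_frequency(genotypes: Sequence[int]) -> float:
    """Frequency of Allele A among all gene copies in a diploid sample.

    Each genotype is the number of copies of Allele A an individual carries
    (0, 1 or 2), so the sample contains 2 * len(genotypes) gene copies.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2 copies of the allele")
    return sum(genotypes) / (2 * len(genotypes))


# Hypothetical samples of ten individuals from two populations.
pop_1 = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]  # Allele A is rare here
pop_2 = [2, 2, 1, 2, 1, 2, 1, 1, 2, 2]  # Allele A is common here

print(allele_frequency(pop_1))  # 0.1
print(allele_frequency(pop_2))  # 0.8
```

Both populations carry the same allele; only its frequency differs.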
Fixed differences are sometimes used as a type of diagnostic trait for species. This means that each ‘species’ has genetic variants that are not shared at all with its closest relative species, and that there is no diversity at those loci within each species (often attributed to strong selection, although genetic drift can also fix alleles, particularly in small or long-isolated populations). Often, fixed differences are considered a level above populations that differ by allele frequency only, as these alleles are considered ‘diagnostic’ for each species.
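Building on allele frequencies, a fixed difference is simply a locus where one group has an allele at frequency 1 and the other has it at frequency 0. The sketch below assumes we already have per-locus frequencies of the same allele in two putative species; the numbers are hypothetical.

```python
from typing import List, Sequence


def fixed_differences(freqs_a: Sequence[float], freqs_b: Sequence[float]) -> List[int]:
    """Indices of loci where the two groups are fixed for different alleles.

    freqs_a[i] and freqs_b[i] are the frequencies of the same allele at locus i
    in each group. A fixed difference means one group has the allele at
    frequency 1 and the other at frequency 0, so no allele is shared there.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both groups must be scored at the same loci")
    return [
        i
        for i, (pa, pb) in enumerate(zip(freqs_a, freqs_b))
        if (pa == 1.0 and pb == 0.0) or (pa == 0.0 and pb == 1.0)
    ]


# Hypothetical allele frequencies at four loci in two putative species.
species_1 = [1.0, 0.6, 0.0, 1.0]
species_2 = [0.0, 0.2, 1.0, 1.0]

print(fixed_differences(species_1, species_2))  # [0, 2]
```

Locus 1 differs only in frequency, and locus 3 is fixed for the same allele in both species, so neither counts as diagnostic.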
To distinguish between the two, we often use the overall frequency of alleles in a population as a basis for determining how likely it is that two individuals share an allele by random chance. If alleles that are relatively rare in the overall population are shared by two individuals, we expect this similarity to be due to family structure rather than population history. By factoring this into our relatedness estimates, we can get a more accurate picture of how likely two individuals are to be related using genetic information.
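A rough way to see why rare shared alleles carry more weight is to ask how often two unrelated individuals would both carry an allele purely by chance. The sketch below assumes Hardy-Weinberg proportions and unrelated individuals sampled independently; it only illustrates the reasoning and is not a real relatedness estimator (published estimators combine this kind of frequency weighting across many loci).

```python
def prob_carrier(p: float) -> float:
    """Probability that one individual carries at least one copy of an allele
    at population frequency p, assuming Hardy-Weinberg proportions."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    return 1.0 - (1.0 - p) ** 2


def prob_both_carry_by_chance(p: float) -> float:
    """Probability that two unrelated, independently sampled individuals
    both carry the allele."""
    return prob_carrier(p) ** 2


common = prob_both_carry_by_chance(0.5)   # roughly 0.56
rare = prob_both_carry_by_chance(0.05)    # roughly 0.0095
print(f"common allele: {common:.4f}, rare allele: {rare:.4f}")
```

Two unrelated individuals are about 60 times less likely to share the rare allele by chance than the common one, which is why a shared rare allele is stronger evidence of close kinship.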
The wild world of allele frequency
Despite appearances, this is just a brief foray into the many applications of allele frequency data in evolution, ecology and conservation studies. There is a plethora of different programs and methods that can use this information to address a variety of scientific questions and refine our investigations.
Meaning: Cinis: from [ash] in Latin; descendens: from [descending] in Latin.
Translation: descending from the ash; describes hunting behaviour in ash mountains of Vvardenfell.
Kingdom Animalia; Phylum Chordata; Class Aves; Subclass Archaeornithes; Family Vvardidae; Genus Cinis; Species descendens
Least Concern [circa 3E 427]
Threatened [circa 4E 433]
The cliff racer was once widespread throughout the north-eastern region of Tamriel, occupying regions from the island of Vvardenfell to mainland Morrowind and Solstheim. Despite its name, the cliff racer is found across nearly all geographic regions of Vvardenfell, although the species reaches its greatest densities in the rocky interior region of Stonefalls.
Following a purge of the species as part of pest control management, the cliff racer was effectively exterminated from parts of its range, including local extinction on the island of Solstheim. Since the cull, the cliff racer has been much less abundant throughout its range, although it is still distributed throughout much of Vvardenfell and mainland Morrowind.
Although, as the name suggests, the cliff racer prefers rocky outcroppings and mountainous regions in which to build its nest, the species is frequently seen in the lowland swamp and plains regions of Morrowind.
Behaviour and ecology
The cliff racer is a highly aggressive ambush predator, using height and range to descend on unsuspecting victims and lashing at them with its long, sharp tail. Although it prefers to prey on small rodents and insects (such as kwama), the cliff racer has been known to attack much larger beasts such as agouti and guar if provoked or desperate. The highly territorial nature of the cliff racer means that it often attacks travellers, even those who pose no immediate threat or have done nothing to provoke the animal.
Despite the territoriality of cliff racers, large flocks of them can often be found in the higher altitude regions of Vvardenfell, perhaps facilitated by an abundance of food (reducing competition) or communal breeding grounds. Attempts by researchers to study these aggregations have been limited due to constant attacks and damage to equipment by the flock.
Following the control measures, cliff racer populations declined severely; however, given that most populations survived, it does not appear that this bottleneck has severely impacted the long-term persistence of the species. The extirpation of the Solstheim population likely removed a unique evolutionarily significant unit (ESU) from the species, given the relative isolation of the island. Whether Solstheim will be recolonised in time by Vvardenfell cliff racers is unknown, although the return of any cliff racers to the island would likely be met with strong opposition from the local peoples.
The broad wings, dorsal sail and long tail allow the cliff racer to travel large distances in the air, serving them well in hunting behaviour. The drawback of this is that, if hunting during the middle hours of the day, the cliff racer leaves an imposing shadow on the ground and silhouette in the sky, often alerting aware prey to their presence. That said, the speed of descent and disorienting cry of the animal often startles prey long enough for the cliff racer to attack.
The plumes of the cliff racer are a highly sought-after commodity among local peoples, used in the creation of garments and household items. Whether these plumes serve any adaptive purpose (such as sexual selection through mate signalling) is unknown, given the difficulties with studying wild cliff racer behaviour.
Although suffering from a strong population bottleneck after the purge, the cliff racer is still relatively abundant across much of its range and maintains a somewhat stable population size. Management and population control of the cliff racer is necessary across the full distribution of the species to prevent a strong recovery and to maintain public safety and ecosystem balance. Breeding or rescuing cliff racers is strictly forbidden and the species has been widely declared a ‘native pest’, despite the somewhat oxymoronic nature of the phrase.
There are a massive number of potential traits we could focus on, each of which could have a large number of different (and interacting) impacts on evolution. One that is often considered, and highly relevant for genetic studies, is the influence of dispersal capability.
Dispersal and migration are often used almost interchangeably, since both describe an organism moving to a new habitat. Often, however, we regard dispersal as a movement that actually has genetic consequences; particularly, if new populations are formed or if organisms move from one population to another and breed there. This differs from straight migration in that migrating animals might not breed (and thus pass on genes) in a new region during their migration; thus, evidence of those organisms will not genetically proliferate into the future through offspring.
As highly dispersive species occupy large ranges, localised impacts are unlikely to critically affect their full distribution. Individual organisms that are occupying an unpleasant space can easily move to a more favourable habitat (provided that one exists). Furthermore, with a large population (which is more likely with highly dispersive species), genetic drift is substantially weaker and natural selection (generally) has more genetic diversity to work with. This is, of course, assuming that dispersal leads to a large overall population, which might not be the case for species with small or declining populations (such as the cheetah).
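The link between population size and the strength of drift can be put into numbers with the textbook expectation for an idealised (Wright-Fisher) population, in which expected heterozygosity shrinks each generation by a factor of (1 - 1/(2N)), where N is the effective population size. A minimal sketch, assuming no mutation, migration or selection; the population sizes are hypothetical.

```python
def expected_heterozygosity(h0: float, n: int, generations: int) -> float:
    """Expected heterozygosity after `generations` of pure drift in an ideal
    population of effective size n: H_t = H_0 * (1 - 1 / (2n)) ** t."""
    if n < 1:
        raise ValueError("population size must be at least 1")
    return h0 * (1.0 - 1.0 / (2 * n)) ** generations


for n in (10, 1000):
    h = expected_heterozygosity(0.5, n, generations=50)
    print(f"N = {n:>4}: expected heterozygosity after 50 generations = {h:.3f}")
```

Under these assumptions, the population of 10 keeps less than a tenth of its starting heterozygosity after 50 generations, while the population of 1000 keeps about 97% of it.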
A large number of species, however, are likely to occupy a more intermediate range of dispersal ability. These species might be able to migrate to neighbouring populations, or across a large proportion of their geographic range, but individuals from one end of the range are still somewhat isolated from individuals at the other end.
Species with low dispersal capabilities are often at risk of local extinction and are unable to easily recolonise habitats after a local extinction event. Their movement is often restricted to rare environmental events, such as floods, that carry individuals long distances despite their physiological limitations. Because of this, low-dispersal species are often at greater risk of total extinction and extinction vortices than their higher-dispersing counterparts.
Accounting for dispersal in population genetics
Incorporating biological and physiological aspects of our study taxa is important for interpreting the evolutionary context of species. Dispersal ability is but one of many characteristics that can influence the ability of species to respond to selective pressures, and the context in which this natural selection occurs. Thus, understanding all aspects of an organism is important in building the full picture of their evolution and future prospects.
One of the most fundamental aspects of natural selection and evolution is, of course, the underlying genetic basis of the physical traits being selected. Most commonly, this involves trying to understand how the distribution and frequencies of particular genetic variants (alleles) change in nature and what forces of natural selection are shaping them. Remember that natural selection acts directly on the physical characteristics of species; if these characteristics are genetically determined (which many are), then we can observe the flow-on effects on the genetic diversity of the target species.
Although we might expect natural selection to be a fairly predictable force, there are a myriad of ways it can shape, reduce or maintain the genetic diversity and identity of populations and species. In the following examples, we’re going to assume for simplicity that each trait is coded for by a single gene with two different alleles. Thus, one allele = one version of the trait (so the two terms can be used interchangeably). With that in mind, let’s take a look at the three broad types of changes we observe in nature.
Arguably the most traditional perspective of natural selection is referred to as ‘directional selection’. In this case, natural selection causes one allele to be favoured over another, which causes it to increase dramatically in frequency compared to the alternative allele. The reverse effect (natural selection pushing against a maladaptive allele) is still directional selection, just acting in the opposite direction: the allele under negative selection decreases in frequency, shifting the population towards the alternative allele.
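Directional selection can be sketched with the standard one-locus, two-allele selection recursion, where allele A’s frequency after selection is (p^2 w_AA + p q w_Aa) divided by the population’s mean fitness. The fitness values below are hypothetical, and the model assumes a large, randomly mating diploid population, so drift plays no part.

```python
from typing import List, Tuple

# Relative fitnesses of the three genotypes: (AA, Aa, aa).
Fitness = Tuple[float, float, float]


def next_freq(p: float, w: Fitness) -> float:
    """Frequency of allele A after one generation of selection in a large,
    randomly mating diploid population (no drift, mutation or migration)."""
    w_AA, w_Aa, w_aa = w
    q = 1.0 - p
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / mean_w


def trajectory(p0: float, w: Fitness, generations: int) -> List[float]:
    """Allele A's frequency in every generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], w))
    return freqs


# Hypothetical fitnesses: each copy of A raises relative fitness by 0.05.
directional = (1.0, 0.95, 0.90)
freqs = trajectory(0.01, directional, generations=500)
print(f"A: {freqs[0]:.2f} -> {freqs[-1]:.4f}")
# The same run viewed as negative selection against allele a.
print(f"a: {1 - freqs[0]:.2f} -> {1 - freqs[-1]:.4f}")
```

Starting rare, A climbs steadily towards fixation, and the mirror image is a being pushed out: the two ‘directions’ are one process.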
Natural selection doesn’t always push allele frequencies in different directions, however, and sometimes maintains the diversity of alleles in the population. This is what happens in ‘balancing selection’ (sometimes loosely lumped together with ‘stabilising selection’). In this case, natural selection favours non-extreme allele frequencies, pushing the distribution of allele frequencies towards the centre. This may happen if deviations from the original gene, regardless of the specific change, can have strongly negative effects on the fitness of an organism, or in genes that are most fit when there is a decent amount of variation within them in the population (such as the MHC region, which contributes to immune response). There are a couple of other reasons balancing selection may occur, though.
One example is known as ‘heterozygote advantage’. This is when an organism with two different alleles of a particular gene has greater fitness than an organism with two identical copies of either allele. A seemingly bizarre example of heterozygote advantage is related to sickle cell anaemia in African people. Sickle cell anaemia is a serious genetic disorder caused by a recessive allele of a haemoglobin gene; thus, a person has to carry two copies of the disease allele to show severe symptoms. While this allele would ordinarily be strongly selected against in many populations, it is maintained in some African populations by the presence of malaria. This seems counterintuitive; why does the presence of one disease maintain another?
Well, it turns out that the malaria parasite is not very good at infecting sickle cells; there are a few suggested mechanisms for why, but no clear single answer. Naturally, suffering from either sickle cell anaemia or malaria is unlikely to confer fitness benefits. In this circumstance, natural selection actually favours carrying one copy of the sickle cell allele; while being a carrier isn’t ordinarily as healthy as having no sickle cell alleles, it does make the person somewhat resistant to malaria. Thus, in populations where there is a selective pressure from malaria, there is a heterozygote advantage for the sickle cell allele. For African populations without likely exposure to malaria, the sickle cell allele is strongly selected against and less prevalent.
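The sickle cell case can be sketched with the standard single-locus model of heterozygote advantage. Assuming relative fitnesses of 1 - s for AA (malaria-susceptible), 1 for carriers (AS) and 1 - t for SS (sickle cell disease), theory predicts a stable equilibrium frequency of s / (s + t) for the S allele. The values of s and t are hypothetical, chosen only for illustration, and the model ignores drift and any health cost to carriers.

```python
from typing import Tuple

# Relative fitnesses of the three genotypes: (AA, AS, SS).
Fitness = Tuple[float, float, float]


def next_freq_s(q: float, w: Fitness) -> float:
    """Frequency of the sickle allele S after one generation of selection
    in a large, randomly mating population."""
    w_AA, w_AS, w_SS = w
    p = 1.0 - q
    mean_w = p * p * w_AA + 2 * p * q * w_AS + q * q * w_SS
    return (q * q * w_SS + p * q * w_AS) / mean_w


def run(q0: float, w: Fitness, generations: int) -> float:
    """Frequency of S after the given number of generations."""
    q = q0
    for _ in range(generations):
        q = next_freq_s(q, w)
    return q


s, t = 0.1, 0.8  # hypothetical costs: malaria for AA, sickle cell disease for SS
with_malaria = (1.0 - s, 1.0, 1.0 - t)
without_malaria = (1.0, 1.0, 1.0 - t)

print(run(0.01, with_malaria, 1000), s / (s + t))  # simulation vs theory
print(run(0.10, without_malaria, 1000))            # S declines towards 0
```

With malaria present, S rises from rarity and settles at the predicted equilibrium; remove malaria and the same allele is steadily purged.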
Another form of balancing selection is called ‘frequency-dependent selection’ (more precisely, negative frequency-dependent selection), where the fitness of an allele decreases as its frequency increases. Thus, once an allele has become common due to selection, its fitness is reduced and selection starts to favour the alternative allele (which is at much lower frequency). The constant back-and-forth tipping of the selective scales results in both alleles being maintained at an equilibrium.
This can happen in a number of different ways, but often the rarer trait/allele is fitter precisely because of its rarity. For example, if one allele allows an individual to use a new food source, it will be strongly favoured due to the lack of competition from others. However, as that allele accumulates within the population and more individuals start to feed on that food source, the lack of ‘uniqueness’ means that it’s no longer particularly better than the original food source. A balance between the two food sources (and thus alleles) will be maintained over time, as any shift towards one makes the other fitter, and natural selection compensates.
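A haploid model is the simplest way to see this balancing effect. The sketch assumes, purely for illustration, that each allele’s fitness falls linearly with its own frequency (fitness = 1 - k * frequency, where k is a hypothetical strength parameter). Whichever allele starts rare, the population settles where both alleles have equal fitness, which in this symmetric model is 0.5.

```python
def next_freq(p: float, k: float) -> float:
    """One generation of negative frequency-dependent selection in a haploid
    model where each allele's fitness is 1 - k * (its own frequency)."""
    q = 1.0 - p
    w_A = 1.0 - k * p  # allele A: fitness falls as A becomes common
    w_a = 1.0 - k * q  # alternative allele a: same rule
    return p * w_A / (p * w_A + q * w_a)


def run(p0: float, k: float, generations: int) -> float:
    """Frequency of allele A after the given number of generations."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, k)
    return p


k = 0.5  # hypothetical strength of frequency dependence (0 < k < 1)
for start in (0.05, 0.95):
    print(f"start {start:.2f} -> {run(start, k, 200):.4f}")
```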
A third category of selection (although less frequently mentioned) is known as ‘disruptive selection’, which is essentially the direct opposite of balancing selection. In this case, both extremes of allele frequency are favoured (e.g. a frequency of 1 for one allele, or 1 for the other) but intermediate frequencies are not. This can be difficult to untangle in natural populations, since it could technically be attributed to two separate cases of directional selection: each allele of the same gene is directionally selected for, but in different populations and in opposite directions, so that the overall pattern shows very few intermediates.
In direct contrast to balancing selection, disruptive selection can often be a case of heterozygote disadvantage (although it’s rarely called that). In these examples, individuals that are not genetically committed to one end of the frequency spectrum or the other may be maladapted, since they don’t fit in anywhere. An example would be a species that occupies both a desert and a forested area, with little grassland-type habitat in the middle. For the relevant traits, strongly desert-adapted genes would be selected for in the desert and strongly forest-adapted genes would be selected for in the forest. However, the lack of a gradient between the two habitats means that half-and-half individuals are poorly adapted to both the desert and the forest. A case of jack-of-all-trades, master of none.
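The heterozygote-disadvantage form of disruptive selection can be sketched within a single population. Assuming, hypothetically, that heterozygotes have relative fitness 1 - h while both homozygotes have fitness 1, there is an equilibrium at a frequency of 0.5, but it is unstable: nudge the frequency slightly either way and selection carries it to fixation of whichever allele is ahead. This single-population model is a simplification of the two-habitat desert/forest example.

```python
def next_freq(p: float, h: float) -> float:
    """One generation of selection against heterozygotes: relative fitness is
    1 for AA and aa and 1 - h for Aa (large, randomly mating population)."""
    q = 1.0 - p
    mean_w = p * p + 2 * p * q * (1.0 - h) + q * q
    return (p * p + p * q * (1.0 - h)) / mean_w


def run(p0: float, h: float, generations: int) -> float:
    """Frequency of allele A after the given number of generations."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, h)
    return p


h = 0.2  # hypothetical fitness cost of being heterozygous
print(f"start 0.55 -> {run(0.55, h, 300):.4f}")  # A heads to fixation
print(f"start 0.45 -> {run(0.45, h, 300):.4f}")  # A heads to loss
```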
Direction of selection
Although it would be convenient if natural selection were entirely predictable, it often catches us by surprise in how it acts on and changes species and populations in the wild. Careful analysis and understanding of the different processes and outcomes of adaptation can feed our overall understanding of evolution, and at least point our predictions in the right direction.
Adaptation and evolution by natural selection remains one of the most significant research questions in many disciplines of biology, and this is undoubtedly true for molecular ecology. While traditional evolutionary studies have been based on the physiological aspects of organisms and how this relates to their evolution, such as how these traits improve their fitness, the genetic component of adaptation is still somewhat elusive for many species and traits.
Hunting for adaptive genes in the genome
We’ve previously looked at the two main categories of genetic variation: neutral and adaptive. We’ve focused predominantly on the neutral components of the genome and the questions they can answer about demographic history, geographic influences and the effect of genetic drift, but neutral markers cannot tell us (directly) about the process of adaptation and natural selective changes in species. To look at this area, we’d have to focus on adaptive variation instead; that is, genes (or other related genetic markers) which directly influence the ability of a species to adapt and evolve. These are directly under natural selection, either positively (‘selected for’) or negatively (‘selected against’).
Given how complex organisms, the environment and genomes can be, it can be difficult to determine exactly what is a real (i.e. strong) selective pressure, how this is influenced by the physical characteristics of the organism (the ‘phenotype’) and which genes are fundamental to the process (the ‘genotype’). Even determining the relevant genes can be difficult; how do we find the needle-like adaptive genes in a genomic haystack?
There is a variety of different methods we can use to find adaptive genetic variation, each with particular strengths and drawbacks. Many of these are based on tests of allele frequencies rather than on the exact genetic changes themselves; adaptation more often works by favouring one variant over another than by completely removing the less-adaptive variant (which would leave the favoured variant at ‘fixation’). So measuring the frequencies of different alleles is a central component of many analyses.
Generally, FST reflects neutral genetic structure: it gives a background of how different two populations are, on average. However, if we know what the average amount of genetic differentiation should be for a neutral DNA marker, then we would predict that adaptive markers depart significantly from it. This is because a gene under selection should be pushed more directly towards or away from one variant (allele) than another, and much more strongly than neutral variation would predict. Thus, alleles whose frequencies differ between populations far more (or less) than the average pattern might be under selection. This is the basis of the FST outlier test: by comparing two or more populations (using FST) and looking at the distribution of differentiation across loci, we can pick out a few loci that depart from the average pattern and suggest that they are under selection (i.e. are adaptive).
Another limitation is that the cut-off between ‘significant’ and ‘relatively different but possibly not under selection’ can be a bit arbitrary, so some genes that are under weak selection can go undetected. Furthermore, recent studies reflect a growing appreciation for polygenic adaptation, where tiny changes in the allele frequencies of many different genes combine to cause strong evolutionary changes. For example, despite the clearly heritable nature of height (tall people often have tall children), there is no single ‘height’ gene: instead, it appears that hundreds of genes each make a very minor contribution to height.
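A bare-bones version of this logic, assuming two populations, biallelic loci with known allele frequencies, and FST calculated from expected heterozygosities as (H_T - H_S) / H_T. Real outlier programs model the neutral distribution of FST (often through simulation) rather than using a fixed cut-off like the hypothetical 0.2 used here, and the frequencies are invented for illustration.

```python
from typing import List, Sequence, Tuple


def fst_two_pops(p1: float, p2: float) -> float:
    """Per-locus FST for two equally weighted populations, computed as
    (H_T - H_S) / H_T from expected heterozygosities (no sample-size correction)."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_t == 0:
        return 0.0  # locus is monomorphic across both populations
    return (h_t - h_s) / h_t


def outlier_loci(freqs: Sequence[Tuple[float, float]], cutoff: float) -> List[int]:
    """Indices of loci whose FST exceeds a (deliberately arbitrary) cut-off."""
    return [i for i, (p1, p2) in enumerate(freqs) if fst_two_pops(p1, p2) > cutoff]


# Hypothetical allele frequencies (pop 1, pop 2): nine loci differ slightly,
# as we might expect under drift alone; the last differs dramatically.
loci = [
    (0.50, 0.55), (0.30, 0.35), (0.60, 0.52), (0.40, 0.45), (0.20, 0.25),
    (0.70, 0.66), (0.55, 0.50), (0.35, 0.30), (0.45, 0.50), (0.10, 0.90),
]

for i, (p1, p2) in enumerate(loci):
    print(f"locus {i}: FST = {fst_two_pops(p1, p2):.3f}")
print("candidate outliers:", outlier_loci(loci, cutoff=0.2))
```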
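The arithmetic behind polygenic adaptation can be sketched under a purely additive model, where the change in mean phenotype of a diploid population is the sum over loci of 2 * (effect per allele copy) * (change in allele frequency). The number of loci, effect sizes and frequency shifts below are all hypothetical; the point is that no single locus does much, but together they shift the mean substantially.

```python
from typing import Sequence


def mean_phenotype_shift(effects: Sequence[float], freq_changes: Sequence[float]) -> float:
    """Change in mean phenotype under a purely additive diploid model
    (no dominance or epistasis): each locus contributes 2 * effect * change."""
    if len(effects) != len(freq_changes):
        raise ValueError("need one effect size per frequency change")
    return sum(2 * a * dp for a, dp in zip(effects, freq_changes))


n_loci = 500
effects = [0.1] * n_loci        # hypothetical effect per allele copy (e.g. cm of height)
freq_changes = [0.05] * n_loci  # hypothetical small frequency shift at every locus

one_locus = mean_phenotype_shift(effects[:1], freq_changes[:1])
all_loci = mean_phenotype_shift(effects, freq_changes)
print(f"one locus: {one_locus:.2f}, all {n_loci} loci: {all_loci:.2f}")
```

A frequency shift of 0.05 at any one of these loci would be hard to distinguish from drift in an outlier scan, even though the combined effect is large.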
To overcome these biases, we might instead take a more targeted approach called ‘genotype-environment association’ (GEA). This analysis differs in that we select what we think our selective pressures are: often environmental characteristics such as rainfall, temperature, habitat type or altitude. We then take two types of measures per individual organism: the genotype, through DNA sequencing, and the relevant environmental values for that organism’s location. We repeat this over the full distribution of the species, taking a good number of samples per population and making sure we capture the full variation in the environment. Then we perform a correlation-type analysis, which tests whether there’s a connection or trend between any particular alleles and any environmental variables. The most relevant variables are often pulled out of the environmental dataset and focused on to reduce noise in the data.
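As a minimal illustration of the correlation step, the sketch below computes a Pearson correlation between population allele frequencies and one environmental variable for each locus, then flags loci above an arbitrary threshold. The temperatures and frequencies are hypothetical, and real GEA methods (for example, latent factor mixed models or redundancy analysis) also correct for neutral population structure, which this sketch does not.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no detectable association
    return sxy / sqrt(sxx * syy)


def gea_scan(env: Sequence[float], allele_freqs: Sequence[Sequence[float]],
             threshold: float) -> List[int]:
    """Indices of loci whose frequencies correlate with env at |r| >= threshold."""
    return [i for i, freqs in enumerate(allele_freqs)
            if abs(pearson_r(env, freqs)) >= threshold]


temperature = [10, 12, 15, 18, 21, 25]  # hypothetical mean temperature per population
allele_freqs = [
    [0.10, 0.20, 0.35, 0.50, 0.65, 0.85],  # locus 0: tracks temperature
    [0.40, 0.60, 0.45, 0.55, 0.42, 0.50],  # locus 1: no environmental trend
]

print(gea_scan(temperature, allele_freqs, threshold=0.8))  # [0]
```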
The main benefit of GEA over FST outlier tests is that it’s unlikely to be as strongly influenced by genetic drift. Unless (coincidentally) populations are drifting at the same genes in the same pattern as the environment, the analysis is unlikely to falsely pick it up. However, it can still be confounded by neutral population structure; if one population randomly has a lot of unique alleles or variation, and also occurs in a somewhat unique environment, it can bias the correlation. Furthermore, GEA is limited by the accuracy and relevance of the environmental variables chosen; if we pick only a few, or miss the most important ones for the species, we won’t be able to detect a large number of very relevant (and likely very selective) genes. This is a universal problem in model-based approaches and not just limited to GEA analysis.
New spells to find adaptive genes?
It seems likely that with increasing datasets and better analytical platforms, many more types of analysis will be developed to delve deeper into the adaptive aspects of the genome. With whole-genome sequencing starting to become a reality for non-model species, better annotation of current genomes and a steadily increasing database of functional genes, the ability of researchers to investigate evolution and adaptation at the genomic level is also increasing.
Often, we like to think of evolution fairly anthropomorphically, as if natural selection actively decides what is, and what isn’t, best for the evolution of a species (or population). Of course, there’s no explicit Evolution God who decrees how a species should evolve; in reality, evolution reflects a more probabilistic system. Traits that give an individual a better chance of surviving or reproducing, and can be inherited by its offspring, will over time become more and more dominant within the species; conversely, traits that do the opposite will be ‘weeded out’ of the gene pool as maladaptive organisms die off or are outcompeted by fitter individuals. The relative fitness of a trait can be estimated from how its frequency changes across generations.
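In the simplest case (a haploid population with discrete generations and no drift), the ratio of two types’ frequencies changes each generation by exactly their relative fitness, so relative fitness can be recovered from one generation of frequency change. A minimal sketch under those assumptions; in real, finite populations, drift adds noise to this estimate.

```python
def relative_fitness(p_before: float, p_after: float) -> float:
    """Fitness of type A relative to type a from one generation of change in
    A's frequency (haploid, discrete generations, no drift):
    w_A / w_a = (p' / (1 - p')) / (p / (1 - p))."""
    if not (0 < p_before < 1 and 0 < p_after < 1):
        raise ValueError("frequencies must be strictly between 0 and 1")
    return (p_after / (1 - p_after)) / (p_before / (1 - p_before))


def one_generation(p: float, w_ratio: float) -> float:
    """Forward model: new frequency of A when A is w_ratio times as fit as a."""
    return p * w_ratio / (p * w_ratio + (1 - p))


p0 = 0.2
p1 = one_generation(p0, w_ratio=1.1)  # hypothetical: A is 10% fitter
print(round(p1, 4), round(relative_fitness(p0, p1), 4))
```

Running the forward model and then the estimator recovers the fitness ratio we put in, which is a quick check that the two formulas agree.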
So, if natural selection is just probabilistic, does this mean evolution is totally random? Is it just that traits are selected based on what just happens to survive and reproduce in nature, or are there more direct mechanisms involved? Well, it turns out both processes are important to some degree. But to get into it, we have to explain the difference between genetic drift and natural selection (we’re assuming here that our particular trait is genetically determined).
When we consider the genetic variation within a species to be our focal trait, we can tell that some parts of the genome might be more closely linked to natural selection than others. This makes sense; some mutations in the genome will directly change a trait (like fur colour) which might have a selective benefit or detriment, while others might not change anything physically or might change traits that are neither here nor there under natural selection (like nose shape in people, for example). We can distinguish between these two by talking about adaptive or neutral variation; adaptive variation has a direct link to natural selection whilst neutral variation is predominantly the product of genetic drift. Depending on our research questions, we might focus on one type of variation over the other, but both are important components of evolution as a whole.
Genetic drift is considered the random, selectively ‘neutral’ changes in the frequencies of different traits (alleles) over time, due to completely random effects such as random mutations or random loss of alleles. This results in the neutral variation we can observe in the gene pool of the species. Changes in allele frequencies can happen due to entirely stochastic events. If, by chance, all of the individuals with the blue fur variant of a gene are struck by lightning and die, the blue fur allele would end up with a frequency of 0 (i.e. go extinct). That’s not to say the blue fur ‘predisposed’ the individuals to be struck by lightning (we assume here, anyway), so it’s not as if it was ‘targeted’ by natural selection (see the bottom figure for this example).
In contrast to genetic drift, natural selection is when particular traits are directly favoured (or disfavoured) in the environmental context of the population; natural selection is very specific to both the actual trait and how the trait works. A trait is only selected for if it confers some kind of fitness benefit on the individual; in evolutionary genetics terms, this means it allows the individual to have more offspring or to survive better (usually).
While this might be true for a trait in one environment, in another it might be irrelevant or even have the reverse effect. Let’s consider white fur as our trait under selection. In an arctic environment, white fur might be selected for because it helps the animal to camouflage against the snow to avoid predators or catch prey (and therefore increases survivability). However, in a dense rainforest, white fur would stand out starkly against the shadowy greenery of the foliage and thus make the animal a target, making it more likely to be taken by a predator or avoided by prey (thus decreasing survivability). Thus, fitness is very context-specific.
Who wins? Drift or selection?
So, which is mightier, the pen (drift) or the sword (selection)? Well, it depends on a large number of different factors such as mutation rate, the importance of the trait under selection, and even the size of the population. This last one might seem a little different to the other two, but it’s critically important to which process governs the evolution of the species.
In very small populations, we expect genetic drift to be the stronger process. Natural selection is often comparatively weaker because small populations have less genetic variation for it to act upon; there are fewer gene variants that might be more beneficial than others. In severe cases, many traits are probably quite maladaptive, but there’s just no better variant to be selected for; look at the plethora of physiological problems in the cheetah for some examples.
Genetic drift, however, doesn’t really care if there’s “good” or “bad” variation, since it’s totally random. That said, it tends to be stronger in smaller populations because a small, random change in the number or frequency of alleles can have a huge effect on the overall gene pool. Let’s say you have 5 cats in your species; they’re nearly extinct, and probably have very low genetic diversity. If one cat suddenly dies, you’ve lost 20% of your species (and up to that percentage of your genetic variation). However, if you had 500 cats in your species and one died, you’d lose only 0.2% of your individuals and the gene pool would barely even notice. The same applies to random mutations, or if one unlucky cat doesn’t get to breed because it can’t find a mate, or any other random, non-selective reason. One way we can think of this is as ‘random error’ in evolution; even a perfectly adapted organism might not pass on its genes if it is really unlucky. A bigger sample size (i.e. more individuals) means this will have less impact on the total dataset (i.e. the species), though.
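One classic way to put numbers on this is Kimura’s diffusion approximation for the probability that a new mutation eventually becomes fixed. Assuming additive (genic) selection with advantage s per allele copy and an effective size equal to N, a new mutant starting at frequency p = 1/(2N) fixes with probability (1 - e^(-4Nsp)) / (1 - e^(-4Ns)), compared with 1/(2N) for a neutral mutation. The sketch below compares a small and a large population for the same hypothetical s.

```python
from math import expm1


def fixation_probability(n: int, s: float) -> float:
    """Kimura's approximation for the fixation probability of a single new
    mutant (initial frequency 1 / (2N)) with additive selective advantage s."""
    p = 1.0 / (2 * n)
    if s == 0:
        return p  # neutral case: fixation probability equals starting frequency
    # expm1(x) computes e**x - 1 accurately for small x; signs cancel in the ratio.
    return expm1(-4 * n * s * p) / expm1(-4 * n * s)


s = 0.01  # hypothetical 1% selective advantage
for n in (10, 1000):
    neutral = 1.0 / (2 * n)
    ratio = fixation_probability(n, s) / neutral
    print(f"N = {n:>4}: {ratio:.1f} times as likely to fix as a neutral mutation")
```

In the population of 10, the beneficial mutation fixes only slightly more often than a neutral one (drift dominates), whereas in the population of 1000 the same advantage makes fixation roughly 40 times more likely than for a neutral mutation.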
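The cat example can be turned into a quick Wright-Fisher simulation, a standard idealised model of drift in which each new generation’s 2N gene copies are drawn at random from the previous generation’s allele pool. The sketch assumes no selection, mutation or migration, and uses a fixed random seed so the run is repeatable; theory predicts that the variance of the one-generation change in frequency is p(1 - p)/(2N).

```python
import random
from statistics import pvariance


def wright_fisher_step(p: float, n: int, rng: random.Random) -> float:
    """Draw the next generation's 2n gene copies at random from a pool in
    which the focal allele has frequency p; return its new frequency."""
    copies = 2 * n
    count = sum(1 for _ in range(copies) if rng.random() < p)
    return count / copies


def drift_variance(p: float, n: int, replicates: int, seed: int = 1) -> float:
    """Variance of the one-generation change in allele frequency across
    independent replicate populations of size n."""
    rng = random.Random(seed)
    changes = [wright_fisher_step(p, n, rng) - p for _ in range(replicates)]
    return pvariance(changes)


for n in (5, 500):  # 5 cats vs 500 cats
    simulated = drift_variance(0.5, n, replicates=1000)
    expected = 0.5 * 0.5 / (2 * n)
    print(f"N = {n:>3}: simulated {simulated:.5f}, expected {expected:.5f}")
```

The random wobble in allele frequency is about a hundred times larger for 5 cats than for 500, matching the intuition that chance events matter far more in tiny populations.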
Both genetic drift and natural selection are important components of evolution, and together shape the overall patterns of evolution for any given species on the planet. The two processes can even feed into one another; random mutations (drift) might become the genetic basis of new selective traits (natural selection) if the environment changes to suit the new variation. Therefore, to ignore one in favour of the other would fail to capture the full breadth of the processes which ultimately shape and determine the evolution of all species on Earth, and thus the formation of the diversity of life.