Why we should always pander to diversity

Diversity in the natural world

‘Diversity’ is a term that gets used a lot these days, albeit usually in reference to social changes and structures. However, diversity is not merely a human construct and reflects an extremely important aspect of the natural world at a variety of levels. From the smallest genes to the biggest ecosystems, diversity is a trait that confers a massive range of benefits to individuals, populations, species and even the entire globe. Let’s dissect this diversity down at different scales and see how beneficial it can be.

Hierarchy of diversity
The generalised hierarchy of life, with diversity being an important component of each tier. At the smallest tier, genes underpin all life. The collection of genetic diversity is often summarised into a population (as a single cohesive genetic unit). Several populations can be pooled together into a single (usually) cohesive species. Different species are then components of a larger community (which in turn is a component of a broader ecosystem).

Genetic diversity

At the smallest scale in the hierarchy of genetic differentiation, we have the genes themselves. It is a well-established concept that having a diversity of genetic variants (alleles) within a population or species is critical to their future adaptation, evolution and persistence. This is because different alleles will have different benefits (or costs) depending on the environmental pressure that influences them; natural selection might favour one allele over another at one time, but a different one as the pressure changes. Having a higher number of alleles within the population or species means that there is a greater chance at least a few individuals will possess an adaptive gene as the environment changes (which we know can be quite rapid and very, very strong). The diversity serves as a ‘buffer’ against extinction; evolution by natural selection functions best when there are many options to choose from.

Without this diversity, species run the risk of having no adaptive genes at the ready to deal with a selective pressure. Either a new adaptive gene must arise by mutation (or come about in other ways, such as through gene flow from another population or species) or the population/species will suffer and potentially go extinct. As strong selection causes the species to dwindle, it enters what is referred to as the ‘extinction vortex’. Without genetic diversity, they can’t adapt: thus, more individuals die off, causing more genetic diversity to be lost from the population. This pattern is a vicious cycle which can ultimately destroy species (without serious intervention).

Extinction vortex
A very dramatic representation of the extinction vortex.

For this reason, captive breeding programs aim to maintain as much of the genetic diversity of the original population as possible. This reduces the probability of entering a downward extinction spiral from inbreeding depression and helps to maintain populations into the future (both the captive one and the wild population when we reintroduce individuals into the wild).
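To put rough numbers on how quickly a small population can lose genetic diversity, here is a minimal sketch using the standard population-genetic expectation that heterozygosity shrinks by a factor of (1 - 1/(2N)) each generation under genetic drift alone. The function name and the population sizes are purely illustrative, and the model assumes an idealised population (constant size, random mating, and no mutation, selection or gene flow).

```python
def expected_heterozygosity(h0, pop_size, generations):
    """Expected heterozygosity after drift in an idealised population.

    Uses H_t = H_0 * (1 - 1/(2N))^t, which assumes constant size, random
    mating, and no mutation, selection or gene flow.
    """
    if pop_size < 1:
        raise ValueError("pop_size must be at least 1")
    return h0 * (1 - 1 / (2 * pop_size)) ** generations


if __name__ == "__main__":
    # Hypothetical populations: a small captive group vs. larger wild ones.
    for n in (10, 50, 1000):
        remaining = expected_heterozygosity(1.0, n, generations=50)
        print(f"N = {n:>4}: {remaining:.1%} of starting diversity after 50 generations")
```

Even in this simplified form, the smaller the population, the faster diversity drains away, which is why breeding programs try to keep the effective population size as large as they can.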

“Population” diversity

Because genetic diversity is critically important for species survival, we must also try to preserve the diversity of the entire gene pool of a species. This means conserving highly genetically differentiated populations within a species as a priority, as they may be the only ones that possess the necessary adaptive genes to save the rest of the species. This adaptive genetic variation can then be introduced into other populations in genetic rescue programs and serve as a means to semi-naturally allow the species to evolve. Evolutionarily-significant units (ESUs) are one measure of the invaluable nature of genetically unique populations.

Although many more traditional conservationists strongly believe that ESUs should be managed entirely independently of one another (to preserve their evolutionary ‘pedigree’ and prevent the risk of outbreeding depression), it has been suggested that the benefit of genetic rescue in many cases significantly outweighs this risk of outbreeding depression. For some species, this really is an act of rescue: they are at the edge of extinction, and if we do nothing we condemn them to die out.

Introducing genetic material across populations (or even species!) can generate new functional genes that allow the recipient species to adapt to selective pressures. This might sound very strange, and could be extremely rare, but examples of adaptive genetic material in one species originating from another species through hybridisation do exist in nature. For example, the black coat of wolves is a highly adaptive trait in some populations and is encoded by the K locus (the β-defensin gene CBD103). However, the specific mutation at the K locus that generates the black coat colour actually first originated in domestic dogs; when wild wolves and domestic dogs interbred, this mutation was transferred into the wolf gene pool. Natural selection strongly favoured this new variant, and it very rapidly underwent strong positive selection. Thus, the adaptiveness of black wolves is thanks to a domestic dog mutation!

Species diversity

At a higher level of the hierarchy, the diversity of species within a particular community or ecosystem has been shown to be important for the health and stability of said community. Every species, however small or seemingly unimpressive, plays a role in the greater ecosystem balance, through interactions with other species (e.g. as predator, as prey, as competitor) and the abiotic environment. While some species are known to have very strong impacts on the immediate ecosystem (often dubbed ‘keystone species’, such as apex predators), all species have some influence on the world around them (we’re especially good at it).

Species interactions flowchart

The overall health and stability of an ecosystem, as well as the benefits it can provide to all living things (including humans), is largely determined by the diversity of species. For example, ‘habitat engineers’ are types of species that, by altering the physical environment around them (such as to build a home), directly provide new habitat for other species. They are a fundamental underpinning of many incredibly vibrant ecosystems; think of what a reef system would look like if there were no corals in it. There’d be no anemones growing colourfully; no fish to live in them; no sharks to feed on these non-existent fish. This is just one example of a complex ecosystem that truly relies on its inhabiting species to function.

Ecosystem jenga
Much like Jenga, taking out one block (a species) could cause the entire stack (the ecosystem) to collapse in on itself. Even if it stands up, however, the system will still be weaker without the full diversity to support it.

Protecting our diversity

Diversity is not just a social construct and is an important phenomenon in nature, at a variety of different levels. Preserving the full diversity of life, from genetic diversity within populations and species to full species diversity within ecosystems, is critical to maintaining healthy and robust natural systems. The more diversity we have at each level of this hierarchy, the greater robustness and security we will have in the future.

The history of histories: philosophy in biogeography

Biogeography of the globe

The distribution of organisms across the Earth, both over time and across space, is a fundamental aspect of the field of biogeography. But our understanding of the mechanisms by which organisms are distributed across the globe, and how this affects their evolution, can be at times highly enigmatic. Why are Australia and the Americas the only two places that have marsupials? How did lemurs get all the way to Madagascar, and why are they the only primate that has made the trip? How did Darwin’s famous finches get over to the Galápagos, and why are there so many species of them there now?

All of these questions can be addressed with a combination of genetic, environmental and ecological information across a variety of timescales. However, the overall field of biogeography (and phylogeography as a derivative of it) has traditionally been largely rooted in a strong yet changing theoretical basis. The earliest discussions and discoveries related to biogeography as a field of science date back to the 18th Century, and to Carl Linnaeus (to whom we owe our binomial classification system) and Alexander von Humboldt. These scientists (and undoubtedly many others of that era) were among the first to notice how organisms in similar climates (e.g. Australia, South Africa and South America) showed similar physical characteristics despite being so distantly separated (both in their groups and geographic distance). The communities of these regions also appeared to be highly similar. So how could this be possible over such huge distances?

Arctic and fennec final
A pretty unreasonable mechanism (and example) of dispersal in foxes. And yes, all tourists wear sunglasses and Hawaiian shirts, even arctic fox ones.


Dispersal or vicariance?

Two main explanations for these patterns are possible: dispersal and vicariance. As one might expect, dispersal denotes that an ancestral species was distributed in one of these places (referred to as the ‘centre of origin’) before it migrated and inhabited the other places. Contrastingly, vicariance suggests that the ancestral species was distributed everywhere originally, covering all contemporary ranges within it. However, changes in geography, climate or the formation of other barriers caused the range of the ancestor to fragment, with each fragmented group evolving into its own distinct species (or group of species).

Dispersal vs vicariance islands
An example of dispersal vs. vicariance patterns of biogeography in an island bird (pale blue). In the top example, the sequential separation of parts of the island also causes parts of the distribution of the original bird species to become fragmented. These fragments each evolve independently of their ancestor and form new species (red, and then blue). In the bottom example, the island geography doesn’t change but in rare events a bird disperses from the main island onto a new island. The new selective pressures of that island cause the dispersed birds to evolve into new species (red and blue). In both examples, islands that were recently connected or are easy to disperse across do not generate new species (as in the sandy island in the bottom right). You’ll notice that both processes result in the same biogeographic distribution of species.

In early biogeographic science, dispersal was the most heavily favoured explanation. At the time, there was no clear mechanism by which organisms could be present all over the globe without some form of dispersal: it was generally believed that the world was a static, unmoving system. Dispersal was well supported by some biological evidence such as the diversification of Darwin’s finches across the Galápagos archipelago. Thus, this concept was supported through the proposals of a number of prominent scientists such as Charles Darwin and A.R. Wallace. For others, however, the distance required for dispersal (such as across entire oceans) seemed implausible and biologically unrealistic.

A paradigm shift in biogeography

Two particular developments in theory are credited with a paradigm shift in the field: cladistics and plate tectonics. Cladistics simply involved using shared biological characteristics to reconstruct the evolutionary relationships of species (think like phylogenetics, but using physical traits instead of genetic sequence). Just as important, however, was plate tectonic theory, which provided a clear way for organisms to spread across the planet. Understanding that, deep in the past, all continents had been directly connected to one another provides a convenient explanation for how species groups spread. Instead of requiring species to travel across entire oceans, continental drift meant that one widespread and ancient ancestor on the historic supercontinent (Pangaea; or subsequently Gondwana and Laurasia) could become fragmented. It only required that groups were very old, but not necessarily very dispersive.

Lemur dispersal
Surf’s up, dudes! Although continental drift was no doubt an important factor in the distribution and dispersal of many organisms on Earth, it actually probably wasn’t the reason lemurs got to Madagascar. Sorry for the mislead.

From these advances in theory, cladistic vicariance biogeography was born. The field rapidly overtook dispersal as the most likely explanation for biogeographic patterns across the globe by not only providing a clear mechanism to explain these but also an analytical framework to test questions relating to these patterns. Further developments in the analytical backbone of cladistic vicariance allowed for more nuanced questions of biogeography to be asked, although these still fundamentally ignored the role of potential dispersals in explaining species’ distributions.

Modern philosophy of biogeography

So, what is the current state of the field? Well, the more we research biogeographic patterns with better data (such as with genomics) the more we realise just how complicated the history of life on Earth can be. Complex modelling (such as Bayesian methods) allows us to more explicitly test the impact of Earth history events on our study species, and can provide a more detailed overview of the evolutionary history of the species (such as by directly estimating times of divergence, amount of dispersal, extent of range shifts).

From a theoretical perspective, the consistency of patterns of groups is always in question and exactly what determines what species occurs where is still somewhat debatable. However, the greater number of types of data we can now include (such as geological, paleontological, climatic, hydrological, genetic…the list goes on!) allows us to paint a better picture of life on Earth. By combining information about what we know happened on Earth, with what we know has happened to species, we can start to make links between Earth history and species history to better understand how (or if) these events have shaped evolution.

Surviving the Real-World Apocalypse

The changing world

Climate change seems to be the centrepiece of a large amount of scientific research and media attention, and rightly so: it has the capacity to affect every living organism on the planet. It’s our duty as curators and residents of Earth to be responsible for our influences on the global environmental stage. While a significant part of this involves determining causes and solutions to our contributions to climate change, we also need to know how extensive the effects will be: for example, how can we predict how well species will do in the future?

Predicting the effect of climate change on all of the world’s biodiversity is an immense task. Climate change itself is a complicated system, and causes diverse, interconnected and complex alterations to both global and local climate. On top of this, though, climate affects different species in different ways; where some species might be sensitive to some climatic variables (such as rainfall, available sunlight, seasonality), others may be more tolerant of the same factors. But all living things share some requirements, so surely there must be some consistency in their responses to climate change, right?

Apocalypse 2
Lucky for Mr Fish here, he’s responding to a (very dramatic) climate change much, much better than his bird counterpart.

How predictable are species responses to climate change?

Well, evidence would surprisingly suggest not. Many species, even closely related ones, can show very different responses to the exact same climatic pressures or biogeographical events. There are a number of different traits that might affect a species’ ability to adapt, particularly their adaptive genetic diversity (which underpins ‘adaptive potential’). Thus, we need good information on a variety of genetic, physiological and life history traits to be able to make predictions about how likely a species is to adapt and respond to future (and current) climate changes.

Although this can be hard to study in species of high extinction risk (getting a good number of samples is always an issue…), traditional phylogeographic methods might help us to make some comparisons. See, although the modern Earth is rapidly changing (undoubtedly influenced by human society), the climate of the globe has always varied to some degree. There has always been some tumultuousness in the climate and specific Earth history events like volcano eruptions, sea-level changes, or glaciation periods (‘ice ages’) have had diverse effects on organisms globally.

Using comparative phylogeography to predict species responses

One tool for looking at how different species have, in the past, responded to the same biogeographical force is the domain of ‘comparative phylogeography’. Phylogeography itself is something we have discussed before: the ‘comparative’ aspect simply means comparing (with complex statistical methods) these patterns across different and often unrelated species to see how universal (‘congruent’) or unique (‘incongruent’) these patterns are among species. The more broadly we look at the species community in the region, the more we can observe widespread effects of any given environmental or geographical event: if we only look at fish, for example, we might not be able to infer what response mammals, birds or invertebrates have had to our given event. Sometimes this still meets the scale we wish to focus on; other times, we want to see how all the species of an area have been affected.

Actual island diagram
A (very busy) example of different species responses to a single environmental event. In this example, we have three species (a fish, a lizard, and a bird) all living on the same island. In the middle of the island, there is a small mountain range (A). At this point in time, all three species are connected across the whole island; fish can travel via lakes and wetlands (green arrows), lizards can travel across the land (blue arrow) and birds can fly anywhere. However, as the mountain range grows with tectonic movements, the waterways are altered and the north and south are disconnected (B). The fish species is now split into two evolutionarily separate groups (green and gold), while lizards and birds are not. As the range expands further, however, the dispersal route for lizards is cut off, causing them to eventually also become separated into blue and black groups (C). Birds, however, have no problems flying over the mountain range and remain one unified and connected orange group over time (D). Thus, each species has a different response to the formation of the mountain range.
Evol history of island diagram
The phylogenetic history of the three different species in the above example. As you can see, each lineage has a slightly different pattern; birds show no divergences at all, whereas the timings of the lizard and fish N/S splits are different (i.e. temporally incongruent).
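As a deliberately crude illustration of what ‘temporally congruent’ and ‘incongruent’ mean, the sketch below simply checks whether the estimated intervals for a split overlap between species. Real comparative phylogeographic analyses rely on far more complex statistical models than this; the species, the dates and the function names here are all invented to mirror the island example.

```python
from itertools import combinations

# Hypothetical divergence-time estimates (millions of years ago) for the
# north/south split in each species, as (lower, upper) bounds of an interval.
# None means no split was detected. All values are invented.
split_intervals = {
    "fish": (2.0, 3.0),
    "lizard": (0.5, 1.2),
    "bird": None,
}


def intervals_overlap(a, b):
    """Return True if two (lower, upper) intervals share any time."""
    return a[0] <= b[1] and b[0] <= a[1]


def temporal_congruence(intervals):
    """Label each pair of species as congruent, incongruent or not comparable."""
    results = {}
    for sp1, sp2 in combinations(sorted(intervals), 2):
        a, b = intervals[sp1], intervals[sp2]
        if a is None or b is None:
            results[(sp1, sp2)] = "no shared split"
        elif intervals_overlap(a, b):
            results[(sp1, sp2)] = "congruent"
        else:
            results[(sp1, sp2)] = "incongruent"
    return results


if __name__ == "__main__":
    for pair, label in temporal_congruence(split_intervals).items():
        print(pair, "->", label)
```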

Typically, comparative phylogeographic studies have looked at the neutral components of species’ evolution (as is the realm of traditional phylogeography). This includes studying the size of populations over time, how well connected they are and were, what their spatial patterns are and how these relate to the environment. Comparing all of these patterns across species can allow us to start painting a fuller picture of the history of biota in a region. In this way, we can start to see exactly which species have shown what responses and start to relate these to the characteristics that allowed them to respond in that certain way (and to include adaptation in our studies). So, what kinds of traits are important?

What traits matter? Who wins?

Often, we find that the life history traits of an organism dictate how it will respond to a certain pressure better than other factors such as phylogeny (e.g. one group does not always do better than another). Instead, individual species with certain physical characteristics might handle the pressure better than others. For example, a fish, bird and snake that are all able to tolerate higher temperatures than other fish, birds or snakes in that region are more likely to survive a drought. In this case, none of the groups (fish, birds or snakes) inherently do better than the other two groups. Thus, it can be hard to predict how a large swathe of species will respond to any given environmental change, unless we understand the physical characteristics of every species.

Climate change risk flowchart
A generalised framework of various factors, and their interactions, on the vulnerability of species under current and future climate changes by Williams et al. 2018. The schematic includes genetic, ecological, physical and environmental factors and how these can interact with one another to alleviate or exacerbate the risk of extinction.

We can also see that other physiological or ecological traits, such as climatic preferences and tolerance thresholds, can be critical for adapting to climatic pressures. Naturally, the genetic diversity of species is also an important component underlying their ability to adapt to these new selective pressures and to survive into the future. Trying to incorporate all of these factors into a projected model can be difficult, but with more data of higher quality we can start to make more refined predictions. Understanding how particular traits influence how well a species may adapt to a changing climate, as well as knowing what traits different species have, might just be the key to predicting who wins and who dies in the real-world Game of Thrones.

Age and dating with phylogenetics

Timing the phylogeny

Understanding the evolutionary history of species can be a complicated matter, both from theoretical and analytical perspectives. Although phylogenetics addresses many questions about evolutionary history, there are a number of limitations we need to consider in our interpretations.

One of these limitations we often want to explore in better detail is the estimation of the divergence times within the phylogeny; we want to know exactly when two evolutionary lineages (be they genera, species or populations) separated from one another. This is particularly important if we want to relate these divergences to Earth history and environmental factors to better understand the driving forces behind evolution and speciation. A traditional phylogenetic tree, however, won’t show this: the tree is scaled in terms of the genetic differences between the different samples in the tree. Genetic differentiation does not always accumulate linearly with time, and the rate at which it accumulates definitely doesn’t appear to be universal.

Anatomy of phylogenies
The general anatomy of a phylogenetic tree. A phylogeny describes the relationships of tips (i.e. which are more closely related than others; referred to as the topology), how different these tips are (the length of the branches) and the order they separated in time (separations shown by the nodes). Different trees can share some traits but not others: the red box shows two phylogenetic trees with similar branch lengths (all of the branches are roughly the same) but different topology (the tips connect differently: A and B are together on the left but not on the right, for example). Conversely, two trees can have the same topology, but show differing lengths in the branches of the same tree (blue box). Note that the tips are all in the same positions in these two trees. Typically, it’s easier to read a tree from right to left: the two tips whose branches meet first are most similar genetically; the longer it takes for two tips to meet along the branches, the less similar they are genetically.

How do we do it?

The parameters

There are a number of parameters that are required for estimating divergence times from a phylogenetic tree. These can be summarised into two distinct categories: the tree model and the substitution model.

The first one of these is relatively easy to explain; it describes the exact relationship of the different samples in our dataset (i.e. the phylogenetic tree). Naturally, this includes the topology of the tree (which determines which divergence times can be estimated in the first place). However, there is another very important factor in the process: the lengths of the branches within the phylogenetic tree. Branch lengths are related to the amount of genetic differentiation between the different tips of the tree. The longer the branch, the more genetic differentiation that must have accumulated (usually also meaning that more time has elapsed from one end of the branch to the other). Even two phylogenetic trees with identical topology can give very different results if they vary in their branch lengths (see the above Figure).

The second category determines how likely mutations are between one particular type of nucleotide and another. While the details of this can get very convoluted, it essentially determines how quickly we expect certain mutations to accumulate over time, which will inevitably alter our predictions of how much time has passed along any given branch of the tree.
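To make the idea of a substitution model a little more concrete, here is a minimal sketch of the simplest one, the Jukes-Cantor (JC69) model, which assumes every type of nucleotide change is equally likely. It converts the raw proportion of differing sites between two aligned sequences into an estimate of substitutions per site, correcting for sites that have likely mutated more than once. The sequences and function names are made up for illustration, and gaps or ambiguous bases are not handled.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ between two sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jc69_distance(p):
    """Jukes-Cantor corrected distance in substitutions per site.

    d = -3/4 * ln(1 - 4p/3); undefined once p reaches 0.75 (saturation).
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    # Two invented, already-aligned sequences.
    seq_1 = "ACGTACGTACGTACGTACGT"
    seq_2 = "ACGTTCGTACGAACGTACCT"
    p = p_distance(seq_1, seq_2)
    print(f"raw difference: {p:.3f}, JC69 distance: {jc69_distance(p):.3f}")
```

The corrected distance is always at least as large as the raw one, and the gap widens as sequences become more different. More realistic models allow different rates for different kinds of change, which is where the details get convoluted.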

Calibrating the tree

However, at least one other important component is necessary to turn divergence time estimates into absolute, objective times. An external factor with an attached date is needed to calibrate the relative branch divergences; this can be in the form of a known mutation rate for all of the branches of the tree or by dating at least one node in the tree using additional information. These help to anchor either the mutation rate along the branches or the absolute date of at least one node in the tree (with the rest estimated relative to this point). The second method often involves placing a time constraint on a particular node of the tree based on prior information about the biogeography of the species (for example, we might know one species likely diverged from another after a mountain range formed: the age of the mountain range would be our constraint). Alternatively, we might include a fossil in the phylogeny which has been radiometrically dated and place an absolute age on that instead.
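As a rough sketch of how a single dated node can anchor the rest of the tree, the example below rescales relative node depths into absolute ages. It assumes the simplest possible case: every tip is the same distance from the root and one constant rate applies across the whole tree. The node names, depths and calibration age are all hypothetical.

```python
def calibrate_node_ages(node_depths, calibration_node, calibration_age):
    """Convert relative node depths into absolute ages using one dated node.

    node_depths: dict of node name -> depth below the tips (substitutions/site).
    Assumes a single constant rate across the tree, so ages scale linearly
    with depth.
    """
    depth = node_depths[calibration_node]
    if depth <= 0 or calibration_age <= 0:
        raise ValueError("calibration depth and age must be positive")
    rate = depth / calibration_age  # substitutions per site per unit time
    ages = {node: d / rate for node, d in node_depths.items()}
    return rate, ages


if __name__ == "__main__":
    # Invented depths for three nodes; node "CD" is assumed to date to the
    # formation of a mountain range 5 million years ago.
    depths = {"root": 0.10, "AB": 0.02, "CD": 0.05}
    rate, ages = calibrate_node_ages(depths, "CD", calibration_age=5.0)
    print(f"implied rate: {rate:.3f} substitutions/site/million years")
    for node, age in ages.items():
        print(f"{node}: {age:.1f} million years")
```

In practice the calibration is usually treated as a range of plausible ages rather than a single fixed number, and rates are allowed to vary between branches.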

Ammonite comic
Don’t you know it’s rude to ask an ammonite her age?

In regards to the former method, mutation rates describe how fast genetic differentiation accumulates as evolution occurs along the branch. Although mutations gradually accumulate over time, the rate at which they occur can depend on a variety of factors (even including the environment of the organism). Even within the genome of a single organism, there can be variation in the mutation rate: genes, for example, often gain mutations more slowly than non-coding regions.

Although mutation rates (generally in the form of a ‘molecular clock’) have been traditionally used in smaller datasets (e.g. for mitochondrial DNA), there are inherent issues with the assumptions involved. The first is that a single rate is assumed to apply to all branches in a tree equally, when different branches may have different rates. The second is that different parts of the genome (even within the same individual) will have different evolutionary rates (like genes vs. non-coding regions). Thus, we tend to prefer using calibrations from fossil data or based on biogeographic patterns (such as the time a barrier likely split two branches based on geological or climatic data).
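To show why the choice of rate matters so much, here is a small sketch of the strict molecular clock calculation. Because both lineages keep accumulating changes after they split, the genetic distance between them grows at twice the per-lineage rate, so the time since the split is the distance divided by twice the rate. The distance and the three candidate rates are invented for illustration.

```python
def divergence_time(pairwise_distance, rate_per_lineage):
    """Time since two lineages split under a strict molecular clock.

    Both lineages accumulate changes independently after the split, so
    T = d / (2 * rate).
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2 * rate_per_lineage)


if __name__ == "__main__":
    distance = 0.04  # hypothetical substitutions per site between two species
    # Hypothetical rates in substitutions per site per million years.
    for rate in (0.005, 0.01, 0.02):
        t = divergence_time(distance, rate)
        print(f"rate {rate}: split estimated at {t:.1f} million years ago")
```

The same genetic distance gives a fourfold range of dates across these rates, which is exactly the problem when the true rate differs between branches or between parts of the genome.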

The analytical framework

All of these components are combined into various analytical frameworks or programs, each of which handles the data in different ways. Many of these are Bayesian model-based analyses, which in short generate hypothetical models of evolutionary history and divergence times for the phylogeny and test how well each fits the data provided (i.e. the phylogenetic tree). The algorithm then alters some aspect(s) of the model and tests whether this fits the data better than the previous model and repeats this for potentially millions of simulations to get the best model. Although models are typically a simplification of reality, they are a much more tractable approach to estimating divergence times (as well as a number of other types of evolutionary genetics analyses which incorporate modelling).
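The propose, test and repeat loop described above can be sketched with a toy Metropolis sampler that estimates a single divergence time. This is nowhere near what real dating programs do: it assumes just two sequences, a known per-lineage rate, a Poisson model for the number of differences (with no correction for repeated mutations at a site) and a flat prior on the split time. The function names and every number are hypothetical.

```python
import math
import random


def log_likelihood(t, n_diffs, n_sites, rate):
    """Poisson log-likelihood of the observed differences given split time t.

    Expected differences = 2 * rate * t * n_sites (two lineages).
    """
    expected = 2 * rate * t * n_sites
    return n_diffs * math.log(expected) - expected - math.lgamma(n_diffs + 1)


def sample_divergence_time(n_diffs, n_sites, rate, t_max=10.0,
                           n_steps=20000, step_size=0.3, seed=1):
    """Metropolis sampler for t with a flat prior on (0, t_max)."""
    rng = random.Random(seed)
    current = t_max / 2
    current_ll = log_likelihood(current, n_diffs, n_sites, rate)
    samples = []
    for _ in range(n_steps):
        # Alter the model slightly: propose a nearby divergence time.
        proposal = current + rng.gauss(0, step_size)
        if 0 < proposal < t_max:  # proposals outside the prior are rejected
            proposal_ll = log_likelihood(proposal, n_diffs, n_sites, rate)
            # Accept with probability min(1, likelihood ratio).
            accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
            if rng.random() < accept_prob:
                current, current_ll = proposal, proposal_ll
        samples.append(current)
    return samples[n_steps // 4:]  # discard the first quarter as burn-in


if __name__ == "__main__":
    # Invented data: 40 differences over 1000 sites, rate of 0.01
    # substitutions per site per million years per lineage.
    draws = sorted(sample_divergence_time(40, 1000, 0.01))
    mean = sum(draws) / len(draws)
    lower = draws[int(0.025 * len(draws))]
    upper = draws[int(0.975 * len(draws))]
    print(f"mean {mean:.2f}, 95% interval {lower:.2f} to {upper:.2f} million years")
```

Note that a sampler like this doesn’t return one ‘best’ answer so much as a spread of plausible values, which is where the intervals reported alongside divergence time estimates come from.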

Molecular dating pipeline
A (believe it or not, simplified) pipeline for estimating divergence times from a phylogeny. 1) We obtain our DNA sequences for our samples: in this example, we’ll say each Sample (A-E) is a representative of a single species. We align these together to make sure we’re comparing the same part of the genome across all of them. 2) We estimate the phylogenetic tree for our samples/species. In a Bayesian framework, this means creating simulation models containing a certain substitution model and a given tree model (containing certain topology and branch lengths). Together, these two models form the likelihood model: we then test how well this model explains our data (i.e. the likelihood of getting the patterns in our data if this model were true). We repeat these simulations potentially hundreds of thousands of times until we pinpoint the most likely model we can get. 3) Using our resulting phylogeny, we then calibrate some parts of it based on external information. This could either be by including a radiometrically dated fossil (F) within the phylogeny, or constraining the age of one node based on biogeographic information (the red circle and cross). 4) Using these calibrations as a reference, we then estimate the most likely ages of all the splits in the tree, getting our final dated phylogeny.

Despite the developments in the analytical basis of estimating divergence times in the last few decades, there are still a number of limitations inherent in the process. Many of these relate to the assumptions of the underlying model (such as the correct and accurate phylogenetic tree and the correct estimations of evolutionary rate) used to build the analysis and generate simulations. In the case of calibrations, it is also critical that they are correctly dated based on independent methods: inaccurate radiometric dating of a fossil, for example, could throw out all of the estimations in the entire tree. That said, these factors are intrinsic to any phylogenetic analysis and regularly considered by evolutionary biologists in the interpretations and discussions of results (such as by including confidence intervals of estimations to convey their uncertainty).

Understanding the temporal aspects of evolution and being able to relate them to a real estimate of age is a difficult affair, but an important component of many evolutionary studies. Obtaining good estimates of the timing of divergence of populations and species through molecular dating is but one aspect in building the picture of the history of all organisms, including (and especially) humans.

The many genetic faces of adaptation

The transition from genotype to phenotype

While evolutionary genetics studies often focus on the underlying genetic architecture of species and populations to understand their evolution, we know that natural selection acts directly on physical characteristics. We call these the phenotype; by studying changes in the genes that determine these traits (the genotype), we can take a nuanced approach to studying adaptation. However, looking at genetic changes and relating these to a clear phenotypic trait, and to how and why that trait is under natural selection, can be a difficult task.

One gene for one trait

The simplest (and most widely used) models of understanding the genetic basis of adaptation assume that a single genotype codes for a single phenotypic trait. This means that changes in a single gene (such as outliers that we have identified in our analyses) create changes in a particular physical trait that is under a selective pressure in the environment. This is a useful model because it is statistically tractable to identify a few specific genes of very large effect within our genomic datasets and directly relate these to a trait: adding more complexity exponentially increases the difficulty in detecting patterns (at both the genotypic and phenotypic level).

Single locus figure
An example of a single gene coding for a single phenotypic trait. In this example, the different combination of alleles of the one gene determines the colour of the cat.

Many genes for one trait: polygenic adaptation

Unfortunately, nature is not always convenient and recent findings suggest that the overwhelming majority of the genetics of adaptation operate under what is called ‘polygenic adaptation’. As the name suggests, under this scenario changes (even very small ones) in many different genes combine together to have a large effect on a particular phenotypic trait. Given the often very small magnitude of the genetic changes, it can be extremely difficult to separate adaptive changes in genes from neutral changes due to genetic drift. Likewise, trying to understand how these different genes all combine into a single functional trait is almost impossible, especially for non-model species.

Polygenic adaptation is often seen for traits which are clearly heritable, but don’t show a single underlying gene responsible. Previously, we’ve covered this with the heritability of height: this is one of many examples of ‘quantitative trait loci’ (QTLs). Changes in one QTL (a single gene) cause a small quantitative change in a particular trait; the combined effect of different QTLs together can ‘add up’ (or counteract one another) to result in the final phenotype value.

Height QTL
An example of polygenic quantitative trait loci. In this example, height is partially coded for by a total of ten different genes: the dominant form of each gene (capitals, green) provides more height whereas the recessive form (lowercase, red) doesn’t. The cumulative total of these components determines how tall the person is: the person on the far left was very unlucky and got 0/10 height bonuses and so is the shortest. Progressively from left to right, more genes contribute to the taller height of the people, with the far right person standing tall with the ultimate 10/10 pro-height genes. For reference, height is actually likely to be coded for by thousands of genes, not 10.
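The ‘adding up’ in this figure can be sketched in a few lines of Python. This is a toy illustration only: the function name, the 150 cm base height and the 3 cm bonus per ‘tall’ locus are all invented for the example and are not real estimates.

```python
def polygenic_height(genotype, base_cm=150.0, bonus_cm=3.0):
    """Return a toy height for a string of loci such as 'AbCdEfGhIj'.

    Each uppercase letter is a locus carrying the height-increasing form;
    each lowercase letter adds nothing. All numbers are made up.
    """
    tall_loci = sum(1 for locus in genotype if locus.isupper())
    return base_cm + bonus_cm * tall_loci


shortest = polygenic_height("abcdefghij")  # 0/10 height bonuses
middling = polygenic_height("AbCdEfGhIj")  # 5/10 height bonuses
tallest = polygenic_height("ABCDEFGHIJ")   # 10/10 height bonuses
print(shortest, middling, tallest)
```

No single locus makes much difference on its own here; only the running total separates the shortest person from the tallest.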

The mechanisms which underlie polygenic adaptation can be more complex than simple addition, too. Individual genes might cause phenotypic changes which interact with other phenotypes (and their underlying genotypes) to create a network of changes. We call these interactions ‘epistasis’, where changes in one gene can cause a flow-on effect of changes in other genes based on how their resultant phenotypes interact. We can see this in metabolic pathways: given that a series of proteins are often used in succession within pathways, a change in any single protein in the process could affect every other protein in the pathway. Of course, knowing the exact proteins coded for by every gene, including their physical structure, and how each of those proteins could interact with other proteins is an immense task. Similar to QTLs, this is usually limited to model species which have a long history of research on these specific areas to back up the study. However, some molecular ecology studies are starting to dive into this area by identifying pathways that are under selection instead of individual genes, to give a broader picture of the overall traits that are underlying adaptation.

Labrador epistasis figure
My favourite example of epistasis on coat colour in labradors. Two genes together determine the colour of the coat, with strong interactions between them. The first gene (E/e) determines whether the underlying coat gene (B/b) is masked: two recessive alleles of the first gene (ee) completely block Gene 2 and cause the coat to become golden regardless of the second gene genotype (much like my beloved late childhood pet pictured, Sunny). If the first gene has at least one dominant allele, then the second gene is allowed to express itself. Possessing a dominant allele (BB or Bb) leads to a black lab; possessing two recessive alleles (bb) makes a choc lab!
Labrador epistasis table
The possible combinations of genotypes for the two above genes and the resultant coat colour (indicated by the box colour).
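The labrador table can be written as a tiny lookup function, which makes the masking effect of the first gene explicit. This is a sketch only: the function name is hypothetical, and genotypes are assumed to be two-letter strings such as 'Ee' or 'bb'.

```python
def labrador_coat(e_gene, b_gene):
    """Map the two genotypes (e.g. 'Ee', 'bb') to a coat colour."""
    if "E" not in e_gene:
        # 'ee' masks the second gene entirely, whatever its genotype.
        return "golden"
    if "B" in b_gene:
        # At least one dominant B allele gives a black coat.
        return "black"
    return "chocolate"


for e_gene in ("EE", "Ee", "ee"):
    for b_gene in ("BB", "Bb", "bb"):
        print(e_gene, b_gene, labrador_coat(e_gene, b_gene))
```

Note that the second gene is never even inspected when the first gene is 'ee': that short-circuit is the epistasis.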

One gene for many traits: pleiotropy and differential gene expression

In contrast to polygenic traits, changes in a single gene can also potentially alter multiple phenotypic traits simultaneously. This is referred to as ‘pleiotropy’ and can happen if a gene has multiple different functions within an organism; one particular protein might be a component of several different systems depending on where it is found or how it is arranged. A clear example of pleiotropy is in albino animals: the most common form of albinism is the result of possessing two recessive alleles of a single gene (TYR). The result of this is the absence of the enzyme tyrosinase in the organism, a critical component in the production of melanin. The flow-on phenotypic effects from the recessive gene most obviously cause a lack of pigmentation of the skin (whitening) and eyes (which appear pink), but also other physiological changes such as light sensitivity or total blindness (due to changes in the iris). Behavioural changes in wild field mice have even been attributed to albinism.

Albinism pleiotropy
A very simplified diagram of how one genotype (the albino version of the TYR gene) can lead to a large number of phenotypic changes via pleiotropy (although many are naturally physiologically connected).

Because pleiotropic genes code for several different phenotypic traits, natural selection can be a little more complicated. If some resultant traits are selected against, but others are selected for, it can be difficult for evolution to ‘resolve’ the balance between the two. The overall fitness of the gene is thus dependent on the balance of positive and negative fitness of the different traits, which will determine whether the gene is positively or negatively selected (much like a cost-benefit scenario). Alternatively, some traits which are selectively neutral (i.e. don’t directly provide fitness benefits) may be indirectly selected for if another phenotype of the same underlying gene is selected for.

Multiple phenotypes from a single ‘gene’ can also arise by alternative splicing: when a gene is transcribed from the DNA sequence into RNA (on its way to becoming a protein), the non-coding intron sections within the gene are removed. However, exactly which sections are removed and how the remaining coding exons are joined together in the final sequence can give rise to multiple different protein structures, each with potentially different functions. Thus, a single overarching gene can lead to many different functional proteins. The role of alternative splicing in adaptation and evolution is a rarely explored area of research and its importance is relatively unknown.

Non-genes for traits: epigenetics

This gets more complicated if we consider ‘non-genetic’ aspects underlying the phenotype in what we call ‘epigenetics’. The phrase literally translates as ‘on top of genes’ and refers to chemical attachments to the DNA which control the expression of genes by allowing or resisting the transcription process. Epigenetics is a relatively new area of research, although studies have started to delve into the role of epigenetic changes in facilitating adaptation and evolution. Future research into the relationship between epigenetic changes and adaptive potential might provide more detailed insight into how adaptation occurs in the wild (and might provide a mechanism for adaptation for species with low genetic diversity)!

The different interactions between genotypes, phenotypes and fitness, as well as their complex potential outcomes, inevitably complicate any study of evolution. However, these are important aspects of the adaptation process and to discard them as irrelevant will no doubt reduce our ability to examine and determine evolutionary processes in the wild.

The direction of selection

The nature of adaptation

One of the most fundamental aspects of natural selection and evolution is, of course, the underlying genetic traits that shape the physical, selected traits. Most commonly, this involves trying to understand how changes in the distribution and frequencies of particular genetic variants (alleles) occur in nature and what forces of natural selection are shaping them. Remember that natural selection acts directly on the physical characteristics of species; if these characteristics are genetically determined (which many are), then we can observe the flow-on effects on the genetic diversity of the target species.

Although we might expect that natural selection is a fairly predictable force, there are a myriad of ways it can shape, reduce or maintain genetic diversity and identity of populations and species. In the following examples, we’re going to assume that the mentioned traits are coded for by a single gene with two different alleles for simplicity. Thus, one allele = one version of the trait (and can be used interchangeably). With that in mind, let’s take a look at the three main broad types of changes we observe in nature.

Directional selection

Arguably the most traditional perspective of natural selection is referred to as ‘directional selection’. In this example, natural selection causes one allele to be favoured more than another, which causes it to increase dramatically in frequency compared to the alternative allele. The reverse effect (natural selection pushing against a maladaptive allele) is still covered by directional selection, except that it functions in the opposite way (the allele under negative selection has reduced frequency, shifting towards the alternative allele).

Directional selection diagram
An example of directional selection. In this instance, we have one population of cats and a single phenotypic trait (colour) which ranges from 0 (yellow) to 1 (red). Red colour is selected for above all other colours; the original population has a pretty diverse mix of colours to start. Over time, we can see the average colour of the entire population moves towards more red colours whilst yellow colours start to disappear. Note that although the final population is predominantly red, there is still some (minor) variation in colours. These changes are reflected in the distribution of the colour-coding alleles (right), as it moves towards the red end of the spectrum.
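For the simple one-gene, two-allele case, the shift towards red can be simulated with the standard textbook recursion for allele frequency under selection. This is a minimal sketch that assumes random mating and a very large population (so no drift); the fitness values and the function name are invented for illustration.

```python
def next_red_frequency(p, w_rr, w_ry, w_yy):
    """Frequency of the 'red' allele after one generation of selection.

    p is the current frequency of the red allele; w_rr, w_ry and w_yy are
    the (assumed) relative fitnesses of the three genotypes.
    """
    q = 1.0 - p
    mean_fitness = p * p * w_rr + 2 * p * q * w_ry + q * q * w_yy
    return (p * p * w_rr + p * q * w_ry) / mean_fitness


p = 0.1  # red starts out rare
history = [p]
for _ in range(100):
    p = next_red_frequency(p, w_rr=1.0, w_ry=0.95, w_yy=0.9)
    history.append(p)

# Print every 20th generation to see the steady climb towards red.
print([round(freq, 3) for freq in history[::20]])
```

The red allele climbs every generation but never quite reaches a frequency of 1 within the run, which mirrors the small amount of leftover variation in the figure.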

Balancing selection

Natural selection doesn’t always push allele frequencies in one direction, however, and sometimes maintains the diversity of alleles in the population. This is what happens in ‘balancing selection’ (sometimes also referred to as ‘stabilising selection’, although that term more strictly describes selection for intermediate trait values). In this example, natural selection favours non-extreme allele frequencies, and pushes the distribution of allele frequencies more to the centre. This may happen if deviations from the original gene, regardless of the specific change, can have strongly negative effects on the fitness of an organism, or in genes that are most fit when there is a decent amount of variation within them in the population (such as the MHC region, which contributes to immune response). There are a couple of other reasons balancing selection may occur, though.

Heterozygote advantage

One example is known as ‘heterozygote advantage’. This is when an organism with two different alleles of a particular gene has greater fitness than an organism with two identical copies of either allele. A seemingly bizarre example of heterozygote advantage is related to sickle cell anaemia in African people. Sickle cell anaemia is a serious genetic disorder which is encoded for by recessive alleles of a haemoglobin gene; thus, a person has to carry two copies of the disease allele to show damaging symptoms. While this trait would ordinarily be strongly selected against in many populations, it is maintained in some African populations by the presence of malaria. This seems counterintuitive; why does the presence of one disease maintain another?

Well, it turns out that malaria is not very good at infecting sickle cells; there are a few suggested mechanisms for why but no clear single answer. Naturally, suffering from either sickle cell anaemia or malaria is unlikely to convey fitness benefits. In this circumstance, natural selection actually favours having one sickle cell anaemia allele; while being a carrier isn’t ordinarily as healthy as having no sickle cell alleles, it does actually make the person somewhat resistant to malaria. Thus, in populations where there is a selective pressure from malaria, there is a heterozygote advantage for sickle cell anaemia. For those African populations without likely exposure to malaria, sickle cell anaemia is strongly selected against and less prevalent.

Malaria and sickle diagram
A diagram of how heterozygote advantage works in sickle cell anaemia and malaria resistance. On the top we have our two main traits: the blood cell shape (which has two different alleles; normal and sickle celled) and malaria infection by mosquitoes. Blue circles indicate that the trait has good fitness, whilst red crosses indicate the trait has bad fitness. For the left hand person, having two sickle cell alleles (ss) means they are symptomatic of sickle cell anaemia and are unlikely to have a good quality of life. On the right, having two normal blood cell alleles (SS) means that the person is susceptible to malaria infection. The middle person, however, has only one sickle cell allele (Ss), which means they are asymptomatic but still resistant to malaria. Thus, being heterozygous for sickle cell is actually beneficial over being homozygous in either direction: this is reflected in the distribution of alleles (bottom). The left side is pushed down by sickle cell anaemia whilst the right side is pushed down by malaria, thus causing both blood cell alleles (s and S) to be maintained at an intermediate frequency (i.e. balanced).
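The ‘pushed down from both sides’ idea has a neat textbook result: if carriers (Ss) are the fittest genotype, the sickle allele settles at a frequency of cost_malaria / (cost_anaemia + cost_malaria), where each cost is the fitness lost by the corresponding homozygote. The sketch below checks that against a simple simulation. The cost values and function names are invented for illustration, and the model assumes random mating in a very large population.

```python
def next_sickle_frequency(q, cost_anaemia, cost_malaria):
    """Frequency of the sickle allele (s) after one generation of selection."""
    p = 1.0 - q
    w_SS = 1.0 - cost_malaria  # normal homozygote: susceptible to malaria
    w_Ss = 1.0                 # carrier: the fittest genotype
    w_ss = 1.0 - cost_anaemia  # sickle homozygote: symptomatic
    mean_fitness = p * p * w_SS + 2 * p * q * w_Ss + q * q * w_ss
    return (q * q * w_ss + p * q * w_Ss) / mean_fitness


def run(q, generations, cost_anaemia, cost_malaria):
    """Iterate the recursion and return the final sickle allele frequency."""
    for _ in range(generations):
        q = next_sickle_frequency(q, cost_anaemia, cost_malaria)
    return q


def predicted_equilibrium(cost_anaemia, cost_malaria):
    """Textbook equilibrium frequency of the sickle allele."""
    return cost_malaria / (cost_anaemia + cost_malaria)


# With malaria present, a rare and a common sickle allele end up in the same place.
from_rare = run(0.01, 200, cost_anaemia=0.85, cost_malaria=0.15)
from_common = run(0.50, 200, cost_anaemia=0.85, cost_malaria=0.15)
# Without malaria, there is nothing to stop the allele declining.
no_malaria = run(0.15, 200, cost_anaemia=0.85, cost_malaria=0.0)
print(from_rare, from_common, predicted_equilibrium(0.85, 0.15), no_malaria)
```

Whichever side the allele starts on, selection drags it back to the same intermediate frequency, and removing the malaria cost removes the balance.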

Frequency-dependent selection

Another form of balancing selection is called ‘frequency-dependent selection’, where the fitness of an allele is inversely proportional to its frequency. Thus, once the allele has become common due to selection, the fitness of that allele is reduced and selection will start to favour the alternative allele (which is at much lower frequency). The constant back-and-forth tipping of the selective scales results in both alleles being maintained at an equilibrium.

This can happen in a number of different ways, but often the rarer trait/allele is fundamentally more fit because of its rarity. For example, if one allele allows an individual to use a new food source, it will be very selectively fit due to the lack of competition with others. However, as that allele accumulates within the population and more individuals start to feed on that food source, the lack of ‘uniqueness’ will mean that it’s not particularly better than the original food source. A balance between the two food sources (and thus alleles) will be maintained over time as shifts towards one will make the other more fit, and natural selection will compensate.

Frequency dependent selection diagram
An example of frequency-dependent selection. The colour of the cat indicates both their genotype and their food sources: black cats eat red apples whilst green cats eat green apples (this species has apparently developed herbivory, okay?) To start with, the incredibly low frequency of green cats means that the one green cat can exploit a huge food source compared to black cats. Because of this, natural selection favours green cats. However, in the next generation evolution overcompensates and produces way too many green cats, and now black cats are getting much more food. Natural selection bounces back to favour black cats. Eventually, this causes an equilibrium balance of the two cat types (as shifts one way will cause a shift back the other way immediately after). These changes are reflected in the overall frequency of the two types over time (top right), which eventually evens out. The bottom right figure demonstrates that for both cat types, the frequency of that colour is inversely proportional to the overall fitness (measured as a proxy by amount of food per cat).
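A bare-bones version of the cat example can be simulated by making the fitness of each cat type fall as that type becomes more common. As a simplifying assumption, fitness here declines linearly with frequency (rather than being strictly inversely proportional), the two food sources are equally abundant, and the function name and ‘strength’ value are made up.

```python
def next_green_frequency(f, strength=0.5):
    """Frequency of green cats after one generation.

    Each type's fitness drops linearly as that type becomes more common,
    standing in for 'less food per cat'.
    """
    w_green = 1.0 - strength * f
    w_black = 1.0 - strength * (1.0 - f)
    mean_fitness = f * w_green + (1.0 - f) * w_black
    return f * w_green / mean_fitness


def run(f, generations=100):
    """Iterate the recursion and return the final frequency of green cats."""
    for _ in range(generations):
        f = next_green_frequency(f)
    return f


print(run(0.05), run(0.95))  # rare or common, green cats end up at the same frequency
```

This simple model glides towards the balance point rather than overshooting like the cats in the cartoon, but the outcome is the same: neither type can take over.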

Disruptive selection

A third category of selection (although not as frequently mentioned) is known as ‘disruptive selection’, which is essentially the direct opposite of balancing selection. In this case, both extremes of allele frequencies are favoured (e.g. a frequency of 1 for one allele or 1 for the other) but intermediate frequencies are not. This can be difficult to untangle in natural populations since it could technically be attributed to two different cases of directional selection. Each allele of the same gene is directionally selected for, but in opposite populations and directions so that the overall pattern shows very few intermediates.

In direct contrast to balancing selection, disruptive selection can often be a case of heterozygote disadvantage (although it’s rarely called that). In these examples, it may be that individuals which are not genetically committed to one end or the other of the frequency spectrum are maladapted since they don’t fit in anywhere. An example would be a species that occupies both the desert and a forested area, with little grassland-type habitat in the middle. For the relevant traits, strongly desert-adapted genes would be selected for in the desert and strongly forest-adapted genes would be selected for in the forest. However, the lack of a gradient between the two habitats means that individuals that are half-and-half are poorly adapted to both the desert and the forest. A case of jack-of-all-trades, master of none.

Disruptive selection diagram
The above example of disruptive selection. Bird colour is coded for by a single gene; green birds have a HH genotype, orange birds have a hh genotype, and yellow birds are heterozygotes (Hh). Habitats where the two homozygote colours are most adaptive are found; green birds do well in the forest whereas orange birds do well in the desert. However, there’s no intermediate habitat between the two and so yellow birds don’t really fit well anywhere; they’re outcompeted in the forest and desert by the respective other colours. This means selection favours either extreme (homozygotes), shown in the top right. If we split up the two alleles of the genotype though, we can see that this disruptive selection is really the product of two directionally selective traits working in inverse directions: H is favoured at one end and h at the other.
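The ‘two directional selections in opposite directions’ reading of the bird example can be sketched by running the same selection recursion separately in each habitat. This toy assumes no movement of birds between forest and desert, random mating within each habitat, and entirely invented fitness values; the names are hypothetical.

```python
# Assumed relative fitnesses of (HH, Hh, hh) birds in each habitat.
HABITAT_FITNESS = {
    "forest": (1.0, 0.6, 0.3),  # green HH birds do best
    "desert": (0.3, 0.6, 1.0),  # orange hh birds do best
}


def next_H_frequency(p, w_HH, w_Hh, w_hh):
    """Frequency of the H allele after one generation of selection."""
    q = 1.0 - p
    mean_fitness = p * p * w_HH + 2 * p * q * w_Hh + q * q * w_hh
    return (p * p * w_HH + p * q * w_Hh) / mean_fitness


def simulate(habitat, p=0.5, generations=50):
    """Run selection in one habitat, starting from an even mix of H and h."""
    for _ in range(generations):
        p = next_H_frequency(p, *HABITAT_FITNESS[habitat])
    return p


forest_H = simulate("forest")
desert_H = simulate("desert")
print(forest_H, desert_H)
```

Both habitats start from the same even mix, yet H ends up near 1 in the forest and near 0 in the desert, leaving almost no yellow heterozygotes in either.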

Direction of selection

Although it would be convenient if natural selection was entirely predictable, it often catches us by surprise in how it acts and changes species and populations in the wild. Careful analysis and understanding of the different processes and outcomes of adaptation can feed our overall understanding of evolution, and aid in at least pointing in the right direction for our predictions.

Fantastic Genes and Where to Find Them

The genetics of adaptation

Adaptation and evolution by natural selection remains one of the most significant research questions in many disciplines of biology, and this is undoubtedly true for molecular ecology. While traditional evolutionary studies have been based on the physiological aspects of organisms and how this relates to their evolution, such as how these traits improve their fitness, the genetic component of adaptation is still somewhat elusive for many species and traits.

Hunting for adaptive genes in the genome

We’ve previously looked at the two main categories of genetic variation: neutral and adaptive. So far we’ve focused predominantly on the neutral components of the genome, which answer questions about demographic history, geographic influences and the effect of genetic drift; however, they cannot tell us (directly) about the process of adaptation and natural selective changes in species. To look at this area, we’d have to focus on adaptive variation instead; that is, genes (or other related genetic markers) which directly influence the ability of a species to adapt and evolve. These are directly under natural selection, either positively (‘selected for’) or negatively (‘selected against’).

Given how complex organisms, the environment and genomes can be, it can be difficult to determine exactly what is a real (i.e. strong) selective pressure, how this is influenced by the physical characteristics of the organism (the ‘phenotype’) and which genes are fundamental to the process (the ‘genotype’). Even determining the relevant genes can be difficult; how do we find the needle-like adaptive genes in a genomic haystack?

Magnifying glass figure
If only it were this easy.

There’s a variety of different methods we can use to find adaptive genetic variation, each with particular drawbacks and strengths. Many of these are based on tests of the frequency of alleles, rather than on the exact genetic changes themselves; adaptation works more often by favouring one variant over another rather than completely removing the less-adaptive variant (this would be called ‘fixation’). So measuring the frequency of different alleles is a central component of many analyses.

FST outlier tests

One of the most classical examples is called an ‘FST outlier test’. This can be a bit complicated without understanding what FST actually measures: in short, it’s a statistical measure of ‘population differentiation due to genetic structure’. The FST value between two populations describes how genetically similar they are to one another. An FST value of 1 implies that the two populations are as genetically different as they could possibly be, whilst an FST value of 0 implies that they are genetically identical populations.

Generally, FST reflects neutral genetic structure: it gives a background of how different two populations are, on average. However, if we know what the average amount of genetic differentiation should be for a neutral DNA marker, then we would predict that adaptive markers are significantly different. This is because a gene under selection should be more directly pushed towards or away from one variant (allele) than another, and much more strongly than the neutral variation would predict. Thus, we might assume that alleles which are far more or less frequent than the average pattern are under selection. This is the basis of the FST outlier test; by comparing two or more populations (using FST), and looking at the distribution of allele frequencies, we can pick out a few alleles that vary from the average pattern and suggest that they are under selection (i.e. are adaptive).
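To make this concrete, here is a minimal sketch of the idea for two populations and a handful of two-allele loci. FST is calculated per locus from heterozygosity as (H_total - H_within) / H_total, assuming equally sized populations. The allele frequencies are invented, and the ‘five times the median’ cut-off is a crude stand-in for the proper neutral distribution that a real outlier test would use.

```python
from statistics import median


def fst(p1, p2):
    """FST at one two-allele locus, for two equally sized populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_mean = (p1 + p2) / 2
    h_total = 2 * p_mean * (1 - p_mean)
    if h_total == 0:
        return 0.0  # both populations fixed for the same allele
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Invented allele frequencies (population 1, population 2) per locus.
loci = {
    "locus_1": (0.50, 0.55),
    "locus_2": (0.30, 0.25),
    "locus_3": (0.70, 0.72),
    "locus_4": (0.40, 0.45),
    "colour": (0.10, 0.90),
}

fst_values = {name: fst(p1, p2) for name, (p1, p2) in loci.items()}
threshold = 5 * median(fst_values.values())  # arbitrary toy cut-off
outliers = [name for name, value in fst_values.items() if value > threshold]
print(fst_values, outliers)
```

Most loci sit close to the same low background value, and only the ‘colour’ locus stands well clear of it.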

There are a few significant drawbacks for FST outlier tests. One of the most significant is that genetic drift can also produce a large number of outliers; in a small population, for example, one allele might be fixed (has a frequency of 1, with no alternative allele in the population) simply because there is not enough diversity or population size to sustain more alleles. Even if this particular allele was extremely detrimental, it’d still appear to be favoured by natural selection just because of drift.

Drift leading to outliers diagram
An example of genetic drift leading to outliers, featuring our friends the cat population. Top row: Two cat populations, one small (left; n = 5) and one large (middle, n = 12) show little genetic differentiation between them (right; each triangle represents a single gene or locus; the ‘colour’ gene is marked in green). The average (‘neutral’) pattern of differentiation is shown by the dashed line. Much like in our original example, one cat in the small population is horrifically struck by lightning and dies (RIP again). Now when we compare the frequency of the alleles of the two populations (bottom), we see that (because a green cat died), the ‘colour’ locus has shifted away from the general trend (right) and is now an outlier. Thus, genetic drift in the ‘colour’ gene gives the illusion of a locus under selection (even though natural selection didn’t cause the change, since colour does not relate to how likely a cat is to be struck by lightning).
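How much more easily drift alone fixes (or loses) an allele in a small population can be seen with a simple random-sampling simulation. This is a sketch of a Wright-Fisher-style model with no selection at all; the population sizes, number of generations, replicate count and function names are arbitrary choices for illustration.

```python
import random


def drift(p, n_cats, generations, rng):
    """Let an allele frequency wander by chance alone.

    Each generation, 2 * n_cats gene copies are drawn at random from the
    previous generation's allele frequency.
    """
    copies = 2 * n_cats
    for _ in range(generations):
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
    return p


def fraction_fixed_or_lost(n_cats, replicates=100, generations=30, seed=1):
    """Share of replicate populations where one allele has taken over entirely."""
    rng = random.Random(seed)
    finals = [drift(0.5, n_cats, generations, rng) for _ in range(replicates)]
    return sum(1 for p in finals if p in (0.0, 1.0)) / replicates


small = fraction_fixed_or_lost(n_cats=5)
large = fraction_fixed_or_lost(n_cats=100)
print(small, large)
```

Every replicate starts with both alleles at a frequency of 0.5 and neither is favoured, so any difference between the small and large populations is down to chance.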

Secondly, the cut-off for a ‘significant’ vs. ‘relatively different but possibly not under selection’ can be a bit arbitrary; some genes that are under weak selection can go undetected. Furthermore, recent studies have shown a growing appreciation for polygenic adaptation, where tiny changes in allele frequencies of many different genes combine together to cause strong evolutionary changes. For example, despite the clear heritable nature of height (tall people often have tall children), there is no clear ‘height’ gene: instead, it appears that hundreds of genes are potentially very minor height contributors.

Polygenic height figure final
In this example, we have one tall parent (top) who produces two offspring; one who is tall (left) and one who isn’t (right). In order to understand what genetic factors are contributing to their height differences, we compare their genetics (right; each dot represents a single locus). Although there aren’t any particular loci that look massively different between the two, the cumulative effect of tiny differences (the green triangles) together make one person taller than the other. There are no clear outliers, but many (poly) different genes (genic) acting together.

Genotype-environment associations

To overcome these biases, sometimes we might take a more hypothesis-driven approach called ‘genotype-environment association’ (GEA). This analysis differs in that we select what we think our selective pressures are: often environmental characteristics such as rainfall, temperature, habitat type or altitude. We then take two types of measures per individual organism: the genotype, through DNA sequencing, and the relevant environmental values for that organism’s location. We repeat this over the full distribution of the species, taking a good number of samples per population and making sure we capture the full variation in the environment. Then we perform a correlation-type analysis, which seeks to see if there’s a connection or trend between any particular alleles and any environmental variables. The most relevant variables are often pulled out of the environmental dataset and focused on to reduce noise in the data.
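At its very simplest, the ‘correlation-type analysis’ can be pictured as correlating the frequency of an allele in each population with an environmental value for that population. The sketch below uses a plain Pearson correlation on invented rainfall and allele frequency data, with an arbitrary cut-off of 0.8. Real analyses are considerably more sophisticated and also account for neutral population structure, which this toy ignores.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data: annual rainfall (mm) at five populations, and the frequency
# of one allele at each of two loci in those same populations.
rainfall = [200, 400, 600, 800, 1000]
allele_freqs = {
    "locus_a": [0.10, 0.25, 0.50, 0.70, 0.90],  # tracks rainfall closely
    "locus_b": [0.50, 0.40, 0.55, 0.45, 0.50],  # no obvious trend
}

correlations = {name: pearson(rainfall, freqs) for name, freqs in allele_freqs.items()}
associated = [name for name, r in correlations.items() if abs(r) > 0.8]
print(correlations, associated)
```

Only the locus whose allele frequency rises steadily with rainfall is picked up as a candidate.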

The main benefit of GEA over FST outlier tests is that it’s unlikely to be as strongly influenced by genetic drift. Unless (coincidentally) populations are drifting at the same genes in the same pattern as the environment, the analysis is unlikely to falsely pick it up. However, it can still be confounded by neutral population structure; if one population randomly has a lot of unique alleles or variation, and also occurs in a somewhat unique environment, it can bias the correlation. Furthermore, GEA is limited by the accuracy and relevance of the environmental variables chosen; if we pick only a few, or miss the most important ones for the species, we won’t be able to detect a large number of very relevant (and likely very selective) genes. This is a universal problem in model-based approaches and not just limited to GEA.

New spells to find adaptive genes?

It seems likely that with increasing datasets and better analytical platforms, many more types of analysis will be developed to delve deeper into the adaptive aspects of the genome. With whole-genome sequencing starting to become a reality for non-model species, better annotation of current genomes and a steadily increasing database of functional genes, the ability of researchers to investigate evolution and adaptation at the genomic level is also increasing.

Pseudo or science? Interpreting scientific reports

Telling the real from the fake

The phrase ‘fake news’ seems to get thrown around ad nauseam these days, but there’s a reason for it (besides the original somewhat famous coining of the phrase). Inadvertently bad, or sometimes downright malicious, reporting of various apparent ‘trends’ or ‘patterns’ is rife throughout nearly all forms of media. Particularly, many entirely subjective or blatantly falsified presentations or reports of ‘fact’ cloud real scientific inquiry and its distillation into the broader community. In fact, a recent study has shown that falsified science spreads through social media orders of magnitude faster than real science: so why is this? And how do we tell the real from the fake?

It’s imperative that we understand what real science entails to be able to separate it from the pseudoscience. Of course, scientific rigour and method are always of utmost importance, but these can be hard to detect (or can be effectively lied about through colourful language choices). When reading a scientific article, whether it’s direct from the source (a journal, such as Nature or Science) or secondarily through a media outlet such as the news or online sources, there are a few things that you should always look for that will help discern between the two categories.

Peer-review and adequate referencing

Firstly, is the science presented in an objective, logical manner? Does it systematically demonstrate the study system and question, with the relevant reference to peer-reviewed literature? Good science builds upon the wealth of previously done good science to contribute to a broader field of knowledge; in this way, critical observations and alternative ideas can be compared and contrasted to steer the broader field. Even entirely novel science, which goes against the common consensus, will reference and build upon prior literature and justify the necessity and design of the study. Having written more than one literature review in my life, I can safely assure you that there is no shortage of relevant scientific studies that need to be read, understood and built upon in any future scientific study.

Methods, statistics and sampling

Secondly, is there a solid methodological basis for the science? In almost all cases this will include some kind of statistical measure for the validity (and accuracy) of the results. How does the sample size of the study measure up to the target group? Remember, a study size of 500 people is definitely too small to infer the medical conditions of all humans, but rarely do we get sample sizes that big in evolutionary genetics studies (especially in non-model species). The sampling regime is extremely important for interpreting the results: particularly, keep in mind if there is an inherent bias in the way the sampling has been done. Are some groups more represented than others? Where do the samples come from? What other factors might be influencing the results, based on the origin of the samples?

Cat survey comic 2
Despite having a large sample size, and a significant result (p<0.05), this study cannot conclude that all dogs are awful. It can conclude, however, that cats are statistically significant assholes.

Presentation and language of findings

Thirdly, how does the source present the results? Does it make claims that seem beyond a feasible conclusion based on the study itself? Even if the underlying study is scientific, many secondary sources have a tendency to ‘sensationalise’ the results in order to make them both more appealing and more digestible to the general public. This is only exacerbated by the lack of information about the scientific method of the original paper, actual statistics, or the accurate summation of those statistics. Furthermore, a real scientific study will try to (in most cases) avoid evocative words such as ‘prove’, as a fundamental aspect of science is that no study is 100% ‘proven’ (see falsifiability below). Proofs are a relevant mathematical concept, but these fall under a different category altogether.

Here’s an example: recently, an Australian mainstream media outlet (among many) shared a story about a ‘recent’ (six months old) study that found that second-born children are more likely to be criminals and first-born children have higher IQ. As you might expect, the original study does not imply that being born second will make you a sudden murderer nor will being the first born make you a prodigy. Instead, the authors suggest that there may be a link between differential parental investment/attention (between different age order children) as a potential mechanism. They ruled out, based on a wealth of statistics, the influence of alternative factors such as health or education (both in quality and quantity). Thus, there is a correlative (read: not causative) effect of age on these characteristics. If you directly interpreted the newscast (or read some of the misguided comments), you might think otherwise.

Falsifiability 

Fourthly, are the hypotheses in the study falsifiable? One of the foundations of the modern scientific method includes the requirement of any real scientific hypothesis to be falsifiable; that is, there must be a way to show evidence against that hypothesis. This can be difficult to evaluate, but is why some broad philosophical questions are considered ‘unscientific’. A classic example is the phrase “all swans are white”, which was apparently historically believed in Europe (where there are no black swans). This statement is technically falsifiable, since if one found a non-white swan it would ‘disprove’ the hypothesis. Lo and behold, Europeans arrive in Australia and find that, actually, some swans are black. The original statement was thus falsified.

Swan comic 2
Well, I’ll be damned falsified. Just pretend the swan is actually black: I don’t have enough ink to make it realistic…

The role of the peer: including you!

Peer-review is a critical aspect of the scientific process, and despite some conspiracy-theory-esque remarks about the secret Big Science Society, it generally works. While independent people inevitably have their own personal biases and are naturally subjective to some degree (no matter how hard we may try to be objective), a larger number of well-informed, critical thinkers helps to broaden the focus and perspective surrounding any scientific subject. Remember, nothing is more critical of science than science itself.

Peer review comic
One of the most apt representations of peer-review I’ve ever seen, from Dr. Nick D. Kim (PhD). Source: here.

While peer-review is technically aimed at other scientists as a way to steer and inform research, the input of outsider, non-specialist readers can still be informative. Looking closely at science, and better understanding both how it is done and what it is showing, can help us evaluate how valuable science is to broader society and shift scientific information into useful, everyday applications. Furthermore, by educating ourselves on what is real science, and what is disruptive drivel, we can aid the development of science and reduce the slowing impact of misinformation and deceit.

 

 

Evolution and the space-time continuum

Evolution travelling in time

As I’ve mentioned a few times before, evolution is a constant force that changes and flows over time. While sometimes it’s more convenient to think of evolution as a series of rather discrete events (a species pops up here, a population separates here, etc.), it’s really a more continual process. The context and strength of evolutionary forces, such as natural selection, change as species and the environments they inhabit also change. This is important to remember in evolutionary studies because although we might think of more recent and immediate causes of the evolutionary changes we see, they might actually reflect much more historic patterns. For example, the extremely low contemporary levels of genetic diversity in cheetahs are likely largely due to a severe reduction in their numbers during the last ice age, ~12 thousand years ago (that’s not to say that modern human issues haven’t also been seriously detrimental to them). Similarly, we can see how the low genetic diversity of a small population colonising a new area can have long-term effects on its genetic variation: this is called the ‘founder effect’. Because of this, we often have to consider the temporal aspect of a species’ evolution.

Founder effect diagram
An example of founder effect. Each circle represents a single organism; the different colours are an indicator of how much genetic diversity that individual possesses (more colours = more variation). We start with a single population; one (A) or two (B) individuals go on a vacation and decide to stay on a new island. Even after the population has become established and grows over time, it takes a long time for new diversity to arise. This is because of the small original population size and genetic diversity; this is called founder effect. The more genetic diversity in the settled population (e.g. B vs A), the faster new diversity arises and the weaker the founder effect.
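
To put some rough numbers on the idea, here is a minimal sketch of founders being picked at random from a source population. It is a deliberately simplified toy and not the model behind the figure: the source population, its ten allele labels (a to j) and the function names are all made up for illustration, and each individual is assumed to carry a single allele.

```python
import random

def found_population(source, n_founders, rng):
    """Pick n_founders individuals at random (without replacement) from source."""
    return rng.sample(source, n_founders)

def allele_count(population):
    """Number of distinct alleles present in a population."""
    return len(set(population))

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical source population: 100 individuals, one allele label each,
# with ten equally common alleles (a to j).
source = [allele for allele in "abcdefghij" for _ in range(10)]

for n_founders in (1, 2, 10, 50):
    founders = found_population(source, n_founders, rng)
    print(n_founders, "founders carry", allele_count(founders),
          "of", allele_count(source), "alleles")
```

A single founder can only ever carry one of the ten alleles, so everything else has to re-emerge through new mutation; larger founding groups tend to bring more of the original diversity with them.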

Evolution travelling across space

If the environmental context of species and populations is also important for determining the evolutionary pathways of organisms, then we must also consider the spatial context. Because of this, we also need to look at where evolution is happening in the world; what kinds of geographic, climatic, hydrological or geological patterns are shaping and influencing the evolution of species? These patterns can influence both neutral and adaptive processes by shaping exactly how populations or species exist in nature; how connected they are, how many populations they can sustain, how large those populations can sustainably become, and what kinds of selective pressures those populations are under.

Allopatry diagram
An example of how the environment (in this case, geology) can have both neutral and adaptive effects. Let’s say we start with one big population of cats (N = 9; A), which is distributed over a single large area (the green box). However, a sudden geological event causes a mountain range to uplift, splitting the population in two (B). Because of the reduced population size and the (likely) randomness of which individuals are on each side, we expect some impact of genetic drift. Thus, this is the neutral influence. Over time, these two separated regions might change climatically (C), with one becoming much more arid and dry (right) and the other more wet and shady (left). Because of the difference of the selective environment, the two populations might adapt differently. This is the adaptive influence. 

Evolution along the space-time continuum

Given that the environment also changes over time (and can do so very rapidly, as we’ve seen recently), the interaction of the spatial and temporal aspects of evolution is critical in understanding the true evolutionary history of species. As we know, the selective environment is what determines what is, and isn’t, adaptive (or maladaptive), so we can easily imagine how a change in the environment could push changes in species. Even from a neutral perspective, geography is important to consider since it can directly determine which populations are or aren’t connected, how many populations there are in total or how big populations can sustainably get. It’s always important to consider how evolution travels along the space-time continuum.

Genetics TARDIS
“Postgraduate Student Who” doesn’t quite have the same ring to it, unfortunately.

Phylogeography

The field of evolutionary science most concerned with these two factors and how they influence evolution is known as ‘phylogeography’, which I’ve briefly mentioned in previous posts. In essence, phylogeographers are interested in how the general environment (e.g. geology, hydrology, climate, etc.) has influenced the distribution of genealogical lineages. That’s a bit of a mouthful and seems a bit complicated, but the genealogical part is important; phylogeography has a keen basis in evolutionary genetics theory and analysis, and explicitly uses genetic data to test patterns of historic evolution. Simply testing the association between broad species or populations and their environment, without the genetic background, falls under the umbrella field of ‘biogeography’. Semantics, but important.

Birds phylogeo
Some example phylogeographic models created by Zamudio et al. (2016). For each model, there’s a demonstrated relationship between genealogical lineages (left) and the geographic patterns (right), with the colours of the birds indicating some trait (let’s pretend they’re actually super colourful, as birds are). As you can see, depending on which model you look at, you will see a different evolutionary pattern; for example, model shows specific lineages that are geographically isolated from one another each evolved their own colour. This contrasts with in that each colour appears to have evolved once in each region based on the genetic history.

For phylogeography, the genetic history of populations or species gives the more accurate overview of their history; it allows us to test when populations or species became separated, which were most closely related, and whether patterns are similar or different across other taxonomic groups. Predominantly, phylogeography is based on neutral genetic variation, as using adaptive variation can confound the patterns we are testing. Additionally, since neutral variation changes over time in a generally predictable, mathematical format (see this post to see what I mean), we can make testable models of various phylogeographic patterns and see how well our genetic data makes sense under each model. For example, we could make a couple different models of how many historic populations there were and see which one makes the most sense for our data (with a statistical basis, of course). This wouldn’t work with genes under selection since they (by their nature) wouldn’t fit a standard ‘neutral’ model.

Coalescent
If it looks mathematically complicated, it’s because it is. This is an example of the coalescent from Brito & Edwards, 2008: a method that maps genes back in time (the different lines) to see where the different variants meet at a common ancestor. These genes are nested within the history of the species as a whole (the ‘tubes’), with many different variables accounted for in the model.

That said, there are plenty of interesting scientific questions within phylogeography that look at exploring the adaptive variation of historic populations or species and how this has influenced their evolution. Although this can’t inherently be built into the same models as the neutral patterns, looking at candidate genes that we think are important for evolution and seeing how their distributions and patterns relate to the overall phylogeographic history of the species is one way of investigating historic adaptive evolution. For example, we might track changes in adaptive genes by seeing which populations have which variants of the gene and referring to our phylogeographic history to see how and when these variants arose. This can help us understand how phylogeographic patterns have influenced the adaptive evolution of different populations or species, or inversely, how adaptive traits might have influenced the geographic distribution of species or populations.

Where did you come from and where will you go?

Phylogeographic studies can tell us a lot about the history of a species, and particularly how that relates to the history of the Earth. All organisms share an intimate relationship with their environment, both over time and space, and keeping this in mind is key for understanding the true evolutionary history of life on Earth.

 

Drifting or driving: directionality in evolution

How random is evolution?

Often, we like to think of evolution fairly anthropomorphically; as if natural selection actively decides what is, and what isn’t, best for the evolution of a species (or population). Of course, there’s not some explicit Evolution God who decrees how a species should evolve, and in reality, evolution reflects a more probabilistic system. Traits that give a species a better chance of reproducing or surviving, and can be inherited by the offspring, will over time become more and more dominant within the species; contrastingly, traits that do the opposite will be ‘weeded out’ of the gene pool as maladaptive organisms die off or are outcompeted by more ‘fit’ individuals. The fitness value of a trait can be determined from how much the frequency of that trait varies over time.

So, if natural selection is just probabilistic, does this mean evolution is totally random? Is it just that traits are selected based on what just happens to survive and reproduce in nature, or are there more direct mechanisms involved? Well, it turns out both processes are important to some degree. But to get into it, we have to explain the difference between genetic drift and natural selection (we’re assuming here that our particular trait is genetically determined).  

Allele frequency over time diagram
The (statistical) overview of natural selection. In this example, we have two different traits in a population; the blue X and the red O. Our starting population is 20 individuals (N), with 10 of each trait (a 1:1 ratio, or 50% frequency of each). We’re going to assume that, because the blue X is favoured by natural selection, it doubles in frequency each generation (i.e. one individual with the blue X has two offspring with one blue X each). The red O is neither here nor there and is stable over time (one red O produces one red O in the next generation). So, going from Gen 1 to Gen 2, we have twice as many blue Xs (Nt) as we did previously, changing the overall frequency of the traits (highlighted in yellow). Because populations probably don’t exponentially increase every generation, we’ll cut it back down to our original total of 20, but at the same ratios (Np). Over time, we can see that the population gradually accumulates more blue Xs relative to red Os, and by Gen 5 the red O is extinct. Thus, the blue X has evolved!
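
The bookkeeping in this example can be sketched in a few lines of code. This is only an illustrative toy that tracks frequencies rather than whole individuals: the function names are my own, and the offspring numbers (two per blue X, one per red O) are the ones assumed in the caption above.

```python
def next_frequency(p, w_blue=2.0, w_red=1.0):
    """Frequency of the blue X after one generation of selection.

    p is the current blue X frequency; w_blue and w_red are the number of
    offspring left by each blue X and each red O individual.
    """
    blue = p * w_blue
    red = (1.0 - p) * w_red
    # Dividing by the new total is the 'cut back to the original size' step:
    # the ratios are kept, only the scale changes.
    return blue / (blue + red)

def trajectory(p0, generations):
    """List of blue X frequencies, starting at p0, one entry per generation."""
    freqs = [p0]
    for _ in range(generations - 1):
        freqs.append(next_frequency(freqs[-1]))
    return freqs

for gen, p in enumerate(trajectory(0.5, 5), start=1):
    print(f"Gen {gen}: blue X = {p:.3f}, red O = {1 - p:.3f}")
```

Because this version only follows frequencies, the red O shrinks towards zero without ever quite reaching it; in a real, finite population made of whole individuals, a trait is lost once its count drops below one.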

When we consider the genetic variation within a species to be our focal trait, we can tell that different parts of the genome might be more closely tied to natural selection than others. This makes sense; some mutations in the genome will directly change a trait (like fur colour) which might have a selective benefit or detriment, while others might not change anything physically or change traits that are neither here nor there under natural selection (like nose shape in people, for example). We can distinguish between these two by talking about adaptive or neutral variation; adaptive variation has a direct link to natural selection whilst neutral variation is predominantly the product of genetic drift. Depending on our research questions, we might focus on one type of variation over the other, but both are important components of evolution as a whole.

Genetic drift

Genetic drift is considered the random, selectively ‘neutral’ changes in the frequencies of different traits (alleles) over time, due to completely random effects such as random mutations or random loss of alleles. This results in the neutral variation we can observe in the gene pool of the species. Changes in allele frequencies can happen due to entirely stochastic events. If, by chance, all of the individuals with the blue fur variant of a gene are struck by lightning and die, the blue fur allele would end up with a frequency of 0, i.e. go extinct. That’s not to say the blue fur ‘predisposed’ the individuals to be struck by lightning (we assume here, anyway), so it’s not like it was ‘targeted against’ by natural selection (see the bottom figure for this example).

Because neutral variation appears under a totally random, probabilistic model, the mathematical basis of it (such as the rate at which mutations appear) has been well documented and is the foundation of many of the statistical aspects of molecular ecology. Much of our ability to detect which genes are under selection is by seeing how much the frequencies of alleles of that gene vary from the neutral model: if one allele is way more frequent than you’d expect by random genetic drift, then you’d say that it’s likely being ‘pushed’ by something: natural selection.

Manhattan plot example
A Manhattan plot, which measures the level of genetic differentiation between two different groups across the genome. The x-axis shows the length of the genome, in this example colour-coded by the specific chromosome of the sequence, while the y-axis shows the level of differentiation between the two groups being studied. The dots represent certain spots (loci, singular locus) in the genome, with the level of differentiation (Fst) measured for that locus in one group vs that locus in the other group. The dotted line represents the ‘average differentiation’: i.e. how different you’d expect the two groups to be by chance. Anything above that line is significantly different between the two groups, either because of drift or natural selection. This plot has been slightly adapted from Axelsson et al. (2013), who were studying domestication in dogs by comparing the genetic architecture of wild wolves versus domestic dogs. In this example we can see that certain regions of the genome are clearly different between dogs and wolves (circled); when the authors looked at the genes within those blocks, they found that many were related to behavioural changes (nervous system), competitive breeding (sperm-egg recognition) and interestingly, starch digestion. This last category suggests that adaptation to an omnivorous diet (likely human food waste) was key in the domestication process.
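
For a feel of what a single dot on a plot like this represents, here is a minimal sketch of one textbook way to calculate Fst at one locus with two alleles, using the allele frequency in each of two equally weighted groups. It is not necessarily the estimator used by Axelsson et al. (real studies use more refined versions that correct for sample size), and the locus names and frequencies below are invented purely for illustration.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity at a two-allele locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def fst(p1, p2):
    """Fst at one locus from the allele frequency in each of two groups.

    Uses Fst = (Ht - Hs) / Ht, weighting both groups equally.
    """
    # Hs: average heterozygosity within the groups.
    hs = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    # Ht: heterozygosity if both groups were pooled into one.
    ht = expected_heterozygosity((p1 + p2) / 2.0)
    if ht == 0.0:
        return 0.0  # locus is fixed for the same allele in both groups
    return (ht - hs) / ht

# Hypothetical allele frequencies (group 1, group 2) at three loci.
loci = {
    "locus_1": (0.50, 0.50),
    "locus_2": (0.40, 0.60),
    "locus_3": (0.05, 0.95),
}
for name, (p1, p2) in loci.items():
    print(name, round(fst(p1, p2), 3))
```

Loci where the two groups have similar allele frequencies sit near zero, while loci where each group is close to carrying a different allele approach one; those are the kinds of points that would stand out above the dotted line.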

Natural selection

In contrast to genetic drift, natural selection is when particular traits are directly favoured (or disfavoured) in the environmental context of the population; natural selection is very specific to both the actual trait and how the trait works. A trait is only selected for if it conveys some kind of fitness benefit to the individual; in evolutionary genetics terms, this means it allows the individual to have more offspring or to survive better (usually).

While this might be true for a trait in a certain environment, in another it might be irrelevant or even have the reverse effect. Let’s again consider white fur as our trait under selection. In an arctic environment, white fur might be selected for because it helps the animal to camouflage against the snow to avoid predators or catch prey (and therefore increase survivability). However, in a dense rainforest, white fur would stand out starkly against the shadowy greenery of the foliage and thus make the animal a target, making it more likely to be taken by a predator or avoided by prey (thus decreasing survivability). Thus, fitness is very context-specific.

Who wins? Drift or selection?

So, which is mightier, the pen (drift) or the sword (selection)? Well, it depends on a large number of different factors such as mutation rate, the importance of the trait under selection, and even the size of the population. This last one might seem a little different to the other two, but it’s critically important to which process governs the evolution of the species.

In very small populations, we expect genetic drift to be the stronger process. Natural selection is often comparatively weaker because small populations have less genetic variation for it to act upon; there are fewer choices of gene variants that might be more beneficial than others. In severe cases, many of the traits are probably very maladaptive, but there’s just no better variant to be selected for; look at the plethora of physiological problems in the cheetah for some examples.

Genetic drift, however, doesn’t really care if there’s “good” or “bad” variation, since it’s totally random. That said, it tends to be stronger in smaller populations because a small, random change in the number or frequency of alleles can have a huge effect on the overall gene pool. Let’s say you have 5 cats in your species; they’re nearly extinct, and probably have very low genetic diversity. If one cat suddenly dies, you’ve lost 20% of your species (and up to that percentage of your genetic variation). However, if you had 500 cats in your species, and one died, you’d lose only 0.2% of your species (and at most that much of your genetic variation), and the gene pool would barely even notice. The same applies to random mutations, or if one unlucky cat doesn’t get to breed because it can’t find a mate, or any other random, non-selective reason. One way we can think of this is as ‘random error’ with evolution; even a perfectly adapted organism might not pass on its genes if it is really unlucky. A bigger sample size (i.e. more individuals) means this will have less impact on the total dataset (i.e. the species), though.

Drift in small pops
The effect of genetic drift on small populations. In this example, we have two very similar populations of cats, each with three different alleles (black, blue and green) in similar frequencies across the populations. The major difference is the size of the population; the left is much smaller (5 cats) compared to the right (20 cats). If one cat randomly dies from a bolt of lightning (RIP), and assuming that the colour of the cat has no effect on the likelihood of being struck by lightning (i.e. is not under natural selection), then the outcome of this event is entirely due to genetic drift. In this case, the left population has lost 1/5th of its population size and 1/3rd of its total genetic diversity thanks to the death of the genetically unique blue cat (He will be missed) whereas the right population has only really lost 1/20th of its size and no changes in total diversity (it’ll recover).
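
The same idea can be played out over many generations with a small simulation. This is a rough sketch under simplifying assumptions that are mine rather than anything from the figure: each individual carries one copy of a gene with two alleles, generations don’t overlap, and the next generation is built by drawing alleles at random from the current one (no selection at all). The population sizes, number of generations and number of runs are arbitrary.

```python
import random

def drift(n_individuals, p0, generations, rng):
    """Allele frequency after some generations of pure drift.

    Each generation, every individual in the new population inherits the
    focal allele with probability equal to its current frequency.
    """
    p = p0
    for _ in range(generations):
        copies = sum(1 for _ in range(n_individuals) if rng.random() < p)
        p = copies / n_individuals
    return p

rng = random.Random(42)  # fixed seed so the run is repeatable

for n in (5, 500):
    # Start every run at 50% frequency and let chance do the rest.
    finals = [drift(n, 0.5, 20, rng) for _ in range(100)]
    lost_or_fixed = sum(1 for p in finals if p in (0.0, 1.0))
    print(f"N = {n}: allele lost or fixed in {lost_or_fixed} of 100 runs")
```

In the tiny population, chance alone should push the allele to either 0% or 100% in most runs, whereas in the larger population the frequency should mostly just wobble around where it started.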

Both genetic drift and natural selection are important components of evolution, and together shape the overall patterns of evolution for any given species on the planet. The two processes can even feed into one another; random mutations (drift) might become the genetic basis of new selective traits (natural selection) if the environment changes to suit the new variation. Therefore, to ignore one in favour of the other would fail to capture the full breadth of the processes which ultimately shape and determine the evolution of all species on Earth, and thus the formation of the diversity of life.