Lost in a forest of (gene) trees

Using genetics to understand species history

The idea of using the genetic sequences of living organisms to understand the evolutionary history of species is a concept much repeated on The G-CAT. And it’s a fundamental one in phylogenetics, taxonomy and evolutionary biology. Often, we try to analyse the genetic differences between individuals, populations and species in a tree-like manner, with close tips being similar and more distantly separated branches being more divergent. However, this runs on one very key assumption; that the patterns we observe in our study genes matches the overall patterns of species evolution. But this isn’t always true, and before we can delve into that we have to understand the difference between a ‘gene tree’ and a ‘species tree’.

A gene tree or a species tree?

Our typical view of a phylogenetic tree is actually one of a ‘gene tree’, where we analyse how a particular gene (or set of genes) have changed over time between different individuals (within and across populations or species) based on our understanding of mutation and common ancestry.

However, a phylogenetic tree based on a single gene only demonstrates the history of that gene. What we assume in most cases is that the history of that gene matches the history of the species: that branches in the genetic tree mirror when different splits in species occurred throughout history.

The easiest way to conceptualise gene trees and species trees is to think of individual gene trees that are nested within an overarching species tree. In this sense, individual gene trees can vary from one another (substantially, even) but by looking at the overall trends of many genes we can see how the genome of the species have changed over time.

Gene tree incongruence figure
A (potentially familiar) depiction of individual gene trees (coloured lines) within the broader species tree (defined b the black boundaries). As you might be able to tell, the branching patterns of the different genes are not the same, and don’t always match the overarching species tree.

Gene tree incongruence

Different genes may have different patterns for a number of reasons. Changes in the genetic sequences of organisms over time don’t happen equally across the entire genome, and very specific parts of the genome can evolve in entirely different directions, or at entirely different rates, than the rest of the genome. Let’s take a look at a few ways we could have conflicting gene trees in our studies.

Incomplete lineage sorting

One of the most prolific, but more complicated, ways gene trees can vary from their overarching species tree is due to what we call ‘incomplete lineage sorting’. This is based on the idea that species and the genes that define them are constantly evolving over time, and that because of this different genes are at different stages of divergence between population and species. If we imagine a set of three related populations which have all descended from a single ancestral population, we can start to see how incomplete lineage sorting could occur. Our ancestral population likely has some genetic diversity, containing multiple alleles of the same locus. In a true phylogenetic tree, we would expect these different alleles to ‘sort’ into the different descendent populations, such that one population might have one of the alleles, a second the other, and so on, without them sharing the different alleles between them.

If this separation into new populations has been recent, or if gene flow has occurred between the populations since this event, then we might find that each descendent population has a mixture of the different alleles, and that not enough time has passed to clearly separate the populations. For this to occur, sufficient time for new mutations to occur and genetic drift to push different populations to differently frequent alleles needs to happen: if this is too recent, then it can be hard to accurately distinguish between populations. This can be difficult to interpret (see below figure for a visualisation of this), but there’s a great description of incomplete lineage sorting here.

ILS_adaptedfigure
A demonstration of incomplete lineage sorting, generously adapted from a talk by fellow MELFU postdocs Dr Yuma (Jonathon) Sandoval-Castillo and Dr Catherine Attard. On the left is a depiction of a single gene coalescent tree over time: circles represent a single individual at a particular point in time (row) with the colours representing different alleles of that same gene. The tree shows how new mutations occur (colour changes along the branches) and spread throughout the descendent populations. In this example, we have three recently separated species, with a good number of different alleles. However, when we study these alleles in tree form (the phylogeny on the right), we see that the branches themselves don’t correlate well with the boundaries of the species. For example, the teal allele found within Species C is actually more similar to Species B alleles (purple and blue) than any other Species B alleles, based on the order and patterns of these mutations.

Hybridisation and horizontal transfer

Another way individual genes may become incongruent with other genes is through another phenomenon we’ve discussed before: hybridisation (or more specifically, introgression). When two individuals from different species breed together to form a ‘hybrid’, they join together what was once two separate gene pools. Thus, the hybrid offspring has (if it’s a first generation hybrid, anyway) 50% of genes from Species A and 50% of genes from Species B. In terms of our phylogenetic analysis, if we picked one gene randomly from the hybrid, we have 50% of picking a gene that reflects the evolutionary history of Species A, and 50% chance of picking a gene that reflects the evolutionary history of Species B. This would change how our outputs look significantly: if we pick a Species A gene, our ‘hybrid’ will look (genetically) very, very similar to Species A. If we pick a Species B gene, our ‘hybrid’ will look like a Species B individual instead. Naturally, this can really stuff up our interpretations of species boundaries, distributions and identities.

Hybridisation_figure
An example of hybridisation leading to gene tree incongruence with our favourite colourful fishA) We have a hybridisation event between a red fish (Species A) and a green fish (Species B), resulting in a hybrid species (‘Species’ H). The red fish genome is indicated by the yellow DNA, the green fish genomes by the blue DNA, and the hybrid orange fish has a mixture of these two. B) If we sampled one set of genes in the hybrid, we might select a gene that originated from the red fish, showing that the hybrid is identical (or very similar) the Species A. D) Conversely, if we sampled a gene originating from the green fish, the resultant phylogeny might show that the hybrid is the same as Species B. C) If we consider these two patterns in combination, which see the true pattern of species formation, which is not a clear dichotomous tree and rather a mixture of the two sets of trees.

Paralogous genes

More confusingly, we can even have events where a single gene duplicates within a genome. This is relatively rare, although it can have huge effects: for example, salmon have massive genomes as the entire thing was duplicated! Each version of the gene can take on very different forms, functions, and evolve in entirely different ways. We call these duplicated variants paralogous genes: genes that look the same (in terms of sequence), but are totally different genes.

This can have a profound impact as paralogous genes are difficult to detect: if there has been a gene duplication early in the evolutionary history of our phylogenetic tree, then many (or all) of our study samples will have two copies of said gene. Since they look similar in sequence, there’s all possibility that we pick Variant 1 in some species and Variant 2 in other species. Being unable to tell them apart, we can have some very weird and abstract results within our tree. Most importantly, different samples with the same duplicated variant will seem similar to one another (e.g. have evolved from a common ancestor more recently) than it will to any sample of the other variant (even if they came from the exact same species)!

Paralogy_figure.jpg
An example of how paralogous genes can confound species tree. We start with a single (purple) gene: at a particular point in time, this gene duplicates into a red and a blue form. Each of these genes then evolve and spread into four separate descendent species (A, B, C and D) but not in entirely the same way. However, since both the red and blue genetic sequences are similar, if we took a single gene from each species we might (somewhat randomly) sequence either the red or the blue copy. The different phylogenetic trees on the right demonstrate how different combinations of red and blue genes give very different patterns, since all blue copies will be more related to other blue genes than to the red gene of the same species. E.g. a blue A and a blue C are more similar than a blue A and a red A.

Overcoming incongruence with genomics

Although a tricky conundrum in phylogenetics and evolutionary genetics broadly, gene tree incongruence can largely be overcome with using more loci. As the random changes of any one locus has a smaller effect of the larger total set of loci, the general and broad patterns of evolutionary history can become clearer. Indeed, understanding how many loci are affected by what kind of process can itself become informative: large numbers of introgressed loci can indicate whether hybridisation was recent, strong, or biased towards one species over another, for example. As with many things, the genomic era appears poised to address the many analytical issues and complexities of working with genetic data.

An identity crisis: using genomics to determine species identities

This is the fourth (and final) part of the miniseries on the genetics and process of speciation. To start from Part One, click here.

In last week’s post, we looked at how we can use genetic tools to understand and study the process of speciation, and particularly the transition from populations to species along the speciation continuum. Following on from that, the question of “how many species do I have?” can be further examined using genetic data. Sometimes, it’s entirely necessary to look at this question using genetics (and genomics).

Cryptic species

A concept that I’ve mentioned briefly previously is that of ‘cryptic species’. These are species which are identifiable by their large genetic differences, but appear the same based on morphological, behavioural or ecological characteristics. Cryptic species often arise when a single species has become fragmented into several different populations which have been isolated for a long time from another. Although they may diverge genetically, this doesn’t necessarily always translate to changes in their morphology, ecology or behaviour, particularly if these are strongly selected for under similar environmental conditions. Thus, we need to use genetic methods to be able to detect and understand these species, as well as later classify and describe them.

Cryptic species fish
An example of cryptic species. All four fish in this figure are morphologically identical to one another, but they differ in their underlying genetic variation (indicated by the different colours of DNA). Thus, from looking at these fish alone we would not perceive any differences, but their genetic make-up might suggest that there are more than one species…
Cryptic species heatmap example
The level of genetic differentiation between the fish in the above example. The phylogenies on the left and top of the figure demonstrate the evolutionary relationships of these four fish. The matrix shows a heatmap of the level of differences between different pairwise comparisons of all four fish: red squares indicate zero genetic differences (such as when comparing a fish to itself; the middle diagonal) whilst yellow squares indicate increasingly higher levels of genetic differentiation (with bright yellow = all differences). By comparing the different fish together, we can see that Fish 1 and 2, and Fish 3 and 4, are relatively genetically similar to one another (red-deep orange). However, other comparisons show high level of genetic differences (e.g. 1 vs 3 and 1 vs 4). Based on this information, we might suggest that Fish 1 and 2 belong to one cryptic species (A) and Fish 3 and 4 belong to a second cryptic species (B).

Genetic tools to study species: the ‘Barcode of Life’

A classically employed method that uses DNA to detect and determine species is referred to as the ‘Barcode of Life’. This uses a very specific fragment of DNA from the mitochondria of the cell: the cytochrome c oxidase I gene, CO1. This gene is made of 648 base pairs and is found pretty well universally: this and the fact that CO1 evolves very slowly make it an ideal candidate for easily testing the identity of new species. Additionally, mitochondrial DNA tends to be a bit more resilient than its nuclear counterpart; thus, small or degraded tissue samples can still be sequenced for CO1, making it amenable to wildlife forensics cases. Generally, two sequences will be considered as belonging to different species if they are certain percentage different from one another.

Annotated mitogeome
The full (annotated) mitochondrial genome of humans, with the different genes within it labelled. The CO1 gene is labelled with the red arrow (sometimes also referred to as COX1) whilst blue arrows point to other genes often used in phylogenetic or taxonomic studies, depending on the group or species in question.

Despite the apparent benefits of CO1, there are of course a few drawbacks. Most of these revolve around the mitochondrial genome itself. Because mitochondria are passed on from mother to offspring (and not at all from the father), it reflects the genetic history of only one sex of the species. Secondly, the actual cut-off for species using CO1 barcoding is highly contentious and possibly not as universal as previously suggested. Levels of sequence divergence of CO1 between species that have been previously determined to be separate (through other means) have varied from anywhere between 2% to 12%. The actual translation of CO1 sequence divergence and species identity is not all that clear.

Gene tree – species tree incongruences

One particularly confounding aspect of defining species based on a single gene, and with using phylogenetic-based methods, is that the history of that gene might not actually be reflective of the history of the species. This can be a little confusing to think about but essentially leads to what we call “gene tree – species tree incongruence”. Different evolutionary events cause different effects on the underlying genetic diversity of a species (or group of species): while these may be predictable from the genetic sequence, different parts of the genome might not be as equally affected by the same exact process.

A classic example of this is hybridisation. If we have two initial species, which then hybridise with one another, we expect our resultant hybrids to be approximately made of 50% Species A DNA and 50% Species B DNA (if this is the first generation of hybrids formed; it gets a little more complicated further down the track). This means that, within the DNA sequence of the hybrid, 50% of it will reflect the history of Species A and the other 50% will reflect the history of Species B, which could differ dramatically. If we randomly sample a single gene in the hybrid, we will have no idea if that gene belongs to the genealogy of Species A or Species B, and thus we might make incorrect inferences about the history of the hybrid species.

Gene tree incongruence figure
A diagram of gene tree – species tree incongruence. Each individual coloured line represents a single gene as we trace it back through time; these are mostly bound within the limits of species divergences (the black borders). For many genes (such as the blue ones), the genes resemble the pattern of species divergences very well, albeit with some minor differences in how long ago the splits happened (at the top of the branches). However, the red genes contrast with this pattern, with clear movement across species (from and into B): this represents genes that have been transferred by hybridisation. The green line represents a gene affected by what we call incomplete lineage sorting; that is, we cannot trace it back far enough to determine exactly how/when it initially diverged and so there are still two separate green lines at the very top of the figure. You can think of each line as a separate phylogenetic tree, with the overarching species tree as the average pattern of all of the genes.

There are a number of other processes that could similarly alter our interpretations of evolutionary history based on analysing the genetic make-up of the species. The best way to handle this is simply to sample more genes: this way, the effect of variation of evolutionary history in individual genes is likely to be overpowered by the average over the entire gene pool. We interpret this as a set of individual gene trees contained within a species tree: although one gene might vary from another, the overall picture is clearer when considering all genes together.

Species delimitation

In earlier posts on The G-CAT, I’ve discussed the biogeographical patterns unveiled by my Honours research. Another key component of that paper involved using statistical modelling to determine whether cryptic species were present within the pygmy perches. I didn’t exactly elaborate on that in that section (mostly for simplicity), but this type of analysis is referred to as ‘species delimitation’. To try and simplify complicated analyses, species delimitation methods evaluate possible numbers and combinations of species within a particular dataset and provides a statistical value for which configuration of species is most supported. One program that employs species delimitation is Bayesian Phylogenetics and Phylogeography (BPP): to do this, it uses a plethora of information from the genetics of the individuals within the dataset. These include how long ago the different populations/species separated; which populations/species are most related to one another; and a pre-set minimum number of species (BPP will try to combine these in estimations, but not split them due to computational restraints). This all sounds very complex (and to a degree it is), but this allows the program to give you a statistical value for what is a species and what isn’t based on the genetics and statistical modelling.

Vittata cryptic species
The cryptic species of pygmy perches identified within my research paper. This represents part of the main phylogenetic tree result, with the estimates of divergence times from other analyses included. The pictures indicate the physiology of the different ‘species’: Nannoperca pygmaea is morphologically different to the other species of Nannoperca vittata. Species delimitation analysis suggested all four of these were genetically independent species; at the very least, it is clear that there must be at least 2 species of Nannoperca vittata since is more related to N. pygmaea than to other N. vittata species. Photo credits: N. vittata = Chris Lamin; N. pygmaea = David Morgan.

The end result of a BPP run is usually reported as a species tree (e.g. a phylogenetic tree describing species relationships) and statistical support for the delimitation of species (0-1 for each species). Because of the way the statistical component of BPP works, it has been found to give extremely high support for species identities. This has been criticised as BPP can, at time, provide high statistical support for genetically isolated lineages (i.e. divergent populations) which are not actually species.

Improving species identities with integrative taxonomy

Due to this particular drawback, and the often complex nature of species identity, using solely genetic information such as species delimitation to define species is extremely rare. Instead, we use a combination of different analytical techniques which can include genetic-based evaluations to more robustly assign and describe species. In my own paper example, we suggested that up to three ‘species’ of N. vittata that were determined as cryptic species by BPP could potentially exist pending on further analyses. We did not describe or name any of the species, as this would require a deeper delve into the exact nature and identity of these species.

As genetic data and analytical techniques improve into the future, it seems likely that our ability to detect and determine species boundaries will also improve. However, the additional supported provided by alternative aspects such as ecology, behaviour and morphology will undoubtedly be useful in the progress of taxonomy.

How did pygmy perch swim across the desert?

“Pygmy perch swam across the desert”

As regular readers of The G-CAT are likely aware, my first ever scientific paper was published this week. The paper is largely the results of my Honours research (with some extra analysis tacked on) on the phylogenomics (the same as phylogenetics, but with genomic data) and biogeographic history of a group of small, endemic freshwater fishes known as the pygmy perch. There are a number of different messages in the paper related to biogeography, taxonomy and conservation, and I am really quite proud of the work.

Southern_pygmy_perch 1 MHammer
A male southern pygmy perch, which usually measures 6-8 cm long.

To my honest surprise, the paper has received a decent amount of media attention following its release. Nearly all of these have focused on the biogeographic results and interpretations of the paper, which is arguably the largest component of the paper. In these media releases, the articles are often opened with “…despite the odds, new research has shown how a tiny fish managed to find its way across the arid Australian continent – more than once.” So how did they manage it? These are tiny fish, and there’s a very large desert area right in the middle of Australia, so how did they make it all the way across? And more than once?!

 The Great (southern) Southern Land

To understand the results, we first have to take a look at the context for the research question. There are seven officially named species of pygmy perches (‘named’ is an important characteristic here…but we’ll go into the details of that in another post), which are found in the temperate parts of Australia. Of these, three are found with southwest Western Australia, in Australia’s only globally recognised biodiversity hotspot, and the remaining four are found throughout eastern Australia (ranging from eastern South Australia to Tasmania and up to lower Queensland). These two regions are separated by arid desert regions, including the large expanse of the Nullarbor Plain.

Pygmyperch_distributionmap
The distributions of pygmy perch species across Australia. The dots and labels refer to different sampling sites used in the study. A: the distribution of western pygmy perches, and essentially the extent of the southwest WA biodiversity hotspot region. B: the distribution of eastern pygmy perches, excluding N. oxleyana which occurs in upper NSW/lower QLD (indicated in C). C: the distributions relative to the map of Australia. The black region in the middle indicates the Nullarbor Plain. 

 

The Nullarbor Plain is a remarkable place. It’s dead flat, has no trees, and most importantly for pygmy perches, it also has no standing water or rivers. The plain was formed from a large limestone block that was pushed up from beneath the Earth approximately 15 million years ago; with the progressive aridification of the continent, this region rapidly lost any standing water drainages that would have connected the east to the west. The remains of water systems from before (dubbed ‘paleodrainages’) can be seen below the surface.

Nullarbor Plain photo
See? Nothing here. Photo taken near Watson, South Australia. Credit: Benjamin Rimmer.

Biogeography of southern Australia

As one might expect, the formation of the Nullarbor Plain was a huge barrier for many species, especially those that depend on regular accessible water for survival. In many species of both plants and animals, we see in their phylogenetic history a clear separation of eastern and western groups around this time; once widely distributed species become fragmented by the plain and diverged from one another. We would most certainly expect this to be true of pygmy perch.

But our questions focus on what happened before the Nullarbor Plain arrived in the picture. More than 15 million years ago, southern Australia was a massively different place. The climate was much colder and wetter, even in central Australia, and we even have records of tropical rainforest habitats spreading all the way down to Victoria. Water-dependent animals would have been able to cross the southern part of the continent relatively freely.

Biogeography of the enigmatic pygmy perches

This is where the real difference between everything else and pygmy perch happens. For most species, we see only one east and west split in their phylogenetic tree, associated with the Nullarbor Plain; before that, their ancestors were likely distributed across the entire southern continent and were one continuous unit.

Not for pygmy perch, though. Our phylogenetic patterns show that there were multiple splits between eastern and western ancestral pygmy perch. We can see this visually within the phylogenetic tree; some western species of pygmy perches are more closely related, from an evolutionary perspective, to eastern species of pygmy perches than they are to other western species. This could imply a couple different things; either some species came about by migration from east to west (or vice versa), and that this happened at least twice, or that two different ancestral pygmy perches were distributed across all of southern Australia and each split east-west at some point in time. These two hypotheses are called “multiple invasion” and “geographic paralogy”, respectively.

MCC_geographylabelled
The phylogeny of pygmy perches produced by this study, containing 45 different individuals across all species of pygmy perch. Species are labelled in the tree in brackets, and their geographic location (east or west) is denoted by the colour on the right. This tree clearly shows more than one E/W separation, as not all eastern species are within the same clade. For example, despite being an eastern species, N. variegata is more closely related to Nth. balstoni or N. vittata than to the other eastern species (N. australisN. obscuraN. oxleyana and N. ‘flindersi’.

So, which is it? We delved deeper into this using a type of analysis called ‘ancestral clade reconstruction’. This tries to guess the likely distributions of species ancestors using different models and statistical analysis. Our results found that the earliest east-west split was due to the fragmentation of a widespread ancestor ~20 million years ago, and a migration event facilitated by changing waterways from the Nullarbor Plain pushing some eastern pygmy perches to the west to form the second group of western species. We argue for more than one migration across Australia since the initial ancestor of pygmy perches must have expanded from some point (either east or west) to encompass the entirety of southern Australia.

BGB_figure
The ancestral area reconstruction of pygmy perches, estimated using the R package BioGeoBEARS. The different pie charts denote the relative probability of the possible distributions for the species or ancestor at that particular time; colours denote exactly where the distribution is (following the legend). As you can see, the oldest E/W split at 21 million years ago likely resulted from a single widespread ancestor, with it’s range split into an east and west group. The second E/W event, at 15 million years ago, most likely reflects a migration from east to west, resulting in the formation of the N. vittata species group. This coincides with the Nullarbor Plain, so it’s likely that changes in waterway patterns allowed some eastern pygmy perch to move westward as the area became more arid.

So why do we see this for pygmy perch and no other species? Well, that’s the real mystery; out of all of the aquatic species found in southeast and southwest Australia, pygmy perch are one of the worst at migrating. They’re very picky about habitat, small, and don’t often migrate far unless pushed (by, say, a flood). It is possible that unrecorded extinct species of pygmy perch might help to clarify this a little, but the chances of finding a preserved fish fossil (let alone for a fish less than 8cm in size!) is extremely unlikely. We can really only theorise about how they managed to migrate.

Pygmy perch biogeo history
A diagram of the distribution of pygmy perch species over time, as suggested by the ancestral area reconstruction. A: the initial ancestor of pygmy perches was likely found throughout southern Australia. B: an unknown event splits the ancestor into an eastern and western group; the sole extant species of the W group is Nth. balstoniC: the ancestor of the eastern pygmy perches spreads towards the west, entering part of the pre-Nullarbor region. D: due to changes in the hydrology of the area, some eastern pygmy perches (the maroon colour in C) are pushed towards the west; these form N. vittata species and N. pygmaea. The Nullarbor Plain forms and effectively cuts off the two groups from one another, isolating them.

What does this mean for pygmy perches?

Nearly all species of pygmy perch are threatened or worse in the conservation legislation; there have been many conservation efforts to try and save the worst-off species from extinction. Pygmy perches provide a unique insight to the history of the Australian climate and may be a key in unlocking some of the mysteries of what our land was like so long ago. Every species is important for conservation and even those small, hard-to-notice creatures that we might forget about play a role in our environmental history.

The direction of evolution: divergence vs. convergence

Direction of evolution

We’ve talked previously on The G-CAT about how the genetic underpinning of certain evolutionary traits can change in different directions depending on the selective pressure it is under. Particularly, we can see how the frequency of different alleles might change in one direction or another, or stabilise somewhere in the middle, depending on its encoded trait. But thinking bigger picture than just the genetics of one trait, we can actually see that evolution as an entire process works rather similarly.

Divergent evolution

The classic view of the direction of evolution is based on divergent evolution. This is simply the idea that a particular species possess some ancestral trait. The species (or population) then splits into two (for one reason or another), and each one of these resultant species and populations evolves in a different way to the other. Over time, this means that their traits are changing in different directions, but ultimately originate from the same ancestral source.

Evidence for divergent evolution is rife throughout nature, and is a fundamental component of all of our understanding of evolution. Divergent evolution means that, by comparing similar traits in two species (called homologous traits), we can trace back species histories to common ancestors. Some impressive examples of this exist in nature, such as the number of bones in most mammalian species. Humans have the same number of neck bones as giraffes; thus, we can suggest that the ancestor of both species (and all mammals) probably had a similar number of neck bones. It’s just that the giraffe lineage evolved longer bones whereas other lineages did not.

Homology figure
A diagrammatic example of homologous structures in ‘hand’ bones. The coloured bones demonstrate how the same original bone structures have diverged into different forms. Source: BiologyWise.

Convergent evolution

But of course, evolution never works as simply as you want it to, and sometimes we can get the direct opposite pattern. This is called convergent evolution, and occurs when two completely different species independently evolve very similar (sometimes practically identical) traits. This is often caused by a limitation of the environment; some extreme demand of the environment requires a particular physiological solution, and thus all species must develop that trait in order to survive. An example of this would be the physiology of carnivorous marsupials like Tasmanian devils or thylacines: despite being in another Class, their body shapes closely resemble something more canid. Likely, the carnivorous diet places some constraints on physiology, particularly jaw structure and strength.

Convergent evol intelligence
A surprising example of convergent evolution is cognitive ability in apes and some bird groups (e.g. corvids). There’s plenty of other animal groups more related to each of these that don’t demonstrate the same level of cognitive reasoning (based on the traits listed in the centre): thus, we can conclude that cognition has evolved twice in very, very different lineages. Source: Emery & Clayton, 2004.

A more dramatic (and potentially obvious) example of convergent evolution would be wings and the power of flight. Despite the fact that butterflies, bees, birds and bats all have wings and can fly, most of them are pretty unrelated to one another. It seems much more likely that flight evolved independently multiple times, rather than the other 99% of species that shared the same ancestor lost the capacity of flight.

Parallel evolution

Sometimes convergent evolution can work between two species that are pretty closely related, but still evolved independently of one another. This is distinguished from other categories of evolution as parallel evolution: the main difference is that while both species may have shared the same start and end point, evolution has acted on each one independent of the other. This can make it very difficult to diagnose from convergent evolution, and is usually determined by the exact history of the trait in question.

Parallel evolution is an interesting field of research for a few reasons. Firstly, it provides a scenario in which we can more rigorously test expectations and outcomes of evolution in a particular environment. For example, if we find traits that are parallel in a whole bunch of fish species in a particular region, we can start to look at how that particular environment drives evolution across all fish species, as opposed to one species case studies.

Marsupial handedness.jpg
Here’s another weird example; different populations of marsupials (particularly kangaroos and wallabies) show preferential handedness depending on where the population is. That is, different populations of different species of marsupials shows parallel evolution of handedness, since they’re related to one another but have evolved it independently of the other species. Source: Giljov et al. (2015).

Following from that logic, it is then important to question the mechanisms of parallelism. From a genetic point of view, do these various species use the same genes (and genetic variants) to produce the same identical trait? Or are there many solutions to the selective question in nature? While these questions are rather complicated, and there has been plenty of evidence both for and against parallel genetic underpinning of parallel traits, it seems surprisingly often that many different genetic combinations can be used to get the same result. This gives interesting insight into how complex genetic coding of traits can be, and how creative and diverse evolution can be in the real world.

Where is evolution going?

Cat phylogeny
An example of all three types of evolutionary trajectory in a single phylogeny of cats (you know how we do it here at The G-CAT). This phylogeny consists of two distinct genera; one with one species (P. aliquam) and another of three species (the red box indicates their distance). Our species have three main physical traits: coat colour, ear tufts and tail shape. At the ancestral nodes of the tree, we can see what the ancestor of these species looked like for these three traits. Each of these traits has undergone a different type of evolution. The tufts on the ears are the result of divergent evolution, since F. tuftus evolved the trait differently to its nearest relative, F. griseo. Contrastingly, the orange coat colour of F. tuftus and P. aliquam are the result of convergent evolution: neither of these species are very closely related (remembering the red box) and evolved orange coats independently of one another (since their ancestors are grey). And finally, the fluffy tails of F. hispida and F. griseo can be considered parallel evolution, since they’re similar evolutionarily (same genus) but still each evolved tail fluff independently (not in the ancestor). This example is a little convoluted, but if you trace the history of each trait in the phylogeny you can more easily see these different patterns.

So, where is evolution going for nature? Well, the answer is probably all over the place, but steered by the current environmental circumstances. Predicting the evolutionary impacts of particular environmental change (e.g. climate change) is exceedingly difficult but a critical component of understanding the process of evolution and the future of species. Evolution continually surprises us with creative solution to complex problems and I have no doubt new mysteries will continue to be thrown at us as we delve deeper.

Age and dating with phylogenetics

Timing the phylogeny

Understanding the evolutionary history of species can be a complicated matter, both from theoretical and analytical perspectives. Although phylogenetics addresses many questions about evolutionary history, there are a number of limitations we need to consider in our interpretations.

One of these limitations we often want to explore in better detail is the estimation of the divergence times within the phylogeny; we want to know exactly when two evolutionary lineages (be they genera, species or populations) separated from one another. This is particularly important if we want to relate these divergences to Earth history and environmental factors to better understand the driving forces behind evolution and speciation. A traditional phylogenetic tree, however, won’t show this: the tree is scaled in terms of the genetic differences between the different samples in the tree. The rate of genetic differentiation is not always a linear relationship with time and definitely doesn’t appear to be universal.

 

Anatomy of phylogenies.jpg
The general anatomy of a phylogenetic tree. A phylogeny describes the relationships of tips (i.e. which are more closely related than others; referred to as the topology), how different these tips are (the length of the branches) and the order they separated in time (separations shown by the nodes). Different trees can share some traits but not others: the red box shows two phylogenetic trees with similar branch lengths (all of the branches are roughly the same) but different topology (the tips connect differently: A and B are together on the left but not on the right, for example). Conversely, two trees can have the same topology, but show differing lengths in the branches of the same tree (blue box). Note that the tips are all in the same positions in these two trees. Typically, it’s easier to read a tree from right to left: the two tips who have branches that meet first are most similar genetically; the longer it takes for two tips to meet along the branches, the less similar they are genetically.

How do we do it?

The parameters

There are a number of parameters that are required for estimating divergence times from a phylogenetic tree. These can be summarised into two distinct categories: the tree model and the substitution model.

The first one of these is relatively easy to explain; it describes the exact relationship of the different samples in our dataset (i.e. the phylogenetic tree). Naturally, this includes the topology of the tree (which determines which divergences times can be estimated for in the first place). However, there is another very important factor in the process: the lengths of the branches within the phylogenetic tree. Branch lengths are related to the amount of genetic differentiation between the different tips of the tree. The longer the branch, the more genetic differentiation that must have accumulated (and usually also meaning that longer time has occurred from one end of the branch to the other). Even two phylogenetic trees with identical topology can give very different results if they vary in their branch lengths (see the above Figure).

The second category determines how likely mutations are between one particular type of nucleotide and another. While the details of this can get very convoluted, it essentially determines how quickly we expect certain mutations to accumulate over time, which will inevitably alter our predictions of how much time has passed along any given branch of the tree.

Calibrating the tree

However, at least one another important component is necessary to turn divergence time estimates into absolute, objective times. An external factor with an attached date is needed to calibrate the relative branch divergences; this can be in the form of the determined mutation rate for all of the branches of the tree or by dating at least one node in the tree using additional information. These help to anchor either the mutation rate along the branches or the absolute date of at least one node in the tree (with the rest estimated relative to this point). The second method often involves placing a time constraint on a particular node of the tree based on prior information about the biogeography of the species (for example, we might know one species likely diverged from another after a mountain range formed: the age of the mountain range would be our constraints). Alternatively, we might include a fossil in the phylogeny which has been radiocarbon dated and place an absolute age on that instead.

Ammonite comic.jpg
Don’t you know it’s rude to ask an ammomite her age?

In regards to the former method, mutation rates describe how fast genetic differentiation accumulates as evolution occurs along the branch. Although mutations gradually accumulate over time, the rate at which they occur can depend on a variety of factors (even including the environment of the organism). Even within the genome of a single organism, there can be variation in the mutation rate: genes, for example, often gain mutations slower than non-coding region.

Although mutation rates (generally in the form of a ‘molecular clock’) have been traditionally used in smaller datasets (e.g. for mitochondrial DNA), there are inherent issues with its assumptions. One is that this rate will apply to all branches in a tree equally, when different branches may have different rates between them. Second, different parts of the genome (even within the same individual) will have different evolutionary rates (like genes vs. non-coding regions). Thus, we tend to prefer using calibrations from fossil data or based on biogeographic patterns (such as the time a barrier likely split two branches based on geological or climatic data).

The analytical framework

All of these components are combined into various analytical frameworks or programs, each of which handle the data in different ways. Many of these are Bayesian model-based analysis, which in short generates hypothetical models of evolutionary history and divergence times for the phylogeny and tests how well it fits the data provided (i.e. the phylogenetic tree). The algorithm then alters some aspect(s) of the model and tests whether this fits the data better than the previous model and repeats this for potentially millions of simulations to get the best model. Although models are typically a simplification of reality, they are a much more tractable approach to estimating divergence times (as well as a number of other types of evolutionary genetics analyses which incorporating modelling).

Molecular dating pipeline
A (believe it or not, simplified) pipeline for estimating divergence times from a phylogeny. 1) We obtain our DNA sequences for our samples: in this example, we’ll see each Sample (A-E) is a representative of a single species. We align these together to make sure we’re comparing the same part of the genome across all of them. 2) We estimate the phylogenetic tree for our samples/species. In a Bayesian framework, this means creating simulation models containing a certain substitution model and a given tree model (containing certain topology and branch lengths). Together, these two models form the likelihood model: we then test how well this model explains our data (i.e. the likelihood of getting the patterns in our data if this model was true). We repeat these simulations potentially hundreds of thousands of times until we pinpoint the most likely model we can get. 3) Using our resulting phylogeny, we then calibrate some parts of it based on external information. This could either be by including a carbon-dated fossil (F) within the phylogeny, or constraining the age of one node based on biogeographic information (the red circle and cross). 4) Using these calibrations as a reference, we then estimated the most likely ages of all the splits in the tree, getting our final dated phylogeny.

Despite the developments in the analytical basis of estimating divergence times in the last few decades, there are still a number of limitations inherent in the process. Many of these relate to the assumptions of the underlying model (such as the correct and accurate phylogenetic tree and the correct estimations of evolutionary rate) used to build the analysis and generate simulations. In the case of calibrations, it is also critical that they are correctly dated based on independent methods: inaccurate radiocarbon dating of a fossil, for example, could throw out all of the estimations in the entire tree. That said, these factors are intrinsic to any phylogenetic analysis and regularly considered by evolutionary biologists in the interpretations and discussions of results (such as by including confidence intervals of estimations to demonstrate accuracy).

Understanding the temporal aspects of evolution and being able to relate them to a real estimate of age is a difficult affair, but an important component of many evolutionary studies. Obtaining good estimates of the timing of divergence of populations and species through molecular dating is but one aspect in building the picture of the history of all organisms, including (and especially) humans.