Age and dating with phylogenetics

Timing the phylogeny

Understanding the evolutionary history of species can be a complicated matter, both from theoretical and analytical perspectives. Although phylogenetics addresses many questions about evolutionary history, there are a number of limitations we need to consider in our interpretations.

One of these limitations we often want to explore in better detail is the estimation of the divergence times within the phylogeny; we want to know exactly when two evolutionary lineages (be they genera, species or populations) separated from one another. This is particularly important if we want to relate these divergences to Earth history and environmental factors to better understand the driving forces behind evolution and speciation. A traditional phylogenetic tree, however, won’t show this: the tree is scaled in terms of the genetic differences between the different samples in the tree. Genetic differences don’t always accumulate linearly with time, and the rate at which they accumulate definitely doesn’t appear to be universal.

 

Anatomy of phylogenies.jpg
The general anatomy of a phylogenetic tree. A phylogeny describes the relationships of tips (i.e. which are more closely related than others; referred to as the topology), how different these tips are (the length of the branches) and the order they separated in time (separations shown by the nodes). Different trees can share some traits but not others: the red box shows two phylogenetic trees with similar branch lengths (all of the branches are roughly the same) but different topology (the tips connect differently: A and B are together on the left but not on the right, for example). Conversely, two trees can have the same topology but differ in the lengths of their branches (blue box). Note that the tips are all in the same positions in these two trees. Typically, it’s easier to read a tree from right to left: the two tips whose branches meet first are most similar genetically; the longer it takes for two tips to meet along the branches, the less similar they are genetically.

How do we do it?

The parameters

There are a number of parameters that are required for estimating divergence times from a phylogenetic tree. These can be summarised into two distinct categories: the tree model and the substitution model.

The first one of these is relatively easy to explain; it describes the exact relationship of the different samples in our dataset (i.e. the phylogenetic tree). Naturally, this includes the topology of the tree (which determines which divergence times can be estimated in the first place). However, there is another very important factor in the process: the lengths of the branches within the phylogenetic tree. Branch lengths are related to the amount of genetic differentiation between the different tips of the tree. The longer the branch, the more genetic differentiation must have accumulated (usually also meaning that more time has passed from one end of the branch to the other). Even two phylogenetic trees with identical topology can give very different results if they vary in their branch lengths (see the above Figure).
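To make the point about branch lengths a little more concrete, here is a minimal Python sketch. It is only an illustration: the tree layout (each node storing its parent and the branch length leading to it), the node names and every number are made up, and real phylogenetics software stores trees quite differently. It shows two trees with identical topology where the genetic distance between tips A and B still differs five-fold.

```python
# Hypothetical toy trees: each node maps to (parent, branch length to parent).
# Branch lengths are in substitutions per site; all values are invented.
tree_short = {
    "A": ("AB", 0.01),
    "B": ("AB", 0.01),
    "C": ("root", 0.03),
    "AB": ("root", 0.02),
    "root": (None, 0.0),
}
# Same topology, but the branches leading to A and B are much longer.
tree_long = dict(tree_short, A=("AB", 0.05), B=("AB", 0.05))


def path_to_root(tree, tip):
    """Return {node: distance from tip} for the tip and all of its ancestors."""
    distances, node, total = {}, tip, 0.0
    while node is not None:
        distances[node] = total
        parent, length = tree[node]
        total += length
        node = parent
    return distances


def patristic_distance(tree, tip1, tip2):
    """Sum of branch lengths along the path joining two tips."""
    d1, d2 = path_to_root(tree, tip1), path_to_root(tree, tip2)
    # The path runs through the most recent common ancestor, which is the
    # shared ancestor with the smallest combined distance to both tips.
    return min(d1[node] + d2[node] for node in d1 if node in d2)


print(patristic_distance(tree_short, "A", "B"))
print(patristic_distance(tree_long, "A", "B"))
```

In the first tree A and B are 0.02 substitutions per site apart; in the second they are 0.10 apart, even though the tips connect in exactly the same way. Any date we estimate for the A/B split will inherit that difference.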

The second category determines how likely mutations are between one particular type of nucleotide and another. While the details of this can get very convoluted, it essentially determines how quickly we expect certain mutations to accumulate over time, which will inevitably alter our predictions of how much time has passed along any given branch of the tree.
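As a rough illustration of what a substitution model does, the Python sketch below uses the simplest one, the Jukes-Cantor (JC69) model, which assumes every nucleotide is equally likely to mutate into any other. This is my choice for the example rather than a model the analyses above necessarily use, and the function names are invented. It converts between the proportion of sites we observe to differ and the number of substitutions per site that actually occurred (which is larger, because some sites mutate more than once).

```python
import math


def jc69_expected_difference(d):
    """Expected proportion of differing sites after d substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p):
    """Corrected distance (substitutions per site) from an observed proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("under JC69 the observed proportion must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two sequences that differ at 30% of sites have experienced more than
# 0.3 substitutions per site, because repeated hits at a site are hidden.
print(jc69_distance(0.30))
```

Under this model, a 30% observed difference corresponds to roughly 0.38 substitutions per site, and observed differences level off at 75% no matter how much time passes. This is one reason why raw genetic differences are not a straight-line measure of time.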

Calibrating the tree

However, at least one other important component is necessary to turn divergence time estimates into absolute, objective times. An external factor with an attached date is needed to calibrate the relative branch divergences; this can be in the form of a known mutation rate for all of the branches of the tree or by dating at least one node in the tree using additional information. These help to anchor either the mutation rate along the branches or the absolute date of at least one node in the tree (with the rest estimated relative to this point). The second method often involves placing a time constraint on a particular node of the tree based on prior information about the biogeography of the species (for example, we might know one species likely diverged from another after a mountain range formed: the age of the mountain range would be our constraint). Alternatively, we might include a fossil in the phylogeny which has been radiocarbon dated and place an absolute age on that instead.
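Here is a minimal Python sketch of the node-dating idea. It assumes we already have relative node ages from a clock-like tree (each node's age expressed as a fraction of the root age) and simply rescales them so that one calibrated node matches its known date. The function name, node labels and all of the numbers are hypothetical; real dating programs typically treat a calibration as a range or probability distribution rather than a single fixed value.

```python
def calibrate_node_ages(relative_ages, calibrated_node, calibrated_age):
    """Rescale relative node ages so the calibrated node has the given age."""
    scale = calibrated_age / relative_ages[calibrated_node]
    return {node: age * scale for node, age in relative_ages.items()}


# Invented relative ages (root = 1.0) for three nodes in a toy tree.
relative_ages = {"root": 1.0, "AB": 0.4, "CD": 0.25}

# Suppose a mountain range that separated A from B formed 5 million years ago.
dated = calibrate_node_ages(relative_ages, "AB", 5.0)
print(dated)
```

With the A/B split pinned at 5 million years, the root comes out at about 12.5 million years and the C/D split at about 3.1 million years. Every other date in the tree is only as good as that one anchor.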

Ammonite comic.jpg
Don’t you know it’s rude to ask an ammonite her age?

With regard to the former method, mutation rates describe how fast genetic differentiation accumulates as evolution occurs along the branch. Although mutations gradually accumulate over time, the rate at which they occur can depend on a variety of factors (even including the environment of the organism). Even within the genome of a single organism, there can be variation in the mutation rate: genes, for example, often gain mutations more slowly than non-coding regions.

Although mutation rates (generally in the form of a ‘molecular clock’) have been traditionally used in smaller datasets (e.g. for mitochondrial DNA), there are inherent issues with their assumptions. The first is that a single rate is assumed to apply to all branches in a tree equally, when different branches may have different rates. The second is that different parts of the genome (even within the same individual) will have different evolutionary rates (like genes vs. non-coding regions). Thus, we tend to prefer using calibrations from fossil data or based on biogeographic patterns (such as the time a barrier likely split two branches based on geological or climatic data).
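To see why the choice of rate matters so much, here is a small Python sketch of a strict molecular clock. It assumes one constant rate along both lineages, which is exactly the assumption criticised above; the function name and the numbers are invented for illustration. The distance is divided by twice the rate because both lineages have been accumulating changes since they split.

```python
def divergence_time(distance, rate):
    """Time since two lineages split under a strict molecular clock.

    distance: substitutions per site separating the two tips
    rate: substitutions per site per million years, per lineage
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


# The same genetic distance dated with two different (made-up) rates.
for rate in (0.01, 0.005):
    print(rate, divergence_time(0.04, rate))
```

Halving the assumed rate doubles the estimated age (about 2 vs. 4 million years here), so a rate borrowed from the wrong lineage or the wrong part of the genome can shift every date in the tree.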

The analytical framework

All of these components are combined into various analytical frameworks or programs, each of which handles the data in different ways. Many of these are Bayesian model-based analyses, which in short generate hypothetical models of evolutionary history and divergence times for the phylogeny and test how well each fits the data provided (i.e. the phylogenetic tree). The algorithm then alters some aspect(s) of the model, tests whether this fits the data better than the previous model, and repeats this for potentially millions of simulations to get the best model. Although models are typically a simplification of reality, they are a much more tractable approach to estimating divergence times (as well as a number of other types of evolutionary genetics analyses which incorporate modelling).
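The propose, test and repeat loop described above can be sketched in Python with a very stripped-down example. This is a toy Metropolis sampler, not what any particular dating program does: it estimates a single divergence time between two sequences, fixes the mutation rate, uses the Jukes-Cantor substitution model and puts a flat prior between zero and a maximum age. Every function name, parameter and number here is an assumption made for illustration.

```python
import math
import random


def log_likelihood(t, n_diff, n_sites, rate):
    """Log-likelihood of seeing n_diff differing sites if the split was t Myr ago."""
    d = 2.0 * rate * t                          # substitutions per site between tips
    p = -0.75 * math.expm1(-4.0 * d / 3.0)      # JC69 chance that a site differs
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


def sample_divergence_time(n_diff, n_sites, rate, max_age,
                           steps=20000, step_size=0.3, seed=1):
    """Metropolis sampler for the divergence time with a flat prior on (0, max_age)."""
    rng = random.Random(seed)
    current = max_age / 2.0
    current_ll = log_likelihood(current, n_diff, n_sites, rate)
    samples = []
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, step_size)   # alter the model slightly
        if 0.0 < proposal < max_age:                     # outside the prior: reject
            proposal_ll = log_likelihood(proposal, n_diff, n_sites, rate)
            # Always accept a better fit; accept a worse fit only sometimes.
            if rng.random() < math.exp(min(0.0, proposal_ll - current_ll)):
                current, current_ll = proposal, proposal_ll
        samples.append(current)
    return samples[steps // 4:]                          # drop the burn-in


# Made-up data: 40 differences across 1000 aligned sites.
samples = sample_divergence_time(n_diff=40, n_sites=1000, rate=0.01, max_age=10.0)
print(sum(samples) / len(samples))
```

The retained samples approximate the range of divergence times that fit the data reasonably well, so their spread gives an interval around the estimate rather than a single number. Real analyses do the same kind of thing while also varying the tree, the rates and the substitution model, which is why they need so many more steps.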

Molecular dating pipeline
A (believe it or not, simplified) pipeline for estimating divergence times from a phylogeny. 1) We obtain our DNA sequences for our samples: in this example, we’ll say each sample (A-E) is a representative of a single species. We align these together to make sure we’re comparing the same part of the genome across all of them. 2) We estimate the phylogenetic tree for our samples/species. In a Bayesian framework, this means creating simulation models containing a certain substitution model and a given tree model (containing certain topology and branch lengths). Together, these two models form the likelihood model: we then test how well this model explains our data (i.e. the likelihood of getting the patterns in our data if this model was true). We repeat these simulations potentially hundreds of thousands of times until we pinpoint the most likely model we can get. 3) Using our resulting phylogeny, we then calibrate some parts of it based on external information. This could either be by including a carbon-dated fossil (F) within the phylogeny, or constraining the age of one node based on biogeographic information (the red circle and cross). 4) Using these calibrations as a reference, we then estimate the most likely ages of all the splits in the tree, getting our final dated phylogeny.

Despite the developments in the analytical basis of estimating divergence times in the last few decades, there are still a number of limitations inherent in the process. Many of these relate to the assumptions of the underlying model (such as the correct and accurate phylogenetic tree and the correct estimations of evolutionary rate) used to build the analysis and generate simulations. In the case of calibrations, it is also critical that they are correctly dated based on independent methods: inaccurate radiocarbon dating of a fossil, for example, could throw off all of the estimations in the entire tree. That said, these factors are intrinsic to any phylogenetic analysis and regularly considered by evolutionary biologists in the interpretations and discussions of results (such as by including confidence intervals around estimates to show how uncertain they are).

Understanding the temporal aspects of evolution and being able to relate them to a real estimate of age is a difficult affair, but an important component of many evolutionary studies. Obtaining good estimates of the timing of divergence of populations and species through molecular dating is but one aspect in building the picture of the history of all organisms, including (and especially) humans.

“Who Do You Think You Are?”: studying the evolutionary history of species

The constancy of evolution

Evolution is a constant, endless force which seeks to push and shape species based on the context of their environment: sometimes rapidly, sometimes much more gradually. Although we often think of discrete points of evolution (when one species becomes two, when a particular trait evolves), it is nevertheless a continual force that influences changes in species. These changes are often difficult to ‘unevolve’ and have a certain ‘evolutionary inertia’ to them; because of these factors, it’s often critical to understand how a history of evolution has generated the organisms we see today.

What do I mean when I say evolutionary history? Well, the term is fairly diverse and can relate to the evolution of particular traits or types of traits, or to the genetic variation and changes that underlie them. The types of questions and points of interest of evolutionary history can depend on which end of the timescale we look at: recent evolutionary histories, and the genetics related to them, will tell us different information to very ancient evolutionary histories. Let’s hop into our symbolic DeLorean and take a look back in time, shall we?

Labelled_evolhistory
A timeslice of evolutionary history (a pseudo-phylogenetic tree, I guess?), going from more recent history (bottom left) to deeper history (top right). Each region denoted in the tree represents the general area of focus for each of the following blog headings. 1: Recent evolutionary history might look at individual pedigrees, or comparing populations of a single species. 2: Slightly older comparisons might focus on how species have arisen, and the factors that drive this (part of ‘phylogeography’). 3: Deep history might focus on the origin of whole groups of organisms and a focus on the evolution of particular traits like venom or sociality.

Very recent evolutionary history: pedigrees and populations

While we might ordinarily consider ‘evolutionary history’ to refer to events that happened thousands or millions of years ago, it can still be informative to look at history just a few generations ago. This often involves looking at pedigrees, such as in breeding programs, and trying to see how very short term and rapid evolution may have occurred; this can even include investigating how a particular breeding program might accidentally be causing the species to evolve to adapt to captivity! Rarely does this get referred to as true evolutionary history, but it fits on the spectrum, so I’m going to count it. We might also look at how current populations are evolving differently to one another, to try and predict how they’ll evolve into the future (and thus determine which ones are most at risk, which ones have critically important genetic diversity, and the overall survivability of the total species). This is the basis of ‘evolutionarily significant units’ or ESUs which we previously discussed on The G-CAT.

Captivefishcomic
Maybe goldfish evolved 3 second memory to adapt to the sheer boringness of captivity? …I’m joking, of course: the memory thing is a myth and adaptation works over generations, not a lifetime.

A little further back: phylogeography and species

A little further back, we might start to look at how different populations have formed or changed in semi-recent history (usually looking at the effect of human impacts: we’re really good at screwing things up I’m sorry to say). This can include looking at how populations have (or have not) adapted to new pressures, how stable populations have been over time, or whether new populations are being ‘made’ by recent barriers. At this level of populations and some (or incipient) species, we can find the field of ‘phylogeography’, which involves the study of how historic climate and geography have shaped the evolution of species or caused new species to evolve.

Evolution of salinity
An example of trait-based phylogenetics, looking at the biogeographic patterns and evolution/migration to freshwater in perch-like fishes, by Chen et al. (2014). The phylogeny shows that a group of fishes adapted to freshwater environments (black) from a (likely) saltwater ancestor (white), with euryhaline tolerance evolving two separate times (grey).

One high profile example of phylogeographic studies is the ‘Out of Africa’ hypothesis and debate for the origination of the modern human species. Although there has been no shortage of debate about the origin of modern humans, as well as the fate of our fellow Neanderthals and Denisovans, the ‘Out of Africa’ hypothesis still appears to be the most supported scenario.

human phylogeo
A generalised diagram of the ‘Out of Africa’ hypothesis of human migration, from Oppenheimer, 2012. 

Phylogeography is also a key component in determining and understanding ‘biodiversity hotspots’; that is, regions which have generated high levels of species diversity and contain many endemic species and populations, such as tropical hotspots or remote temperate regions. These are naturally of very high conservation value and contribute a huge amount to Earth’s biodiversity, ecological functions and potential for us to study evolution in action.

Deep, deep history: phylogenetics and the origin of species (groups)

Even further back, we start to delve into the more traditional concept of evolutionary history. We start to look at how species have formed; what factors caused them to become new species, how stable the new species are, and what are the genetic components underlying the change. This subfield of evolution is called ‘phylogenetics’, and relates to understanding how species or groups of species have evolved and are related to one another.

Sometimes, this includes trying to look at how particular diagnostic traits have evolved in a certain group, like venom within snakes or eusocial groups in bees. Phylogenetic methods are even used to try and predict which species of plants might create compounds which are medically valuable (like aspirin)! Similarly, we can try and predict how invasive a pest species may be based on their phylogenetic (how closely related the species are) and physiological traits in order to safeguard against groups of organisms that are likely to run rampant in new environments. It’s important to understand how and why these traits have evolved to get a good understanding of exactly how the diversity of life on Earth came about.

evolution of venom
An example of looking at trait evolution with phylogenetics, focusing on the evolution of venom in snakes, from Reyes-Velasco et al. (2014). The size of the boxes demonstrates the number of species in each group, with the colours reflecting the number of venomous (red) vs. non-venomous (grey) species. The red dot shows the likely origin of venom.

Phylogenetics also allows us to determine which species are the most ‘evolutionarily unique’; all the special little creatures of planet Earth which represent their own unique types of species, such as the tuatara or the platypus. Naturally, understanding exactly how precious and unique these species are suggests we should focus our conservation attention and particularly conserve them, since there’s nothing else in the world that even comes close!

Who cares what happened in the past, right? Well, I do, and you should too! Evolution forms an important component of any conservation management plan, since we obviously want to make sure our species can survive into the future (i.e. adapt to new stressors). Trying to maintain the most ‘evolvable’ groups, particularly within breeding programs, can often be difficult when we have to balance inbreeding depression (not having enough genetic diversity) with outbreeding depression (obscuring good genetic diversity by adding bad genetic diversity into the gene pool). Often, we can best avoid these by identifying which populations are evolutionarily different to one another (see ESUs) and using that as a basis, since outbreeding vs. inbreeding depression can be very difficult to measure. This all goes back to the concept of ‘adaptive potential’ that we’ve discussed a few times before.

In any case, a keen understanding of the evolutionary trajectory of a species is a crucial component for conservation management and to figure out the processes and outcomes of evolution in the real world. Thus, evolutionary history remains a key area of research for both conservation and evolution-related studies.

 

Welcome to The G-CAT!

Hi all! Welcome to The Genetics Cat, or The G-CAT for short! This blog was initially started as a way for me to not only practice writing and communicating science to the general public, but also as an avenue for me to share scientific research that I’m interested in to a broader community. As one might expect, this blog will predominantly feature discussions of evolution, ecology and genetics in a (hopefully) digestible manner. I will try to keep the topics broad to encompass a range of interests, but I undoubtedly have a bias towards conservation and evolutionary genetics…that said, if you have suggestions for content you’d like to see, please request away! I will try my absolute best to facilitate them!

You may be shocked to discover that this blog is, in fact, not written by a cat. In fact, I don’t even study cats. I’m sorry to burst that bubble for you. My real name is Sean Buckley, and I’m a PhD student within the Molecular Ecology Lab of Flinders University (MELFU) in Adelaide, South Australia. My research involves using large-scale genetic data to investigate the evolutionary history of a group of rather cute, and very endangered, small endemic freshwater fish known as the pygmy perches.

Yarra pygmy perch
One of the charismatic critters I work with! This is a Yarra pygmy perch, who is currently a founder of a genetics-based captive breeding program for a population that is now extinct in the wild.

Specifically, my research aims to use genomic data and complex statistical modelling to see how some species of pygmy perches have changed over time. Particularly, I will look at how their population sizes, genetic connectivity and distributions have changed throughout history, and how these relate to changes in the climate, geology and hydrology of their habitats. My research will help to address historical patterns of genetic diversity and evolution in freshwater organisms across Australia, as well as inform conservation management of modern pygmy perches.

Prior to my PhD, I also did an Honours thesis on a similar topic, but focusing on the broad evolutionary (phylogenetic) relationships of pygmy perches. These patterns were related to historic environmental factors across the continent of Australia. Furthermore, through my Honours research, I discovered that one species of pygmy perch is actually three genetically distinct but physically indistinguishable species! My PhD will expand on these to (hopefully) start to suggest some of the environmental and spatial factors that may have influenced this previously hidden diversity of species.

Without further ado, welcome to The G-CAT!