Unravelling the evolutionary history of organisms – one of the main goals of phylogenetic research – remains a challenging prospect due to a number of theoretical and analytical aspects. Particularly, trying to reconstruct evolutionary patterns based on current genetic data (the most common way phylogenetic trees are estimated) is prone to the erroneous influence of some secondary factors. One of these is referred to as ‘incomplete lineage sorting’, which can have a major effect on how phylogenetic relationships are estimated and the statistical confidence we may have around these patterns. Today, we’re going to take a look at incomplete lineage sorting (shortened to ILS for brevity herein) using a game-based analogy – a Pachinko machine. Or, if you’d rather, the same general analogy also works for those creepy clown carnival games, but I prefer the less frightening alternative.
It’s been a few minutes (okay, several weeks) since the last post here on The G-CAT. Naturally, over that time I’ve spent holidays with both my own family and my partner’s family. Hopefully, you’ve enjoyed your own Christmas/New Year/Other-Non-Denominational-Celebrations break (and for us Aussies, that you’ve managed to avoid much of the devastation of the recent bushfire epidemic).
Because of this period of time (and a few other more pressing deadlines I had for the start of the new year), I haven’t prepared a new post in some time. However, I’d like to take this time to address how the nature of this blog might change over the next year or so (and into the future).
A new schedule
For those of you who keep more up-to-date with my academic progress, you’ll be aware that this year is the final year of my PhD. As it stands, I’m due to submit my thesis in August of this year (which feels much, much sooner than it really is). Similarly, for anyone who has ever interacted with a PhD student in the final year of their studies, you’ll also be aware that this can be a time of high stress, stacking deadlines and the overall impending doom of D-Day (thesis submission).
In light of all of this, I have decided to move away from the more predictable fortnightly post routine in favour of a more organic timetable. This will likely mean a fairly significant reduction in the frequency of blog posts, whereby I will post as a topic comes into my mind or when it appears relevant to other parts of my studies (e.g. in reading for writing manuscripts, etc.). This decision has been playing on my mind for some time, partly as a way to balance the quality of the posts I write: in some circumstances, I feel like the consistent deadline of once per fortnight causes some posts to suffer a little as I rush to produce something at least in the vicinity of every second Wednesday.
The future of The G-CAT
It is not my intention to completely abandon this project: The G-CAT is something that I have invested a fair amount of time and inspiration into, and it provides a solid avenue for science communication. As always, my inbox (on whichever platform you choose) is wide open for suggestions on topics of discussion. I’m looking forward to a more organic schedule that will allow me to properly explore and expand on the topics of interest whilst maintaining a healthy balance of PhD progression and down-time.
Further to this, we can expand the site-frequency spectrum to compare across populations. Instead of having a simple 1-dimensional frequency distribution, for a pair of populations we can have a grid. This grid specifies how often a particular allele occurs at a certain frequency in Population A and at a different frequency in Population B. This can also be visualised quite easily, albeit as a heatmap instead. We refer to this as the 2-dimensional SFS (2DSFS).
The same concept can be expanded to even more populations, although this gets harder to represent visually. Essentially, we end up with a set of different matrices which describe the frequency of certain alleles across all of our populations, merging them together into the joint SFS. For example, a joint SFS of 4 populations would consist of 6 (4 x 4 total comparisons – 4 self-comparisons, then halved to remove duplicate comparisons) 2D SFSs all combined together. To make sense of this, check out the diagrammatic tables below.
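To make the grid idea a little more concrete, here is a minimal sketch in Python of how a 2D SFS could be tallied, along with the pairwise counting for four populations described above. The allele counts, sample sizes and population labels are all made up for illustration, and the function name sfs_2d is my own rather than anything from a particular software package.

```python
from itertools import combinations

def sfs_2d(counts_a, counts_b, n_a, n_b):
    """Tally a 2D SFS: grid[i][j] is the number of sites where the allele
    is seen i times in Population A and j times in Population B."""
    grid = [[0] * (n_b + 1) for _ in range(n_a + 1)]
    for i, j in zip(counts_a, counts_b):
        grid[i][j] += 1
    return grid

# Hypothetical allele counts at five SNPs, with 4 chromosomes sampled per population
counts_a = [1, 1, 3, 0, 2]
counts_b = [0, 0, 4, 1, 2]
grid = sfs_2d(counts_a, counts_b, 4, 4)

# Pairwise comparisons for four (hypothetical) populations: (4 x 4 - 4) / 2
populations = ["A", "B", "C", "D"]
pairs = list(combinations(populations, 2))
```

Each row of the grid corresponds to a frequency in Population A and each column to a frequency in Population B, which is exactly what the heatmap version displays.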
The different forms of the SFS
Which alleles we choose to use within our SFS is particularly important. If we don’t have a lot of information about the genomics or evolutionary history of our study species, we might choose to use the minor allele frequency (MAF). Given that SNPs tend to be biallelic, for any given locus we could have Allele A or Allele B. The MAF approach takes the less frequent of these two within the dataset and uses that in the summary SFS: since the other allele’s count would just be 2N minus the count of the minor allele (where N is the number of diploid individuals sampled), it’s not included in the summary. An SFS made of the MAF is also referred to as the folded SFS.
Alternatively, if we know some things about the genetic history of our study species, we might be able to divide Allele A and Allele B into derived or ancestral alleles. Since SNPs often occur as mutations at a single site in the DNA, one allele at the given site is the new mutation (the derived allele) whilst the other is the ‘original’ (the ancestral allele). Typically, we would use the derived allele frequency to construct the SFS, since under coalescent theory we’re trying to simulate that mutation event. An SFS made of the derived alleles only is also referred to as the unfolded SFS.
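The relationship between the two forms can be sketched in a few lines of Python. This is only an illustration under simple assumptions: every site is biallelic, there is no missing data, and the derived allele counts are invented. The function names are my own.

```python
from collections import Counter

def unfolded_sfs(derived_counts, n_chrom):
    """Number of sites at which the derived allele is seen i times (i = 0..n_chrom)."""
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(n_chrom + 1)]

def fold_sfs(sfs):
    """Fold an unfolded SFS by merging entry i with entry n - i,
    so each site is counted by its minor allele count."""
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for i, count in enumerate(sfs):
        folded[min(i, n - i)] += count
    return folded

# Hypothetical derived allele counts at eight SNPs in 3 diploid individuals (2N = 6)
derived_counts = [1, 1, 2, 5, 3, 1, 4, 2]
unfolded = unfolded_sfs(derived_counts, 6)
folded = fold_sfs(unfolded)
```

Folding throws away the information about which allele is ancestral, which is why the folded SFS is the fallback when that information isn't available.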
Applications of the SFS
How can we use the SFS? Well, it can more or less be used as a summary of genetic variation for many types of coalescent-based analyses. This means we can make inferences of demographic history (see here for more detailed explanation of that) without simulating large and complex genetic sequences and instead use the SFS. For example, comparing our observed SFS to the SFS expected under a simulated scenario (such as a bottleneck) allows us to estimate the likelihood of that scenario.
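As a rough sketch of what that comparison might look like, one option is to treat the observed SFS as a multinomial draw from the expected proportions of each scenario and compare log-likelihoods. The observed counts below are invented, and the ‘flat’ alternative is a purely illustrative stand-in rather than a real bottleneck expectation. The only real theory used is that the expected SFS under a neutral, constant-size model is proportional to 1/i.

```python
import math

def sfs_log_likelihood(observed, expected_props):
    """Multinomial log-likelihood (ignoring the constant term) of the
    observed SFS counts given expected proportions for each frequency class."""
    return sum(o * math.log(p) for o, p in zip(observed, expected_props) if o > 0)

def neutral_expected_props(n_chrom):
    """Expected proportions for derived counts 1..n_chrom-1 under a neutral,
    constant-size model, where class i is proportional to 1/i."""
    weights = [1.0 / i for i in range(1, n_chrom)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical observed SFS: sites with derived count 1..5 among 6 chromosomes
observed = [50, 24, 17, 12, 10]

neutral = neutral_expected_props(6)
flat = [0.2] * 5  # hypothetical alternative scenario, for illustration only

ll_neutral = sfs_log_likelihood(observed, neutral)
ll_flat = sfs_log_likelihood(observed, flat)
```

With these made-up numbers the neutral expectation gives the higher log-likelihood, so it would be the better-supported of the two scenarios. In a real analysis the expected SFS for each scenario would come from coalescent simulations or theory rather than being typed in by hand.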
The SFS can even be used to detect alleles under natural selection. For strongly selected parts of the genome, alleles should occur at either high (if positively selected) or low (if negatively selected) frequency, with a deficit of more intermediate frequencies.
Adding to the analytical toolbox
The SFS is just one of many tools we can use to investigate the demographic history of populations and species. Using a combination of genomic technologies, coalescent theory and more robust analytical methods, the SFS appears to be poised to tackle more nuanced and complex questions of the evolutionary history of life on Earth.
Meaning: Octorokus from [octorok] in Hylian; infletus from [inflate] in Latin.
Translation: inflating octorok; all varieties use an inflatable air sac derived from the swim bladder to float and scan the horizon.
Octorokus infletus hydros [aquatic morphotype]
Octorokus infletus petram [mountain morphotype]
Octorokus infletus silva [forest morphotype]
Octorokus infletus arctus [snow morphotype]
Octorokus infletus imitor [deceptive morphotype]
Kingdom Animalia; Phylum Mollusca; Class Cephalopoda; Order Octopoda; Family Octopodidae; Genus Octorokus; Species infletus
The species is found throughout all major habitat regions of Hyrule, with localised morphotypes found within specific habitats. The only major region where the variable octorok is not found is within the Gerudo Desert, suggesting some remnant dependency on standing water.
Habitat choice depends on the physiology of the morphotype; so long as the environment allows the octorok to blend in, it is highly likely there are many around (i.e. unseen).
Behaviour and ecology
The variable octorok is arguably one of the most diverse species within modern Hyrule, exhibiting a large number of different morphotypic forms and occurring in almost all major habitat zones. Historical data suggests that the water octorok (Octorokus infletus hydros) is the most ancestral morphotype, with ancient literature frequently referring to them as sea-bearing or river-traversing organisms. Estimates from the literature suggest that their adaptation to land-based living is a recent evolutionary step which facilitated rapid morphological radiation of the lineage.
Several physiological characteristics unite the variable morphological forms of the octorok into a single identifiable species. Other than the typical body structure of an octopod (eight legs, largely soft body with an elongated mantle region), the primary diagnostic trait of the octorok is the presence of a large ‘balloon’ within the top of the mantle. This appears to be derived from the swim bladder of the ancestral octorok, which has shifted to the cranial region. The octorok can inflate this balloon using air pumped through the gills, filling it and lifting the octorok into the air. All morphotypes use this to scan the surrounding region to identify prey items, and will attack people if aggravated.
Diets of the octorok vary depending on the morphotype and based on the ecological habitat; adaptation to different ecological niches is facilitated by a diverse and generalist diet.
Although limited information is available on the amount of gene flow and population connectivity between different morphotypes, by sheer numbers alone it would appear the variable octorok is highly abundant. Some records of interactions between morphotypes (such as at the water’s edge within forested areas) imply that the different types are not reproductively isolated and can form hybrids: how this impacts resultant hybrid morphotypes and development is unknown. However, given the propensity of morphotypes to be largely limited to their adaptive habitats, it would seem reasonable to assume that some level of population structure is present across types.
The variable octorok appears remarkably diverse in physiology, although the recent nature of their divergence and the observed interactions between morphological types suggest that they are not reproductively isolated. Whether these forms are the result of phenotypic plasticity (with environmental pressures driving the associated physiological changes) or are genetically coded at early stages of development is unknown due to the cryptic nature of octorok spawning.
All octoroks employ strong behavioural and physiological traits for camouflage and ambush predation. Vegetation is usually placed on the top of the cranium of all morphotypes, with the exact species of plant used dependent on the environment (e.g. forest morphotypes will use grasses or ferns, whilst mountain morphotypes will use rocky boulders). The octorok will then dig beneath the surface until just the vegetation is showing, effectively blending in with the environment and only occasionally choosing to surface by using the balloon. Whether this behaviour is passed down genetically or taught from parents is unclear.
Few management actions are recommended for this highly abundant species. However, further research is needed to better understand their highly variable nature and the process of evolution underpinning their diverse morphology. Whether morphotypes are genetically hardwired by inheritance of determinant genes, or instead arise from alterations in gene expression caused by the environmental context of octoroks (i.e. phenotypic plasticity), provides an intriguing avenue of insight into the evolution of Hylian fauna.
Nevertheless, the transition from the marine environment onto the terrestrial landscape appears to be a significant stepping stone in the radiation of morphological structures within the species. How this has been facilitated by the genetic architecture of the octorok is a mystery.
But the real question is: why are there so many endemics in Australia? What is so special about our country that lends to our unique flora and fauna? Although we naturally associate tropical regions with lush, vibrant and diverse life, most of Australia is complete desert. That said, most of our species are concentrated in the tropical regions of the country, particularly in the upper east coast and far north (the ‘Top End’).
There are a number of different factors which contribute to the high species diversity of Australia. Most notable is how isolated we are as a continent: Australia has been separated from most of the rest of the world for millions of years. In this time, the climate has varied dramatically as the island shifted northward, creating a variety of changing environments and unique ecological niches for species to specialise into. We refer to these species groups as ‘Gondwana relicts’, since their last ancestor with the rest of the world would have been distributed across the supercontinent Gondwana over 100 million years ago. These include marsupials, many bird groups (including ratites and megapodes), many fish groups and a plethora of others. A Gondwanan origin explains why they are only found within Australia, southern Africa and South America (the closest landmasses that were also historically connected to Gondwana).
Early arrivals and naturalisation to the Australian ecosystem
Eventually, this connection also brought with it one of our most iconic species: the dingo. Estimates of their arrival date the migration at around 6 thousand years ago. As Australia’s only ‘native’ dog, there has been much debate about its status as an Australian icon. To call the dingo ‘native’ implies it’s always been there: but 6 thousand years is more than enough time to become ingrained within the ecosystem in a stable fashion. So, to balance the debate (and prevent the dingo from being labelled as an ‘invasive pest’ unfairly), we often refer to them as ‘naturalised’. This term helps us to disentangle modern-day pests, many of which are immensely destructive to the natural environment, from other species that have naturally migrated and integrated many years ago.
Invaders of the Australian continent
Of course, we can never ignore the direct impacts of humans on the ecosystem. Particularly with European settlement, another plethora of animals were introduced for the first time into Australia; these were predominantly livestock animals or hunting-related species (both as predators and prey). This includes the cane toad, widely regarded as one of the biggest errors in pest control on the planet.
When European settlers in the 1930s attempted to grow sugar cane in the far eastern part of the country, they found their crops decimated by a local beetle. In an effort to eradicate them, they brought over a species of cane toad, with the idea that they would control the beetle population and all would be well. Only, cane toads are particularly lazy and instead of targeting the cane beetles, they just thrived on all the other native invertebrates around. They’re also very resilient and adaptable (and highly toxic), so their numbers exploded and they’ve since spread across a large swathe of the country. Their toxic skin makes them fatal food objects for many native predators and they strongly compete against other similar native animals (such as our own amphibians). The cane toad introduction of 1935 is the poster child of how bad failed pest control can be.
But is native always better?
History tells a very stark tale about the poor native animals and the ravenous, rampaging pest species. Because of this, it is a widely adopted philosophical viewpoint that ‘native is always best’. And while I don’t disagree with the sentiment (of course we need to preserve our native wildlife, and not the massively overabundant pests), there are rare examples where nature is a little more complicated. In Australia, this is exemplified in the noisy miner.
The noisy miner is a small bird which, much like its name implies, is incredibly noisy and aggressive. It’s highly abundant, found predominantly throughout urban and suburban areas, and seems to dominate the habitat. It does this by bullying out other bird species from nesting grounds, creating a monopoly on the resource to the exclusion of many other species (even larger ones such as crows and magpies). Despite being native, it seems to have thrived on human alteration of the landscape and is a serious threat to the survival and longevity of many other species. If we thought of it solely under the ‘native is best’ paradigm, we would dismiss the noisy miner as ‘doing what it should be.’ The truth is really more of a philosophical debate: is it natural to let the noisy miner outcompete many other natives, possibly resulting in their extinction? Or is it only because of human interference (and thus is our responsibility to fix) that the noisy miner is doing so well in the first place? It’s not a simple question to answer, although the latter seems to be incredibly important.
The amazing biodiversity of Australia is a badge of honour we should wear with patriotic pride. Conservation efforts of our endemic fauna are severely limited by a lack of funding and resources, and despite a general acceptance of the importance of diverse ecosystems we remain relatively ineffective at preserving them. Understanding and connecting with our native wildlife, whilst finding methods to control invasive species, is key to conserving our wonderful ecosystems.
‘Diversity’ is a term that gets used a lot these days, albeit usually in reference to social changes and structures. However, diversity is not merely a human construct and reflects an extremely important aspect of the natural world at a variety of levels. From the smallest genes to the biggest ecosystems, diversity is a trait that confers a massive range of benefits to individuals, populations, species and even the entire globe. Let’s dissect this diversity down at different scales and see how beneficial it can be.
At the smallest scale in the hierarchy of genetic differentiation, we have the genes themselves. It is a well-established concept that having a diversity of genetic variants (alleles) within a population or species is critical to their future adaptation, evolution and persistence. This is because different alleles will have different benefits (or costs) depending on the environmental pressure that influences them; natural selection might favour one allele over another at one time, but a different one as the pressure changes. Having a higher number of alleles within the population or species means that there is a greater chance at least a few individuals will possess an adaptive gene with the changing environment (which we know can be quite rapid and very, very strong). The diversity serves as a ‘buffer’ against extinction; evolution by natural selection functions best when there are many options to choose from.
Without this diversity, species run the risk of having no adaptive genes at the ready to deal with a selective pressure. Either a new adaptive gene must mutate (or come about in other ways, such as through gene flow from another population or species) or the population/species will suffer and potentially go extinct. As strong selection causes the species to dwindle, it enters what is referred to as the ‘extinction vortex’. Without genetic diversity, they can’t adapt: thus, more individuals die off, causing more genetic diversity to be lost from the population. This pattern is a vicious cycle which can ultimately destroy species (without serious intervention).
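One half of that cycle (smaller populations losing diversity faster) can be illustrated with a short sketch. It leans on a standard textbook approximation that isn’t otherwise discussed in this post: under random genetic drift alone, expected heterozygosity shrinks by a factor of (1 - 1/(2N)) each generation in an idealised population of N diploid individuals. The starting heterozygosity, population sizes and number of generations are all hypothetical.

```python
def expected_heterozygosity(h0, pop_size, generations):
    """Expected heterozygosity after a number of generations of drift alone,
    using the idealised approximation H_t = H_0 * (1 - 1/(2N))^t."""
    return h0 * (1 - 1 / (2 * pop_size)) ** generations

# Hypothetical comparison: same starting diversity, very different population sizes
small_pop = expected_heterozygosity(0.5, 10, 20)    # 10 individuals, 20 generations
large_pop = expected_heterozygosity(0.5, 1000, 20)  # 1000 individuals, 20 generations
```

Under these assumptions the small population retains far less of its starting diversity than the large one over the same period, which is the part of the vortex where dwindling numbers feed back into further genetic loss.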
For this reason, captive breeding programs aim to maintain as much of the genetic diversity of the original population as possible. This reduces the probability of entering a downward extinction spiral from inbreeding depression and helps to maintain populations into the future (both the captive one and the wild population when we reintroduce individuals into the wild).
Because genetic diversity is critically important for species survival, we must also try to preserve the diversity of the entire gene pool of a species. This means conserving highly genetically differentiated populations within a species as a priority, as they may be the only ones that possess the necessary adaptive genes to save the rest of the species. This adaptive genetic variation can then be introduced into other populations in genetic rescue programs and serve as a means to semi-naturally allow the species to evolve. Evolutionarily-significant units (ESUs) are one measure of the invaluable nature of genetically unique populations.
Although many more traditional conservationists strongly believe that ESUs should be managed entirely independently of one another (to preserve their evolutionary ‘pedigree’ and prevent the risk of outbreeding depression), it has been suggested that the benefit of genetic rescue in many cases significantly outweighs this risk of outbreeding depression. For some species, this really is an act of rescue: they are at the edge of extinction, and if we do nothing we condemn them to die out.
Introducing genetic material across populations (or even species!) can generate new functional genes that allow the recipient species to adapt to selective pressures. This might sound very strange, and could be extremely rare, but examples of adaptive genetic material in one species originating from another species through hybridisation do exist in nature. For example, the black coat of wolves is a highly adaptive trait in some populations and is encoded by a variant of the β-defensin gene CBD103 (the ‘K locus’). However, the specific mutation in CBD103 that generates the black coat colour actually first originated in domestic dogs; when wild wolves and domestic dogs interbred, this mutation was transferred into the wolf gene pool. Natural selection strongly favoured this new variant, and it very rapidly underwent strong positive selection. Thus, the adaptiveness of black wolves is thanks to a domestic dog mutation!
At a higher level of the hierarchy, the diversity of species within a particular community or ecosystem has been shown to be important for the health and stability of said community. Every species, however small or seemingly unimpressive, plays a role in the greater ecosystem balance, through interactions with other species (e.g. as predator, as prey, as competitor) and the abiotic environment. While some species are known to have very strong impacts on the immediate ecosystem (often dubbed ‘keystone species’, such as apex predators), all species have some influence on the world around them (we’re especially good at it).
The overall health and stability of an ecosystem, as well as the benefits it can provide to all living things (including humans), are largely determined by the diversity of species. For example, ‘habitat engineers’ are types of species that, by altering the physical environment around them (such as to build a home), directly provide new habitat for other species. They are a fundamental underpinning of many incredibly vibrant ecosystems; think of what a reef system would look like if there were no corals in it. There’d be no anemones growing colourfully; no fish to live in them; no sharks to feed on these non-existent fish. This is just one example of a complex ecosystem that truly relies on its inhabiting species to function.
Protecting our diversity
Diversity is not just a social construct and is an important phenomenon in nature, at a variety of different levels. Preserving the full diversity of life, from genetic diversity within populations and species to full species diversity within ecosystems, is critical to maintaining healthy and robust natural systems. The more diversity we have at each level of this hierarchy, the greater robustness and security we will have in the future.
All of these questions can be addressed with a combination of genetic, environmental and ecological information across a variety of timescales. However, the overall field of biogeography (and phylogeography as a derivative of it) has traditionally been largely rooted in a strong yet changing theoretical basis. The earliest discussions and discoveries related to biogeography as a field of science date back to the 18th Century, and to Carl Linnaeus (to whom we owe our binomial classification system) and Alexander von Humboldt. These scientists (and undoubtedly many others of that era) were among the first to notice how organisms in similar climates (e.g. Australia, South Africa and South America) showed similar physical characteristics despite being so distantly separated (both in their groups and geographic distance). The communities of these regions also appeared to be highly similar. So how could this be possible over such huge distances?
Dispersal or vicariance?
Two main explanations for these patterns are possible: dispersal and vicariance. As one might expect, dispersal denotes that an ancestral species was distributed in one of these places (referred to as the ‘centre of origin’) before it migrated and inhabited the other places. Contrastingly, vicariance suggests that the ancestral species was distributed everywhere originally, covering all contemporary ranges within it. However, changes in geography, climate or the formation of other barriers caused the range of the ancestor to fragment, with each fragmented group evolving into its own distinct species (or group of species).
In initial biogeographic science, dispersal was the most heavily favoured explanation. At the time, there was no clear mechanism by which organisms could be present all over the globe without some form of dispersal: it was generally believed that the world was a static, unmoving system. Dispersal was well supported by some biological evidence such as the diversification of Darwin’s finches across the Galápagos archipelago. Thus, this concept was supported through the proposals of a number of prominent scientists such as Charles Darwin and A.R. Wallace. For others, however, the distance required for dispersal (such as across entire oceans) seemed implausible and biologically unrealistic.
A paradigm shift in biogeography
Two particular developments in theory are credited with a paradigm shift in the field: cladistics and plate tectonics. Cladistics simply involved using shared biological characteristics to reconstruct the evolutionary relationships of species (think like phylogenetics, but using physical traits instead of genetic sequence). Just as important, however, was plate tectonic theory, which provided a clear way for organisms to spread across the planet. Understanding that, deep in the past, all continents had been directly connected to one another provides a convenient explanation for how species groups spread. Instead of requiring species to travel across entire oceans, continental drift meant that one widespread and ancient ancestor on the historic supercontinent (Pangaea; or subsequently Gondwana and Laurasia) could become fragmented. It only required that groups were very old, but not necessarily very dispersive.
From these advances in theory, cladistic vicariance biogeography was born. The field rapidly overtook dispersal as the most likely explanation for biogeographic patterns across the globe by not only providing a clear mechanism to explain these but also an analytical framework to test questions relating to these patterns. Further developments into the analytical backbone of cladistic vicariance allowed for more nuanced questions of biogeography to be asked, although these still fundamentally ignored the role of potential dispersals in explaining species’ distributions.
Modern philosophy of biogeography
So, what is the current state of the field? Well, the more we research biogeographic patterns with better data (such as with genomics) the more we realise just how complicated the history of life on Earth can be. Complex modelling (such as Bayesian methods) allows us to more explicitly test the impact of Earth history events on our study species, and can provide a more detailed overview of the evolutionary history of the species (such as by directly estimating times of divergence, amount of dispersal, extent of range shifts).
From a theoretical perspective, the consistency of patterns of groups is always in question and exactly what determines what species occurs where is still somewhat debatable. However, the greater number of types of data we can now include (such as geological, paleontological, climatic, hydrological, genetic…the list goes on!) allows us to paint a better picture of life on Earth. By combining information about what we know happened on Earth, with what we know has happened to species, we can start to make links between Earth history and species history to better understand how (or if) these events have shaped evolution.
A fellow science student once drunkenly said that “I am a biologist…I don’t understand art.” Although somewhat bemusing (both in and out of context), it raises a particular philosophical idea that I can’t agree with: that art and science directly contradict one another.
It’s a somewhat clichéd paradigm that art and science must work at odds with one another. The idea that art embraces emotion, creativity and abstract perception whilst science is solely dictated by rationality, methodology and universal statistics is one that still seems to be somewhat pervasive throughout society and culture. While there seems to be a more recent shift against this, with both ends of the spectrum acknowledging the importance of the other in their respective fields, the intersection of art and science has a long and productive history.
Typically, the disjunction from the emotional and evocative state of people with science is through how the science is written. In many formats (particularly for the most widely used scientific journals), artistic and emotional writing is seen to detract from the overall message and objectivity of the piece itself. And while appeal to emotion can certainly take away from or mislead the message of the writing, it’s important to connect and attract readers to the work in the first place. Trying to find a possible avenue to work in personal style and artistry into an academic paper is an incredibly difficult affair. This is a large contributor to the merit of non-journalistic forms of scientific communication such as books, poetry and even blogs (this was one motivator in starting this blog, in fact).
It might come as a surprise to readers that I love art quite a lot, especially given the (lack of) quality of the drawings in this blog. But I’ve always tried to flex my creative side, and particularly when I was younger I was a more avid writer and sketcher. And the truth of the matter is that I don’t feel that the artistic side of a person has to be at odds with their scientific side. In fact, the two directly complement each other by linking our rational, objective understanding of the world with the emotional, expressive and ideological aspects of the human personality.
The art of science
From one angle, science is actively driven by creativity, ambition and often abstract ideation. The desire to delve deep to find new knowledge is intrinsically an emotional and philosophical process and to pretend that science is devoid of passion discredits both the research and the researcher. Entire disciplines of biology, for example, find themselves driven by science and people with deep emotional connections to the natural world and a desire to both understand and protect the diversity of life. The works of John Gould in his explorations of the Australian biota remain some of my favourites for both scientific and artistic merit.
The science of art
From the other direction, science can also inform artistic works by expanding the human knowledge and experience from which to draw inspiration. Naturally, this is an intrinsic part of genres such as science fiction, but many works of horror, abstraction, fantasy and thriller also draw on theories and revolutions brought about by scientific discovery. The further we understand the processes of the universe through scientific discovery, the greater the context and extent of our philosophical and emotional perspectives can be allowed to vary.
Gone are the days of dichotomy between 18-19th Century Impressionism and Enlightenment. Instead, the unity of science and art in the modern world can have significant positive contributions to both fields. Although there are still some elements of resistance between the two avenues, it is my belief that allowing the intrinsically emotional nature of science to be expressed (albeit moderated by reason and logic) will allow science to influence a greater number of people, an especially important connection in the age of cynicism.
Understanding the evolutionary history of species can be a complicated matter, both from theoretical and analytical perspectives. Although phylogenetics addresses many questions about evolutionary history, there are a number of limitations we need to consider in our interpretations.
One of these limitations we often want to explore in greater detail is the estimation of the divergence times within the phylogeny; we want to know exactly when two evolutionary lineages (be they genera, species or populations) separated from one another. This is particularly important if we want to relate these divergences to Earth history and environmental factors to better understand the driving forces behind evolution and speciation. A traditional phylogenetic tree, however, won’t show this: the tree is scaled in terms of the genetic differences between the different samples in the tree. The rate of genetic differentiation does not always have a linear relationship with time, and it definitely doesn’t appear to be universal.
How do we do it?
There are a number of parameters that are required for estimating divergence times from a phylogenetic tree. These can be summarised into two distinct categories: the tree model and the substitution model.
The first one of these is relatively easy to explain; it describes the exact relationship of the different samples in our dataset (i.e. the phylogenetic tree). Naturally, this includes the topology of the tree (which determines which divergence times can be estimated in the first place). However, there is another very important factor in the process: the lengths of the branches within the phylogenetic tree. Branch lengths are related to the amount of genetic differentiation between the different tips of the tree. The longer the branch, the more genetic differentiation must have accumulated (which usually also means that more time has passed from one end of the branch to the other). Even two phylogenetic trees with identical topology can give very different results if they vary in their branch lengths (see the above Figure).
However, at least one other important component is necessary to turn divergence time estimates into absolute, objective times. An external factor with an attached date is needed to calibrate the relative branch divergences; this can be in the form of a known mutation rate for all of the branches of the tree or by dating at least one node in the tree using additional information. These help to anchor either the mutation rate along the branches or the absolute date of at least one node in the tree (with the rest estimated relative to this point). The second method often involves placing a time constraint on a particular node of the tree based on prior information about the biogeography of the species (for example, we might know one species likely diverged from another after a mountain range formed: the age of the mountain range would be our constraint). Alternatively, we might include a fossil in the phylogeny which has been radiocarbon dated and place an absolute age on that instead.
In regards to the former method, mutation rates describe how fast genetic differentiation accumulates as evolution occurs along the branch. Although mutations gradually accumulate over time, the rate at which they occur can depend on a variety of factors (even including the environment of the organism). Even within the genome of a single organism, there can be variation in the mutation rate: genes, for example, often gain mutations more slowly than non-coding regions.
All of these components are combined into various analytical frameworks or programs, each of which handles the data in different ways. Many of these are Bayesian model-based analyses, which in short generate hypothetical models of evolutionary history and divergence times for the phylogeny and test how well each fits the data provided (i.e. the phylogenetic tree). The algorithm then alters some aspect(s) of the model, tests whether this fits the data better than the previous model, and repeats this for potentially millions of iterations to find the best model. Although models are typically a simplification of reality, they are a much more tractable approach to estimating divergence times (as well as a number of other types of evolutionary genetics analyses which incorporate modelling).
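As a rough illustration of the two calibration routes described above, the sketch below converts branch lengths (in substitutions per site) into time under a strict molecular clock, i.e. assuming one constant rate across the whole tree. The function names, the example rate and the branch lengths are all made up for illustration; real analyses usually relax the constant-rate assumption for exactly the reasons given above.

```python
def time_from_rate(branch_length, rate):
    """Time elapsed along a branch, given a known substitution rate.

    branch_length: substitutions per site along the branch
    rate: substitutions per site per million years (assumed constant)
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return branch_length / rate


def rate_from_calibration(calibrated_depth, calibrated_age):
    """Infer the rate from one node whose age is known independently.

    calibrated_depth: substitutions per site from that node to the tips
    calibrated_age: age of the node in millions of years (e.g. from a fossil)
    """
    if calibrated_age <= 0:
        raise ValueError("calibrated_age must be positive")
    return calibrated_depth / calibrated_age


# Route 1: a mutation rate is supplied directly (hypothetical value).
known_rate = 0.01  # substitutions per site per million years
age_a = time_from_rate(0.05, known_rate)

# Route 2: one node is dated, and every other node is scaled relative to it.
inferred_rate = rate_from_calibration(calibrated_depth=0.08, calibrated_age=4.0)
age_b = time_from_rate(0.03, inferred_rate)

print(f"Route 1 node age: {age_a:.2f} million years")
print(f"Route 2 node age: {age_b:.2f} million years")
```

Either way, an error in the supplied rate or in the calibration age scales every other estimate in the tree by the same proportion.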
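To make the propose-and-test loop a little more concrete, here is a toy Metropolis sampler for a single divergence time between two lineages. It is only a sketch under strong assumptions of my own (a known constant rate, a Poisson model for the number of differences, a flat prior, made-up numbers) and is not how any particular dating program is implemented.

```python
import math
import random


def log_likelihood(t, n_diffs, n_sites, rate):
    """Poisson log-likelihood of the observed differences given divergence time t.

    Both lineages accumulate changes since the split, so the expected number
    of differences is 2 * rate * t * n_sites (ignoring repeated hits).
    """
    expected = 2.0 * rate * t * n_sites
    return n_diffs * math.log(expected) - expected - math.lgamma(n_diffs + 1)


def sample_divergence_time(n_diffs, n_sites, rate, t_max=50.0,
                           n_steps=20000, step=0.5, seed=1):
    """Metropolis sampler for t with a flat prior on (0, t_max)."""
    rng = random.Random(seed)
    t = t_max / 2.0
    current = log_likelihood(t, n_diffs, n_sites, rate)
    samples = []
    for _ in range(n_steps):
        proposal = t + rng.gauss(0.0, step)  # alter the model slightly
        if 0.0 < proposal < t_max:  # proposals outside the prior are rejected
            candidate = log_likelihood(proposal, n_diffs, n_sites, rate)
            # Better-fitting proposals are always accepted; worse ones
            # are accepted with probability equal to the likelihood ratio.
            if rng.random() < math.exp(min(0.0, candidate - current)):
                t, current = proposal, candidate
        samples.append(t)
    return samples[n_steps // 4:]  # drop the first quarter as burn-in


# Hypothetical data: 100 differences over 1000 sites, with a rate of
# 0.01 substitutions per site per million years.
samples = sample_divergence_time(n_diffs=100, n_sites=1000, rate=0.01)
posterior_mean = sum(samples) / len(samples)
ordered = sorted(samples)
interval = (ordered[int(0.025 * len(ordered))],
            ordered[int(0.975 * len(ordered))])
print(f"Divergence time: {posterior_mean:.2f} million years "
      f"(95% interval {interval[0]:.2f} to {interval[1]:.2f})")
```

The spread of the retained samples is what gets reported as the uncertainty around the estimate, rather than a single best value.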
Despite the developments in the analytical basis of estimating divergence times in the last few decades, there are still a number of limitations inherent in the process. Many of these relate to the assumptions of the underlying model (such as the correct and accurate phylogenetic tree and the correct estimations of evolutionary rate) used to build the analysis and generate simulations. In the case of calibrations, it is also critical that they are correctly dated based on independent methods: inaccurate radiocarbon dating of a fossil, for example, could throw out all of the estimations in the entire tree. That said, these factors are intrinsic to any phylogenetic analysis and regularly considered by evolutionary biologists in the interpretations and discussions of results (such as by including confidence intervals of estimations to demonstrate accuracy).
Understanding the temporal aspects of evolution and being able to relate them to a real estimate of age is a difficult affair, but an important component of many evolutionary studies. Obtaining good estimates of the timing of divergence of populations and species through molecular dating is but one aspect in building the picture of the history of all organisms, including (and especially) humans.
While evolutionary genetics studies often focus on the underlying genetic architecture of species and populations to understand their evolution, we know that natural selection acts directly on physical characteristics. We call these the phenotype; by studying changes in the genes that determine these traits (the genotype), we can take a nuanced approach to studying adaptation. However, our ability to look at genetic changes and relate these to a clear phenotypic trait, and how and why that trait is under natural selection, can be a difficult task.
One gene for one trait
The simplest (and most widely used) models of understanding the genetic basis of adaptation assume that a single genotype codes for a single phenotypic trait. This means that changes in a single gene (such as outliers that we have identified in our analyses) create changes in a particular physical trait that is under a selective pressure in the environment. This is a useful model because it is statistically tractable to identify a few specific genes of very large effect within our genomic datasets and directly relate these to a trait: adding more complexity exponentially increases the difficulty in detecting patterns (at both the genotypic and phenotypic level).
Polygenic adaptation is often seen for traits which are clearly heritable, but don’t show a single underlying gene responsible. Previously, we’ve covered this with the heritability of height: this is one of many examples of ‘quantitative trait loci’ (QTLs). Changes in one QTL (a single gene) cause a small quantitative change in a particular trait; the combined effects of different QTLs can ‘add up’ (or counteract one another) to result in the final phenotype value.
The mechanisms which underlie polygenic adaptation can be more complex than simple addition, too. Individual genes might cause phenotypic changes which interact with other phenotypes (and their underlying genotypes) to create a network of changes. We call these interactions ‘epistasis’, where changes in one gene can cause a flow-on effect of changes in other genes based on how their resultant phenotypes interact. We can see this in metabolic pathways: given that a series of proteins are often used in succession within pathways, a change in any single protein in the process could affect every other protein in the pathway. Of course, knowing the exact proteins coded for by every gene, including their physical structure, and how each of those proteins could interact with other proteins is an immense task. Similar to QTLs, this is usually limited to model species which have a long history of research on these specific areas to back up the study. However, some molecular ecology studies are starting to dive into this area by identifying pathways that are under selection instead of individual genes, to give a broader picture of the overall traits that are underlying adaptation.
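The ‘adding up’ of QTLs can be written as a very small calculation. The sketch below assumes a purely additive model; the locus names, effect sizes and baseline are invented for illustration and aren’t drawn from any real study of height.

```python
def additive_phenotype(genotype, effects, baseline=0.0):
    """Trait value under a purely additive model.

    genotype: dict mapping locus name -> copies of the effect allele (0, 1 or 2)
    effects: dict mapping locus name -> change in the trait per copy
    baseline: trait value with no effect alleles at any locus
    """
    return baseline + sum(effects[locus] * copies
                          for locus, copies in genotype.items())


# Hypothetical loci and per-copy effect sizes (in cm).
effects = {"locus_A": 1.5, "locus_B": 0.5, "locus_C": -1.0}
individual = {"locus_A": 2, "locus_B": 1, "locus_C": 1}

height = additive_phenotype(individual, effects, baseline=165.0)
print(f"Predicted height: {height} cm")
```

Here locus_C counteracts part of the gain from the other two loci, so the final value depends on the whole set of loci and not on any single one.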
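The metabolic pathway example can be sketched with a deliberately simple model in which each step passes on a fraction of what it receives, so the pathway’s output is the product of the step efficiencies. This multiplicative model and all of the numbers are my own assumptions for illustration; real pathways are far messier.

```python
from math import prod


def pathway_output(efficiencies):
    """Output of a linear pathway as the product of its step efficiencies."""
    return prod(efficiencies)


def effect_of_change(efficiencies, step, new_value):
    """Change in pathway output when one enzyme's efficiency is altered."""
    altered = list(efficiencies)
    altered[step] = new_value
    return pathway_output(altered) - pathway_output(efficiencies)


# The same mutation (step 0 improving from 0.5 to 1.0) in two backgrounds.
healthy_background = [0.5, 1.0, 1.0]
impaired_background = [0.5, 1.0, 0.2]  # a downstream enzyme works poorly

gain_healthy = effect_of_change(healthy_background, 0, 1.0)
gain_impaired = effect_of_change(impaired_background, 0, 1.0)
print(f"Gain with a healthy downstream enzyme: {gain_healthy:.2f}")
print(f"Gain with an impaired downstream enzyme: {gain_impaired:.2f}")
```

The identical change at one gene has a much smaller effect when another gene in the pathway is impaired, which is the kind of dependence that simple addition cannot capture.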
One gene for many traits: pleiotropy and differential gene expression
In contrast to polygenic traits, changes in a single gene can also potentially alter multiple phenotypic traits simultaneously. This is referred to as ‘pleiotropy’ and can happen if a gene has multiple different functions within an organism; one particular protein might be a component of several different systems depending on where it is found or how it is arranged. A clear example of pleiotropy is in albino animals: the most common form of albinism is the result of possessing two recessive alleles of a single gene (TYR). The result of this is the absence of the enzyme tyrosinase in the organism, a critical component in the production of melanin. The flow-on phenotypic effects from the recessive allele most obviously cause a lack of pigmentation of the skin (whitening) and eyes (which appear pink), but also other physiological changes such as light sensitivity or total blindness (due to changes in the iris). Albinism has even been linked to behavioural changes in wild field mice.
Because pleiotropic genes code for several different phenotypic traits, natural selection can be a little more complicated. If some resultant traits are selected against, but others are selected for, it can be difficult for evolution to ‘resolve’ the balance between the two. The overall fitness of the gene is thus dependent on the balance of positive and negative fitness of the different traits, which will determine whether the gene is positively or negatively selected (much like a cost-benefit scenario). Alternatively, some traits which are selectively neutral (i.e. don’t directly provide fitness benefits) may be indirectly selected for if another phenotype of the same underlying gene is selected for.
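The cost-benefit idea can be put into numbers with a small sketch. It assumes that the fitness effects of each trait combine multiplicatively (one common simplifying choice, not the only one), and the trait labels and selection coefficients are entirely hypothetical.

```python
def net_selection(trait_effects):
    """Net selection coefficient of an allele that affects several traits.

    trait_effects: dict mapping trait name -> selection coefficient
    (positive = beneficial, negative = costly, 0 = neutral).
    Fitness effects are assumed to combine multiplicatively.
    """
    fitness = 1.0
    for s in trait_effects.values():
        fitness *= 1.0 + s
    return fitness - 1.0


# One allele, three traits: a benefit, a cost and a neutral side effect.
allele = {"trait_A": 0.10, "trait_B": -0.04, "trait_C": 0.0}
s_net = net_selection(allele)

verdict = "favoured" if s_net > 0 else "disfavoured"
print(f"Net selection coefficient: {s_net:.3f} ({verdict})")
```

Because the benefit outweighs the cost, the allele is favoured overall, and the neutral trait_C is carried along with it even though it contributes nothing to fitness itself.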
Multiple phenotypes from a single ‘gene’ can also arise by alternative splicing: when a gene is transcribed from the DNA sequence into RNA (before being translated into the protein), the non-coding intron sections within the gene are removed. However, exactly which introns are removed and how the different coding exons are arranged in the final sequence can give rise to multiple different protein structures, each with potentially different functions. Thus, a single overarching gene can lead to many different functional proteins. The role of alternative splicing in adaptation and evolution is a rarely explored area of research and its importance is relatively unknown.
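To get a feel for how quickly the number of possible products grows, the sketch below enumerates transcripts for a made-up gene. It only models one form of splicing (optional exons being kept or skipped) and the exon names and structure are invented for illustration.

```python
from itertools import product


def possible_isoforms(exons):
    """List the exon combinations that exon skipping could produce.

    exons: list of (name, optional) tuples in genomic order. Constitutive
    exons (optional=False) appear in every transcript; optional exons may
    be kept or skipped. Introns are always removed, so they are not
    represented here.
    """
    choices = [(True, False) if optional else (True,) for _, optional in exons]
    isoforms = []
    for kept in product(*choices):
        isoforms.append([name for (name, _), keep in zip(exons, kept) if keep])
    return isoforms


# Hypothetical gene with two constitutive and two optional exons.
gene = [("exon1", False), ("exon2", True), ("exon3", True), ("exon4", False)]
isoforms = possible_isoforms(gene)

for isoform in isoforms:
    print("-".join(isoform))
```

Each optional exon doubles the number of possible transcripts, so even a modest gene can in principle encode a large family of related proteins.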
Non-genes for traits: epigenetics
This gets more complicated if we consider ‘non-genetic’ aspects underlying the phenotype in what we call ‘epigenetics’. The phrase literally translates as ‘on top of genes’ and refers to chemical attachments to the DNA which control the expression of genes by allowing or resisting the transcription process. Epigenetics is a relatively new area of research, although studies have started to delve into the role of epigenetic changes in facilitating adaptation and evolution. Future research into the relationship between epigenetic changes and adaptive potential might provide more detailed insight into how adaptation occurs in the wild (and might provide a mechanism for adaptation for species with low genetic diversity)!
The different interactions between genotypes, phenotypes and fitness, as well as their complex potential outcomes, inevitably complicate any study of evolution. However, these are important aspects of the adaptation process and to discard them as irrelevant will no doubt reduce our ability to examine and determine evolutionary processes in the wild.