You’re perfect, you’re beautiful, you look like a model (species)

What is a ‘model’?

There are quite literally millions of species on Earth, ranging from the smallest of microbes to the largest of mammals. In fact, there are so many that we don’t actually have a good count on the sheer number of species and can only estimate it based on the species we actually know about. Unsurprisingly, then, the number of species vastly outweighs the number of people that research them, especially considering the sheer volumes of different aspects of species, evolution, conservation and their changes we could possibly study.

Species on Earth estimate figure
Some estimations on the number of eukaryotic species (i.e. not including things like bacteria), with the number of known species in blue and the predicted number of total species on Earth in purple. Source: Census of Marine Life.

This is partly where the concept of a ‘model’ comes into it: it’s much easier to pick a particular species to study as a target, and use the information from it to apply to other scenarios. Most people would be familiar with the concept based on medical research: the ‘lab rat’ (or mouse). The common house mouse (Mus musculus) and the brown rat (Rattus norvegicus) are some of the most widely used models for understanding the impact of particular biochemical compounds on physiology and are often used as the testing phase of medical developments before human trials.

So, why are mice used as a ‘model’? What actually constitutes a ‘model’, rather than just a ‘relatively well-researched species’? Well, there are a number of traits that might make certain species ideal subjects for understanding key concepts in evolution, biology, medicine and ecology. For example, mice are often used in medical research given their (relatively) similar genetic, physiological and behavioural characteristics to humans. They’re also relatively short-lived and breed readily, making them ideal for observing the more long-term effects of medical drugs or intergenerational impacts. Other species used as models primarily in medicine include nematodes (Caenorhabditis elegans), pigs (Sus scrofa domesticus), and guinea pigs (Cavia porcellus).

The diversity of models

There are a wide variety and number of different model species, based on the type of research most relevant to them (and how well it can be applied to other species). Even with evolution and conservation-based research, which can often focus on more obscure or cryptic species, there are several key species that have widely been applied as models for our understanding of the evolutionary process. Let’s take a look at a few examples for evolution and conservation.

Drosophila

It would be remiss of me to not mention one of the most significant contributors to our understanding of the genetic underpinning of adaptation and speciation, the humble fruit fly (Drosophila melanogaster, among other species). The ability to rapidly produce new generations (with large numbers of offspring and a very short generation time), a small, fully-sequenced genome, and physiological variation mean that observing both phenotypic and genotypic changes over generations due to ‘natural’ (or ‘experimental’) selection is possible. In fact, Drosophila spp. were key in demonstrating the formation of a new species under laboratory conditions, providing empirical evidence for the process of natural selection leading to speciation (despite some creationist claims that this has never happened).

Drosophila speciation experiment
A simplified summary of the speciation experiment in Drosophila, starting with a single species and resulting in two reproductively isolated species based on mating and food preference. Source: Ilmari Karonen, adapted from here.

Darwin’s finches

The original model of evolution could be argued to be Darwin’s finches, as they formed part of the empirical basis of Charles Darwin’s work on the theory of evolution by natural selection. This is because the different species demonstrate very distinct and obvious changes in morphology related to a particular diet (e.g. the physiological consequences of natural selection), spread across an archipelago in a clear demonstration of a natural experiment. Thus, they remain the original example of adaptive radiation and are fundamental components of the theory of evolution by natural selection. However, surprisingly, Darwin’s finches are somewhat overshadowed in modern research by other species in terms of the amount of available data.

Darwin's finches drawings
Some of Darwin’s early drawings of the morphological differences in Galapagos finch beaks, which led to the formulation of the theory of evolution by natural selection.

Zebra finches

Even as far as birds go, one species clearly outshines the rest in terms of research. The zebra finch is one of the most highly researched vertebrate species, particularly as a model of song learning and behaviour in birds but also as a genetic model. The zebra finch was the second bird to ever have its full genome sequenced (the first being the chicken), and its genome remains one of the more detailed and annotated genomes in birds. Because of this, the zebra finch genome is often used as a reference for other studies on the genetics of bird species, especially when trying to understand the function of genetic changes or genes under selection.

Zebra finches.jpg
A pair of (very cute) model zebra finches. Source: Michael Lawton via Smithsonian.com.

Fishes

Fish are (perhaps surprisingly) also relatively well researched in terms of evolutionary studies, largely due to their ancient origins and highly diverse nature, with many different species across the globe. They also often demonstrate very rapid and strong bouts of divergence, such as the cichlid fish species of African lakes, which demonstrate how new species can rapidly form when introduced to new and variable environments. The cichlids have become the poster child of adaptive radiation in fishes much in the same way that Darwin’s finches highlighted this trend in birds. Another group of fish species used as a model for similar aspects of speciation, adaptive divergence and rapid evolutionary change are the three-spine and nine-spine stickleback species, which inhabit a variety of marine, estuarine and freshwater environments. Thus, studies on the genetic changes across these different morphotypes are key to understanding how adaptation to new environments occurs in nature (particularly the relatively common transition into different water types in fishes).

cichlid diversity figure
The sheer diversity of species and form makes African cichlids an ideal model for testing hypotheses and theories about the process of evolution and adaptive radiation. Figure sourced from Brawand et al. (2014) in Nature.

Zebrafish

More similar to the medical context of lab rats is the zebrafish (ironically, zebra themselves are not considered a model species). Zebrafish are often used as models for understanding embryology and the development of the body in early formation given the rapid speed at which embryonic development occurs and the transparent body of embryos (which makes it easier to detect morphological changes during embryogenesis).

Zebrafish embryo
The transparent nature of zebrafish embryos makes them ideal for studying the development of organisms in early stages. Source: yourgenome.org.

Using information from model species for non-models

While the relevance of information collected from model species to other non-model species depends on the similarity in traits of the two species, our understanding of broad concepts such as evolutionary process, biochemical pathways and physiological developments has significantly improved due to model species. Applying theories and concepts from better understood organisms to less researched ones allows us to produce better research much faster by cutting out some of the initial investigative work on the underlying processes. Thus, model species remain fundamental to medical advancement and evolutionary theory.

That said, in an ideal world all species would have the same level of research and resources as our model species. In this sense, we must continue to strive to understand and research the diversity of life on Earth, to better understand the world in which we live. Full genomes are progressively being sequenced for more and more species, and there are a number of excellent projects that are aiming to sequence at least one genome for all species of different taxonomic groups (e.g. birds, bats, fish). As the data improves for our non-model species, our understanding of evolution, conservation management and medical research will similarly improve.

Lost in a forest of (gene) trees

Using genetics to understand species history

The idea of using the genetic sequences of living organisms to understand the evolutionary history of species is a concept much repeated on The G-CAT. And it’s a fundamental one in phylogenetics, taxonomy and evolutionary biology. Often, we try to analyse the genetic differences between individuals, populations and species in a tree-like manner, with close tips being similar and more distantly separated branches being more divergent. However, this runs on one very key assumption: that the patterns we observe in our study genes match the overall patterns of species evolution. But this isn’t always true, and before we can delve into that we have to understand the difference between a ‘gene tree’ and a ‘species tree’.

A gene tree or a species tree?

Our typical view of a phylogenetic tree is actually one of a ‘gene tree’, where we analyse how a particular gene (or set of genes) has changed over time between different individuals (within and across populations or species) based on our understanding of mutation and common ancestry.

However, a phylogenetic tree based on a single gene only demonstrates the history of that gene. What we assume in most cases is that the history of that gene matches the history of the species: that branches in the genetic tree mirror when different splits in species occurred throughout history.

The easiest way to conceptualise gene trees and species trees is to think of individual gene trees that are nested within an overarching species tree. In this sense, individual gene trees can vary from one another (substantially, even) but by looking at the overall trends of many genes we can see how the genome of the species has changed over time.

Gene tree incongruence figure
A (potentially familiar) depiction of individual gene trees (coloured lines) within the broader species tree (defined by the black boundaries). As you might be able to tell, the branching patterns of the different genes are not the same, and don’t always match the overarching species tree.

Gene tree incongruence

Different genes may have different patterns for a number of reasons. Changes in the genetic sequences of organisms over time don’t happen equally across the entire genome, and very specific parts of the genome can evolve in entirely different directions, or at entirely different rates, than the rest of the genome. Let’s take a look at a few ways we could have conflicting gene trees in our studies.

Incomplete lineage sorting

One of the most prolific, but more complicated, ways gene trees can vary from their overarching species tree is due to what we call ‘incomplete lineage sorting’. This is based on the idea that species and the genes that define them are constantly evolving over time, and that because of this different genes are at different stages of divergence between populations and species. If we imagine a set of three related populations which have all descended from a single ancestral population, we can start to see how incomplete lineage sorting could occur. Our ancestral population likely has some genetic diversity, containing multiple alleles of the same locus. In a true phylogenetic tree, we would expect these different alleles to ‘sort’ into the different descendent populations, such that one population might have one of the alleles, a second the other, and so on, without them sharing the different alleles between them.

If this separation into new populations has been recent, or if gene flow has occurred between the populations since this event, then we might find that each descendent population has a mixture of the different alleles, and that not enough time has passed to clearly separate the populations. For alleles to sort cleanly, enough time must pass for new mutations to arise and for genetic drift to push the different populations towards different allele frequencies: if the split is too recent, then it can be hard to accurately distinguish between populations. This can be difficult to interpret (see below figure for a visualisation of this), but there’s a great description of incomplete lineage sorting here.

ILS_adaptedfigure
A demonstration of incomplete lineage sorting, generously adapted from a talk by fellow MELFU postdocs Dr Yuma (Jonathon) Sandoval-Castillo and Dr Catherine Attard. On the left is a depiction of a single gene coalescent tree over time: circles represent a single individual at a particular point in time (row) with the colours representing different alleles of that same gene. The tree shows how new mutations occur (colour changes along the branches) and spread throughout the descendent populations. In this example, we have three recently separated species, with a good number of different alleles. However, when we study these alleles in tree form (the phylogeny on the right), we see that the branches themselves don’t correlate well with the boundaries of the species. For example, the teal allele found within Species C is actually more similar to Species B alleles (purple and blue) than to any other Species C alleles, based on the order and patterns of these mutations.
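To put a rough number on how much the timing of the split matters, here is a minimal sketch for the three-population case. It leans on a standard coalescent result that isn't derived in this post: for three lineages, the probability that a gene tree matches the species tree is 1 − (2/3)e^(−T), where T is the time between the two splits measured in units of 2Ne generations (for diploids). The function name and the example numbers are purely illustrative assumptions.

```python
import math


def prob_gene_tree_matches(generations_between_splits, effective_size):
    """Chance that one gene tree matches a three-taxon species tree.

    Standard three-taxon coalescent result: P = 1 - (2/3) * exp(-T),
    where T = t / (2 * Ne) for a diploid population. Gene flow after
    the split is NOT modelled here.
    """
    T = generations_between_splits / (2.0 * effective_size)
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


# Hypothetical scenario: Ne = 10,000, with increasingly old splits.
for t in (0, 1_000, 10_000, 100_000):
    p = prob_gene_tree_matches(t, effective_size=10_000)
    print(f"{t:>7} generations between splits: P(match) = {p:.3f}")
```

When the two splits happen almost simultaneously, a single gene is no better than a one-in-three guess; the match probability only approaches one when the internal branch is long relative to the population size.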

Hybridisation and horizontal transfer

Another way individual genes may become incongruent with other genes is through another phenomenon we’ve discussed before: hybridisation (or more specifically, introgression). When two individuals from different species breed together to form a ‘hybrid’, they join together what were once two separate gene pools. Thus, the hybrid offspring has (if it’s a first generation hybrid, anyway) 50% of genes from Species A and 50% of genes from Species B. In terms of our phylogenetic analysis, if we picked one gene randomly from the hybrid, we have a 50% chance of picking a gene that reflects the evolutionary history of Species A, and a 50% chance of picking a gene that reflects the evolutionary history of Species B. This would change how our outputs look significantly: if we pick a Species A gene, our ‘hybrid’ will look (genetically) very, very similar to Species A. If we pick a Species B gene, our ‘hybrid’ will look like a Species B individual instead. Naturally, this can really stuff up our interpretations of species boundaries, distributions and identities.

Hybridisation_figure
An example of hybridisation leading to gene tree incongruence with our favourite colourful fish. A) We have a hybridisation event between a red fish (Species A) and a green fish (Species B), resulting in a hybrid species (‘Species’ H). The red fish genome is indicated by the yellow DNA, the green fish genomes by the blue DNA, and the hybrid orange fish has a mixture of these two. B) If we sampled one set of genes in the hybrid, we might select a gene that originated from the red fish, showing that the hybrid is identical (or very similar) to Species A. D) Conversely, if we sampled a gene originating from the green fish, the resultant phylogeny might show that the hybrid is the same as Species B. C) If we consider these two patterns in combination, we see the true pattern of species formation, which is not a clear dichotomous tree but rather a mixture of the two sets of trees.
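The coin-flip nature of sampling a single gene from a first-generation hybrid can be sketched in a few lines. This is a toy simulation under the simplifying assumption that every sampled gene copy is equally likely to trace back to either parent species; the function name and the seed are hypothetical choices for illustration.

```python
import random


def sample_hybrid_genes(n_genes, seed=1):
    """Randomly draw the parental origin ('A' or 'B') of n gene copies.

    Assumes an F1 hybrid, so each sampled copy has a 50% chance of
    coming from Species A and a 50% chance of coming from Species B.
    """
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_genes)]


one_gene = sample_hybrid_genes(1)
many_genes = sample_hybrid_genes(1000)

# A single gene tells only one parent's story...
print("A single sampled gene traces back to Species", one_gene[0])
# ...whereas many genes reveal the mixed ancestry.
print("Fraction of 1000 genes from Species A:", many_genes.count("A") / 1000)
```

One gene gives an all-or-nothing answer, while a large sample of genes recovers something close to the 50:50 split we expect from a first-generation hybrid.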

Paralogous genes

More confusingly, we can even have events where a single gene duplicates within a genome. This is relatively rare, although it can have huge effects: for example, salmon have massive genomes as the entire thing was duplicated! Each version of the gene can take on very different forms, functions, and evolve in entirely different ways. We call these duplicated variants paralogous genes: genes that look the same (in terms of sequence), but are totally different genes.

This can have a profound impact as paralogous genes are difficult to detect: if there has been a gene duplication early in the evolutionary history of our phylogenetic tree, then many (or all) of our study samples will have two copies of said gene. Since they look similar in sequence, there’s every possibility that we pick Variant 1 in some species and Variant 2 in other species. Being unable to tell them apart, we can have some very weird and abstract results within our tree. Most importantly, different samples with the same duplicated variant will seem more similar to one another (i.e. appear to have evolved from a common ancestor more recently) than they will to any sample of the other variant (even if they came from the exact same species)!

Paralogy_figure.jpg
An example of how paralogous genes can confound species trees. We start with a single (purple) gene: at a particular point in time, this gene duplicates into a red and a blue form. Each of these genes then evolves and spreads into four separate descendent species (A, B, C and D) but not in entirely the same way. However, since both the red and blue genetic sequences are similar, if we took a single gene from each species we might (somewhat randomly) sequence either the red or the blue copy. The different phylogenetic trees on the right demonstrate how different combinations of red and blue genes give very different patterns, since all blue copies will be more related to other blue genes than to the red gene of the same species. E.g. a blue A and a blue C are more similar than a blue A and a red A.
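The ‘blue A is closer to blue C than to red A’ pattern can be sketched with a handful of toy sequences. Everything below is made up for illustration (the sequences, their length and the mutations are assumptions, not real data): the red and blue copies each picked up four changes after the duplication, and every copy in every species then gained one change of its own.

```python
def count_differences(seq_a, seq_b):
    """Number of sites at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Toy (made-up) sequences for the red and blue copies in Species A and C.
red_A = "CCCCAAAATAAA"
blue_A = "AAAAGGGGATAA"
red_C = "CCCCAAAAAATA"
blue_C = "AAAAGGGGAAAT"

# Same copy, different species: only the recent, species-specific changes differ.
print("blue A vs blue C:", count_differences(blue_A, blue_C))
# Different copies, same species: all the changes since the duplication differ.
print("blue A vs red A: ", count_differences(blue_A, red_A))
```

Because the duplication is older than the species splits, copies of the same colour group together regardless of which species they were sequenced from.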

Overcoming incongruence with genomics

Although a tricky conundrum in phylogenetics and evolutionary genetics broadly, gene tree incongruence can largely be overcome by using more loci. As the random changes of any one locus have a smaller effect on the larger total set of loci, the general and broad patterns of evolutionary history can become clearer. Indeed, understanding how many loci are affected by what kind of process can itself become informative: large numbers of introgressed loci can indicate whether hybridisation was recent, strong, or biased towards one species over another, for example. As with many things, the genomic era appears poised to address the many analytical issues and complexities of working with genetic data.
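A quick way to see why more loci help is to simulate many noisy gene trees and ask which topology turns up most often. The sketch below is a deliberately crude assumption-laden toy: each locus independently matches the (hypothetical) true species tree ((A,B),C) with probability `p_match`, and otherwise shows one of the two conflicting topologies with equal chance, as expected under incomplete lineage sorting alone. It is a simple ‘majority vote’, not a real species tree method.

```python
import random
from collections import Counter

TRUE_TREE = "((A,B),C)"  # assumed species tree for this toy example
CONFLICTING = ["((A,C),B)", "((B,C),A)"]


def most_common_topology(n_loci, p_match=0.5, seed=42):
    """Simulate n_loci gene trees and return the most frequent topology."""
    rng = random.Random(seed)
    p_other = (1.0 - p_match) / 2.0
    gene_trees = rng.choices(
        [TRUE_TREE] + CONFLICTING,
        weights=[p_match, p_other, p_other],
        k=n_loci,
    )
    return Counter(gene_trees).most_common(1)[0][0]


for n in (1, 10, 1000):
    print(f"{n:>4} loci -> most common gene tree: {most_common_topology(n)}")
```

With a single locus the answer is whatever that one gene happens to say; with hundreds of loci, the species tree topology reliably dominates even though half of the individual gene trees disagree with it.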

Moving right along: dispersal and population structure

The impact of species traits on evolution

Although we often focus on the genetic traits of species in molecular ecology studies, the physiological (or phenotypic) traits are equally important in shaping their evolution. These different traits are not only the result themselves of evolutionary forces but may further drive and shape evolution into the future by changing how an organism interacts with the environment.

There are a massive number of potential traits we could focus on, each of which could have a large number of different (and interacting) impacts on evolution. One that is often considered, and highly relevant for genetic studies, is the influence of dispersal capability.

Dispersal

Dispersal is essentially the process of an organism migrating to a new habitat, to the point of the two being used almost interchangeably. Often, however, we regard dispersal as a migration event that actually has genetic consequences; particularly, if new populations are formed or if organisms move from one population to another. This can differ from straight migration in that animals that migrate might not necessarily breed (and thus pass on genes) into a new region during their migration; thus, evidence of those organisms will not genetically proliferate into the future through offspring.

Naturally, the ability of organisms to disperse is highly variable across the tree of life and reliant on a number of other physiological factors. Marine mammals, for example, can disperse extremely far throughout their lifetimes, whereas some very localised species like some insects may not move very far within their lifetime at all. The movement of organisms directly facilitates the movement of genetic material, and thus has significant impacts on the evolution and genetic diversity of species and populations.

Dispersal vs pop structure
The (simplistic) relationship between dispersal capability and one aspect of population genetics, population structure (measured as Fst). As organisms are more capable of dispersing longer distances (or more frequently), the barriers between populations become weaker.

Highly dispersive species

At one end of the dispersal spectrum, we have highly dispersive species. These can move extremely long distances and thus mix genetic material from a wide range of habitats and places into one mostly-cohesive population. Because of this, highly dispersive species often have strong colonising abilities and can migrate into a range of different habitats by tolerating a wide range of conditions. For example, a single whale might hang around Antarctica for part of the year but move to the tropics during other times. Thus, this single whale must be able to tolerate both ends of the temperature spectrum.

As these individuals occupy large ranges, localised impacts are unlikely to critically affect their full distribution. Individual organisms that are occupying an unpleasant space can easily move to a more favourable habitat (provided that one exists). Furthermore, with a large population (which is more likely with highly dispersive species), genetic drift is substantially weaker and natural selection (generally) has a higher amount of genetic diversity to work with. This is, of course, assuming that dispersal leads to a large overall population, which might not be the case for species that are critically endangered (such as the cheetah).
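The link between population size and the strength of drift can be illustrated with a textbook result that isn’t spelled out in this post: with drift alone, expected heterozygosity decays as H_t = H_0 × (1 − 1/(2N))^t in an idealised diploid population of size N. The sketch below just evaluates that formula; the population sizes and starting diversity are hypothetical.

```python
def expected_heterozygosity(h0, effective_size, generations):
    """Expected heterozygosity after drift alone in an idealised diploid population.

    Uses the textbook decay H_t = H_0 * (1 - 1/(2N))^t; no mutation,
    migration or selection is modelled.
    """
    return h0 * (1.0 - 1.0 / (2.0 * effective_size)) ** generations


# Hypothetical population sizes, all starting from the same diversity.
for size in (50, 500, 5000):
    h = expected_heterozygosity(h0=0.5, effective_size=size, generations=100)
    print(f"Ne = {size:>4}: heterozygosity after 100 generations = {h:.3f}")
```

The small population loses most of its diversity over the same period in which the large one barely changes, which is why large, well-connected populations give natural selection more to work with.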

Highly dispersive animals often fit the “island model” of Wright, where individual subpopulations all have equal proportions of migrants from all other subpopulations. In reality, this is rare (or unreasonable) due to environmental or physiological limitations of species; distance, for example, is not implicitly factored into the basic island model.

Island model
The Wright island model of population structure. In this example, different independent populations are labelled in the bold letters, with dispersal pathways demonstrated by the different arrows. In the island model, dispersal is equally likely between all populations (including between B and D in this example, even though there aren’t any arrows showing it). Naturally, this is not overly realistic and so the island model is used mostly as a neutral, base model.
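Under the island model, the relationship between dispersal and population structure has a well-known approximation (a standard result attributed to Wright, not derived in this post): at equilibrium, Fst ≈ 1 / (4Nm + 1), where N is the effective size of each subpopulation and m is the proportion of migrants per generation. A minimal sketch, with hypothetical parameter values:

```python
def island_model_fst(effective_size, migration_rate):
    """Approximate equilibrium Fst under Wright's island model.

    Fst ~ 1 / (4 * N * m + 1), assuming many equally sized subpopulations,
    equal migration between all of them, and no selection or mutation.
    """
    return 1.0 / (4.0 * effective_size * migration_rate + 1.0)


# Hypothetical subpopulations of Ne = 1000 with increasing dispersal.
for m in (0.0001, 0.001, 0.01, 0.1):
    fst = island_model_fst(effective_size=1000, migration_rate=m)
    print(f"m = {m:<6} (Nm = {1000 * m:g} migrants/generation): Fst = {fst:.3f}")
```

Even a single effective migrant per generation (Nm = 1) pulls the expected Fst down to 0.2, which gives a feel for how quickly dispersal erodes population structure in this idealised model.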

Intermediately dispersing species

A large number of species, however, are likely to occupy a more intermediate range of dispersal ability. These species might be able to migrate to neighbouring populations, or across a large proportion of their geographic range, but individuals from one end of the range are still somewhat isolated from individuals at the other end.

This often leads to some effect of population structure; different portions of the geographic range are genetically segregated from one another depending on how much gene flow (i.e. dispersal) occurs between populations. In the simplest scenario, this can lead to what we call isolation-by-distance. Rather than forming totally independent populations, gene flow occurs across short ranges between adjacent ‘populations’. This causes a gradient of genetic differentiation, with one end of the range being clearly genetically different to the other end, with a gradual slope throughout the range. We see this often in marine invertebrates, for example, which might have somewhat localised dispersal but still occupy a large range by following oceanographic currents.

River IDB network
An example of how an isolation-by-distance population network might come about. In this example, we have a series of populations (the different pie charts) spread throughout a river system (that blue thing). The different pie charts represent how much of the genetics of that population matches one end of the river: either the blue end (left) or red end (right). Populations can easily disperse into adjacent populations (the green arrows) but less so to further populations. This leads to gradual changes across the length of the river, with the far ends of the river clearly genetically distinct from the opposite end but relatively similar to neighbouring populations.
River IDB pop structure.jpg
The genetic representation of the above isolation-by-distance example. Each column represents a single population (in the previous figure, a pie chart), with the different colours also representing the relative genetic identity of that population. As you can see, moving from Population 1 to 10 leads to a gradient (decreasing) in blue genes but increase in red genes. The inverse can be said moving in the opposite direction. That said, comparing Population 1 and Population 10 shows that they’re clearly different, although there is no clear cut-off point across the range of other populations.
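The gradient in the two figures above can be reproduced with a very simple ‘stepping-stone’ sketch. This is a toy model under loud assumptions: ten populations in a line, each exchanging a fraction `m` of individuals with its immediate neighbours only, and the two end populations held fixed as pure ‘blue’ and pure ‘red’ sources. None of the names or values come from a real dataset.

```python
def stepping_stone_gradient(n_pops=10, m=0.2, generations=2000):
    """Proportion of 'blue' ancestry in each population along the river.

    Each generation, every interior population keeps (1 - m) of its own
    ancestry and receives m/2 from each adjacent population. The two end
    populations are assumed to stay pure blue (1.0) and pure red (0.0).
    """
    blue = [1.0] + [0.5] * (n_pops - 2) + [0.0]
    for _ in range(generations):
        updated = blue[:]
        for i in range(1, n_pops - 1):
            updated[i] = (1 - m) * blue[i] + (m / 2) * (blue[i - 1] + blue[i + 1])
        blue = updated
    return blue


for pop, share in enumerate(stepping_stone_gradient(), start=1):
    print(f"Population {pop:>2}: {share:.2f} blue, {1 - share:.2f} red")
```

Neighbouring populations end up nearly indistinguishable, yet the two ends of the river are completely different: a gradient with no obvious cut-off point, just as in the bar plot.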

Medium dispersal capabilities are also often a requirement for forming ‘metapopulations’. In this population arrangement, several semi-independent populations are present within the geographic range of the species. Each of these is subject to its own local environmental pressures and demographic dynamics, and because of this may go locally extinct at any given time. However, dispersal connections between many of these populations lead to recolonisation and gene flow patterns, allowing for extinction-dispersal dynamics to sustain the overall metapopulation. Generally, this would require greater levels of dispersal than those typically found within isolation-by-distance species, as individuals must traverse uninhabitable regions relatively frequently to recolonise locally extinct habitat.

Metapopulation structure.jpg
An example of metapopulation dynamics. Different subpopulations (lettered circles) are connected via dispersal (arrows). These different subpopulations can be different sizes and are mostly independent of one another, meaning that a single subpopulation can go locally extinct (the red X) without collapsing the entire system. The different dispersal pathways mean that one population can recolonise extinct habitat and essentially ‘rebirth’ other subpopulations (the green arrows).
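The balance between local extinction and recolonisation can be sketched with the classic Levins metapopulation model, which is brought in here as a standard illustration rather than something used in this post. It tracks the fraction of habitat patches that are occupied, p, as dp/dt = c·p·(1 − p) − e·p, where c is the colonisation rate and e the local extinction rate (both hypothetical values below); the expected long-term occupancy is 1 − e/c when c > e.

```python
def levins_occupancy(colonisation, extinction, p0=0.1, dt=0.05, steps=20_000):
    """Fraction of occupied patches after simulating the Levins model.

    Integrates dp/dt = c * p * (1 - p) - e * p with simple Euler steps.
    """
    p = p0
    for _ in range(steps):
        p += dt * (colonisation * p * (1.0 - p) - extinction * p)
    return p


# Dispersal (recolonisation) outpaces local extinction: the system persists.
print("c = 0.5, e = 0.1 ->", round(levins_occupancy(0.5, 0.1), 3))
# Local extinction outpaces dispersal: the whole metapopulation collapses.
print("c = 0.1, e = 0.5 ->", round(levins_occupancy(0.1, 0.5), 3))
```

The metapopulation only sustains itself when recolonisation through dispersal is frequent enough to offset local extinctions.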

Weakly dispersing species

At the far opposite end of the dispersal ability spectrum, we have low dispersal species. These are often localised, endemic species that for various reasons might be unable to travel very far at all; for some, they may spend their entire adult life in a sedentary form. The lack of dispersal leads to very strong levels of population structure, and individual populations often accumulate genetic differences relatively quickly due to genetic drift or local adaptation.

Species with low dispersal capabilities are often at risk of local extinction and are unable to easily recolonise these habitats after the event has ended. Their movement is often restricted to rare environmental events such as flooding that carry individuals long distances despite their physiological limitations. Because of this, low dispersal species are often at greater risk of total extinction and extinction vortices than their higher dispersing counterparts.

Accounting for dispersal in population genetics

Incorporating biological and physiological aspects of our study taxa is important for interpreting the evolutionary context of species. Dispersal ability is but one of many characteristics that can influence the ability of species to respond to selective pressures, and the context in which this natural selection occurs. Thus, understanding all aspects of an organism is important in building the full picture of their evolution and future prospects.

An identity crisis: using genomics to determine species identities

This is the fourth (and final) part of the miniseries on the genetics and process of speciation. To start from Part One, click here.

In last week’s post, we looked at how we can use genetic tools to understand and study the process of speciation, and particularly the transition from populations to species along the speciation continuum. Following on from that, the question of “how many species do I have?” can be further examined using genetic data. Sometimes, it’s entirely necessary to look at this question using genetics (and genomics).

Cryptic species

A concept that I’ve mentioned briefly previously is that of ‘cryptic species’. These are species which are identifiable by their large genetic differences, but appear the same based on morphological, behavioural or ecological characteristics. Cryptic species often arise when a single species has become fragmented into several different populations which have been isolated for a long time from one another. Although they may diverge genetically, this doesn’t necessarily always translate to changes in their morphology, ecology or behaviour, particularly if these are strongly selected for under similar environmental conditions. Thus, we need to use genetic methods to be able to detect and understand these species, as well as later classify and describe them.

Cryptic species fish
An example of cryptic species. All four fish in this figure are morphologically identical to one another, but they differ in their underlying genetic variation (indicated by the different colours of DNA). Thus, from looking at these fish alone we would not perceive any differences, but their genetic make-up might suggest that there is more than one species…
Cryptic species heatmap example
The level of genetic differentiation between the fish in the above example. The phylogenies on the left and top of the figure demonstrate the evolutionary relationships of these four fish. The matrix shows a heatmap of the level of differences between different pairwise comparisons of all four fish: red squares indicate zero genetic differences (such as when comparing a fish to itself; the middle diagonal) whilst yellow squares indicate increasingly higher levels of genetic differentiation (with bright yellow = all differences). By comparing the different fish together, we can see that Fish 1 and 2, and Fish 3 and 4, are relatively genetically similar to one another (red-deep orange). However, other comparisons show high levels of genetic difference (e.g. 1 vs 3 and 1 vs 4). Based on this information, we might suggest that Fish 1 and 2 belong to one cryptic species (A) and Fish 3 and 4 belong to a second cryptic species (B).
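The numbers behind a heatmap like this are just pairwise comparisons between sequences. Below is a minimal sketch using the proportion of differing sites (often called a p-distance) on four short, entirely made-up sequences; the sequences and fish labels are hypothetical and simply mimic the two-group pattern in the figure.

```python
def p_distance(seq_a, seq_b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Made-up 20 bp sequences for four morphologically identical fish.
fish = {
    "Fish 1": "ACGTACGTACGTACGTACGT",
    "Fish 2": "ACGTACGTACGTACGTACGA",
    "Fish 3": "TCGAACGAACGTTCGTACCT",
    "Fish 4": "TCGAACGAACGTTCGTACCA",
}

names = list(fish)
print(" " * 8 + "  ".join(names))
for row in names:
    distances = [p_distance(fish[row], fish[col]) for col in names]
    print(f"{row:<8}" + "  ".join(f"{d:6.2f}" for d in distances))
```

The low values within the Fish 1/Fish 2 and Fish 3/Fish 4 pairs, and the much higher values between the pairs, are the kind of pattern that would prompt us to suspect two cryptic species.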

Genetic tools to study species: the ‘Barcode of Life’

A classically employed method that uses DNA to detect and determine species is referred to as the ‘Barcode of Life’. This uses a very specific fragment of DNA from the mitochondria of the cell: the cytochrome c oxidase I gene, CO1. The standard barcode region of this gene is 648 base pairs long and is found pretty well universally in animals: this and the fact that CO1 is conserved enough to be sequenced across most animals, yet variable enough to differ between species, make it an ideal candidate for easily testing the identity of new species. Additionally, mitochondrial DNA tends to be a bit more resilient than its nuclear counterpart; thus, small or degraded tissue samples can still be sequenced for CO1, making it amenable to wildlife forensics cases. Generally, two sequences will be considered as belonging to different species if they are a certain percentage different from one another.

Annotated mitogenome
The full (annotated) mitochondrial genome of humans, with the different genes within it labelled. The CO1 gene is labelled with the red arrow (sometimes also referred to as COX1) whilst blue arrows point to other genes often used in phylogenetic or taxonomic studies, depending on the group or species in question.

Despite the apparent benefits of CO1, there are of course a few drawbacks. Most of these revolve around the mitochondrial genome itself. Firstly, because mitochondria are passed on from mother to offspring (and not at all from the father), the mitochondrial genome reflects the genetic history of only one sex of the species. Secondly, the actual cut-off for species using CO1 barcoding is highly contentious and possibly not as universal as previously suggested. Levels of CO1 sequence divergence between species that have previously been determined to be separate (through other means) have ranged anywhere from 2% to 12%. The actual translation between CO1 sequence divergence and species identity is not all that clear.

Gene tree – species tree incongruences

One particularly confounding aspect of defining species based on a single gene, and with using phylogenetic-based methods, is that the history of that gene might not actually be reflective of the history of the species. This can be a little confusing to think about but essentially leads to what we call “gene tree – species tree incongruence”. Different evolutionary events cause different effects on the underlying genetic diversity of a species (or group of species): while these may be predictable from the genetic sequence, different parts of the genome might not be as equally affected by the same exact process.

A classic example of this is hybridisation. If we have two initial species, which then hybridise with one another, we expect our resultant hybrids to be approximately made of 50% Species A DNA and 50% Species B DNA (if this is the first generation of hybrids formed; it gets a little more complicated further down the track). This means that, within the DNA sequence of the hybrid, 50% of it will reflect the history of Species A and the other 50% will reflect the history of Species B, which could differ dramatically. If we randomly sample a single gene in the hybrid, we will have no idea if that gene belongs to the genealogy of Species A or Species B, and thus we might make incorrect inferences about the history of the hybrid species.

Gene tree incongruence figure
A diagram of gene tree – species tree incongruence. Each individual coloured line represents a single gene as we trace it back through time; these are mostly bound within the limits of species divergences (the black borders). For many genes (such as the blue ones), the genes resemble the pattern of species divergences very well, albeit with some minor differences in how long ago the splits happened (at the top of the branches). However, the red genes contrast with this pattern, with clear movement across species (from and into B): this represents genes that have been transferred by hybridisation. The green line represents a gene affected by what we call incomplete lineage sorting; that is, we cannot trace it back far enough to determine exactly how/when it initially diverged and so there are still two separate green lines at the very top of the figure. You can think of each line as a separate phylogenetic tree, with the overarching species tree as the average pattern of all of the genes.

There are a number of other processes that could similarly alter our interpretations of evolutionary history based on analysing the genetic make-up of the species. The best way to handle this is simply to sample more genes: this way, the effect of variation of evolutionary history in individual genes is likely to be overpowered by the average over the entire gene pool. We interpret this as a set of individual gene trees contained within a species tree: although one gene might vary from another, the overall picture is clearer when considering all genes together.
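As a rough illustration of why more genes help, the sketch below asks: if each gene tree independently matches the species tree with some probability, how often will the majority of sampled genes point to the right answer? Both the independence of genes and the 0.7 match probability are assumptions made up for this example, and a simple majority vote is only a stand-in for how real species tree methods combine genes.

```python
from math import comb


def prob_majority_correct(n_genes: int, p_match: float) -> float:
    """Probability that more than half of n independent genes match the species tree."""
    return sum(
        comb(n_genes, k) * p_match**k * (1 - p_match) ** (n_genes - k)
        for k in range(n_genes // 2 + 1, n_genes + 1)
    )


# Hypothetical scenario: each gene tree matches the species tree 70% of the time.
for n in (1, 5, 25, 101):
    print(n, round(prob_majority_correct(n, 0.7), 4))
```

With a single gene we are only as reliable as that one gene; as the number of genes grows, the odd misleading gene is increasingly outvoted.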

Species delimitation

In earlier posts on The G-CAT, I’ve discussed the biogeographical patterns unveiled by my Honours research. Another key component of that paper involved using statistical modelling to determine whether cryptic species were present within the pygmy perches. I didn’t exactly elaborate on that in that section (mostly for simplicity), but this type of analysis is referred to as ‘species delimitation’. To try and simplify complicated analyses, species delimitation methods evaluate possible numbers and combinations of species within a particular dataset and provide a statistical value for which configuration of species is most supported. One program that employs species delimitation is Bayesian Phylogenetics and Phylogeography (BPP): to do this, it uses a plethora of information from the genetics of the individuals within the dataset. This includes how long ago the different populations/species separated; which populations/species are most related to one another; and a pre-set maximum number of species (BPP will try to combine these in estimations, but not split them, due to computational constraints). This all sounds very complex (and to a degree it is), but this allows the program to give you a statistical value for what is a species and what isn’t based on the genetics and statistical modelling.
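To get a feel for what "possible numbers and combinations of species" means, the sketch below simply lists every way of grouping a handful of populations into candidate species. This is not how BPP works internally (it only merges pre-defined groups and scores configurations statistically); the brute-force listing and the population names are assumptions for illustration only.

```python
from typing import Iterator, List


def partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every way of grouping items into non-empty, unordered groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing group in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a group of its own.
        yield [[first]] + smaller


# Hypothetical population labels.
populations = ["pop1", "pop2", "pop3", "pop4"]
candidates = list(partitions(populations))

print(len(candidates), "candidate species configurations")
for grouping in candidates:
    print(grouping)
```

Even four populations give more than a dozen configurations, and the count climbs steeply with every population added, which is part of why delimitation programs restrict the set of configurations they consider.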

Vittata cryptic species
The cryptic species of pygmy perches identified within my research paper. This represents part of the main phylogenetic tree result, with the estimates of divergence times from other analyses included. The pictures indicate the morphology of the different ‘species’: Nannoperca pygmaea is morphologically different to the other species of Nannoperca vittata. Species delimitation analysis suggested all four of these were genetically independent species; at the very least, it is clear that there must be at least 2 species of Nannoperca vittata since one lineage is more closely related to N. pygmaea than to other N. vittata species. Photo credits: N. vittata = Chris Lamin; N. pygmaea = David Morgan.

The end result of a BPP run is usually reported as a species tree (i.e. a phylogenetic tree describing species relationships) and statistical support for the delimitation of species (0-1 for each species). Because of the way the statistical component of BPP works, it has been found to give extremely high support for species identities. This has been criticised, as BPP can at times provide high statistical support for genetically isolated lineages (i.e. divergent populations) which are not actually species.

Improving species identities with integrative taxonomy

Due to this particular drawback, and the often complex nature of species identity, using solely genetic information such as species delimitation to define species is extremely rare. Instead, we use a combination of different analytical techniques, which can include genetic-based evaluations, to more robustly assign and describe species. In my own paper example, we suggested that up to three ‘species’ of N. vittata that were determined as cryptic species by BPP could potentially exist, pending further analyses. We did not describe or name any of the species, as this would require a deeper delve into the exact nature and identity of these species.

As genetic data and analytical techniques improve into the future, it seems likely that our ability to detect and determine species boundaries will also improve. However, the additional support provided by alternative aspects such as ecology, behaviour and morphology will undoubtedly be useful in the progress of taxonomy.

From mutation to speciation: the genetics of species formation

The genetics of speciation

Given the strong influence of genetic identity on the process and outcomes of the speciation process, it seems a natural connection to use genetic information to study speciation and species identities. There is a plethora of genetics-based tools we can use to investigate how speciation occurs (both the evolutionary processes and the external influences that drive it). One clear way to test whether two populations of a particular species are actually two different species is to investigate genes related to reproductive isolation: if the genetic differences demonstrate reproductive incompatibilities across the two populations, then there is strong evidence that they are separate species (at least under the Biological Species Concept; see Part One for why!). But this type of analysis requires several tools: 1) knowledge of the specific genes related to reproduction (e.g. formation of sperm and eggs, genital morphology, etc.), 2) the complete and annotated genome of the species (to be able to find and analyse the right genes properly) and 3) a good amount of data for the populations in question. As you can imagine, for people working on non-model species (i.e. ones that haven’t had the same history and detail of research as, say, humans and mice), this can be problematic. So, instead, we can use other genetic information to investigate and suggest patterns and processes related to the formation of new species.

Is reproductive isolation naturally selected for or just a consequence?

A fundamental aspect of studies of speciation is a “chicken or the egg”-type question: does natural selection directly select for rapid reproductive isolation, preventing interbreeding, or does reproductive isolation arise as a secondary consequence of general adaptive differences over a long history of evolution? This might be a confusing distinction, so we’ll dive into it a little more.

Of the two proposed models of speciation, the by-product of natural selection (the second model) has been the more favoured. Simply put, this expands on Darwin’s theory of evolution that describes two populations of a single species evolving independently of one another. As these become more and more different, both in physical (‘phenotype’) and genetic (‘genotype’) characteristics, there comes a turning point where they are so different that an individual from one population could not reasonably breed with an individual from the other to form a fertile offspring. This could be due to genetic incompatibilities (such as different chromosome numbers), physiological differences (such as changes in genital morphology), or behavioural conflicts (such as solitary vs. group living).

Certainly, this process makes sense, although it is debatable how fast reproductive isolation would occur in a given species (or whether it is predictable just based on the level of differentiation between two populations). Another model suggests that reproductive isolation actually might arise very quickly if natural selection favours maintaining particular combinations of traits together. This can happen if hybrids between two populations are not particularly well adapted (fit), causing natural selection to favour populations to breed within each group rather than across groups (leading to reproductive isolation). Typically, this is referred to as ‘reinforcement’ and predominantly involves isolating mechanisms that prevent individuals across populations from breeding in the first place (since this would be wasted energy and resources producing unfit offspring). The main difference between these two models is the sequence of events: do populations ecologically diverge, and because of that then become reproductively isolated, or do populations selectively breed (enforcing reproductive isolation) and thus then evolve independently?

Reinforcement figure
An example of reinforcement leading to speciation. A) We start with two populations of a single species (a red fish population and a green fish population), which can interbreed (the arrows). B) Because these two groups can breed, hybrids of the two populations can be formed. However, due to the poor combination of red and green fish genes within a hybrid, they are not overly fit (the red cross). C) Since natural selection doesn’t favour forming hybrids, populations then adapt to selectively breed only with similar fish, reducing the amount of interbreeding that occurs. D) With the two populations effectively isolated from one another, different adaptations specific to each population (spines in red fish, purple stripes in green fish) can evolve, causing them to further differentiate. E) At some point in the differentiation process, hybrids move from being just selectively unfit (as in B)) to entirely impossible, thus making the two populations formal species. In this example, evolution has directly selected against hybrids first, thus then allowing ecological differences to occur (as opposed to the other way around).

Reproductive isolation through DMIs

The reproductive incompatibility of two populations (thus making them species) is often intrinsically linked to the genetic make-up of those two species. Some conflicts in the genetics of Population 1 and Population 2 may mean that a hybrid having half Population 1 genes and half Population 2 genes will have serious fitness problems (such as sterility or developmental problems). Dramatic genetic differences, particularly a difference in the number of chromosomes between the two sources, are a significant component of reproductive isolation and are usually to blame for sterile hybrids such as ligers, zorses and mules.

However, subtler genetic differences can also have a strong effect: for example, the unique combination of Population 1 and Population 2 genes within a hybrid might interact with one another negatively and cause serious detrimental effects. These are referred to as “Dobzhansky-Müller Incompatibilities” (DMIs) and are expected to accumulate as the two populations become more genetically differentiated from one another. This can be a little complicated to imagine (and is based upon mathematical models), but the basis of the concept is that some combinations of gene variants have never, over evolutionary history, been tested together as the two populations diverge. Hybridisation of these two populations suddenly makes brand new combinations of genes, some of which may have profound physiological impacts (including on reproduction).

DMI figure
An example of how Dobzhansky-Müller Incompatibilities arise, adapted from Coyne & Orr (2004). We start with an initial population (center top), which splits into two separate populations. In this example, we’ll look at how 5 genes (each letter = one gene) change over time in the separate populations, with the original allele of the gene (lowercase) occasionally mutating into a new allele (upper case). These mutations happen at random times and in random genes in each population (the red letters), such that the two become very different over time. After a while, these two populations might form hybrids; however, given the number of changes in each population, this hybrid might have some combinations of alleles that are ‘untested’ in their evolutionary history (see below). These untested combinations may cause the hybrid to be infertile or unviable, making the two populations isolated species.

DMI table
The list of ‘untested’ genetic combinations from the above example. This table shows the different combinations of each gene that could be made in a hybrid if these two populations interbred. The red cells indicate combinations that have never been ‘tested’ together; that is, at no point in the evolutionary history of these two populations were those two particular alleles together in the same individual. Green cells indicate ones that were together at some point, and thus are expected to be viable combinations (since the resultant populations are obviously alive and breeding).
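The bookkeeping behind ‘untested’ combinations can be sketched in a few lines. The example below assumes five genes (lowercase = original allele, uppercase = new allele) and a made-up order of mutations in each population; it is not a reproduction of the figure’s exact history, just the same logic applied to a hypothetical one.

```python
from itertools import combinations

ANCESTOR = "abcde"  # one letter per gene; lowercase = original allele


def history(ancestor, substitutions):
    """Genotypes a population passes through as new alleles replace old ones, one at a time."""
    genotypes = [ancestor]
    for allele in substitutions:
        current = genotypes[-1]
        genotypes.append(current.replace(allele.lower(), allele))
    return genotypes


def allele_pairs(genotype):
    """All pairs of alleles (at different genes) found together in one genotype."""
    return {frozenset(pair) for pair in combinations(genotype, 2)}


# Hypothetical mutation orders for the two populations.
population_1 = history(ANCESTOR, ["A", "C"])       # abcde -> Abcde -> AbCde
population_2 = history(ANCESTOR, ["B", "D", "E"])  # abcde -> aBcde -> aBcDe -> aBcDE

# Every allele pair that has co-existed in some genotype along either history.
tested = set()
for genotype in population_1 + population_2:
    tested |= allele_pairs(genotype)

# Every allele pair a hybrid could carry, mixing alleles from the two end points.
final_1, final_2 = population_1[-1], population_2[-1]
possible = set()
for i, j in combinations(range(len(ANCESTOR)), 2):
    for x in {final_1[i], final_2[i]}:
        for y in {final_1[j], final_2[j]}:
            possible.add(frozenset((x, y)))

untested = sorted("".join(sorted(pair)) for pair in possible - tested)
print(len(untested), "untested combinations:", untested)
```

Note that the untested list includes not only pairs of new alleles from different populations (such as A with B) but also a new allele paired with an original allele that had already been replaced in its own population (such as C with a).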

How can we look at speciation in action?

We can study the process of speciation in the natural world without focussing on the ‘reproductive isolation’ element of species identity as well. For many species, we are unlikely to have the detail (such as an annotated genome and known functions of genes related to reproduction) required to study speciation at this level in any case. Instead, we might choose to focus on the different factors that are currently influencing the process of speciation, such as how the environmental, demographic or adaptive contexts of populations play a role in the formation of new species. Many of these questions fall within the domain of phylogeography; particularly, how the historical environment has shaped the diversity of populations and species today.

Phylogeo of speciation
An example of the interplay between speciation and phylogeography, taken from Reyes-Velasco et al. (2018). They investigated the phylogeographic history of several different groups of species within the frog genus Ptychadena; in this figure, we can see how the different species (indicated by the colours and tree on the left) relate to the geography of their habitat (right).

A variety of different analytical techniques can be used to build a picture of the speciation process for closely related or incipient species. A good starting point for any speciation study is to look at how the different study populations are adapting; is there evidence that natural selection is pushing these populations towards different genotypes or ecological niches? If so, then this might be a precursor for speciation, and we can build on this inference with other complementary analyses.

For example, estimating divergence times between populations can help us suggest whether there has been sufficient time for speciation to occur (although this isn’t always clear cut). Additionally, we could estimate the levels of genetic hybridisation (‘introgression’) between two populations to suggest whether they are reasonably isolated and divergent enough to be considered functional species.

The future of speciation genomics

Although these can help answer some questions related to speciation, new tools are constantly needed to provide a clearer picture of the process. Understanding how and why new species are formed is a critical aspect of understanding the world’s biodiversity. How can we predict if a population will speciate at some point? What environmental factors are most important for driving the formation of new species? How stable are species identities, really? The answers to these questions (and many more) remain elusive for a wide variety of life on Earth.

Of birds and bees: where do species come from?

This is Part 2 of a four part miniseries on the process of speciation: how we get new species, how we can see this in action, and the end results of the process. This week we’re taking a look at how new species are formed from natural selection. For Part 1, on the identity and concept of the species, click here.

The Origin of Species

Despite Darwin’s scientifically ground-breaking revelations over 150 years ago, the truth of the origin of species has remained a puzzling and complex question in biology. While the fundamental concepts of Darwin’s theory remain heavily supported – that groups which become separated from one another and undergo differing evolutionary pathways through natural selection may over time form new species – the mechanisms leading to this are mysterious. Even though the heritable component of evolution (DNA) was not uncovered for a hundred years after publishing ‘On the Origin of Species’, Darwin’s theory can largely explain many patterns of the formation of species on Earth.

The population-speciation continuum

The understanding that separated groups progress into species through differential adaptation leads to a concept known as the ‘speciation continuum’: all populations exist at some point on the continuum, with those that are most differentiated (i.e. most progressed) being distinct species, whereas those least differentiated are closely related populations or the same population. Whether or not populations progress along this continuum, and how fast this progression happens, depends on the difference in selective pressure and speed of evolution in the populations. Even if two populations are physically separated, they might not necessarily form new species if the separation is too short-term or if they do not evolve in different ways. Even if they do differentially evolve, whether or not they develop reproductive isolation is not always consistent.

Speciation continuum figure
A vague diagram of the population-speciation continuum. In this figure, we have two different organisms (Taxa 1 and Taxa 2) and we’re comparing their genetic similarity/differences (the grey arrow). At the bottom left of the chart, there are very few genetic differences between the two, likely indicating that they are from the same population (or closely related e.g. siblings). As we progress towards the upper left, the two start to diverge from one another, first to different populations of the same species, then to different subspecies of the same overarching species, and eventually becoming so different that they must be new species (i.e. are genetically incompatible and thus reproductively isolated). Exactly where this cut-off lies is a bit of a grey area (the species boundary) and is unlikely to be consistent across species.

Furthermore, how these populations are changing may affect the rate or success of speciation: if the traits that evolve differently across the populations also cause them to be unable to breed, then they may quickly become reproductively isolated and thus new species. For example, Momigliano et al. (2017) demonstrated the fastest known rate of speciation in a marine vertebrate (within 3000 generations) in a species of flounder. Flounders that adapted to a higher salinity environment became reproductively isolated from their sister population as their sperm could not tolerate the high salinity conditions (directly preventing breeding and causing reproductive isolation). This strong and rapid adaptation to an environment, and its subsequent selection on reproductive ability, was cutely described as a “magic trait”.

Modes of speciation

Darwin’s model of speciation describes what is called “allopatric speciation”, whereby physical separation of populations by some form of barrier (often attributed to changes such as climatic shifts, mountain range formations or island separation) isolates populations which then independently evolve until they reach a point of differentiation where they can no longer interbreed. Thus, they are now separate species (based on the Biological Species Concept, anyway). Allopatric speciation has traditionally been believed to be the most common process of speciation, and is consistently used as the model for teaching and understanding speciation.

While this physical separation is the strongest and most immediately obvious method of speciation, other forms without geographic barriers have been documented. “Sympatric speciation” involves speciation events where there are no apparent geographical barriers that separate populations: instead, other factors may be driving their divergence from one another. This can relate to different microenvironments within the same area, where one population migrates and adapts to an environment which excludes the other population. This is referred to as “ecological speciation” and has been particularly noted within lake fish radiating into different habitats. There are a number of other mechanisms by which sympatric speciation could also occur, however, including temporal isolation (e.g. different flowering times in plants), sexual selection (e.g. a mutation leads to a new physiology that is more attractive to others with that physiology) or polyploidy (e.g. a ‘mutation’ causes an organism to have multiple copies of its genome, making it effectively reproductively isolated from its neighbours due to incompatible sex cells).

Allopatric vs sympatric speciation
Representations of allopatric and sympatric speciation using our friends the fruit-eating cats. A) An example of allopatric speciation. Similar to how we’ve seen it before, a geographic barrier (the dashed green line) separates the ancestral species in two; each of these groups then evolves in a different direction based on the different environmental pressures of each zone. After enough divergence, these two groups become reproductively isolated from one another and thus are different species. B) An example of sympatric speciation. We start with a single species of red apple-eating cats, which form one contiguous group. A mutation within the group produces a new type of fruit-eating cat; one that feeds on green apples (grey cats). Because these feed on a different food source, they move into a different part of the environment, associating with other green apple-eating cats and less with red apple-eating cats. Over time, and with strong enough selection for apple preferences, these two types may become different species.

Sympatric speciation has attracted a great deal of controversy, due to the fact that some level of gene flow could occur across the two populations with relative ease (compared to allopatric populations). This gene flow should cause the two populations to reconnect and prevent each population from evolving differently from the other (as changes in one population’s gene pool will be introduced into the other). Speciation with gene flow has been shown for some species, based on the idea that the pressure of natural selection (i.e. being adapted to the right habitat) is much stronger than the level of gene flow (i.e. the introduction of non-adapted genes from the other population), so the two populations still diverge genetically.

Gene flow across populations (through hybridisation) will balance out the different allele frequencies of the two gene pools, preventing adaptive alleles from moving towards fixation as per the standard natural selection process. While the effect of gene flow might slow the process, taking longer for the populations to diverge to the species level, speciation can still be achieved. Thus, the balance of gene flow and adaptive divergence is critical in determining whether ecological speciation is possible.
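This tug-of-war can be sketched with a very simple model. The version below assumes a single gene with two alleles in a haploid organism, two equally sized populations, an allele that is favoured in one habitat and disfavoured in the other with the same strength `s`, and a fraction `m` of each population swapped every generation. All of these are simplifying assumptions for illustration, and the parameter values are made up.

```python
def divergence_after(s: float, m: float, generations: int = 2000) -> float:
    """Difference in allele frequency between two populations after a number of
    generations of opposing selection (strength s) and symmetric gene flow (rate m)."""
    p1 = p2 = 0.5  # starting frequency of allele A in each population
    for _ in range(generations):
        # Selection: A is favoured in population 1, the other allele in population 2.
        p1_sel = p1 * (1 + s) / (1 + s * p1)
        p2_sel = p2 / (1 + s * (1 - p2))
        # Gene flow: a fraction m of each population comes from the other one.
        p1 = (1 - m) * p1_sel + m * p2_sel
        p2 = (1 - m) * p2_sel + m * p1_sel
    return p1 - p2


print("strong selection, weak gene flow:", round(divergence_after(s=0.1, m=0.01), 3))
print("weak selection, strong gene flow:", round(divergence_after(s=0.01, m=0.1), 3))
```

When selection outweighs gene flow, the two populations settle at very different allele frequencies; when gene flow dominates, they stay almost identical no matter how long we wait.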

Sympatric speciation figure
A slightly more convoluted example of sympatric speciation. A) We start with a single species of small orange cats (top row), which can readily share genes with one another. A mutation within the species creates a new type of cat; one that is much larger and has tufted ears. Although they are somewhat morphologically distinct from one another, they’re still genetically similar enough to continue to breed and share genes across the two types. However, with the big size comes a new ecological niche and these big cats differentially evolve to be grey (to hide better from their new bigger prey, perhaps) whilst the non-mutated group stays the same size and colour. Because large grey cats will preferentially breed with other large grey cats and not with small orange cats, this group genetically diverges from the ancestor to form a new species. B) A representation of the genetic changes between the two groups over time. The figure shows the genome (the grey bar) of the cat; the y-axis is the level of genetic differentiation between the two (measured as Fst). The different coloured sections represent specific genes within the genome, whilst the dashed line represents the average Fst across the whole genome. At initial divergence (top), there is little difference between the two. However, as the new big cats form and evolve, we can see the average Fst increase, with strong peaks around particular genes (blue and green; those related to the changes in physiology). As the two groups continue to diverge, this average rises even higher until genetic changes cause the reproduction-related genes (red and yellow) to become too different to allow for hybridisation, making the two species reproductively isolated (the red X in A)).
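For readers who haven’t met Fst before, here is a minimal sketch of one common way to calculate it for a single gene with two alleles, comparing the diversity within each group to the diversity of both groups pooled. It assumes two equally sized groups, and the allele frequencies below are invented purely to show the range of values.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Fst for one two-allele gene in two equally sized groups, as (H_T - H_S) / H_T,
    where p1 and p2 are the frequencies of the same allele in each group."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected diversity if the groups were pooled
    if h_total == 0:
        return 0.0  # both groups carry only the same allele: no differentiation
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean diversity within groups
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies (small orange cats vs large grey cats).
print(fst_biallelic(0.5, 0.5))  # identical frequencies
print(fst_biallelic(0.8, 0.2))  # partly diverged
print(fst_biallelic(1.0, 0.0))  # each group fixed for a different allele
```

Values near 0 mean the two groups are effectively one gene pool at that gene, and values near 1 mean they share almost nothing; averaging this over many genes gives the kind of genome-wide figure shown by the dashed line.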

The reality of species

While the distinction between divergent populations and species might be a complex one, development in genomic technologies and greater understanding of evolutionary patterns is helping us uncover the real origin of species. And while species might not be as concrete a concept as one might expect, understanding the processes that generate new species and diversity is critical for understanding the diversity within nature that we see today, and also the potential diversity for the future (and why protecting said diversity is important!).

What is a species, anyway?

This is Part 1 of a four part miniseries on the process of speciation; how we get new species, how we can see this in action, and the end results of the process. This week, we’ll start with a seemingly obvious question: what is a species?

The definition of a ‘species’

‘Species’ are a human definition of the diversity of life. When we talk about the diversity of life, and the myriad of creatures and plants on Earth, we often talk about species diversity. This might seem glaringly obvious, but there’s one key issue: what is a species, anyway? While we might like to think of them as discrete and obvious groups (a dog is definitely not the same species as a cat, for example), the concept of a singular “species” is actually the result of human categorisation.

In reality, the diversity of life is spread across a huge spectrum of differentiation: from things which are closely related but still different to us (like chimps), to more different again (other mammals), to hardly relatable at all (bacteria and plants). So, what is the cut-off for calling something a species, and not a different genus, family, or kingdom? Or alternatively, at what point do we call a specific sub-group of a species as a sub-species, or another species entirely?

This might seem like a simple question: we look at two things, and they look different, so they must be different species, right? Well, of course, nature is never simple, and the line between “different” and “not different” is very blurry. Here’s an example: consider that you knew nothing about the history, behaviour or genetics of dogs. If you simply looked at all the different breeds of dogs on Earth, you might suggest that there are hundreds of species of domestic dogs. That seems a little excessive though, right? In fact, the domestic dog, Eurasian wolf, and the Australian dingo are all the same species (but different subspecies, along with about 38 others…but that’s another issue altogether).

Dogs
Morphology can be misleading for identifying species. In this example, we have A) a dog, B) also a dog, C) still a dog, D) yet another dog, and E) not a dog. For the record, A-D are all Canis lupus of some variety; A and B are domestic dogs (Canis lupus familiaris), C is a dingo (Canis lupus dingo) and D is a grey wolf (Canis lupus lupus). E, however, is the Ethiopian wolf, Canis simensis.

How do we describe species?

This method of describing species based on how they look (their morphology) is the very traditional approach to taxonomy. And for a long time, it seemed to work…until we get to more complex scenarios like the domestic dog. Or scenarios where two species look fairly similar, but in reality have evolved entirely differently for a very, very long time. Or groups which look close to more than one other species. So how do we describe them instead?

Cats and foxes
A), a fox. B), a cat. C), a foxy cat? A catty fox? A cat-fox hybrid? Something unrelated to cat or a fox?

Believe it or not, there are dozens of ways of deciding what is a species and what isn’t. In Speciation (2004), Coyne & Orr count at least 25 different reported Species Concepts that had been suggested within science, based on different requirements such as evolutionary history, genetic identity, or ecological traits. These different concepts can often contradict one another about where to draw the line between species…so what do we use?

The Biological Species Concept (BSC)

The most commonly used species concept is called the Biological Species Concept (BSC), which denotes that “species are groups of interbreeding natural populations that are reproductively isolated from other such groups” (Mayr, 1942). In short, a population is considered a different species to another population if an individual from one cannot reliably breed to form fertile, viable offspring with an individual from the other. We often refer to this as “reproductive isolation.” It’s important to note that reproductive isolation doesn’t mean they can’t breed at all: just that the hybrid offspring will not live a healthy life and produce its own healthy offspring.

For example, a horse and a zebra can breed to produce a zorse; however, zorses are fundamentally infertile (due to the different number of chromosomes between a horse and a zebra) and thus a horse is a different species to a zebra. In contrast, a German Shepherd and a chihuahua can breed and make a fertile hybrid mutt, so they are the same species.

zorse
A zorse, which shows its hybrid nature through zebra stripes and horse colouring. These two are still separate species since zorses are infertile, and thus are not a singular stable entity.

You might naturally ask why reproductive isolation is apparently so important for deciding species. Most directly, this means that groups don’t share gene pools at all (since genetic information is introduced and maintained over time through breeding events), which causes them to be genetically independent of one another. Thus, changes in the genetic make-up of one species shouldn’t (theoretically) transfer into the gene pool of another species through hybrids. This is an important concept as the gene pool of a species is the basis upon which natural selection and evolution act: thus, reproductively isolated species may evolve in very different manners over time.

RI example
An example of how reproductive isolation maintains genetic and evolutionary independence of species. In A), our cat groups are robust species, reproductively isolated from one another (as shown by the black box). When each species undergoes natural selection and their genetic variation changes (colour changes on the cats and DNA), these changes are kept within each lineage. This contrasts with B), where genetic changes can be transferred between species. Without reproductive isolation, evolution in the orange lineage and the blue lineage can combine within hybrids, sharing the evolutionary pathways of both ancestral species.
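
To put some (entirely made-up) numbers on panels A and B, here is a minimal Python sketch of two lineages that swap a fixed fraction of their gene pools every generation. The function name, the hybrid_rate parameter and the starting frequencies are all assumptions for illustration, and the model deliberately ignores selection, drift and mutation.

```python
def allele_frequency_gap(p_orange: float, p_blue: float,
                         hybrid_rate: float, generations: int) -> float:
    """Difference in allele frequency between two lineages after some generations.

    hybrid_rate is the (assumed) fraction of each gene pool replaced by genes
    from the other lineage every generation; 0 means reproductive isolation.
    """
    for _ in range(generations):
        p_orange, p_blue = (
            (1 - hybrid_rate) * p_orange + hybrid_rate * p_blue,
            (1 - hybrid_rate) * p_blue + hybrid_rate * p_orange,
        )
    return abs(p_orange - p_blue)


# A new variant is fixed in the orange lineage and absent from the blue one.
isolated = allele_frequency_gap(1.0, 0.0, hybrid_rate=0.0, generations=100)
hybridising = allele_frequency_gap(1.0, 0.0, hybrid_rate=0.05, generations=100)
print(f"gap with isolation:     {isolated:.4f}")
print(f"gap with hybridisation: {hybridising:.4f}")
```

With no exchange, the gap between the lineages stays exactly where it started. With any exchange at all, it shrinks by a factor of (1 - 2 × hybrid_rate) every generation, so the two gene pools end up practically identical.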

Pitfalls of the BSC

Just because the BSC is the most used concept doesn’t make it infallible, however. Many species on Earth don’t easily demonstrate reproductive isolation from one another, nor does the concept even make sense for asexually reproducing species. If an individual reproduced solely asexually (like many bacteria, or even some lizards), then by the BSC definition every individual is an entirely different species…which seems a little excessive. Even in sexually reproducing organisms, it can be hard to establish reproductive isolation, possibly because the species never come into contact physically.

This raises the debate of whether two species could, let alone will, hybridise in nature, which can be difficult to determine. And if two species do produce hybrid offspring, assessing their fertility or viability can be difficult without many generations of breeding and measurements of fitness (hybrids may not be sustainable in nature if they are not well adapted to their environment and thus the two species are maintained as separate identities).

Hybrid birds
An example of unfit hybrids causing effective reproductive isolation. In this example, we have two different bird species adapted to very different habitats; a smaller, long-tailed bird (left) adapted to moving through dense forest, and a large, longer-legged bird (right) adapted to traversing arid deserts. When (or if) these two species hybridised, the resultant offspring would be middle of the road, possessing too few of either parent’s traits to be well adapted to the forest or the desert, with no fitting intermediate environment available. Measuring exactly how unfit this hybrid would be is a difficult task in establishing species boundaries.

 

Integrative taxonomy

To try and account for the issues with the BSC, taxonomists try to push for the usage of “integrative taxonomy”. This means that species should be defined by multiple different agreeing concepts, such as reproductive isolation, genetic differentiation, behavioural differences, and/or ecological traits. The more traits that can separate the two, the greater support there is for the species to be separated: if they disagree, then more information is needed to determine whether or not they should be called different species. Debates about taxonomy are ongoing and are likely going to be relevant for years to come, but form critical components of understanding biodiversity, patterns of evolution, and creating effective conservation legislation to protect endangered or threatened species (for whichever groups we decide are species).

 

How did pygmy perch swim across the desert?

“Pygmy perch swam across the desert”

As regular readers of The G-CAT are likely aware, my first ever scientific paper was published this week. The paper is largely the results of my Honours research (with some extra analysis tacked on) on the phylogenomics (the same as phylogenetics, but with genomic data) and biogeographic history of a group of small, endemic freshwater fishes known as the pygmy perch. There are a number of different messages in the paper related to biogeography, taxonomy and conservation, and I am really quite proud of the work.

Southern_pygmy_perch 1 MHammer
A male southern pygmy perch, which usually measures 6-8 cm long.

To my honest surprise, the paper has received a decent amount of media attention following its release. Nearly all of these have focused on the biogeographic results and interpretations of the paper, which is arguably the largest component of the paper. In these media releases, the articles are often opened with “…despite the odds, new research has shown how a tiny fish managed to find its way across the arid Australian continent – more than once.” So how did they manage it? These are tiny fish, and there’s a very large desert area right in the middle of Australia, so how did they make it all the way across? And more than once?!

 The Great (southern) Southern Land

To understand the results, we first have to take a look at the context for the research question. There are seven officially named species of pygmy perches (‘named’ is an important characteristic here…but we’ll go into the details of that in another post), which are found in the temperate parts of Australia. Of these, three are found within southwest Western Australia, in Australia’s only globally recognised biodiversity hotspot, and the remaining four are found throughout eastern Australia (ranging from eastern South Australia to Tasmania and up to lower Queensland). These two regions are separated by arid desert regions, including the large expanse of the Nullarbor Plain.

Pygmyperch_distributionmap
The distributions of pygmy perch species across Australia. The dots and labels refer to different sampling sites used in the study. A: the distribution of western pygmy perches, and essentially the extent of the southwest WA biodiversity hotspot region. B: the distribution of eastern pygmy perches, excluding N. oxleyana which occurs in upper NSW/lower QLD (indicated in C). C: the distributions relative to the map of Australia. The black region in the middle indicates the Nullarbor Plain. 

 

The Nullarbor Plain is a remarkable place. It’s dead flat, has no trees, and most importantly for pygmy perches, it also has no standing water or rivers. The plain was formed from a large limestone block that was pushed up from beneath the Earth approximately 15 million years ago; with the progressive aridification of the continent, this region rapidly lost any standing water drainages that would have connected the east to the west. The remains of water systems from before (dubbed ‘paleodrainages’) can be seen below the surface.

Nullarbor Plain photo
See? Nothing here. Photo taken near Watson, South Australia. Credit: Benjamin Rimmer.

Biogeography of southern Australia

As one might expect, the formation of the Nullarbor Plain was a huge barrier for many species, especially those that depend on regular accessible water for survival. In many species of both plants and animals, we see in their phylogenetic history a clear separation of eastern and western groups around this time; once widely distributed species became fragmented by the plain and diverged from one another. We would most certainly expect this to be true of pygmy perch.

But our questions focus on what happened before the Nullarbor Plain arrived in the picture. More than 15 million years ago, southern Australia was a massively different place. The climate was much colder and wetter, even in central Australia, and we even have records of tropical rainforest habitats spreading all the way down to Victoria. Water-dependent animals would have been able to cross the southern part of the continent relatively freely.

Biogeography of the enigmatic pygmy perches

This is where the real difference between everything else and pygmy perch happens. For most species, we see only one east and west split in their phylogenetic tree, associated with the Nullarbor Plain; before that, their ancestors were likely distributed across the entire southern continent and were one continuous unit.

Not for pygmy perch, though. Our phylogenetic patterns show that there were multiple splits between eastern and western ancestral pygmy perch. We can see this visually within the phylogenetic tree; some western species of pygmy perches are more closely related, from an evolutionary perspective, to eastern species of pygmy perches than they are to other western species. This could imply a couple of different things: either some species came about by migration from east to west (or vice versa), and this happened at least twice, or two different ancestral pygmy perches were distributed across all of southern Australia and each split east-west at some point in time. These two hypotheses are called “multiple invasion” and “geographic paralogy”, respectively.

MCC_geographylabelled
The phylogeny of pygmy perches produced by this study, containing 45 different individuals across all species of pygmy perch. Species are labelled in the tree in brackets, and their geographic location (east or west) is denoted by the colour on the right. This tree clearly shows more than one E/W separation, as not all eastern species are within the same clade. For example, despite being an eastern species, N. variegata is more closely related to Nth. balstoni or N. vittata than to the other eastern species (N. australis, N. obscura, N. oxleyana and N. ‘flindersi’).

So, which is it? We delved deeper into this using a type of analysis called ‘ancestral area reconstruction’. This estimates the likely distributions of species’ ancestors using different models and statistical analysis. Our results found that the earliest east-west split was due to the fragmentation of a widespread ancestor ~20 million years ago, and that the second was a migration event, in which changing waterways around the forming Nullarbor Plain pushed some eastern pygmy perches to the west to form the second group of western species. We argue for more than one migration across Australia since the initial ancestor of pygmy perches must have expanded from some point (either east or west) to encompass the entirety of southern Australia.

BGB_figure
The ancestral area reconstruction of pygmy perches, estimated using the R package BioGeoBEARS. The different pie charts denote the relative probability of the possible distributions for the species or ancestor at that particular time; colours denote exactly where the distribution is (following the legend). As you can see, the oldest E/W split at 21 million years ago likely resulted from a single widespread ancestor, with its range split into an east and west group. The second E/W event, at 15 million years ago, most likely reflects a migration from east to west, resulting in the formation of the N. vittata species group. This coincides with the formation of the Nullarbor Plain, so it’s likely that changes in waterway patterns allowed some eastern pygmy perch to move westward as the area became more arid.

So why do we see this for pygmy perch and no other species? Well, that’s the real mystery; out of all of the aquatic species found in southeast and southwest Australia, pygmy perch are one of the worst at migrating. They’re very picky about habitat, small, and don’t often migrate far unless pushed (by, say, a flood). It is possible that unrecorded extinct species of pygmy perch might help to clarify this a little, but the chances of finding a preserved fish fossil (let alone for a fish less than 8cm in size!) are extremely low. We can really only theorise about how they managed to migrate.

Pygmy perch biogeo history
A diagram of the distribution of pygmy perch species over time, as suggested by the ancestral area reconstruction. A: the initial ancestor of pygmy perches was likely found throughout southern Australia. B: an unknown event splits the ancestor into an eastern and western group; the sole extant species of the W group is Nth. balstoni. C: the ancestor of the eastern pygmy perches spreads towards the west, entering part of the pre-Nullarbor region. D: due to changes in the hydrology of the area, some eastern pygmy perches (the maroon colour in C) are pushed towards the west; these form N. vittata species and N. pygmaea. The Nullarbor Plain forms and effectively cuts off the two groups from one another, isolating them.

What does this mean for pygmy perches?

Nearly all species of pygmy perch are listed as threatened or worse in conservation legislation; there have been many conservation efforts to try and save the worst-off species from extinction. Pygmy perches provide a unique insight into the history of the Australian climate and may be a key in unlocking some of the mysteries of what our land was like so long ago. Every species is important for conservation and even those small, hard-to-notice creatures that we might forget about play a role in our environmental history.

The direction of evolution: divergence vs. convergence

Direction of evolution

We’ve talked previously on The G-CAT about how the genetic underpinning of certain evolutionary traits can change in different directions depending on the selective pressure it is under. Particularly, we can see how the frequency of different alleles might change in one direction or another, or stabilise somewhere in the middle, depending on its encoded trait. But thinking bigger picture than just the genetics of one trait, we can actually see that evolution as an entire process works rather similarly.

Divergent evolution

The classic view of the direction of evolution is based on divergent evolution. This is simply the idea that a particular species possesses some ancestral trait. The species (or population) then splits into two (for one reason or another), and each one of these resultant species or populations evolves in a different way to the other. Over time, this means that their traits are changing in different directions, but ultimately originate from the same ancestral source.

Evidence for divergent evolution is rife throughout nature, and is a fundamental component of all of our understanding of evolution. Divergent evolution means that, by comparing similar traits in two species (called homologous traits), we can trace back species histories to common ancestors. Some impressive examples of this exist in nature, such as the number of neck bones in most mammalian species. Humans have the same number of neck bones (seven) as giraffes; thus, we can suggest that the ancestor of both species (and all mammals) probably had a similar number of neck bones. It’s just that the giraffe lineage evolved longer bones whereas other lineages did not.

Homology figure
A diagrammatic example of homologous structures in ‘hand’ bones. The coloured bones demonstrate how the same original bone structures have diverged into different forms. Source: BiologyWise.

Convergent evolution

But of course, evolution never works as simply as you want it to, and sometimes we can get the direct opposite pattern. This is called convergent evolution, and occurs when two completely different species independently evolve very similar (sometimes practically identical) traits. This is often caused by a limitation of the environment; some extreme demand of the environment requires a particular physiological solution, and thus all species must develop that trait in order to survive. An example of this would be the physiology of carnivorous marsupials like Tasmanian devils or thylacines: despite being marsupials rather than placental mammals, their body shapes closely resemble something more canid. Likely, the carnivorous diet places some constraints on physiology, particularly jaw structure and strength.

Convergent evol intelligence
A surprising example of convergent evolution is cognitive ability in apes and some bird groups (e.g. corvids). There are plenty of other animal groups more closely related to each of these that don’t demonstrate the same level of cognitive reasoning (based on the traits listed in the centre): thus, we can conclude that cognition has evolved twice in very, very different lineages. Source: Emery & Clayton, 2004.

A more dramatic (and potentially obvious) example of convergent evolution would be wings and the power of flight. Despite the fact that butterflies, bees, birds and bats all have wings and can fly, most of them are pretty unrelated to one another. It seems much more likely that flight evolved independently multiple times than that the other 99% of species sharing the same ancestor all lost the capacity for flight.

Parallel evolution

Sometimes convergent evolution can work between two species that are pretty closely related, but still evolved independently of one another. This is distinguished from other categories of evolution as parallel evolution: the main difference is that while both species may have shared the same start and end point, evolution has acted on each one independent of the other. This can make it very difficult to distinguish from convergent evolution, and is usually determined by the exact history of the trait in question.

Parallel evolution is an interesting field of research for a few reasons. Firstly, it provides a scenario in which we can more rigorously test expectations and outcomes of evolution in a particular environment. For example, if we find traits that are parallel in a whole bunch of fish species in a particular region, we can start to look at how that particular environment drives evolution across all fish species, as opposed to single-species case studies.

Marsupial handedness.jpg
Here’s another weird example; different populations of marsupials (particularly kangaroos and wallabies) show preferential handedness depending on where the population is. That is, different populations of different species of marsupials show parallel evolution of handedness, since they’re related to one another but have evolved it independently of the other species. Source: Giljov et al. (2015).

Following from that logic, it is then important to question the mechanisms of parallelism. From a genetic point of view, do these various species use the same genes (and genetic variants) to produce the same trait? Or are there many solutions to the selective question in nature? While these questions are rather complicated, and there has been plenty of evidence both for and against parallel genetic underpinning of parallel traits, it seems surprisingly often that many different genetic combinations can be used to get the same result. This gives interesting insight into how complex genetic coding of traits can be, and how creative and diverse evolution can be in the real world.

Where is evolution going?

Cat phylogeny
An example of all three types of evolutionary trajectory in a single phylogeny of cats (you know how we do it here at The G-CAT). This phylogeny consists of two distinct genera; one with one species (P. aliquam) and another of three species (the red box indicates their distance). Our species have three main physical traits: coat colour, ear tufts and tail shape. At the ancestral nodes of the tree, we can see what the ancestor of these species looked like for these three traits. Each of these traits has undergone a different type of evolution. The tufts on the ears are the result of divergent evolution, since F. tuftus evolved the trait differently to its nearest relative, F. griseo. Contrastingly, the orange coat colours of F. tuftus and P. aliquam are the result of convergent evolution: these two species are not very closely related (remembering the red box) and evolved orange coats independently of one another (since their ancestors are grey). And finally, the fluffy tails of F. hispida and F. griseo can be considered parallel evolution, since they’re closely related (same genus) but still each evolved tail fluff independently (not in the ancestor). This example is a little convoluted, but if you trace the history of each trait in the phylogeny you can more easily see these different patterns.

So, where is evolution going for nature? Well, the answer is probably all over the place, but steered by the current environmental circumstances. Predicting the evolutionary impacts of particular environmental change (e.g. climate change) is exceedingly difficult but a critical component of understanding the process of evolution and the future of species. Evolution continually surprises us with creative solutions to complex problems and I have no doubt new mysteries will continue to be thrown at us as we delve deeper.

All the world in the palm of your hand: whole genome sequencing for evolution and conservation

Building an entire genome

If bigger is better, then biggest is best. Having the genome of a particular study species fully sequenced allows us to potentially look at all of the genetic variation in the entire gene pool: but how do we sequence the entirety of the genome? And what are the benefits of having a whole genome to refer to?

Whole genome assembly
A very, very simplified overview of whole genome sequencing. Similar to other genomic technologies, we start by fragmenting the genome into much smaller, easier to sequence parts (reads). We then use a computer algorithm which pieces these reads together into a consecutive sequence based on overlapping DNA sequence (like building a chain out of Lego blocks). From this assembled genome, we can then attach annotations using information from other species’ genomes or genetic studies, which can correlate a particular sequence to a gene, a function of that gene, and the resultant protein from that gene (although not all of these aspects are always included).

Well, assembling the whole genome of an organism for the first time is a very tricky process. It involves taking DNA sequence from only a few individuals, breaking them down into smaller fragments and multiplying these fragments into the billions (more or less the same process used in other genomics technologies: the real difference is that we need the full breadth of the genome so that we don’t miss any spaces). From these fragments, we use a complex computer algorithm which builds up a consensus sequence like a Lego tower; by finding parts of sequences which overlap, the software figures out which pieces connect to one another. Hopefully, we eventually end up with one very long continuous sequence; the genome! Sometimes, we might end up with a few very large blocks (called contigs) instead, but these are still useful for analyses (how useful depends on how many blocks there are and how big they are). With this full genome, we use information from other more completed genomes (such as those from model species like humans, mice or even worms) to figure out which sections of the genome relate to specific genes. We can then annotate these sections by labelling them as clear genes, complete with start and end point, and attach a particular physical function to that gene.
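
To make the Lego analogy a bit more concrete, here is a toy Python sketch of the overlap idea. It is a simple greedy merge that I am using purely for illustration, not the algorithm any real assembler uses; the function names, the minimum overlap of 3 bases and the 20-base ‘genome’ are all made up, and real reads come with sequencing errors, repeats and both DNA strands, none of which are handled here.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap.

    Returns whatever is left: one sequence if everything joined up,
    or several contigs if some pieces never overlapped.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more: we are left with contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Made-up genome, chopped into overlapping 8-base reads every 3 bases.
toy_genome = "ATGGCGTACGTTAGCCTAGG"
reads = [toy_genome[i:i + 8] for i in range(0, len(toy_genome) - 7, 3)]
contigs = greedy_assemble(reads)
print(contigs)
```

If one of the middle reads is left out, the remaining reads no longer overlap across the gap and the function returns two contigs instead of one continuous sequence.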

The benefits of whole genomes

Having an entire genome as a reference is an extremely helpful tool in conservation and evolutionary studies. The first, and perhaps most obvious benefit, is the sheer scale of the data we can use. By having the entirety of the genome available, we can use potentially billions of base pairs of sequence in our genetic analyses (for reference, the human genome is >3 billion base pairs long). Even if we don’t sequence the full genome for all of our samples, having a reference genome as a basis for assembling our reduced datasets significantly improves the quantity and quality of sequences we can use.

Another very important benefit is the ability to ascribe function in our studies. Many of our processes for obtaining data, even for genomic technologies, use random and anonymous fragments of the genome. Although this is a cost-effective way to obtain a very large amount of data, it unfortunately means that we often have no idea which part of the genome our sequences came from. This means that we don’t know which sequences relate to specific genes, and even if we did we would have no idea what those genes are or do! But with an annotated genome, we can take even our fragmented sequence and check it against the genome and find out what genes are present.
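
As a rough illustration of that ‘check it against the genome’ step, here is a small Python sketch. Everything in it is hypothetical: the reference sequence, the two gene names and their functions are placeholders, and I place fragments by exact string matching, whereas real pipelines use dedicated alignment software that tolerates mismatches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str       # made-up label
    start: int      # 0-based, inclusive
    end: int        # 0-based, exclusive
    function: str   # made-up annotation


def genes_hit(reference: str, annotation: list[Gene], fragment: str) -> list[Gene]:
    """Place a fragment on the reference by exact match; return the genes it overlaps."""
    pos = reference.find(fragment)
    if pos == -1:
        return []  # fragment could not be placed at all
    frag_end = pos + len(fragment)
    return [g for g in annotation if g.start < frag_end and pos < g.end]


# Toy reference: two 'genes' separated by non-genic spacer sequence.
spacer_1, gene_a, spacer_2, gene_b, spacer_3 = (
    "TTGACA", "ATGGCGTACGTTAGC", "CCTTAA", "ATGCCCGGGTTTAAA", "GGATCC")
reference = spacer_1 + gene_a + spacer_2 + gene_b + spacer_3
a_start = len(spacer_1)
b_start = a_start + len(gene_a) + len(spacer_2)
annotation = [
    Gene("gene_A", a_start, a_start + len(gene_a), "placeholder: heat tolerance"),
    Gene("gene_B", b_start, b_start + len(gene_b), "placeholder: starch digestion"),
]

for fragment in ("GTACGTTAG", "CCGGGTTT", "CCTTAA"):
    hits = genes_hit(reference, annotation, fragment)
    print(fragment, "->", [(g.name, g.function) for g in hits] or "no annotated gene")
```

The first two anonymous fragments land inside annotated genes, so we immediately learn which gene (and which function) they belong to; the third falls in the spacer between genes and gets no annotation.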

Understanding adaptation

Based on that, it seems pretty obvious how having an annotated genome can help us in studies of adaptation. Knowing the functional aspect of our genetic data allows us to more directly determine how evolution is happening in nature; instead of only being able to say that two species are evolving differently from one another, for example, we can explicitly look at how they are evolving. Is one evolving tolerance to hotter temperatures? Are they evolving different genes to handle different diets? Are they evolving in response to an external influence, like a viral outbreak or changing climate? What are the physiological consequences of these changes? These questions are critical in understanding past and future evolution, and full genome analysis allows us to delve into them much deeper.

Manhattan plot example
A (slightly edited) figure of full genome comparisons between domestic dogs and wild wolves by Axelsson et al. (2013), with the aim of understanding the evolutionary changes associated with domestication. For avid readers, this figure probably looks familiar. This figure compares the genetic differentiation across the entire genome between dogs and wolves, with some sections of the genome (circled) showing clear differences. As there is an annotated dog genome, the authors then delved into these genes to understand the functional differences between the two. By comparing their genetic differences to functional genes, the authors can more explicitly suggest mechanisms or changes associated with the domestication process (such as adaptation to a starch-heavy and human-influenced diet).

 

 

This includes allowing us to better understand how adaptation actually works in nature. As we’ve discussed before, more traditional studies often assumed that single, or very few, genes were responsible for allowing a species to adapt and change, and that these genes had very strong effects on their physiology. But what we see far more often is polygenic adaptation; small changes in a very large number of genes which, combined together, allow the species to adapt and evolve. By having the entirety of the genome available, we are much more likely to capture all of the genes that are under natural selection in a particular population or species, painting a clearer picture of their evolutionary trajectory.

Understanding demography

The much larger dataset of full genomes is also important for understanding the non-adaptive parts of evolution; the demographic history. Given that selectively neutral impacts (e.g. reductions in population size) are likely to impact all of the genes in the gene pool somewhat equally, having a full genome allows us to more accurately infer the demographic state and historical patterns of species.

For both adaptive and non-adaptive variation, it is also important to consider what we call linkage disequilibrium. Genetic sequences that are physically close to each other in the genome will often be inherited together due to the imprecision of recombination (a fairly technical process, so I won’t delve into this): what this can mean is that if a gene is under very strong selection, then sequences around this gene will look like they’re under selection too. This can give false positives for adaptive genes (i.e. sequences that look like genes under selection but are just linked to a gene that is) or can interfere with demographic analyses (since they often assume no selection, or linkage to selection, on the sequences used). With a whole genome, we can actually estimate how far away a base pair has to be before it’s not linked anymore; we call these linkage blocks, and they’re very useful additions to analyses.

Linkage_example
An example of linkage as a process. We start with a particular sequence (top); during recombination, this sequence may randomly break and rearrange into different parts. In this example, I’ve simulated four different ‘breaks’ (dashed coloured lines) due to recombination. Each of these breaks leads to two separate blocks of fragments; for example, the break at the blue line results in the second two sequence blocks (middle). If we focus on one target base pair in the sequence (golden A), then we can see in some fragments it remains with certain bases, but sometimes it gets separated by the break. If we compare how often the golden A is in the same block (i.e. is co-inherited) as each of the other bases, across all 4 breaks, then we see that the bases that are closest to it (the golden A is represented by the golden bar) are almost always in the same block. This makes sense: the further away a base is from our target, the more likely that there will be a break between them. This is shown in the frequency distributions at the bottom: the left figure shows the actual frequencies of co-inheritance (i.e. linkage) using the top example and those 4 breaks. The right figure shows a more realistic depiction of how linkage looks in the genome; it rapidly decays as we move away from the target (although the width and rate of this can vary).
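
The same counting exercise can be sketched in a few lines of Python. This is a deliberately crude model and all of its details are my assumptions: one break per ‘recombination event’, every break position equally likely, and a made-up sequence length, target position and random seed.

```python
import random


def coinheritance(seq_len: int, target: int, n_breaks: int, seed: int = 1) -> list[float]:
    """Fraction of random single breaks that leave each position in the
    same block as the target position."""
    rng = random.Random(seed)
    together = [0] * seq_len
    for _ in range(n_breaks):
        # A break at b falls between positions b - 1 and b.
        b = rng.randint(1, seq_len - 1)
        for pos in range(seq_len):
            same_block = (pos < b) == (target < b)
            together[pos] += same_block
    return [count / n_breaks for count in together]


# 21-base sequence with the target ('golden A') in the middle, 1000 breaks.
freqs = coinheritance(seq_len=21, target=10, n_breaks=1000)
for pos, freq in enumerate(freqs):
    print(f"distance {abs(pos - 10):2d}  co-inherited {freq:.2f}")
```

Positions right next to the target are almost always co-inherited with it, and the frequency drops the further away we move, which is the decay shown in the bottom panels.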

Improving conservation management

In a similar fashion to demography, full genome datasets can improve our estimates of relatedness and pedigrees in captive breeding programs. The massive scale of whole genomes allows us to more easily trace the genealogical history of individuals, allowing us to assign parents more accurately. This also helps with our estimations of genetic relatedness, arguably the most critical aspect of genetic-based breeding programs. This is particularly helpful for species with tricky mating patterns, such as polygamy, brood spawning or difficult to track organisms.

Pedigrees
An example of how whole genomes can improve our estimation of pedigrees. Say we have a random individual (star), and we want to know how they fit into a particular family tree (pedigree). With only a few genes, we might struggle to pick where in the family it fits based on limited genetic information. With a larger genetic dataset (such as reduced-representation genomics), we might be able to cross off a few potential candidate spots but still have some trouble with a few places (due to unknown parents, polygamy or issues with genetic analysis). With whole genomes, we should be able to much better clarify the whole pedigree and find exactly where our star individual fits in the tree (red circle). It is thanks to whole genomes that we can do those ancestry analyses that have gone viral lately!
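
Here is a minimal Python sketch of why more loci narrow things down, using simple parentage exclusion: a candidate is ruled out as a parent if, at any locus, it shares no allele with the offspring. The genotypes and candidate names are invented for illustration, and this is only one basic approach; real pedigree software also has to deal with genotyping errors and mutation, which I ignore here.

```python
Genotype = list[tuple[str, str]]  # one pair of alleles per locus


def shares_allele(a: tuple[str, str], b: tuple[str, str]) -> bool:
    """True if two single-locus genotypes have at least one allele in common."""
    return bool(set(a) & set(b))


def compatible_parents(offspring: Genotype,
                       candidates: dict[str, Genotype],
                       n_loci: int) -> list[str]:
    """Candidates not excluded as a parent when only the first n_loci are used."""
    return [
        name for name, genotype in candidates.items()
        if all(shares_allele(o, c)
               for o, c in zip(offspring[:n_loci], genotype[:n_loci]))
    ]


# Made-up genotypes at five loci for our 'star' individual and three candidates.
star = [("A", "G"), ("C", "C"), ("T", "A"), ("G", "G"), ("C", "T")]
candidates = {
    "candidate_1": [("A", "A"), ("C", "T"), ("T", "T"), ("G", "A"), ("C", "C")],
    "candidate_2": [("G", "G"), ("C", "C"), ("G", "G"), ("G", "G"), ("T", "T")],
    "candidate_3": [("A", "G"), ("T", "C"), ("A", "A"), ("A", "A"), ("C", "T")],
}

for n_loci in (2, 3, 5):
    print(n_loci, "loci:", compatible_parents(star, candidates, n_loci))
```

With two loci, all three candidates remain possible; the third locus excludes one of them and the fourth excludes another, leaving a single compatible parent.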

The way forwards

While many non-model species are still lacking in the available genomic information, whole genomes are progressively being sequenced for more and more species. As this astronomical dataset grows, our ability to investigate, discover and test theories about evolution, natural selection and conservation will also improve. Many projects already exist which aim specifically to increase the number of whole genomes available for certain taxonomic groups such as birds and bats: these will no doubt prove to be invaluable resources for future studies.