You’re perfect, you’re beautiful, you look like a model (species)

What is a ‘model’?

There are quite literally millions of species on Earth, ranging from the smallest of microbes to the largest of mammals. In fact, there are so many that we don’t actually have a good count on the sheer number of species and can only estimate it based on the species we actually know about. Unsurprisingly, then, the number of species vastly outweighs the number of people that research them, especially considering the sheer volumes of different aspects of species, evolution, conservation and their changes we could possibly study.

Species on Earth estimate figure
Some estimations on the number of eukaryotic species (i.e. not including things like bacteria), with the number of known species in blue and the predicted number of total species on Earth in purpleSource: Census of Marine Life.

This is partly where the concept of a ‘model’ comes into it: it’s much easier to pick a particular species to study as a target, and use the information from it to apply to other scenarios. Most people would be familiar with the concept based on medical research: the ‘lab rat’ (or mouse). The common house mouse (Mus musculus) and the brown rat (Rattus norvegicus) are some of the most widely used models for understanding the impact of particular biochemical compounds on physiology and are often used as the testing phase of medical developments before human trials.

So, why are mice used as a ‘model’? What actually constitutes a ‘model’, rather than just a ‘relatively-well-research-species’? Well, there are a number of traits that might make certain species ideal subjects for understanding key concepts in evolution, biology, medicine and ecology. For example, mice are often used in medical research given their (relative) similar genetic, physiological and behavioural characteristics to humans. They’re also relatively short-lived and readily breed, making them ideal to observe the more long-term effects of medical drugs or intergenerational impacts. Other species used as models primarily in medicine include nematodes (Caenorhabditis elegans), pigs (Sus scrofa domesticus), and guinea pigs (Cavia porcellus).

The diversity of models

There are a wide variety and number of different model species, based on the type of research most relevant to them (and how well it can be applied to other species). Even with evolution and conservation-based research, which can often focus on more obscure or cryptic species, there are several key species that have widely been applied as models for our understanding of the evolutionary process. Let’s take a look at a few examples for evolution and conservation.

Drosophila

It would be remiss of me to not mention one of the most significant contributors to our understanding of the genetic underpinning of adaptation and speciation, the humble fruit fly (Drosophila melanogaster, among other species). The ability to rapidly produce new generations (with large numbers of offspring with very short generation time), small fully-sequenced genome, and physiological variation means that observing both phenotypic and genotypic changes over generations due to ‘natural’ (or ‘experimental’) selection are possible. In fact, Drosphilia spp. were key in demonstrating the formation of a new species under laboratory conditions, providing empirical evidence for the process of natural selection leading to speciation (despite some creationist claims that this has never happened).

Drosophila speciation experiment
A simplified summary of the speciation experiment in Drosophila, starting with a single species and resulting in two reproductively isolated species based on mating and food preference. Source: Ilmari Karonen, adapted from here.

Darwin’s finches

The original model of evolution could be argued to be Darwin’s finches, as the formed part of the empirical basis of Charles Darwin’s work on the theory of evolution by natural selection. This is because the different species demonstrate very distinct and obvious changes in morphology related to a particular diet (e.g. the physiological consequences of natural selection), spread across an archipelago in a clear demonstration of a natural experiment. Thus, they remain the original example of adaptive radiation and are fundamental components of the theory of evolution by natural selection. However, surprisingly, Darwin’s finches are somewhat overshadowed in modern research by other species in terms of the amount of available data.

Darwin's finches drawings
Some of Darwin’s early drawings of the morphological differences in Galapagos finch beaks, which lead to the formulation of the theory of evolution by natural selection.

Zebra finches

Even as far as birds go, one species clearly outshines the rest in terms of research. The zebra finch is one of the most highly researched vertebrate species, particularly as a model of song learning and behaviour in birds but also as a genetic model. The full genome of the zebra finch was the second bird to ever be sequenced (the first being a chicken), and remains one of the more detailed and annotated genomes in birds. Because of this, the zebra finch genome is often used as a reference for other studies on the genetics of bird species, especially when trying to understand the function of genetic changes or genes under selection.

Zebra finches.jpg
A pair of (very cute) model zebra finches. Source: Michael Lawton via Smithsonian.com.

 

Fishes

Fish are (perhaps surprisingly) also relatively well research in terms of evolutionary studies, largely due to their ancient origins and highly diverse nature, with many different species across the globe. They also often demonstrate very rapid and strong bouts of divergence, such as the cichlid fish species of African lakes which demonstrate how new species can rapidly form when introduced to new and variable environments. The cichlids have become the poster child of adaptive radiation in fishes much in the same way that Darwin’s finches highlighted this trend in birds. Another group of fish species used as a model for similar aspects of speciation, adaptive divergence and rapid evolutionary change are the three-spine and nine-spine stickleback species, which inhabit a variety of marine, estuarine and freshwater environments. Thus, studies on the genetic changes across these different morphotypes is a key in understanding how adaptation to new environments occur in nature (particularly the relatively common transition into different water types in fishes).

cichlid diversity figure
The sheer diversity of species and form makes African cichlids an ideal model for testing hypotheses and theories about the process of evolution and adaptive radiation. Figure sourced from Brawand et al. (2014) in Nature.

Zebra fish

More similar to the medical context of lab rats is the zebrafish (ironically, zebra themselves are not considered a model species). Zebrafish are often used as models for understanding embryology and the development of the body in early formation given the rapid speed at which embryonic development occurs and the transparent body of embryos (which makes it easier to detect morphological changes during embryogenesis).

Zebrafish embryo
The transparent nature of zebrafish embryos make them ideal for studying the development of organisms in early stages. Source: yourgenome.org.

Using information from model species for non-models

While the relevance of information collected from model species to other non-model species depends on the similarity in traits of the two species, our understanding of broad concepts such as evolutionary process, biochemical pathways and physiological developments have significantly improved due to model species. Applying theories and concepts from better understood organisms to less researched ones allows us to produce better research much faster by cutting out some of the initial investigative work on the underlying processes. Thus, model species remain fundamental to medical advancement and evolutionary theory.

That said, in an ideal world all species would have the same level of research and resources as our model species. In this sense, we must continue to strive to understand and research the diversity of life on Earth, to better understand the world in which we live. Full genomes are progressively being sequenced for more and more species, and there are a number of excellent projects that are aiming to sequence at least one genome for all species of different taxonomic groups (e.g. birds, bats, fish). As the data improves for our non-model species, our understanding of evolution, conservation management and medical research will similarly improve.

Lost in a forest of (gene) trees

Using genetics to understand species history

The idea of using the genetic sequences of living organisms to understand the evolutionary history of species is a concept much repeated on The G-CAT. And it’s a fundamental one in phylogenetics, taxonomy and evolutionary biology. Often, we try to analyse the genetic differences between individuals, populations and species in a tree-like manner, with close tips being similar and more distantly separated branches being more divergent. However, this runs on one very key assumption; that the patterns we observe in our study genes matches the overall patterns of species evolution. But this isn’t always true, and before we can delve into that we have to understand the difference between a ‘gene tree’ and a ‘species tree’.

A gene tree or a species tree?

Our typical view of a phylogenetic tree is actually one of a ‘gene tree’, where we analyse how a particular gene (or set of genes) have changed over time between different individuals (within and across populations or species) based on our understanding of mutation and common ancestry.

However, a phylogenetic tree based on a single gene only demonstrates the history of that gene. What we assume in most cases is that the history of that gene matches the history of the species: that branches in the genetic tree mirror when different splits in species occurred throughout history.

The easiest way to conceptualise gene trees and species trees is to think of individual gene trees that are nested within an overarching species tree. In this sense, individual gene trees can vary from one another (substantially, even) but by looking at the overall trends of many genes we can see how the genome of the species have changed over time.

Gene tree incongruence figure
A (potentially familiar) depiction of individual gene trees (coloured lines) within the broader species tree (defined b the black boundaries). As you might be able to tell, the branching patterns of the different genes are not the same, and don’t always match the overarching species tree.

Gene tree incongruence

Different genes may have different patterns for a number of reasons. Changes in the genetic sequences of organisms over time don’t happen equally across the entire genome, and very specific parts of the genome can evolve in entirely different directions, or at entirely different rates, than the rest of the genome. Let’s take a look at a few ways we could have conflicting gene trees in our studies.

Incomplete lineage sorting

One of the most prolific, but more complicated, ways gene trees can vary from their overarching species tree is due to what we call ‘incomplete lineage sorting’. This is based on the idea that species and the genes that define them are constantly evolving over time, and that because of this different genes are at different stages of divergence between population and species. If we imagine a set of three related populations which have all descended from a single ancestral population, we can start to see how incomplete lineage sorting could occur. Our ancestral population likely has some genetic diversity, containing multiple alleles of the same locus. In a true phylogenetic tree, we would expect these different alleles to ‘sort’ into the different descendent populations, such that one population might have one of the alleles, a second the other, and so on, without them sharing the different alleles between them.

If this separation into new populations has been recent, or if gene flow has occurred between the populations since this event, then we might find that each descendent population has a mixture of the different alleles, and that not enough time has passed to clearly separate the populations. For this to occur, sufficient time for new mutations to occur and genetic drift to push different populations to differently frequent alleles needs to happen: if this is too recent, then it can be hard to accurately distinguish between populations. This can be difficult to interpret (see below figure for a visualisation of this), but there’s a great description of incomplete lineage sorting here.

ILS_adaptedfigure
A demonstration of incomplete lineage sorting, generously adapted from a talk by fellow MELFU postdocs Dr Yuma (Jonathon) Sandoval-Castillo and Dr Catherine Attard. On the left is a depiction of a single gene coalescent tree over time: circles represent a single individual at a particular point in time (row) with the colours representing different alleles of that same gene. The tree shows how new mutations occur (colour changes along the branches) and spread throughout the descendent populations. In this example, we have three recently separated species, with a good number of different alleles. However, when we study these alleles in tree form (the phylogeny on the right), we see that the branches themselves don’t correlate well with the boundaries of the species. For example, the teal allele found within Species C is actually more similar to Species B alleles (purple and blue) than any other Species B alleles, based on the order and patterns of these mutations.

Hybridisation and horizontal transfer

Another way individual genes may become incongruent with other genes is through another phenomenon we’ve discussed before: hybridisation (or more specifically, introgression). When two individuals from different species breed together to form a ‘hybrid’, they join together what was once two separate gene pools. Thus, the hybrid offspring has (if it’s a first generation hybrid, anyway) 50% of genes from Species A and 50% of genes from Species B. In terms of our phylogenetic analysis, if we picked one gene randomly from the hybrid, we have 50% of picking a gene that reflects the evolutionary history of Species A, and 50% chance of picking a gene that reflects the evolutionary history of Species B. This would change how our outputs look significantly: if we pick a Species A gene, our ‘hybrid’ will look (genetically) very, very similar to Species A. If we pick a Species B gene, our ‘hybrid’ will look like a Species B individual instead. Naturally, this can really stuff up our interpretations of species boundaries, distributions and identities.

Hybridisation_figure
An example of hybridisation leading to gene tree incongruence with our favourite colourful fishA) We have a hybridisation event between a red fish (Species A) and a green fish (Species B), resulting in a hybrid species (‘Species’ H). The red fish genome is indicated by the yellow DNA, the green fish genomes by the blue DNA, and the hybrid orange fish has a mixture of these two. B) If we sampled one set of genes in the hybrid, we might select a gene that originated from the red fish, showing that the hybrid is identical (or very similar) the Species A. D) Conversely, if we sampled a gene originating from the green fish, the resultant phylogeny might show that the hybrid is the same as Species B. C) If we consider these two patterns in combination, which see the true pattern of species formation, which is not a clear dichotomous tree and rather a mixture of the two sets of trees.

Paralogous genes

More confusingly, we can even have events where a single gene duplicates within a genome. This is relatively rare, although it can have huge effects: for example, salmon have massive genomes as the entire thing was duplicated! Each version of the gene can take on very different forms, functions, and evolve in entirely different ways. We call these duplicated variants paralogous genes: genes that look the same (in terms of sequence), but are totally different genes.

This can have a profound impact as paralogous genes are difficult to detect: if there has been a gene duplication early in the evolutionary history of our phylogenetic tree, then many (or all) of our study samples will have two copies of said gene. Since they look similar in sequence, there’s all possibility that we pick Variant 1 in some species and Variant 2 in other species. Being unable to tell them apart, we can have some very weird and abstract results within our tree. Most importantly, different samples with the same duplicated variant will seem similar to one another (e.g. have evolved from a common ancestor more recently) than it will to any sample of the other variant (even if they came from the exact same species)!

Paralogy_figure.jpg
An example of how paralogous genes can confound species tree. We start with a single (purple) gene: at a particular point in time, this gene duplicates into a red and a blue form. Each of these genes then evolve and spread into four separate descendent species (A, B, C and D) but not in entirely the same way. However, since both the red and blue genetic sequences are similar, if we took a single gene from each species we might (somewhat randomly) sequence either the red or the blue copy. The different phylogenetic trees on the right demonstrate how different combinations of red and blue genes give very different patterns, since all blue copies will be more related to other blue genes than to the red gene of the same species. E.g. a blue A and a blue C are more similar than a blue A and a red A.

Overcoming incongruence with genomics

Although a tricky conundrum in phylogenetics and evolutionary genetics broadly, gene tree incongruence can largely be overcome with using more loci. As the random changes of any one locus has a smaller effect of the larger total set of loci, the general and broad patterns of evolutionary history can become clearer. Indeed, understanding how many loci are affected by what kind of process can itself become informative: large numbers of introgressed loci can indicate whether hybridisation was recent, strong, or biased towards one species over another, for example. As with many things, the genomic era appears poised to address the many analytical issues and complexities of working with genetic data.

 

Notes from the Field: Octoroks

Scientific name

Octorokus infletus

Meaning: Octorokus from [octorok] in Hylian; infletus from [inflate] in Latin.

Translation: inflating octorok; all varieties use an inflatable air sac derived from the swim bladder to float and scan the horizon.

Varieties

Octorokus infletus hydros [aquatic morphotype]

Octorokus infletus petram [mountain morphotype]

Octorokus infletus silva [forest morphotype]

Octorokus infletus arctus [snow morphotype]

Octorokus infletus imitor [deceptive morphotype]

All octoroks.jpg
The various morphotypes of inflating octoroksA: The water octorok, considered the morphotype closest to the ancestral physiology of the species. B: The forest octorok, with grass camouflage. C: The deceptive octorok, which has replaced its tufted vegetation with a glittering chest as bait. D: The mountainous octorok, with rock camouflage. E: The snow octorok, with tundra grass camouflage.

Common name

Variable octorok

Taxonomic status

Kingdom Animalia; Phylum Mollusca; Class Cephalapoda; Order Octopoda; Family Octopididae; Genus Octorokus; Species infletus

Conservation status

Least Concern

Distribution

The species is found throughout all major habitat regions of Hyrule, with localised morphotypes found within specific habitats. The only major region where the variable octorok is not found is within the Gerudo Desert, suggesting some remnant dependency of standing water.

Octorok distribution.jpg
The region of Hyrule, with the distribution of octoroks in blue. The only major region where they are not found is the Gerudo Desert in the bottom left.

Habitat

Habitat choice depends on the physiology of the morphotype; so long as the environment allows the octorok to blend in, it is highly likely there are many around (i.e. unseen).

Behaviour and ecology

The variable octorok is arguably one of the most diverse species within modern Hyrule, exhibiting a large number of different morphotypic forms and occurring in almost all major habitat zones. Historical data suggests that the water octorok (Octorokus infletus hydros) is the most ancestral morphotype, with ancient literature frequently referring to them as sea-bearing or river-traversing organisms. Estimates from the literature suggests that their adaptation to land-based living is a recent evolutionary step which facilitated rapid morphological radiation of the lineage.

Several physiological characteristics unite the variable morphological forms of the octorok into a single identifiable species. Other than the typical body structure of an octopod (eight legs, largely soft body with an elongated mantle region), the primary diagnostic trait of the octorok is the presence of a large ‘balloon’ with the top of the mantle. This appears to be derived from the swim bladder of the ancestral octorok, which has shifted to the cranial region. The octorok can inflate this balloon using air pumped through the gills, filling it and lifting the octorok into the air. All morphotypes use this to scan the surrounding region to identify prey items, including attacking people if aggravated.

inflated octorok
A water morphotype octorok with balloon inflated.

Diets of the octorok vary depending on the morphotype and based on the ecological habitat; adaptations to different ecological niches is facilitated by a diverse and generalist diet.

Demography

Although limited information is available on the amount of gene flow and population connectivity between different morphotypes, by sheer numbers alone it would appear the variable octorok is highly abundant. Some records of interactions between morphotypes (such as at the water’s edge within forested areas) implies that the different types are not reproductively isolated and can form hybrids: how this impacts resultant hybrid morphotypes and development is unknown. However, given the propensity of morphotypes to be largely limited to their adaptive habitats, it would seem reasonable to assume that some level of population structure is present across types.

Adaptive traits

The variable octorok appears remarkably diverse in physiology, although the recent nature of their divergence and the observed interactions between morphological types suggests that they are not reproductively isolated. Whether these are the result of phenotypic plasticity, and environmental pressures are responsible for associated physiological changes to different environments, or genetically coded at early stages of development is unknown due to the cryptic nature of octorok spawning.

All octoroks employ strong behavioural and physiological traits for camouflage and ambush predation. Vegetation is usually placed on the top of the cranium of all morphotypes, with the exact species of plant used dependent on the environment (e.g. forest morphotypes will use grasses or ferns, whilst mountain morphotypes will use rocky boulders). The octorok will then dig beneath the surface until just the vegetation is showing, effectively blending in with the environment and only occasionally choosing to surface by using the balloon. Whether this behaviour is passed down genetically or taught from parents is unclear.

Management actions

Few management actions are recommended for this highly abundant species. However, further research is needed to better understand the highly variable nature and the process of evolution underpinning their diverse morphology. Whether morphotypes are genetically hardwired by inheritance of determinant genes, or whether alterations in gene expression caused by the environmental context of octoroks (i.e. phenotypic plasticity) provides an intriguing avenue of insight into the evolution of Hylian fauna.

Nevertheless, the transition from the marine environment onto the terrestrial landscape appears to be a significant stepping stone in the radiation of morphological structures within the species. How this has been facilitated by the genetic architecture of the octorok is a mystery.

 

Moving right along: dispersal and population structure

The impact of species traits on evolution

Although we often focus on the genetic traits of species in molecular ecology studies, the physiological (or phenotypic) traits are equally as important in shaping their evolution. These different traits are not only the result themselves of evolutionary forces but may further drive and shape evolution into the future by changing how an organism interacts with the environment.

There are a massive number of potential traits we could focus on, each of which could have a large number of different (and interacting) impacts on evolution. One that is often considered, and highly relevant for genetic studies, is the influence of dispersal capability.

Dispersal

Dispersal is essentially the process of an organism migrating to a new habitat, to the point of the two being used almost interchangeably. Often, however, we regard dispersal as a migration event that actually has genetic consequences; particularly, if new populations are formed or if organisms move from one population to another. This can differ from straight migration in that animals that migrate might not necessarily breed (and thus pass on genes) into a new region during their migration; thus, evidence of those organisms will not genetically proliferate into the future through offspring.

Naturally, the ability of organisms to disperse is highly variable across the tree of life and reliant on a number of other physiological factors. Marine mammals, for example, can disperse extremely far throughout their lifetimes, whereas some very localised species like some insects may not move very far within their lifetime at all. The movement of organisms directly facilitates the movement of genetic material, and thus has significant impacts on the evolution and genetic diversity of species and populations.

Dispersal vs pop structure
The (simplistic) relationship between dispersal capability and one aspect of population genetics, population structure (measured as Fst). As organisms are more capable of dispersing longer distance (or more frequently), the barriers between populations become weaker.

Highly dispersive species

At one end of the dispersal spectrum, we have highly dispersive species. These can move extremely long distances and thus mix genetic material from a wide range of habitats and places into one mostly-cohesive population. Because of this, highly dispersive species often have strong colonising abilities and can migrate into a range of different habitats by tolerating a wide range of conditions. For example, a single whale might hang around Antarctica for part of the year but move to the tropics during other times. Thus, this single whale must be able to tolerate both ends of the temperature spectrum.

As these individuals occupy large ranges, localised impacts are unlikely to critically affect their full distribution. Individual organisms that are occupying an unpleasant space can easily move to a more favourable habitat (provided that one exists). Furthermore, with a large population (which is more likely with highly dispersive species), genetic drift is substantially weaker and natural selection (generally) has a higher amount of genetic diversity to work with. This is, of course, assuming that dispersal leads to a large overall population, which might not be the case for species that are critically endangered (such as the cheetah).

Highly dispersive animals often fit the “island model” of Wright, where individual subpopulations all have equal proportions of migrants from all other subpopulations. In reality, this is rare (or unreasonable) due to environmental or physiological limitations of species; distance, for example, is not implicitly factored into the basic island model.

Island model
The Wright island model of population structure. In this example, different independent populations are labelled in the bold letters, with dispersal pathways demonstrated by the different arrows. In the island model, dispersal is equally likely between all populations (including from BD in this example, even though there aren’t any arrows showing it). Naturally, this is not overly realistic and so the island model is used mostly as a neutral, base model.

Intermediately dispersing species

A large number of species, however, are likely to occupy a more intermediate range of dispersal ability. These species might be able to migrate to neighbouring populations, or across a large proportion of their geographic range, but individuals from one end of the range are still somewhat isolated from individuals at the other end.

This often leads to some effect of population structure; different portions of the geographic range are genetically segregated from one another depending on how much gene flow (i.e. dispersal) occurs between populations. In the most simplest scenario, this can lead to what we call isolation-by-distance. Rather than forming totally independent populations, gene flow occurs across short ranges between adjacent ‘populations’. This causes a gradient of genetic differentiation, with one end of the range being clearly genetically different to the other end, with a gradual slope throughout the range. We see this often in marine invertebrates, for example, which might have somewhat localised dispersal but still occupy a large range by following oceanographic currents.

River IDB network
An example of how an isolation-by-distance population network might come about. In this example, we have a series of populations (the different pie charts) spread throughout a river system (that blue thing). The different pie charts represent how much of the genetics of that population matches one end of the river: either the blue end (left) or red end (right). Populations can easily disperse into adjacent populations (the green arrows) but less so to further populations. This leads to gradual changes across the length of the river, with the far ends of the river clearly genetically distinct from the opposite end but relatively similar to neighbouring populations.
River IDB pop structure.jpg
The genetic representation of the above isolation-by-distance example. Each column represents a single population (in the previous figure, a pie chart), with the different colours also representing the relative genetic identity of that population. As you can see, moving from Population 1 to 10 leads to a gradient (decreasing) in blue genes but increase in red genes. The inverse can be said moving in the opposite direction. That said, comparing Population 1 and Population 10 shows that they’re clearly different, although there is no clear cut-off point across the range of other populations.

Medium dispersal capabilities are also often a requirement for forming ‘metapopulations’. In this population arrangement, several semi-independent populations are present within the geographic range of the species. Each of these are subject to their own local environmental pressures and demographic dynamics, and because of this may go locally extinct at any given time. However, dispersal connections between many of these populations leads to recolonization and gene flow patterns, allowing for extinction-dispersal dynamics to sustain the overall metapopulation. Generally, this would require greater levels of dispersal than those typically found within metapopulation species, as individuals must traverse uninhabitable regions relatively frequently to recolonise locally extinct habitat.

Metapopulation structure.jpg
An example of metapopulation dynamics. Different subpopulations (lettered circles) are connected via dispersal (arrows). These different subpopulations can be different sizes and are mostly independent of one another, meaning that a single subpopulation can go locally extinct (the red X) without collapsing the entire system. The different dispersal pathways mean that one population can recolonise extinct habitat and essentially ‘rebirth’ other subpopulations (the green arrows).

Weakly dispersing species

At the far opposite end of the dispersal ability spectrum, we have low dispersal species. These are often localised, endemic species that for various reasons might be unable to travel very far at all; for some, they may spend their entire adult life in a sedentary form. The lack of dispersal lends to very strong levels of population structure, and individual populations often accumulate genetic differences relatively quickly due to genetic drift or local adaptation.

Species with low dispersal capabilities are often at risk of local extinction and are unable to easily recolonise these habitats after the event has ended. Their movement is often restricted to rare environmental events such as flooding that carry individuals long distances despite their physiological limitations. Because of this, low dispersal species are often at greater risk of total extinction and extinction vertices than their higher dispersing counterparts.

Accounting for dispersal in population genetics

Incorporating biological and physiological aspects of our study taxa is important for interpreting the evolutionary context of species. Dispersal ability is but one of many characteristics that can influence the ability of species to respond to selective pressures, and the context in which this natural selection occurs. Thus, understanding all aspects of an organism is important in building the full picture of their evolution and future prospects.

From mutation to speciation: the genetics of species formation

The genetics of speciation

Given the strong influence of genetic identity on the process and outcomes of the speciation process, it seems a natural connection to use genetic information to study speciation and species identities. There is a plethora of genetics-based tools we can use to investigate how speciation occurs (both the evolutionary processes and the external influences that drive it). One clear way to test whether two populations of a particular species are actually two different species is to investigate genes related to reproductive isolation: if the genetic differences demonstrate reproductive incompatibilities across the two populations, then there is strong evidence that they are separate species (at least under the Biological Species Concept; see Part One for why!). But this type of analysis requires several tools: 1) knowledge of the specific genes related to reproduction (e.g. formation of sperm and eggs, genital morphology, etc.), 2) the complete and annotated genome of the species (to be able to find and analyse the right genes properly) and 3) a good amount of data for the populations in question. As you can imagine, for people working on non-model species (i.e. ones that haven’t had the same history and detail of research as, say, humans and mice), this can be problematic. So, instead, we can use other genetic information to investigate and suggest patterns and processes related to the formation of new species.

Is reproductive isolation naturally selected for or just a consequence?

A fundamental aspect of studies of speciation is a “chicken or the egg”-type paradigm: does natural selection directly select for rapid reproductive isolation, preventing interbreeding; or as a secondary consequence of general adaptive differences, over a long history of evolution? This might be a confusing distinction, so we’ll dive into it a little more.

Of the two proposed models of speciation, the by-product of natural selection (the second model) has been the more favoured. Simply put, this expands on Darwin’s theory of evolution that describes two populations of a single species evolving independently of one another. As these become more and more different, both in physical (‘phenotype’) and genetic (‘genotype’) characteristics, there comes a turning point where they are so different that an individual from one population could not reasonably breed with an individual from the other to form a fertile offspring. This could be due to genetic incompatibilities (such as different chromosome numbers), physiological differences (such as changes in genital morphology), or behavioural conflicts (such as solitary vs. group living).

Certainly, this process makes sense, although it is debatable how fast reproductive isolation would occur in a given species (or whether it is predictable just based on the level of differentiation between two populations). Another model suggests that reproductive isolation actually might arise very quickly if natural selection favours maintaining particular combinations of traits together. This can happen if hybrids between two populations are not particularly well adapted (fit), causing natural selection to favour populations to breed within each group rather than across groups (leading to reproductive isolation). Typically, this is referred to as ‘reinforcement’ and predominantly involves isolating mechanisms that prevent individuals across populations from breeding in the first place (since this would be wasted energy and resources producing unfit offspring). The main difference between these two models is the sequence of events: do populations ecologically diverge, and because of that then become reproductively isolated, or do populations selectively breed (enforcing reproductive isolation) and thus then evolve independently?

Reinforcement figure.jpg
An example of reinforcement leading to speciation. A) We start with two populations of a single species (a red fish population and a green fish population), which can interbreed (the arrows). B) Because these two groups can breed, hybrids of the two populations can be formed. However, due to the poor combination of red and green fish genes within a hybrid, they are not overly fit (the red cross). C) Since natural selection doesn’t favour forming hybrids, populations then adapt to selectively breed only with similar fish, reducing the amount of interbreeding that occurs. D) With the two populations effectively isolated from one another, different adaptations specific to each population (spines in red fish, purple stripes in green fish) can evolve, causing them to further differentiate. E) At some point in the differentiation process, hybrids move from being just selectively unfit (as in B)) to entirely impossible, thus making the two populations formal species. In this example, evolution has directly selected against hybrids first, thus then allowing ecological differences to occur (as opposed to the other way around).

Reproductive isolation through DMIs

The reproductive incompatibility of two populations (thus making them species) is often intrinsically linked to the genetic make-up of those two species. Some conflicts in the genetics of Population 1 and Population 2 may mean that a hybrid having half Population 1 genes and half Population 2 genes will have serious fitness problems (such as sterility or developmental problems). Dramatic genetic differences, particularly a difference in the number of chromosomes between the two sources, is a significant component of reproductive isolation and is usually to blame for sterile hybrids such as ligers, zorse and mules.

However, subtler genetic differences can also have a strong effect: for example, the unique combination of Population 1 and Population 2 genes within a hybrid might interact with one another negatively and cause serious detrimental effects. These are referred to as “Dobzhansky-Müller Incompatibilities” (DMIs) and are expected to accumulate as the two populations become more genetically differentiated from one another. This can be a little complicated to imagine (and is based upon mathematical models), but the basis of the concept is that some combinations of gene variants have never, over evolutionary history, been tested together as the two populations diverge. Hybridisation of these two populations suddenly makes brand new combinations of genes, some of which may be have profound physiological impacts (including on reproduction).

DMI figure
An example of how Dobzhansky-Müller Incompatibilities arise, adapted from Coyne & Orr (2004). We start with an initial population (center top), which splits into two separate populations. In this example, we’ll look at how 5 genes (each letter = one gene) change over time in the separate populations, with the original allele of the gene (lowercase) occasionally mutating into a new allele (upper case). These mutations happen at random times and in random genes in each population (the red letters), such that the two become very different over time. After a while, these two populations might form hybrids; however, given the number of changes in each population, this hybrid might have some combinations of alleles that are ‘untested’ in their evolutionary history (see below). These untested combinations may cause the hybrid to be infertile or unviable, making the two populations isolated species.

DMI table
The list of ‘untested’ genetic combinations from the above example. This table shows the different combinations of each gene that could be made in a hybrid if these two populations interbred. The red cells indicate combinations that have never been ‘tested’ together; that is, at no point in the evolutionary history of these two populations were those two particular alleles together in the same individual. Green cells indicate ones that were together at some point, and thus are expected to be viable combinations (since the resultant populations are obviously alive and breeding).

How can we look at speciation in action?

We can study the process of speciation in the natural world without focussing on the ‘reproductive isolation’ element of species identity as well. For many species, we are unlikely to have the detail (such as an annotated genome and known functions of genes related to reproduction) required to study speciation at this level in any case. Instead, we might choose to focus on the different factors that are currently influencing the process of speciation, such as how the environmental, demographic or adaptive contexts of populations plays a role in the formation of new species. Many of these questions fall within the domain of phylogeography; particularly, how the historical environment has shaped the diversity of populations and species today.

Phylogeo of speciation
An example of the interplay between speciation and phylogeography, taken from Reyes-Velasco et al. (2018). They investigated the phylogeographic history of several different groups of species within the frog genus Ptychadena; in this figure, we can see how the different species (indicated by the colours and tree on the left) relate to the geography of their habitat (right).

A variety of different analytical techniques can be used to build a picture of the speciation process for closely related or incipient species. A good starting point for any speciation study is to look at how the different study populations are adapting; is there evidence that natural selection is pushing these populations towards different genotypes or ecological niches? If so, then this might be a precursor for speciation, and we can build on this inference with other complementary analyses.

For example, estimating divergence times between populations can help us suggest whether there has been sufficient time for speciation to occur (although this isn’t always clear cut). Additionally, we could estimate the levels of genetic hybridisation (‘introgression’) between two populations to suggest whether they are reasonably isolated and divergent enough to be considered functional species.

The future of speciation genomics

Although these can help answer some questions related to speciation, new tools are constantly needed to provide a clearer picture of the process. Understanding how and why new species are formed is a critical aspect of understanding the world’s biodiversity. How can we predict if a population will speciate at some point? What environmental factors are most important for driving the formation of new species? How stable are species identities, really? These questions (and many more) remain elusive for a wide variety of life on Earth.