An identity crisis: using genomics to determine species identities

This is the fourth (and final) part of the miniseries on the genetics and process of speciation. To start from Part One, click here.

In last week’s post, we looked at how we can use genetic tools to understand and study the process of speciation, and particularly the transition from populations to species along the speciation continuum. Following on from that, the question of “how many species do I have?” can be further examined using genetic data. Sometimes, it’s entirely necessary to look at this question using genetics (and genomics).

Cryptic species

A concept that I’ve mentioned briefly previously is that of ‘cryptic species’. These are species which are identifiable by their large genetic differences, but appear the same based on morphological, behavioural or ecological characteristics. Cryptic species often arise when a single species has become fragmented into several different populations which have been isolated from one another for a long time. Although they may diverge genetically, this doesn’t necessarily translate to changes in their morphology, ecology or behaviour, particularly if these are strongly selected for under similar environmental conditions. Thus, we need to use genetic methods to be able to detect and understand these species, as well as later classify and describe them.

Cryptic species fish
An example of cryptic species. All four fish in this figure are morphologically identical to one another, but they differ in their underlying genetic variation (indicated by the different colours of DNA). Thus, from looking at these fish alone we would not perceive any differences, but their genetic make-up might suggest that there is more than one species…
Cryptic species heatmap example
The level of genetic differentiation between the fish in the above example. The phylogenies on the left and top of the figure demonstrate the evolutionary relationships of these four fish. The matrix shows a heatmap of the level of differences between different pairwise comparisons of all four fish: red squares indicate zero genetic differences (such as when comparing a fish to itself; the middle diagonal) whilst yellow squares indicate increasingly higher levels of genetic differentiation (with bright yellow = all differences). By comparing the different fish together, we can see that Fish 1 and 2, and Fish 3 and 4, are relatively genetically similar to one another (red-deep orange). However, other comparisons show high level of genetic differences (e.g. 1 vs 3 and 1 vs 4). Based on this information, we might suggest that Fish 1 and 2 belong to one cryptic species (A) and Fish 3 and 4 belong to a second cryptic species (B).

Genetic tools to study species: the ‘Barcode of Life’

A classically employed method that uses DNA to detect and determine species is referred to as the ‘Barcode of Life’. This uses a very specific fragment of DNA from the mitochondria of the cell: part of the cytochrome c oxidase I gene, CO1. The standard barcode region is 648 base pairs long and is found pretty well universally across animals: this, and the fact that it is conserved enough to be sequenced across a broad range of species yet variable enough to differ between them, makes it an ideal candidate for easily testing the identity of new species. Additionally, mitochondrial DNA tends to be a bit more resilient than its nuclear counterpart; thus, small or degraded tissue samples can still be sequenced for CO1, making it amenable to wildlife forensics cases. Generally, two sequences will be considered as belonging to different species if they differ from one another by more than a certain percentage.

Annotated mitogenome
The full (annotated) mitochondrial genome of humans, with the different genes within it labelled. The CO1 gene is labelled with the red arrow (sometimes also referred to as COX1) whilst blue arrows point to other genes often used in phylogenetic or taxonomic studies, depending on the group or species in question.

Despite the apparent benefits of CO1, there are of course a few drawbacks. Most of these revolve around the mitochondrial genome itself. Firstly, because mitochondria are passed on from mother to offspring (and not from the father), the mitochondrial genome reflects the genetic history of only one sex of the species. Secondly, the actual cut-off for species using CO1 barcoding is highly contentious and possibly not as universal as previously suggested. Levels of CO1 sequence divergence between species that have been previously determined to be separate (through other means) have varied anywhere from 2% to 12%. As a result, the translation from CO1 sequence divergence to species identity is not all that clear.
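To make the cut-off problem a little more concrete, here is a minimal sketch in Python. It is not taken from any barcoding software: the function names and the two short sequences are made up for illustration, and it uses the simplest possible measure of divergence (the proportion of aligned sites that differ). The two thresholds are simply the ends of the 2% to 12% range mentioned above.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def same_species(seq_a: str, seq_b: str, threshold: float) -> bool:
    """Barcode-style call: same species if divergence is below the cut-off."""
    return p_distance(seq_a, seq_b) < threshold


# Two made-up 20 bp fragments that differ at exactly one site (the last one).
fish_1 = "ATGGCACTTAGCCTACTAAT"
fish_2 = "ATGGCACTTAGCCTACTAAC"

divergence = p_distance(fish_1, fish_2)
call_at_2_percent = same_species(fish_1, fish_2, threshold=0.02)
call_at_12_percent = same_species(fish_1, fish_2, threshold=0.12)
```

With one difference in 20 sites, the same pair of fish is split into two species under the stricter cut-off and lumped into one under the looser cut-off, which is exactly why the choice of threshold matters so much.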

Gene tree – species tree incongruences

One particularly confounding aspect of defining species based on a single gene, and with using phylogenetic-based methods, is that the history of that gene might not actually be reflective of the history of the species. This can be a little confusing to think about but essentially leads to what we call “gene tree – species tree incongruence”. Different evolutionary events cause different effects on the underlying genetic diversity of a species (or group of species): while these may be predictable from the genetic sequence, different parts of the genome might not be as equally affected by the same exact process.

A classic example of this is hybridisation. If we have two initial species, which then hybridise with one another, we expect our resultant hybrids to be approximately made of 50% Species A DNA and 50% Species B DNA (if this is the first generation of hybrids formed; it gets a little more complicated further down the track). This means that, within the DNA sequence of the hybrid, 50% of it will reflect the history of Species A and the other 50% will reflect the history of Species B, which could differ dramatically. If we randomly sample a single gene in the hybrid, we will have no idea if that gene belongs to the genealogy of Species A or Species B, and thus we might make incorrect inferences about the history of the hybrid species.

Gene tree incongruence figure
A diagram of gene tree – species tree incongruence. Each individual coloured line represents a single gene as we trace it back through time; these are mostly bound within the limits of species divergences (the black borders). For many genes (such as the blue ones), the genes resemble the pattern of species divergences very well, albeit with some minor differences in how long ago the splits happened (at the top of the branches). However, the red genes contrast with this pattern, with clear movement across species (from and into B): this represents genes that have been transferred by hybridisation. The green line represents a gene affected by what we call incomplete lineage sorting; that is, we cannot trace it back far enough to determine exactly how/when it initially diverged and so there are still two separate green lines at the very top of the figure. You can think of each line as a separate phylogenetic tree, with the overarching species tree as the average pattern of all of the genes.

There are a number of other processes that could similarly alter our interpretations of evolutionary history based on analysing the genetic make-up of the species. The best way to handle this is simply to sample more genes: this way, the effect of variation of evolutionary history in individual genes is likely to be overpowered by the average over the entire gene pool. We interpret this as a set of individual gene trees contained within a species tree: although one gene might vary from another, the overall picture is clearer when considering all genes together.
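As a rough illustration of why sampling more genes helps, the sketch below takes a simple majority vote across a handful of hypothetical gene trees for three species. This is only a toy: the gene trees are invented, each one is reduced to "which two species does this gene group together", and real species tree methods use far more sophisticated statistical models than a head count.

```python
from collections import Counter

# Hypothetical gene trees for three species (A, B, C), each summarised as
# the pair of species that the gene groups together as closest relatives.
gene_trees = [
    ("A", "B"), ("A", "B"), ("A", "B"), ("A", "B"),  # match the species history
    ("B", "C"),  # e.g. a gene moved between species by hybridisation
    ("A", "C"),  # e.g. a gene affected by incomplete lineage sorting
    ("A", "B"),
]


def majority_topology(trees):
    """Return the most common sister pair and the fraction of genes supporting it."""
    counts = Counter(frozenset(pair) for pair in trees)
    pair, n = counts.most_common(1)[0]
    return sorted(pair), n / len(trees)


# Sampling only the fifth gene gives a misleading answer...
one_gene = majority_topology(gene_trees[4:5])
# ...while the average over all seven genes recovers the dominant pattern.
all_genes = majority_topology(gene_trees)
```

If we had only sequenced the one gene affected by hybridisation, we would confidently group B with C; across all of the genes, the A and B grouping wins comfortably.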

Species delimitation

In earlier posts on The G-CAT, I’ve discussed the biogeographical patterns unveiled by my Honours research. Another key component of that paper involved using statistical modelling to determine whether cryptic species were present within the pygmy perches. I didn’t exactly elaborate on that at the time (mostly for simplicity), but this type of analysis is referred to as ‘species delimitation’. To try and simplify complicated analyses, species delimitation methods evaluate possible numbers and combinations of species within a particular dataset and provide a statistical value for which configuration of species is most supported. One program that employs species delimitation is Bayesian Phylogenetics and Phylogeography (BPP): to do this, it uses a plethora of information from the genetics of the individuals within the dataset. This includes how long ago the different populations/species separated; which populations/species are most related to one another; and a pre-set maximum number of species (BPP will try to combine these in estimations, but not split them due to computational constraints). This all sounds very complex (and to a degree it is), but this allows the program to give you a statistical value for what is a species and what isn’t based on the genetics and statistical modelling.

Vittata cryptic species
The cryptic species of pygmy perches identified within my research paper. This represents part of the main phylogenetic tree result, with the estimates of divergence times from other analyses included. The pictures indicate the morphology of the different ‘species’: Nannoperca pygmaea is morphologically different to the other species of Nannoperca vittata. Species delimitation analysis suggested all four of these were genetically independent species; at the very least, it is clear that there must be at least 2 species of Nannoperca vittata, since one of them is more closely related to N. pygmaea than to the other N. vittata species. Photo credits: N. vittata = Chris Lamin; N. pygmaea = David Morgan.

The end result of a BPP run is usually reported as a species tree (i.e. a phylogenetic tree describing species relationships) and statistical support for the delimitation of species (0-1 for each species). Because of the way the statistical component of BPP works, it has been found to give extremely high support for species identities. This has been criticised, as BPP can at times provide high statistical support for genetically isolated lineages (i.e. divergent populations) which are not actually species.

Improving species identities with integrative taxonomy

Due to this particular drawback, and the often complex nature of species identity, using solely genetic information such as species delimitation to define species is extremely rare. Instead, we use a combination of different analytical techniques, which can include genetic-based evaluations, to more robustly assign and describe species. In the example from my own paper, we suggested that up to three ‘species’ of N. vittata that were determined as cryptic species by BPP could potentially exist, pending further analyses. We did not describe or name any of the species, as this would require a deeper delve into the exact nature and identity of these species.

As genetic data and analytical techniques improve into the future, it seems likely that our ability to detect and determine species boundaries will also improve. However, the additional support provided by alternative aspects such as ecology, behaviour and morphology will undoubtedly be useful in the progress of taxonomy.

The direction of evolution: divergence vs. convergence

Direction of evolution

We’ve talked previously on The G-CAT about how the genetic underpinning of certain evolutionary traits can change in different directions depending on the selective pressure it is under. Particularly, we can see how the frequency of different alleles might change in one direction or another, or stabilise somewhere in the middle, depending on its encoded trait. But thinking bigger picture than just the genetics of one trait, we can actually see that evolution as an entire process works rather similarly.

Divergent evolution

The classic view of the direction of evolution is based on divergent evolution. This is simply the idea that a particular species possesses some ancestral trait. The species (or population) then splits into two (for one reason or another), and each one of these resultant species or populations evolves in a different way to the other. Over time, this means that their traits are changing in different directions, but ultimately originate from the same ancestral source.

Evidence for divergent evolution is rife throughout nature, and is a fundamental component of all of our understanding of evolution. Divergent evolution means that, by comparing similar traits in two species (called homologous traits), we can trace back species histories to common ancestors. Some impressive examples of this exist in nature, such as the number of neck bones in most mammalian species. Humans have the same number of neck bones as giraffes; thus, we can suggest that the ancestor of both species (and all mammals) probably had a similar number of neck bones. It’s just that the giraffe lineage evolved longer bones whereas other lineages did not.

Homology figure
A diagrammatic example of homologous structures in ‘hand’ bones. The coloured bones demonstrate how the same original bone structures have diverged into different forms. Source: BiologyWise.

Convergent evolution

But of course, evolution never works as simply as you want it to, and sometimes we can get the direct opposite pattern. This is called convergent evolution, and occurs when two completely different species independently evolve very similar (sometimes practically identical) traits. This is often caused by a limitation of the environment; some extreme demand of the environment requires a particular physiological solution, and thus all species living there must develop that trait in order to survive. An example of this would be the body form of carnivorous marsupials like Tasmanian devils or thylacines: despite belonging to a different infraclass (marsupials, rather than placental mammals), their body shapes closely resemble something more canid. Likely, the carnivorous diet places some constraints on body form, particularly jaw structure and strength.

Convergent evol intelligence
A surprising example of convergent evolution is cognitive ability in apes and some bird groups (e.g. corvids). There’s plenty of other animal groups more related to each of these that don’t demonstrate the same level of cognitive reasoning (based on the traits listed in the centre): thus, we can conclude that cognition has evolved twice in very, very different lineages. Source: Emery & Clayton, 2004.

A more dramatic (and potentially obvious) example of convergent evolution would be wings and the power of flight. Despite the fact that butterflies, bees, birds and bats all have wings and can fly, most of them are pretty unrelated to one another. It seems much more likely that flight evolved independently multiple times, rather than that their shared ancestor could fly and the other 99% of species descended from it lost the capacity for flight.

Parallel evolution

Sometimes convergent evolution can work between two species that are pretty closely related, but still evolved independently of one another. This is distinguished from other categories of evolution as parallel evolution: the main difference is that while both species may have shared the same start and end point, evolution has acted on each one independently of the other. This can make it very difficult to distinguish from convergent evolution, and is usually determined by the exact history of the trait in question.

Parallel evolution is an interesting field of research for a few reasons. Firstly, it provides a scenario in which we can more rigorously test expectations and outcomes of evolution in a particular environment. For example, if we find traits that are parallel in a whole bunch of fish species in a particular region, we can start to look at how that particular environment drives evolution across all fish species, as opposed to one species case studies.

Marsupial handedness
Here’s another weird example; different populations of marsupials (particularly kangaroos and wallabies) show preferential handedness depending on where the population is. That is, different populations of different species of marsupials show parallel evolution of handedness, since they’re related to one another but have evolved it independently of the other species. Source: Giljov et al. (2015).

Following from that logic, it is then important to question the mechanisms of parallelism. From a genetic point of view, do these various species use the same genes (and genetic variants) to produce the same identical trait? Or are there many solutions to the selective question in nature? While these questions are rather complicated, and there has been plenty of evidence both for and against parallel genetic underpinning of parallel traits, it seems surprisingly often that many different genetic combinations can be used to get the same result. This gives interesting insight into how complex genetic coding of traits can be, and how creative and diverse evolution can be in the real world.

Where is evolution going?

Cat phylogeny
An example of all three types of evolutionary trajectory in a single phylogeny of cats (you know how we do it here at The G-CAT). This phylogeny consists of two distinct genera; one with one species (P. aliquam) and another of three species (the red box indicates their distance). Our species have three main physical traits: coat colour, ear tufts and tail shape. At the ancestral nodes of the tree, we can see what the ancestor of these species looked like for these three traits. Each of these traits has undergone a different type of evolution. The tufts on the ears are the result of divergent evolution, since F. tuftus evolved the trait differently to its nearest relative, F. griseo. Contrastingly, the orange coat colour of F. tuftus and P. aliquam are the result of convergent evolution: neither of these species are very closely related (remembering the red box) and evolved orange coats independently of one another (since their ancestors are grey). And finally, the fluffy tails of F. hispida and F. griseo can be considered parallel evolution, since they’re similar evolutionarily (same genus) but still each evolved tail fluff independently (not in the ancestor). This example is a little convoluted, but if you trace the history of each trait in the phylogeny you can more easily see these different patterns.

So, where is evolution going for nature? Well, the answer is probably all over the place, but steered by the current environmental circumstances. Predicting the evolutionary impacts of particular environmental change (e.g. climate change) is exceedingly difficult but a critical component of understanding the process of evolution and the future of species. Evolution continually surprises us with creative solutions to complex problems and I have no doubt new mysteries will continue to be thrown at us as we delve deeper.

All the world in the palm of your hand: whole genome sequencing for evolution and conservation

Building an entire genome

If bigger is better, then biggest is best. Having the genome of a particular study species fully sequenced allows us to potentially look at all of the genetic variation in the entire gene pool: but how do we sequence the entirety of the genome? And what are the benefits of having a whole genome to refer to?

Whole genome assembly
A very, very simplified overview of whole genome sequencing. Similar to other genomic technologies, we start by fragmenting the genome into much smaller, easier to sequence parts (reads). We then use a computer algorithm which pieces these reads together into a consecutive sequence based on overlapping DNA sequence (like building a chain out of Lego blocks). From this assembled genome, we can then attach annotations using information from other species’ genomes or genetic studies, which can correlate a particular sequence to a gene, a function of that gene, and the resultant protein from this gene (although not all of these aspects are always included).

Well, assembling the whole genome of an organism for the first time is a very tricky process. It involves taking DNA sequence from only a few individuals, breaking it down into smaller fragments and multiplying these fragments into the billions (more or less the same process used in other genomics technologies: the real difference is that we need the full breadth of the genome so that we don’t miss any spaces). From these fragments, we use a complex computer algorithm which builds up a consensus sequence like a Lego tower; by finding parts of sequences which overlap, the software figures out which pieces connect to one another. Hopefully, we eventually end up with one very long continuous sequence; the genome! Sometimes, we might end up with a few very large blocks (called contigs), but this is also useful for analyses (how useful depends on how many blocks there are and how big they are). With this full genome, we use information from other more completed genomes (such as those from model species like humans, mice or even worms) to figure out which sections of the genome relate to specific genes. We can then annotate these sections by labelling them as clear genes, complete with start and end point, and attach a particular physical function of that gene.
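The ‘Lego tower’ idea can be sketched in a few lines of Python. To be clear, this is a toy greedy approach that I’m using purely for illustration: the reads are made up and error-free, the function names are my own, and real assemblers rely on much more sophisticated methods to cope with sequencing errors and repetitive DNA.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing overlaps any more: we are left with separate contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


# Three made-up reads that tile a 14 bp 'genome'.
reads = ["ATGGCGT", "GCGTACA", "ACAGGTT"]
contigs = greedy_assemble(reads)
```

Here the three reads collapse into a single sequence. If two reads shared no overlap at all, the function would instead hand back more than one contig, mirroring the ‘few very large blocks’ outcome described above.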

The benefits of whole genomes

Having an entire genome as a reference is an extremely helpful tool in conservation and evolutionary studies. The first, and perhaps most obvious benefit, is the sheer scale of the data we can use. By having the entirety of the genome available, we can use potentially billions of base pairs of sequence in our genetic analyses (for reference, the human genome is >3 billion base pairs long). Even if we don’t sequence the full genome for all of our samples, having a reference genome as a basis for assembling our reduced datasets significantly improves the quantity and quality of sequences we can use.

Another very important benefit is the ability to ascribe function in our studies. Many of our processes for obtaining data, even for genomic technologies, use random and anonymous fragments of the genome. Although this is a cost-effective way to obtain a very large amount of data, it unfortunately means that we often have no idea which part of the genome our sequences came from. This means that we don’t know which sequences relate to specific genes, and even if we did we would have no idea what those genes are or do! But with an annotated genome, we can take even our fragmented sequence and check it against the genome and find out what genes are present.
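Once a fragment has been placed on the reference genome, ‘checking it against the genome’ boils down to asking which annotated genes its coordinates overlap. The sketch below assumes the fragment has already been mapped (which is the hard part, and is done by dedicated software); the gene names and coordinates are entirely hypothetical.

```python
# Hypothetical annotation: (gene name, start, end) in reference coordinates.
annotation = [
    ("gene_heat_tolerance", 100, 450),
    ("gene_starch_digestion", 900, 1500),
]


def genes_overlapping(fragment_start: int, fragment_end: int, genes=annotation):
    """Names of annotated genes that a mapped fragment overlaps (inclusive ends)."""
    return [
        name
        for name, start, end in genes
        if fragment_start <= end and fragment_end >= start
    ]


# A fragment that lands partly inside an annotated gene...
in_gene = genes_overlapping(1400, 1550)
# ...and one that falls between genes, so it stays 'anonymous'.
anonymous = genes_overlapping(500, 650)
```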

Understanding adaptation

Based on that, it seems pretty obvious how having an annotated genome can help us in studies of adaptation. Knowing the functional aspect of our genetic data allows us to more directly determine how evolution is happening in nature; instead of only being able to say that two species are evolving differently from one another, for example, we can explicitly look at how they are evolving. Is one evolving tolerance to hotter temperatures? Are they evolving different genes to handle different diets? Are they evolving in response to an external influence, like a viral outbreak or changing climate? What are the physiological consequences of these changes? These questions are critical in understanding past and future evolution, and full genome analysis allows us to delve into them much deeper.

Manhattan plot example
A (slightly edited) figure of full genome comparisons between domestic dogs and wild wolves by Axelsson et al. (2013), with the aim of understanding the evolutionary changes associated with domestication. For avid readers, this figure probably looks familiar. This figure compares the genetic differentiation across the entire genome between dogs and wolves, with some sections of the genome (circled) showing clear differences. As there is an annotated dog genome, the authors then delved into these genes to understand the functional differences between the two. By comparing their genetic differences to functional genes, the authors can more explicitly suggest mechanisms or changes associated with the domestication process (such as adaptation to a starch-heavy and human-influenced diet).

Whole genomes also allow us to better understand how adaptation actually works in nature. As we’ve discussed before, more traditional studies often assumed that single, or very few, genes were responsible for allowing a species to adapt and change, and that these genes had very strong effects on their physiology. But what we see far more often is polygenic adaptation; small changes in a very large number of genes which, combined together, allow the species to adapt and evolve. By having the entirety of the genome available, we are much more likely to capture all of the genes that are under natural selection in a particular population or species, painting a clearer picture of their evolutionary trajectory.

Understanding demography

The much larger dataset of full genomes is also important for understanding the non-adaptive parts of evolution; the demographic history. Given that selectively neutral impacts (e.g. reductions in population size) are likely to impact all of the genes in the gene pool somewhat equally, having a full genome allows us to more accurately infer the demographic state and historical patterns of species.

For both adaptive and non-adaptive variation, it is also important to consider what we call linkage disequilibrium. Genetic sequences that are physically close to each other in the genome will often be inherited together, because recombination only rarely breaks them apart (a fairly technical process, so I won’t delve into this): what this can mean is that if a gene is under very strong selection, then sequences around this gene will look like they’re under selection too. This can give false positives for adaptive genes (i.e. sequences that look like genes under selection but are just linked to a gene that is) or can interfere with demographic analyses (since they often assume no selection, or linkage to selection, on the sequences used). With a whole genome, we can actually estimate how far away a base pair has to be before it’s not linked anymore; we call these linkage blocks, and they’re very useful additions to analyses.

Linkage_example
An example of linkage as a process. We start with a particular sequence (top); during recombination, this sequence may randomly break and rearrange into different parts. In this example, I’ve simulated four different ‘breaks’ (dashed coloured lines) due to recombination. Each of these breaks leads to two separate blocks of fragments; for example, the break at the blue line results in the second two sequence blocks (middle). If we focus on one target base pair in the sequence (golden A), then we can see in some fragments it remains with certain bases, but sometimes it gets separated by the break. If we compare how often the golden A is in the same block (i.e. is co-inherited) as each of the other bases, across all 4 breaks, then we see that the bases that are closest to it (the golden A is represented by the golden bar) are almost always in the same block. This makes sense: the further away a base is from our target, the more likely that there will be a break between them. This is shown in the frequency distributions at the bottom: the left figure shows the actual frequencies of co-inheritance (i.e. linkage) using the top example and those 4 breaks. The right figure shows a more realistic depiction of how linkage looks in the genome; it rapidly decays as we move away from the target (although the width and rate of this can vary).
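The counting exercise in this figure is simple enough to reproduce in code. The sketch below uses a hypothetical 10-base sequence with the target at the fifth base and four invented break points (these are not the exact values from the figure), and for every base works out the fraction of breaks that leave it in the same block as the target.

```python
def coinheritance(sequence_length: int, target: int, breaks):
    """For each position, the fraction of breaks that leave it in the same
    block as the target. A break at b falls between positions b - 1 and b."""
    freqs = []
    for pos in range(sequence_length):
        lo, hi = sorted((pos, target))
        # The two positions stay together unless the break lies between them.
        together = sum(1 for b in breaks if not (lo < b <= hi))
        freqs.append(together / len(breaks))
    return freqs


# Hypothetical example: 10 bases, target at index 4, four break points.
linkage = coinheritance(10, target=4, breaks=[2, 4, 7, 9])
```

The resulting frequencies are highest at and immediately beside the target and fall away towards either end of the sequence, which is the same shape as the left-hand frequency plot.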

Improving conservation management

In a similar fashion to demography, full genome datasets can improve our estimates of relatedness and pedigrees in captive breeding programs. The massive scale of whole genomes allows us to more easily trace the genealogical history of individuals, allowing us to assign parents more accurately. This also helps with our estimations of genetic relatedness, arguably the most critical aspect of genetic-based breeding programs. This is particularly helpful for species with tricky mating patterns (such as polygamy or brood spawning) or for organisms that are difficult to track.

Pedigrees
An example of how whole genomes can improve our estimation of pedigrees. Say we have a random individual (star), and we want to know how they fit into a particular family tree (pedigree). With only a few genes, we might struggle to pick where in the family it fits based on limited genetic information. With a larger genetic dataset (such as reduced-representation genomics), we might be able to cross off a few potential candidate spots but still have some trouble with a few places (due to unknown parents, polygamy or issues with genetic analysis). With whole genomes, we should be able to much better clarify the whole pedigree and find exactly where our star individual fits in the tree (red circle). It is thanks to whole genomes that we can do those ancestry analyses that have gone viral lately!
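One simple way to see why more markers narrow down a pedigree is Mendelian exclusion: a true parent must share at least one allele with its offspring at every locus, so every extra locus is another chance to rule out a wrong candidate. The genotypes below are invented, and this is only a sketch of the logic; real parentage software uses likelihood-based methods that also allow for genotyping errors.

```python
def compatible(parent, offspring) -> bool:
    """A candidate parent is compatible if, at every locus, it shares at
    least one allele with the offspring (simple Mendelian exclusion)."""
    return all(set(p) & set(o) for p, o in zip(parent, offspring))


# Hypothetical genotypes: one (allele, allele) pair per locus.
star = [("A", "G"), ("C", "C"), ("T", "A"), ("G", "G")]
candidates = {
    "parent_1": [("A", "A"), ("C", "T"), ("T", "T"), ("G", "C")],
    "parent_2": [("G", "G"), ("C", "C"), ("A", "A"), ("C", "C")],
}


def remaining(n_loci: int):
    """Candidates that cannot be excluded using only the first n_loci loci."""
    return [
        name
        for name, genotype in candidates.items()
        if compatible(genotype[:n_loci], star[:n_loci])
    ]


few_loci = remaining(2)   # too little data: both candidates still fit
more_loci = remaining(4)  # the fourth locus rules out parent_2
```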

The way forwards

While many non-model species are still lacking in the available genomic information, whole genomes are progressively being sequenced for more and more species. As this astronomical dataset grows, our ability to investigate, discover and test theories about evolution, natural selection and conservation will also improve. Many projects already exist which aim specifically to increase the number of whole genomes available for certain taxonomic groups such as birds and bats: these will no doubt prove to be invaluable resources for future studies.

Why we should always pander to diversity

Diversity in the natural world

‘Diversity’ is a term that gets used a lot these days, albeit usually in reference to social changes and structures. However, diversity is not merely a human construct and reflects an extremely important aspect of the natural world at a variety of levels. From the smallest genes to the biggest ecosystems, diversity is a trait that confers a massive range of benefits to individuals, populations, species and even the entire globe. Let’s dissect this diversity down at different scales and see how beneficial it can be.

Hierarchy of diversity
The generalised hierarchy of life, with diversity being an important component of each tier. At the smallest tier, genes underpin all life. The collection of genetic diversity is often summarised into a population (as a single cohesive genetic unit). Several populations can be pooled together into a single (usually) cohesive species. Different species are then components of a larger community (which in turn are components of a broader ecosystem).

Genetic diversity

At the smallest scale in the hierarchy of genetic differentiation, we have the genes themselves. It is a well-established concept that having a diversity of genetic variants (alleles) within a population or species is critical to their future adaptation, evolution and persistence. This is because different alleles will have different benefits (or costs) depending on the environmental pressure that influences them; natural selection might favour one allele over another at one time, but a different one as the pressure changes. Having a higher number of alleles within the population or species means that there is a greater chance at least a few individuals will possess an adaptive gene with the changing environment (which we know can be quite rapid and very, very strong). The diversity serves as a ‘buffer’ against extinction; evolution by natural selection functions best when there are many options to choose from.

Without this diversity, species run the risk of having no adaptive genes at the ready to deal with a selective pressure. Either a new adaptive gene must arise by mutation (or come about in other ways, such as through gene flow from another population or species) or the population/species will suffer and potentially go extinct. As strong selection causes the species to dwindle, it enters what is referred to as the ‘extinction vortex’. Without genetic diversity, they can’t adapt: thus, more individuals die off, causing more genetic diversity to be lost from the population. This pattern is a vicious cycle which can ultimately destroy species (without serious intervention).
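The ‘smaller population, faster loss of diversity’ half of this cycle can be put into numbers with a standard result from population genetics: in an idealised population of N breeding individuals, expected heterozygosity shrinks by a factor of (1 - 1/(2N)) every generation through chance alone. The sketch below assumes that idealised model (random mating, constant size, no selection, mutation or migration), and the population sizes and starting value are made up for illustration.

```python
def expected_heterozygosity(h0: float, pop_size: int, generations: int) -> float:
    """Expected heterozygosity after a number of generations of genetic
    drift in an idealised population: H_t = H_0 * (1 - 1/(2N)) ** t."""
    return h0 * (1 - 1 / (2 * pop_size)) ** generations


# Same starting diversity, same 50 generations, very different population sizes.
large = expected_heterozygosity(0.5, pop_size=1000, generations=50)
small = expected_heterozygosity(0.5, pop_size=10, generations=50)
```

Under these assumptions the large population keeps nearly all of its starting diversity, while the small one loses the vast majority of it over the same period. In a real vortex the population size would keep falling as well, which would only speed things up.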

Extinction vortex
A very dramatic representation of the extinction vortex.

For this reason, captive breeding programs aim to maintain as much of the genetic diversity of the original population as possible. This reduces the probability of entering a downward extinction spiral from inbreeding depression and helps to maintain populations into the future (both the captive one and the wild population when we reintroduce individuals into the wild).

“Population” diversity

Because genetic diversity is critically important for species survival, we must also try to preserve the diversity of the entire gene pool of a species. This means conserving highly genetically differentiated populations within a species as a priority, as they may be the only ones that possess the necessary adaptive genes to save the rest of the species. This adaptive genetic variation can then be introduced into other populations in genetic rescue programs and serve as a means to semi-naturally allow the species to evolve. Evolutionarily-significant units (ESUs) are one measure of the invaluable nature of genetically unique populations.

Although many more traditional conservationists strongly believe that ESUs should be managed entirely independently of one another (to preserve their evolutionary ‘pedigree’ and prevent the risk of outbreeding depression), it has been suggested that the benefit of genetic rescue in many cases significantly outweighs this risk of outbreeding depression. For some species, this really is an act of rescue: they are at the edge of extinction, and if we do nothing we condemn them to die out.

Introducing genetic material across populations (or even species!) can generate new functional genes that allow the recipient species to adapt to selective pressures. This might sound very strange, and could be extremely rare, but examples of adaptive genetic material in one species originating from another species through hybridisation do exist in nature. For example, the black coat of wolves is a highly adaptive trait in some populations and is encoded by a variant at the K locus (the beta-defensin gene CBD103, which acts on the same pigmentation pathway as the Melanocortin 1 receptor, Mc1r). However, the specific mutation that generates the black coat colour actually first originated in domestic dogs; when wild wolves and domestic dogs interbred, this mutation was transferred into the wolf gene pool. Natural selection strongly favoured this new variant, and it very rapidly underwent strong positive selection. Thus, the adaptiveness of black wolves is thanks to a domestic dog mutation!

Species diversity

At a higher level of the hierarchy, the diversity of species within a particular community or ecosystem has been shown to be important for the health and stability of said community. Every species, however small or seemingly unimpressive, plays a role in the greater ecosystem balance, through interactions with other species (e.g. as predator, as prey, as competitor) and the abiotic environment. While some species are known to have very strong impacts on the immediate ecosystem (often dubbed ‘keystone species’, such as apex predators), all species have some influence on the world around them (we’re especially good at it).

Species interactions flowchart

The overall health and stability of an ecosystem, as well as the benefits it can provide to all living things (including humans), is largely determined by the diversity of species. For example, ‘habitat engineers’ are types of species that, by altering the physical environment around them (such as to build a home), directly provide new habitat for other species. They are a fundamental underpinning of many incredibly vibrant ecosystems; think of what a reef system would look like if there were no corals in it. There’d be no anemones growing colourfully; no fish to live in them; no sharks to feed on these non-existent fish. This is just one example of a complex ecosystem that truly relies on its inhabiting species to function.

Ecosystem jenga
Much like Jenga, taking out one block (a species) could cause the entire stack (the ecosystem) to collapse in on itself. Even if it stands up, however, the system will still be weaker without the full diversity to support it.

Protecting our diversity

Diversity is not just a social construct and is an important phenomenon in nature, at a variety of different levels. Preserving the full diversity of life, from genetic diversity within populations and species to full species diversity within ecosystems, is critical to maintaining healthy and robust natural systems. The more diversity we have at each level of this hierarchy, the greater robustness and security we will have in the future.

The many genetic faces of adaptation

The transition from genotype to phenotype

While evolutionary genetics studies often focus on the underlying genetic architecture of species and populations to understand their evolution, we know that natural selection acts directly on physical characteristics. We call these the phenotype; by studying changes in the genes that determine these traits (the genotype), we can take a nuanced approach to studying adaptation. However, looking at genetic changes and relating these to a clear phenotypic trait, and to how and why that trait is under natural selection, can be a difficult task.

One gene for one trait

The simplest (and most widely used) models of understanding the genetic basis of adaptation assume that a single gene codes for a single phenotypic trait. This means that changes in a single gene (such as outliers that we have identified in our analyses) create changes in a particular physical trait that is under a selective pressure in the environment. This is a useful model because it is statistically tractable to identify a few specific genes of very large effect within our genomic datasets and directly relate these to a trait: adding more complexity exponentially increases the difficulty in detecting patterns (at both the genotypic and phenotypic level).

Single locus figure
An example of a single gene coding for a single phenotypic trait. In this example, the different combination of alleles of the one gene determines the colour of the cat.

Many genes for one trait: polygenic adaptation

Unfortunately, nature is not always convenient and recent findings suggest that the overwhelming majority of the genetics of adaptation operate under what is called ‘polygenic adaptation’. As the name suggests, under this scenario changes (even very small ones) in many different genes combine to have a large effect on a particular phenotypic trait. Given the often very small magnitude of the genetic changes, it can be extremely difficult to separate adaptive changes in genes from neutral changes due to genetic drift. Likewise, trying to understand how these different genes all combine into a single functional trait is almost impossible, especially for non-model species.

Polygenic adaptation is often seen for traits which are clearly heritable, but don’t show a single underlying gene responsible. Previously, we’ve covered this with the heritability of height: this is one of many examples of ‘quantitative trait loci’ (QTLs). Changes in one QTL (a single gene) cause a small quantitative change in a particular trait; the combined effect of different QTLs together can ‘add up’ (or counteract one another) to result in the final phenotype value.

Height QTL
An example of polygenic quantitative trait loci. In this example, height is partially coded for by a total of ten different genes: the dominant form of each gene (capitals, green) provides more height whereas the recessive form (lowercase, red) doesn’t. The cumulative total of these components determines how tall the person is: the person on the far left was very unlucky and got 0/10 height bonuses and so is the shortest. Progressively from left to right, more genes contribute to the taller height of the people, with the far right person standing tall with the ultimate 10/10 pro-height genes. For reference, height is actually likely to be coded for by thousands of genes, not 10.
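To make the ‘adding up’ idea concrete, here is a minimal sketch of the ten-gene height example as a purely additive model. The one-letter-per-gene encoding, the function names and the numbers (`base_cm`, `effect_cm`) are all made up for illustration; real height genetics involves far more genes with unequal effects.

```python
def height_bonus_count(genotype: str) -> int:
    """Count the genes carrying the 'tall' form (written as a capital letter).

    Assumes one letter per gene, e.g. "AbCdEfGhIj" for ten genes.
    """
    return sum(1 for gene in genotype if gene.isupper())


def predicted_height(genotype: str, base_cm: float = 150.0, effect_cm: float = 3.0) -> float:
    """Additive model: every 'tall' gene adds the same small amount of height.

    base_cm and effect_cm are hypothetical values chosen for the example.
    """
    return base_cm + effect_cm * height_bonus_count(genotype)


# The unlucky 0/10 person, a 5/10 person and the 10/10 person.
for genotype in ("abcdefghij", "AbCdEfGhIj", "ABCDEFGHIJ"):
    print(genotype, predicted_height(genotype))
```

No single gene makes anyone ‘tall’ here: each one only nudges the total, which is exactly why individual QTLs are so hard to spot in real data.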

The mechanisms which underlie polygenic adaptation can be more complex than simple addition, too. Individual genes might cause phenotypic changes which interact with other phenotypes (and their underlying genotypes) to create a network of changes. We call these interactions ‘epistasis’, where changes in one gene can cause a flow-on effect of changes in other genes based on how their resultant phenotypes interact. We can see this in metabolic pathways: given that a series of proteins are often used in succession within pathways, a change in any single protein in the process could affect every other protein in the pathway. Of course, knowing the exact proteins coded for by every gene, including their physical structure, and how each of those proteins could interact with other proteins is an immense task. Similar to QTLs, this is usually limited to model species which have a large history of research on these specific areas to back up the study. However, some molecular ecology studies are starting to dive into this area by identifying pathways that are under selection instead of individual genes, to give a broader picture of the overall traits that are underlying adaptation.

Labrador epistasis figure
My favourite example of epistasis on coat colour in labradors. Two genes together determine the colour of the coat, with strong interactions between them. The first gene (E/e) determines whether or not the underlying coat gene (B/b) is masked: two recessive alleles of the first gene (ee) completely block Gene 2 and cause the coat to become golden regardless of the second gene genotype (much like my beloved late childhood pet pictured, Sunny). If the first gene has at least one dominant allele, then the second gene is allowed to express itself. Possessing a dominant allele (BB or Bb) leads to a black lab; possessing two recessive alleles (bb) makes a choc lab!
Labrador epistasis table
The possible combinations of genotypes for the two above genes and the resultant coat colour (indicated by the box colour).
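The labrador rules described above are simple enough to write down as a small function. This is only a sketch of the two-gene model in the figure; the function name and the way genotypes are written as two-letter strings are my own choices for illustration.

```python
def labrador_coat(e_gene: str, b_gene: str) -> str:
    """Return the coat colour for a given pair of genotypes.

    e_gene: genotype at the masking gene, e.g. "EE", "Ee" or "ee".
    b_gene: genotype at the coat gene, e.g. "BB", "Bb" or "bb".
    """
    # Epistasis: two recessive copies of the first gene mask the second gene entirely.
    if e_gene == "ee":
        return "golden"
    # Otherwise the second gene is expressed, and one dominant B is enough for black.
    if "B" in b_gene:
        return "black"
    return "chocolate"


# Rebuild the genotype table: rows are the first gene, columns the second.
for e_gene in ("EE", "Ee", "ee"):
    print(e_gene, [labrador_coat(e_gene, b_gene) for b_gene in ("BB", "Bb", "bb")])
```

Notice that the second gene is never even looked at when the first gene is ee: that ordering of the checks is the epistasis.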

One gene for many traits: pleiotropy and differential gene expression

In contrast to polygenic traits, changes in a single gene can also potentially alter multiple phenotypic traits simultaneously. This is referred to as ‘pleiotropy’ and can happen if a gene has multiple different functions within an organism; one particular protein might be a component of several different systems depending on where it is found or how it is arranged. A clear example of pleiotropy is in albino animals: the most common form of albinism is the result of possessing two recessive alleles of a single gene (TYR). The result of this is the absence of the enzyme tyrosinase in the organism, a critical component in the production of melanin. The flow-on phenotypic effects from the recessive gene most obviously cause a lack of pigmentation of the skin (whitening) and eyes (which appear pink), but also other physiological changes such as light sensitivity or total blindness (due to changes in the iris). Albinism has even been attributed to behavioural changes in wild field mice.

Albinism pleiotropy
A very simplified diagram of how one genotype (the albino version of the TYR gene) can lead to a large number of phenotypic changes via pleiotropy (although many are naturally physiologically connected).

Because pleiotropic genes code for several different phenotypic traits, natural selection can be a little more complicated. If some resultant traits are selected against, but others are selected for, it can be difficult for evolution to ‘resolve’ the balance between the two. The overall fitness of the gene is thus dependent on the balance of positive and negative fitness of the different traits, which will determine whether the gene is positively or negatively selected (much like a cost-benefit scenario). Alternatively, some traits which are selectively neutral (i.e. don’t directly provide fitness benefits) may be indirectly selected for if another phenotype of the same underlying gene is selected for.

Multiple phenotypes from a single ‘gene’ can also arise by alternative splicing: when a gene is transcribed from the DNA sequence into RNA (on the way to making the protein), the non-coding intron sections within the gene are removed. However, exactly which sections are removed and how the different coding exons are arranged in the final sequence can give rise to multiple different protein structures, each with potentially different functions. Thus, a single overarching gene can lead to many different functional proteins. The role of alternative splicing in adaptation and evolution is a rarely explored area of research and its importance is relatively unknown.

Non-genes for traits: epigenetics

This gets more complicated if we consider ‘non-genetic’ aspects underlying the phenotype in what we call ‘epigenetics’. The phrase literally translates as ‘on top of genes’ and refers to chemical attachments to the DNA which control the expression of genes by allowing or resisting the transcription process. Epigenetics is a relatively new area of research, although studies have started to delve into the role of epigenetic changes in facilitating adaptation and evolution. Future research into the relationship between epigenetic changes and adaptive potential might provide more detailed insight into how adaptation occurs in the wild (and might provide a mechanism for adaptation for species with low genetic diversity)!

 

The different interactions between genotypes, phenotypes and fitness, as well as their complex potential outcomes, inevitably complicate any study of evolution. However, these are important aspects of the adaptation process and to discard them as irrelevant will no doubt reduce our ability to examine and determine evolutionary processes in the wild.

Evolution and the space-time continuum

Evolution travelling in time

As I’ve mentioned a few times before, evolution is a constant force that changes and flows over time. While sometimes it’s more convenient to think of evolution as a series of rather discrete events (a species pops up here, a population separates here, etc.), it’s really a more continual process. The context and strength of evolutionary forces, such as natural selection, change as species and the environment they inhabit also change. This is important to remember in evolutionary studies because although we might think of more recent and immediate causes of the evolutionary changes we see, they might actually reflect much more historic patterns. For example, extremely low contemporary levels of genetic diversity in cheetahs are likely largely due to a severe reduction in their numbers during the last ice age, ~12 thousand years ago (that’s not to say that modern human issues haven’t also been seriously detrimental to them). Similarly, we can see how the low genetic diversity of a small population colonising a new area can have long-term effects on their genetic variation: this is called the ‘founder effect’. Because of this, we often have to consider the temporal aspect of a species’ evolution.

Founder effect diagram
An example of the founder effect. Each circle represents a single organism; the different colours are an indicator of how much genetic diversity that individual possesses (more colours = more variation). We start with a single population; one (A) or two (B) individuals go on a vacation and decide to stay on a new island. Even after the population has become established and grows over time, it takes a long time for new diversity to arise. This is because of the small original population size and genetic diversity; this is called the founder effect. The more genetic diversity in the settled population (e.g. B vs A), the faster new diversity arises and the weaker the founder effect.
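For a rough sense of the numbers, a standard population genetics expectation is that a population of N diploid individuals keeps a fraction 1 - 1/(2N) of its heterozygosity each generation under drift alone. The sketch below applies that to the founding event; it assumes random mating and no new mutation or migration, and the function name is just something I made up for the example.

```python
def retained_heterozygosity(n_founders: int, generations: int = 1) -> float:
    """Expected fraction of heterozygosity kept after drift in a small population.

    Uses the textbook expectation (1 - 1/(2N))^t for N diploid individuals
    over t generations, assuming random mating and no mutation or migration.
    """
    if n_founders < 1:
        raise ValueError("need at least one founder")
    return (1 - 1 / (2 * n_founders)) ** generations


# Scenario A (one founder) vs scenario B (two founders), a single founding generation.
print(retained_heterozygosity(1))  # 0.5
print(retained_heterozygosity(2))  # 0.75
```

Even one extra founder makes a noticeable difference to how much of the original variation makes it onto the island, and the loss compounds if the population stays small for several generations.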

Evolution travelling across space

If the environmental context of species and populations is also important for determining the evolutionary pathways of organisms, then we must also consider the spatial context. Because of this, we also need to look at where evolution is happening in the world; what kinds of geographic, climatic, hydrological or geological patterns are shaping and influencing the evolution of species? These patterns can influence both neutral and adaptive processes by shaping exactly how populations or species exist in nature; how connected they are, how many populations they can sustain, how large those populations can sustainably become, and what kinds of selective pressures those populations are under.

Allopatry diagram
An example of how the environment (in this case, geology) can have both neutral and adaptive effects. Let’s say we start with one big population of cats (N = 9; A), which is distributed over a single large area (the green box). However, a sudden geological event causes a mountain range to uplift, splitting the population in two (B). Because of the reduced population size and the (likely) randomness of which individuals are on each side, we expect some impact of genetic drift. Thus, this is the neutral influence. Over time, these two separated regions might change climatically (C), with one becoming much more arid and dry (right) and the other more wet and shady (left). Because of the difference of the selective environment, the two populations might adapt differently. This is the adaptive influence. 

Evolution along the space-time continuum

Given that the environment also changes over time (and can do so very rapidly, as we’ve seen recently), the interaction of the spatial and temporal aspects of evolution is critical in understanding the true evolutionary history of species. As we know, the selective environment is what determines what is, and isn’t, adaptive (or maladaptive), so we can easily imagine how a change in the environment could push changes in species. Even from a neutral perspective, geography is important to consider since it can directly determine which populations are or aren’t connected, how many populations there are in total or how big populations can sustainably get. It’s always important to consider how evolution travels along the space-time continuum.

Genetics TARDIS
“Postgraduate Student Who” doesn’t quite have the same ring to it, unfortunately.

Phylogeography

The field of evolutionary science most concerned with these two factors and how they influence evolution is known as ‘phylogeography’, which I’ve briefly mentioned in previous posts. In essence, phylogeographers are interested in how the general environment (e.g. geology, hydrology, climate, etc.) has influenced the distribution of genealogical lineages. That’s a bit of a mouthful and seems a bit complicated, but the genealogical part is important; phylogeography has a keen basis in evolutionary genetics theory and analysis, and explicitly uses genetic data to test patterns of historic evolution. Simply testing the association between broad species or populations and their environment, without the genetic background, falls under the umbrella field of ‘biogeography’. Semantics, but important.

Birds phylogeo
Some example phylogeographic models created by Zamudio et al. (2016). For each model, there’s a demonstrated relationship between genealogical lineages (left) and the geographic patterns (right), with the colours of the birds indicating some trait (let’s pretend they’re actually super colourful, as birds are). As you can see, depending on which model you look at, you will see a different evolutionary pattern; for example, one model shows that specific lineages which are geographically isolated from one another each evolved their own colour. This contrasts with another model, in which each colour appears to have evolved once in each region based on the genetic history.

For phylogeography, the genetic history of populations or species gives the more accurate overview of their history; it allows us to test when populations or species became separated, which were most closely related, and whether patterns are similar or different across other taxonomic groups. Predominantly, phylogeography is based on neutral genetic variation, as using adaptive variation can confound the patterns we are testing. Additionally, since neutral variation changes over time in a generally predictable, mathematical format (see this post to see what I mean), we can make testable models of various phylogeographic patterns and see how well our genetic data makes sense under each model. For example, we could make a couple of different models of how many historic populations there were and see which one makes the most sense for our data (with a statistical basis, of course). This wouldn’t work with genes under selection since they (by their nature) wouldn’t fit a standard ‘neutral’ model.

Coalescent
If it looks mathematically complicated, it’s because it is. This is an example of the coalescent from Brito & Edwards, 2008: a method that maps genes back in time (the different lines) to see where the different variants meet at a common ancestor. These genes are nested within the history of the species as a whole (the ‘tubes’), with many different variables accounted for in the model.
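To give a flavour of the ‘predictable, mathematical’ side of neutral variation, here is a minimal sketch of the simplest coalescent expectation. It is not the model in the figure above (which accounts for many more variables): it assumes a single, randomly mating diploid population of constant effective size, where the expected wait while k lineages remain is 4N / (k(k - 1)) generations. The function names are hypothetical.

```python
def expected_coalescent_times(n_samples: int, n_e: float) -> dict[int, float]:
    """Expected waiting time (in generations) while k lineages remain.

    Basic neutral coalescent for a constant-size diploid population with
    effective size n_e: E[T_k] = 4 * n_e / (k * (k - 1)).
    """
    if n_samples < 2:
        raise ValueError("need at least two sampled gene copies")
    return {k: 4 * n_e / (k * (k - 1)) for k in range(n_samples, 1, -1)}


def expected_tmrca(n_samples: int, n_e: float) -> float:
    """Expected time back to the common ancestor of all sampled gene copies."""
    return sum(expected_coalescent_times(n_samples, n_e).values())


# Ten sampled gene copies from a population with an effective size of 1,000.
print(expected_coalescent_times(10, 1000))
print(expected_tmrca(10, 1000))
```

Most of the total time is spent waiting for the last two lineages to meet, which is why the deepest branches of a gene tree carry so much of the signal about historic population size.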

That said, there are plenty of interesting scientific questions within phylogeography that look at exploring the adaptive variation of historic populations or species and how this has influenced their evolution. Although this can’t inherently be built into the same models as the neutral patterns, looking at candidate genes that we think are important for evolution and seeing how their distributions and patterns relate to the overall phylogeographic history of the species is one way of investigating historic adaptive evolution. For example, we might track changes in adaptive genes by seeing which populations have which variants of the gene and referring to our phylogeographic history to see how and when these variants arose. This can help us understand how phylogeographic patterns have influenced the adaptive evolution of different populations or species, or inversely, how adaptive traits might have influenced the geographic distribution of species or populations.

Where did you come from and where will you go?

Phylogeographic studies can tell us a lot about the history of a species, and particularly how that relates to the history of the Earth. All organisms share an intimate relationship with their environment, both over time and space, and keeping this in mind is key for understanding the true evolutionary history of life on Earth.

 

“Who Do You Think You Are?”: studying the evolutionary history of species

The constancy of evolution

Evolution is a constant, endless force which seeks to push and shape species based on the context of their environment: sometimes rapidly, sometimes much more gradually. Although we often think of discrete points of evolution (when one species becomes two, when a particular trait evolves), it is nevertheless a continual force that influences changes in species. These changes are often difficult to ‘unevolve’ and have a certain ‘evolutionary inertia’ to them; because of these factors, it’s often critical to understand how a history of evolution has generated the organisms we see today.

What do I mean when I say evolutionary history? Well, the term is fairly diverse and can relate to the evolution of particular traits or types of traits, or the genetic variation and changes related to these changes. The types of questions and points of interest of evolutionary history can depend on which end of the timescale we look at: recent evolutionary histories, and the genetics related to them, will tell us different information to very ancient evolutionary histories. Let’s hop into our symbolic DeLorean and take a look back in time, shall we?

Labelled_evolhistory
A timeslice of evolutionary history (a pseudo-phylogenetic tree, I guess?), going from more recent history (bottom left) to deeper history (top right). Each region denoted in the tree represents the general area of focus for each of the following blog headings. 1: Recent evolutionary history might look at individual pedigrees, or comparing populations of a single species. 2: Slightly older comparisons might focus on how species have arisen, and the factors that drive this (part of ‘phylogeography’). 3: Deep history might focus on the origin of whole groups of organisms and on the evolution of particular traits like venom or sociality.

Very recent evolutionary history: pedigrees and populations

While we might ordinarily consider ‘evolutionary history’ to refer to events that happened thousands or millions of years ago, it can still be informative to look at history just a few generations ago. This often involves looking at pedigrees, such as in breeding programs, and trying to see how very short term and rapid evolution may have occurred; this can even include investigating how a particular breeding program might accidentally be causing the species to evolve to adapt to captivity! Rarely does this get referred to as true evolutionary history, but it fits on the spectrum, so I’m going to count it. We might also look at how current populations are evolving differently to one another, to try and predict how they’ll evolve into the future (and thus determine which ones are most at risk, which ones have critically important genetic diversity, and the overall survivability of the total species). This is the basis of ‘evolutionarily significant units’ or ESUs which we previously discussed on The G-CAT.

Captivefishcomic
Maybe goldfish evolved 3 second memory to adapt to the sheer boringness of captivity? …I’m joking, of course: the memory thing is a myth and adaptation works over generations, not a lifetime.

A little further back: phylogeography and species

A little further back, we might start to look at how different populations have formed or changed in semi-recent history (usually looking at the effect of human impacts: we’re really good at screwing things up, I’m sorry to say). This can include looking at how populations have (or have not) adapted to new pressures, how stable populations have been over time, or whether new populations are being ‘made’ by recent barriers. At this level of populations and some (or incipient) species, we can find the field of ‘phylogeography’, which involves the study of how historic climate and geography have shaped the evolution of species or caused new species to evolve.

Evolution of salinity
An example of trait-based phylogenetics, looking at the biogeographic patterns and evolution/migration to freshwater in perch-like fishes, by Chen et al. (2014). The phylogeny shows that a group of fishes adapted to freshwater environments (black) from a (likely) saltwater ancestor (white), with euryhaline tolerance evolving two separate times (grey).

One high profile example of phylogeographic studies is the ‘Out of Africa’ hypothesis and debate for the origination of the modern human species. Although there has been no shortage of debate about the origin of modern humans, as well as the fate of our fellow Neanderthals and Denisovans, the ‘Out of Africa’ hypothesis still appears to be the most supported scenario.

human phylogeo
A generalised diagram of the ‘Out of Africa’ hypothesis of human migration, from Oppenheimer, 2012. 

Phylogeography is also a key component for determining and understanding ‘biodiversity hotspots’; that is, regions which have generated high levels of species diversity and contain many endemic species and populations, such as tropical hotspots or remote temperate regions. These are naturally of very high conservation value and contribute a huge amount to Earth’s biodiversity, ecological functions and potential for us to study evolution in action.

Deep, deep history: phylogenetics and the origin of species (groups)

Even further back, we start to delve into the more traditional concept of evolutionary history. We start to look at how species have formed; what factors caused them to become new species, how stable the new species are, and what are the genetic components underlying the change. This subfield of evolution is called ‘phylogenetics’, and relates to understanding how species or groups of species have evolved and are related to one another.

Sometimes, this includes trying to look at how particular diagnostic traits have evolved in a certain group, like venom within snakes or eusocial groups in bees. Phylogenetic methods are even used to try and predict which species of plants might create compounds which are medically valuable (like aspirin)! Similarly, we can try and predict how invasive a pest species may be based on their phylogenetic (how closely related the species are) and physiological traits in order to safeguard against groups of organisms that are likely to run rampant in new environments. It’s important to understand how and why these traits have evolved to get a good understanding of exactly how the diversity of life on Earth came about.

evolution of venom
An example of looking at trait evolution with phylogenetics, focusing on the evolution of venom in snakes, from Reyes-Velasco et al. (2014). The size of the boxes demonstrates the number of species in each group, with the colours reflecting the number of venomous (red) vs. non-venomous (grey) species. The red dot shows the likely origin of venom.

Phylogenetics also allows us to determine which species are the most ‘evolutionarily unique’; all the special little creatures of planet Earth which represent their own unique types of species, such as the tuatara or the platypus. Naturally, understanding exactly how precious and unique these species are suggests we should focus our conservation attention on them, since there’s nothing else in the world that even comes close!

Who cares what happened in the past, right? Well, I do, and you should too! Evolution forms an important component of any conservation management plan, since we obviously want to make sure our species can survive into the future (i.e. adapt to new stressors). Trying to maintain the most ‘evolvable’ groups, particularly within breeding programs, can often be difficult when we have to balance inbreeding depression (not having enough genetic diversity) with outbreeding depression (obscuring good genetic diversity by adding bad genetic diversity into the gene pool). Often, we can best avoid these by identifying which populations are evolutionarily different to one another (see ESUs) and using that as a basis, since outbreeding vs. inbreeding depression can be very difficult to measure. This all goes back to the concept of ‘adaptive potential’ that we’ve discussed a few times before.

In any case, a keen understanding of the evolutionary trajectory of a species is a crucial component for conservation management and to figure out the processes and outcomes of evolution in the real world. Thus, evolutionary history remains a key area of research for both conservation and evolution-related studies.

 

Bigger and better: the evolution of genomic markers

From genetic to genomic markers

As we discussed in last week’s post, different parts of the DNA can be used as genetic markers for analyses relating to conservation, ecology and evolution. We looked at a few different types of markers (allozymes, microsatellites, mitochondrial DNA) and why different markers are good for different things. This week, we’ll focus on the much grander and more modern state of genomics; that is, using DNA markers that are often thousands of genes big!

Genomics vs genetics
If we pretended that the size of the text for each marker was indicative of how big the data is, this figure would probably be about a 1000x under-estimation of genomic datasets. There is not enough room on the blog page to actually capture this.

I briefly mentioned last week that the development of genomics was largely facilitated by what we call ‘next-generation sequencing’, which allows us to easily obtain billions of fragments of DNA and collate them into a useful dataset. Most genomic technologies differ based on how they fragment the DNA for sequencing and how the data is processed.

While the analytical, monetary and time cost of obtaining genomic data has decreased as sequencing technology has improved, we still need to balance these factors together when deciding which method to use. Many methods allow us to put many individual samples together in the same reaction (we tell which sequence belongs to which sample using special ‘barcode sequences’ that code for one specific sample): in this case, we also need to consider how many samples to place together (“multiplex”).
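To make the barcode idea a little more concrete, here is a minimal sketch in Python of how pooled reads might be sorted back to their samples. The barcodes, sample names and reads are all made up for illustration, and real pipelines also allow for sequencing errors within the barcode, which this sketch does not.

```python
from collections import defaultdict

# Hypothetical barcodes and sample names, for illustration only.
BARCODES = {"ACGT": "fish_1", "TGCA": "fish_2"}

def demultiplex(reads, barcodes):
    """Group reads by the barcode at their start, trimming the barcode off."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No known barcode matched, so the read cannot be assigned.
            by_sample["unassigned"].append(read)
    return dict(by_sample)

reads = ["ACGTGGATTC", "TGCAGGATTA", "ACGTGGATTC", "NNNNGGATTC"]
pooled = demultiplex(reads, BARCODES)
print(pooled)
```

The more samples we multiplex, the more barcodes we need and the fewer reads each sample receives, which is the trade-off described above.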

As a broad generalisation, we can separate most genomic sequencing methods into two broad categories: whole genome or reduced-representation. As the name suggests, whole genome sequencing involves collecting the entire genome of the individuals we use, although this is generally very expensive and can only be done with a limited number of samples at a time. If we want to have a much larger dataset, often we’ll use reduced-representation methods: these involve breaking down the whole genome into much smaller fragments and sequencing as many of these as we can to get a broad overview of the genome. Reduced-representation methods are much cheaper and are appropriate for larger sample sizes than whole genome, but naturally lose large amounts of information from the genome.

Genomic sequencing pathway
The (very, very) vague outline of genomic sequencing. First we take all of the DNA of an organism, breaking it into smaller fragments in this case using a restriction enzyme (see below). We then amplify these fragments, making billions of copies of them before piecing them back together to either make the entire genome (left) of a few individuals or patches of the genome (right) for more individuals.

Restriction-site associated DNA (RADseq)

Within the Molecular Ecology Lab, we predominantly use a technology known as “double digest restriction site-associated DNA sequencing”, which is a huge mouthful so we just call it ‘ddRAD’. This sounds incredibly complicated, but (as far as sequencing methods go, anyway) is actually relatively simple. We take the genome of a sample, and then using particular enzymes (called ‘restriction enzymes’) that cut the DNA at specific sites, we break the genome down into small fragments (usually up to 200 bases long, after we filter it). We then attach a specific barcode for that individual, and a few more bits and pieces as part of the sequencing process, and then pool them together. This pool (a “library”) is sent off to a facility to be run through a sequencing machine and produce the data we work with. The ‘dd’ part of ‘ddRAD’ just means that a pair of restriction enzymes is used in this method, instead of just one (it’s a lot cleaner and more efficient).

ddRAD flowchart
A simplified standard ddRAD protocol. 1) We obtain the DNA-containing tissue of the organism we want to study, such as blood, skin or muscle samples. 2) We extract all of the genomic DNA from the tissue sample, making sure we have good quantity and quality (avoiding degradation if possible). 3) We break the genome down into smaller fragments using restriction enzymes, which cut at certain places (orange and green marks on the top line). We then attach special sequences to these fragments, such as the adapter (needed for the sequencer to work) and the barcode for that specific individual organism (the green bar). 4) We amplify the fragments, generating billions of copies of each of them. 5) We send these off to a sequencing facility to read the DNA sequence of these fragments (often outsourced to a private institution). 6) We get back a massive file containing all of the different sequences for all of the organisms in one file. 7) We separate out these sequences into the individual they came from by using their special barcode as an identifier (the coloured codes). 8) We then process this data to make sure it’s of the best quality possible, including removing sequences that we don’t have enough copies of or that have errors. From this, we produce a final dataset, often with one continuous sequence for each individual. If this dataset doesn’t meet our standards for quality or quantity, we go back and try new filtering parameters.
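The ‘double digest’ step can be mimicked on a computer. The sketch below is only an illustration: it assumes the two enzymes are EcoRI (which cuts G^AATTC) and MspI (which cuts C^CGG), a pairing chosen here as an example rather than the one any particular lab uses, and it runs on a tiny made-up ‘genome’. It keeps only the fragments that have a different enzyme’s cut at each end and that fall within a size window.

```python
import re

# Recognition site and cut offset within the site (illustrative enzyme choice).
ENZYMES = {"EcoRI": ("GAATTC", 1), "MspI": ("CCGG", 1)}

def cut_positions(genome, enzymes):
    """Return sorted (position, enzyme_name) pairs where each enzyme cuts."""
    cuts = []
    for name, (site, offset) in enzymes.items():
        for match in re.finditer(site, genome):
            cuts.append((match.start() + offset, name))
    return sorted(cuts)

def double_digest(genome, enzymes, min_len=1, max_len=200):
    """Keep fragments flanked by one cut from each enzyme, within a size range."""
    cuts = cut_positions(genome, enzymes)
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append(genome[start:end])
    return fragments

# A made-up sequence containing two EcoRI sites and two MspI sites.
genome = "TTGAATTCAAAACCGGTTTTCCGGAAGAATTCTT"
fragments = double_digest(genome, ENZYMES)
print(fragments)
```

The piece between the two MspI sites is dropped because both of its ends were cut by the same enzyme, which is part of why using two enzymes gives a cleaner, more repeatable subset of the genome.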

Gene expression and transcriptomics

Sometimes, however, we might not even want to look at the exact DNA sequence. You might remember in an earlier blog post that I mentioned genes can be ‘switched on’ or ‘switched off’ by activator or repressor proteins. Well, because of this, we can have the exact same genes act in different ways depending on the environment. This is most observable in tissue development: although all of the cells of all of your organs have the exact same genome, the control of gene expression changes what genes are active and thus the physiology of the organ. We might also have genes which are only active in an organism under certain conditions, like heat shock proteins under hot conditions.

This can be an important part of evolution as being able to easily change genetic expression may allow an individual to adapt to new environmental pressures much more easily; we call this ‘phenotypic plasticity’. In this case, instead of sequencing the DNA, we might want to look at which genes are expressed, or how much they are expressed, in different conditions or populations: this is called ‘comparative transcriptomics’. So instead of sequencing the DNA, we sequence the RNA of an organism (the middle step of making proteins, so most RNAs are only present if the gene is being expressed).
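As a rough illustration of what comparing expression looks like in practice, the sketch below computes a log2 fold change for each gene between two conditions. The gene names and read counts are invented, and the pseudocount of 1 is just a convenient assumption to avoid dividing by zero. Real transcriptomic analyses also normalise for sequencing depth and use replicates and statistical tests, none of which is shown here.

```python
import math

def log2_fold_change(control, treatment, pseudocount=1.0):
    """log2 of (treatment / control) expression for each gene."""
    return {
        gene: math.log2((treatment[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }

# Hypothetical RNA read counts per gene under cool and hot conditions.
cool = {"hsp70": 7, "actin": 99}
hot = {"hsp70": 127, "actin": 99}

changes = log2_fold_change(cool, hot)
print(changes)
```

A value near zero means the gene is expressed at about the same level in both conditions, while a large positive value (as for the heat shock gene in this toy example) means it is switched on much more strongly in the second condition.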

Processing data

Despite how it must appear, most of the work with genomic datasets actually comes after you get the sequences back. Because of the nature and scale of genomic datasets, rigorous analytical pipelines are needed to manage and filter data from the billions of small sequences into full sequences of high quality. There are many different ways to do this, and it usually involves playing with parameters, so I won’t delve into the details (although some of it is explained in the boxed part of the flowchart figure).
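Just to give a flavour of one such filtering step, here is a minimal sketch that keeps only the sequences seen often enough and drops any containing an uncalled base (‘N’). The minimum depth of 3 is an arbitrary assumption for this example; it is exactly the kind of parameter that gets tuned in a real pipeline.

```python
from collections import Counter

def filter_by_depth(reads, min_depth=3):
    """Keep sequences seen at least min_depth times and free of uncalled bases."""
    counts = Counter(reads)
    return {
        seq: depth
        for seq, depth in counts.items()
        if depth >= min_depth and "N" not in seq
    }

# Made-up reads: one well-supported sequence, one rare one, one with an 'N'.
reads = ["ACGT"] * 5 + ["ACGA"] * 1 + ["ANGT"] * 4
kept = filter_by_depth(reads)
print(kept)
```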

The future of genomics

No doubt as the technology improves, whole genome sequencing will become progressively more feasible for more species, opening up the doors for a new avalanche of data and possibilities. In any case, we’ve come a long way since the first whole genome (for Haemophilus influenzae) in 1995 and the construction of the whole human genome in 2003.

 

Using the ‘blueprint of life’: an introduction to DNA markers

What is a ‘molecular marker’?

As we’ve previously discussed within The G-CAT, information from the DNA of organisms can be used in a variety of ways to study evolution and ecology, inform conservation management, and understand the diversity of life on Earth. We’ve also had a look at the general background of the DNA itself, and some of the different parts of the genome. What we haven’t discussed yet is how we use the DNA sequence in these studies; most importantly, which part of the genome to use.

The genome of most organisms is massive. The size of the genome varies depending on the organism, with one of the smallest recorded genomes belonging to a bacterium (Carsonella ruddii), consisting of 160,000 bases. There is a bit of debate about the largest recorded genome, but one contender (the ‘canopy plant’, Paris japonica) has a genome stretching 150 billion base pairs long! The human genome sits in the middle at around 3 billion bases long. Naturally, it would be incredibly difficult to obtain the sequence of the whole genome of many organisms (particularly 20 – 30 years ago, due to technological limitations in the sequencing process) so we usually pick a specific region of the genome instead. The exact region (or type of region) we use is referred to as a ‘molecular marker’.

How do we choose a good marker?

The marker we pick is incredibly important: this is often based on how much variation we need to observe across groups. For example, if we want to study differences between individuals, say in a pedigree analysis, we need to pick a section of the DNA that will show differences between individuals; it will need to mutate fairly rapidly to be useful. If it mutates too slowly, all individuals will look identical genetically and we won’t have learnt anything new at all.

On the flipside, if we want to study evolution at a larger scale (say, between species, or groups of species) we would need to use a marker that evolves much slower. Using a rapidly mutating section of DNA would effectively give a tonne of ‘white noise’; it’d be impossible to pick what is the genetic difference at the species level (i.e. one species is different to another at that base) vs. at the individual level (i.e. one or many individuals within the species are different). Thus, we tend to use much slower mutating markers for deeper evolutionary history.

Evol spectrum
The spectrum of evolutionary history, with evolutionary splits between major animal groups on the left, to splits between species in the middle, to splits between individuals within a family tree on the right. The effectiveness of a marker for a particular part of the spectrum depends on its mutation rate. The original figure was taken from a landmark paper by Avise (1994), considered one of the forefathers of molecular ecology.

Think of it like comparing cats and dogs. If we wanted to compare different cats to one another (say different breeds) we could use hair length or coat colour as a useful trait. Since some breeds have different coat characteristics, and these don’t vary as much within the breed as across breeds, we can easily tell a long haired cat from a short haired cat. However, if we tried to use coat colour and length to compare cats and dogs we’d be stumped, because both species have lots of variation in these traits within their species. Some cats have coat length more similar to some dogs than to other cats for example; so they’re not good characteristics to separate the two animal species (we might use muzzle shape, or body shape instead). If we substitute each of these traits with a particular marker, then we can see that some markers are better for some comparisons but not good for others.

Allozymes

The most traditional molecular markers are referred to as ‘allozymes’; instead of comparing actual genetic sequences (something that was not readily possible early in the field), variations in the shape of proteins (i.e. the amino acids of the protein, not the code underlying it) were compared between species. Changes in proteins occur very rarely as natural selection tends to push against randomly changing protein structure, since the shape of a protein is critical to its function. Because of this, allozymes were only really effective for studying very broad comparisons (mainly across species or species groups); the exact protein used depends on the study organism. Allozymes are generally considered outdated in the field nowadays.

With the development of technologies that allowed us to actually determine the DNA code of genes, molecular ecology moved into comparing actual sequences across individuals. However, early sequencing technology could generally only accurately determine small sections of DNA at a time, so particular markers capitalising on this were developed. Many of these are still used due to their cost-effectiveness and general ease of analysis.

Microsatellites

For comparing closely related individuals (within a pedigree, or a population), markers called ‘microsatellites’ are widely used. These are small sections of the genome which have repetitive DNA codes; usually, the same two or three base pairs (one ‘motif’) are repeated a number of times in a row (the ‘repeat number’). While the motifs themselves rarely get mutations, the number of repeated motifs mutates very rapidly. This is because the protein that copies DNA is not perfect, and often ‘slips up’, adding or cutting off a repeat from the microsatellite sequence. Thus, differences in the repeat number of microsatellites accumulate pretty quickly, to the point where you can determine the parents of an individual with them.

Microsat_diagram
The general (and simplified) structure of a microsatellite marker. 
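To show what a ‘repeat number’ means in practice, here is a small sketch that counts how many times a motif is repeated back-to-back in a sequence. The sequences and the ‘CA’ motif are made up for illustration. In real studies the repeat number is usually inferred from the length of the amplified fragment rather than read straight off the sequence like this.

```python
import re

def count_repeats(sequence, motif):
    """Number of repeats in the longest uninterrupted run of the motif."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Two hypothetical alleles of the same microsatellite, differing in repeat number.
allele_a = "GGT" + "CA" * 5 + "TTG"
allele_b = "GGT" + "CA" * 7 + "TTG"

print(count_repeats(allele_a, "CA"), count_repeats(allele_b, "CA"))
```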

Microsatellites are often used in comparisons across closely related individuals, such as within pedigrees or within populations. While they are relatively easy to obtain, one drawback is that you need to have some understanding of the exact microsatellite you wish to analyse before you start; you need to make a specific ‘primer’ sequence to be able to get the right marker, as some may not be informative in particular species or comparisons. Many researchers choose to use 10-20 different microsatellite markers together in these types of studies, such as in human parentage analyses.

Cats_parentage
Microsatellites are useful for parentage analysis. Our previous guest contestants are here to discuss ‘Who is the father?!’ in Maury-like fashion. The results are in, and using 4 microsatellites (1-4) and looking at the number of repeats in each of those, we can see that contestant 2 is undoubtedly the father! I’ll be honest, I have no idea if this is how Maury works, but I think it would work.
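The logic behind this kind of paternity test can be sketched in a few lines. The genotypes below are invented (they are not the values from the figure), with each microsatellite recorded as a pair of repeat numbers, one inherited from each parent. A candidate is excluded if, at any marker, the child could not have received one allele from the mother and the other from him. Real parentage analyses are more careful than this simple exclusion approach: they typically use likelihoods and allow for mutation and genotyping error.

```python
def compatible_father(child, mother, candidate):
    """True if the candidate could supply the non-maternal allele at every locus."""
    for locus, (a, b) in child.items():
        m, f = mother[locus], candidate[locus]
        if not ((a in m and b in f) or (b in m and a in f)):
            return False
    return True

# Hypothetical repeat numbers at four microsatellite loci.
child = {"ms1": (10, 12), "ms2": (8, 8), "ms3": (15, 17), "ms4": (6, 9)}
mother = {"ms1": (10, 11), "ms2": (8, 9), "ms3": (15, 15), "ms4": (6, 6)}
candidates = {
    "contestant_1": {"ms1": (12, 13), "ms2": (7, 9), "ms3": (17, 18), "ms4": (9, 9)},
    "contestant_2": {"ms1": (12, 12), "ms2": (8, 10), "ms3": (16, 17), "ms4": (9, 10)},
    "contestant_3": {"ms1": (11, 13), "ms2": (8, 8), "ms3": (17, 17), "ms4": (9, 9)},
}

possible_fathers = [
    name for name, genotype in candidates.items()
    if compatible_father(child, mother, genotype)
]
print(possible_fathers)
```

With only a handful of markers, more than one candidate can remain compatible by chance, which is one reason studies tend to use many microsatellites together.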

Mitochondrial DNA

For deeper comparisons, however, microsatellites mutate far too rapidly to be effective. Instead, we can choose to use the DNA of the mitochondria. You may remember the mitochondria as ‘the powerhouse of the cell’; while this is true, it also has a lot of other unique properties. The mitochondria was actually (a very, very, very long time ago) a separate bacteria-like organism which became symbiotically embedded within another cell. Because of this, and despite a couple billion years of evolution since that time, the mitochondria actually has its own genome separate to the ‘host’ (like the standard human genome). The full mitochondrial genome consists of around 37 different genes, most of which don’t code for any proteins involved directly in evolution; as such, natural selection doesn’t affect them as much as other genes. The most commonly used mitochondrial genes are the cytochrome b gene (cytb for short) or the cytochrome c oxidase 1 (CO1) gene.

The mitochondrial genome evolves relatively rapidly (but not nearly as fast as microsatellites) and is found in pretty much every plant and animal on the planet. Because of these traits, it’s often used as a way of diagnosing species through the ‘Barcode of Life’ project (using cytb and CO1). It’s very widely used within species-level studies, to the point where we can even use the relatively consistent mutation rate of the mitochondrial genome to estimate how long ago different species separated in evolution.
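The ‘molecular clock’ idea can be written as a simple calculation: if two species differ at a proportion d of sites, and each lineage accumulates changes at a rate of r per site per million years, then they split roughly d / (2r) million years ago (the 2 is there because both lineages have been changing since the split). The sketch below uses made-up sequences and an assumed rate of 0.01 changes per site per lineage per million years, chosen purely for illustration. It also ignores the possibility of multiple changes at the same site, which real analyses correct for.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ between two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(x != y for x, y in zip(seq_a, seq_b))
    return differences / len(seq_a)

def divergence_time_myr(distance, rate_per_lineage_per_myr):
    """Time since the split in millions of years, assuming a constant rate."""
    return distance / (2 * rate_per_lineage_per_myr)

# Hypothetical aligned mitochondrial sequences from two species.
species_a = "ACGTACGTACGTACGTACGT"
species_b = "ACGTACGAACGTACGTACCT"

distance = p_distance(species_a, species_b)
# The rate below is an assumption for this example, not a measured value.
split_time = divergence_time_myr(distance, rate_per_lineage_per_myr=0.01)
print(distance, split_time)
```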

Cats_barcode
Not entirely how the Barcode of Life works, but close enough, right?

Other markers?

There are plenty of other genetic markers that are used within molecular ecology, with some focusing on only the exons or introns of genes, or other repetitive sequences. However, microsatellites and mitochondrial genes are among the most widely used in evolution and conservation studies.

While these markers have been very useful in building the foundations of molecular ecology as a scientific field, developments in sequencing technology, analytical methods and evolutionary theory have pushed our ability to use DNA to understand evolution and conservation even further. This is particularly true of the development of sequencing machines which can process much larger amounts of DNA. This has pushed genetics into the age of ‘genomics’: while this sounds like a massively technical difference, it’s really just about the difference in the size of the data we can use. Obviously, this has many other benefits for the kinds of questions we can ask about evolution, conservation and ecology.

Genomics has massively expanded in recent years, and the types, quantity and quality of data are diverse. Stay tuned because next week, we’ll start to delve into the modern world of genomics!

Playing around with science

Science in pop culture

For most people, scientific research can seem somewhat distant and detached from the average person (and society generally). However, the distillation of scientific ideas into various forms of media has been done for ages, and is particularly prevalent (although not limited to) within science fiction. It’s not all that uncommon for scientists to describe the origination of their scientific interest to have come from classic sci-fi movies, tv shows, or games. I’m not saying dinosaurs haven’t always been cool, but after seeing them animated and ferocious in Jurassic Park, I have no doubt a new generation of palaeontologists were inspired to enter the field. I’m sure the same must also be at least partially true for archaeology and Indiana Jones. While I can guarantee the actual scientific research is nowhere near as adventurous and high-octane thriller as those movies would depict, their respective popularities renew interest in the science and inspire new students of the disciplines.

Velociraptor
Sure, they’re not perfectly scientifically accurate, but they certainly get the attention of the public. Source: Jurassic Park wiki.

The inclusion of science within pop culture media such as movies, tv shows, music and video games can have profound impacts on the overall perception of science. This influence seems to go either way depending on how the science is presented and perceived: positive outlooks on science can succinctly present scientific matter in a way that is easy to interpret, and thus can generate interest in the fields of science. Contrastingly, negative outlooks on science, or misinterpretations of science, can drastically impact what people understand about scientific theory. For example, despite being a horrendously outdated belief, Lucy proposed that the average human only uses 10% of their brain capacity: achieving 100% brain capacity using a stimulant, the titular character becomes miraculously superhuman. While this concept is clearly outrageously behind the times for anyone who follows psychological sciences, a disturbing number of people apparently still believe this notion. Thus, misrepresentation of scientific theory perpetuates outdated concepts.

10% brain comic
I mean, someone may as well, right?

Don’t get me wrong: I love ridiculous science fiction as much as the next nerd, and I’m certainly not of the expectation that all science-based information needs to be 100% accurate, without fail (after all, the fiction and fantasy has to fit somewhere…). But it’s important to make sure the transition from scientific research to popular media doesn’t lose the important facts along the way.

Evolution’s relationship with pop culture has been a little more complicated than other scientific theories. Sometimes it’s invoked rather loosely to explain supernatural alien monsters (e.g. Xenomorphs; Alien franchise); other times it’s flipped on its head to show a type of de-evolution (Planet of the Apes). Science fiction has long recognised the innovative and seemingly endless possibilities of evolution and the formation of new species. Generally, the audience is fairly familiar with the concept of evolution (at least in principle) and it makes for a useful tool for explaining the myriad of life in science fiction stories.

Evolution in video games?

It probably doesn’t come as a huge surprise to note that I’m a nerd in all aspects of my life, not just my career. For me, this is particularly a love of video games. Rarely, however, do these two forms of nerdism coincide for me: while some games apply science and scientific theory, they are usually biased towards physics and engineering disciplines (looking at you, Portal). As far as my field is concerned, there are a few notable examples (such as Spore) which encapsulate the essence and majesty of evolution, but rarely do they incorporate the ‘genetic’ aspect that I love.

Spore screenshot
There’s nothing quite like making a horrific carnivorous monster and collapsing ecosystems by exterminating all of the wildlife, then taking over the Universe. Hmm…

You can then imagine my utter delight at the discovery of a game that actually incorporates both population genetics and interesting gameplay. The indie survival game, aptly named Niche: A Genetics Survival Game, very literally represents this ‘niche’ for me (and I will not apologise for the pun!). Combining simplified models of population genetics processes such as genetic diversity, inbreeding (and associated inbreeding depression), natural selection, and stochastic events, Niche beautifully incorporates scientific theory (albeit toned down to a layman level) with challenging, yet engaging, gameplay mechanics and adorable art style.

Niche screenshot
Niche: A Genetics Survival Game epitomises the intersection of evolutionary theory and pop culture.

As one might expect from the title, Niche is at heart a survival game: the aim is to have your very own population of animals (dubbed ‘Nichelings’) survive the stresses of the world, through balancing population size, gene pools, resources (such as food, nests, space) and fighting off predators. Over time, the genetics component drives the evolution of your Nichelings, pushing them to be better at certain tasks depending on the traits selected for: the ultimate aim of the game is to create the perfectly adapted species that can colonise all of the land masses randomly generated.

Niche screenshot DNA
The user interface of Niche. A: The ranking of the selected Nicheling, moving from alpha, to beta, to gamma. This determines the order the Nichelings eat in (gammas get the short end of the stick). B: The traits of the selected Nicheling. In order, these are the physical traits (i.e. the strength, speed and abilities of the animal), the genetic sequence (genotype) of the animal (expanded in C), the user-chosen mutations for that Nicheling and the pedigree of Nichelings. C: The expanded DNA sequence of the selected Nicheling, showing the paternal and maternal variants (alleles) of all the possible genes. Highlighted traits are the expressed trait (dominant) whilst the faded ones indicate recessive carrier genes that aren’t expressed. D: Collected food, one of the most important resources in the game. E: Nest material, required to build nests and produce offspring. F: The different senses (sight, smell, hearing) which can be toggled to give different viewpoints of the surrounding environments (with different benefits and weaknesses).

Niche requires cunning strategy, good foresight and planning, and sometimes a little luck. Although I’m decidedly not very good at Niche yet (I think my rates of extinction would mirror the real world a little too much for my liking…), the chance to involve my scientific background into my favourite hobby is a somewhat magical experience.

Niche screenshot extinct
Oh god, I hope this isn’t a premonition for my career!

You might wonder why I care so much about a video game. While the game is in and of itself an interesting concept, to me it exemplifies one way we can make science an enjoyable and digestible concept for non-scientists. It’s possible that Niche could open the door of population-level genetics and evolution to a new audience, and potentially inspire the next generation of scientists in the field. Although that might be an extraordinarily long shot, it is my hope that the curiosity, mystery and creativity of scientific research is at least partially represented in media such as gaming to help integrate science and society.

Using video games for science?!

Both science and society can benefit from the (accurate) representation of science in pop culture, not just through fostering a connection between scientific theory and the recreational hobbies of people. On rare occasions, pop culture can even be used as a surrogate medium for testing scientific theories and hypotheses in a specific environment: for example, World of Warcraft has unwittingly contributed to scientific progress. As part of a particular boss battle, characters could become infected with a particular disease (called “Corrupted Blood”), which would have significant effects on players but only for a few seconds. While this was supposed to be removed after leaving the area of the fight, a bug in the game caused it to stay on animal pets that were afflicted, and it thus became a viral phenomenon when it started to spread into the wider world (of Warcraft). The epidemic wiped out swathes of lower level players and caused significant social repercussions in the World of Warcraft community as players adjusted their behaviour to avoid or prevent transmission of the deadly disease.

This unique circumstance allowed a group of scientists to use it as a simulation of a real viral outbreak, as the spread of the disease was directly related to the social behaviour and interactivity of players within the game. The “Corrupted Blood” incident so enthralled scientists that multiple papers were published discussing the feasibility of using virtual gaming worlds to simulate human reactions to epidemic outbreaks and viral transmissions on an unparalleled scale. Comparisons were drawn between the method of transmission and behavioural responses in the game and those seen in real-world events such as the avian flu epidemic.

Corrupted blood event
And you thought Bird Flu was bad, at least they couldn’t teleport! Source: GameRant.

This isn’t even the only example of World of Warcraft informing research, with others using it to model economic theories through a free market auction system. While these may seem extraordinarily strange (to scientists and non-scientists alike), these examples demonstrate how popular media such as gaming can be an important interactive front between science and society.