It shouldn’t come as a surprise to anyone with a basic understanding of evolution that it is a temporal (and also spatial) concept. Time is a fundamental aspect of evolution by natural selection, and without it evolution wouldn’t exist. But time is also a fickle thing: although it remains constant (let’s not delve into that issue here), not all things experience it in the same way.
Contrastingly, we might sometimes use genetic information to do the exact opposite. While so many species on Earth are at risk of extinction (or have already passed over the precipice), some have gone rogue with our intervention. These are, of course, invasive species: pests that have been introduced into new environments and, by their prolific nature, start to throw off the balance of the ecosystem. Australians will be familiar with no shortage of invasive species, the most notable of which is the cane toad, Rhinella marina. However, there is a plethora of invasive species, ranging from the notably prolific (such as the cane toad) to the seemingly mundane (such as the blackbird): so how can we possibly deal with the number and propensity of pests?
Tools for invasive species management
There are a number of tools at our disposal for dealing with invasive species. These range from chemical controls (like pesticides) to biological controls and, more recently, targeted genetic methods. Let’s take a quick foray into some of these methods and their applications to pest control.
The potential secondary impacts of biological controls, and the unpredictability of how they will respond to a new environment (and how native species will respond to their introduction), have led conservationists to develop new, more specific techniques. Similarly, viral and bacterial controls have had limited success, although they are still often proposed in conservation management (such as the planned carp herpesvirus release).
The better we understand invasive species and populations from a genetic perspective, the more informed our management efforts can be and the more likely we are to be able to adequately address the problem.
Managing invasive pest species
The impact of human settlement into new environments extends far beyond our direct influence. Since our arrival, particularly in the last few hundred years, human migration has been an effective conduit for the spread of ecologically disastrous species, which undermine the health and stability of ecosystems around the globe. As such, it is our responsibility to Earth to attempt to address the problems we have caused. New genetic techniques are but one growing avenue by which we might be able to remove these invasive pests.
Beyond the obvious ethical and moral objections to the invasiveness of demanding genetic testing for Indigenous peoples, a crucial question is one of feasibility: even if you decided to genetically test for race, is this possible? It might come as a surprise to non-geneticists that, from a genetic perspective, race is not a particularly stable concept.
This is even more difficult for people who have fewer sequenced ancestors or relatives. Without a reference for genetic variation, it can be much harder to trace their genetic ancestry. Such is the case for Indigenous Australians, for whom there is a distinct lack of available genetic data (especially compared to European-descended Australians).
The non-genetic components
The genetic non-identifiability of race is but one aspect that undermines the rationale for genetic race testing. As we discussed in the previous post on The G-CAT, the connection between genetic underpinning and physicality is not always clear or linear. The environment shapes both the expression of genetic variation and aspects such as behaviour, philosophy and culture, so more than the genome contributes to a person’s identity. For any given person, how they express and identify themselves is often more strongly associated with non-genetic traits such as beliefs and culture.
These factors cannot reliably be tested under a genetic framework. While genes may have some influence on how a person’s psychology develops, they are unlikely to predict the lifestyle, culture and complete identity of that person. For Indigenous Australians, this has been compounded by the corruption and disruption of their identity through the Stolen Generations. As a result, many Indigenous descendants may not appear (from a genetic point of view) to be purely Indigenous, but their identity and culture as Indigenous people are valid. To suggest that their genetic ancestry determines their identity more strongly than anything else is not only naïve from a scientific perspective, but nothing short of a horrific simplification and degradation of those seeking to reclaim their identity and culture.
The non-identifiability of genetic race
The science of genetics overwhelmingly suggests that there is no fundamental genetic underpinning of ‘race’ that can be reliably used. Furthermore, non-genetic factors shape the more important aspects of personal identity, such as culture, tradition and beliefs. This demonstrates that attempts to delineate people into subcategories by genetic identity are unreliable. Instead, genetic research and biological history fully acknowledge and embrace the diversity of the global human population. As it stands, the phrase ‘human race’ might be the most biologically sound classification of people: we are all the same.
It should come as no surprise to any reader of The G-CAT that I firmly reject the false dichotomy (and yes, I really do love that phrase) of “nature versus nurture.” Primarily, this is because the phrase gives the impression of a counteracting balance between intrinsic (i.e. usually genetic) and extrinsic (i.e. usually environmental) factors in how they shape behaviour, ecology and evolution. While both are undoubtedly critical for adaptation by natural selection, posing them as a black-and-white split ignores the possibility of interactive traits.
Despite how important the underlying genes are for forming proteins and defining physiology, they are not omnipotent in that regard. In fact, many other factors can influence how genetic traits relate to phenotypic traits, and we’ve briefly discussed a number of these before. One example is interaction across different genes, where a physiological trait is encoded by the cumulative presence and nature of many loci (as in quantitative trait loci and polygenic adaptation). Alternatively, one gene may translate to multiple different physiological characters if it shows pleiotropy.
From an evolutionary standpoint again, epigenetics can similarly influence the ‘bang for a buck’ of particular genes. Translating a single gene into many different forms, and linking this to environmental conditions, allows organisms to adapt to a variety of new circumstances without needing specific adaptive genes to be available. Following this logic, epigenetic variation might be critically important for species with naturally (or unnaturally) low genetic diversity to adapt and survive in an ever-changing world. Thus, epigenetics might paint a more optimistic picture of the future. Genetic variation is, without a doubt, one of the most fundamental aspects of adaptability, but even horrendously genetically depleted populations and species might still be saved with the right epigenetic diversity.
Note: For some clear, interesting presentations on the topic of de-extinction, and where some of the information for this post comes from, check out this list of TED talks.
The current conservation crisis
The stark reality of conservation in the modern era epitomises the crisis discipline that so often is used to describe it: species are disappearing at an unprecedented rate, and despite our best efforts it appears that they will continue to do so. The magnitude and complexity of our impacts on the environment effectively decimate entire ecosystems (and indeed, the entire biosphere). It is thus our responsibility as ‘custodians of the planet’ (although if I had a choice, I would have sacked us as CEOs of this whole business) to attempt to prevent further extinction of our planet’s biodiversity.
There’s one catch (well, a few really) with genetic rescue: one must have other populations to ‘outbreed’ with in order to add genetic variation to the captive population. But what happens if we’re too late? What if there are no other populations to supplement with, or those other populations are also too genetically depauperate to use for genetic rescue?
Believe it or not, sometimes it’s not too late to save species, even after they have gone extinct. Which brings us from this (lengthy) introduction to this week’s topic: de-extinction. Yes, we’re literally (okay, maybe not) going to raise the dead.
Backbreeding: resurrection by hybridisation
You might wonder how (or even if!) this is possible. To be frank, it’s extraordinarily difficult. However, it has been done to a degree before, under very specific circumstances. One approach is to breed a species back into existence: we sometimes refer to this as ‘backbreeding’.
This practice only really applies in a few select scenarios. One requirement for backbreeding is that hybridisation between species must have occurred in the past, generally on a substantial scale. This is important because it allows the genetic variation that defines one of those species to live on within the genome of its sister species, even after the original ‘host’ species goes extinct. That might make absolutely zero sense as it stands, so let’s dive into a case study.
One of these species, Chelonoidis elephantopus, also known as the Floreana tortoise after its home island, went extinct over 150 years ago, likely due to hunting and trade. However, before they all died, some individuals were transported to another island (ironically, likely by mariners) and did the dirty with another species of tortoise: C. becki. Because of this, some of the genetic material of the extinct Floreana tortoise introgressed into the genome of the still-living C. becki. In an effort to restore an iconic species, scientists from a number of institutions attempted what sounds like science fiction: breeding the extinct tortoise back to life.
When you saw the title of this post, you were probably expecting some Jurassic Park-level ‘dinosaurs walking on Earth again’ information. I know I did when I first heard the term de-extinction. Unfortunately, contemporary de-extinction practices are not quite that advanced yet, although there have been some solid attempts. In some experiments, genomic DNA was taken from the nucleus of a dead animal and cloned within the egg of another living member of that species. These experiments have effectively brought an animal back from the dead. This method, however, is currently limited to animals that have died recently, as DNA degrades beyond use over time.
One might expect that as genomic technologies improve, particularly the genome editing enabled by CRISPR/Cas9, we might one day be able to truly resurrect an extinct species. But this raises strongly debated questions about the ethics and morality of de-extinction. If we can bring a species back from the dead, should we? What are the unexpected impacts of its revival? How will we prevent history from repeating itself, with the species simply going extinct again? In a rapidly changing world, how can we account for the differences between the environment when the species was alive and the environment now?
There is no clear, simple answer to many of these questions. We are only scratching the surface of the possibility of de-extinction, and I expect that this debate will only accelerate with the research. One thing remains eternally true, though: it is still the distinct responsibility of humanity to prevent more extinctions in the future. Handling the growing climate change problem and the collapse of ecosystems remains a top priority for conservation science, and without a solution there will be no stable planet on which to de-extinct species.
Further to this, we can expand the site-frequency spectrum to compare across populations. Instead of a simple one-dimensional frequency distribution, a pair of populations gives us a grid. Each cell of this grid records how many alleles occur at a given frequency in Population A and at a given frequency in Population B. This can also be visualised quite easily, albeit as a heatmap instead. We refer to this as the 2-dimensional SFS (2DSFS).
The same concept can be expanded to even more populations, although this gets harder to represent visually. Essentially, we end up with a set of matrices that describe the frequency of certain alleles across all of our populations, and we merge them together into the joint SFS. For example, a joint SFS of 4 populations would consist of 6 2DSFSs combined together. There are 4 x 4 = 16 total comparisons; removing the 4 self-comparisons leaves 12, and halving that removes the duplicate comparisons. To make sense of this, check out the diagrammatic tables below.
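To make the 1D and 2D SFS more concrete, here is a minimal Python sketch. It assumes we already know, for each SNP, how many times one allele appears in each population's sample. The toy counts below are made up for illustration. The sketch simply tallies SNPs into frequency bins, using a list for the 1D SFS and a grid for the 2DSFS. The function names are hypothetical and not taken from any particular program. Note that bins 0 and n (monomorphic within a sample) are kept here, although many analyses drop them.

```python
def sfs_1d(counts, n_chrom):
    """1D SFS: entry i is the number of SNPs where the allele is seen
    i times among n_chrom sampled chromosomes."""
    sfs = [0] * (n_chrom + 1)
    for k in counts:
        sfs[k] += 1
    return sfs


def sfs_2d(counts_a, counts_b, n_a, n_b):
    """2DSFS: cell [i][j] is the number of SNPs where the allele is seen
    i times in Population A and j times in Population B."""
    grid = [[0] * (n_b + 1) for _ in range(n_a + 1)]
    for ka, kb in zip(counts_a, counts_b):
        grid[ka][kb] += 1
    return grid


# Hypothetical allele counts at 5 SNPs, 4 chromosomes (2 diploids) per population
pop_a = [0, 1, 2, 4, 1]
pop_b = [1, 1, 0, 3, 2]

print(sfs_1d(pop_a, 4))  # [1, 2, 1, 0, 1]
for row in sfs_2d(pop_a, pop_b, 4, 4):
    print(row)  # one row per frequency class in Population A
```

Plotting the grid as a heatmap gives the 2DSFS picture described above.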
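The pair count in that example is the number of ways to choose 2 populations from 4. A quick Python check using hypothetical population labels lists the pairs and confirms the arithmetic:

```python
from itertools import combinations
from math import comb

populations = ["A", "B", "C", "D"]  # hypothetical population labels

# Every unordered pair of distinct populations gets its own 2DSFS
pairs = list(combinations(populations, 2))
print(pairs)
print(len(pairs), comb(4, 2), (4 * 4 - 4) // 2)  # 6 6 6
```

In general, n populations give n(n - 1)/2 pairwise 2DSFSs.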
The different forms of the SFS
Which alleles we choose to use within our SFS is particularly important. If we don’t have much information about the genomics or evolutionary history of our study species, we might choose to use the minor allele frequency (MAF). Given that SNPs tend to be biallelic, for any given locus we could have Allele A or Allele B. The MAF takes whichever of these two is less frequent within the dataset and uses that in the summary SFS. The count of the other allele is simply 2N minus the count of the minor allele (for N diploid individuals), so it doesn’t need to be included. An SFS made of the MAF is also referred to as the folded SFS.
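As a minimal sketch of the folding logic, the Python below assumes we have the count of an arbitrarily labelled Allele A at each biallelic SNP. For each SNP it keeps whichever of the two alleles is rarer. The function name and toy counts are hypothetical.

```python
def folded_sfs(counts, n_chrom):
    """Folded (minor allele) SFS.

    counts: copies of an arbitrarily labelled Allele A at each biallelic SNP.
    n_chrom: number of sampled chromosomes (2N for N diploid individuals).
    """
    folded = [0] * (n_chrom // 2 + 1)
    for k in counts:
        # Allele B is present n_chrom - k times; keep the rarer of the two
        minor = min(k, n_chrom - k)
        folded[minor] += 1
    return folded


# Hypothetical: 3 diploid individuals, so 2N = 6 chromosomes
counts_allele_a = [1, 5, 3, 2, 6]
print(folded_sfs(counts_allele_a, 6))  # [1, 2, 1, 1]
```

Because only the minor count is kept, labelling Allele A or Allele B the other way round gives the same folded SFS.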
Alternatively, if we know some things about the genetic history of our study species, we might be able to divide Allele A and Allele B into derived or ancestral alleles. Since SNPs often occur as mutations at a single site in the DNA, one allele at the given site is the new mutation (the derived allele) whilst the other is the ‘original’ (the ancestral allele). Typically, we would use the derived allele frequency to construct the SFS, since under coalescent theory we’re trying to simulate that mutation event. An SFS made of the derived alleles only is also referred to as the unfolded SFS.
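A short Python sketch shows how the two forms relate. It assumes each SNP has already been polarised, so we know how many sampled chromosomes carry the derived allele. The unfolded SFS keeps that information. Folding it (merging bins i and n - i) throws the polarisation away, which is why we can always go from unfolded to folded but not back. The names and counts are hypothetical.

```python
def unfolded_sfs(derived_counts, n_chrom):
    """Unfolded SFS: bin i counts SNPs where the derived allele is seen i times."""
    sfs = [0] * (n_chrom + 1)
    for k in derived_counts:
        sfs[k] += 1
    return sfs


def fold(sfs):
    """Collapse an unfolded SFS into a folded one by merging bins i and n - i."""
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for i, count in enumerate(sfs):
        folded[min(i, n - i)] += count
    return folded


derived = [1, 5, 3, 2, 1]  # hypothetical derived-allele counts, 2N = 6
unf = unfolded_sfs(derived, 6)
print(unf)        # [0, 2, 1, 1, 0, 1, 0]
print(fold(unf))  # [0, 3, 1, 1]
```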
Applications of the SFS
How can we use the SFS? Well, it can more or less be used as a summary of genetic variation for many types of coalescent-based analyses. This means we can make inferences of demographic history (see here for a more detailed explanation of that) using the SFS, instead of simulating large and complex genetic sequences. We compare our observed SFS to the expected SFS under a simulated scenario, such as a bottleneck, and this allows us to estimate the likelihood of that scenario.
The SFS can even be used to detect alleles under natural selection. For strongly selected parts of the genome, alleles should occur at either high (if positively selected) or low (if negatively selected) frequency, with a deficit of intermediate frequencies.
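Real inference programs simulate or compute the expected SFS under each demographic scenario, but the comparison step itself can be sketched simply. The Python below assumes the simplest scenario, a constant-size neutral population. Under that model, the expected number of SNPs whose derived allele appears in i of n sampled chromosomes is θ/i, where θ is the population-scaled mutation rate. The sketch scores the fit with a Poisson composite log-likelihood, which is one common choice but not the only one. The observed counts are made up. A bottleneck model would simply supply a different expected SFS.

```python
import math


def expected_neutral_sfs(theta, n_chrom):
    """Expected unfolded SFS under a constant-size neutral model:
    E[xi_i] = theta / i for i = 1 .. n_chrom - 1."""
    return [theta / i for i in range(1, n_chrom)]


def poisson_log_likelihood(observed, expected):
    """Composite log-likelihood treating each SFS bin as an independent Poisson count."""
    ll = 0.0
    for x, lam in zip(observed, expected):
        ll += x * math.log(lam) - lam - math.lgamma(x + 1)
    return ll


# Hypothetical observed SFS for bins i = 1..5 (2N = 6 chromosomes)
observed = [20, 11, 6, 5, 4]
for theta in (10.0, 20.0):
    expected = expected_neutral_sfs(theta, 6)
    print(theta, round(poisson_log_likelihood(observed, expected), 2))
```

The scenario (or parameter value) with the higher log-likelihood fits the observed SFS better. Here that is θ = 20.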
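One widely used SFS-based summary of this kind of skew is Tajima's D. The post does not name it; it is brought in here as an illustration. It contrasts an estimator that weights intermediate-frequency variants (θπ) against one based only on the number of segregating sites (Watterson's θW). Negative values indicate an excess of rare variants, and positive values an excess of intermediate ones. Demographic changes such as expansions and bottlenecks shift it too, so it is not a selection test on its own. A minimal sketch using the standard formula:

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS given as counts for bins i = 1 .. n - 1."""
    n = len(sfs) + 1          # number of sampled chromosomes
    S = sum(sfs)              # segregating sites
    # theta_pi: average pairwise differences, computed from the SFS
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / (n * (n - 1) / 2)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = S / a1          # Watterson's estimator
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (theta_pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical SFSs for 2N = 6 chromosomes
print(tajimas_d([30, 2, 1, 1, 1]))    # excess of rare variants: negative
print(tajimas_d([2, 5, 10, 5, 2]))    # excess of intermediate variants: positive
print(tajimas_d([60, 30, 20, 15, 12]))  # matches the neutral theta / i shape: ~0
```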
Adding to the analytical toolbox
The SFS is just one of many tools we can use to investigate the demographic history of populations and species. Using a combination of genomic technologies, coalescent theory and more robust analytical methods, the SFS appears to be poised to tackle more nuanced and complex questions of the evolutionary history of life on Earth.
This is based on the idea that for genes not related to traits under selection (either positively or negatively), new mutations should be acquired and lost under predominantly random patterns. This accumulation of mutations is influenced to some degree by other factors, such as population size. Even so, the genome-wide average should give a picture that largely discounts natural selection. But is this true? Is the genome truly neutral if averaged?
First, let’s take a look at what we mean by neutral or not. For genes that are not under selection, alleles should be maintained at approximately balanced frequencies, and all non-adaptive genes across the genome should have a relatively similar distribution of frequencies. While natural selection is one obvious way allele frequencies can be altered (either favourably or detrimentally), other factors can also play a role.
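To see that allele frequencies change even without selection, here is a small Wright-Fisher simulation in Python. This is a standard idealised model, not one named in the post. Each generation, the next set of gene copies is drawn at random from the current one, which produces genetic drift. An optional selection coefficient tilts the odds towards one allele before that random draw. The function name and parameter values are hypothetical, and the selection scheme is a simple genic (per-copy) one.

```python
import random


def wright_fisher(p0, two_n, generations, s=0.0, rng=None):
    """Frequency trajectory of one allele under a Wright-Fisher model.

    p0: starting frequency; two_n: gene copies per generation;
    s: selection coefficient favouring the allele (0.0 = neutral).
    """
    rng = rng or random.Random()
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection shifts the expected frequency before sampling
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: each of the two_n new copies carries the allele with probability p_sel
        count = sum(rng.random() < p_sel for _ in range(two_n))
        p = count / two_n
        trajectory.append(p)
    return trajectory


rng = random.Random(42)
neutral = [wright_fisher(0.5, 200, 200, s=0.0, rng=rng)[-1] for _ in range(20)]
selected = [wright_fisher(0.5, 200, 200, s=0.1, rng=rng)[-1] for _ in range(20)]
print("neutral final frequencies:", neutral)
print("selected mean final frequency:", sum(selected) / len(selected))
```

The neutral replicates wander apart purely by chance, while the selected allele is driven towards fixation.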
The extent of this linkage effect depends on a number of other factors, such as ploidy (the number of copies of each chromosome a species has), the size of the population and the strength of selection on the central locus. Linkage, and the non-random association of alleles at nearby loci that it produces (linkage disequilibrium, or LD), has been well documented within the evolutionary and ecological genetic literature. The more pressing question is one of extent: how much of the genome has been impacted by linkage? Is any of the genome unaffected by the process?
Although I’ll avoid taking a strong stance here (if you’re an evolutionary geneticist yourself, I will allow you to draw your own conclusions), it is my belief that neutral theory, and the methods that rely upon it, are still fundamental to our understanding of evolution. It may be a more conservative way to identify adaptation within the genome, and it cannot account for the effects of the above processes. Even so, neutral theory undoubtedly provides a direct and well-implemented strategy to understand adaptation and demography.
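LD between two loci is commonly summarised by D (observed haplotype frequency minus the frequency expected if the loci were independent) and its standardised form r². The minimal Python sketch below assumes phased two-locus haplotypes with alleles coded 0/1. The function name and example data are hypothetical.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r^2) between two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2), alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b                      # departure from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d**2 / denom if denom > 0 else 0.0
    return d, r2


linked = [(1, 1), (0, 0), (1, 1), (0, 0)]    # alleles always travel together
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]  # every combination equally common
print(linkage_disequilibrium(linked))    # (0.25, 1.0)
print(linkage_disequilibrium(unlinked))  # (0.0, 0.0)
```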
A recurring analytical method, both within The G-CAT and the broader ecological genetic literature, is based on coalescent theory. This rests on the mathematical notion that mutations within genes (leading to new alleles) can be traced backwards in time to the point where the mutation initially occurred. Because this is retrospective, these moments are not described as ‘divergence’ events (as would be typical for phylogenetics). Instead, they appear as moments where lineages come back together, i.e. coalesce.
From a mathematical perspective, the coalescent model is actually (relatively) simple. Suppose we sampled a single gene from two different individuals (for simplicity’s sake, we’ll say they are haploid and only have one copy per gene). We can then statistically measure the probability of these alleles merging back in time (coalescing) at any given generation. This is equivalent to the probability that the two samples share an ancestor in that generation (think of a much, much shorter version of sharing an evolutionary ancestor with a chimpanzee).
Normally, if we were trying to pick the parents of our two samples, the number of potential parents would be the size of the ancestral population (since any individual in the previous generation has an equal probability of being their parent). But from a genetic perspective, this is based on the genetic (effective) population size (Ne), multiplied by 2 because each individual carries two copies per gene (one paternal and one maternal). Therefore, the number of potential parental gene copies is 2Ne, and the probability that our two samples descend from the same copy in any given generation is 1/(2Ne).
Although this might seem mathematically complicated, the coalescent model gives us a scenario of how we would expect different mutations to coalesce back in time if those idealised assumptions hold. However, biology is rarely convenient, and it’s unlikely that our study populations follow these patterns perfectly. By studying how our empirical data deviate from these expectations, though, we can infer some interesting things about the history of populations and species.
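Because the chance of coalescing each generation is 1/(2Ne), the number of generations until two gene copies coalesce follows a geometric distribution. The probability of coalescing in exactly generation t is (1 - 1/(2Ne))^(t - 1) × 1/(2Ne), with an average of 2Ne generations. The minimal Python simulation below checks this under those idealised assumptions. The function name and Ne value are hypothetical.

```python
import random


def coalescence_time(ne, rng):
    """Generations back until two gene copies coalesce in a diploid
    population of effective size ne.

    Each generation back, the two copies descend from the same parental
    copy with probability 1 / (2 * ne)."""
    p = 1 / (2 * ne)
    t = 1
    while rng.random() >= p:
        t += 1
    return t


rng = random.Random(1)
ne = 100
times = [coalescence_time(ne, rng) for _ in range(5000)]
print(sum(times) / len(times))  # should be close to 2 * Ne = 200
```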
This makes sense from a theoretical perspective as well, since strong genetic bottlenecks mean that most alleles are lost. Thus, the alleles that we do have are much more likely to coalesce shortly after the bottleneck, with very few alleles that coalesce before the bottleneck event. These alleles are the ones that have managed to survive the purge of the bottleneck, and they are often few compared to the overarching patterns across the genome.
In a similar vein, the coalescent can also be used to estimate how long ago two contemporary populations diverged. Similar to gene flow, this is often included as an additional parameter in the coalescent model, expressed as a number of generations ago. To convert this to a meaningful time estimate (e.g. thousands or millions of years ago), we need a mutation rate (the number of mutations per base pair per generation). We also need a generation time for the study species (how many years apart different generations are: for humans, we would typically say ~20-30 years).
While each of these individual concepts may seem relatively simple (depending on how well you handle maths!), one critical issue is the interactive nature of the different factors. Gene flow, divergence time and population size changes will all simultaneously impact the distribution and frequency of alleles, and thus the coalescent method. Because of this, we often use complex programs that apply the coalescent while testing and balancing, to some extent, the relative contributions of each of these factors. The coalescent is a complex beast, but continued refinements to the methodology and to the programs that use it will keep improving our ability to infer evolutionary history.
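We can extend the earlier two-lineage idea to a population whose size changes through time. The Python sketch below uses a made-up demographic history: Ne = 1000 today, a crash to Ne = 10 between 50 and 100 generations ago, and Ne = 1000 before that. It then counts how many simulated pairs of gene copies coalesce inside the bottleneck window. The history, names and values are all hypothetical, and real analyses track many lineages rather than just two.

```python
import random


def coalescence_time_varying(ne_at, rng, max_gen=10**6):
    """Generations back until two gene copies coalesce when Ne changes over time.

    ne_at(t) returns the effective size t generations in the past."""
    for t in range(1, max_gen + 1):
        if rng.random() < 1 / (2 * ne_at(t)):
            return t
    return max_gen


def bottleneck_history(t):
    # Hypothetical: Ne = 10 between 50 and 100 generations ago, 1000 otherwise
    return 10 if 50 < t <= 100 else 1000


def constant_history(t):
    # Hypothetical comparison: Ne = 1000 throughout
    return 1000


def fraction_in_window(history, reps, rng, start=51, end=100):
    """Share of replicate pairs whose coalescence falls within [start, end]."""
    hits = 0
    for _ in range(reps):
        t = coalescence_time_varying(history, rng)
        hits += start <= t <= end
    return hits / reps


rng = random.Random(7)
frac_bottleneck = fraction_in_window(bottleneck_history, 500, rng)
frac_constant = fraction_in_window(constant_history, 500, rng)
print(frac_bottleneck, frac_constant)
```

With the bottleneck, most pairs coalesce inside the crash window and few survive past it. With a constant size, only a small fraction coalesce in the same window.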
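As a rough sketch of that conversion, assume a coalescent program reports divergence time scaled by the mutation rate (τ = μT, with T in generations). Conventions differ between programs: some scale by θ = 4Neμ or by 2Ne generations instead. Check how your program defines its output before converting. The function name and all numbers below are hypothetical.

```python
def divergence_in_years(tau, mu, generation_time):
    """Convert a mutation-scaled divergence time into generations and years.

    tau: divergence time scaled by mutation rate (tau = mu * T), assumed output;
    mu: mutations per base pair per generation;
    generation_time: years per generation.
    """
    generations = tau / mu
    return generations, generations * generation_time


gens, years = divergence_in_years(tau=0.001, mu=1e-8, generation_time=25)
print(f"{gens:,.0f} generations ~ {years / 1e6:.2f} million years")
# 100,000 generations ~ 2.50 million years
```

If a program already reports divergence in generations, only the final multiplication by generation time is needed.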
One particular distinction we need to make early here is between allele frequency and allele identity. In these analyses, we are often working with the same alleles (i.e. particular variants) across our populations; it’s just that each population may possess these alleles at different frequencies. For example, one population may have an allele (let’s call it Allele A) very rarely, with maybe only 10% of individuals possessing it. In another population, it’s very common and perhaps 80% of individuals have it. This is a different level of differentiation from comparing how different alleles mutate (as in the coalescent) or how these mutations accumulate over time (as in many phylogenetic-based analyses).
Fixed differences are sometimes used as a type of diagnostic trait for species. This means that each ‘species’ has genetic variants that are not shared at all with its closest relative species. These variants have become fixed (for example, through strong selection), so there is no diversity at those loci within each species. Fixed differences are often considered a level above populations that differ by allele frequency only, as these alleles are considered ‘diagnostic’ for each species.
To distinguish between the two, we often use the overall frequency of alleles in a population as a basis for determining how likely it is that two individuals share an allele by random chance. If two individuals share alleles that are relatively rare in the overall population, we expect that this similarity is due to family structure rather than population history. By factoring this into our relatedness estimates, we can get a more accurate picture of how likely two individuals are to be related.
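To see how frequency differences can be quantified, the Python sketch below computes Allele A's frequency in each population from diploid genotypes (0, 1 or 2 copies). It then computes a simple Wright-style FST, (HT - HS)/HT, for a single locus. For simplicity, the example uses allele frequencies of 0.1 and 0.8 rather than the percentages of individuals mentioned above. This uncorrected estimator is chosen for readability; published studies usually use bias-corrected estimators such as Weir and Cockerham's. The names and genotypes are hypothetical.

```python
def allele_frequency(genotypes):
    """Frequency of Allele A from diploid genotypes coded as 0, 1 or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))


def fst_two_pops(p1, p2):
    """Wright-style FST for one biallelic locus from two population frequencies
    (populations weighted equally, no sample-size correction)."""
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)                      # heterozygosity of the pooled populations
    return (ht - hs) / ht if ht > 0 else 0.0


# Hypothetical genotypes for 5 diploid individuals per population
pop1 = [0, 0, 1, 0, 0]  # Allele A rare (frequency 0.1)
pop2 = [2, 1, 2, 2, 1]  # Allele A common (frequency 0.8)
p1, p2 = allele_frequency(pop1), allele_frequency(pop2)
print(p1, p2, round(fst_two_pops(p1, p2), 3))  # 0.1 0.8 0.495
```

The allele is identical in both populations; only its frequency differs, and FST summarises that difference.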
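Given per-locus allele frequencies in two groups, identifying fixed differences amounts to finding loci where one group is at frequency 1 and the other at 0. A minimal Python sketch with hypothetical names and frequencies follows. With finite samples, an apparent fixed difference may simply be a shared allele that was missed during sampling.

```python
def fixed_differences(freqs_a, freqs_b):
    """Indices of loci where two groups are fixed for different alleles.

    freqs_a, freqs_b: frequency of the same allele at each locus in each group."""
    fixed = []
    for i, (pa, pb) in enumerate(zip(freqs_a, freqs_b)):
        if (pa == 1.0 and pb == 0.0) or (pa == 0.0 and pb == 1.0):
            fixed.append(i)
    return fixed


species_1 = [1.0, 0.6, 0.0, 1.0]  # hypothetical allele frequencies
species_2 = [0.0, 0.1, 1.0, 0.9]
print(fixed_differences(species_1, species_2))  # [0, 2]
```

Locus 1 differs only in frequency, and locus 3 is nearly but not completely fixed, so neither counts as diagnostic.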
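One way to build allele frequency into relatedness is to standardise each individual's genotype by the population frequency before comparing. This is similar in spirit to the genomic relationship matrix used by tools such as GCTA. In this approach, sharing a rare allele adds much more to the estimate than sharing a common one. The post does not specify an estimator, so treat this Python sketch as an illustration only. The function name and values are hypothetical.

```python
def grm_relatedness(geno_j, geno_k, freqs):
    """Allele-frequency-weighted relatedness between two diploid individuals.

    geno_j, geno_k: genotypes (0, 1 or 2 copies of an allele) at each locus.
    freqs: population frequency p of that allele at each locus.
    Shared rare alleles contribute more than shared common alleles.
    """
    total, m = 0.0, 0
    for xj, xk, p in zip(geno_j, geno_k, freqs):
        if 0 < p < 1:  # skip monomorphic loci (no information, zero denominator)
            total += (xj - 2 * p) * (xk - 2 * p) / (2 * p * (1 - p))
            m += 1
    return total / m if m else 0.0


# Two individuals, both heterozygous at a single locus
print(round(grm_relatedness([1], [1], [0.05]), 2))  # 8.53: shared allele is rare
print(grm_relatedness([1], [1], [0.5]))             # 0.0: shared allele is common
```

A single locus is only illustrative. In practice, the average is taken over many loci, where under the usual assumptions it sits near 0 for unrelated individuals and roughly 0.5 for first-degree relatives.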
The wild world of allele frequency
Despite appearances, this is just a brief foray into the many applications of allele frequency data in evolution, ecology and conservation studies. There are a plethora of different programs and methods that can utilise this information to address a variety of scientific questions and refine our investigations.