You’ve probably been exposed to one news headline or another in the recent past (let’s say the last 5 years) that reads something like “SCIENTISTS DISCOVER GENES THAT CAUSE (X).” X, of course, varies massively based on the study itself (and sometimes the bastardisation of said study by media): it can include medical conditions, such as cancer, autism or congenital diseases; behavioural traits, such as sexual preferences; or broad physical traits, such as the classic problem of the heritability of height. Unsurprisingly, you may think that trying to find the genes responsible for some traits should be either a) super easy, or b) super hard, depending on your own philosophical preference or the trait in question. So how do these studies come about, anyway?
We’ve spent some time before discussing the nature of the term ‘species’ and what it means in reality. Of course, answers to questions in biology are always more complicated than we wish they might be, and despite how commonly the word ‘species’ is used, the underlying definition is convoluted and variable.
It shouldn’t come as a surprise to anyone with a basic understanding of evolution that it is a temporal (and also spatial) concept. Time is a fundamental aspect of the process of evolution by natural selection, and without it evolution wouldn’t exist. But time is also a fickle thing, and although it remains constant (let’s not delve into that issue here) not all things experience it in the same way.
Contrastingly, sometimes we might also use genetic information to do the exact opposite. While so many species on Earth are at risk (or have already passed over the precipice) of extinction, some have gone rogue with our intervention. These are, of course, invasive species: pests that have been introduced into new environments and, by their prolific nature, start to throw out the balance of the ecosystem. Australians will be familiar with no shortage of relevant invasive species, the most notable of which is the cane toad, Rhinella marina. However, there is a plethora of invasive species which range from the notably prolific (such as the cane toad) to the seemingly mundane (such as the blackbird): so how can we possibly deal with the number and prevalence of pests?
Tools for invasive species management
There are a number of tools at our disposal for dealing with invasive species. These range from chemical controls (like pesticides), to biological controls and more recently to targeted genetic methods. Let’s take a quick foray into some of these different methods and their applications to pest control.
The potential secondary impacts of biological controls, and the degree of unpredictability in how they will respond to a new environment (and how native species will also respond to their introduction), have led conservationists to develop new, more specific techniques. Similarly, viral and bacterial-based controls have had limited success (although they are still often proposed in conservation management, such as the planned carp herpesvirus release).
The better we understand invasive species and populations from a genetic perspective, the more informed our management efforts can be and the more likely we are to be able to adequately address the problem.
Managing invasive pest species
The impact of human settlement into new environments extends far beyond our direct influences. Particularly in the last few hundred years, human migration has been an effective conduit for the spread of ecologically disastrous species which undermine the health and stability of ecosystems around the globe. As such, it is our responsibility to attempt to address the problems we have caused: new genetic techniques are but one growing avenue by which we might be able to remove these invasive pests.
Beyond the apparent ethical and moral objections to the invasive nature of demanding genetic testing for Indigenous peoples, a crucial question is one of feasibility: even if you decided to genetically test for race, is this possible? It might come as a surprise to non-geneticists that actually, from a genetic perspective, race is not a particularly stable concept.
This is especially difficult for people who have fewer sequenced ancestors or relatives; without a reference for genetic variation, it can be even harder to trace their genetic ancestry. Such is the case for Indigenous Australians, for whom there is a distinct lack of available genetic data (especially compared to European-descended Australians).
The non-genetic components
The genetic non-identifiability of race is but one aspect which contradicts the rationality of genetic race testing. As we discussed in the previous post on The G-CAT, the connection between genetic underpinning and physicality is not always clear or linear. The environment shapes both the expression of genetic variation and aspects such as behaviour, philosophy and culture, which means that more than the genome contributes to a person’s identity. For any given person, how they express and identify themselves is often more strongly associated with their non-genetic traits such as beliefs and culture.
These factors cannot reliably be tested under a genetic framework. While there may be some influence of genes on how a person’s psychology develops, genes alone are unlikely to predict the lifestyle, culture and complete identity of said person. For Indigenous Australians, this has been compounded by the corruption and disruption of their identity through the Stolen Generations. As a result, many Indigenous descendants may not appear (from a genetic point of view) to be purely Indigenous, but their identity and culture as an Indigenous person is valid. To suggest that their genetic ancestry more strongly determines their identity than anything else is not only naïve from a scientific perspective, but nothing short of a horrific simplification and degradation of those seeking to reclaim their identity and culture.
The non-identifiability of genetic race
The science of genetics overwhelmingly suggests that there is no fundamental genetic underpinning of ‘race’ that can be reliably used. Furthermore, the impact of non-genetic factors on determining the more important aspects of personal identity, such as culture, tradition and beliefs, demonstrates that attempts to delineate people into subcategories by genetic identity are unreliable. Instead, genetic research and biological history fully acknowledge and embrace the diversity of the global human population. As it stands, the phrase ‘human race’ might be the most biologically-sound classification of people: we are all the same.
It should come as no surprise to any reader of The G-CAT that I firmly believe “nature versus nurture” is a false dichotomy (and yes, I really do love that phrase). Primarily, this is because the phrase gives the impression of some kind of counteracting balance between intrinsic (i.e. usually genetic) and extrinsic (i.e. usually environmental) factors and how they play a role in behaviour, ecology and evolution. While both are undoubtedly critical for adaptation by natural selection, posing this as a black-and-white split removes the possibility of interactive traits.
Despite how important the underlying genes are for the formation of proteins and definition of physiology, they are not omnipotent in that regard. In fact, many other factors can influence how genetic traits relate to phenotypic traits: we’ve discussed a number of these in minor detail previously. An example includes interactions across different genes: these can be due to physiological traits encoded by the cumulative presence and nature of many loci (as in quantitative trait loci and polygenic adaptation). Alternatively, one gene may translate to multiple different physiological characters if it shows pleiotropy.
From an evolutionary standpoint again, epigenetics can similarly influence the ‘bang for a buck’ of particular genes. Being able to translate a single gene into many different forms, and for this to be linked to environmental conditions, allows organisms to adapt to a variety of new circumstances without the need for specific adaptive genes to be available. Following this logic, epigenetic variation might be critically important for species with naturally (or unnaturally) low genetic diversity to adapt into the future and survive in an ever-changing world. Thus, epigenetic information might paint a more optimistic outlook for the future: although genetic variation is, without a doubt, one of the most fundamental aspects of adaptability, even horrendously genetically depleted populations and species might still be able to be saved with the right epigenetic diversity.
Further to this, we can expand the site-frequency spectrum to compare across populations. Instead of having a simple 1-dimensional frequency distribution, for a pair of populations we can have a grid. This grid specifies how often a particular allele occurs at a certain frequency in Population A and at a different frequency in Population B. This can also be visualised quite easily, albeit as a heatmap instead. We refer to this as the 2-dimensional SFS (2DSFS).
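To make the grid idea a little more concrete, here is a minimal sketch of how a 2DSFS could be tallied from per-SNP allele counts. The function name, the toy counts and the sample sizes are all made up for illustration; this is not how any particular program builds it.

```python
def joint_sfs_2d(counts_a, counts_b, n_a, n_b):
    """Tally a 2D SFS from per-SNP allele counts in two populations.

    counts_a[i] and counts_b[i] are the number of copies of the same allele
    at SNP i in Population A and Population B. n_a and n_b are the number of
    sampled allele copies in each population (2N for N diploid individuals).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both populations must be scored at the same SNPs")
    # Rows are counts in Population A, columns are counts in Population B.
    grid = [[0] * (n_b + 1) for _ in range(n_a + 1)]
    for a, b in zip(counts_a, counts_b):
        grid[a][b] += 1
    return grid


# Hypothetical toy data: 5 SNPs, 2 diploid individuals (4 allele copies) per population.
counts_a = [1, 1, 2, 0, 4]
counts_b = [0, 0, 3, 1, 4]
grid = joint_sfs_2d(counts_a, counts_b, n_a=4, n_b=4)

for row in grid:
    print(row)
```

Each cell of the printed grid is the number of SNPs with that combination of counts, which is exactly what the heatmap colours would represent.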
The same concept can be expanded to even more populations, although this gets harder to represent visually. Essentially, we end up with a set of different matrices which describe the frequency of certain alleles across all of our populations, merging them together into the joint SFS. For example, a joint SFS of 4 populations would consist of 6 (4 x 4 total comparisons – 4 self-comparisons, then halved to remove duplicate comparisons) 2D SFSs all combined together. To make sense of this, check out the diagrammatic tables below.
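The counting in the paragraph above can be checked with a few lines of code. This is just a sketch with hypothetical population labels: it lists every unordered pair of populations and compares the total against the "k x k, minus self-comparisons, halved" arithmetic.

```python
from itertools import combinations


def pairwise_comparisons(populations):
    """Return every unordered pair of distinct populations."""
    return list(combinations(populations, 2))


populations = ["A", "B", "C", "D"]  # hypothetical population labels
pairs = pairwise_comparisons(populations)

k = len(populations)
# Same arithmetic as in the text: k x k comparisons, minus k self-comparisons, halved.
by_formula = (k * k - k) // 2

print(pairs)
print(len(pairs), by_formula)  # prints "6 6"
```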
The different forms of the SFS
Which alleles we choose to use within our SFS is particularly important. If we don’t have a lot of information about the genomics or evolutionary history of our study species, we might choose to use the minor allele frequency (MAF). Given that SNPs tend to be biallelic, for any given locus we could have Allele A or Allele B. The MAF takes the less frequent of these two within the dataset and uses that in the summary SFS. The other (major) allele isn’t included, since its count is fully determined by the minor allele: in a sample of N diploid individuals (2N allele copies), it is simply 2N minus the minor allele count. An SFS made of the MAF is also referred to as the folded SFS.
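As a rough sketch of the folding idea (assuming biallelic SNPs with no missing data, and with made-up function names and counts), the folded SFS can be tallied by taking whichever allele is rarer at each SNP:

```python
def folded_sfs(allele_counts, n_chromosomes):
    """Tally a folded SFS from counts of one (arbitrary) allele per SNP.

    allele_counts[i] is the number of copies of Allele A at SNP i, out of
    n_chromosomes sampled allele copies (2N for N diploid individuals).
    The minor allele count can never exceed half the sample, so the
    folded SFS only needs bins 0 .. n_chromosomes // 2.
    """
    sfs = [0] * (n_chromosomes // 2 + 1)
    for count in allele_counts:
        # Whichever allele is rarer at this SNP is the minor allele.
        minor = min(count, n_chromosomes - count)
        sfs[minor] += 1
    return sfs


# Hypothetical toy data: 7 SNPs scored in 4 diploid individuals (8 allele copies).
counts = [1, 7, 2, 6, 4, 3, 1]
folded = folded_sfs(counts, n_chromosomes=8)
print(folded)  # prints [0, 3, 2, 1, 1]
```

Note that a SNP where Allele A has 7 copies lands in the same bin as one where it has 1 copy, because in both cases the rarer allele appears once.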
Alternatively, if we know some things about the genetic history of our study species, we might be able to divide Allele A and Allele B into derived or ancestral alleles. Since SNPs often occur as mutations at a single site in the DNA, one allele at the given site is the new mutation (the derived allele) whilst the other is the ‘original’ (the ancestral allele). Typically, we would use the derived allele frequency to construct the SFS, since under coalescent theory we’re trying to simulate that mutation event. An SFS made of the derived alleles only is also referred to as the unfolded SFS.
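Here is a similar sketch for the unfolded SFS, assuming we somehow already know which allele is derived at every SNP (that polarisation step is the hard part and is not shown). It also shows that an unfolded SFS can always be collapsed back into a folded one. All names and counts are hypothetical.

```python
def unfolded_sfs(derived_counts, n_chromosomes):
    """Tally an unfolded SFS: sfs[i] = number of SNPs with i derived copies."""
    sfs = [0] * (n_chromosomes + 1)
    for count in derived_counts:
        sfs[count] += 1
    return sfs


def fold(unfolded):
    """Collapse an unfolded SFS into a folded (minor allele) SFS."""
    n = len(unfolded) - 1
    folded = []
    for i in range(n // 2 + 1):
        if i == n - i:
            # Exactly half the sample: the bin is its own mirror image.
            folded.append(unfolded[i])
        else:
            # i derived copies and n - i derived copies share a minor count of i.
            folded.append(unfolded[i] + unfolded[n - i])
    return folded


# Hypothetical toy data: derived allele counts at 7 SNPs, 8 allele copies sampled.
derived_counts = [1, 7, 2, 6, 4, 3, 1]
unfolded = unfolded_sfs(derived_counts, n_chromosomes=8)
print(unfolded)        # prints [0, 2, 1, 1, 1, 0, 1, 1, 0]
print(fold(unfolded))  # prints [0, 3, 2, 1, 1]
```

Going the other way (folded to unfolded) isn’t possible without extra information, which is why the unfolded SFS requires knowing something about the history of the species.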
Applications of the SFS
How can we use the SFS? Well, it can more or less be used as a summary of genetic variation for many types of coalescent-based analyses. This means we can make inferences of demographic history (see here for more detailed explanation of that) without simulating large and complex genetic sequences, using the SFS instead. For example, by simulating a particular scenario (such as a bottleneck) and comparing the SFS expected under that scenario to our observed SFS, we can estimate the likelihood of that scenario.
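To give a feel for what "comparing observed and expected" can mean, here is a heavily simplified sketch. It treats each SNP as an independent draw from the expected SFS and scores scenarios with a multinomial log-likelihood (dropping the constant term, which doesn’t change the ranking). The neutral expectation used here is the standard result that, for a constant-size population, the expected number of SNPs with i derived copies is proportional to 1/i. The "flat" alternative and the observed counts are invented stand-ins for what a real simulation and dataset would provide, and real software is considerably more involved.

```python
import math


def neutral_expected_sfs(n_chromosomes):
    """Relative expected unfolded SFS under a constant-size neutral model.

    Bin i (for i = 1 .. n - 1 derived copies) is proportional to 1 / i.
    """
    return [1.0 / i for i in range(1, n_chromosomes)]


def sfs_log_likelihood(observed, expected):
    """Multinomial log-likelihood of observed SNP counts given an expected SFS.

    Both lists cover the same frequency bins, and expected values must be
    positive. The multinomial coefficient is omitted because it is identical
    for every scenario being compared.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same bins")
    total = sum(expected)
    return sum(o * math.log(e / total)
               for o, e in zip(observed, expected) if o > 0)


# Hypothetical observed SFS for 6 allele copies (bins for 1..5 derived copies).
observed = [60, 30, 20, 15, 12]

neutral = neutral_expected_sfs(6)
flat = [1.0] * 5  # made-up alternative scenario, purely for contrast

ll_neutral = sfs_log_likelihood(observed, neutral)
ll_flat = sfs_log_likelihood(observed, flat)
print(ll_neutral > ll_flat)  # prints True
```

The scenario whose expected SFS looks most like the observed one gets the highest log-likelihood; in this toy case the observed counts were chosen to follow the 1/i shape, so the neutral model wins.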
The SFS can even be used to detect alleles under natural selection. For strongly selected parts of the genome, alleles should occur at either high (if positively selected) or low (if negatively selected) frequency, with a deficit of more intermediate frequencies.
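One crude way to look for that deficit is to ask what fraction of SNPs sit at intermediate frequencies and compare it to the neutral 1/i expectation. The sketch below does exactly that; the 25% and 75% cut-offs, the function name and the toy counts are arbitrary choices for illustration, not an established test.

```python
def intermediate_fraction(sfs, lower=0.25, upper=0.75):
    """Fraction of segregating SNPs at intermediate derived allele frequency.

    sfs[i] is the (possibly relative) number of SNPs with i derived copies,
    for i = 0 .. n. Bins 0 and n are ignored because they are not variable
    within the sample. The cut-offs are arbitrary illustrative values.
    """
    n = len(sfs) - 1
    segregating = range(1, n)
    total = sum(sfs[i] for i in segregating)
    middle = sum(sfs[i] for i in segregating if lower <= i / n <= upper)
    return middle / total


n = 8  # allele copies sampled (4 diploid individuals)

# Neutral expectation: bin i proportional to 1 / i, with empty end bins.
neutral = [0.0] + [1.0 / i for i in range(1, n)] + [0.0]

# Hypothetical region with SNPs piled up at low and high frequencies.
observed = [0, 40, 5, 2, 1, 2, 5, 30, 0]

print(intermediate_fraction(neutral))
print(intermediate_fraction(observed))
```

In this made-up example the observed region has a much smaller share of intermediate-frequency SNPs than the neutral expectation, which is the kind of pattern described above.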
Adding to the analytical toolbox
The SFS is just one of many tools we can use to investigate the demographic history of populations and species. Using a combination of genomic technologies, coalescent theory and more robust analytical methods, the SFS appears to be poised to tackle more nuanced and complex questions of the evolutionary history of life on Earth.
This is based on the idea that for genes that are not related to traits under selection (either positively or negatively), new mutations should be acquired and lost under predominantly random patterns. Although this accumulation of mutations is influenced to some degree by alternate factors such as population size, the overall average of a genome should give a picture that largely discounts natural selection. But is this true? Is the genome truly neutral if averaged?
First, let’s take a look at what we mean by neutral or not. For genes that are not under selection, alleles should be maintained at approximately balanced frequencies and all non-adaptive genes across the genome should have relatively similar distribution of frequencies. While natural selection is one obvious way allele frequencies can be altered (either favourably or detrimentally), other factors can play a role.
The extent of this linkage effect depends on a number of other factors such as ploidy (the number of copies of a chromosome a species has), the size of the population and the strength of selection around the central locus. The presence of linkage (often measured as linkage disequilibrium, or LD) and its impact on the distribution of genetic diversity has been well documented within evolutionary and ecological genetic literature. The more pressing question is one of extent: how much of the genome has been impacted by linkage? Is any of the genome unaffected by the process?
Although I avoid having a strong stance here (if you’re an evolutionary geneticist yourself, I will allow you to draw your own conclusions), it is my belief that the model of neutral theory – and the methods that rely upon it – is still fundamental to our understanding of evolution. Although it may present itself as a more conservative way to identify adaptation within the genome, and cannot account for the effect of the above processes, neutral theory undoubtedly presents itself as a direct and well-implemented strategy to understand adaptation and demography.