Crossing the Wires: why ‘genetic hardwiring’ is not the whole story

The age-old folly of ‘nature vs. nurture’

It should come as no surprise to any reader of The G-CAT that I firmly reject the false dichotomy (and yes, I really do love that phrase) of “nature versus nurture.” Primarily, this is because the phrase gives the impression of some counteracting balance between intrinsic (i.e. usually genetic) and extrinsic (i.e. usually environmental) factors and the roles they play in behaviour, ecology and evolution. While both are undoubtedly critical for adaptation by natural selection, posing this as a black-and-white split removes the possibility of interactive traits.

We know readily that fitness, the measure by which adaptation or maladaptation can be quantified, is the product of both the adaptive value of a certain trait and the environmental conditions said trait occurs in. A trait that might confer strong fitness in one environment may be very, very unfit in another. A classic example is fur colour in mammals: in a snowy environment, a white coat provides camouflage for predators and prey alike; in a rainforest environment, it’s like wearing one of those fluoro-coloured safety vests construction workers wear.

Genetics and environment interactions figure.jpg
The real Circle of Life. Not only do genes and the environment interact with one another, but genes may interact with other genes and environments may be complex and multi-faceted.

Genetically-encoded traits

In the “nature versus nurture” context, the ‘nature’ traits are often inherently assumed to be genetic. This is because genetic traits are intrinsic as a fundamental aspect of life, inheritable (and thus can be passed on and undergo evolution by natural selection) and define the important physiological traits that provide (or prevent) adaptation. Of course, not all of the genome encodes phenotypic traits, and even fewer regions relate to diagnosable traits relevant for natural selection to act upon. In addition, there is a bit of an assumption that many physiological or behavioural traits are ‘hardwired’: that is, despite any influence of environment, genes will always produce a certain phenotype.

Adaptation from genetic variation.jpg
A very simplified example of adaptation from genetic variation. In this example, we have two different alleles of a single gene (orange and blue). Natural selection favours the blue allele so over time it increases in frequency. The difference between these two alleles is at least one base pair of DNA sequence; this often arises by mutation processes.

Despite how important the underlying genes are for the formation of proteins and the definition of physiology, they are not omnipotent in that regard. In fact, many other factors can influence how genetic traits relate to phenotypic traits: we’ve discussed a number of these in some detail previously. One example is interactions across different genes: physiological traits may be encoded by the cumulative presence and nature of many loci (as in quantitative trait loci and polygenic adaptation). Alternatively, one gene may translate to multiple different physiological characters if it shows pleiotropy.

Differential expression

One non-direct way genetic information can impact on the phenotype of an organism is through something we’ve briefly discussed before known as differential expression. This is based on the notion that different environmental pressures may affect the expression of a gene (that is, how it is transcribed and translated into a protein) in alternative ways. This is a fundamental underpinning of what we call phenotypic plasticity: the concept that despite having the exact same (or very similar) genes and alleles, two clonal individuals can vary in different traits. This is related to the example of genetically-identical twins which are not necessarily physically identical; this could be due to environmental constraints on growth, behaviour or personality.

Brauer DE figure_cropped
An example of differential expression in wild populations of southern pygmy perch, courtesy of Brauer et al. (2017). In this figure, each column represents a single individual fish, with the phylogenetic tree and coloured boxes at the top indicating the different populations. Each row represents a different gene (this is a subset of 50 from a much larger dataset). The colour of each cell indicates whether the expression of that gene is expressed more (red) or less (blue) than average. As you can see, the different populations can clearly be seen within their expression profiles, with certain genes expressing more or less in certain populations.
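As a rough illustration of how a “more or less than average” expression matrix like this can be derived (a minimal, hypothetical sketch in Python, not the actual pipeline from Brauer et al. 2017), each gene’s expression values can simply be centred on that gene’s mean across individuals:

```python
# Hypothetical sketch: centre each gene's expression on its own across-individual mean,
# so positive values (red in heatmaps like the one above) mean higher-than-average
# expression and negative values (blue) mean lower-than-average expression.
import numpy as np

def centred_expression(expr):
    """expr: array of shape (n_genes, n_individuals) of expression values."""
    return expr - expr.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
toy = rng.normal(loc=5.0, scale=1.0, size=(50, 24))  # 50 genes x 24 individuals (toy data)
print(centred_expression(toy)[:2].round(2))
```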

From an evolutionary perspective, the ability to translate a single gene into multiple phenotypic traits has a strong advantage. It allows adaptation to novel environments without waiting for natural selection to favour adaptive mutations (or for new, adaptive alleles to become available from new mutation events). This might be a fundamental trait that determines which species can become invasive pests, for instance: the ability to establish and thrive in environments very different to their native habitat allows introduced species to quickly proliferate and spread. Even for species which we might not consider ‘invasive’ (i.e. they have naturally spread to new environments), phenotypic plasticity might allow them to very rapidly adapt and evolve into new ecological niches and could even underpin the early stages of the speciation process.

Epigenetics

Related to this alternative expression of genes is another relatively recent concept: that of epigenetics. In epigenetics, the expression and function of genes is controlled by chemical additions to the DNA which can make gene expression easier or more difficult, effectively promoting or silencing genes. Generally, the specific chemicals that are attached to the DNA are relatively (but not always) predictable in their effects: for example, the addition of a methyl group to the sequence is generally associated with the repression of the gene underlying it. How and where these epigenetic markers are placed may in turn be affected by environmental conditions, creating a direct conduit between environmental (‘nurture’) and intrinsic genetic (‘nature’) aspects of evolution.

Epigenetic_mechanisms.jpg
A diagram of different epigenetic factors and the mechanisms by which they control gene expression. Source: Wikipedia.

Typically, these epigenetic ‘marks’ (chemical additions to the DNA) are erased and reset during fertilisation: the epigenetic marks on the parental gametes are removed, and new marks are made on the fertilised embryo. However, it has been shown that this removal process is not 100% effective, and in fact some marks are clearly passed down from parent to offspring. This means that these marks are heritable, and could allow them to evolve similarly to full DNA mutations.

The discovery of epigenetic markers and their influence on gene expression has opened up the possibility of understanding heritable traits which don’t appear to be clearly determined by genetics alone. For example, research into epigenetics suggests that heritable major depressive disorder (MDD) may be controlled by the expression of genes, rather than by specific alleles or genetic variants themselves. This is likely true for a number of traits for which the association with genotype is not entirely clear.

Epigenetic adaptation?

From an evolutionary standpoint again, epigenetics can similarly influence the ‘bang for a buck’ of particular genes. Being able to translate a single gene into many different forms, and for this to be linked to environmental conditions, allows organisms to adapt to a variety of new circumstances without the need for specific adaptive genes to be available. Following this logic, epigenetic variation might be critically important for species with naturally (or unnaturally) low genetic diversity to adapt into the future and survive in an ever-changing world. Thus, epigenetic information might paint a more optimistic outlook for the future: although genetic variation is, without a doubt, one of the most fundamental aspects of adaptability, even horrendously genetically depleted populations and species might still be able to be saved with the right epigenetic diversity.

Epigenetic cats example
A relatively simplified example of adaptation from epigenetic variation. In this example, we have a species of cat; the ‘default’ cat has non-tufted ears and an orange coat. These two traits are controlled by the expression of Genes A and B, respectively: in the top cat, neither gene is expressed. However, when this cat is placed into different environments, the different genes are “switched on” by epigenetic factors (the green markers). In a rainforest environment, the dark foliage makes darker coat colour more adaptive; switching on Gene B allows this to happen. Conversely, in a desert environment switching on Gene A causes the cat to develop tufts on its ears, which makes it more effective at hunting prey hiding in the sands. Note that in both circumstances, the underlying genetic sequence (indicated by the colours in the DNA) is identical: only the expression of those genes change.

 

Epigenetic research, especially from an ecological/evolutionary perspective, is a very new field. Our understanding of how epigenetic factors translate into adaptability, the relative performance of epigenetic vs. genetic diversity in driving adaptability, and how limited heritability plays a role in adaptation is currently limited. As with many avenues of research, further studies in different contexts, experiments and scopes will continue to reveal this exciting new aspect of evolutionary and conservation genetics. In short: watch this space! And remember, ‘nature is nurture’ (and vice versa)!

When “getting it wrong” is ‘right’

The nature of science

Over the course of the (relatively brief) history of this blog, I’ve covered a number of varied topics. Many of these have been challenging to write about – either because they are technically inclined, and thus require significant effort to distill into something sensible and free of jargon; or because they address personal issues related to mental health or artistic expression. But despite the nature of those posts, this week’s blog has proven to be one of the most difficult to write, largely because it demands a level of personal vulnerability, acceptance of personality flaws and a potentially self-deprecating message. Alas, I find myself unable to ignore what I perceive to be the importance of the topic.

It should come as no surprise to any reader, whether scientifically trained or not, that the expectation of scientific research is one of total objectivity, clarity and accuracy. Scientific research that is seen to fall short on these measures is quickly labelled ‘bad science’. Naturally, of course, we aim to maximise the value of our research by addressing these as best as conceivably possible. Therein, however, lies the limitation: we cannot ever truly be totally objective, nor clear, nor accurate with research, and acceptance and discussion of the limitations of research is a vital aspect of any paper.

The imperfections of science

The basic underpinning of this disjunction lies with the people that conduct the science. While the scientific method has been developed and improved over centuries to be as objective, factual and robust as possible, the underlying researchers will always be plagued to some degree by subjectivity. Whether we consciously mean to or not, our prior beliefs, perceptions and history influence the way we conduct or perceive science (hopefully, only to a minor extent).

Inherent biases figure
How the different aspects of ourselves can influence our research. The scientific method directly addresses the more objective aspects (highlighted in green arrows), but other subjective concepts may cause bias. Ideally, however, the objective parts outweigh the subjective ones (indicated by the size of the arrows), a balance aided by the peer-review process.

 

Additionally, one of the drawbacks of being mortal is that we are prone to making mistakes. Biology is never perfect, and the particularly complex tasks and ideas we assign ourselves to research inevitably involve some level of incorrectness. But while that may seem to fundamentally contradict the nature of science, I argue that is in fact not just a reality of scientific research, but also a necessity for progression.

Impostor syndrome

One widely realised manifestation of this disjunction between idealistic science and practical science, and one particularly felt by researchers in training such as post-graduate students, is referred to as ‘impostor syndrome’. This involves the sometimes subversive (and sometimes more overt) feeling of inadequacy when we compare ourselves to a wider crowd. It is the feeling of not belonging in a particular social or professional group due to a lack of experience, talent or other ‘right’ characteristics. This is particularly pervasive in postgraduate students as we inevitably interact with and compare ourselves to those we aspire to be like – postdoctoral researchers, professors, or other more established researchers – who are naturally more experienced in the field. The jarring disjunction in our own capability, often inaccurately assumed to be a proxy for intelligence, leads many to feel incapable or inadequate to be a ‘real’ scientist.

imposter syndrome.jpg
I’d explain impostor syndrome as “feeling like being three kids stacked in a lab coat instead of a ‘real scientist’.”

It cannot be overstated that impostor syndrome is often the result of mental health issues and a high-pressure, demanding academic system, and rarely a rational perception. In many cases, both academic students and the general public see only the best aspects of scientific research, a rose-coloured view of the process. What we don’t see, however, is the series of failures and missteps that have led to even the best of scientific outcomes, and we may assume that they didn’t happen. This is absolutely false.

Analysis paralysis

Another tangible impact of impostor syndrome and self-induced perfectionism is the suppression of progress. By this I mean the typical ‘procrastinating’ behaviour that comes with perfectionism: we often prevent ourselves from moving forward if we perceive that there might be (however minor) issues with our work. Within science, this often involves inordinate amounts of reading and preparation on how to run an analysis without actually running anything. This is what has been called ‘analysis paralysis’, and it disguises inactivity under the pretence that the student is still learning the ropes.

The reality is that trying to predict the multitude of factors and problems one can run into when conducting an analysis is a monolithic task. Some aspects relevant to a particular dataset or analysis are unlikely to be discussed or clearly referenced in the literature, and thus difficult to anticipate. Problem solving is often more effective as a reactive, rather than proactive, measure by allowing researchers to respond to an issue when it arises instead of getting bogged down in the astronomical realm of “things that could possibly go wrong.”

Drawing on personal experience, this has led to literal months of reading and preparing data for running models only to have the first dozen attempts fail to run, or run incorrectly, due to something as trivial as formatting. The lesson learnt is that I should have just tried to run the analysis early, stuffed it all up, and learnt from the mistakes with a little problem solving. No matter how much reading I did, or ever could do, some of these mistakes could never have been explicitly predicted a priori.

analysis error messages collage.jpg
Sometimes it feels like analysis is 90% “why didn’t this work?!” I think that’s realistic, though.

Why failure is conducive to better research

While we should always strive to be as accurate and objective as possible, sometimes this can be counterproductive to our own learning progress. The rabbit holes of “things that could possibly go wrong” run very, very deep and if you fall down them, you’ll surely end up in a bizarre world full of odd distractions, leaps of logic and insanity. Under this circumstance, I suggest allowing yourself to get it wrong: although repeated failures are undoubtedly damaging to the ego and confidence, giving ourselves the opportunity to make mistakes and grow from them ultimately allows us to become more productive and educated than if we avoided them altogether.

Alice in Wonderland analogy
“We’re all mad here.”

Speaking at least from personal anecdote (although my story appears corroborated by other students’ experiences), some level of failure is critical to the learning process and important for scientific development generally. Although cliché, “learning from our mistakes” is inevitably one of the most effective and quickest ways to learn, and allowing ourselves to be imperfect, a little inaccurate or at times foolish is conducive to better science.

Allow yourself to stuff things up. You’ll do it way less in the future if you do.

Pressing Ctrl-Z on Life with De-extinction

Note: For some clear, interesting presentations on the topic of de-extinction, and where some of the information for this post comes from, check out this list of TED talks.

The current conservation crisis

The stark reality of conservation in the modern era epitomises the ‘crisis discipline’ label so often used to describe it: species are disappearing at an unprecedented rate, and despite our best efforts it appears that they will continue to do so. The magnitude and complexity of our impacts on the environment effectively decimate entire ecosystems (and indeed, the entire biosphere). It is thus our responsibility as ‘custodians of the planet’ (although if I had a choice, I would have sacked us as CEOs of this whole business) to attempt to prevent further extinction of our planet’s biodiversity.

Human CEO example
“….shit.”

If you’re even remotely familiar with this blog, then you would have been exposed to a number of different techniques, practices and outcomes of conservation research and its disparate sub-disciplines (e.g. population genetics, community ecology, etc.). Given the limited resources available to conserve an overwhelming number of endangered species, we attempt to prioritise our efforts towards those most in need, although a strong taxonomic bias underpins these priorities.

At least from a genetic perspective, this sometimes involves trying to understand the nature and potential of adaptation from genetic variation (as a predictor of future adaptability). Or using genetic information to inform captive breeding programs, to allow us to boost population numbers with minimal risk of inbreeding depression. Or perhaps allowing us to describe new, unidentified species which require their own set of targeted management recommendations and political legislation.

Genetic rescue

Yet another example of the use of genetics in conservation management, and one that we have previously discussed on The G-CAT, is the concept of ‘genetic rescue’. This involves actively adding new genetic material from other populations into our captive breeding programs to supplement the amount of genetic variation available for future (or even current) adaptation. While there traditionally has been some debate about the risk of outbreeding depression, genetic rescue has been shown to be an effective method for prolonging the survival of at-risk populations.

super-gene-genetic-rescue-e1549973268851.jpg
How my overactive imagination pictures ‘genetic rescue’.

There’s one catch (well, a few really) with genetic rescue: namely, that one must have other populations to ‘outbreed’ with in order to add genetic variation to the captive population. But what happens if we’re too late? What if there are no other populations to supplement with, or those other populations are also too genetically depauperate to use for genetic rescue?

Believe it or not, sometimes it’s not too late to save species, even after they have gone extinct. Which brings us from this (lengthy) introduction to this week’s topic: de-extinction. Yes, we’re literally (okay, maybe not) going to raise the dead.

Necroconservaticon
Your textbook guide to de-extinction. Now banned in 47 countries.

Backbreeding: resurrection by hybridisation

You might wonder how (or even if!) this is possible. And to be frank, it’s extraordinarily difficult. However, it has to a degree been done before, in very specific circumstances. One scenario is based on breeding out a species back into existence: sometimes we refer to this as ‘backbreeding’.

This practice really only applies in a few select scenarios. One requirement for backbreeding to be possible is that hybridisation across species has to have occurred in the past, and generally to a substantial scale. This is important as it allows the genetic variation which defines one of those species to live on within the genome of its sister species even when the original ‘host’ species goes extinct. That might make absolutely zero sense as it stands, so let’s dive into this with a case study.

I’m sure you’ll recognise (at the very least, in name) these handsome fellows below: the Galápagos tortoise. They were a pinnacle in Charles Darwin’s research into the process of evolution by natural selection, and can live for so long that until recently there had been living individuals which would have been able to remember him (assuming, you know, memory loss is not a thing in tortoises. I can’t even remember what I had for dinner two days ago, to be fair). As remarkable as they are, Galápagos tortoises actually comprise 15 different species, which can be primarily determined by the shape of their shells and the islands they inhabit.

Galapagos island and tortoises
A map of the Galápagos archipelago and tortoise species, with extinct species indicated by symbology. Lonesome George was the last known living member of the Pinta Island tortoise, C. abingdonii, for reference. Source: Wikipedia.

One of these species, Chelonoidis elephantopus, also known as the Floreana tortoise after their home island, went extinct over 150 years ago, likely due to hunting and trade. However, before they all died, some individuals were transported to another island (ironically, likely by mariners) and did the dirty with another species of tortoise: C. becki. Because of this, some of the genetic material of the extinct Floreana tortoise introgressed into the genome of the still-living C. becki. In an effort to restore an iconic species, scientists from a number of institutions attempted to do what sounds like science-fiction: breed the extinct tortoise back to life.

By carefully managing and selectively breeding captive individuals, successive generations of the captive population can gradually include more and more of the original extinct C. elephantopus genetic sequence within their genomes. While a 100% resurrection might not be fully possible, by the end of the process individuals with progressively higher proportions of the original Floreana tortoise genome will be born. Although maybe not a perfect replica, this ‘revived’ species is much more likely to serve a similar ecological role to the now-extinct species, and thus contribute to ecosystem stability. To this day, this is one of the closest attempts at reviving a long-dead species.

Is full de-extinction possible?

When you saw the title for this post, you were probably expecting some Jurassic Park level ‘dinosaurs walking on Earth again’ information. I know I did when I first heard the term de-extinction. Unfortunately, contemporary de-extinction practices are not that far advanced just yet, although there have been some solid attempts. Experiments that take the genomic DNA from the nucleus of a dead animal and clone it within the egg of a living member of that species have effectively brought an animal back from the dead. This method, however, is currently limited to animals that have died recently, as the DNA degrades beyond use over time.

The same methods have been attempted for some animals which went extinct relatively recently. Experiments involving the Pyrenean ibex (bucardo), for example, were successful in generating an embryo but not in sustaining a living organism: the cloned bucardo died 10 minutes after birth due to a critical lung condition.

The challenges and ethics of de-extinction

One might expect that as genomic technologies improve, particularly methods facilitated by the genome-editing allowed from CRISPR/Cas-9 development, that we might one day be able to truly resurrect an extinct species. But this leads to very strongly debated topics of ethics and morality of de-extinction. If we can bring a species back from the dead, should we? What are the unexpected impacts of its revival? How will we prevent history from repeating itself, and the species simply going back extinct? In a rapidly changing world, how can we account for the differences in environment between when the species was alive and now?

Deextinction via necromancy figure
The Chaotic Neutral (?) approach to de-extinction.

There is no clear, simple answer to many of these questions. We are only scratching the surface of the possibility of de-extinction, and I expect that this debate will only accelerate with the research. One thing remains eternally true, though: it is still the distinct responsibility of humanity to prevent more extinctions in the future. Handling the growing climate change problem and the collapse of ecosystems remains a top priority for conservation science, and without a solution there will be no stable planet on which to de-extinct species.

de-extinction meme
You bet we’re gonna make a meme months after it’s gone out of popularity.

The ‘other’ allele frequency: applications of the site frequency spectrum

The site-frequency spectrum

In order to simplify our absolutely massive genomic datasets down to something more computationally feasible for modelling techniques, we often reduce it to some form of summary statistic. These are various aspects of the genomic data that can summarise the variation or distribution of alleles within the dataset without requiring the entire genetic sequence of all of our samples.

One very effective summary statistic that we might choose to use is the site-frequency spectrum (aka the allele frequency spectrum). Not to be confused with other measures of allele frequency which we’ve discussed before (like Fst), the site-frequency spectrum (abbreviated to SFS) is essentially a histogram of how frequent certain alleles are within our dataset. To do this, the SFS classifies each allele into a certain category based on how common it is, tallying up the number of alleles that occur at that frequency. The total number of categories equals the maximum possible number of allele copies: for organisms with two copies of every chromosome (‘diploids’, including humans), this is twice the number of sampled individuals. For example, a dataset comprised of genomic sequences for 5 people would have 10 different frequency bins.
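To make this concrete, here is a minimal sketch (in Python, using toy data rather than anything from my own datasets) of tallying a 1D SFS from a matrix of diploid genotypes, where each entry counts how many copies of the allele of interest (0, 1 or 2) an individual carries:

```python
# Minimal sketch: build a 1D site-frequency spectrum from diploid genotypes.
import numpy as np

def sfs_1d(genotypes):
    """genotypes: array of shape (n_sites, n_individuals), each entry the number
    of copies (0, 1 or 2) of the allele of interest carried by that individual."""
    n_sites, n_inds = genotypes.shape
    n_chromosomes = 2 * n_inds                 # diploid: two allele copies per individual
    counts = genotypes.sum(axis=1)             # allele count per site (0 .. 2N)
    # one bin per possible count, 0 through 2N
    return np.bincount(counts, minlength=n_chromosomes + 1)

rng = np.random.default_rng(42)
geno = rng.integers(0, 3, size=(1000, 5))      # toy data: 5 individuals -> bins 0..10
print(sfs_1d(geno))
```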

For one population

The SFS for a single population – called the 1-dimensional SFS – is very easy to visualise as a concept. In essence, it’s just a frequency distribution of all the alleles within our dataset. Generally, the distribution follows an exponential shape, with many more rare alleles (e.g. ‘singletons’) than common ones. However, the exact shape of the SFS is determined by the history of the population, and as with other analyses under coalescent theory we can use our understanding of the interaction between demographic history and current genetic variation to study past events.

1DSFS example.jpg
An example of the 1DSFS for a single population, taken from a real dataset from my PhD. Left: the full site-frequency spectrum, counting how many alleles (y-axis) occur a certain number of times (categories of the x-axis) within the population. In this example, as in most species, the vast majority of our DNA sequence is non-variable (frequency = 0). Given the huge disparity in number of non-variable sites, we often select on the variable ones (and even then, often discard the 1 category to remove potential sequencing errors) and get a graph more like the right. Right: the ‘realistic’ 1DSFS for the population, showing a general exponential decline (the blue trendline) for the more frequent classes. This is pretty standard for an SFS. ‘Singleton’ and ‘doubleton’ are alternative names for ‘alleles which occur once’ and ‘alleles which occur twice’ in an SFS.

Expanding the SFS to multiple populations

Further to this, we can expand the site-frequency spectrum to compare across populations. Instead of having a simple 1-dimensional frequency distribution, for a pair of populations we can have a grid. This grid specifies how often a particular allele occurs at a certain frequency in Population A and at a different frequency in Population B. This can also be visualised quite easily, albeit as a heatmap instead. We refer to this as the 2-dimensional SFS (2DSFS).
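A minimal sketch of how such a grid can be tallied for two populations (again in Python with toy genotypes, not data from my own work) is shown below; each site contributes one count to the cell given by its allele count in Population A and its allele count in Population B:

```python
# Minimal sketch: build a 2D site-frequency spectrum for two populations.
import numpy as np

def sfs_2d(geno_a, geno_b):
    """geno_a: (n_sites, n_a) diploid genotypes for Population A;
    geno_b: (n_sites, n_b) genotypes for Population B (same sites, same order)."""
    counts_a = geno_a.sum(axis=1)                      # allele count per site in A
    counts_b = geno_b.sum(axis=1)                      # allele count per site in B
    max_a, max_b = 2 * geno_a.shape[1], 2 * geno_b.shape[1]
    sfs = np.zeros((max_b + 1, max_a + 1), dtype=int)  # rows: freq in B, columns: freq in A
    for a, b in zip(counts_a, counts_b):
        sfs[b, a] += 1                                 # tally this site's joint frequency
    return sfs

rng = np.random.default_rng(1)
pop_a = rng.integers(0, 3, size=(1000, 5))             # 5 individuals -> counts 0..10
pop_b = rng.integers(0, 3, size=(1000, 4))             # 4 individuals -> counts 0..8
print(sfs_2d(pop_a, pop_b).shape)                      # (9, 11)
```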

2dsfs example
An example of a 2DSFS, also taken from my PhD research. In this example, we are comparing Population A, containing 5 individuals (as diploids, 2 x 5 = max. of 10 occurrences of an allele) with Population B, containing 4 individuals. Each row denotes the frequency at which a certain allele occurs in Population B, whilst the columns indicate the frequency at which a certain allele occurs in Population A. Each cell therefore indicates the number of alleles that occur at the exact frequencies of the corresponding row and column. For example, the first cell (highlighted in green) indicates the number of alleles which are not found in either Population A or Population B (this dataset is a subsample from a larger one). The yellow cell indicates the number of alleles which occur 4 times in Population B and also 4 times in Population A. This could mean that in one of those populations 4 individuals have one copy of that allele each, or two individuals have two copies of that allele, or that one has two copies and two have one copy. The exact composition of how the alleles are spread across samples within each population doesn’t matter to the overall SFS.

The same concept can be expanded to even more populations, although this gets harder to represent visually. Essentially, we end up with a set of different matrices which describe the frequency of certain alleles across all of our populations, merging them together into the joint SFS. For example, a joint SFS of 4 populations would consist of 6 different 2D SFSs all combined together (4 x 4 = 16 total comparisons, minus the 4 self-comparisons, then halved to remove duplicate comparisons). To make sense of this, check out the diagrammatic tables below.

populations for jsfs
A summary of the different combinations of 2DSFSs that make up a joint SFS matrix. In this example we have 4 different populations (as described in the above text). Red cells denote comparisons between a population and itself – which is effectively redundant. Green cells contain the actual 2D comparisons that would be used to build the joint SFS: the blue cells show the same comparisons but in mirrored order, and are thus redundant as well.
annotated jsfs heatmap
Expanding the above jSFS matrix to the actual data, this matrix demonstrates how the matrix is actually a collection of multiple 2DSFSs. In this matrix, one particular cell demonstrates the number of alleles which occur at frequency x in one population and frequency y in another. For example, if we took the cell in the third row from the top and the fourth column from the left, we would be looking at the number of alleles which occur twice in Population B and three times in Population A. The colour of this cell is more or less orange, indicating that ~50 alleles occur at this combination of frequencies. As you may notice, many population pairs show similar patterns, except for the Population C vs Population D comparison.

The different forms of the SFS

Which alleles we choose to use within our SFS is particularly important. If we don’t have a lot of information about the genomics or evolutionary history of our study species, we might choose to use the minor allele frequency (MAF). Given that SNPs tend to be biallelic, for any given locus we could have Allele A or Allele B. The MAF chooses the least frequent of these two within the dataset and uses that in the summary SFS: since the other allele’s frequency is simply 2N minus the frequency of the minor allele, it’s not included in the summary. An SFS made of the MAF is also referred to as the folded SFS.

Alternatively, if we know some things about the genetic history of our study species, we might be able to divide Allele A and Allele B into derived or ancestral alleles. Since SNPs often occur as mutations at a single site in the DNA, one allele at the given site is the new mutation (the derived allele) whilst the other is the ‘original’ (the ancestral allele). Typically, we would use the derived allele frequency to construct the SFS, since under coalescent theory we’re trying to simulate that mutation event. An SFS made of the derived alleles only is also referred to as the unfolded SFS.
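As a small illustration of the difference (a hedged sketch, assuming per-site allele counts out of 2N chromosomes are already available), the unfolded SFS bins the derived allele count directly, whereas the folded SFS bins whichever of the two alleles is rarer at each site:

```python
# Minimal sketch: unfolded (derived-allele) vs folded (minor-allele) spectra.
import numpy as np

def unfolded_sfs(derived_counts, n_chrom):
    """Derived allele count per site -> unfolded SFS with classes 0 .. 2N."""
    return np.bincount(derived_counts, minlength=n_chrom + 1)

def folded_sfs(allele_counts, n_chrom):
    """Without ancestral information, fold each site to its minor allele count
    (the smaller of the two allele counts), giving classes 0 .. N."""
    minor = np.minimum(allele_counts, n_chrom - allele_counts)
    return np.bincount(minor, minlength=n_chrom // 2 + 1)

rng = np.random.default_rng(7)
counts = rng.integers(0, 11, size=500)   # toy counts for 2N = 10 chromosomes
print(unfolded_sfs(counts, 10))
print(folded_sfs(counts, 10))
```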

Applications of the SFS

How can we use the SFS? Well, it can more or less be used as a summary of genetic variation for many types of coalescent-based analyses. This means we can make inferences of demographic history (see here for a more detailed explanation of that) without simulating large and complex genetic sequences, instead using the SFS. Comparing our observed SFS to the SFS expected under a simulated scenario (such as a bottleneck) allows us to estimate the likelihood of that scenario.

For example, we would predict that under a scenario of a recent genetic bottleneck in a population, alleles which are rare in the population will be disproportionately lost due to genetic drift. Because of this, the overall shape of the SFS will shift to the right dramatically, leaving a clear genetic signal of the bottleneck. This works under the same theoretical background as coalescent tests for bottlenecks.

SFS shift from bottleneck example.jpg
A representative example of how a bottleneck causes a shift in the SFS, based on a figure from a previous post on the coalescent. Centre: the diagram of alleles through time, with rarer variants (yellow and navy) being lost during the bottleneck but more common variants surviving (red). Left: this trend is reflected in the coalescent trees for these alleles, with red crosses indicating the complete loss of that allele. Right: the SFS from before (in red) and after (in blue) the bottleneck event for the alleles depicted. Before the bottleneck, variants are spread in the usual exponential shape: afterwards, however, a disproportionate loss of the rarer variants causes the distribution to flatten. Typically, the SFS would be built from more alleles than shown here, and extend much further.

Contrastingly, a large or growing population will have a larger number of rare (i.e. unique) alleles from the sudden growth and increase in genetic variation. Thus, opposite to the bottleneck the SFS distribution will be biased towards the left end of the spectrum, with an excess of low-frequency variants.

SFS shift from expansion example.jpg
A similar diagram as above, but this time with an expansion event rather than a bottleneck. The expansion of the population, and subsequent increase in Ne, facilitates the mutation of new alleles from genetic drift (or reduced loss of alleles from drift), causing more new (and thus rare) alleles to appear. This is shown by both the coalescent tree (left) and a shift in the SFS (right).

The SFS can even be used to detect alleles under natural selection. For strongly selected parts of the genome, alleles should occur at either high (if positively selected) or low (if negatively selected) frequency, with a deficit of more intermediate frequencies.

Adding to the analytical toolbox

The SFS is just one of many tools we can use to investigate the demographic history of populations and species. Using a combination of genomic technologies, coalescent theory and more robust analytical methods, the SFS appears to be poised to tackle more nuanced and complex questions of the evolutionary history of life on Earth.

Mr. Gorbachev, tear down this (pay)wall

The dreaded paywall

For anyone who absorbs their news and media through the Internet (hello, welcome to the 21st Century), you would undoubtedly be familiar with a few frustrating and disingenuous aspects of online media such as clickbait headlines and targeted advertising. Another one that might aggravate the common reader is Ol’ Reliable, the paywall – blocking access to an article unless some volume of money is transferred to the publisher, usually through a subscription. You might argue that this is a necessary evil, or that rewarding well-written pieces and informative journalism through monetary means might lead the free market to starve out poor-quality media (an extremely optimistic view). Or you might argue that the paywall is morally corrupt and greedy, and just another way to extort money out of hapless readers.

Paywalls.jpg
Yes, that is a literal paywall. And no, I don’t do subtlety.

Accessibility in science

I’m loath to tell you that even science, the powerhouse of objectivity with peer-review to increase accountability, is stifled by the weight of corporate greed. You may notice this from some big name journals, like Nature and Science – articles cost money to access, either at the individual level (e.g. per individual article, or as a subscription for a single person for a year) or for an entire institution (such as a university). To state that these paywalls are exorbitantly priced would be a tremendous understatement – for reference, an institutional subscription to the single journal Nature (one of 2,512 journals listed under the conglomerate of Springer Nature) costs nearly $8,000 per year. A download of a single paper often costs around $30 for a curious reader.

Some myths about the publishing process

You might be under the impression, as above, that this money goes towards developing good science and providing a support network for sharing and distributing scientific research. I wish you were right. In his book ‘The Effective Scientist’, Professor Corey Bradshaw describes the academic publishing process as “knowledge slavery”, and no matter how long I spent thinking about this blog post, I could not come up with a more macabre yet apt description. And while I highly recommend his book for a number of reasons, his summary and interpretation of how publishing in science actually works (both the strengths and pitfalls) is highly informative and representative.

There are a number of different aspects about publishing in science that make it so toxic to researchers. For example, the entirety of the funds acquired from the publishing process goes to the publishing institution – none of it goes to the scientists that performed and wrote the work, none to the scientists who reviewed and critiqued the paper prior to publication, and none to the institutions who provided the resources to develop the science. In fact, the perception is that if you publish science in a journal, especially high-ranking ones, it should be an honour just to have your paper in that journal. You got into Nature – what more do you want?

Publishing cycle.jpg
The alleged cycle of science. You do Good Science; said Good Science gets published in an equally Good Journal; the associated pay increase (not from the paper itself, of course, but by increasing success rates of grant applications and collaborations) helps to fund the next round of Good Science and the cost of publishing in a Good Journal. Unfortunately, and critically, the first step into the cycle (the yellow arrow) is remarkably difficult and acts as a barrier to many researchers (many of whom do Very Good Science).

Open Access journals

Thankfully, some journals exist which publish science without the paywall: we refer to these as ‘Open Access’ (OA) journals. Although the increased accessibility is undoubtedly a benefit for the spread of scientific knowledge, the reduced revenue often means that a successful submission comes with an associated cost. This cost is usually presented as an ‘article processing charge’: for a paper in a semi-decent journal, this can be upwards of thousands of dollars for a single paper. Submitting to an OA journal can be a bit of a delicate balance: the increased exposure, transparency and freedom to disseminate research is a definite positive for scientists, but the exorbitant costs that can be associated with OA journals can preclude less productive or financially robust labs from publishing in them (regardless of the quality of science produced).

Open access logo.png
The logo for Open Access journals, originally designed by PLoS.

Manuscripts and ArXives

There is somewhat of a counter culture to the rigorous tyranny of scientific journals: some sites exist where scientists can freely upload their manuscripts and articles without a paywall or submission cost. Naturally, the publishing industry reviles this and many of these are not strictly legal (since you effectively hand over almost all publishing rights to the journal at submission). The most notable of these is Sci-Hub, which uses various techniques (including shifting through different domain names in different countries) to bypass paywalls.

Other more user-generated options exist, such as the different subcategories of ArXiv, where users can upload their own manuscripts free of charge and without a paywall, predominantly prior to the peer-review process. By being publicly uploaded, ArXiv sites allow scientists to broaden the peer-review process beyond a few journal-selected reviewers. There is still some screening process when submitting to ArXiv to filter out non-scientific articles, but the overall method is much more transparent and scientist-friendly than a typical publishing corporation. For articles that have already been published, other sites such as ResearchGate often act as conduits for sharing research (either those obscured by paywalls, despite copyright issues, or those freely accessible by open access).

You might also have heard through the grapevine that “scientists are allowed to send you PDFs of their research if you email them.” This is a bit of a dubious copyright loophole: often, this is not strictly within the acceptable domain of publishing rights as the journal that has published this research maintains all copyrights to the work (clever). Out of protest, many scientists may send their research to interested parties, often with the caveat of not sharing it anywhere else or in manuscript form (as opposed to the finalised published article). Regardless, scientists are more than eager to share their research however they can.

Summary table.jpg
A summary of some of the benefits and detriments of each journal type. For articles published on pre-print sites there is still the intention of (at some date) publishing the article under one of the other two official journal models (the models are therefore not mutually exclusive).

Civil rights and access to science

There are a number of both empirical and philosophical reasons why free access to science is critically important for all people. At least one of these (among many others) is based on your civil rights. Scientific research is incredibly expensive, and is often funded through a number of grants from various sources, among the most significant of which includes government-funded programs such as the Australian Research Council (ARC).

Where does this money come from? Well, indirectly, you (if you pay your taxes, anyway). While this connection can be at times frustrating for scientists – particularly if there is difficulty in communicating the importance of your research due to a lack of or not-readily-transparent commercial, technological or medical impact of the work – the logic applies to access to scientific data and results, too. As someone who has contributed monetarily to the formation and presentation of scientific work, it is your capitalist right to have access to the results of that work. Although privatisation ultimately overpowers this in the publishing world, there is (in my opinion) a strong moral philosophy behind demanding access to the results of the research you have helped to fund.

Walled off from research

Anyone who has attempted to publish in the scientific literature is undoubtedly keenly aware of the overt corruption and inadequacy of the system. Private businesses hold a monopoly on the dissemination of scientific research, and although science tries to work around it, the structure is pervasive. However, some changes are in progress which seek to re-invent the way we handle the publishing of scientific research, and with strong support from the general public there is an opportunity to minimise the damage that private publication businesses perpetuate.

Two Worlds: contrasting Australia’s temperate regions

Temperate Australia

Australia is renowned for its unique diversity of species, and likewise for the diversity of ecosystems across the island continent. Although many would typically associate Australia with the golden sandy beaches, palm trees and warm weather of the tropical east coast, other ecosystems also hold both beautiful and interesting characteristics. Even the regions that might typically seem the dullest – the temperate zones in the southern portion of the continent – themselves hold unique stories of the bizarre and wonderful environmental history of Australia.

The two temperate zones

Within Australia, the temperate zone is actually separated into two very distinct and separate regions. In the far south-western corner of the continent lies the southwest Western Australia temperate zone. In the south-eastern corner, an unnamed temperate zone spans from the region surrounding Adelaide at its westernmost point, expanding to the east and encompassing Tasmania and Victoria before shifting northward into NSW. This temperate zone gradually develops into the sub-tropical and tropical climates of more northern latitudes in Queensland and across to Darwin.

 

Labelled Koppen-Geiger map
The climatic classification (Koppen-Geiger) of Australia’s ecosystems, derived from the Atlas of Living Australia. The light blue region highlights the temperate zones discussed here, with an isolated region in the SW and the broader region of the SE as it transitions into subtropical and tropical climates northward.

The divide separating these two regions might be familiar to some readers – the Nullarbor Plain. Not just a particularly good location for fossils and mineral ores, the Nullarbor Plain is an almost perfectly flat arid expanse that stretches from the western edge of South Australia to the temperate zone of the southwest. As the name suggests, the plain is totally devoid of any significant forestry, owing to the lack of available water on the surface. This plain is a relatively ancient geological structure, and finished forming somewhere between 14 and 16 million years ago when tectonic uplift pushed a large limestone block upwards to the surface of the crust, forming an effective drain for standing water with the aridification of the continent. Thus, despite being relatively similar bioclimatically, the two temperate zones of Australia have been disconnected for ages and boast very different histories and biota.

Elevation map of NP.jpg
A map of elevation across the Australian continent, also derived from the Atlas of Living Australia. The dashed black line roughly outlines the extent of the Nullarbor Plain, a massively flat arid expanse.

The hotspot of the southwest

The southwest temperate zone – commonly referred to as southwest Western Australia (SWWA) – is an island-like bioregion. Isolated from the rest of temperate Australia, it is remarkably geologically simple, with little topographic variation (only the Darling Scarp separates the lower coast from the higher elevation of the Darling Plateau), generally minor river systems and low levels of soil nutrients. One key factor determining complexity in the SWWA environment is the isolation of high rainfall habitats within the broader temperate region – think of islands within an island.

SSWA environment.jpg
A figure demonstrating the environmental characteristics of SWWA, using data from the Atlas of Living Australia. Left: An elevation map of the region, showing some mountainous variation, but only one significant steep change along the coast (blue area). Right: A summary of 19 different temperature and precipitation variables, showing a relatively weak gradient as the region shifts inland.

Despite the lack of geological complexity and the perceived diversity of the tropics, the temperate zone of SWWA is the only internationally recognised biodiversity hotspot within Australia. As an example, SWWA is inhabited by ~7,000 different plant species, half of which are endemic to the region. Not to discredit the impressive diversity of the rest of the continent, of course. So why does this area have even higher levels of species diversity and endemism than the rest of mainland Australia?

speciation patterns in SWWA.jpg
A demonstration of some of the different patterns which might explain the high biodiversity of SWWA, from Rix et al. (2015). These predominantly relate to different biogeographic mechanisms that might have driven diversification in the region, from survivors of the Gondwana era to the more recent fragmentation of mesic habitats.

Well, a number of factors may play significant roles in determining this. One of these is the ancient and isolated nature of the region: SWWA has been separated from the rest of Australia for at least 14 million years, with many species likely originating much earlier than this. Because of this isolation, species occurring within SWWA have been able to undergo adaptive divergence from their east coast relatives, forming unique evolutionary lineages. Furthermore, the southwest corner of the continent was one of the last to break away from Antarctica in the dismantling of Gondwana >30 million years ago. Within the region more generally, isolation of mesic (wetter) habitats from the broader, arid (xeric) habitats also likely drove the formation of new species as distributions became fragmented or as species adapted to the new, encroaching xeric habitat. Together, these various mechanisms all likely contributed in some way to the overall diversity of the region.

The temperate south-east of Australia

Contrastingly, the temperate region in the south-east of the continent is much more complex. For one, the topography of the zone is much more variable: there are a number of prominent mountain chains (such as the extended Great Dividing Range), lowland basins (such as the expansive Murray-Darling Basin) and variable valley and river systems. Similarly, the climate varies significantly within this temperate region, with the more northern parts featuring more subtropical climatic conditions with wetter and hotter summers than the southern end. There is also a general trend of increasing rainfall and lower temperatures along the highlands of the southeast portion of the region, and dry, semi-arid conditions in the western lowland region.

MDB map
A map demonstrating the climatic variability across the Murray-Darling Basin (which makes up a large section of the SE temperate zone), from Brauer et al. (2018). The different heat maps on the left describe different types of variables; a) and b) represent temperature variables, c) and d) represent precipitation (rainfall) variables, and e) and f) represent water flow variables. Each variable is a summary of a different set of variables, hence the differences.

A complicated history

The south-east temperate zone is not only variable now, but has undergone some drastic environmental changes over its history. Massive shifts in geology, climate and sea-levels have particularly altered the nature of the area. Even volcanic activity has occurred at times in the past.

One key hydrological shift that massively altered the region was the paleo-megalake Bungunnia. Not just a list of adjectives, Bungunnia was exactly as it’s described: a historically massive lake that spread across a huge area prior to its demise ~1-2 million years ago. At its largest size, Lake Bungunnia reached an area of over 50,000 km2, spreading from its westernmost point near the current Murray mouth through to halfway across Victoria. The lake initially formed due to a tectonic uplift event along the coastal edge of the Murray-Darling Basin ~3.2 million years ago, which dammed the ancestral Murray River (which historically emptied into the ocean much further east than today). Over the next few million years, the size of the lake fluctuated significantly with climatic conditions, with wetter periods causing the lake to overfill and burst its banks. With every burst, the lake shrank in size, until a final break ~700,000 years ago when the ‘dam’ broke and the full lake drained.

Lake Bungunnia map 2.jpg
A map demonstrating the sheer size of paleo-megalake Bungunnia at its largest extent, taken from McLaren et al. (2012).

Another change in the historic environment readers may be more familiar with is the land-bridge that used to connect Tasmania to the mainland. Dubbed the Bassian Isthmus, this land-bridge appeared at various points of lowered sea-levels in history (i.e. during glacial periods of the Pleistocene cycles), predominantly connecting via the still-above-water Flinders and Cape Barren Islands. However, at lower sea-levels, the land bridge spread as far west as King Island: central to this block of land was a large lake dubbed Bass Lake (creative). The Bassian Isthmus played a critical role in the migration of many of the native fauna of Tasmania (likely including the Indigenous peoples of the now-island), and its submergence and the resulting isolation led to some distinctive differences between Tasmanian and mainland biota. Today, the historic presence of the Bassian Isthmus has left a distinctive mark on the genetic make-up of many species native to the southeast of Australia, including dolphins, frogs, freshwater fishes and invertebrates.

Bass Strait bathymetric contours.jpg
An elevation (Etopo1) map demonstrating the now-underwater land bridge between Tasmania and the mainland. Orange colours denote higher areas whilst light blue represents lower sections.

Don’t underestimate the temperates

Although tropical regions get most of the hype for being hotspots of biodiversity, the temperate zones of Australia similarly boast high diversity, unique species and document a complex environmental history. Studying how the biota and environment of the temperate regions has changed over millennia is critical to predicting the future effects of climatic change across large ecosystems.

The reality of neutrality

The neutral theory 

Many, many times within The G-CAT we’ve discussed the difference between neutral and selective processes, DNA markers and their applications in our studies of evolution, conservation and ecology. The idea that many parts of the genome evolve under a seemingly random pattern – largely dictated by genome-wide genetic drift rather than the specific force of natural selection – underpins many demographic and adaptive (in outlier tests) analyses.

This is based on the idea that for genes that are not related to traits under selection (either positively or negatively), new mutations should be acquired and lost under predominantly random patterns. Although this accumulation of mutations is influenced to some degree by alternate factors such as population size, the overall average of a genome should give a picture that largely discounts natural selection. But is this true? Is the genome truly neutral if averaged?

Non-neutrality

First, let’s take a look at what we mean by neutral or not. For genes that are not under selection, alleles should be maintained at approximately balanced frequencies and all non-adaptive genes across the genome should have relatively similar distributions of frequencies. While natural selection is one obvious way allele frequencies can be altered (either favourably or detrimentally), other factors can play a role.

As stated above, population sizes have a strong impact on allele frequencies. This is because smaller populations are more at risk of losing rarer alleles due to random deaths (see previous posts for a more thorough discussion of this). Additionally, genes which are physically close to other genes which are under selection may themselves appear to be under selection due to linkage disequilibrium (often shortened to ‘LD’). This is because physically close genes are more likely to be inherited together, thus selective genes can ‘pull’ neighbours with them to alter their allele frequencies.
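
To make this a little more concrete, here is a minimal Python sketch of the two standard LD summary statistics (D and r²) for a pair of biallelic loci; the haplotype and allele frequencies used are purely illustrative:

```python
# A rough sketch of linkage disequilibrium (LD) statistics for two biallelic
# loci, assuming we already know the haplotype and allele frequencies.
# All numbers used below are invented for illustration.

def linkage_disequilibrium(p_AB, p_A, p_B):
    """Return D and r^2 for two biallelic loci.

    p_AB : frequency of the haplotype carrying allele A and allele B
    p_A  : frequency of allele A at the first locus
    p_B  : frequency of allele B at the second locus
    """
    D = p_AB - p_A * p_B  # deviation from what independent inheritance predicts
    r2 = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))  # squared allelic correlation
    return D, r2

# Hypothetical example: A and B co-occur more often than expected by chance.
print(linkage_disequilibrium(p_AB=0.40, p_A=0.50, p_B=0.60))  # D = 0.10, r2 ≈ 0.167
```

When D (and therefore r²) is near zero, the two loci are effectively inherited independently; the larger r² becomes, the more strongly selection on one locus can drag allele frequencies at the other along with it.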

Linkage disequilibrium figure
An example of how linkage disequilibrium can alter allele frequency of ‘neutral’ parts of the genome as well. In this example, only one part of this section of the genome is selected for: the green gene. Because of this positive selection, the frequency of a particular allele at this gene increases (the blue graph): however, nearby parts of the genome also increase in frequency due to their proximity to this selected gene, which decreases with distance. The extent of this effect determines the size of the ‘linkage block’ (see below).

Why might ‘neutral’ models not be neutral?

The assumption that the vast majority of the genome evolves under neutral patterns has long underpinned many concepts of population and evolutionary genetics. But it’s never been all that clear exactly how much of the genome is actually evolving neutrally or adaptively. How far natural selection reaches beyond a single gene under selection depends on a few different factors: let’s take a look at a few of them.

Linked selection

As described above, physically close genes (i.e. located near one another on a chromosome) often share some impacts of selection due to reduced recombination that occurs at that part of the genome. In this case, even alleles that are not adaptive (or maladaptive) may have altered frequencies simply due to their proximity to a gene that is under selection (either positive or negative).

Recombination blocks and linkage figure
A (perhaps familiar) example of the interaction between recombination (the breaking and mixing of different genes across chromosomes) and linkage disequilibrium. In this example, we have 5 different copies of a part of the genome (different coloured sequences), which we randomly ‘break’ into separate fragments (breaks indicated by the dashed lines). If we focus on a particular base in the sequence (the yellow A) and count the number of times a particular base pair is on the same fragment, we can see how physically close bases are more likely to be coinherited than further ones (bottom column graph). This makes mathematical sense: if two bases are further apart, you’re more likely to have a break that separates them. This is the very basic underpinning of linkage and recombination, and the size of the region where bases are likely to be coinherited is called the ‘linkage block’.
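
If you want to play with the counting exercise described in the caption above, here is a minimal simulation sketch; the sequence length, focal position and number of breakpoints are all arbitrary choices:

```python
# A small simulation of the 'break the sequence and count coinheritance'
# exercise described in the figure caption above.

import random

SEQ_LEN = 100    # length of the genomic region, in arbitrary 'bases'
FOCAL = 20       # position of the focal base (the 'yellow A' in the figure)
N_COPIES = 5000  # how many sequence copies we break up
N_BREAKS = 3     # breakpoints per copy

coinherited = [0] * SEQ_LEN
for _ in range(N_COPIES):
    breaks = random.sample(range(1, SEQ_LEN), N_BREAKS)
    # Fragment index of each position = number of breakpoints at or before it
    frag = [sum(b <= pos for b in breaks) for pos in range(SEQ_LEN)]
    for pos in range(SEQ_LEN):
        if frag[pos] == frag[FOCAL]:
            coinherited[pos] += 1

# Positions close to the focal base end up on the same fragment nearly every
# time; distant positions much less often. That decay defines the 'linkage block'.
for pos in (21, 30, 60, 90):
    print(pos, coinherited[pos] / N_COPIES)
```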

Under these circumstances, for a region of a certain distance (dubbed the ‘linkage block’) around a gene under selection, the genome will not truly evolve neutrally. Although this is simplest to visualise as physically linked sections of the genome (i.e. adjacent), linked genes do not necessarily have to be next to one another, just linked somehow. For example, they may be different parts of a single protein pathway.

The extent of this linkage effect depends on a number of other factors such as ploidy (the number of copies of a chromosome a species has), the size of the population and the strength of selection around the central locus. The presence of linkage and its impact on the distribution of genetic diversity (LD) has been well documented within evolutionary and ecological genetic literature. The more pressing question is one of extent: how much of the genome has been impacted by linkage? Is any of the genome unaffected by the process?

Background selection

One example of linked selection commonly used to explain the proliferation of non-neutral evolution within the genome is ‘background selection’. Put simply, background selection is the purging of alleles due to negative selection on a linked gene. Sometimes, background selection is expanded to include any forms of linked selection.

Background selection figure .jpg
A cartoonish example of how background selection affects neighbouring sections of the genome. In this example, we have 4 genes (A, B, C and D) with interspersing neutral ‘non-gene’ sections. The allele for Gene B is strongly selected against by natural selection (depicted here as the Banhammer of Selection). However, the Banhammer is not very precise, and when decreasing the frequency of this maladaptive Gene B allele it also knocks down the neighbouring non-gene sections. Despite themselves not being maladaptive, their allele frequencies are decreased due to physical linkage to Gene B.

Under the first definition of background selection, the process can be divided into two categories based on the impact of the linkage. As above, one scenario is the purging of neutral alleles (and therefore a reduction in genetic diversity) because they are associated with a deleterious, maladaptive gene nearby. Contrastingly, some neutral alleles may be preserved by association with a positively selected adaptive gene: this is often referred to as ‘genetic hitchhiking’ (which I’ve always thought was kind of an amusing phrase…).

Genetic hitchhiking picture.jpg
Definitely not how genetic hitchhiking works.

The presence of background selection – particularly under the ‘maladaptive’ scenario – is often used as a counter-argument to the ‘paradox of variation’. This paradox was described by evolutionary biologist Richard Lewontin, who noted that despite massive differences in population sizes across the many different species on Earth, the total amount of ‘neutral’ genetic variation does not change significantly. In fact, he observed no clear (direct) relationship between population size and neutral variation. Many years after this observation, the influence of background selection and genetic hitchhiking on the distribution of genomic diversity helps to explain how the amount of neutral genomic variation is ‘managed’, and why it doesn’t vary excessively across biota.

What does it mean if neutrality is dead?

These findings have significant implications for our understanding of the process of evolution, and how we can detect adaptation within the genome. In light of this research, there has been heated discussion about whether neutral theory is ‘dead’ or still a useful concept.

Genome wide allele frequency figure.jpg
A vague summary of how a large portion of the genome might not actually be neutral. In this section of the genome, we have neutral (blue), maladaptive (red) and adaptive (green) elements. Natural selection either favours, disfavours, or is ambivalent about each of these sections alone. However, there is significant ‘spill-over’ around regions of positively or negatively selected sections, which causes the allele frequency of even the neutral sections to fluctuate widely. The blue dotted line represents this: when the line is above the genome, allele frequency is increased; when it is below, it is decreased. As we travel along this section of the genome, you may notice it is rarely ever in the middle (the so-called ‘neutral‘ allele frequency, in line with the genome).

Although I avoid having a strong stance here (if you’re an evolutionary geneticist yourself, I will allow you to draw your own conclusions), it is my belief that the model of neutral theory – and the methods that rely upon it – remains fundamental to our understanding of evolution. Although it may be a more conservative way to identify adaptation within the genome, and cannot account for the effects of the above processes, neutral theory is still a direct and well-implemented strategy for understanding adaptation and demography.

The folly of absolute dichotomies

Divide and conquer (nothing)

Divisiveness is quickly becoming apparent as a plague on the modern era. The segregation and categorisation of people – whether politically, spiritually or morally justified – permeates the human condition and how we process the enormity of the Homo sapiens population. The idea that antithetic extremes form two discrete categories (for example, the waning centre between ‘left’ vs. ‘right’ political perspectives) is widely employed in many aspects of the world.

But how pervasive is this pattern? How well can we summarise, divide and categorise people? For some things, this would appear innately very easy to do – one of the most commonly evoked divisions in people is that between men and women. But the increasingly charged debate around concepts of both gender and sex (and sexuality as a derivative, somewhat interrelated concept) highlights the inconsistency of this divide.

The ‘sex’ and ‘gender’ arguments

The most commonly used argument against ‘alternative’ concepts of either gender or sex – that is, anything beyond the binary states of a ‘man’ with a ‘male’ body and a ‘woman’ with a ‘female’ body – is often based on some perception of “biological reality.” As a (trainee) biologist, let me make this abundantly clear: such confidence and clarity of “reality” in many, if not all, biological subdisciplines is absurd (e.g. “nature vs. nurture”). Biologists commonly acknowledge (and rely upon) the realisation that life in all of its constructs is unfathomably diverse, unique, and often difficult to categorise. Any impression of being able to do so is a part of the human limitation to process concepts without boundaries.

Genderbread-Person figure
A great example of the complex nature of human sex and gender. You’ll notice that each category is itself a spectrum: even Biological Sex is not a clearly binary system. In fact, even this representation likely simplifies the complexity of human identity and sexuality given that each category is only a single linear scale (e.g. pansexuality and asexuality aren’t on the Sexual Orientation gradient), but nevertheless is a good summary. Source: It’s Pronounced METROsexual.

Gender as a binary

In terms of gender identity, I think this is becoming (slowly) more accepted over time. That most people have a gender identity somewhere along a multidimensional spectrum is not, for many people, a huge logical leap. Trans people are not mentally ill, not all ‘men’ identify as ‘men’ and certainly not all ‘men’ identify as a ‘man’ under the same characteristics or expression. Human psychology is beautifully complex and to reduce people down to the most simplistic categories is, in my humble opinion, a travesty. The single-variable gender binary cannot encapsulate the full depth of any single person’s identity or personality, and this biologically makes sense.

Sex as a binary

As an extension of the gender debate, sex itself has often been relied upon as the last vestige of some kind of sexual binary. Even for those more supportive of trans people, sex is often described as some concrete, biological, genetically-encoded trait which conveniently falls into its own binary system. Thus, instead of a single binary, people are reduced down to a two-character matrix of sex and gender.

Gender and sex table.jpg
A representative table of the “2 Character Sex and Gender” composition. Although slightly better at allowing for complexity in people’s identities, having 2 binaries instead of 1 doesn’t encapsulate the full breadth of diversity in either sex or gender.

However, the genetics of the definition and expression of sex is in itself a complex network of the expression of different genes and the presence of different chromosomes. Although high-school level biology teaches us that men are genetically XY and women XX, individual genes within those chromosomes can alter the formation of different sexual organs and the development of a person. Furthermore, additional X or Y chromosomes can further alter the way sexual development occurs in people. Many people who fall in between the two ends of the male–female sex spectrum identify as ‘intersex’.

DSD types table.jpg
A list of some of the known types of ‘Disorders of Sex Development’ (DSDs) which can lead to non-binary sex development in many different ways. Within these categories, there may be multiple genetic mechanisms (e.g. specific mutations) underlying the symptoms. It’s also important to note that while DSD medically describes the conditions of many people, it can be offensive/inappropriate to many intersex people (‘disorder’ can be a heavy word). Source: El-Sherbiny (2013).

You might be under the impression that these are rare ‘genetic disorders’, and don’t count as “real people” (decidedly not my words). But the reality is that intersex people are relatively common throughout the world, occurring roughly as frequently as true redheads or green eyes. Thus, the idea of excluding intersex people from our societal definitions has very little merit, especially from a scientific point of view. Instead, allowing our definitions of both sex and gender to be broad and flexible allows us to incorporate the biological reality of the immense diversity of the world, even just within our own species.

Absolute species concepts

Speaking of species, and relating this paradigm of dichotomy to potentially less politically charged concepts, species themselves are a natural example of the inaccuracy of absolutism. This idea is not a new one, either within The G-CAT or within the broader literature, and species identity has long been regarded as a hive of grey areas. The sheer number of ways a group of organisms can be divided into species (or not, as the case may be) lends to the idea that simplified definitions of what something is or is not will rarely be as accurate as we hope. Even the most commonly employed of characteristics – such as those of the Biological Species Concept – cannot be applied to a number of biological systems, such as asexually-reproducing species or complex cases of isolation.

Speciation continuum figure
A figure describing the ‘speciation continuum’ from a previous post on The G-CAT. Now imagine that each Species Concept has its own vague species boundary (dotted line): draw 30 of them over the top of one another, and try to pick the exact cut-off between the red and green areas. Even using the imagination, this would be difficult.

The diversity of Life

Anyone who argues a biological basis for these concepts is taking the good name of biological science hostage. Diversity underpins the most core aspects of biology (e.g. evolution, communities and ecosystems, medicine) and is a real attribute of living in a complicated world. Downscaling and simplifying the world to the ‘black’ and the ‘white’ discredits the wonder of biology, and acknowledging the ‘outliers’ (especially those that are not actually so far outside the boxes we have drawn) of any trends we may observe in nature is important to understand the complexity of life on Earth. Even if individual components of this post seem debatable to you: always remember that life is infinitely more complex and colourful than we can even imagine, and all of that is underpinned by diversity in one form or another.

Bringing alleles back together: applications of coalescent theory

Coalescent theory

A recurring analytical method, both within The G-CAT and the broader ecological genetic literature, is based on coalescent theory. This is based on the mathematical notion that mutations within genes (leading to new alleles) can be traced backwards in time, to the point where the mutation initially occurred. Given that this is a retrospective view, instead of describing these mutation moments as ‘divergence’ events (as would be typical for phylogenetics), they appear as moments where mutations come back together, i.e. coalesce.

There are a number of applications of coalescent theory, and it is a particularly fitting process for understanding the demographic (neutral) history of populations and species.

Mathematics of the coalescent

Before we can explore the multitude of applications of the coalescent, we need to understand the fundamental underlying model. The initial coalescent model was described in the 1980s and built upon by a number of different ecologists, geneticists and mathematicians. However, John Kingman is often credited with the formation of the original coalescent model, and ‘Kingman’s coalescent’ is considered the most basic form of the coalescent model.

From a mathematical perspective, the coalescent model is actually (relatively) simple. If we sampled a single gene from two different individuals (for simplicity’s sake, we’ll say they are haploid and only have one copy per gene), we can statistically measure the probability of these alleles merging back in time (coalescing) at any given generation. This is the same probability that the two samples share an ancestor (think of a much, much shorter version of sharing an evolutionary ancestor with a chimpanzee).

Normally, if we were trying to pick the parents of our two samples, the number of potential parents would be the size of the ancestral population (since any individual in the previous generation has an equal probability of being their parent). But from a genetic perspective, this is based on the genetic (effective) population size (Ne), multiplied by 2 as each individual carries two copies of each gene (one paternal and one maternal). Therefore, the number of potential ‘parent’ gene copies is 2Ne.

Constant Ne and coalescent prob
A graph of the probability of a coalescent event (i.e. two alleles sharing an ancestor) in the immediately preceding generation (i.e. the parents) relative to the size of the population. As one might expect, with larger population sizes there is a lower chance of sharing an ancestor in the immediately prior generation, as the pool of ‘potential parents’ increases.

If we have an idealistic population, with a large Ne, random mating and no natural selection on our alleles, the probability that their ancestor is in the immediately prior generation (i.e. that they share a parent) is 1/(2Ne). Inversely, the probability that they don’t share a parent is 1 − 1/(2Ne). If we add a temporal component (i.e. number of generations), we can expand this to the probability that our alleles coalesce exactly t generations ago: (1 − 1/(2Ne))^(t−1) × 1/(2Ne).
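
As a minimal sketch of this formula (with purely illustrative population sizes):

```python
# Probability that two alleles coalesce exactly t generations ago under the
# idealised model above: no shared parent for t-1 generations, then one.
def coalescence_probability(t, Ne):
    p = 1.0 / (2 * Ne)  # chance of sharing a parent in any single generation
    return (1 - p) ** (t - 1) * p

# Illustrative only: larger populations make recent coalescence less likely.
for Ne in (100, 1000, 10000):
    print(Ne, coalescence_probability(t=1, Ne=Ne))
```

Under these assumptions the waiting time follows a geometric distribution, which means the expected time for two alleles to coalesce is roughly 2Ne generations.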

Variable Ne and coalescent probs
The probability of two alleles sharing a coalescent event back in time under different population sizes. Similar to above, there is a higher probability of an early coalescent event in smaller populations, as the reduced number of ancestors means that alleles are more likely to ‘share’ an ancestor. However, the probability consistently decreases the further back in time we go, under all population size scenarios.

Although this might seem mathematically complicated, the coalescent model provides us with an expectation of how different mutations should coalesce back in time if those idealistic assumptions hold true. However, biology is rarely convenient and it’s unlikely that our study populations follow these patterns perfectly. Studying how our empirical data vary from these expectations, however, allows us to infer some interesting things about the history of populations and species.

Testing changes in Ne and bottlenecks

One of the more common applications of the coalescent is in determining historical changes in the effective population size of species, particularly in trying to detect genetic bottleneck events. This is based on the idea that alleles are likely to coalesce at different rates under scenarios of genetic bottlenecks, as the reduced number of individuals (and also genetic diversity) associated with bottlenecks changes the frequency of alleles and coalescence rates.

For a set of k different alleles, the rate of coalescence is determined as k(k − 1)/(4Ne). Thus, the coalescence rate is intrinsically linked to the effective population size, Ne. During genetic bottlenecks, the severely reduced Ne causes the coalescence rate to speed up. This is because genetic drift culls many alleles during the bottleneck event, so only a few (usually common) alleles make it through, with new mutations arising and spreading from these alleles after the bottleneck. This can be a little hard to think of, so the diagram below demonstrates how this appears.

Bottleneck test figure.jpg
A diagram of how the coalescent can be used to detect bottlenecks in a single population (centre). In this example, we have contemporary population in which we are tracing the coalescence of two main alleles (red and green, respectively). Each circle represents a single individual (we are assuming only one allele per individual for simplicity, but for most animals there are up to two).  Looking forward in time, you’ll notice that some red alleles go extinct just before the bottleneck: they are lost during the reduction in Ne. Because of this, if we measure the rate of coalescence (right), it is much higher during the bottleneck than before or after it. Another way this could be visualised is to generate gene trees for the alleles (left): populations that underwent a bottleneck will typically have many shorter branches and a long root, as many branches will be ‘lost’ by extinction (the dashed lines, which are not normally seen in a tree).
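
For a rough numerical feel for the rate formula above, here is a minimal sketch; the lineage count and population sizes are invented:

```python
# Per-generation rate at which any pair among k sampled lineages coalesces,
# following the k(k - 1)/(4Ne) expression above. Population sizes are invented.
def coalescence_rate(k, Ne):
    return k * (k - 1) / (4 * Ne)

k = 10  # number of lineages (alleles) we are tracing back in time
for label, Ne in [("before bottleneck", 10000),
                  ("during bottleneck", 100),
                  ("after recovery", 5000)]:
    print(label, coalescence_rate(k, Ne))
# The rate is ~100x higher during the bottleneck, so coalescent events pile up
# around that period: this is the signal used to detect bottlenecks.
```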

This makes sense from a theoretical perspective as well, since strong genetic bottlenecks mean that most alleles are lost. Thus, the alleles that we do have are much more likely to coalesce shortly after the bottleneck, with very few alleles coalescing before the bottleneck event. These are the alleles that managed to survive the purge of the bottleneck, and they are often few compared to the overarching patterns across the genome.

Testing migration (gene flow) across lineages

Another demographic factor we may wish to test is whether gene flow has occurred across our populations historically. Although there are plenty of allele frequency methods that can estimate contemporary gene flow (i.e. within a few generations), coalescent analyses can detect patterns of gene flow reaching further back in time.

In simple terms, this is based on the idea that if gene flow has occurred across populations, then some alleles will have been transferred from one population to another. Because of this, we would expect that transferred alleles coalesce with alleles of the source population more recently than the divergence time of the two populations. Thus, models that include a migration rate often add it as a parameter specifying the probability that any given allele coalesces with an allele in another population or species (the backwards version of a migration or introgression event). Again, this might be difficult to conceptualise, so there’s a handy diagram below.

Migration rate test figure
A similar model of coalescence as above, but testing for migration rate (gene flow) in two recently diverged populations (right). In this example, when we trace two alleles (red and green) back in time, we notice that some individuals in Population 1 coalesce more recently with individuals of Population 2 than other individuals of Population 1 (e.g. for the red allele), and vice versa for the green allele. This can also be represented with gene trees (left), with dashed lines representing individuals from Population 2 and whole lines representing individuals from Population 1. This incomplete split between the two populations is the result of migration transferring genes from one population to the other after their initial divergence (also called ‘introgression’ or ‘horizontal gene transfer’).

Testing divergence time

In a similar vein, the coalescent can also be used to test how long ago the two contemporary populations diverged. Similar to gene flow, this is often included as an additional parameter on top of the coalescent model in terms of the number of generations ago. To convert this to a meaningful time estimate (e.g. in terms of thousands or millions of years ago), we need to include a mutation rate (the number of mutations per base pair of sequence per generation) and a generation time for the study species (how many years apart different generations are: for humans, we would typically say ~20-30 years).
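
As a minimal sketch of that conversion (the mutation rate, generation time and divergence estimate below are all invented):

```python
# Converting a coalescent divergence estimate into years, given an assumed
# per-generation mutation rate and generation time (all numbers invented).
mutation_rate = 1e-8     # mutations per base pair per generation (assumed)
generation_time = 2.5    # years per generation for our hypothetical species

# Suppose the model estimates divergence as the expected number of
# substitutions per site accumulated along each lineage since the split:
divergence_per_site = 5e-4

divergence_generations = divergence_per_site / mutation_rate  # 50,000 generations
divergence_years = divergence_generations * generation_time   # 125,000 years
print(divergence_generations, divergence_years)
```

In real analyses, the uncertainty in both the mutation rate and the generation time carries through to the final estimate, which is one reason divergence dates are usually reported with wide confidence intervals.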

Divergence time test figure.jpg
An example of using the coalescent to test the divergence time between two populations, this time using three different alleles (red, green and yellow). Tracing back the coalescence of each allele reveals different times (in terms of which generation the coalescence occurs in) depending on the allele (right). As above, we can look at this through gene trees (left), showing variation in how far back the two populations (again indicated with bold and dashed lines respectively) split. The blue box indicates the range of times (i.e. a confidence interval) around which divergence occurred: with many more alleles, this can be refined by using an ‘average’ and later converted to time in years using a generation time.

 

The basic model of testing divergence time with the coalescent is relatively simple, and not all that different to phylogenetic methods. Where in phylogenetics we relate the length of the different branches in the tree to the amount of time that has passed since the divergence of those branches, with the coalescent we base these estimates on coalescent events, with more coalescent events occurring around the time of divergence. One important difference between the two methods is that coalescent events might not directly coincide with the divergence time (in fact, we expect many do not), as some alleles will separate prior to divergence and some will lag behind and only start to diverge after the divergence event.

The complex nature of the coalescent

While each of these individual concepts may seem (depending on how well you handle maths!) relatively simple, one critical issue is the interactive nature of the different factors. Gene flow, divergence time and population size changes will all simultaneously impact the distribution and frequency of alleles, and thus the coalescent method. Because of this, we often use complex programs that implement the coalescent and weigh the relative contributions of each of these factors to some extent. Although the coalescent is a complex beast, improvements in the methodology and the programs that use it will continue to improve our ability to infer evolutionary history with coalescent theory.

The space for species: how spatial aspects influence speciation

Spatial and temporal factors of speciation

The processes driving genetic differentiation, and the progressive development of populations along the speciation continuum, are complex in nature and influenced by a number of factors. Generally, on The G-CAT we have considered the temporal aspects of these factors: how much time is needed for genetic differentiation, how this might not be consistent across different populations or taxa, and how a history of environmental changes affects the evolution of populations and species. We’ve also touched on the spatial aspects of speciation and genetic differentiation before, but in significantly less detail.

To expand on this, we’re going to look at a few different models of how the spatial distribution of populations influences their divergence, and particularly how these factor into different processes of speciation.

What comes first, ecological or genetic divergence?

One key paradigm in understanding speciation is somewhat analogous to the “chicken and the egg” scenario, albeit with ecological vs. genetic divergence. This concept is based on the idea that two aspects are key for determining the formation of new species: genetic differentiation of the populations in question, and ecological (or adaptive) changes that provide new ecological niches for species to inhabit. Without both, we might have new morphotypes or ecotypes of a singular species (in the case of ecological divergence without strong genetic divergence) or cryptic species (genetically distinct but ecologically identical species).

The order of these two processes has been debated for some time, and different aspects of species and the environment can influence how (or if) these processes occur.

Different spatial models of speciation

Generally, when we consider the spatial models for speciation we divide these into distinct categories based on the physical distance of populations from one another. Although there is naturally a lot of grey area (as there is with almost everything in biological science), these broad concepts help us to define and determine how speciation is occurring in the wild.

Allopatric speciation

The simplest model is one we have described before, called “allopatry”. In allopatry, populations are distributed distantly from one another, so that they are separated and isolated. A common way to imagine this is as islands of populations separated by an ocean of unsuitable habitat.

Allopatric speciation is considered one of the simplest and oldest models of speciation, as the process is relatively straightforward. Geographic isolation of populations separates them from one another, meaning that gene flow is completely stopped and each population can evolve independently. Small changes in the genes of each population over time (e.g. due to different natural selection pressures) cause these populations to gradually diverge: eventually, this divergence will reach a point where the two populations would not be compatible (i.e. are reproductively isolated) and would thus be considered separate species.

Allopatry_example
The standard model of allopatric speciation, following an island model. 1) We start with a single population occupying a single island.  2) A rare dispersal event pushes some individuals onto a new island, forming a second population. Note that this doesn’t happen often enough to allow for consistent gene flow (i.e. the island was only colonised once). 3) Over time, these populations may accumulate independent genetic and ecological changes due to both natural selection and drift, and when they become so different that they are reproductively isolated they can be considered separate species.

Although relatively straightforward, one complex issue with allopatric speciation is providing evidence that hybridisation couldn’t happen if the populations were reconnected, or deciding whether populations can be considered separate species if they could hybridise, but only under forced conditions (i.e. it is highly unlikely that the two ‘species’ would interact outside of experimental conditions).

Parapatric and peripatric speciation

A step closer to bringing populations geographically together in speciation is “parapatry” and “peripatry”. Parapatric populations are often geographically close together but not overlapping: generally, the edges of their distributions touch but do not overlap one another. A good analogy would be to think of countries that share a common border. Parapatry can occur when a species is distributed across a broad area, but some form of narrow barrier cleaves the distribution in two: this can be the case across particular environmental gradients where the two extremes are preferred over the middle.

The main difference between parapatry and allopatry is the allowance of a ‘hybrid zone’. This is the region between the two populations which may not be a complete isolating barrier (unlike the space between allopatric populations). The strength of the barrier (and thus the amount of hybridisation and gene flow across the two populations) is often determined by the strength of the selective pressure (e.g. how unfit hybrids are). Parapatry is expected to reduce the rate and likelihood of speciation, as some (even if reduced) gene flow across populations reduces the amount of genetic differentiation between those populations: however, speciation can still occur.

Parapatric speciation across a thermocline.jpg
An example of parapatric species across an environmental gradient (in this case, a temperature gradient along the ocean coastline). Left: We have two main species (red and green fish) which are adapted to either hotter or colder temperatures (red and green in the gradient), respectively. A small zone of overlap exists where hybrid fish (yellow) occur due to the intermediate temperature. Right: How the temperature varies across the system, forming a steep gradient between hot and cold waters.

Related to this are peripatric populations. Peripatry differs from parapatry only slightly, in that one population is an original ‘source’ population and the other is a ‘peripheral’ population. This can happen when a new population is founded from the source by a rare dispersal event, generating a new (but isolated) population which may diverge independently of the source. Alternatively, peripatric populations can be formed when the broad, original distribution of the species is reduced during a population contraction, and a remnant piece of the distribution becomes fragmented and ‘left behind’ in the process, isolated from the main body. Speciation can then occur following processes similar to allopatric speciation if gene flow is entirely interrupted, or to parapatric speciation if it is significantly reduced but still present.

Peripatric distributions.jpg
The two main ways peripatric species can form. Left: The dispersal method. In this example, there is a central ‘source’ population (orange birds on the main island), which holds most of the distribution. However, occasionally (more frequently than in the allopatric example above) birds can disperse over to the smaller island, forming a (mostly) independent secondary population. If the gene flow between this population and the central population doesn’t overwhelm the divergence between the two populations (due to selection and drift), then a new species (blue birds) can form despite the gene flow. Right: The range contraction method. In this example, we start with a single widespread population (blue lizards) which has a rapid reduction in its range. However, during this contraction one population is separated from the main body (i.e. as a refugia), which may also be a precursor of peripatric speciation.

Sympatric (ecological) speciation

On the other end of the distribution spectrum, the two diverging populations undergoing speciation may actually have completely overlapping distributions. In this case, we refer to these populations as “sympatric”, and the possibility of sympatric speciation has been a highly debated topic in evolutionary biology for some time. One central argument rears its head against the possibility of sympatric speciation: if populations are co-occurring but not yet independent species, then gene flow should (theoretically) occur across the populations and prevent divergence.

It is in sympatric speciation that we see the opposite order of ecological and genetic divergence happen. Because of this, the process is often referred to as “ecological speciation”, where individual populations adapt to different niches within the same area, isolating themselves from one another by limiting their occurrence and tolerances. As the two populations are restricted from one another by some kind of ecological constraint, they genetically diverge over time and speciation can occur.

This can be tricky to visualise, so let’s invent an example. Say we have a tropical island, which is occupied by one bird species. This bird prefers to eat the large native fruit of the island, although there is another fruit tree which produces smaller fruits. However, there’s only so much space and eventually there are too many birds for the number of large fruit trees available. So, some birds are pushed to eat the smaller fruit, and adapt to a different diet, changing physiology over time to better acquire their new food and obtain nutrients. This shift in ecological niche causes the two populations to become genetically separated, as small-fruit-eating birds interact more with other small-fruit-eating birds than with large-fruit-eating birds. Over time, these divergences in genetics and ecology cause the two populations to form reproductively isolated species despite occupying the same island.

Ecological sympatric speciation
A diagram of the ecological speciation example given above. Note that ecological divergence occurs first, with some birds of the original species shifting to the new food source (‘ecological niche’) which then leads to speciation. An important requirement for this is that gene flow is somehow (even if not totally) impeded by the ecological divergence: this could be due to birds preferring to mate exclusively with other birds that share the same food type; different breeding seasons associated with food resources; or other isolating mechanisms.

Although this might sound like a simplified example (and it is, no doubt) of sympatric speciation, it’s a basic summary of how we ended up with so many species of Darwin’s finches (and why they are a great model for the process of evolution by natural selection).

The complexity of speciation

As you can see, the processes and context driving speciation are complex to unravel and many factors play a role in the transition from population to species. Understanding the factors that drive the formation of new species is critical to understanding not just how evolution works, but also in how new diversity is generated and maintained across the globe (and how that might change in the future).