Sweeping under the genomic rug: hard and soft sweeps

Of alleles and selection

If you’ve read this blog more than once before, you’re probably sick of hearing about how genetic variation underlies adaptation. It’s probably the most central theme of this blog, and similarly one of the biggest components of contemporary biology. We’ve talked about different types of selection; different types of genes; different ways genes and selection can interact. And believe it or not, there’s still heaps to talk about!

The distribution of selection across the genome

An area we’ve touched on before is how selection varies across the genome, and may be concentrated in single regions or spread out across multiple genes. This relates to the mode of selection and how the frequency of genetic variants changes over both time (from generation to generation) and across the genome itself. These in turn affect how, and how quickly, adaptation occurs within a population or species.

One particular issue with determining how adaptive genetic variation may spread (or not) throughout a population relates to two major factors: the origin of the genetic variation, and the rate at which the adaptive allele is ‘swept’ throughout the population. Categorically, we could (and frequently do) break these down into different scenarios: a ‘soft’ sweep, or a ‘hard’ sweep. So what do these words mean?

Hard sweeps

One of the more typical ways we might understand how genetic variation drives the evolution of traits within species is through mutation and selection. In this scenario, a single mutation event generates one new allele, which may be beneficial (adaptive), detrimental (maladaptive), or neutral. If it’s very strongly adaptive, this allele could very readily spread throughout the population in question based on the fundamental processes of natural selection. This is what we describe as a hard sweep, and it has a few different consequences beyond just driving adaptation.

In a hard sweep, the arrival of a new and strongly adaptive allele into the gene pool is inevitably ‘linked’ with other genetic variants shared by the genome of origin. This can be a little confusing to think about, so we can instead think of genomes as individual people. If the mutation appears in a person, then it is inevitably linked to the other traits of that person: maybe blonde hair, or green eyes, or a weird gangly leg. Who knows.

Where this does matter is in future generations: given that this particular mutation is highly adaptive, it will inevitably spread itself throughout the population. However, when it does, it also drags along with it other alleles that are closely linked (see a more thorough description of linkage here) to it. As a result, when the allele has swept throughout the entire population, it inadvertently causes linked alleles to also sweep, increasing their frequency. In the situation of the gangly person, this might mean that the frequency of genes that cause blonde hair, green eyes and weird legs all increase, even if only a single one of them is actually adaptive (assuming that these traits are all closely linked).

Hard sweep figure
An example of how a hard sweep leads to changes in allele frequencies. In this figure, each row indicates the genome of one individual, the various blocks indicate particular genes, and the colours of the blocks indicate the allele (or variant) of said gene. The bottom row of figures shows the relative frequencies of the different alleles, and how these change over time. Initially, the population has some diversity, but not in one particular gene (brown) (1). A mutation event (*) introduces new diversity into this gene, which becomes the source of new selection (2). Because this initial mutation only occurs in one select genome, it is immediately associated with (i.e. linked to) particular variants of the other genes (light blue box). Given that genes tend to be inherited in segments due to linkage blocks and recombination, these collateral alleles also tend to be inherited with the actual allele under selection (remember, selection is only directly acting on the brown gene). Over time, this causes the frequencies of these associated alleles to also increase along with the brown allele (3). A long way down the road, this eventually results in a loss of genetic diversity around the gene under selection as only the initially linked variant remains in the population (4).
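To make the ‘dragging along’ of linked alleles a bit more concrete, here is a minimal sketch of a hard sweep at two loci. It is a deterministic toy model rather than a real population-genetic simulation: it assumes haploid individuals, an effectively infinite population (no drift), and made-up values for the selection strength `s`, the starting frequency of the new mutation `p0` and the starting frequency of the linked neutral allele `q0`. The function name `hitchhike` and the allele labels A/a and B/b are purely illustrative.

```python
def hitchhike(r, s=0.1, p0=0.001, q0=0.2, generations=400):
    """Return the final frequency of neutral allele B after adaptive allele A sweeps.

    Deterministic haploid two-locus sketch: selection acts only on A,
    and r is the recombination rate between the two loci.
    All default parameter values are illustrative assumptions.
    """
    # The new mutation A arises once, on a genome that happens to carry B.
    x_AB, x_Ab = p0, 0.0
    x_aB, x_ab = (1 - p0) * q0, (1 - p0) * (1 - q0)

    for _ in range(generations):
        # Selection: genomes carrying A have relative fitness 1 + s.
        mean_fitness = 1 + s * (x_AB + x_Ab)
        x_AB, x_Ab = x_AB * (1 + s) / mean_fitness, x_Ab * (1 + s) / mean_fitness
        x_aB, x_ab = x_aB / mean_fitness, x_ab / mean_fitness

        # Recombination erodes the association between A and B
        # (the linkage disequilibrium, d) by a fraction r each generation.
        d = x_AB * x_ab - x_Ab * x_aB
        x_AB, x_ab = x_AB - r * d, x_ab - r * d
        x_Ab, x_aB = x_Ab + r * d, x_aB + r * d

    return x_AB + x_aB


for r in (0.0, 0.001, 0.01, 0.1, 0.5):
    print(f"r = {r:<5}  final frequency of linked allele B = {hitchhike(r):.3f}")
```

With no recombination the linked allele is carried all the way to fixation alongside the adaptive one, whereas a locus that recombines freely (r = 0.5) barely moves from its starting frequency. Intermediate values of r fall in between, which is the distance-dependent ‘drag’ described above.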

When we observe genetic frequencies across the genome, hard sweeps often leave very detectable signals of a ‘peak’ surrounding the adaptive mutation. These peaks often have unusually low genetic variation (since the adaptive allele ‘outcompetes’ alternatives, and only linked variants spread, not alternative alleles on different ‘people’). For the person analogy, this might mean alternative hair colours, eye colours and leg shapes are removed from the population as the adaptive trait sweeps throughout it.

Hard sweep Fst figure
An example of how hard sweeps can create ‘peaks’ in genetic differentiation. In this example, we’re comparing allele frequencies between two different populations using Fst, a measure of genetic differentiation: one population where the mutation has occurred (and undergone a hard sweep) and one without it. As you can see, the hard sweep drives a sharp increase in differentiation at the locus under selection (green arrow), but also drags up neighbouring loci based on their linkage (yellow arrows). The strength of this linkage tends to decay with distance, so further along the genome alleles are no longer affected.
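For a rough sense of where such a peak comes from, the sketch below calculates Fst at a row of loci from the allele frequencies in two populations. It uses one simple heterozygosity-based definition, Fst = (Ht - Hs) / Ht, for loci with two alleles and equally weighted populations; real analyses use estimators that also correct for sample size. The allele frequencies are invented for illustration (a sweep centred on the middle locus) and do not come from the figure.

```python
def fst(p1, p2):
    """Fst at one two-allele locus, given the allele frequency in each of two populations."""
    p_mean = (p1 + p2) / 2
    h_total = 2 * p_mean * (1 - p_mean)  # expected heterozygosity if the populations were pooled
    if h_total == 0:
        return 0.0  # the locus is not variable at all, so there is no differentiation to measure
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean heterozygosity within populations
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies at nine neighbouring loci.
# The swept population has a fixed allele at the middle locus (index 4),
# with linked alleles dragged up on either side of it.
swept = [0.22, 0.30, 0.62, 0.95, 1.00, 0.93, 0.60, 0.28, 0.21]
unswept = [0.20] * len(swept)

fst_scan = [fst(a, b) for a, b in zip(swept, unswept)]
for position, value in enumerate(fst_scan):
    print(f"locus {position}: Fst = {value:.3f} " + "#" * round(value * 30))
```

The printed bars form a crude version of the peak: highest at the selected locus and tapering off on both sides as linkage weakens.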

Soft sweeps

This process, and its outcomes, contrast with those of a soft sweep. In a soft sweep, instead of a new mutation occurring, there is already genetic variation present at the locus before selection acts. Selection acts to change the frequency of the adaptive allele in a much more subtle way, resulting in a gradual shift in frequency over a longer period of time. Because of this prior variation, it becomes much more difficult for the adaptive allele to completely swamp out and remove other alleles, thereby avoiding the reduction of genetic variation caused by hard sweeps.

Soft sweep figure
How a soft sweep affects allele frequencies across the genome. In this example, genetic diversity is already present at all genes before selection acts (1). Then, a change in the selective environment causes one particular allele (dark brown) to become favoured (2). Unlike in hard sweeps, this adaptive allele is associated with multiple different alleles which can be co-inherited with it (dashed boxes). This means that individuals in the following generation might inherit the different neighbouring alleles even if the frequency of the selected allele increases (3). Over time, this means that genetic diversity is not lost in the linked genes, even if selection drives total fixation (i.e. loss of diversity) within the directly selected gene (4).

Alternatively, soft sweeps can occur if multiple individual adaptive mutations occur at the exact same site: given that all alleles are approximately equivalent, it’s unlikely for any one allele to completely swamp out the others.

Soft sweep multiple mutations figure
How multiple mutations at the same site can lead to a soft sweep. This scenario is almost identical to a hard sweep, except this time there are multiple mutations (*) at a single site. Because both of these new mutations are adaptive, and are associated with different nearby alleles (indicated by the dashed boxes), when selection favours these mutations and they spread throughout the population, they drag multiple linked alleles with them. Thus, genetic diversity in linked loci is not lost during the sweep, unlike in a hard sweep.
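The difference between the two kinds of sweep can be sketched with a small toy calculation. Here each genome is reduced to a pair: the allele at the selected gene (‘A’ is adaptive, ‘a’ is not) and the allele at one tightly linked gene (a colour, echoing the figures). The model is deterministic, ignores recombination and drift, and uses invented starting frequencies and an invented selection strength `s`; the function names are hypothetical.

```python
def sweep_without_recombination(haplotypes, s=0.1, generations=500):
    """Apply selection for allele 'A' each generation and return the final frequencies.

    haplotypes maps (selected_allele, linked_allele) -> starting frequency.
    """
    freqs = dict(haplotypes)
    for _ in range(generations):
        # Genomes carrying the adaptive allele 'A' have relative fitness 1 + s.
        weighted = {h: f * (1 + s if h[0] == "A" else 1.0) for h, f in freqs.items()}
        mean_fitness = sum(weighted.values())
        freqs = {h: w / mean_fitness for h, w in weighted.items()}
    return freqs


def linked_diversity(freqs):
    """Expected heterozygosity (1 - sum of squared frequencies) at the linked gene."""
    linked = {}
    for (_, linked_allele), f in freqs.items():
        linked[linked_allele] = linked.get(linked_allele, 0.0) + f
    return 1.0 - sum(f * f for f in linked.values())


# Hard sweep: the adaptive allele exists on a single genetic background.
hard_start = {("A", "blue"): 0.001,
              ("a", "blue"): 0.333, ("a", "green"): 0.333, ("a", "red"): 0.333}

# Soft sweep: the adaptive allele already sits on several backgrounds,
# whether from standing variation or from repeated mutation.
soft_start = {("A", "blue"): 0.02, ("A", "green"): 0.02, ("A", "red"): 0.02,
              ("a", "blue"): 0.31, ("a", "green"): 0.31, ("a", "red"): 0.32}

hard_after = linked_diversity(sweep_without_recombination(hard_start))
soft_after = linked_diversity(sweep_without_recombination(soft_start))
print(f"linked diversity after hard sweep: {hard_after:.3f}")
print(f"linked diversity after soft sweep: {soft_after:.3f}")
```

In both cases the adaptive allele ends up essentially fixed, but only the hard sweep wipes out diversity at the linked gene. In the soft sweep, whatever mix of backgrounds carried the adaptive allele at the start is what remains at the end.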

How common are these sweeps?

While there are cases of hard sweeps throughout the biodiversity of the planet, the majority of adaptive genetic changes appear to be driven by soft sweeps from already existing genetic variation. Particularly, complex soft sweeps often seem to underlie polygenic adaptation: that is, when the evolution of a trait is driven by shifts in many different genes concurrently. This is far more common in biology than adaptation from a single locus, although that’s not to say that this never happens. In terms of responding to rapid selective pressures – such as adapting to current climate change – pre-existing genetic diversity appears critical in facilitating an evolutionary response.

Combined sweeps
How both types of selective sweep impact genetic differentiation across the genome: hard sweeps leave distinctive ‘peaks’, as if sucked up by a strong vacuum cleaner, whereas soft sweeps leave more subdued mounds, as if pushed by a weaker broom.

From a conservation perspective, this leads to a few different realisations. One of the most critical is the importance of maintaining genetic diversity in natural populations and species. Mutations are relatively rare over time and across the genome (compared to the scale of both), and do not often suddenly confer adaptive benefits. However, pre-existing genetic diversity (referred to as ‘standing genetic variation’) provides a template for species to adapt to changing conditions, boosting their chances of being able to respond to a dramatically shifting climate. Thus, aiming to maintain genetic diversity in wild populations remains a critical component for giving species the best chance to survive under climate change.
