What’s the (allele) frequency, Kenneth?

Allele frequency

A number of times before on The G-CAT, we’ve discussed the idea of using the frequency of different genetic variants (alleles) within a particular population or species to test a number of different questions about evolution, ecology and conservation. These are all based on the central notion that certain forces of nature will alter the distribution and frequency of alleles within and across populations, and that these patterns are somewhat predictable in how they change.

One particular distinction we need to make early here is the difference between allele frequency and allele identity. In these analyses, we are often working with the same alleles (i.e. particular variants) across our populations; it’s just that each of these populations may possess these particular alleles at different frequencies. For example, one population may have an allele (let’s call it Allele A) very rarely – maybe only 10% of individuals in that population possess it – but in another population it’s very common and perhaps 80% of individuals have it. This is a different level of differentiation than comparing how different alleles mutate (as in the coalescent) or how these mutations accumulate over time (like in many phylogenetic-based analyses).

Allele freq vs identity figure.jpg
An example of the difference between allele frequency and identity. In this example (and many of the figures that follow in this post), the circles denote different populations, within which there are individuals which possess either an A gene (blue) or a B gene. Left: If we compare Populations 1 and 2, we can see that they both have A and B alleles. However, these alleles vary in their frequency within each population, with an equal balance of A and B in Pop 1 and a much higher frequency of B in Pop 2. Right: However, when we compare Pop 3 and 4, we can see that not only do they vary in frequencies, they vary in the presence of alleles, with one allele in each population but not the other.
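To make the distinction concrete, here’s a minimal Python sketch that counts allele frequencies and checks which alleles two populations share. The populations are made up to loosely mirror the figure, and each letter is treated as a single allele copy (an assumption for simplicity; real data would usually be diploid genotypes).

```python
from collections import Counter

def allele_frequencies(alleles):
    """Return {allele: frequency} for a collection of allele copies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}

def shared_alleles(pop_x, pop_y):
    """Alleles present (at any frequency) in both populations."""
    return set(pop_x) & set(pop_y)

# Hypothetical populations, one letter per allele copy.
pop1 = list("AAAABBBB")  # equal balance of A and B
pop2 = list("ABBBBBBB")  # same alleles, but B is far more common
pop3 = list("AAAAAAAA")  # only A present
pop4 = list("BBBBBBBB")  # only B present

print(allele_frequencies(pop1), allele_frequencies(pop2))
print(shared_alleles(pop1, pop2))  # same alleles, different frequencies
print(shared_alleles(pop3, pop4))  # different identity: nothing shared
```

Pops 1 and 2 differ only in frequency, whereas Pops 3 and 4 share no alleles at all.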

Non-adaptive (neutral) uses

Testing neutral structure

Arguably one of the most standard uses of allele frequency data is the determination of population structure, one which more avid The G-CAT readers will be familiar with. This is based on the idea that populations that are isolated from one another are less likely to share alleles (and thus have similar frequencies of those alleles) than populations that are connected. This is because gene flow across two populations helps to homogenise the frequency of alleles within those populations, by either diluting common alleles or spreading rarer ones (in general). There are a number of programs that use allele frequency data to assess population structure, but one of the most common ones is STRUCTURE.

Gene flow homogeneity figure
An example of how gene flow across populations homogenises allele frequencies. We start with two initial populations (1 and 2 from above), which have very different allele frequencies. Hybridising individuals across the two populations means some alleles move from Pop 1 and Pop 2 into the hybrid population: which alleles move is random (the smaller circles). Because of this, the resultant hybrid population has an allele frequency somewhere in between the two source populations: think of it like mixing red and blue cordial and getting a purple drink.
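The cordial-mixing idea can be sketched in a few lines of Python. This is a deliberately simple deterministic model, not what STRUCTURE does: two equally sized populations swap a fraction `m` of their gene pool each generation (the function names, `m` and the starting frequencies are all assumptions for illustration).

```python
def exchange_migrants(p1, p2, m):
    """One generation of symmetric gene flow between two equally sized populations.

    p1, p2: frequency of Allele A in each population.
    m: fraction of each population replaced by migrants from the other.
    """
    return (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1

def simulate_gene_flow(p1, p2, m, generations):
    """Track both allele frequencies over time, starting from (p1, p2)."""
    history = [(p1, p2)]
    for _ in range(generations):
        p1, p2 = exchange_migrants(p1, p2, m)
        history.append((p1, p2))
    return history

# Hypothetical start: Allele A is common in Pop 1 and rare in Pop 2.
history = simulate_gene_flow(0.9, 0.1, m=0.1, generations=20)
for generation in (0, 5, 10, 20):
    print(generation, history[generation])
```

Under these assumptions the gap between the two populations shrinks by a factor of (1 - 2m) every generation, so both drift towards the shared average of the two starting frequencies.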

 

Simple YPP structure figure.jpg
An example of a Structure plot which long-term The G-CAT readers may be familiar with. This is taken from Brauer et al. (2013), where the authors studied the population structure of the Yarra pygmy perch. Each small column represents a single individual, with the colours representing how well the alleles of that individual fit a particular genetic population (each population has one colour). The numbers and broader columns refer to different ‘localities’ (different from populations) where individuals were sourced. This shows clear strong population structure across the 4 main groups, except for in Locality 6 where there is a mixture of Eastern and Merri/Curdies alleles.

Determining genetic bottlenecks and demographic change

Other neutral aspects of population identity and history can be studied using allele frequency data. One big component of understanding population history in particular is determining how the population size has changed over time, and relating this to bottleneck events or expansion periods. Although there are a number of different approaches to this, which span many types of analyses (e.g. also coalescent methods), allele frequency data is particularly suited to determining changes in the recent past (hundreds of generations, as opposed to thousands of generations ago). This is because we expect that, during a bottleneck event, it is statistically more likely for rare alleles (i.e. those with low frequency) in the population to be lost due to strong genetic drift: because of this, the population coming out of the bottleneck event should have an excess of more frequent alleles compared to a non-bottlenecked population. We can determine if this is the case with tests such as the heterozygosity excess, M-ratio or mode shift tests.

Genetic drift and allele freq figure
A diagram of how allele frequencies change in genetic bottlenecks due to genetic drift. Left: Large circles again denote a population (although across different sequential times), with smaller circles denoting which alleles survive into the next generation (indicated by the coloured arrows). We start with an initial ‘large’ population of 8, which is reduced down to 4 and 2 in respective future times. Each time the population contracts, only a select number of alleles (or individuals) ‘survive’: assuming no natural selection is in process, this is totally random from the available gene pool. Right: We can see that over time, the frequencies of alleles A and B shift dramatically, leading to the ‘extinction’ of Allele B due to genetic drift. This is because it is the less frequent allele of the two, and in the smaller population size has much less chance of randomly ‘surviving’ the purge of the genetic bottleneck.
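A rough simulation of the diagram above shows why the rarer allele is the one most likely to vanish. This is only a sketch: the starting gene pool (6 copies of A, 2 of B), the bottleneck sizes (8 to 4 to 2) and the function names are assumptions chosen to resemble the figure, and survivors are picked purely at random (no selection).

```python
import random

def survive_bottleneck(gene_pool, survivors, rng):
    """Randomly pick which allele copies make it through a bottleneck."""
    return rng.sample(gene_pool, survivors)

def loss_probability(gene_pool, sizes, allele, replicates=2000, seed=1):
    """Estimate how often `allele` is lost after successive bottlenecks."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    lost = 0
    for _ in range(replicates):
        pool = list(gene_pool)
        for size in sizes:
            pool = survive_bottleneck(pool, size, rng)
        if allele not in pool:
            lost += 1
    return lost / replicates

# Hypothetical starting gene pool: A is common, B is rare.
gene_pool = ["A"] * 6 + ["B"] * 2
print("P(lose A):", loss_probability(gene_pool, [4, 2], "A"))
print("P(lose B):", loss_probability(gene_pool, [4, 2], "B"))
```

With these numbers the rare allele B should be lost in roughly half of the replicates, while the common allele A is lost only occasionally.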

Adaptive (selective) uses

Testing different types of selection

We’ve also previously discussed how different types of natural selection can alter the distribution of allele frequencies within a population. There are a number of different predictions we can make based on the selective force and the overall population. For understanding particular alleles that are under strong selective pressure (i.e. are either strongly adaptive or maladaptive), we often test for alleles which have a frequency that strongly deviates from the ‘neutral’ background pattern of the population. These are called ‘outlier loci’, and the fact that their frequency differs so much from the average across the genome is attributed to natural selection placing strong pressure on either maintaining or removing that allele.

Other selective tests are based on the idea of correlating the frequency of alleles with a particular selective environmental pressure, such as temperature or precipitation. In this case, we expect that alleles under selection will vary in relation to the environmental variable. For example, if a particular allele confers a selective benefit under hotter temperatures, we would expect that allele to be more common in populations that occur in hotter climates and rarer in populations that occur in colder climates. This is referred to as a ‘genotype-environment association test’ and is a good way to detect polygenic selection (i.e. when multiple loci each contribute to a change in a single phenotypic trait).

Genotype by environment figure.jpg
An example of how the frequency of alleles might vary under natural selection in correlation to the environment. In this example, the blue allele A is adaptive and under positive selection in the more intense environment, and thus increases in frequency at higher values. Contrastingly, the red allele B is maladaptive in these environments and decreases in frequency. For comparison, the black allele shows how the frequency of a neutral (non-adaptive or maladaptive) allele doesn’t vary with the environment, as it plays no role in natural selection.
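The core logic of a genotype-environment association can be sketched as a plain correlation between allele frequency and an environmental variable across populations. The temperatures and frequencies below are invented for illustration, and real GEA methods also account for neutral population structure, which this sketch ignores entirely.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical mean temperature (degrees C) at five sampled populations.
temperature = [10, 15, 20, 25, 30]

# Hypothetical allele frequencies in those same five populations.
allele_a = [0.10, 0.25, 0.50, 0.70, 0.90]  # adaptive in hot environments
allele_b = [0.90, 0.75, 0.50, 0.30, 0.10]  # maladaptive in hot environments
neutral = [0.50, 0.52, 0.50, 0.52, 0.50]   # no relationship to temperature

for name, freqs in [("A", allele_a), ("B", allele_b), ("neutral", neutral)]:
    print(name, round(pearson_r(temperature, freqs), 3))
```

Alleles A and B show strong positive and negative correlations respectively, while the neutral allele shows essentially none.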

Taxonomic (species identity) uses

At one end of the spectrum of allele frequencies, we can also test for what we call ‘fixed differences’ between populations. An allele is considered ‘fixed’ if it is the only allele for that locus in the population (i.e. has a frequency of 1), whilst the alternative allele (which may exist in other populations) has a frequency of 0. Expanding on this, ‘fixed differences’ occur when one population has Allele A fixed and another population has Allele B fixed: thus, the two populations have as different allele frequencies (for that one locus, anyway) as possible.

Fixed differences are sometimes used as a type of diagnostic trait for species. This means that each ‘species’ has genetic variants that are not shared at all with its closest relative species, and that these variants are so strongly under selection that there is no diversity at those loci. Often, fixed differences are considered a level above populations that differ by allelic frequency only as these alleles are considered ‘diagnostic’ for each species.

Fixed differences figure.jpg
An example of the difference between fixed differences and allelic frequency differences. In this example, we have 5 cats from 3 different species, sequencing a particular target gene. Within this gene, there are three possible alleles: T, A or G respectively. You’ll quickly notice that the T allele is both unique to Species A and is present in all cats of that species (i.e. is fixed). This is a fixed difference between Species A and the other two. Alleles A and G, however, are present in both Species B and C, and thus are not fixed differences even if they have different frequencies.
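The cat example can be written as a small check, again as a hedged sketch. The allele lists below are invented to match the caption (5 cats per species is an assumption), and `has_fixed_difference` uses the looser sense from the figure: one group is fixed for an allele that the other group lacks entirely.

```python
def fixed_allele(alleles):
    """Return the allele if only one is present in the sample, else None."""
    unique = set(alleles)
    return next(iter(unique)) if len(unique) == 1 else None

def has_fixed_difference(pop_x, pop_y):
    """True if either population is fixed for an allele absent from the other."""
    for focal, other in ((pop_x, pop_y), (pop_y, pop_x)):
        allele = fixed_allele(focal)
        if allele is not None and allele not in other:
            return True
    return False

# Hypothetical alleles at the target gene, one per cat.
species_a = ["T", "T", "T", "T", "T"]  # T is fixed and unique to Species A
species_b = ["A", "A", "A", "A", "G"]  # mostly A
species_c = ["A", "G", "G", "G", "G"]  # mostly G

print(has_fixed_difference(species_a, species_b))  # True
print(has_fixed_difference(species_a, species_c))  # True
print(has_fixed_difference(species_b, species_c))  # False: frequencies differ, alleles are shared
```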

Intrapopulation (relatedness) uses

Allele frequency-based methods are even used in determining relatedness between individuals. While it might seem intuitive to just check whether individuals share the same alleles (and are thus related), it can be hard to distinguish between whether they are genetically similar due to direct inheritance or whether the entire population is just ‘naturally’ similar, especially at a particular locus. This is the distinction between ‘identical-by-descent’, where alleles that are similar across individuals have recently been inherited from a shared ancestor (e.g. a parent or grandparent), and ‘identical-by-state’, where alleles are similar just by chance. The latter doesn’t contribute to or determine relatedness, as all individuals (whether they are directly related or not) within a population may be similar.

To distinguish between the two, we often use the overall frequency of alleles in a population as a basis for determining how likely two individuals share an allele by random chance. If alleles which are relatively rare in the overall population are shared by two individuals, we expect that this similarity is due to family structure rather than population history. By factoring this into our relatedness estimates we can get a more accurate overview of how likely two individuals are to be related using genetic information.
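A quick back-of-the-envelope calculation shows why rare shared alleles are more informative than common ones. The sketch below assumes diploid individuals, random mating (Hardy-Weinberg proportions) and two unrelated individuals; the function names and the example frequencies are made up for illustration, and real relatedness estimators are considerably more involved.

```python
def carrier_probability(p):
    """Chance a random diploid individual carries at least one copy of an
    allele with population frequency p (assumes random mating)."""
    return 1 - (1 - p) ** 2

def chance_both_carry(p):
    """Chance that two unrelated individuals both carry the allele."""
    return carrier_probability(p) ** 2

for p in (0.8, 0.05):
    print(f"allele frequency {p}: both carry it by chance "
          f"{chance_both_carry(p):.4f} of the time")
```

Under these assumptions, two unrelated individuals will both carry a common allele (frequency 0.8) about 92% of the time, but will both carry a rare allele (frequency 0.05) less than 1% of the time, so sharing the rare one is much harder to explain by chance alone.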

The wild world of allele frequency

Despite appearances, this is just a brief foray into the many applications of allele frequency data in evolution, ecology and conservation studies. There is a plethora of different programs and methods that can utilise this information to address a variety of scientific questions and refine our investigations.

Notes from the Field: Cliff racer

Scientific name

Cinis descendens

Meaning: Cinis: from [ash] in Latin; descendens from [descends] in Latin.

Translation: descending from the ash; describes hunting behaviour in ash mountains of Vvardenfell.

Common name

Cliff racer

cliff racer
A cliff racer hovering above a precipice on Vvardenfell.

Taxonomic status

Kingdom Animalia; Phylum Chordata; Class Aves; Subclass Archaeornithes; Family Vvardidae; Genus Cinis; Species descendens

Conservation status

Least Concern [circa 3E 427]

Threatened [circa 4E 433]

Distribution

The cliff racer was once widespread throughout the north-eastern region of Tamriel, occupying regions from the island of Vvardenfell to mainland Morrowind and Solstheim. Despite their name, the cliff racer is found across nearly all geographic regions of Vvardenfell, although the species is found in greatest densities in the rocky interior region of Stonefalls.

Following a purge of the species as part of pest control management, the cliff racer was effectively exterminated from parts of its range, including local extinction on the island of Solstheim. Since the cull, the cliff racer has been much less abundant throughout its range, although it is still distributed throughout much of Vvardenfell and mainland Morrowind.

Morrowind
The province of Morrowind, which largely contains the distribution of the cliff racer. The island of Solstheim is found to the northwest of the map (the lower half of the island can be seen in brown).

Habitat

Although, much as the name suggests, the cliff racer prefers rocky outcroppings and mountainous regions in which it can build its nest, the species is frequently seen in lowland swamp and plains regions of Morrowind.

Behaviour and ecology

The cliff racer is a highly aggressive ambush predator, using height and range to descend on unsuspecting victims and lashing at them with its long, sharp tail. Although preferring to prey on small rodents and insects (such as kwama), cliff racers have been known to attack much larger beasts such as agouti and guar if provoked or desperate. The highly territorial nature of cliff racers means that they often attack travellers, even if they pose no immediate threat or have done nothing to provoke the animal.

Cliff_Racer_(Online).png
A cliff racer descends upon its prey.

Despite the territoriality of cliff racers, large flocks of them can often be found in the higher altitude regions of Vvardenfell, perhaps facilitated by an abundance of food (reducing competition) or communal breeding grounds. Attempts by researchers to study these aggregations have been limited due to constant attacks and damage to equipment by the flock.

Demography

Prior to the purging of cliff racers in the early 4E by Saint Jiub, the cliff racer was overly abundant throughout its range and considered a pest species by native peoples. Although formal studies on the population structure of the species were never conducted due to their aggressive nature, suppositions of migratory rates, distances and geographies suggested that potentially three major populations (ESUs) existed: one on Solstheim, one on Vvardenfell, and another on mainland Morrowind.

Following the control measures implemented, the size of these cliff racer populations declined severely; however, given the survival of the majority of the population, it does not appear this bottleneck has severely impacted the longevity of the species. The extirpation of the Solstheim population of cliff racers likely removed a unique ESU from the species, given the relative isolation of the island. Whether the island will be recolonised in time by Vvardenfell cliff racers is unknown, although the return of any cliff racers to Solstheim would likely be met with strong opposition from the local peoples.

Adaptive traits

The broad wings, dorsal sail and long tail allow the cliff racer to travel large distances in the air, serving them well in hunting behaviour. The drawback of this is that, if hunting during the middle hours of the day, the cliff racer leaves an imposing shadow on the ground and silhouette in the sky, often alerting aware prey to their presence. That said, the speed of descent and disorienting cry of the animal often startles prey long enough for the cliff racer to attack.

The plumes of the cliff racer are a well-sought-after commodity by local peoples, used in the creation of garments and household items. Whether these plumes serve any adaptive purpose (such as sexual selection through mate signalling) is unknown, given the difficulties with studying wild cliff racer behaviour.

Management actions

Although suffering from a strong population bottleneck after the purge, the cliff racer is still relatively abundant across much of its range and maintains a somewhat stable population size. Management and population control of the cliff racer is necessary across the full distribution of the species to prevent strong recovery and maintain public safety and ecosystem balance. Breeding or rescuing cliff racers is strictly forbidden and the species has been widely declared a ‘native pest’, despite the somewhat oxymoronic nature of the phrase.

Notes from the Field: Nugs

Scientific name

Nuggula minutus

Meaning: Nuggula from [nug] in Dwarven; minutus from [smaller] in Latin.

Translation: smallest of the nugs; the smallest species of the broader nug taxonomic group.

Common name

Common nug

Nug creature
A wild nug.

Taxonomic status

Kingdom Animalia; Phylum Chordata; Class Mammalia; Order Eulipotyphla; Family Talpidae; Genus Nuggula; Species minutus

Conservation status

Least concern

Distribution

Throughout the underground regions of Thedas; full extent of distribution possibly spans the full area of the continent.

Thedas Map.jpg
The continent of Thedas. The nug is likely distributed across much of the subterranean landmass, although the exact distribution is unknown.

Habitat

Nugs are a primarily subterranean species, largely inhabiting the underground tunnels and cave systems occupied by Dwarven civilisation. However, nugs can be found on the surface, predominantly in forested regions with accessible passageways into the subterranean realm.

Behaviour and ecology

Nugs are non-confrontational omnivorous species, preferring to hide and delve in the dark underground systems below the world of Thedas. Thus, nugs will typically avoid contact with people or predators by hiding in various crevices, using their pale skin to blend in with the surrounding rock faces. Reports of nugs in the wild demonstrate that nugs are remarkably inefficient at predator avoidance, despite their physiology; however, nug populations do not appear to suffer dramatically with predator presence, suggesting that either predators are too few to significantly impact population size or that alternative behaviours might allow them to rapidly bounce back from natural declines.

Given the lack of consistent light within their habitat, nugs are effectively blind, retaining only the limited eyesight required for moving around above the surface. Nugs feed on a large variety of food sources, preferring insects but resorting to mineral deposits if available food resources are depleted. Their generalist diet may be one physiological trait that has allowed the nug to become so widespread and abundant historically.

Demography

Although the nug is a widespread and abundant species, it is heavily reliant on the connections of the Deep Roads to maintain connectivity and gene flow. With the gradual decline of Dwarven abundance and the loss of entire regions of the underground civilisation, it is likely that many areas of the nug distribution have become isolated and are suffering from varying levels of inbreeding depression. Given the lack of access to these populations, whether some have collapsed since their isolation is unknown, and isolated populations may potentially have even speciated if local environments have changed significantly.

Adaptive traits

Nugs are highly adapted to low-light, subterranean conditions, and show many phenotypic traits related to this kind of environment. The reduction of eyesight capability is considered a regression of unusable traits in underground habitats; instead, nugs show a highly developed and specialised nasal system. The high sensitivity of the nasal cavity makes them successful foragers in the deep caverns of the underworld, and the elongated maw of the nug allows them to dig into buried food sources with ease. One of the more noticeable (and often disconcerting) traits of the nug is their human-like hands; the development of individual digits similar to fingers allows the nug to grip and manipulate rocky surfaces with surprising ease.

Management actions

Re-establishment of habitat corridors through the clearing and revival of the Deep Roads is critical not only for reconnecting isolated populations of nugs and restoring natural gene flow, but also for allowing access to remote populations for further studies. A combination of active removal of resident Darkspawn and population genetics analysis is needed to accurately assess the conservation status of the species. That said, given the commercial value of the nug as a food source for many societies, establishing consistent sustainable farming practices may serve to both boost the nug populations and also provide an industry for many people.

Moving right along: dispersal and population structure

The impact of species traits on evolution

Although we often focus on the genetic traits of species in molecular ecology studies, the physiological (or phenotypic) traits are equally important in shaping their evolution. These different traits are not only themselves the result of evolutionary forces but may further drive and shape evolution into the future by changing how an organism interacts with the environment.

There are a massive number of potential traits we could focus on, each of which could have a large number of different (and interacting) impacts on evolution. One that is often considered, and highly relevant for genetic studies, is the influence of dispersal capability.

Dispersal

Dispersal is essentially the process of an organism migrating to a new habitat, to the point of the two being used almost interchangeably. Often, however, we regard dispersal as a migration event that actually has genetic consequences; particularly, if new populations are formed or if organisms move from one population to another. This can differ from straight migration in that animals that migrate might not necessarily breed (and thus pass on genes) into a new region during their migration; thus, evidence of those organisms will not genetically proliferate into the future through offspring.

Naturally, the ability of organisms to disperse is highly variable across the tree of life and reliant on a number of other physiological factors. Marine mammals, for example, can disperse extremely far throughout their lifetimes, whereas some very localised species like some insects may not move very far within their lifetime at all. The movement of organisms directly facilitates the movement of genetic material, and thus has significant impacts on the evolution and genetic diversity of species and populations.

Dispersal vs pop structure
The (simplistic) relationship between dispersal capability and one aspect of population genetics, population structure (measured as Fst). As organisms become more capable of dispersing longer distances (or more frequently), the barriers between populations become weaker.

Highly dispersive species

At one end of the dispersal spectrum, we have highly dispersive species. These can move extremely long distances and thus mix genetic material from a wide range of habitats and places into one mostly-cohesive population. Because of this, highly dispersive species often have strong colonising abilities and can migrate into a range of different habitats by tolerating a wide range of conditions. For example, a single whale might hang around Antarctica for part of the year but move to the tropics during other times. Thus, this single whale must be able to tolerate both ends of the temperature spectrum.

As these individuals occupy large ranges, localised impacts are unlikely to critically affect their full distribution. Individual organisms that are occupying an unpleasant space can easily move to a more favourable habitat (provided that one exists). Furthermore, with a large population (which is more likely with highly dispersive species), genetic drift is substantially weaker and natural selection (generally) has a higher amount of genetic diversity to work with. This is, of course, assuming that dispersal leads to a large overall population, which might not be the case for species that are critically endangered (such as the cheetah).

Highly dispersive animals often fit the “island model” of Wright, where individual subpopulations all have equal proportions of migrants from all other subpopulations. In reality, this is rare (or unreasonable) due to environmental or physiological limitations of species; distance, for example, is not factored into the basic island model at all.

Island model
The Wright island model of population structure. In this example, different independent populations are labelled in the bold letters, with dispersal pathways demonstrated by the different arrows. In the island model, dispersal is equally likely between all populations (including between B and D in this example, even though there aren’t any arrows showing it). Naturally, this is not overly realistic and so the island model is used mostly as a neutral, base model.
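The simplistic link between dispersal and Fst shown earlier has a classic form under this island model: at equilibrium, Fst is approximately 1 / (4Nm + 1), where N is the effective size of each subpopulation and m is the proportion of migrants per generation. The sketch below just evaluates that approximation; it assumes an idealised island model (many equally sized populations, neutral loci, low mutation), and the population size and migration rates are arbitrary example values.

```python
def island_model_fst(n_e, m):
    """Approximate equilibrium Fst under Wright's island model.

    n_e: effective size of each subpopulation.
    m: proportion of each subpopulation made up of migrants per generation.
    """
    return 1.0 / (4.0 * n_e * m + 1.0)

# Hypothetical subpopulation size; migration rate ranges from none to high.
n_e = 100
for m in (0.0, 0.001, 0.01, 0.1):
    print(f"m = {m}: Fst is roughly {island_model_fst(n_e, m):.3f}")
```

Even a handful of migrants per generation (Nm of 1 or more) pulls the expected Fst down sharply, which is the same downward curve as in the dispersal figure.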

Intermediately dispersing species

A large number of species, however, are likely to occupy a more intermediate range of dispersal ability. These species might be able to migrate to neighbouring populations, or across a large proportion of their geographic range, but individuals from one end of the range are still somewhat isolated from individuals at the other end.

This often leads to some effect of population structure; different portions of the geographic range are genetically segregated from one another depending on how much gene flow (i.e. dispersal) occurs between populations. In the simplest scenario, this can lead to what we call isolation-by-distance. Rather than forming totally independent populations, gene flow occurs across short ranges between adjacent ‘populations’. This causes a gradient of genetic differentiation, with one end of the range being clearly genetically different to the other end, with a gradual slope throughout the range. We see this often in marine invertebrates, for example, which might have somewhat localised dispersal but still occupy a large range by following oceanographic currents.

River IDB network
An example of how an isolation-by-distance population network might come about. In this example, we have a series of populations (the different pie charts) spread throughout a river system (that blue thing). The different pie charts represent how much of the genetics of that population matches one end of the river: either the blue end (left) or red end (right). Populations can easily disperse into adjacent populations (the green arrows) but less so to further populations. This leads to gradual changes across the length of the river, with the far ends of the river clearly genetically distinct from the opposite end but relatively similar to neighbouring populations.
River IDB pop structure.jpg
The genetic representation of the above isolation-by-distance example. Each column represents a single population (in the previous figure, a pie chart), with the different colours also representing the relative genetic identity of that population. As you can see, moving from Population 1 to 10 leads to a gradient (decreasing) in blue genes but increase in red genes. The inverse can be said moving in the opposite direction. That said, comparing Population 1 and Population 10 shows that they’re clearly different, although there is no clear cut-off point across the range of other populations.
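The river example can be mimicked with a very simple ‘stepping stone’ sketch, where each population only swaps migrants with its immediate neighbours. Everything here is an assumption for illustration: 10 populations, a starting point where the left half is entirely ‘blue’ and the right half entirely ‘red’, a migration fraction `m`, and a deterministic update that ignores genetic drift and mutation.

```python
def stepping_stone(freqs, m, generations):
    """Deterministic gene flow between adjacent populations only.

    freqs: frequency of the 'blue' allele in each population, in river order.
    m: fraction of a population replaced by migrants per generation,
       split evenly between its upstream and downstream neighbours
       (end populations have one neighbour, so they receive m / 2).
    """
    freqs = list(freqs)
    for _ in range(generations):
        updated = []
        for i, p in enumerate(freqs):
            neighbours = [freqs[j] for j in (i - 1, i + 1) if 0 <= j < len(freqs)]
            incoming = sum((m / 2) * q for q in neighbours)
            staying = 1 - (m / 2) * len(neighbours)
            updated.append(staying * p + incoming)
        freqs = updated
    return freqs

# Hypothetical start: Populations 1-5 all blue, Populations 6-10 all red.
start = [1.0] * 5 + [0.0] * 5
river = stepping_stone(start, m=0.2, generations=10)
for number, blue in enumerate(river, start=1):
    print(f"Population {number}: {blue:.2f} blue, {1 - blue:.2f} red")
```

After a few generations the sharp boundary becomes a smooth gradient, with the two ends still clearly distinct. Left to run indefinitely, this drift-free version would eventually blend everything together, so it illustrates the shape of the pattern rather than how it is maintained.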

Medium dispersal capabilities are also often a requirement for forming ‘metapopulations’. In this population arrangement, several semi-independent populations are present within the geographic range of the species. Each of these is subject to its own local environmental pressures and demographic dynamics, and because of this may go locally extinct at any given time. However, dispersal connections between many of these populations lead to recolonisation and gene flow, allowing extinction-recolonisation dynamics to sustain the overall metapopulation. Generally, this would require greater levels of dispersal than those typically found within isolation-by-distance species, as individuals must traverse uninhabitable regions relatively frequently to recolonise locally extinct habitat.

Metapopulation structure.jpg
An example of metapopulation dynamics. Different subpopulations (lettered circles) are connected via dispersal (arrows). These different subpopulations can be different sizes and are mostly independent of one another, meaning that a single subpopulation can go locally extinct (the red X) without collapsing the entire system. The different dispersal pathways mean that one population can recolonise extinct habitat and essentially ‘rebirth’ other subpopulations (the green arrows).

Weakly dispersing species

At the far opposite end of the dispersal ability spectrum, we have low dispersal species. These are often localised, endemic species that for various reasons might be unable to travel very far at all; some may spend their entire adult life in a sedentary form. The lack of dispersal leads to very strong levels of population structure, and individual populations often accumulate genetic differences relatively quickly due to genetic drift or local adaptation.

Species with low dispersal capabilities are often at risk of local extinction and are unable to easily recolonise these habitats after the event has ended. Their movement is often restricted to rare environmental events such as flooding that carry individuals long distances despite their physiological limitations. Because of this, low dispersal species are often at greater risk of total extinction and extinction vortices than their higher dispersing counterparts.

Accounting for dispersal in population genetics

Incorporating biological and physiological aspects of our study taxa is important for interpreting the evolutionary context of species. Dispersal ability is but one of many characteristics that can influence the ability of species to respond to selective pressures, and the context in which this natural selection occurs. Thus, understanding all aspects of an organism is important in building the full picture of their evolution and future prospects.