In the 18 years since the completion of the Human Genome Project, the practicality of assembling full genomes for a wide range of taxa beyond ourselves has only improved. While model taxa systems have achieved genomes before many others, it is now possible for whole genomes to be assembled for a range of non-model organisms as well. But how do we assemble the genome of a species for the very first time (often de novo – literally “from the new”)? What can we do with this genome? Why is it so useful? Let’s delve into the process and outcomes of genome assembly a little more.
Unravelling the evolutionary history of organisms – one of the main goals of phylogenetic research – remains a challenging prospect due to a number of theoretical and analytical aspects. Particularly, trying to reconstruct evolutionary patterns based on current genetic data (the most common way phylogenetic trees are estimated) is prone to the erroneous influence of some secondary factors. One of these is referred to as ‘incomplete lineage sorting’, which can have a major effect on how phylogenetic relationships are estimated and the statistical confidence we may have around these patterns. Today, we’re going to take a look at incomplete lineage sorting (shortened to ILS for brevity herein) using a game-based analogy – a Pachinko machine. Or, if you’d rather, the same general analogy also works for those creepy clown carnival games, but I prefer the less frightening alternative.
It’s been a few minutes (okay, several weeks) since the last post here on The G-CAT. Naturally, over that time I’ve spent holidays with both my own family and my partner’s family. Hopefully, you’ve enjoyed your own Christmas/New Year/Other-Non-Denominational-Celebrations break (and for us Aussies, that you’ve managed to avoid much of the devastation of the recent bushfire epidemic).
Because of this period of time (and a few other more pressing deadlines I had for the start of the new year), I haven’t prepared a new post in some time. However, I’d like to take this time to address how the nature of this blog might change of the next year or so (and into the future).
A new schedule
For those of you who keep more up-to-date with my academic progress, you’ll be aware that this year is the final year of my PhD. As it stands, I’m due to submit my thesis in August of this year (which feels much, much sooner than it really is). Similarly, for anyone who has ever interacted with a PhD student in the final year of their studies, you’ll also be aware that this can be a time of high stress, stacking deadlines and the overall impending doom of D-Day (thesis submission).
In light of all of this, I have decided to move away from the more predictable fortnightly post routine in favour of a more organic timetable. This will likely mean a fairly significant reduction in the frequency of blog posts, whereby I will post as a topic comes into my mind or when it appears relevant to other parts of my studies (e.g. in reading for writing manuscripts, etc.). This decision has also been playing on my mind for some time to also balance the quality of the posts I write: in some circumstances, I feel like the consistent deadline of once per fortnight causes some posts to suffer a little as I rush to produce something at least in the vicinity of every second Wednesday.
The future of The G-CAT
It is not my intention to completely abandon this project: The G-CAT is something that I have invested a fair time and inspiration into and provides a solid avenue for science communication. As always, my inbox (on whichever platform you choose) is wide open for suggestions on topics of discussion. I’m looking forward to a more organic schedule that will allow me to properly explore and expand on the topics of interest whilst maintaining a healthy balance of PhD progression and down-time.
Further to this, we can expand the site-frequency spectrum to compare across populations. Instead of having a simple 1-dimensional frequency distribution, for a pair of populations we can have a grid. This grid specifies how often a particular allele occurs at a certain frequency in Population A and at a different frequency in Population B. This can also be visualised quite easily, albeit as a heatmap instead. We refer to this as the 2-dimensional SFS (2DSFS).
The same concept can be expanded to even more populations, although this gets harder to represent visually. Essentially, we end up with a set of different matrices which describe the frequency of certain alleles across all of our populations, merging them together into the joint SFS. For example, a joint SFS of 4 populations would consist of 6 (4 x 4 total comparisons – 4 self-comparisons, then halved to remove duplicate comparisons) 2D SFSs all combined together. To make sense of this, check out the diagrammatic tables below.
The different forms of the SFS
Which alleles we choose to use within our SFS is particularly important. If we don’t have a lot of information about the genomics or evolutionary history of our study species, we might choose to use the minor allele frequency (MAF). Given that SNPs tend to be biallelic, for any given locus we could have Allele A or Allele B. The MAF chooses the least frequent of these two within the dataset and uses that in the summary SFS: since the other allele’s frequency would just be 2N – the frequency of the other allele, it’s not included in the summary. An SFS made of the MAF is also referred to as the folded SFS.
Alternatively, if we know some things about the genetic history of our study species, we might be able to divide Allele A and Allele B into derived or ancestral alleles. Since SNPs often occur as mutations at a single site in the DNA, one allele at the given site is the new mutation (the derived allele) whilst the other is the ‘original’ (the ancestral allele). Typically, we would use the derived allele frequency to construct the SFS, since under coalescent theory we’re trying to simulate that mutation event. An SFS made of the derived alleles only is also referred to as the unfolded SFS.
Applications of the SFS
How can we use the SFS? Well, it can moreorless be used as a summary of genetic variation for many types of coalescent-based analyses. This means we can make inferences of demographic history (see here for more detailed explanation of that) without simulating large and complex genetic sequences and instead use the SFS. Comparing our observed SFS to a simulated scenario of a bottleneck and comparing the expected SFS allows us to estimate the likelihood of that scenario.
The SFS can even be used to detect alleles under natural selection. For strongly selected parts of the genome, alleles should occur at either high (if positively selected) or low (if negatively selected) frequency, with a deficit of more intermediate frequencies.
Adding to the analytical toolbox
The SFS is just one of many tools we can use to investigate the demographic history of populations and species. Using a combination of genomic technologies, coalescent theory and more robust analytical methods, the SFS appears to be poised to tackle more nuanced and complex questions of the evolutionary history of life on Earth.
Meaning: Octorokus from [octorok] in Hylian; infletus from [inflate] in Latin.
Translation: inflating octorok; all varieties use an inflatable air sac derived from the swim bladder to float and scan the horizon.
Octorokus infletus hydros [aquatic morphotype]
Octorokus infletus petram [mountain morphotype]
Octorokus infletus silva [forest morphotype]
Octorokus infletus arctus [snow morphotype]
Octorokus infletus imitor [deceptive morphotype]
Kingdom Animalia; Phylum Mollusca; Class Cephalapoda; Order Octopoda; Family Octopididae; GenusOctorokus; Speciesinfletus
The species is found throughout all major habitat regions of Hyrule, with localised morphotypes found within specific habitats. The only major region where the variable octorok is not found is within the Gerudo Desert, suggesting some remnant dependency of standing water.
Habitat choice depends on the physiology of the morphotype; so long as the environment allows the octorok to blend in, it is highly likely there are many around (i.e. unseen).
Behaviour and ecology
The variable octorok is arguably one of the most diverse species within modern Hyrule, exhibiting a large number of different morphotypic forms and occurring in almost all major habitat zones. Historical data suggests that the water octorok (Octorokus infletus hydros) is the most ancestral morphotype, with ancient literature frequently referring to them as sea-bearing or river-traversing organisms. Estimates from the literature suggests that their adaptation to land-based living is a recent evolutionary step which facilitated rapid morphological radiation of the lineage.
Several physiological characteristics unite the variable morphological forms of the octorok into a single identifiable species. Other than the typical body structure of an octopod (eight legs, largely soft body with an elongated mantle region), the primary diagnostic trait of the octorok is the presence of a large ‘balloon’ with the top of the mantle. This appears to be derived from the swim bladder of the ancestral octorok, which has shifted to the cranial region. The octorok can inflate this balloon using air pumped through the gills, filling it and lifting the octorok into the air. All morphotypes use this to scan the surrounding region to identify prey items, including attacking people if aggravated.
Diets of the octorok vary depending on the morphotype and based on the ecological habitat; adaptations to different ecological niches is facilitated by a diverse and generalist diet.
Although limited information is available on the amount of gene flow and population connectivity between different morphotypes, by sheer numbers alone it would appear the variable octorok is highly abundant. Some records of interactions between morphotypes (such as at the water’s edge within forested areas) implies that the different types are not reproductively isolated and can form hybrids: how this impacts resultant hybrid morphotypes and development is unknown. However, given the propensity of morphotypes to be largely limited to their adaptive habitats, it would seem reasonable to assume that some level of population structure is present across types.
The variable octorok appears remarkably diverse in physiology, although the recent nature of their divergence and the observed interactions between morphological types suggests that they are not reproductively isolated. Whether these are the result of phenotypic plasticity, and environmental pressures are responsible for associated physiological changes to different environments, or genetically coded at early stages of development is unknown due to the cryptic nature of octorok spawning.
All octoroks employ strong behavioural and physiological traits for camouflage and ambush predation. Vegetation is usually placed on the top of the cranium of all morphotypes, with the exact species of plant used dependent on the environment (e.g. forest morphotypes will use grasses or ferns, whilst mountain morphotypes will use rocky boulders). The octorok will then dig beneath the surface until just the vegetation is showing, effectively blending in with the environment and only occasionally choosing to surface by using the balloon. Whether this behaviour is passed down genetically or taught from parents is unclear.
Few management actions are recommended for this highly abundant species. However, further research is needed to better understand the highly variable nature and the process of evolution underpinning their diverse morphology. Whether morphotypes are genetically hardwired by inheritance of determinant genes, or whether alterations in gene expression caused by the environmental context of octoroks (i.e. phenotypic plasticity) provides an intriguing avenue of insight into the evolution of Hylian fauna.
Nevertheless, the transition from the marine environment onto the terrestrial landscape appears to be a significant stepping stone in the radiation of morphological structures within the species. How this has been facilitated by the genetic architecture of the octorok is a mystery.