‘The Building Block of Life’

Before we can delve too deep into the expansive world of molecular ecology (that is, the study of evolution using genetic information), we must first understand the basics of genetics.

All organisms (except some viruses, if you count those) contain DNA: you may have heard it referred to as ‘the building block of life’. And this is fundamentally true: DNA is a chemical compound contained within cells which acts as a technical blueprint the cell uses to make all of the parts of the body. In its stable state, DNA looks like a ‘twisted ladder’, or a ‘double helix’ as we call it. The rungs of the DNA ladder are made of ‘nucleotide bases’, which are shortened to G (guanine), C (cytosine), A (adenine) and T (thymine). Hopefully, these letters look a little familiar (see the top of the page…). Each one of these is always paired with a specific partner: A is always paired with T, and G with C. One ‘pair’ of bases makes up one rung of the ladder.

DNA structure: The (very simplified) structure of the DNA double helix. Bonus points if you spot the blog title.
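If you like seeing rules written down as code, the pairing rule for the rungs (A with T, G with C) fits in a few lines of Python. This is only a toy sketch: the names `PAIRS` and `complement` are made up for illustration, and it assumes the strand contains nothing but the four letters.

```python
# Pairing rule for the rungs of the ladder: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner base for every base along one side of the ladder."""
    return "".join(PAIRS[base] for base in strand.upper())


one_side = "GATTACA"
other_side = complement(one_side)

# Each (base, partner) tuple is one rung of the ladder.
rungs = list(zip(one_side, other_side))
print(other_side)
print(rungs)
```

Because the pairing is fixed, knowing one side of the ladder tells you the other side exactly: running `complement` twice gives back the strand you started with.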

These letters of the DNA more-or-less spell out the basis for making all of the different proteins of the body. Specific sequences will say, for example, where to start reading the code (the capital at the start of the sentence) for a particular protein, while others will tell it where to stop (the full stop at the end of the sentence). The rest of the sentence is translated into the protein and is what we call a ‘gene’.
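The ‘capital letter and full stop’ idea can be sketched in Python. The specific signals used here are not named above: ATG is the usual ‘start’ signal and TAA, TAG and TGA are the usual ‘stop’ signals, read three letters at a time. The function name `find_coding_sentence` is invented for this example, and the sketch ignores a lot of real biology (the cell actually reads a copy of the DNA, for a start).

```python
from typing import Optional

START = "ATG"                  # the 'capital letter' at the start of the sentence
STOPS = {"TAA", "TAG", "TGA"}  # the 'full stops' at the end of the sentence


def find_coding_sentence(dna: str) -> Optional[str]:
    """Return the first stretch running from a start signal to a stop signal.

    Letters are read in groups of three, beginning at the start signal.
    Returns None if there is no start signal or no stop signal in step with it.
    """
    dna = dna.upper()
    start = dna.find(START)
    if start == -1:
        return None
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOPS:
            return dna[start:i + 3]
    return None


print(find_coding_sentence("CCATGGCCTTTTAAGG"))
```

In the toy sequence the letters before ATG and after TAA are ignored, just as text outside a sentence would be.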

Despite the importance of genes, not all of the DNA is actually made of them. In fact, it’s estimated that only about 1.5% of the human genome (the genome being the collection of all the DNA sequence in an organism) actually codes for proteins: the rest of it is attributed to other things like control sequences, ‘junk’ DNA or sequences that code for things other than proteins (like RNAs, another type of nucleic acid). Some sections of a gene’s sequence are also ‘cut out’ during the process of translating the gene into a protein; these are called ‘introns’ and are considered non-coding regions. It’s sort of like when you’re 100 words over the word count of an essay and have to start chopping sentences into smaller pieces. The parts that aren’t cut out, and actually translate to the protein, are called ‘exons’.
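The cutting-out of introns can be pictured as slicing a string and gluing the kept pieces together. In this rough Python sketch the gene sequence, the exon positions and the function name `splice` are all invented for illustration; the introns are written in lowercase only so they are easy to spot.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon stretches (start, end) and join them in order."""
    return "".join(gene[start:end] for start, end in exons)


# A made-up gene: two exons (uppercase) separated by one intron (lowercase).
gene = "ATGGCC" + "gtaagtttag" + "TTTTAA"
exons = [(0, 6), (16, 22)]  # hypothetical positions of the two exons

print(splice(gene, exons))
```

Only 12 of the 22 letters survive the edit, which is the essay-trimming analogy in miniature.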

While the exact code of the gene is important, not all genes are expressed at the same time or constantly. Many genes are ‘switched on’ (activated) or ‘switched off’ (deactivated) by other influences, usually different micro-sequences or proteins which either block the gene from being read or allow it. For example, the gene that creates the protein to digest lactose isn’t always active: only when lactose enters the cell and binds to a specific protein that rests on top of the lactose-digesting gene, removing it, does the cell start reading the gene. This is because it’d be a total waste to make lactose-digesting proteins if there were no lactose around at all.

Genome structure: The generalised structure of the genome. Note that much of it is not made of genes. Within the gene, only the exon regions are translated into the final protein; the intron sequences are removed in an intermediate copy of the DNA (called the ‘mRNA’). The expression of the gene is controlled by the presence or absence of the repressor protein.
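The lactose switch boils down to a simple piece of logic, sketched here in Python. This is a deliberately crude on/off model with an invented function name; real gene regulation is a matter of degree rather than a strict yes or no.

```python
def lactose_gene_is_expressed(lactose_present: bool) -> bool:
    """Toy model of the repressor switch described above."""
    # With no lactose around, the repressor protein sits on the gene.
    repressor_on_gene = not lactose_present
    # The gene can only be read when the repressor has been removed.
    return not repressor_on_gene


for lactose in (False, True):
    print(f"lactose present: {lactose} -> gene on: {lactose_gene_is_expressed(lactose)}")
```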

Why does all of this matter for molecular ecology? Well, the DNA sequence changes over generations due to mutations (spoiler: they don’t usually turn your skin green); these can happen for a variety of different reasons and aren’t inherently good or bad. It really depends where these mutations are happening in the genome and how this changes the DNA and the downstream proteins (or not).

Thus, DNA evolves over time if new mutations arise which cause changes that natural selection favours: if a mutation makes an animal see better at night, then it might gradually evolve to become a night hunter as it accumulates new mutations (if there is an actual fitness benefit to doing so: we’ll discuss that more in a later post). In contrast, bad mutations which cause an organism to be very ‘maladapted’ (i.e. badly suited to its environment; say, mutations which make your eyes bleed constantly) would be selected against.

We can use these types of information to study the evolution, ecology and conservation status of a species or population. We can look at how these mutations have accumulated; where in the genome they have accumulated; how frequently these new mutations arise; and what effect these mutations have on the organism. With different statistical models, we can start to build a quantifiable way to handle this data and voilà: molecular ecology is born! Many of these models are based on mathematical correlations between certain patterns in the frequency and distribution of new mutations within populations or species and certain biological effects like the size of the population, natural selection or connectivity across populations. Thus, we can investigate a massive swathe of possible questions with genetic data!
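As a very small taste of what ‘looking at where mutations have accumulated’ means in practice, here is a Python sketch that compares two sequences letter by letter. The sequences and the function name `find_mutations` are invented, and it assumes both sequences are the same length and already lined up; real analyses rest on far more sophisticated statistical models than this.

```python
def find_mutations(ancestor: str, descendant: str) -> list[tuple[int, str, str]]:
    """List (position, old base, new base) wherever two aligned sequences differ.

    Positions are counted from 0. Assumes equal-length, pre-aligned sequences.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be the same length")
    return [
        (position, old, new)
        for position, (old, new) in enumerate(zip(ancestor, descendant))
        if old != new
    ]


mutations = find_mutations("GATTACA", "GACTACA")
proportion_changed = len(mutations) / len("GATTACA")

print(mutations)
print(proportion_changed)
```

Counting the differences, and noting where they fall, is the kind of raw material those models work with.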