How do scientists study new viruses? Genome sequencing

In the second post in my series explaining how scientists study new viruses, I will talk about genome sequencing. Thanks again to 9-year-old Chloe for asking about how new viruses are studied.

If you’ve read my post ‘What is a virus?’, you may remember that a virus consists of a protective outer coat, and inside that coat is a strand of either DNA or RNA. DNA and RNA are molecules that store information in the form of a code. This code contains all the instructions that the virus needs to reproduce, and is called the genome. All living organisms have a genome too, including each of us.

Believe it or not, the code that makes up the genome only has four letters. In DNA these letters are C, G, A and T; in RNA they are C, G, A and U. Each letter represents a chemical called a base: A is adenine; T is thymine; G is guanine; C is cytosine and U is uracil. In the genome, the letters are arranged in groups of 3, called triplets. I won’t go into the maths, but there are 64 possible triplets.
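
For anyone who does want the maths: each of the three places in a triplet can be filled by any of the four letters, so there are 4 × 4 × 4 = 64 possible triplets. The short Python program below is only an illustration, not something taken from scientists’ software. It builds every triplet from the four RNA letters and counts them.

```python
from itertools import product

# The four letters of the RNA code
BASES = "CGAU"

# Every way of filling three places with one of the four letters
triplets = ["".join(letters) for letters in product(BASES, repeat=3)]

print(len(triplets))   # 64
print(triplets[:4])    # ['CCC', 'CCG', 'CCA', 'CCU']
```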

Part of a DNA molecule. It forms a spiral called the double helix, joined across the middle by pairs of bases. Here, purple represents thymine (T), orange is adenine (A), yellow is guanine (G) and green is cytosine (C).
RNA is a single strand. It’s usually shorter and less stable than DNA, and thymine (T) is replaced by uracil (U), which is coloured red.

DNA and RNA contain the instructions for making proteins. Proteins are long molecules made up of chemicals called amino acids. Living things use 20 different amino acids, and the genome spells out which amino acids to join, and in which order, to make each protein. This works because each triplet matches a specific amino acid. Since there are 64 possible triplets but only 20 amino acids, most amino acids match more than one triplet. Three triplets do not match any amino acid at all. These are called stop triplets (scientists call them stop codons), and they mark the end of the instructions for a protein.
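
To see how the triplet code is read, here is a toy Python sketch that reads a piece of RNA three letters at a time, looks each triplet up in a table, and stops when it reaches a stop triplet. The table holds only a handful of entries from the real genetic code (the full table has 64), the RNA strings are made up, and `read_protein` is just a name chosen for this example.

```python
# A small part of the real genetic code: RNA triplet -> amino acid.
# The full table has 64 entries; only a few are included here, so a
# triplet that is missing from this table will cause a KeyError.
CODE_TABLE = {
    "AUG": "methionine",
    "UUU": "phenylalanine", "UUC": "phenylalanine",
    "GGU": "glycine", "GGC": "glycine", "GGA": "glycine", "GGG": "glycine",
    "AAA": "lysine", "AAG": "lysine",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

def read_protein(rna):
    """Read RNA three letters at a time until a stop triplet is reached.

    Assumes the string starts exactly at the beginning of a protein's instructions.
    """
    amino_acids = []
    for i in range(0, len(rna) - 2, 3):
        triplet = rna[i:i + 3]
        amino_acid = CODE_TABLE[triplet]
        if amino_acid == "STOP":
            break
        amino_acids.append(amino_acid)
    return amino_acids

# Made-up RNA: AUG UUU GGC AAG UAA
print(read_protein("AUGUUUGGCAAGUAA"))
# ['methionine', 'phenylalanine', 'glycine', 'lysine']

# Two different triplets (GGU and GGG) that both mean glycine
print(read_protein("GGUGGGUAG"))
# ['glycine', 'glycine']
```

The second example shows what it means for one amino acid to match more than one triplet: GGU and GGG are different, but both give glycine.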

Genomes are unique. When a new virus appears, its genome will be different from that of any other virus, because it is changes to the genome that have made the virus different. Genome sequencing means reading a genome and working out the order of the letters that make up its code. This isn’t easy. The first free-living organism to have its whole genome sequenced was a bacterium, and its genome had over 1,800,000 bases. Sequencing the human genome, which has around 3 billion bases, took 13 years!

Viruses are a lot simpler than bacteria and have far fewer bases: the virus that causes COVID-19 (its scientific name is SARS-CoV-2) has around 30,000. That’s still a lot, but improvements in technology mean that when a new virus appears, its genome can be sequenced quickly. Within weeks of the first cases appearing, Chinese scientists had sequenced the virus’s genome using samples taken from patients.

Why is genome sequencing important? Although the genome of the COVID-19 virus is unique, it is similar to the genomes of other types of coronavirus. By comparing it with these, scientists can learn important things about where the virus came from and how it has changed so that it can infect humans. In the case of COVID-19, they found that the virus is most closely related to coronaviruses found in bats; epidemiologists (see an earlier post) were able to use this information to support their theory that the virus first spread to humans at an animal market in Wuhan, China.
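
As a rough illustration of how a comparison can point to a virus’s closest relative, the Python sketch below works out the percentage of positions where two sequences have the same letter. It assumes the sequences are already lined up and the same length; real genome comparisons use specialised software that first lines the sequences up (called alignment), because letters can be added or lost as well as swapped. The short sequences, the labels "relative A" and "relative B", and the function name `percent_identical` are all made up for this example.

```python
def percent_identical(seq_a, seq_b):
    """Percentage of positions where two lined-up sequences have the same letter."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this simple sketch needs sequences of the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100 * matches / len(seq_a)

# Made-up 10-letter snippets, not real virus sequences
new_virus = "AUGGCUUACG"
relatives = {
    "relative A": "AUGGCUUACC",  # differs from new_virus in 1 place
    "relative B": "AUGCCAUUCG",  # differs from new_virus in 3 places
}

for name, sequence in relatives.items():
    print(name, percent_identical(new_virus, sequence))
# relative A 90.0
# relative B 70.0

# The relative with the highest percentage is the closest match
closest = max(relatives, key=lambda name: percent_identical(new_virus, relatives[name]))
print("closest:", closest)
# closest: relative A
```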

The study of the COVID-19 virus’s genome is a worldwide project. In every country where cases appear, scientists are sequencing the genome and adding their findings to a massive international computer database. On 13th March, for example, scientists from the University of Sheffield published the genome sequence of the virus from the first patients in the UK. Collecting and analysing this information from across the world means that scientists can monitor how the virus changes as it spreads, and it provides vital information for those working on vaccines and treatments. Monitoring tiny changes in the genome also gives epidemiologists important clues about how the virus is spreading.
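
Spotting those tiny changes can be sketched in a very simple way: line up a new sample against an earlier sequence and list every position where the letter is different. This is an illustrative Python sketch with made-up sequences and a hypothetical function name, `find_changes`. It assumes the two sequences are already lined up and the same length, which real analysis software does not take for granted.

```python
def find_changes(reference, sample):
    """List (position, old letter, new letter) wherever sample differs from reference.

    Positions are counted from 1. Assumes both sequences are already
    lined up and the same length.
    """
    return [
        (position, old, new)
        for position, (old, new) in enumerate(zip(reference, sample), start=1)
        if old != new
    ]

reference = "AUGGCUUACG"  # made-up earlier sequence
sample = "AUGGCUUGCA"     # made-up later sample

for position, old, new in find_changes(reference, sample):
    print(f"position {position}: {old} -> {new}")
# position 8: A -> G
# position 10: G -> A
```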

One reassuring thing about COVID-19 is that scientists are making discoveries about the virus at an incredible rate. I’d hazard a guess that if this virus had appeared even 10 to 15 years ago, it would have taken scientists several years to discover what has been discovered in a few months. As a scientist myself, I am literally in awe of what has been achieved in such a short time. I hope you found that interesting. My next post in this series will look at how microscopes are being used to study the virus.