BIOCHEMISTRY OF VIRUSES

Microbes

Microbes, or microorganisms, are microscopic organisms. Bacteria, fungi and viruses are the most common microbes. Most of them are harmless (i.e., commensal or non-pathogenic). However, some are pathogenic, meaning they make us sick. Many of these pathogens produce chemicals, as a normal part of their life cycles, that are toxic to us. Interestingly, different animals respond differently to the same microbe. While the bird flu virus can kill chickens and humans, waterfowl are typically asymptomatic, i.e., the virus does not bother them. Similarly, bats can carry coronaviruses related to the COVID-19 virus without showing symptoms. This website is dedicated to viruses because their genetic material can be sequenced more precisely than that of the other two types of microbes, which makes it conducive to mathematical analysis.

Nucleotides and Genome

Nucleotides are the building blocks of the genome, which is either DNA or RNA. It is believed that life originated with an RNA genome and later evolved to DNA because RNA replication is highly error-prone. All cellular organisms have a DNA genome. Animals, plants and fungi are eukaryotes, i.e., they have a well-defined nucleus in their cells, and their DNA is enclosed inside the nucleus. Bacteria are a major exception. They do not have a nucleus, so they are prokaryotes, and their DNA floats freely inside the cell. Each DNA nucleotide carries one of four bases: adenine (A), cytosine (C), guanine (G) or thymine (T). Uracil (U) replaces thymine in RNA. In double-strand DNA, C always pairs with G and A with T. The C-G/G-C and A-T/T-A pairs are called base pairs. Bacterial, fungal and human genomes are all made of double-strand DNA (dsDNA), arranged in the familiar double helix. In comparison, viruses can have single-strand RNA (ssRNA), double-strand RNA (dsRNA), single-strand DNA (ssDNA) or dsDNA. Both flu and coronaviruses have ssRNA.
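The pairing rule above (C with G, A with T) is simple enough to express as a lookup table. The following is a minimal sketch, not a standard library routine; the names DNA_PAIRS, is_base_pair and partner_strand are made up for this illustration.

```python
# Base-pairing rules stated in the text: A pairs with T, C pairs with G.
# DNA_PAIRS, is_base_pair and partner_strand are hypothetical names.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_base_pair(x: str, y: str) -> bool:
    """Return True if bases x and y form one of the allowed base pairs."""
    return DNA_PAIRS.get(x.upper()) == y.upper()


def partner_strand(strand: str) -> str:
    """Return, position by position, the base that pairs with each base."""
    return "".join(DNA_PAIRS[base] for base in strand.upper())


if __name__ == "__main__":
    print(partner_strand("GATTACA"))  # CTAATGT
    print(is_base_pair("C", "G"))     # True
    print(is_base_pair("C", "T"))     # False
```

Given one strand of a dsDNA genome, the other strand is fully determined by these four rules, which is why sequencing one strand is enough to know both.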


Amino Acids

All life on Earth is built with only 20 standard amino acids (AA), though over 70 have been identified in meteorites. All AA have a backbone made of one amino group and one acid group, hence the name amino acid (shown as Generic Amino Acid in the figure on the left). Each AA is identified by its unique side chain or "R" group (in green in the figure). The side chain can be as simple as a single carbon (a methyl group, as in alanine) or as complex as a set of fused rings (as in tryptophan). There are six major elements in all living matter, CHNOPS (carbon, hydrogen, nitrogen, oxygen, phosphorus and sulfur). Between them, the side chains contain all of those elements except phosphorus, which is a key component of the genome (shown as black filled circles in the previous figure).

Amino Acid Symbols

Similar to the A, C, G and T/U symbols for nucleotides, the amino acids have single-letter symbols. As we will see in the subsequent tutorials, these symbols are used copiously in biochemistry, like the symbols for elements in chemistry.

Amino Acids and Their One-letter Symbols

(Letters B, J, O, U, X and Z are not assigned to any amino acid)

  Alanine        A    Glutamine      Q    Leucine        L    Serine      S
  Arginine       R    Glutamic acid  E    Lysine         K    Threonine   T
  Asparagine     N    Glycine        G    Methionine     M    Tryptophan  W
  Aspartic acid  D    Histidine      H    Phenylalanine  F    Tyrosine    Y
  Cysteine       C    Isoleucine     I    Proline        P    Valine      V
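For readers who want to work with these symbols programmatically, the table can be written as a dictionary. This is only an illustrative sketch; the names AMINO_ACIDS, spell_out and unassigned_letters are made up for this example.

```python
import string

# One-letter symbols from the table above (20 standard amino acids).
# AMINO_ACIDS, spell_out and unassigned_letters are hypothetical names.
AMINO_ACIDS = {
    "A": "Alanine",       "Q": "Glutamine",     "L": "Leucine",       "S": "Serine",
    "R": "Arginine",      "E": "Glutamic acid", "K": "Lysine",        "T": "Threonine",
    "N": "Asparagine",    "G": "Glycine",       "M": "Methionine",    "W": "Tryptophan",
    "D": "Aspartic acid", "H": "Histidine",     "F": "Phenylalanine", "Y": "Tyrosine",
    "C": "Cysteine",      "I": "Isoleucine",    "P": "Proline",       "V": "Valine",
}


def spell_out(peptide: str) -> list[str]:
    """Expand a string of one-letter symbols into amino acid names."""
    return [AMINO_ACIDS[symbol] for symbol in peptide.upper()]


def unassigned_letters() -> list[str]:
    """Letters of the alphabet that the table does not use."""
    return sorted(set(string.ascii_uppercase) - set(AMINO_ACIDS))


if __name__ == "__main__":
    print(spell_out("MAW"))      # ['Methionine', 'Alanine', 'Tryptophan']
    print(unassigned_letters())  # ['B', 'J', 'O', 'U', 'X', 'Z']
```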

Proteins

Proteins are formed by the end-to-end chaining of amino acids (AA). The acid group of one AA links with the amino group of the next AA to form a peptide bond. A chain of two or more AA is called a peptide. When the peptide bond forms, one hydrogen atom is lost from the amino group and an OH from the acid group, which together make one molecule of water (H2O). The next AA in the protein chain is added to the right of this AA pair in the same way. The process continues, with AA added to the growing chain one at a time, until the protein is fully assembled.
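The bookkeeping of the peptide bond can be checked with a little arithmetic: a chain of n amino acids has n - 1 peptide bonds and therefore releases n - 1 water molecules. The sketch below is an illustration under stated assumptions. The masses are approximate average molecular weights (g/mol) of the free amino acids, only four amino acids are included to keep it short, and the names WATER_MASS, FREE_AA_MASS, peptide_bonds and chain_mass are made up for this example.

```python
# Approximate average molecular weights in g/mol (assumed reference values,
# rounded to two decimals). Only four amino acids are listed for brevity.
WATER_MASS = 18.015
FREE_AA_MASS = {"G": 75.07, "A": 89.09, "S": 105.09, "W": 204.23}


def peptide_bonds(chain: str) -> int:
    """A chain of n amino acids is held together by n - 1 peptide bonds."""
    return max(len(chain) - 1, 0)


def chain_mass(chain: str) -> float:
    """Sum of the free amino acid masses minus one water per peptide bond."""
    total = sum(FREE_AA_MASS[aa] for aa in chain.upper())
    return total - WATER_MASS * peptide_bonds(chain)


if __name__ == "__main__":
    print(peptide_bonds("GASW"))         # 3
    print(round(chain_mass("GA"), 2))    # glycine + alanine - one water
```

For the dipeptide "GA" this gives roughly 75.07 + 89.09 - 18.015, i.e., about 146.1 g/mol, slightly less than the two free amino acids combined because of the lost water.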

Relationship Between Genome and Proteins

There is a one-to-one relationship between genome and proteins, meaning that coding proteins is the principal activity of the genome and, conversely, proteins are created through the genome alone. Only certain parts of the genetic material are used for coding proteins. These are called genes. Each gene codes for a specific protein. There are subtle differences in the coded proteins among people because our genes are slightly different. Thus, our individual traits are controlled through proteins. Proteins have a large number of functions in the body, including catalyzing almost all biological reactions. Even increasing or decreasing the activity of a gene is controlled by other proteins. Only about 1.5% of the human genome codes for proteins. The rest has been called "junk DNA". Most likely, there are useful areas in the so-called junk sections, and scientists are constantly looking for those. Further, over 99.9% of the DNA is identical among all people. So, our individuality depends on less than 0.1% of our genome!

Coding of Proteins

How do we get from the genome, which is a string of nucleotides, to proteins, which are chains of amino acids? It is a very complex process, and the fine details are not germane to our discussion here. Even though all life forms, other than some viruses, have a DNA genome, there is an immense variety of RNAs in all cells. They are the workhorses that ferry important molecules inside the cells. When a gene gets the signal to create its protein, a group of proteins called RNA polymerase makes an RNA copy of the gene. As stated above, the signaling itself is done through proteins. The end product of copying the DNA in the nucleus is a specialized RNA called messenger RNA (mRNA). The mRNA moves from the nucleus to the cytoplasm (the space outside the nucleus but inside the cell). There the mRNA is translated into the protein by a complex of RNA and proteins called the ribosome. Incidentally, proteins named with the suffix -ase, e.g., polymerase, are protein-based catalysts (aka enzymes). Also, different animals have different DNA and RNA polymerases. The importance of this fact will become apparent when we study why a particular virus can survive in some animals and not others.
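The copying step can be pictured as base pairing again, except that the copy is written in the RNA alphabet, so A on the DNA pairs with U on the mRNA. The sketch below is a deliberately simplified illustration: it ignores strand direction and everything the cell does to the mRNA afterwards, and the names gene_strand, template_strand and transcribe are assumptions made for this example.

```python
# DNA base on the strand being read -> RNA base placed in the mRNA.
# Simplified sketch; TEMPLATE_TO_MRNA and the function names are hypothetical.
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def template_strand(gene_strand: str) -> str:
    """The DNA strand paired with the gene sequence (the one that is read)."""
    return "".join(DNA_PAIRS[base] for base in gene_strand.upper())


def transcribe(template: str) -> str:
    """Build the mRNA by pairing each template base with its RNA partner."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template.upper())


if __name__ == "__main__":
    gene = "ATGGCTTGG"
    mrna = transcribe(template_strand(gene))
    print(mrna)  # AUGGCUUGG: same letters as the gene, with U in place of T
```

The net effect is that the mRNA spells out the gene sequence with U substituted for T, which is why codon tables can be written in either alphabet.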

Codon Table

We still have not answered the question: how are two totally different molecules, nucleotides and peptides, related? The codon table is the link. The table is shown in the figure on the left. The nucleotides in the genome work together in triplets called codons. Each codon specifies one amino acid (AA). For example, as seen in the codon table, the nucleotide sequences GCT and TGG specify alanine and tryptophan, respectively. There are redundancies in the codon table, e.g., GCT, GCC, GCA and GCG all code for alanine. However, TGG is the only codon for tryptophan. This will be an important detail when we study the mutations in viruses.

Coming back to the translation from nucleotides to peptides, we had left off at the stage where a gene was copied into an mRNA and the mRNA was sent from the nucleus to the cytoplasm. The mRNA is a sequence of A's, C's, G's and U's. It is fed into the ribosome like a ribbon (shown in the figure on the left). There is an abundance of premade AAs in the cytoplasm. These are ferried around by transfer RNAs (tRNA). The tRNAs constantly present their AAs to an active ribosome. If the AA matches the codon that is being processed by the ribosome, it is added to the growing ribbon of protein by forming a peptide bond, as explained earlier. Once the entire mRNA is processed, the ribbons of mRNA and newly formed protein disengage from the ribosome. The mRNA is recycled into its component nucleotides, and the protein is sent for maturation and, in many cases, eventual secretion out of the cell. Meanwhile, the ribosome moves on to the next mRNA to process.

This section gives a glimpse of the key role of proteins in the existence of life. Of the three major food groups, carbohydrates and lipids (aka fats) are the primary providers of energy, but proteins control everything. Of course, there are many overlapping and synergistic roles among the three food groups. For instance, protein is used for energy during starvation.
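The codon lookup described in this section can be sketched in a few lines. The block below builds the standard codon table from a compact 64-letter string in which the codons are ordered by the bases T, C, A, G, and "*" marks a stop codon (a signal to end the chain, which the text above does not cover). It is a simplified illustration, not a model of the ribosome: it reads from the first base, accepts either T or U, and the names CODON_TABLE, codons_for and translate are made up for this example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# "*" marks a stop codon. All names here are hypothetical.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons_for(amino_acid: str) -> list[str]:
    """All codons that specify the given one-letter amino acid symbol."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def translate(sequence: str) -> str:
    """Read the sequence three bases at a time until a stop codon or the end."""
    dna = sequence.upper().replace("U", "T")  # accept mRNA (U) or DNA (T)
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(codons_for("A"))            # ['GCA', 'GCC', 'GCG', 'GCT']
    print(codons_for("W"))            # ['TGG']
    print(translate("AUGGCUUGGUAA"))  # MAW
```

The redundancy is visible directly: a change in the third base of GCT still gives alanine, whereas any change to TGG replaces tryptophan or stops the chain.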

Viruses

Schematic of flu virus

Finally, we come to the biochemistry of viruses. Viruses are unique in numerous respects. For starters, most still use the ancient RNA genome, which enables them to mutate rapidly. While that is good for their survivability, it can make them deadly, like the virus that causes COVID-19. Viruses also have a primitive structure. They are literally a package of bare-minimum RNA/DNA wrapped inside a capsid or membrane with some surface proteins. That also makes them very small and gives them the physical size needed to get inside a host cell without killing it. A large human cell is about 100 micrometers in diameter, while the flu virus is about 100 nanometers. That makes the cell one billion times more voluminous than the virus. Compared to the simplicity of a virus, a human cell is an industrial complex. The viral genome only codes for its own specific proteins. The virus lacks key components of the protein-making machinery such as tRNA and ribosomes, and in many cases the polymerases as well. Therefore, it has to get inside a host cell and hijack the cell's vast machinery to replicate itself. Viruses use their surface proteins as decoys to get inside the cells. As we will study in later sections, flu and coronaviruses use hemagglutinin and spike proteins, respectively, on their surfaces to gain entry into human cells.
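The "one billion" figure follows from the fact that volume scales with the cube of the diameter. A quick check, treating both the cell and the virus as spheres (a simplifying assumption) and using the diameters quoted above; the name volume_ratio is made up for this example.

```python
def volume_ratio(diameter_large: float, diameter_small: float) -> float:
    """Ratio of the volumes of two spheres, given diameters in the same unit.

    The factor (4/3)*pi*(1/2)**3 is the same for both spheres and cancels,
    leaving only the cube of the diameter ratio.
    """
    return (diameter_large / diameter_small) ** 3


if __name__ == "__main__":
    cell_nm = 100 * 1000   # 100 micrometers expressed in nanometers
    virus_nm = 100         # 100 nanometers
    print(volume_ratio(cell_nm, virus_nm))  # 1000000000.0
```

A 1,000-fold difference in diameter therefore becomes a 1,000 x 1,000 x 1,000 = one-billion-fold difference in volume.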