Wednesday, June 28, 2006


The Second Genetic Code

An article in a creationist magazine claims that the DNA code is yet another death-knell of evolution (Mark Twain's remark that "rumors of my death have been greatly exaggerated" comes to mind). The author says:
It was in 1953 that James Watson and Francis Crick achieved what appeared impossible--discovering the genetic structure deep inside the nucleus of our cells.
Reading Jim Watson's book The Double Helix gives you an idea of just how impossible it seemed. That is, not at all. A few years earlier, various experiments had shown that deoxyribonucleic acid, DNA, carries the genetic information that produces proteins. Watson and Crick were confident that it would be possible to determine the physical structure of the DNA molecule, because the structure of other molecules had been figured out. Watson's book is a revealing account of the effort, including the rivalry with Linus Pauling, who also was confident that he could discover the structure. (A little too confident, maybe. Read the book.)

In the years after 1953, biologists concentrated on figuring out just how DNA translated into protein. Horace Freeland Judson's massive book The Eighth Day of Creation is the best account I've found. (The title seems to echo a quote from Thornton Wilder: "Man is not an end but a beginning. We are at the beginning of the second week. We are children of the eighth day.")

After some statistics about the amount of information crammed into such a small space, the article says:

Let's first consider some of the characteristics of this genetic 'language.' For it to be rightly called a language, it must contain the following elements: an alphabet or coding system, correct spelling, grammar (a proper arrangement of the words), meaning (semantics) and an intended purpose.

Scientists have found the genetic code has all of these key elements. "The coding regions of DNA," explains Dr. Stephen Meyer, "have exactly the same relevant properties as a computer code or language".
The first problem with this is the talk about "grammar" and "intended purpose". Nothing in the genetic code corresponds to anything that we would normally call "grammar". DNA is a chemical template for the formation of RNA. Most RNA goes on to serve as a template for the formation of protein. Also, there is no semantics at the DNA level, as I show later.

I'm going to assume some knowledge of how proteins get made. For review, here are articles about amino acids, DNA, and RNA.

The creationist article continues:
Besides all the evidence we have covered for the intelligent design of DNA information, there is still one amazing fact remaining--the ideal number of genetic letters in the DNA code for storage and translation.
Moreover, the copying mechanism of DNA, to meet maximum effectiveness, requires the number of letters in each word to be an even number. Of all possible mathematical combinations, the ideal number for storage and transcription has been calculated to be four letters.

This is exactly what has been found in the genes of every living thing on earth--a four-letter digital code. As Werner Gitt states: "The coding system used for living beings is optimal from an engineering standpoint. This fact strengthens the argument that it was a case of purposeful design rather that (sic) a [lucky] chance."
What got me onto this subject is the claim that "The coding system used for living beings is optimal from an engineering standpoint". The article says "the copying mechanism of DNA, to meet maximum effectiveness, requires the number of letters in each word to be an even number". This makes no sense. There are actually three "letters" in each "word". (I'll omit the quotes from here on. Anthropomorphic-sounding terms are handy metaphors, nothing more.)

A four-letter alphabet is needed only if each word is to be the same length. That's not optimum, though. It would be more efficient if the DNA code used something like Huffman coding, which uses shorter words for more frequent things. Such a code is prefix-free, which means that no word is ever the prefix for any other. Thus if "A" is used for the most frequent amino acid, every other word will begin with some other letter. So when you are reading the text, you don't need any indication of where a word stops and the next one begins. (Francis Crick tried a similar scheme early on in the figuring-out of the genetic code, but it didn't work.) A biologist has calculated that "[t]he restriction to a fixed codon length of three bases means that it takes 42% more DNA than the minimum necessary, and the genetic code is 70% efficient."
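To make the prefix-free idea concrete, here is a minimal Python sketch of a Huffman-style code over a four-letter alphabet. It is only an illustration: the frequencies in TOY_FREQS are made up for the example (they are not measured amino acid frequencies), and the function names are my own. Nothing like this exists in a cell.

```python
import heapq
from itertools import count

BASES = "ACGU"  # the four-letter alphabet

# Hypothetical frequencies, invented for illustration only.
TOY_FREQS = {"Leu": 0.40, "Ser": 0.20, "Ala": 0.15, "Gly": 0.10,
             "Lys": 0.08, "Trp": 0.04, "STOP": 0.03}

def quaternary_huffman(freqs):
    """Build a prefix-free code over BASES; frequent symbols get shorter words."""
    tiebreak = count()  # unique counter so heap entries never compare trees
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    # Pad with zero-weight dummies so every merge can take exactly four nodes.
    while (len(heap) - 1) % 3 != 0:
        heap.append((0.0, next(tiebreak), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(4)]  # four lightest nodes
        weight = sum(g[0] for g in group)
        heapq.heappush(heap, (weight, next(tiebreak), [g[2] for g in group]))
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, list):        # internal node: one branch per base
            for base, child in zip(BASES, tree):
                walk(child, prefix + base)
        elif tree is not None:            # real symbol (dummies are skipped)
            code[tree] = prefix
    walk(heap[0][2], "")
    return code

def is_prefix_free(code):
    """True if no word is the prefix of a different word."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode(text, code):
    """Read a run of bases with no separators, emitting a symbol per word."""
    reverse = {word: sym for sym, word in code.items()}
    out, word = [], ""
    for base in text:
        word += base
        if word in reverse:   # safe because the code is prefix-free
            out.append(reverse[word])
            word = ""
    if word:
        raise ValueError("trailing bases do not form a complete word")
    return out

code = quaternary_huffman(TOY_FREQS)
message = ["Leu", "Gly", "STOP"]
encoded = "".join(code[s] for s in message)
print(code, encoded, decode(encoded, code))
```

With these toy frequencies the three most common symbols get one-letter words and the rest get two-letter words, so the average word is well under three letters, and the decoder never needs a marker between words.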

So why didn't evolution produce a more efficient code? The biologist has some suggestions, but it seems to me that the major reason is that DNA doesn't code for protein. DNA codes for RNA. Since the function of this RNA is to tell other machinery how to code for protein, it's known as messenger RNA (mRNA).

A string of mRNA hooks up with a ribosome, which is part RNA and part protein, although the RNA seems to be the more important part. Another piece of RNA called a transfer RNA (tRNA) hooks onto the combination of ribosome and mRNA, carrying an amino acid. The tRNA has chemical hooks that recognize three specific letters in the mRNA. (As with just about everything in biology, it's a little more complicated than that.) The tRNA transfers its amino acid to the string of amino acids that's already attached to the ribosome. The ribosome moves the mRNA three letters on, ready for the tRNA that matches the next three-letter group.
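The stepping described above can be sketched in a few lines of Python. This is a cartoon, not a model of real biochemistry: the table holds only a handful of the 64 codons from the standard genetic code, the function names are my own, and starting at the first AUG is a simplifying assumption.

```python
# A few entries from the standard genetic code (mRNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "AAA": "Lys", "AAG": "Lys",
    "UGG": "Trp",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

def read_codons(mrna, start=0, step=3):
    """Split mRNA into consecutive words of `step` letters, beginning at `start`."""
    return [mrna[i:i + step] for i in range(start, len(mrna) - step + 1, step)]

def translate(mrna):
    """Start at the first AUG, then move three letters at a time until a stop."""
    start = mrna.find("AUG")
    if start < 0:
        return []                       # no start codon, nothing gets made
    chain = []
    for codon in read_codons(mrna, start):
        amino = CODON_TABLE.get(codon, "???")   # "???" = not in this small table
        if amino == "STOP":
            break
        chain.append(amino)
    return chain

print(read_codons("AUGUUUGGC"))           # ['AUG', 'UUU', 'GGC']
print(read_codons("AUGUUUGGC", start=1))  # ['UGU', 'UUG']
print(translate("GGAUGUUUGGCAAAUGGUAAGG"))
```

Notice that the same string read from one letter further along gives completely different words. Nothing in the string itself marks the word boundaries; they come from where the reading starts and the fixed step of three.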

DNA doesn't know anything about three-letter words (or about amino acids, for that matter); it's only at the ribosome that the word size becomes important. The correspondence between tRNA and amino acid comes from enzymes known as aminoacyl-tRNA synthetases. There is typically one synthetase for each amino acid; each synthetase attaches its amino acid to the proper tRNAs.
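Here is a rough Python sketch of where the meaning actually lives. It assumes strict base pairing between codon and anticodon (real tRNAs also allow "wobble" at the third position), treats any codon with no matching tRNA as a stop, and uses names like SYNTHETASES and ribosome purely for illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon):
    """Anticodon (written 5'->3') that pairs with a codon: its reverse complement."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))

# Toy synthetase table: amino acid -> anticodons of the tRNAs it charges.
# Only three entries; strict pairing assumed, wobble ignored.
SYNTHETASES = {
    "Met": ["CAU"],   # pairs with codon AUG
    "Phe": ["GAA"],   # pairs with codon UUC
    "Trp": ["CCA"],   # pairs with codon UGG
}

def charge_trnas(synthetases):
    """Return the pool of charged tRNAs as anticodon -> amino acid."""
    return {anti: amino
            for amino, antis in synthetases.items()
            for anti in antis}

def ribosome(mrna, charged, step=3):
    """Advance `step` letters at a time, adding whatever the matching tRNA carries."""
    chain = []
    for i in range(0, len(mrna) - step + 1, step):
        amino = charged.get(anticodon_for(mrna[i:i + step]))
        if amino is None:   # simplification: no matching tRNA means stop
            break
        chain.append(amino)
    return chain

pool = charge_trnas(SYNTHETASES)
print(ribosome("AUGUUCUGGUAA", pool))   # ['Met', 'Phe', 'Trp']

# Change only the synthetase table and the same mRNA "means" something else.
rewired = charge_trnas({"Gly": ["CAU"]})
print(ribosome("AUGUAA", rewired))      # ['Gly']
```

The mRNA string is identical in both runs. The word size is a property of the ribosome function, and the meaning of each word is a property of the synthetase table; neither is stored in the sequence itself.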

A more technical explanation is here. This correspondence between mRNA, tRNA, and synthetase is sometimes known as the "second genetic code".

Or is it the first genetic code? RNA has lots of uses, and does most of the work in forming proteins; DNA just sits there. Maybe RNA came first, then DNA came along as a better way to simply store the information, leaving RNA as the user. This seems consistent with the RNA World hypothesis.

The relevance to the evolution of a more efficient DNA code is that any change in the length of a DNA word would have to be reflected in the tRNA, the ribosome, and the synthetase. If the pioneers of molecular biology had found that each tRNA included some sort of indicator of how far the ribosome needed to move the mRNA, then we might be a bit more receptive to an inference of design.