Footnotes and extra appendices moved out of the PhD manuscript for lack of space
"Cambrian explosion"/Neoproterozoic-Cambrian radiation
Soft tissues do fossilize. We have even been aware of eggs and embryos from the Cambrian for a few years (Morris 1998a):
For the two kinds of tiny fish from the Lower Cambrian (Shu et al 1999), there has even been discussion of possible affliction by parasites. Indeed, the widely accepted phrase "gill slits", as applied to human embryos, could be a key to understanding the dilemma of time versus depth of geographical habitat in the interpretation of the fossil record. Shu et al (2001) describe the early Cambrian excavation:
Mainly some sponges, worms, and jellyfish predate the Cambrian "explosion". Insect, amphibian, reptile, bird, and mammal species (and plants) are usually considered post-Cambrian. If the rival catastrophic interpretations, together with parameters such as geographical latitude and ecological niche at the time of burial, are taken into account, however, the appearance of information looks more abrupt.
This is especially significant given that the Cambrian "explosion" is not an observation repeated in Finnish schoolbooks. This, again, is despite the historical fact that for Charles Darwin himself the sudden appearance of animal fossils at the beginning of the Cambrian was of particular concern. It was at odds with his gradualistic view that the diversification of life on earth through natural selection had required a long period of time. The only figure in the Origin of Species (1859) visualized the prediction that the major groups of animals should gradually diverge during evolution. Darwin argued that a long period of time, unrepresented in the fossil record, must have preceded the Cambrian to allow the various major groups of animals to diverge. In his time, the strata now called Cambrian were regarded as part of the Silurian.
"Ediacaran assemblages are presumably integral to understanding the roots of the Cambrian "explosion", and this approach assumes that the fossil record is historically valid. It is markedly at odds, however, with an alternative view, based on molecular data. These posit metazoan divergences hundreds of millions of years earlier. As such, the origination of animals would be more or less coincident with the postulated "Big Bang" of eukaryote diversification ~1,000 Myr ago. The existence of some sort of pre-Ediacaran metazoan history is a reasonable assumption, but such animals must have been minute because anything larger than about one millimeter would leave a sedimentary imprint as trace fossil. The literature is littered with claims for pre-Ediacaran traces, but the history of research has been one of continuous rebuttal." (Morris 2000b)
Bowring SA, Grotzinger JP, Isachsen CE, Knoll AH, Pelechaty SM & Kolosov P (1993) Calibrating rates of early Cambrian evolution. Science 261, 1293-8
Bromham L, Rambaut A, Fortey R, Cooper A & Penny D (1998) Testing the Cambrian explosion hypothesis by using a molecular dating technique. Proc Natl Acad Sci 95, 12386-9
Carroll RL (2000) Towards a new evolutionary synthesis. Trends Ecol Evol 15, 27-32
Erwin DH, Valentine JW & Sepkoski JJ (1987) A Comparative Study of Diversification Events: The Early Paleozoic Versus the Mesozoic. Evolution 41, 1178
Erwin DH (2000) Macroevolution is more than repeated rounds of microevolution. Evol Dev 2, 78-84
Gould SJ (1991) Wonderful Life. The Burgess Shale and the Nature of History. Penguin Harmondsworth
Morris SC (1993) The fossil record and the early evolution of the Metazoa. Nature 361, 219-25
Morris SC (1998a) Eggs and embryos from the Cambrian. Bioessays 20, 676-82
Morris SC (1998b) The Crucible of Creation. Oxford University Press, UK
Morris SC (2000a) Nipping the Cambrian "explosion" in the bud? Bioessays 22, 1053-6
Morris SC (2000b) The Cambrian "explosion": slow-fuse or megatonnage? Proc Natl Acad Sci 97, 4426-9
Scherer S & Junker R (2000) Evoluutio - kriittinen analyysi. Datakirjat
Shu DG, Luo HL, Morris SC, Zhang XL, Hu SX, Chen L, Han J, Li Y & Chen LZ (1999) Lower Cambrian vertebrates from south China. Nature 402, 42-6
Shu DG, Morris SC, Han J, Chen L, Zhang XL, Zhang ZF, Liu HQ, Li Y & Liu JN (2001) Primitive deuterostomes from the Chengjiang Lagerstätte (Lower Cambrian, China). Nature 414, 419-24
Wray GA, Levinton JS & Shapiro LH (1996) Molecular evidence for deep Precambrian divergences among metazoan phyla. Science 274, 568-73
Quotes from Science News since the original draft
Genomes galore – a great opportunity to study evolution, right? Think again. A paper in Science by Wong et al¹ revealed systematic uncertainty in the way genomes are compared, leading to bias that makes genetic comparisons essentially useless. Antonis Rokas, in the same issue,² began his commentary on this problem thus:
“Darwin relied on fossils, morphology, and geographical distribution to glean important clues about the history of life. Today, natural historians can study organisms’ history of change and adaptation by probing the DNA record. Whether to elucidate evolutionary relationships of genes and species or spot the amino acid changes driven by selection, we need to be able to generate accurate alignments of DNA sequences. On page 473 of this issue, Wong et al.¹ provide some important caveats on how this can go awry and how to avoid alignment bias.”
Rokas continued with a folksy explanation of the basic problem:
“For years, the standard protocol has been to pick a favorite algorithm to optimize the alignment it generates. This approach is fast and easy, but it is like being forced to always settle on vanilla ice cream for dessert; doing so can taint one’s opinion about ice cream. Similarly, sticking to the use of a single alignment from a single algorithm can bias the estimation of phylogenies or of other evolutionary parameters pivotal to our understanding of the DNA record. Until now, the extent and potential significance of this bias introduced by alignment was unknown. Wong and colleagues quantify the contribution of alignment uncertainty to genome-wide evolutionary analyses and report that we sweep this uncertainty under the proverbial rug at our peril.”
Wong and team used seven popular programs to compare seven genomes. “The term ‘popular’ is not used lightly here,” Rokas notes; “these programs have been employed, judging by citation counts, in at least 25,000 analyses.” The potential for revision, therefore, is enormous. What did the researchers find?
“They report that a staggering 46.2% of the genes examined exhibit variation in the phylogeny produced dependent on the choice of alignment method, whereas the prediction of the amino acid changes driven by selection was likewise method dependent for another 28.4% of the genes.”
The significance of this “whoops” admission cannot be overstated. For years, evolutionary biologists have depended on the “popular” algorithms to generate phylogenetic trees, expecting their results to be reliable. Rokas explains that high “bootstrap” values for some trees (a popular index that is supposed to measure robustness in inference) can be misleading, because “bootstrap values do not always equate with phylogenetic accuracy.” But if the bootstrap value is strong, what is in error – the signal or the phylogenetic inference? Rokas did not explore the latter possibility.
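The method dependence described above can be illustrated with a toy case. What follows is only a minimal sketch, assuming a textbook Needleman-Wunsch global alignment with a linear gap penalty; it is not one of the seven programs Wong et al tested, and the function name, the two short sequences, and the scoring values are all made up for illustration. It shows that the same pair of sequences can yield two different "optimal" alignments when nothing but the gap penalty changes.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Toy global alignment with a linear gap penalty (illustrative only).

    Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Hypothetical sequences: the second is the first shifted by one base.
seq_a = "ACGTACGT"
seq_b = "CGTACGTA"

# Same data, two parameter choices (both values are arbitrary assumptions).
cheap_gaps = needleman_wunsch(seq_a, seq_b, gap=-1)
costly_gaps = needleman_wunsch(seq_a, seq_b, gap=-10)

for label, (row_a, row_b, total) in (("gap=-1", cheap_gaps),
                                     ("gap=-10", costly_gaps)):
    print(label, "score", total)
    print("  " + row_a)
    print("  " + row_b)
```

With cheap gaps the shifted bases are lined up by opening a gap at each end; with costly gaps the sequences are stacked position by position and every column is a mismatch. Any downstream estimate computed from the columns would differ between the two, which is the kind of sensitivity the quoted passage is about, here on a deliberately tiny scale.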
Wong et al explain how researchers can fall into the trap by trusting algorithms that cannot bear the weight of inference placed on them:
“A common theme in comparative genomics studies is a flow diagram, or chart, tracing the various steps and algorithms used during the analysis of a large number of genes. Flow charts can be quite sophisticated, with steps such as identifying orthologous gene sets, aligning the genes, and performing different statistical analyses on the resulting alignments. The key point, and a great practical difficulty in comparative genomics studies, is that the analyses must be repeated many times. The procedure, then, is largely automated, with scripting languages such as Perl or Python cobbling together individual programs that perform each step. In addition, many of the individual steps involve procedures originally developed in the evolutionary biology literature, to perform phylogeny estimation or to identify individual amino acid residues under the influence of positive selection. Statistical methods that until recently would have been applied to a single alignment, carefully constructed, are now applied to a large number of alignments, many of which may be of uncertain quality and cause the underlying assumptions of the methods to fail.”
This seems to indicate another problem: the very algorithms trusted were written on the assumption of evolution. Is there a circularity here? Will the algorithm select the data that will produce the expected evolutionary result? They did not elaborate.
The authors state that the uncertainty is not just a matter of sloppy analysis. A biologist may run the program with great care and precision. It’s trusting the algorithms themselves, and being unaware of the uncertainties, that leads to huge errors and false conclusions. They explain how this can happen:
“Many comparative genomics studies are carefully performed and reasonable in design. However, even carefully designed and carried out analyses can suffer from these types of problems because the methods used in the analysis of the genomic data do not properly accommodate alignment uncertainty in the first place. Moreover, the genes that are of greatest interest to the evolutionary biologist probably suffer disproportionately. For example, in several studies, the genes of greatest interest were the ones that had diverged most in their nonsynonymous rate of substitution. But, these are the very genes that should be the most difficult to align in the first place. We also do not believe that the alignment uncertainty problem is one that can be resolved by simply throwing away genes, or portions of genes, for which alignment differs.”
In fact, throwing out portions that have ambiguous alignments can lead to other problems, such as removing a large portion of the primary data. It also does not guarantee the remainder will line up well.
Rokas has a good-news-bad-news story. On the hopeful side, “several novel statistical methods that simultaneously estimate alignment and evolutionary parameters of interest such as phylogeny have shown exceptional promise,” he said. The bad news is there’s a catch: “The computational demands of these programs are prohibitive.”
Wong et al suggested some ways to mitigate alignment bias. No matter the quality control used, though, carefulness is not going to solve all the problems. “The goal is to analyze all of the genes in the genome,” they said. “As we have shown here, many of these genes will be difficult to align and result in highly variable evolutionary parameter estimates.” They did not seem to explore the possibility of circular reasoning in the algorithms.
Wow. This is going to be a shattering revelation to many a biologist. Rokas put the best possible spin on a bad situation:
“As in any scientific field, molecular evolution has a long tradition of dramatic transformation. The development of a powerful computational and statistical arsenal to account for the uncertainty stemming from sequence alignments is heralding the first paradigm shift in the era of genome-scale analysis.”
Now, the question is what to do about the 25,000 or more analyses that may be affected, and how long it will take to overcome the inertia of thousands of scientists continuing to use the popular algorithms oblivious to their inherent uncertainties.
1. Wong, Suchard and Huelsenbeck, “Alignment Uncertainty and Genomic Analysis,” Science, 25 January 2008: Vol. 319, no. 5862, pp. 473-476, DOI: 10.1126/science.1151532.
2. Antonis Rokas, “Lining Up to Avoid Bias,” Science, 25 January 2008: Vol. 319, no. 5862, pp. 416-417, DOI: 10.1126/science.1153156.