
DNA phylogeny made easy:
a case study of Poladryas

© Nick V. Grishin

The recent invasion of DNA sequences into taxonomy is scary to some. While most researchers agree that sequence analysis is helpful, many feel somewhat lost because the "look and feel" of DNA is different from that of specimens. In addition, probably because people enjoy arguing with each other, and because several classes of phylogenetic methods have been proposed (parsimony, distance, maximum likelihood, etc.), there is no agreement about how phylogenetic analysis of DNA should be done to be maximally productive. It seems obvious that all these classes of methods should be considered, as the simplifying assumptions in each class and case may be quite different. However, for various reasons, most research groups tend to pick their class of choice and criticize others who pick another. Typical, but not very useful.

The idea behind this essay is to show that there is no magic in DNA analysis and that it can make sense in easy ways. As a disclaimer, my apologies to those who understand evolutionary analysis of DNA in detail: many concepts here are necessarily simplified, but all the simplifications are carefully considered to highlight the substance behind the form.

The genus Poladryas is an interesting North American genus of 2 species. Its phylogenetic affinities within the tribe Melitaeini were not well understood. We illustrate how analysis of DNA sequences can be used to clarify the position of Poladryas among its closest relatives. This serves as an example to describe the logic behind phylogenetic analysis of DNA.

Very impressive studies of the family Nymphalidae have been performed recently by the Niklas Wahlberg group. The group maintains a wonderful web site, The Nymphalidae Systematics Group, and pioneered the application of major DNA techniques to butterflies, including a polished protocol of data acquisition. Wahlberg and colleagues obtained partial DNA sequences for many species of Nymphalidae. Most of the published sequences are available from the GenBank database, free for everyone to analyze. Five of these sequences are used here.

For simplicity, we have chosen to analyze just one gene: the 16S ribosomal RNA gene, which is a standard marker for phylogeny reconstruction in many groups of organisms. Other genes can be added to the analysis – you are welcome to try! However, the conclusions will not change.


The first step is to obtain the needed sequences from the database. Submitting (Poladryas 16S ribosomal RNA gene) to the GenBank database search at http://www.ncbi.nlm.nih.gov/nuccore retrieves one entry. The entry contains a partial sequence of the gene. To get the sequence only, follow the FASTA link close to the top left of the NCBI web page. "FASTA" is a special sequence format in which the first line starts with the ">" character and is a description of the sequence. The sequence itself follows on the remaining lines. You can cut and paste the retrieved sequence into your favorite text editor to work with it further. Here is what this FASTA-formatted sequence looks like:

>gi|8388955|gb|AF186854.1| Poladryas arachne voucher NW27-4 16S ribosomal RNA gene, partial sequence; mitochondrial
TCAAAAACATGTCTTTTTGAAAATAATTTAAAGTTTAATCTGCCCACTGATATATTTATTAAAGGGCTGC
AGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAAGACTTGTATGAAAGATTTGAT
GAAATATAAACTGTCTCTAATTTAATAATAAAATTTAATTTTTTAGTTAAAAAGCTAAAATAATATTAAA
AGACGAGAAGACCCTATAAAGTTTTATAATTTATTTATTTAATATTAAATATATAATTAATTATAGTAAT
TATATAAAATTATTTTATTGGGGTGATAGAAAAATTTAATAAACTTTTTTTATATTATTAACATAAATAA
GTGAAAAAATGATCCATTATTAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTTTT
TTTAGAACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTAAATGCAAAAGTT
TAAAATTTTGATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT
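
For readers who prefer a script to a text editor, here is a minimal Python sketch of reading this format. It relies only on the two rules stated above (a description line starting with ">", sequence on the lines that follow). The helper short_label is hypothetical and assumes the description has the layout seen in this GenBank entry: an identifier first, then the genus and the species.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text."""
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:], []
        elif description is not None:
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def short_label(description):
    """Hypothetical helper: 'gi|...| Poladryas arachne voucher ...' -> 'Poladryas_arachne'."""
    words = description.split()
    # Assumption: word 0 is the database identifier, words 1 and 2 are genus and species.
    return "_".join(words[1:3])


example = """>gi|8388955|gb|AF186854.1| Poladryas arachne voucher NW27-4 16S ribosomal RNA gene, partial sequence; mitochondrial
TCAAAAACATGTCTTTTTGAAAATAATTTAAAGTTTAATCTGCCCACTGATATATTTATTAAAGGGCTGC
AGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAAGACTTGTATGAAAGATTTGAT
"""

for description, sequence in parse_fasta(example):
    print(short_label(description), len(sequence))
```

The same function reads a file with several records, since each ">" line simply starts a new one.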

We choose 4 other genera representing the 4 groups of the tribe Melitaeini, namely Melitaea, Euphydryas, Chlosyne and Phyciodes. While a sequence is available for only one Poladryas species, each of these 4 genera is represented in GenBank by many species. We have semi-randomly chosen the following well-known species: Melitaea athalia (a common European species), Euphydryas phaeton, Chlosyne theona, and Phyciodes tharos. As it turned out, this choice worked well for the analysis. Here are the GenBank sequences of all 5 taxa:

>gi|8388955|gb|AF186854.1| Poladryas arachne voucher NW27-4 16S ribosomal RNA gene, partial sequence; mitochondrial
TCAAAAACATGTCTTTTTGAAAATAATTTAAAGTTTAATCTGCCCACTGATATATTTATTAAAGGGCTGC
AGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAAGACTTGTATGAAAGATTTGAT
GAAATATAAACTGTCTCTAATTTAATAATAAAATTTAATTTTTTAGTTAAAAAGCTAAAATAATATTAAA
AGACGAGAAGACCCTATAAAGTTTTATAATTTATTTATTTAATATTAAATATATAATTAATTATAGTAAT
TATATAAAATTATTTTATTGGGGTGATAGAAAAATTTAATAAACTTTTTTTATATTATTAACATAAATAA
GTGAAAAAATGATCCATTATTAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTTTT
TTTAGAACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTAAATGCAAAAGTT
TAAAATTTTGATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT

>gi|8388958|gb|AF186857.1| Melitaea athalia isolate 5-5 16S ribosomal RNA gene, partial sequence; mitochondrial
TCAAAAACATGTCTTTTTGAAAATAATTTAAAGTCTAATCTGCCCACTGATATAATTATTAAAGGGCTGC
AGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAAGACTTGTATGAAAGATTTGAT
GAAATATATACTGTCTCTAATTTATGAATAAAAATTAATTTTTTAGTTAAAAAGCTAAAATAATATTAAA
AGACGAGAAGACCCTATAAAGTTTTATAATTTATTTATTTAATATTAATTATATAAATAAATATAATAAT
TAATTTAAATTATTTTATTGGGGTGATAAAAAAATTTAATTAACTTTTTTTAAAAAATAAACATAAATAA
GTGTGATAATGATCCATTATTAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTTTT
TTTAGTACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTAAGTGCAAAAGTT
TAAAATTTTGATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT

>gi|8389009|gb|AF186908.1| Euphydryas phaeton voucher NW13-3 16S ribosomal RNA gene, partial sequence; mitochondrial
TCAAAAACATGCTTTTTATATTAATTTAAAGTCTAATCTGCCCACTGATAAATATTATTAAAGGGCTGCA
GTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAAGACTTGTATGAAAGATTTGATG
AAATATAAACTGTCTCTAATTTATAAATAGAAATTAATTTTTTAATTAAAAAGTTAAAATATTATTAAAA
GACGAGAAGACCCTATAGAGTTTTATAATTTATTTATTTAATTATAAAATATATATTTAAATTTAATAAA
TAAATAAATTATTATATTGGGGTGATAAAAAAATTTAATAAACTTTTTTTAATTAAATAACATAAATAAA
TGAAAAAATGATCCATTATTAATGATTAGAAGAAAAAATTACCTTAGGGATAACAGCGTAATGTTTTTTT
TTAGTACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTAAATGCAAAAGTTT
AAAAGTTTTGATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT

>gi|8389018|gb|AF186917.1| Chlosyne theona voucher NW27-6 16S ribosomal RNA gene, partial sequence; mitochondrial
TCAAAAACATGTCTTTTTGATAATAATTTAAAGTCTAATCTGCCCACTGATATAATTATTAAAGGGCTGC
AGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAAGACTTGAATGAAAGATTTGAT
GAAATATAAACTGTCTCTAATTTAAAAATAAAATTTAATTTTTTAGTTAAAAAGCTAAAATATTATTAAA
AGACGAGAAGACCCTATAAAGTTTTATAATTTATTTATTTAACATTAAATATATAATTAATTATGATAAT
TAAATAAATTATTTTATTGGGGTGATAGAAAAATTTAATAAACTTTTTTTAAAAATAAACATAAATAAGT
GAATAAATGATCCATTATTAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTTTTTT
TAGTACAAATAAGAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTAAATGCCAAAATTTA
AAATTTTAGATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT

>gi|8389017|gb|AF186916.1| Phyciodes tharos voucher NW34-2 16S ribosomal RNA gene, partial sequence; mitochondrial
TCAAAAACATGTCTTTTTGATAATAATTTAAAGTCTAATCTGCCCACTGATATAAATATTAAAGGGCTGC
AGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAAGACTTGTATGAAAGATTTGAT
GAAATATAATCTGTCTCTATATTATTAATAGAAATTAATTTTTTAGTTAAAAAGCTAAAATAGTATTAAA
AGACGAGAAGACCCTATAAAGTTTTATAATTTATTTATTTATTATTAATTGTAAAAATAAATATTATAAT
TAAATAAATTATTTTATTGGGGTGATAGAAAAATTAAATAAACTTTTTTTTAATATAAAACATAAATAAA
TGAAAAATTGATCCATTAATAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTTTTT
TTAGTACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTAAATGCAAAAGTTT
AAAATTTTGATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT

The second step is to align the sequences, i.e. to place letters (=nucleotides) in the same column if these letters are in evolutionarily equivalent positions of these sequences. Nucleotides in the same column are assumed to have all evolved from a common ancestral nucleotide. We use the MUSCLE server at EBI. Select "ClustalW2 (Strict)" for the "Output Format" instead of the default "FASTA" and paste the above 5 sequences into the window. "Run" it, and in less than a minute we get the result. Since the sequences are long, the alignment is cut into blocks. Each line in a block starts with the name of a sequence, taken from its first line (the definition line, which starts with ">"). To obtain nice names (rather than the identifiers in the definition lines shown above), the definition lines were replaced with species names to get this alignment:

CLUSTAL W (1.81) multiple sequence alignment

Euphydryas_phaeton      TCAAAAACATGCTTTTT--ATATTAATTTAAAGTCTAATCTGCCCACTGATAAATATTAT
Phyciodes_tharos        TCAAAAACATGTCTTTTTGATAATAATTTAAAGTCTAATCTGCCCACTGATATA-AATAT
Melitaea_athalia        TCAAAAACATGTCTTTTTGAAAATAATTTAAAGTCTAATCTGCCCACTGATATA-ATTAT
Chlosyne_theona         TCAAAAACATGTCTTTTTGATAATAATTTAAAGTCTAATCTGCCCACTGATATA-ATTAT
Poladryas_arachne       TCAAAAACATGTCTTTTTGAAAATAATTTAAAGTTTAATCTGCCCACTGATATA-TTTAT
                        ***********  ****  * * *********** ***************** *   ***

Euphydryas_phaeton      TAAAGGGCTGCAGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAA
Phyciodes_tharos        TAAAGGGCTGCAGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAA
Melitaea_athalia        TAAAGGGCTGCAGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAA
Chlosyne_theona         TAAAGGGCTGCAGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAA
Poladryas_arachne       TAAAGGGCTGCAGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAA
                        ************************************************************

Euphydryas_phaeton      GACTTGTATGAAAGATTTGATGAAATATAAACTGTCTCTAATTTATAAATAGAAATTAAT
Phyciodes_tharos        GACTTGTATGAAAGATTTGATGAAATATAATCTGTCTCTATATTATTAATAGAAATTAAT
Melitaea_athalia        GACTTGTATGAAAGATTTGATGAAATATATACTGTCTCTAATTTATGAATAAAAATTAAT
Chlosyne_theona         GACTTGAATGAAAGATTTGATGAAATATAAACTGTCTCTAATTTAAAAATAAAATTTAAT
Poladryas_arachne       GACTTGTATGAAAGATTTGATGAAATATAAACTGTCTCTAATTTAATAATAAAATTTAAT
                        ****** **********************  *********  ***  **** ** *****

Euphydryas_phaeton      TTTTTAATTAAAAAGTTAAAATATTATTAAAAGACGAGAAGACCCTATAGAGTTTTATAA
Phyciodes_tharos        TTTTTAGTTAAAAAGCTAAAATAGTATTAAAAGACGAGAAGACCCTATAAAGTTTTATAA
Melitaea_athalia        TTTTTAGTTAAAAAGCTAAAATAATATTAAAAGACGAGAAGACCCTATAAAGTTTTATAA
Chlosyne_theona         TTTTTAGTTAAAAAGCTAAAATATTATTAAAAGACGAGAAGACCCTATAAAGTTTTATAA
Poladryas_arachne       TTTTTAGTTAAAAAGCTAAAATAATATTAAAAGACGAGAAGACCCTATAAAGTTTTATAA
                        ****** ******** ******* ************************* **********

Euphydryas_phaeton      TTTATTTATTTAATTATAAAATATATATTTAAATTTAATAAATAAAT-AAATTATTATAT
Phyciodes_tharos        TTTATTTATTTAT-TATTAATTGTAAAAATAAATATTATAATTAAAT-AAATTATTTTAT
Melitaea_athalia        TTTATTTATTTAA-TATTAATTATATAAATAAATATAATAATTAATTTAAATTATTTTAT
Chlosyne_theona         TTTATTTATTTAA-CATTAAATATATAATTAATTATGATAATTAAAT-AAATTATTTTAT
Poladryas_arachne       TTTATTTATTTAA-TATTAAATATATAATTAATTATAGTAATTATATAAAATTATTTTAT
                        ************   ** ** * ** *  *** * *  *** **  * ******** ***

Euphydryas_phaeton      TGGGGTGATAAAAAAATTTAATAAACTTTTTTTAATTAAATAACATAAATAAATGAAAAA
Phyciodes_tharos        TGGGGTGATAGAAAAATTAAATAAACTTTTTTTTAATATAAAACATAAATAAATGAAAAA
Melitaea_athalia        TGGGGTGATAAAAAAATTTAATTAACTTTTTTTAAAAAATAAACATAAATAAGTGTGATA
Chlosyne_theona         TGGGGTGATAGAAAAATTTAATAAACTTTTTTTAAA-AATAAACATAAATAAGTGAATAA
Poladryas_arachne       TGGGGTGATAGAAAAATTTAATAAACTTTTTTTATATTATTAACATAAATAAGTGAAAAA
                        ********** ******* *** **********        *********** **    *

Euphydryas_phaeton      ATGATCCATTATTAATGATTAGAAGAAAAAATTACCTTAGGGATAACAGCGTAATGTTTT
Phyciodes_tharos        TTGATCCATTAATAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTT
Melitaea_athalia        ATGATCCATTATTAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTT
Chlosyne_theona         ATGATCCATTATTAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTT
Poladryas_arachne       ATGATCCATTATTAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTT
                         ********** ********* ************* ******************* ****

Euphydryas_phaeton      TTTTTAGTACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTA
Phyciodes_tharos        TTTTTAGTACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTA
Melitaea_athalia        TTTTTAGTACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTA
Chlosyne_theona         TTTTTAGTACAAATAAGAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTA
Poladryas_arachne       TTTTTAGAACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTA
                        ******* ******** *******************************************

Euphydryas_phaeton      AATGCAAAAGTTTAAAAGTTTTGATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT
Phyciodes_tharos        AATGCAAAAGTTTAAAATTTT-GATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT
Melitaea_athalia        AGTGCAAAAGTTTAAAATTTT-GATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT
Chlosyne_theona         AATGCCAAAATTTAAAATTTTAGATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT
Poladryas_arachne       AATGCAAAAGTTTAAAATTTT-GATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT
                        * *** *** ******* *** **************************************

We see that sequences are very similar, i.e. in many columns letters did not change. The last line in each alignment block (stars) marks positions that are identical in all 5 sequences. If there is no star, then at least one sequence contains a letter (=nucleotide) different from some other sequences at this position.
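
The rule behind the line of stars is simple enough to sketch in a few lines of Python. This is only an illustration of the rule as described above, not the code the alignment server uses; the rows below are the first 24 columns of the first alignment block.

```python
def conservation_line(aligned):
    """Return '*' under columns where all sequences carry the same letter, ' ' elsewhere."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):
        # A column made only of gaps is not starred (assumption; none occur here).
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


excerpt = [
    "TCAAAAACATGCTTTTT--ATATT",  # Euphydryas_phaeton
    "TCAAAAACATGTCTTTTTGATAAT",  # Phyciodes_tharos
    "TCAAAAACATGTCTTTTTGAAAAT",  # Melitaea_athalia
    "TCAAAAACATGTCTTTTTGATAAT",  # Chlosyne_theona
    "TCAAAAACATGTCTTTTTGAAAAT",  # Poladryas_arachne
]

for row in excerpt:
    print(row)
print(conservation_line(excerpt))
```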


The third step is to reconstruct an evolutionary tree from this alignment. As a first try we will use the BioNJ program, available as a web server. Paste the above alignment – including the first line "CLUSTAL W (1.81) multiple sequence alignment" – into the window, put 1000 in the "Settings" field for "Number of bootstraps" and submit. Again, less than a minute is needed to get the result. The following tree in the "Newick format" will be generated:

((Phyciodes_tharos:0.031246,Euphydryas_phaeton:0.041058)0.566:0.003135,Melitaea_athalia:0.021967,(Poladryas_arachne:0.018801,Chlosyne_theona:0.015971)0.817:0.006534);
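
In this notation each pair of parentheses encloses a group, each name is followed by a colon and the length of the branch leading to it, and the number written right after a closing parenthesis (here 0.566 and 0.817) is the bootstrap support of that group. A small Python sketch can pull these numbers out of the string. It uses regular expressions and is meant only for simple trees like this one; it is not a general Newick parser.

```python
import re

newick = (
    "((Phyciodes_tharos:0.031246,Euphydryas_phaeton:0.041058)0.566:0.003135,"
    "Melitaea_athalia:0.021967,"
    "(Poladryas_arachne:0.018801,Chlosyne_theona:0.015971)0.817:0.006534);"
)


def leaf_lengths(tree):
    """Map each leaf name to the length of the branch leading to it."""
    # A leaf name directly follows '(' or ',' and is followed by ':length'.
    pattern = r"[(,]([A-Za-z_][A-Za-z0-9_]*):([0-9.]+)"
    return {name: float(length) for name, length in re.findall(pattern, tree)}


def group_supports(tree):
    """Return the support values written after closing parentheses."""
    return [float(value) for value in re.findall(r"\)([0-9.]+):", tree)]


for name, length in sorted(leaf_lengths(newick).items(), key=lambda item: -item[1]):
    print(f"{name:20s} {length:.6f}")
print(group_supports(newick))
```

Sorting by length puts Euphydryas_phaeton first: it sits on the longest terminal branch of this tree.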

In ASCII form, with Euphydryas phaeton taken as the root, this tree looks like this:

                                                                                                     -----0.005----
 
 +--------------------------------------------------------------------------------------------Euphydryas_phaeton
 |
 |
 |                                           +-------------------------------------------Chlosyne_theona
 |                         +-----------------+
 |                         |                 +---------------------------------------------------Poladryas_arachne
 |                 +-------+
 |                 |       |
 |                 |       +-----------------------------------------------------------Melitaea_athalia
 +-----------------+
                   |
                   +------------------------------------------------------------------------------------Phyciodes_tharos

Or like this in graphics form, shown by TreeView:

The same tree displayed by ATV and in a rooted form. The root position is chosen by the user.

One can stop here and say that the analysis is done! Bootstrap support is a fractional value (between 0 and 1) that indicates how consistently the positions in the sequences support a given grouping in the tree. Values above 0.75 are considered reasonably strong, and in the tree above we see that Poladryas is grouped with Chlosyne with a support of 0.817. Mission accomplished! One can try to run other phylogenetic programs from the same web server, e.g. PhyML, TNT, and MrBayes, or other programs, e.g. the maximum parsimony program dnapars, but the results will be the same. E.g. the TNT tree looks like this:

   ,-- Euphydryas_phaeton
|--|  ,-- Phyciodes_tharos
   `--|  ,-- Melitaea_athalia
      `--|  ,-- Poladryas_arachne
         `----- Chlosyne_theona
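
The bootstrap idea behind the support values can be sketched in Python. A replicate is made by drawing alignment columns at random, with replacement, until the replicate is as long as the original; the question of interest is then asked of every replicate. In this sketch the question is a deliberately crude stand-in (is one sequence the single closest to another by plain mismatch count?), not the BioNJ tree building used by the web server, and the function names are made up for illustration.

```python
import random


def bootstrap_columns(aligned, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    length = len(aligned[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in aligned]


def mismatches(a, b):
    """Number of columns in which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbor_support(names, aligned, target, partner, replicates=1000, seed=0):
    """Fraction of replicates in which `partner` alone is closest to `target`."""
    rng = random.Random(seed)
    t = names.index(target)
    hits = 0
    for _ in range(replicates):
        sample = bootstrap_columns(aligned, rng)
        dists = {
            name: mismatches(sample[t], sample[i])
            for i, name in enumerate(names)
            if i != t
        }
        best = min(dists.values())
        winners = [name for name, d in dists.items() if d == best]
        if winners == [partner]:
            hits += 1
    return hits / replicates


# Toy data, not the butterfly alignment: B matches A everywhere, C nowhere.
names = ["A", "B", "C"]
toy = ["AAAAAAAA", "AAAAAAAA", "TTTTTTTT"]
print(neighbor_support(names, toy, "A", "B", replicates=100))
```

A grouping that rests on only a few columns will disappear from many replicates, because those columns are often left out of the draw; that is why it receives a low value.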

However, such a "black box" approach to obtain phylogeny does not help our understanding of how sequences are converted into a tree. Here we would like to explain how one can look at the letters in the alignment and analyze them manually.


Manual analysis. Let's look at the alignment again:

Euphydryas_phaeton      TCAAAAACATGCTTTTT--ATATTAATTTAAAGTCTAATCTGCCCACTGATAAATATTAT
Phyciodes_tharos        TCAAAAACATGTCTTTTTGATAATAATTTAAAGTCTAATCTGCCCACTGATATA-AATAT
Melitaea_athalia        TCAAAAACATGTCTTTTTGAAAATAATTTAAAGTCTAATCTGCCCACTGATATA-ATTAT
Chlosyne_theona         TCAAAAACATGTCTTTTTGATAATAATTTAAAGTCTAATCTGCCCACTGATATA-ATTAT
Poladryas_arachne       TCAAAAACATGTCTTTTTGAAAATAATTTAAAGTTTAATCTGCCCACTGATATA-TTTAT
                        ***********  ****  * * *********** ***************** *   ***

Euphydryas_phaeton      TAAAGGGCTGCAGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAA
Phyciodes_tharos        TAAAGGGCTGCAGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAA
Melitaea_athalia        TAAAGGGCTGCAGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAA
Chlosyne_theona         TAAAGGGCTGCAGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAA
Poladryas_arachne       TAAAGGGCTGCAGTATATTGACTGTACAAAGGTAGCATAATCATTAGTCTTTTAATTGAA
                        ************************************************************

Euphydryas_phaeton      GACTTGTATGAAAGATTTGATGAAATATAAACTGTCTCTAATTTATAAATAGAAATTAAT
Phyciodes_tharos        GACTTGTATGAAAGATTTGATGAAATATAATCTGTCTCTATATTATTAATAGAAATTAAT
Melitaea_athalia        GACTTGTATGAAAGATTTGATGAAATATATACTGTCTCTAATTTATGAATAAAAATTAAT
Chlosyne_theona         GACTTGAATGAAAGATTTGATGAAATATAAACTGTCTCTAATTTAAAAATAAAATTTAAT
Poladryas_arachne       GACTTGTATGAAAGATTTGATGAAATATAAACTGTCTCTAATTTAATAATAAAATTTAAT
                        ****** **********************  *********  ***  **** ** *****

Euphydryas_phaeton      TTTTTAATTAAAAAGTTAAAATATTATTAAAAGACGAGAAGACCCTATAGAGTTTTATAA
Phyciodes_tharos        TTTTTAGTTAAAAAGCTAAAATAGTATTAAAAGACGAGAAGACCCTATAAAGTTTTATAA
Melitaea_athalia        TTTTTAGTTAAAAAGCTAAAATAATATTAAAAGACGAGAAGACCCTATAAAGTTTTATAA
Chlosyne_theona         TTTTTAGTTAAAAAGCTAAAATATTATTAAAAGACGAGAAGACCCTATAAAGTTTTATAA
Poladryas_arachne       TTTTTAGTTAAAAAGCTAAAATAATATTAAAAGACGAGAAGACCCTATAAAGTTTTATAA
                        ****** ******** ******* ************************* **********

Euphydryas_phaeton      TTTATTTATTTAATTATAAAATATATATTTAAATTTAATAAATAAAT-AAATTATTATAT
Phyciodes_tharos        TTTATTTATTTAT-TATTAATTGTAAAAATAAATATTATAATTAAAT-AAATTATTTTAT
Melitaea_athalia        TTTATTTATTTAA-TATTAATTATATAAATAAATATAATAATTAATTTAAATTATTTTAT
Chlosyne_theona         TTTATTTATTTAA-CATTAAATATATAATTAATTATGATAATTAAAT-AAATTATTTTAT
Poladryas_arachne       TTTATTTATTTAA-TATTAAATATATAATTAATTATAGTAATTATATAAAATTATTTTAT
                        ************   ** ** * ** *  *** * *  *** **  * ******** ***

Euphydryas_phaeton      TGGGGTGATAAAAAAATTTAATAAACTTTTTTTAATTAAATAACATAAATAAATGAAAAA
Phyciodes_tharos        TGGGGTGATAGAAAAATTAAATAAACTTTTTTTTAATATAAAACATAAATAAATGAAAAA
Melitaea_athalia        TGGGGTGATAAAAAAATTTAATTAACTTTTTTTAAAAAATAAACATAAATAAGTGTGATA
Chlosyne_theona         TGGGGTGATAGAAAAATTTAATAAACTTTTTTTAAA-AATAAACATAAATAAGTGAATAA
Poladryas_arachne       TGGGGTGATAGAAAAATTTAATAAACTTTTTTTATATTATTAACATAAATAAGTGAAAAA
                        ********** ******* *** **********        *********** **    *

Euphydryas_phaeton      ATGATCCATTATTAATGATTAGAAGAAAAAATTACCTTAGGGATAACAGCGTAATGTTTT
Phyciodes_tharos        TTGATCCATTAATAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTT
Melitaea_athalia        ATGATCCATTATTAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTT
Chlosyne_theona         ATGATCCATTATTAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTT
Poladryas_arachne       ATGATCCATTATTAATGATTAAAAGAAAAAATTACTTTAGGGATAACAGCGTAATATTTT
                         ********** ********* ************* ******************* ****

Euphydryas_phaeton      TTTTTAGTACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTA
Phyciodes_tharos        TTTTTAGTACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTA
Melitaea_athalia        TTTTTAGTACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTA
Chlosyne_theona         TTTTTAGTACAAATAAGAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTA
Poladryas_arachne       TTTTTAGAACAAATAAAAAAAAAAGTTTGCGACCTCGATGTTGGATTAAGATAAAATTTA
                        ******* ******** *******************************************

Euphydryas_phaeton      AATGCAAAAGTTTAAAAGTTTTGATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT
Phyciodes_tharos        AATGCAAAAGTTTAAAATTTT-GATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT
Melitaea_athalia        AGTGCAAAAGTTTAAAATTTT-GATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT
Chlosyne_theona         AATGCCAAAATTTAAAATTTTAGATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT
Poladryas_arachne       AATGCAAAAGTTTAAAATTTT-GATCTGTTCGATCATTAAAATCTTACATGATCTGAGCT
                        * *** *** ******* *** **************************************

Positions marked with asterisks contain the same nucleotide in all 5 sequences. These positions are called invariant. Although these positions are quite important for various reasons (e.g. for computing nucleotide frequencies in the sequences, which some phylogenetic methods need in order to work), invariant positions do not directly tell us which sequence is more similar to which other sequence. Thus, we can delete these positions from the alignment; as a result, the alignment will be shorter and easier to view. Here is what we get:

Euphydryas_phaeton  CT--TTCATATTAAATTAGAATTGATTAAATTTATAAAAA-AATAAATTAAATAAAAAATGCGTAAAGGT
Phyciodes_tharos    TCTGTACT-AATATTATTGAGCGAT-TTTGAAAAATATAA-TGAATAATATAAAAAAATAATATAAAGT-
Melitaea_athalia    TCTGAACT-ATTTAATTGAAGCAAA-TTTATAAAAAATATTTATTAAAAAATAGTGATATATATAGAGT-
Chlosyne_theona     TCTGTACT-ATAAAATAAATGCTAA-CTAATATTAGATAA-TGTAAAA-AATAGAATAATATATGACATA
Poladryas_arachne   TCTGAATT-TTTAAATATATGCAAA-TTAATATTAAGTTAATGTAATATTATTGAAAAATATAAAAAGT-
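
Deleting invariant positions by hand is tedious, so here is a short Python sketch of the same operation. It keeps every column in which at least two different characters occur, with the gap "-" counted as a character (an assumption of this sketch). The input rows are the first 24 columns of the full alignment.

```python
def drop_invariant_columns(aligned):
    """Return (reduced sequences, number of removed columns)."""
    columns = list(zip(*aligned))
    kept = [column for column in columns if len(set(column)) > 1]
    if not kept:
        return [""] * len(aligned), len(columns)
    reduced = ["".join(chars) for chars in zip(*kept)]
    return reduced, len(columns) - len(kept)


excerpt = [
    "TCAAAAACATGCTTTTT--ATATT",  # Euphydryas_phaeton
    "TCAAAAACATGTCTTTTTGATAAT",  # Phyciodes_tharos
    "TCAAAAACATGTCTTTTTGAAAAT",  # Melitaea_athalia
    "TCAAAAACATGTCTTTTTGATAAT",  # Chlosyne_theona
    "TCAAAAACATGTCTTTTTGAAAAT",  # Poladryas_arachne
]

reduced, removed = drop_invariant_columns(excerpt)
for row in reduced:
    print(row)
print(removed, "invariant columns removed")
```

The printed rows are the first six characters of the shortened alignment shown above.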

The full alignment contained 540 positions, 470 of which were invariant. Removing them resulted in an alignment of 70 positions. Now, let's find positions that are semi-invariant, i.e. positions that are occupied by the same nucleotide in all but 1 sequence. Thus only one sequence will have a different nucleotide in these positions. This different nucleotide is marked gray in the alignment:

Euphydryas_phaeton  CT--TTCATATTAAATTAGAATTGATTAAATTTATAAAAA-AATAAATTAAATAAAAAATGCGTAAAGGT  21
Phyciodes_tharos    TCTGTACT-AATATTATTGAGCGAT-TTTGAAAAATATAA-TGAATAATATAAAAAAATAATATAAAGT-  12
Melitaea_athalia    TCTGAACT-ATTTAATTGAAGCAAA-TTTATAAAAAATATTTATTAAAAAATAGTGATATATATAGAGT-   7
Chlosyne_theona     TCTGTACT-ATAAAATAAATGCTAA-CTAATATTAGATAA-TGTAAAA-AATAGAATAATATATGACATA   5
Poladryas_arachne   TCTGAATT-TTTAAATATATGCAAA-TTAATATTAAGTTAATGTAATATTATTGAAAAATATAAAAAGT-   7

The number of nucleotides shaded gray is shown to the right of the alignment (highlighted yellow). We see that the number of gray marks is not the same in all sequences. In fact, the Euphydryas phaeton sequence contains almost twice as many such nucleotides as the sequence with the next largest number (Phyciodes tharos): 21 vs. 12. A 2-fold difference must mean something. Since all 5 sequences in the original alignment are about 90% identical, not that many nucleotide mutations happened in them. Therefore it is reasonable to think that each nucleotide difference usually corresponds to a single mutation in the gene. Since positions with a gray highlight show a difference in one species only, and all other species have the same nucleotide in these positions, it is reasonable to think that the mutation to a different nucleotide happened on the path (=tree branch) leading to the species with the gray position. We can measure branch length in the number of mutations (=number of positions highlighted gray in each sequence). We see that the branch leading to Euphydryas phaeton should be very long: almost twice as long as the branch leading to Phyciodes tharos and at least three times as long as the branches leading to the other species, i.e. at least about twice as many mutations happened on the branch leading to Euphydryas phaeton. Why is this branch so long?

Typically, if we consider very close organisms, most mutations separating them from each other are random and happen at random times; thus the more mutations, the more time separates the organisms. This idea is called the "molecular clock". Of course, it is possible that during some times in evolution the "clock" might speed up, and in some species the rate of mutation accumulation might grow. However, why would Euphydryas phaeton accumulate mutations at twice the rate of the other species? Likely there is no reason. Thus a logical explanation for the length of this branch is that the root (the common ancestor of the 5 genera) is placed on this branch. In other words, this branch consists of two branches: one is from the root (large blue circle) to Euphydryas phaeton and the other is from the root to the common ancestor of the other 4 genera (small blue circle). With these 2 branches being of about equal length (=about the same number of mutations accumulated during this time), we get an explanation of why the Euphydryas phaeton sequence shows about twice as many mutations as the other sequences:

This root placement is the same as the one proposed by The Nymphalidae Systematics Group on the basis of an outgroup – all other genera of Nymphalids and other butterflies. Now, since we have analyzed the gray positions, let's remove them from the alignment and concentrate on the rest. We obtain 18 positions:

Euphydryas_phaeton  TTAGATATAA-ATATAAT
Phyciodes_tharos    TTTGAGTAAT-GTAAAA-
Melitaea_athalia    ATGAAATAAATAATAGA-
Chlosyne_theona     TAAATTATTG-G-TAGTA
Poladryas_arachne   AATATAATTAAGTTTGA-
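
The counting and removal of semi-invariant positions described above can be sketched in Python as follows. A column is credited to a sequence when that sequence alone differs and all the others agree. Treating the gap "-" as one more character is an assumption of this sketch, so its tallies need not agree in every case with the hand-marked numbers in the text. The rows are the first 11 columns of the 70-position alignment.

```python
from collections import Counter


def singleton_owner(column):
    """Index of the one sequence that differs when all others agree, else None."""
    counts = Counter(column)
    if len(counts) != 2:
        return None
    (_, n_common), (rare, n_rare) = counts.most_common()
    if n_rare != 1 or n_common < 2:
        return None
    return column.index(rare)


def split_singletons(aligned):
    """Return (per-sequence singleton tallies, alignment without those columns)."""
    tallies = [0] * len(aligned)
    kept = []
    for column in zip(*aligned):
        owner = singleton_owner(column)
        if owner is None:
            kept.append(column)
        else:
            tallies[owner] += 1
    if not kept:
        return tallies, [""] * len(aligned)
    return tallies, ["".join(chars) for chars in zip(*kept)]


names = ["Euphydryas_phaeton", "Phyciodes_tharos", "Melitaea_athalia",
         "Chlosyne_theona", "Poladryas_arachne"]
excerpt = ["CT--TTCATAT", "TCTGTACT-AA", "TCTGAACT-AT", "TCTGTACT-AT", "TCTGAATT-TT"]

tallies, remaining = split_singletons(excerpt)
for name, count, rest in zip(names, tallies, remaining):
    print(f"{name:20s} {count:2d}  {rest}")
```

Even in this short excerpt most of the singletons belong to Euphydryas phaeton, which is the pattern the long-branch argument rests on.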

The next step is to check positions that have a unique nucleotide in a pair of sequences. Since we would like to check whether there is a sequence that is closest to the root, we mark positions that have the same nucleotide as the Euphydryas phaeton sequence. These positions are highlighted in red:

Euphydryas_phaeton  TTAGATATAA-ATATAAT    
Phyciodes_tharos    TTTGAGTAAT-GTAAAA-   3
Melitaea_athalia    ATGAAATAAATAATAGA-   1
Chlosyne_theona     TAAATTATTG-G-TAGTA   1
Poladryas_arachne   AATATAATTAAGTTTGA-   1

We see that the last three sequences are consistently far from the Euphydryas phaeton sequence and share just 1 unique position with it; Phyciodes tharos, however, has 3 such positions. Although 3 is not that different from 1, the three ones in the three other sequences are so consistent that we can form a hypothesis that Phyciodes tharos branched from the common ancestor next:

Since we are finished with the analysis of Euphydryas phaeton (it is quite clear that the root of the group falls on its branch), this sequence is removed from the alignment. Positions unique to each of the remaining sequences are highlighted in gray:

Phyciodes_tharos    TTTGAGTAAT-GTAAAA-   3
Melitaea_athalia    ATGAAATAAATAATAGA-   1
Chlosyne_theona     TAAATTATTG-G-TAGTA   2
Poladryas_arachne   AATATAATTAAGTTTGA-   1

Due to the small number of unique positions, it is not certain how helpful this analysis is; however, the largest number of unique positions being in Phyciodes tharos is consistent with our hypothesis that this sequence branches out next in the tree. To concentrate on the rest of the alignment, we remove unique positions from it:

Phyciodes_tharos    TTTAGTAAT-T
Melitaea_athalia    ATGAATAAATA
Chlosyne_theona     TAATTATTG--
Poladryas_arachne   AATTAATTAAT

Then, analogously to the previous step, we highlight in red unique positions shared with Phyciodes tharos, which is the closest to the root:

Phyciodes_tharos    TTTAGTAAT-T    
Melitaea_athalia    ATGAATAAATA   5
Chlosyne_theona     TAATTATTG--   2
Poladryas_arachne   AATTAATTAAT   2

It is apparent that Melitaea athalia shares a larger number of unique positions with Phyciodes tharos, and probably branched off next in the tree, leaving Chlosyne theona and Poladryas arachne as sister taxa. Additionally, we can look directly at the Poladryas arachne sequence and check the unique positions it shares with the other sequences; such positions are highlighted in green:

Phyciodes_tharos    TTTAGTAAT-T   2
Melitaea_athalia    ATGAATAAATA   3
Chlosyne_theona     TAATTATTG--   5
Poladryas_arachne   AATTAATTAAT    
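
The red and green tallies of the last two steps can be sketched in Python. For a chosen reference sequence, a column is credited to another sequence when that sequence is the only one carrying the same character as the reference. A gap is treated as a character like any other; this is an assumption of the sketch, and the function name is made up for illustration.

```python
def pair_sharing(names, aligned, reference):
    """Count, per sequence, the columns where it alone matches the reference."""
    ref = names.index(reference)
    tallies = {name: 0 for name in names if name != reference}
    for column in zip(*aligned):
        partners = [
            i for i, char in enumerate(column)
            if i != ref and char == column[ref]
        ]
        # Credit the column only when exactly one other sequence matches.
        if len(partners) == 1:
            tallies[names[partners[0]]] += 1
    return tallies


names = ["Phyciodes_tharos", "Melitaea_athalia", "Chlosyne_theona", "Poladryas_arachne"]
aligned = [
    "TTTAGTAAT-T",
    "ATGAATAAATA",
    "TAATTATTG--",
    "AATTAATTAAT",
]

print(pair_sharing(names, aligned, "Phyciodes_tharos"))
print(pair_sharing(names, aligned, "Poladryas_arachne"))
```

On this 11-position alignment the sketch gives 5, 2 and 2 with Phyciodes tharos as the reference and 2, 3 and 5 with Poladryas arachne as the reference, the same numbers as in the two tables above.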

The number of green positions is the largest in Chlosyne theona which is consistent with it being a sister taxon of Poladryas arachne. Thus we see that the following tree is reasonable. We can muddle it with statistics (and it will be fun!), but will refrain from it for now for the sake of the readers.

      +----------- Euphydryas phaeton 
      |                               
root--|  +-------- Phyciodes tharos   
      |  |                            
      +--+  +----- Melitaea athalia   
         |  |                         
         +--|  +-- Chlosyne theona    
            +--+                      
               +-- Poladryas arachne  
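
To view this hand-made tree in a program such as TreeView, it can be written in the Newick format used earlier. Here is a minimal Python sketch, assuming the tree is stored as nested tuples; it writes the branching order only, because the manual analysis did not assign branch lengths.

```python
def to_newick(node):
    """Turn nested tuples of names into a Newick string without the final ';'."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


manual_tree = (
    "Euphydryas_phaeton",
    ("Phyciodes_tharos",
     ("Melitaea_athalia",
      ("Chlosyne_theona", "Poladryas_arachne"))),
)

print(to_newick(manual_tree) + ";")
```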

Finally, it has to be noted that the 3 positions supporting the branch between Euphydryas phaeton and Phyciodes tharos (the thicker blue branch in the tree construction diagram above) are not a very large number. It is conceivable that Phyciodes tharos and Melitaea athalia might be sister taxa. This, however, will not change the conclusion about Poladryas and Chlosyne being sister genera, as their common branch is supported by 5 positions.

Just for fun – preliminary Chlosyne trees:

2-Aug-2009 © Nick V. Grishin


