Building Fitch-Margoliash Phylogenetic Trees

Stacy Taylor
Gettysburg College

Background

The Fitch-Margoliash algorithm, commonly called the FM-algorithm, clusters taxa using evolutionary distances calculated with the Jukes-Cantor (JC) model. The Jukes-Cantor model equation is...

d_ij = -(3/4) ln(1 - (4/3) p)

...where p is the fraction of mismatched sites between the two sequences i and j. Using the JC distances, the distance-based FM-algorithm produces unrooted phylogenetic trees, in which the common ancestral node is unknown (to attempt a rooted tree, one can include an outgroup that is known to be distantly related to the other taxa). This program clusters taxa using the FM-algorithm and builds a phylogenetic tree using PHYLIP. PHYLIP is a tree-building package that also contains numerous other programs for analyzing the data.
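
As a rough illustration of the Jukes-Cantor equation above, the following Python sketch computes the distance for one pair of equal-length sequences. The function name `jukes_cantor_distance` is hypothetical and is not part of this program or of PHYLIP. Returning infinity once p reaches 3/4, where the logarithm is no longer defined, is an assumption made only for this sketch.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Return d = -(3/4) * ln(1 - (4/3) * p) for two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")

    # p is the fraction of positions at which the two sequences differ.
    mismatches = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x != y)
    if mismatches == 0:
        return 0.0
    p = mismatches / len(seq_a)

    # Assumption for this sketch: when p >= 3/4 the logarithm's argument is
    # zero or negative, so the distance is reported as infinite.
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Sequences "a" and "b" from the example input differ at 2 of 13 sites.
d_ab = jukes_cantor_distance("GTACGTAGCTAGC", "GTACGTAGAGAGC")
print(round(d_ab, 3))  # approximately 0.172
```

The corrected distance is slightly larger than the raw mismatch fraction (2/13, about 0.154), which is the point of the model: it accounts for sites that may have changed more than once.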

How to enter data

The data used to build a tree can be entered as either a distance matrix or a group of DNA sequences. Entering a distance matrix lets you supply your own evolutionary distances for your taxa instead of those calculated using the Jukes-Cantor model. In a properly formatted distance matrix, each taxon is given its own row, starting with the taxon name and followed by its distances to every taxon, as seen below:

Leonardo_DiCaprio 0 16 6 16 6
Ryan_Reynolds 16 0 16 8 16
Bradley_Cooper 6 16 0 16 2
Jake_Gyllenhaal 16 8 16 0 16
David_Beckham 6 16 2 16 0

A taxon compared to itself has a distance of zero, and these zeroes must be entered so that they lie on the main diagonal of the matrix. Names must NOT contain spaces; use underscores instead. All other entries must be separated by spaces, and each taxon must be on its own line.
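
The formatting rules above can be checked with a short Python sketch like the one below. The function name `parse_distance_matrix` is hypothetical and does not describe how this program actually reads its input. The symmetry check is an added assumption: the rules above only require zeroes on the main diagonal, but a distance from i to j is normally the same as from j to i.

```python
def parse_distance_matrix(text: str) -> tuple[list[str], list[list[float]]]:
    """Parse rows of 'Name d1 d2 ... dn' and check the basic format rules."""
    names: list[str] = []
    rows: list[list[float]] = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            # A name containing a space ends up here, because its second
            # word is read as a distance.
            raise ValueError(f"non-numeric distance in row: {line!r}") from None
        names.append(fields[0])
        rows.append(values)

    n = len(names)
    for name, row in zip(names, rows):
        if len(row) != n:
            raise ValueError(f"row {name!r} has {len(row)} distances, expected {n}")
    for i in range(n):
        if rows[i][i] != 0:
            raise ValueError(f"row {names[i]!r} is not zero on the main diagonal")
        for j in range(i):
            # Assumption for this sketch: the matrix must be symmetric.
            if rows[i][j] != rows[j][i]:
                raise ValueError(f"{names[i]!r} and {names[j]!r} are not symmetric")
    return names, rows


example = """\
Leonardo_DiCaprio 0 16 6 16 6
Ryan_Reynolds 16 0 16 8 16
Bradley_Cooper 6 16 0 16 2
Jake_Gyllenhaal 16 8 16 0 16
David_Beckham 6 16 2 16 0
"""
names, matrix = parse_distance_matrix(example)
```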

Data can also be entered as DNA sequences, in which case the Jukes-Cantor model is used to calculate the distances from which the tree is built. The sequences MUST all be the same length, and each must have a label. The label goes first, followed by a colon and then the sequence, as seen below:

a : GTACGTAGCTAGC
b : GTACGTAGAGAGC
c : GTATGTAGCTGGC
d : GGGCGTAGCTAGC
e : GTACGTAGTTACC
f : GTACATGGCTAGC
g : GTACGTAGCTATT
h : ATACGTAGCTAGC

When entering the DNA sequences, spaces may appear before and after the colon (a : GTAC) but must NOT appear within the DNA sequence itself (a: GTAA C GTAC).
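
The sequence format can be validated in the same spirit. This is a minimal Python sketch, with the hypothetical name `parse_sequences`, that is not taken from this program. Rejecting duplicate labels and converting the sequences to upper case are assumptions that go beyond the rules stated here.

```python
def parse_sequences(text: str) -> dict[str, str]:
    """Parse lines of 'label : SEQUENCE' into a label -> sequence mapping."""
    sequences: dict[str, str] = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, colon, seq = line.partition(":")
        if not colon:
            raise ValueError(f"missing colon in line: {line!r}")
        # Spaces around the colon are allowed, so trim both parts.
        label, seq = label.strip(), seq.strip()
        if not label or not seq:
            raise ValueError(f"missing label or sequence in line: {line!r}")
        # Spaces inside the sequence itself are not allowed.
        if any(ch.isspace() for ch in seq):
            raise ValueError(f"sequence {label!r} contains whitespace")
        # Assumption for this sketch: each label may be used only once.
        if label in sequences:
            raise ValueError(f"duplicate label: {label!r}")
        sequences[label] = seq.upper()

    # All sequences must be the same length.
    if len({len(s) for s in sequences.values()}) > 1:
        raise ValueError("sequences must be of equal length")
    return sequences


example = """\
a : GTACGTAGCTAGC
b : GTACGTAGAGAGC
c : GTATGTAGCTGGC
d : GGGCGTAGCTAGC
e : GTACGTAGTTACC
f : GTACATGGCTAGC
g : GTACGTAGCTATT
h : ATACGTAGCTAGC
"""
parsed = parse_sequences(example)
```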

Example FM Trees
[Figure: Example tree from distance matrix]
[Figure: Example tree from DNA sequences]
