README file for MrAIC.pl 1.4.4 -- 2009-05-11

MrAIC.pl 1.4.4 by Johan A. A. Nylander
E-mail: jnylander @ users.sourceforge.net


Important note (2011)

The current version (v.3) of PhyML available from http://www.atgc-montpellier.fr/phyml contains a bug that prevents "I" models from being correctly evaluated in "toggle" mode (the mode MrAIC.pl uses to communicate with PhyML). However, this bug appears to be fixed in the PhyML source available at http://code.google.com/p/phyml/. Please make sure you have a working version installed before running MrAIC.pl.

Alternatively, you can try the parallel version of MrAIC at http://www.abc.se/~nylander/mraic/pmraic.html. It communicates with PhyML in a different way and should be safer to use with older (v.3) versions.

Description

MrAIC.pl is a Perl script for calculating AIC, AICc, BIC, and Akaike weights (for a review, see Burnham and Anderson, 2002) for nucleotide substitution models. Likelihood scores under different models are estimated using PHYML (Guindon and Gascuel, 2003).
Input is DNA data in PHYML format (see below).
If the argument -modeltest is passed, the 56 models tested in Modeltest (Posada and Crandall, 1998) are evaluated in PHYML. By default, MrAIC.pl tests the 24 models that can be specified in MrBayes v3 (Ronquist and Huelsenbeck, 2003): JC, F81, K2P (aka K80), HKY, SYM, and GTR, each on its own or combined with Propinv (I), Gamma (G), or both.
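The size of the default candidate set follows from six base models times four rate-heterogeneity options. The Python sketch below only enumerates the names; the "+I"/"+G" naming style is an assumption and may not match the labels MrAIC.pl prints.

```python
from itertools import product

# Base substitution models that MrBayes v3 can specify (K80 is also called K2P).
BASE_MODELS = ["JC", "F81", "K80", "HKY", "SYM", "GTR"]
# Rate-heterogeneity options: none, invariable sites (I), gamma (G), or both.
RATE_OPTIONS = ["", "+I", "+G", "+I+G"]


def mrbayes_model_set():
    """Return the 24 model names in the default (MrBayes v3) candidate set."""
    return [base + rate for base, rate in product(BASE_MODELS, RATE_OPTIONS)]


models = mrbayes_model_set()
print(len(models))   # 24
print(models[:4])    # ['JC', 'JC+I', 'JC+G', 'JC+I+G']
```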
One difference between Modeltest and MrAIC.pl is that MrAIC.pl does not evaluate all models on the same approximate topology. Instead, PHYML is used to search for the maximum of the likelihood function under each model. This is necessary because AIC, AICc, and BIC are all defined in terms of the maximized likelihood under each model.
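For reference, the criteria use the standard definitions from Burnham and Anderson (2002): AIC = -2 lnL + 2K, AICc = AIC + 2K(K+1)/(n-K-1), and BIC = -2 lnL + K ln(n), where lnL is the maximized log-likelihood, K the number of free parameters, and n the sample size. Akaike weights rescale each model's difference from the best score. The Python sketch below is an illustration, not the MrAIC.pl code, and the log-likelihoods in the example are hypothetical.

```python
import math


def aic(lnl, k):
    """Akaike information criterion: -2 lnL + 2K."""
    return -2.0 * lnl + 2.0 * k


def aicc(lnl, k, n):
    """Small-sample corrected AIC; only defined when n > k + 1."""
    return aic(lnl, k) + (2.0 * k * (k + 1)) / (n - k - 1)


def bic(lnl, k, n):
    """Bayesian information criterion: -2 lnL + K ln(n)."""
    return -2.0 * lnl + k * math.log(n)


def akaike_weights(scores):
    """Turn {model: criterion value} into weights that sum to 1."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Hypothetical maximized log-likelihoods and parameter counts (K).
results = {"HKY+G": (-2010.0, 10), "GTR+G": (-2005.0, 14)}
aic_scores = {m: aic(lnl, k) for m, (lnl, k) in results.items()}
weights = akaike_weights(aic_scores)
print({m: round(w, 4) for m, w in weights.items()})
# {'HKY+G': 0.2689, 'GTR+G': 0.7311}
```

The weights depend only on score differences, so the same normalization could in principle be applied to AICc or BIC values; whether MrAIC.pl reports such weights is not stated here.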

Requirements

1) Perl (see also ActivePerl) must be installed on your system.
2) PHYML (Guindon and Gascuel, 2003) version 3 must be installed on your system (the binary must be named "phyml" and be in your PATH). Alternatively, you can edit MrAIC.pl to specify the full path to the PHYML binary.
(An older MrAIC.pl version compatible with PHYML version 2.4 is also available for MacOSX, Unix, and Windows.)
Windows users: Make sure your PATH environment variable includes the folder containing an executable named "phyml.exe". Alternatively, rename, copy, or link phyml_w32.exe to phyml.exe and put it in the same folder as the mraic.pl script.
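If you are unsure whether PHYML will be found, the lookup described above can be checked from Python. This is a hypothetical helper, not part of MrAIC.pl: it first searches the PATH for "phyml" (on Windows, shutil.which also tries extensions such as ".exe"), then looks for the binary in a given script folder. The function name and the search_path parameter are illustrative assumptions.

```python
import os
import shutil


def find_phyml(script_dir=None, search_path=None):
    """Return a path to a PHYML binary, or None if none is found.

    Hypothetical lookup order mirroring the requirements above:
    1. an executable named "phyml" on search_path (defaults to PATH);
    2. a file named "phyml.exe" or "phyml" in script_dir.
    """
    found = shutil.which("phyml", path=search_path)
    if found:
        return found
    if script_dir:
        for name in ("phyml.exe", "phyml"):
            candidate = os.path.join(script_dir, name)
            if os.path.isfile(candidate):
                return candidate
    return None


# Check the current PATH and the current working directory.
print(find_phyml(script_dir=os.getcwd()))
```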

Usage

Run MrAIC.pl interactively, or pass arguments as below:

mraic.pl infile
mraic.pl -modeltest infile

where infile contains DNA data in PHYML format, which is similar to the PHYLIP format (see the example files dat and dati).
Example in "sequential" format (note the space between taxon name and sequence):

3 8
Taxon_1 ACGTACGT
Taxon_2 ACGTACGT
Taxon_3 ACGTACGT
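If your alignment is held in another tool, a small writer can produce input in this layout. The Python sketch below is a hypothetical helper: the header line holds the number of taxa and the number of characters, and each following line holds a taxon name, a space, and its sequence. It assumes names contain no spaces (as in the example) and that all sequences are already aligned to the same length.

```python
def to_phyml_sequential(seqs):
    """Format {taxon: sequence} as sequential PHYML input text."""
    if not seqs:
        raise ValueError("no sequences given")
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    for name in seqs:
        if " " in name:
            # A space would be read as the name/sequence separator.
            raise ValueError(f"taxon name contains a space: {name!r}")
    nchar = lengths.pop()
    lines = [f"{len(seqs)} {nchar}"]                      # ntax nchar
    lines += [f"{name} {seq}" for name, seq in seqs.items()]
    return "\n".join(lines) + "\n"


example = {"Taxon_1": "ACGTACGT", "Taxon_2": "ACGTACGT", "Taxon_3": "ACGTACGT"}
print(to_phyml_sequential(example), end="")
```

Running it reproduces the four-line example above.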

Notes

In this script, the sample size (n) used in AICc and BIC is assumed to be the number of characters in the data matrix. This is probably not correct for phylogenetic analyses (Nylander, 2004), but serves as an approximation to the true n.
Branch lengths are included in the total number of parameters for each model.
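The parameter count K for the default models can be sketched as substitution-model parameters plus branch lengths. The Python function below is illustrative, not the MrAIC.pl code. It assumes an unrooted, fully resolved tree with 2*ntax - 3 branch lengths, and it counts free exchangeabilities plus free base frequencies (3, since the four frequencies sum to 1), one parameter for I and one for the gamma shape G. With these counts GTR+I+G gets 10, which agrees with the maximum stated in the next note.

```python
# Free substitution parameters (exchangeabilities + free base frequencies).
BASE_PARAMS = {"JC": 0, "F81": 3, "K80": 1, "K2P": 1, "HKY": 4, "SYM": 5, "GTR": 8}


def n_parameters(model, ntax):
    """Total free parameters K for a name like "GTR+I+G" on ntax taxa.

    Assumes an unrooted, fully resolved tree (2*ntax - 3 branch lengths).
    """
    base, *extras = model.split("+")
    if base not in BASE_PARAMS:
        raise ValueError(f"unknown base model: {base}")
    if any(x not in ("I", "G") for x in extras):
        raise ValueError(f"unknown model extension in {model!r}")
    k = BASE_PARAMS[base]
    k += extras.count("I")  # proportion of invariable sites
    k += extras.count("G")  # gamma shape parameter
    return k + (2 * ntax - 3)


print(n_parameters("GTR+I+G", 3))  # 13 (10 + 3 branch lengths)
print(n_parameters("JC", 4))       # 5  (0 + 5 branch lengths)
```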
AICc values are only printed if the number of parameters is high compared to the number of characters (Nchar / maximum number of parameters < 40). The maximum number of free parameters is 10 (for the GTR+I+G model) plus the number of branch lengths. The limit of 40 is a heuristic suggested by Burnham and Anderson (2002).
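The printing rule can be written as a one-line test. In the Python sketch below, the function name is hypothetical, and the branch-length count 2*ntax - 3 assumes an unrooted, fully resolved tree. Because the comparison is strict, a ratio of exactly 40 does not trigger AICc output.

```python
def aicc_is_reported(nchar, ntax):
    """Return True when Nchar / max K < 40 (the rule described above).

    max K = 10 (GTR+I+G) + branch lengths; 2*ntax - 3 branch lengths
    is an assumption (unrooted, fully resolved tree).
    """
    max_k = 10 + (2 * ntax - 3)
    return nchar / max_k < 40


print(aicc_is_reported(8, 3))    # True  (8 / 13 is about 0.6)
print(aicc_is_reported(600, 4))  # False (600 / 15 = 40)
```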
Furthermore, all calculations of AIC, AICc, BIC, and Akaike weights depend on PHYML's ability to find the maximum of the likelihood under each model. More thorough searches may be necessary to obtain a more accurate estimate of the ML for some data sets!

Acknowledgements

Thanks to Torsten Eriksson for advice on slick Perl programming.

Suggested reference for MrAIC.pl

Nylander, J. A. A. 2004. MrAIC.pl. Program distributed by the author. Evolutionary Biology Centre, Uppsala University.

Other References

Burnham, K. P., and D. R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Second edition. Springer, New York.
Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology 52:696-704.
Nylander, J. A. A. 2004. Bayesian phylogenetics and the evolution of gall wasps. Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 937. Uppsala University.
Posada, D., and K. A. Crandall. 1998. MODELTEST: Testing the model of DNA substitution. Bioinformatics 14:817-818.
Ronquist, F., and J. P. Huelsenbeck. 2003. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572-1574.