# Recent Progress on Multi-State Perfect Phylogeny (Compatibility) Problems

Phylogenetics

The Multi-State Perfect Phylogeny Problem is an extension of the Binary Perfect Phylogeny (or Compatibility) Problem, allowing evolutionary characters to have more than two states. The problem is of interest due to prior elegant mathematical and algorithmic results, and due to integer data that is increasingly available.

In 1975, Buneman showed a how to view the Multi-State problem as one of triangulating non-chordal graphs, but the result has not been widely exploited as a computational tool. In this talk, I discuss our recent work on Multi-State Perfect Phylogeny problems, mostly by exploiting Buneman’s result. I will talk about three sets of results.

The first set of results concerns problems that extend the multi-state problem: The Missing Data (MD) Problem, where some entries in the input are missing and the question is whether (bounded) values can be imputed so that the full data has a multi-state perfect phylogeny; The Character-Removal (CR) Problem, minimizing the number of characters to remove so that the resulting data has a multi-state perfect phylogeny. Using results on triangulation of non-chordal graphs, we establish new necessary and sufficient conditions for the existence of a perfect phylogeny with (or without) missing data. This leads to solutions of the MD and CR problems using integer linear programming.

The second set of results concerns generalization of the famous ``four-gametes test” that characterizes the compatibility of binary data. Such a generalization was sought since the 1970s. Our new generalization establishes that 3-state data has a perfect phylogeny if and only if every subset of three characters has a 3-state perfect phylogeny. We also characterize the patterns in the data that prohibit a 3-state perfect phylogeny. We briefly discuss efforts to generalize this to more states.

The third set of results concerns reductions of multi-state data to binary data, to the 2-SAT problem, and to sparser problem instances.

This talk is part of the Isaac Newton Institute Seminar Series series.