Experiments with the site frequency spectrum
University of Canterbury, Christchurch, New Zealand.
Evaluating the likelihood function of parameters in highly-structured population genetic models from extant deoxyribonucleic acid (DNA) sequences is computationally prohibitive.
In such cases, one may approximately infer the parameters from summary statistics of the data, such as the site frequency spectrum (SFS) or its linear combinations. Such methods are known as approximate likelihood or approximate Bayesian computations. Using a controlled lumped Markov chain and computational commutative algebraic methods, we compute the exact likelihood of the SFS, and of many classical linear combinations of it, at a non-recombining locus that is neutrally evolving under the infinitely-many-sites mutation model. Using a partially ordered graph of coalescent experiments around the SFS, we provide a decision-theoretic framework for approximate sufficiency. We also extend a family of classical SFS-based hypothesis tests of standard neutrality at a non-recombining locus to a more powerful version that conditions on the topological information provided by the SFS.
Keywords: controlled lumped Markov chain, unlabelled coalescent, random integer partition sequences, partially ordered experiments, population genomic inference, population genetic Markov bases, approximate Bayesian computation done exactly
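To make the central summary statistic concrete, the following minimal Python sketch computes the unfolded SFS of a small sample and three classical linear combinations of it: the number of segregating sites, the mean number of pairwise differences, and Watterson's estimator. It is an illustration only and not the exact-likelihood method of this paper. The function names, the 0/1 encoding (0 ancestral, 1 derived, ancestral state assumed known) and the toy sample are assumptions made for the example.

```python
from math import comb


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i-1 counts the sites whose derived allele
    is carried by exactly i of the n sampled sequences (0 < i < n)."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for column in zip(*haplotypes):  # iterate over sites
        derived = sum(column)
        if 0 < derived < n:  # ignore non-segregating sites
            sfs[derived - 1] += 1
    return sfs


def segregating_sites(sfs):
    """Number of segregating sites: the sum of the SFS entries."""
    return sum(sfs)


def mean_pairwise_differences(sfs):
    """Mean pairwise differences: sum_i i(n-i) xi_i / C(n, 2)."""
    n = len(sfs) + 1
    weighted = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1))
    return weighted / comb(n, 2)


def watterson_theta(sfs):
    """Watterson's estimator: segregating sites over the (n-1)-th harmonic number."""
    n = len(sfs) + 1
    harmonic = sum(1 / i for i in range(1, n))
    return segregating_sites(sfs) / harmonic


# Hypothetical sample: 4 sequences (rows) at 4 segregating sites (columns),
# chosen to be compatible with the infinitely-many-sites model.
sample = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]

sfs = site_frequency_spectrum(sample)
print(sfs)                             # [2, 2, 0]
print(segregating_sites(sfs))          # 4
print(mean_pairwise_differences(sfs))  # 14/6, about 2.33
print(watterson_theta(sfs))            # 4 / (1 + 1/2 + 1/3), about 2.18
```

Each statistic is a fixed linear function of the SFS vector, which is why results stated for the SFS carry over to such combinations.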