We develop statistical and computational methods to answer biological questions from big data, ranging from genomic sequence data to images. Our work is motivated both by the wealth of biological data currently being generated and by the observation that dedicated models are often required to answer long-standing questions in biology.
We work closely with collaborators from many areas of biology and interact equally closely with research groups in statistics and computer science to push our approaches beyond the state of the art in the field.
The ability to obtain DNA from fossils presents a unique opportunity to address long-standing questions about the origin of peoples and their migration patterns. Ancient DNA, however, is highly fragmented and affected by post-mortem damage. We develop new tools to address these challenges and use them to characterize major (pre-)historical events such as the spread of farming or the initial colonization of the Americas.
Biodiversity is vanishing globally at frightening speed, and preserving what is left is of utmost importance to ensure the long-term stability of ecosystems and ultimately our own survival. To be successful, however, decisions on conservation actions need to be well informed. To aid in that, we develop and apply tools to characterize populations of threatened species such as great apes, and to compile biodiversity data for the last pristine ecosystems on our planet. A major focus is the Chinko Nature Reserve in Central Africa, one of the last remaining wilderness areas and a particularly exciting one.
The main evolutionary forces (mutation, genetic drift, migration and selection) have all affected the genetic diversity observed today. We aim to trace the interplay and relative importance of these forces in the history of a population or species. Since multiple evolutionary forces can leave very similar signatures, we continue to develop dedicated statistical tools, with which we elucidate the evolutionary history of a diverse set of organisms, from chimpanzees to date palms.
Population genetics can contribute to our understanding of human health by characterizing the genetic makeup of humans and the evolutionary forces that shaped it, and by studying the evolutionary forces acting on human pathogens. We continue to develop and apply new statistical tools in all these areas.
While next-generation sequencing has massively lowered the cost of sequencing, it is prone to relatively high error rates that are usually overcome by sequencing at higher depth. For a fixed budget, this results in a trade-off between technical (sequencing) and biological (sample) replicates. To shift that trade-off towards including more biological samples, we develop inference methods that integrate over genotype uncertainty, which results in higher statistical power.
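To illustrate what integrating over genotype uncertainty can look like, here is a minimal sketch in Python. It is not the group's method: it is a textbook-style EM estimate of an allele frequency from raw read counts, and everything in it is an assumption made for the example, including the function names, the diploid biallelic site, the symmetric per-read error rate `error`, and Hardy-Weinberg proportions as the genotype prior. Instead of calling one genotype per individual, each individual contributes its expected number of alternative alleles, weighted by the posterior probability of each genotype.

```python
def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Return P(reads | g) for g = 0, 1, 2 copies of the alternative allele.

    The binomial coefficient is omitted because it is the same for all g.
    Assumes error > 0 so that every likelihood stays positive.
    """
    liks = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alternative allele.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        liks.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return liks


def estimate_allele_frequency(read_counts, error=0.01, n_iter=100, tol=1e-8):
    """EM estimate of the alternative allele frequency.

    read_counts is a list of (n_ref, n_alt) pairs, one per individual.
    Genotype priors follow Hardy-Weinberg proportions (an assumption).
    """
    liks = [genotype_likelihoods(r, a, error) for r, a in read_counts]
    f = 0.5  # starting value
    for _ in range(n_iter):
        expected_alt = 0.0
        for l0, l1, l2 in liks:
            # E-step: unnormalized posterior weight of each genotype.
            w0 = l0 * (1 - f) ** 2
            w1 = l1 * 2 * f * (1 - f)
            w2 = l2 * f ** 2
            expected_alt += (w1 + 2 * w2) / (w0 + w1 + w2)
        # M-step: expected alternative alleles over total allele copies.
        f_new = expected_alt / (2 * len(liks))
        converged = abs(f_new - f) < tol
        f = f_new
        if converged:
            break
    return f
```

A call such as `estimate_allele_frequency([(1, 0), (0, 2), (1, 1), (2, 0)])` uses individuals covered by only one or two reads each. No genotype is ever fixed, so low-depth samples still contribute information in proportion to what their reads support.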
Approximate Bayesian Computation (ABC) is a flexible approach for parameter inference when analytical likelihood functions are difficult or impossible to obtain. The main idea is to simplify the problem by replacing the data with summary statistics and to approximate the likelihood function using a large number of simulations. We continue to develop such likelihood-free inference algorithms, with a focus on extending their range to high-dimensional problems.
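The idea described above can be sketched with the simplest ABC variant, a rejection sampler. This is a toy illustration rather than one of the group's algorithms. The model (normal data with unknown mean and unit variance), the uniform prior, the sample mean as the single summary statistic, and all function and parameter names are assumptions chosen for the example.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Toy simulator: n draws from a normal distribution with mean mu, sd 1."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def abc_rejection(observed, n_sims=20000, tolerance=0.1,
                  prior=(-5.0, 5.0), seed=1):
    """Return parameter values accepted by ABC rejection sampling.

    1. Draw a parameter from the (uniform) prior.
    2. Simulate a data set of the same size as the observed one.
    3. Keep the parameter if the simulated summary statistic lies
       within `tolerance` of the observed summary statistic.
    The accepted values approximate the posterior distribution.
    """
    rng = random.Random(seed)
    s_obs = statistics.fmean(observed)  # summary statistic of the data
    accepted = []
    for _ in range(n_sims):
        mu = rng.uniform(*prior)
        s_sim = statistics.fmean(simulate(mu, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(mu)
    return accepted


if __name__ == "__main__":
    data_rng = random.Random(42)
    observed = simulate(2.0, 50, data_rng)  # pseudo-observed data
    posterior_sample = abc_rejection(observed)
    print(len(posterior_sample), statistics.fmean(posterior_sample))
```

No likelihood is evaluated anywhere: only the ability to simulate from the model is needed. The cost is that the acceptance rate drops quickly as more summary statistics are compared, which is one reason high-dimensional problems are hard for this basic scheme.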
Thanks to our expertise in statistical inference, we are able to contribute to a wide range of biological questions, and we are very open to collaboration with experimentalists from all areas of biology. Recent examples of such fruitful collaborations include the quest for RNA-seq normalization genes, model-based inference of molecular parameters governing circadian rhythms, and the inference of episodes of rapid morphological evolution on phylogenetic trees.