Advanced Methods for Evolutionary Sequence Analysis

Project: PCIG10-GA-2011-303614 (CIG)

Sequence alignment is widely used in molecular biology. Despite its age, the problem is still not fully resolved: no single method suits all tasks, new approaches are needed as the questions evolve, and even traditional methods can still be improved. Although many alignment tasks are related, some rest on incompatible principles, and the need for distinct tools is not always understood. Evolutionary sequence alignment, the focus of this application, is a prerequisite for all comparative analyses and is needed, for example, in agricultural and medical research.

This FP7-funded project aims to develop new methods for evolutionary and comparative sequence analysis, using a novel approach to modelling data and incorporating evolutionary information. These methods address, from several angles, two current trends in sequence analysis: the increasing size of datasets and the specific properties of data produced on next-generation sequencing (NGS) platforms. First, we will develop novel ways to extend existing alignments and compute very large alignments; second, by modelling the properties of NGS data, we will expand the possible applications of NGS methods in evolutionary analysis. The new methods are meant to replace my earlier approaches for phylogenetic sequence alignment and become internationally recognised tools in evolutionary inference.

The proposed work consists of four sub-projects, each describing a specific use case: (a) evolutionary sequence alignment and inference of evolutionary change; (b) modelling of sequence features and their inference from data using comparative methods; (c) analysis of high-throughput sequencing data, with a special emphasis on non-model organisms and the use of evolutionary modelling to exploit phylogenetic information; and (d) large-scale analyses using computational speed-ups.