The group focuses on computational analysis of biological (mainly sequence) data and on the development of methods for such analyses. Current topics include phylogenetic sequence alignment and homology inference for evolutionary analyses, and the development of methods for comparative analysis of high-throughput sequencing data. We actively develop two software packages, PAGAN and PRANK; Wasabi, a web tool for evolutionary analysis and visualisation; and analysis tools related to these.
Active research projects
Template switch mutations
We extended the template switch mutation process proposed for bacteria and implemented it in a computational tool. Applying this method to genomic alignments of human and chimp, we identified large numbers of mutation patterns consistent with the template switch process, some of them of a previously unknown type. These mutations are polymorphic in human populations and the mutation clusters segregate together. More information in the publication.
We now study this mechanism in a project funded by the Academy of Finland. Our first study shows that the mechanism can explain the origin and evolution of RNA secondary structures. More information in the publication. In another study, we applied the original algorithm to high-throughput data and showed that each human carries thousands of template switch loci. Strikingly, we observed that commonly used analysis tools may hide the signals of these mutations, so that they may go unnoticed, e.g. in medical genetic studies. More information in the preprint.
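As a rough illustration of the kind of pattern involved: in a template switch event, replication briefly jumps to a nearby template (often on the opposite strand), copies a short stretch and then returns, so that a cluster of changes appears which matches the reverse complement of nearby sequence. The Python sketch below is a toy model under that assumption, not the group's tool. The function names, the coordinates and the example sequence are all hypothetical, and the search uses exact string matching only.

```python
# Toy model of a template switch mutation (illustrative only, not the
# published algorithm). All names and sequences here are made up.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def apply_template_switch(ref: str, start: int, end: int,
                          src_start: int, src_end: int) -> str:
    """Replace ref[start:end] with the reverse complement of
    ref[src_start:src_end], mimicking a short copy from the other strand."""
    copied = reverse_complement(ref[src_start:src_end])
    return ref[:start] + copied + ref[end:]


def find_switch_sources(ref: str, fragment: str,
                        window_start: int, window_end: int) -> list:
    """Return all positions within ref[window_start:window_end] where the
    reverse complement of `fragment` occurs exactly."""
    target = reverse_complement(fragment)
    region = ref[window_start:window_end]
    hits = []
    pos = region.find(target)
    while pos != -1:
        hits.append(window_start + pos)
        pos = region.find(target, pos + 1)
    return hits


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTTACCGGATAT"
    # Replace positions 10..13 with a reverse-complement copy of positions 2..5.
    mutated = apply_template_switch(ref, 10, 14, 2, 6)
    print(mutated)
    # Ask where the new bases could have been copied from.
    print(find_switch_sources(ref, mutated[10:14], 0, len(ref)))
```

A realistic search would work on alignments rather than raw strings and would have to tolerate mismatches and compare the explanation against ordinary substitutions and indels; none of that is attempted here.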
Advanced web application for evolutionary sequence analysis
Phylogeny-aware sequence alignment and extension of existing sequence alignments
Accurate alignment of large numbers of sequences is demanding, and the computational burden is further increased by downstream analyses that depend on these alignments. We have developed a phylogeny-aware approach that adds new sequences to existing alignments without full re-computation while maintaining the relative matching of the existing sequences. The same ideas can be used to extend reference alignments with fragmented sequences that contain relatively little information, e.g. in next-generation metagenomics analyses. The functionalities have been implemented in the PAGAN software and in the Glutton and Séance analysis pipelines built around it.
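The idea of extending an alignment without disturbing it can be sketched in a few lines of Python. The example below is a deliberate simplification and not PAGAN's algorithm: it ignores the phylogeny entirely, aligns the new sequence against a majority consensus of the existing alignment with a basic global alignment, and uses arbitrary scoring values. The function names are hypothetical. What it does show is the constraint described above: existing rows only ever gain all-gap columns, so their relative matching is preserved.

```python
# Simplified sketch: add one sequence to an existing alignment without
# changing how the existing sequences are matched to each other.
# Assumes no column of the input alignment consists only of gaps.


def consensus(alignment):
    """Most common non-gap character in each column (ties broken alphabetically)."""
    cons = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        cons.append(max(sorted(set(residues)), key=residues.count))
    return "".join(cons)


def add_sequence(alignment, new_seq, match=1, mismatch=-1, gap=-2):
    """Return the alignment rows followed by the aligned new sequence."""
    cons = consensus(alignment)
    n, m = len(cons), len(new_seq)

    def sub(i, j):
        return match if cons[i - 1] == new_seq[j - 1] else mismatch

    # Fill the global alignment score matrix (consensus vs. new sequence).
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back, building the output columns in reverse order.
    rows = [[] for _ in alignment]
    new_row = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            # Existing column matched to a character of the new sequence.
            for row, seq in zip(rows, alignment):
                row.append(seq[i - 1])
            new_row.append(new_seq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            # Existing column with a gap in the new sequence.
            for row, seq in zip(rows, alignment):
                row.append(seq[i - 1])
            new_row.append("-")
            i -= 1
        else:
            # Insertion in the new sequence: existing rows get a gap column.
            for row in rows:
                row.append("-")
            new_row.append(new_seq[j - 1])
            j -= 1
    aligned = ["".join(reversed(row)) for row in rows]
    aligned.append("".join(reversed(new_row)))
    return aligned


if __name__ == "__main__":
    for row in add_sequence(["ACGT", "AGGT"], "ACGTAA"):
        print(row)
```

Removing the newly introduced all-gap columns from the first rows of the result gives back the input alignment unchanged, which is the property the paragraph above refers to.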
Evolutionary genomics of Saimaa ringed seal
The Saimaa Ringed Seal Genome Project is producing a high-quality reference genome as well as re-sequenced genomes of tens of individuals. We are involved in the analysis of the re-sequencing data, with the aim of understanding the evolutionary history and population structure of the species as well as the genomic effects of its isolation and recent population bottleneck. More information at the Genome project page.
Evolutionary genomics of nine-spined stickleback
We collaborate with the Ecological Genetics Research Unit in genomic analysis of nine-spined sticklebacks. We are interested in understanding the evolution and the history of different pond and marine populations, especially the genomic effects of small founder populations and drift within ponds. More information at the EGRU’s home page.
Reference-based scaffolding of RNA-seq data
Transcriptomics data produced with high-throughput sequencing methods target almost exclusively the gene regions of genomes and can potentially provide an inexpensive approach for evolutionary and comparative analyses. The challenge is to assemble the reads into longer contigs and to identify the correct homologous sequences. We develop Glutton, a targeted approach that uses information from distantly related species to accurately reconstruct transcriptomes for non-model organisms lacking a reference genome. We apply this to real data in collaboration with the Metapopulation Research Group and the Evo-Devo group.
Phylogeny-aware alignment for phylogenetic analyses
The phylogeny-aware alignment method PRANK has been found to be an excellent sequence aligner for comparative evolutionary analyses. It should be used with caution for phylogenetic analyses, however, as the underlying algorithm uses the phylogenetic information from a guide tree during the alignment procedure and the resulting multiple alignment will reflect this phylogenetic structure. We are studying approaches to get around this issue and to reduce the potential bias in phylogenetic analyses. More at the Canopy homepage.
Metagenomic analysis of noisy sequence data
High-throughput sequencing platforms produce data with characteristic errors. We have developed analysis software that can correct for these errors and allows noisy pyrosequencing data to be used for evolutionary analyses. In collaboration with the Evo-Devo group, we apply this method to 18S metagenomic data. More at the Séance homepage.