529053 Evolutionary Genomics

The course took place in spring 2016.

This course will teach efficient use of the central tools in the computational analysis of high-throughput resequencing data, together with the theoretical background of the key analyses. The course comprises lectures and computer exercises. Computer exercises will be performed on participants' own laptops using CSC servers and a tailored Linux virtual machine. The course requires familiarity with working on the Linux command line.

Lectures give an introduction to the theoretical (population-genetic) background of the analysis methods:

  • coalescent theory
  • effective population size
  • drift vs. selection
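The interplay of drift, selection and population size listed above can be explored with a small simulation. The sketch below is an illustration only, not course material: it assumes a haploid Wright-Fisher model in which the population consists of `n_copies` gene copies, the focal allele has relative fitness 1 + s, and each generation is formed by binomial sampling. The function names `wright_fisher` and `fixation_rate` are hypothetical.

```python
import random

def wright_fisher(p0, n_copies, s=0.0, generations=100, seed=None):
    """Allele-frequency trajectory under a haploid Wright-Fisher model (illustrative sketch).

    p0        starting frequency of the focal allele
    n_copies  number of gene copies per generation (assumed constant population size)
    s         selection coefficient of the focal allele (0.0 = neutral drift only)
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        if p in (0.0, 1.0):
            # Absorbed: the allele is lost or fixed and can no longer change
            trajectory.append(p)
            continue
        # Selection: deterministic shift towards the fitter allele
        mean_fitness = p * (1 + s) + (1 - p)
        p_selected = p * (1 + s) / mean_fitness
        # Drift: binomial sampling of n_copies gene copies for the next generation
        count = sum(1 for _ in range(n_copies) if rng.random() < p_selected)
        p = count / n_copies
        trajectory.append(p)
    return trajectory

def fixation_rate(p0, n_copies, s, replicates=200, generations=1000, seed=0):
    """Fraction of independent replicate runs in which the focal allele reached frequency 1."""
    fixed = 0
    for r in range(replicates):
        if wright_fisher(p0, n_copies, s, generations, seed=seed + r)[-1] == 1.0:
            fixed += 1
    return fixed / replicates
```

For example, comparing `fixation_rate(0.1, 50, s=0.0)` with `fixation_rate(0.1, 50, s=0.1)` contrasts pure drift with positive selection. For a neutral allele the expected fixation probability equals its starting frequency, so the first value should fluctuate around 0.1. Repeating the comparison with a larger `n_copies` shows selection becoming relatively more effective than drift as the population grows.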

Computer exercises focus on the peculiarities of working with non-model organisms (in this case, the nine-spined stickleback) but also include analyses of 1000 Genomes (human) data:

  • mapping of short-read data to a reference genome
  • variant calling, joint calling, and filtering
  • exploratory analyses (PCA, clustering)
  • ancestral allele inference using outgroup species
  • population structure and history (admixture, Fst, Ne)
  • transfer of annotation (Ensembl, UCSC) from model species
  • data visualisation
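As an example of the population-structure statistics listed above, the sketch below computes Fst from per-SNP allele frequencies in two populations. It uses Hudson's estimator in the form described by Bhatia et al. (2013), combining SNPs as a ratio of averages; this choice is an assumption for illustration, and the tools used in the exercises may implement a different estimator (for example Weir and Cockerham's). The function name `hudson_fst` is hypothetical, and the frequencies must refer to the same allele at each SNP in both populations.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson's Fst combined across SNPs as a ratio of averages (illustrative sketch).

    freqs1, freqs2  frequency of the same allele at each SNP in populations 1 and 2
    n1, n2          number of sampled chromosomes (haplotypes) in each population
    """
    if len(freqs1) != len(freqs2):
        raise ValueError("frequency lists must cover the same SNPs")
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, corrected for finite sample sizes
        numerator += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Probability that two alleles drawn one from each population differ
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator
```

Summing numerators and denominators before dividing, rather than averaging per-SNP ratios, keeps rare variants with tiny denominators from dominating the estimate. Because of the sample-size correction, populations with identical frequencies can yield slightly negative values.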

The bioinformatics software used includes bwa, samtools, bcftools, gatk, plink, last, admixture, stairway-plot and R.

Lecture notes


Theory of short read mapping (by Alan Medlar)

From Fastq to Vcf

Handling Sam and Vcf data, quality control

Exploratory statistics

Genomic alignment, ancestral alleles, lift-over of coordinates

Detecting directional selection (by Tuomas Toivainen)

Visualisation, targeted assembly and Admixture

Ensembl REST API, Variant Annotation and FFD sites

1000 Genomes data and population history

Population genetics