Lectures on Bioinformatics





I am glad to introduce you to 27 lectures on bioinformatics (including descriptions, full of slides and video). Lectures were given on Summer School on Bioinformatics , which took place in July 2014 near St. Petersburg, brought together 100 students from all over Russia and the CIS, as well as lecturers from MSU, MIPT, Skolkovo Tech, Russian Academy of Sciences, SPbAU, STU, St. Petersburg State University, Yale University, Fox Chase Cancer Center, George Washington University, Pennsylvania State and other fine companies.

School traditionally conducted by a team of the Institute of Bioinformatics, and supported her SPbAU Sciences, St. Petersburg State University, JetBrains, RVC, BIOCAD, EMC, Foundation "Dynasty" and the Russian Federal Property Fund, which helped make the event completely free for all participants.

This year the school will be held July 20-25 at Moscow ( information here , deadline soon ), all the lectures will also be recorded.

Organization of materials
If the list below you will find too difficult to find and select the lectures you can also use applets school , playlist on YouTube and page with all the slides .

1. Introduction to bioinformatics (Alla Lapidus, SPbAU Russian Academy of Sciences, St. Petersburg State University)
[Video] [Слайды]

The revolution in nuclear physics led many years ago to the accumulation of vast amounts of data that need to be stored and processed. It turned out only by computers, and behind them, and super-computers.

Boom Genomics last 10-15 years has continued this tradition and multiplied her biomedical research relating to each of us and, therefore, the data will be more and more especially in the light of the idea of ​​personalized medicine and Big Pharma claims. Here it without computer knowledge and software products and do nothing at all. But in addition, it is necessary to know well what to study, how to how to analyze the data and how they can be trusted. How to store and handle. Where to use and where to use.

The lecture lit most of the "how." Alla aims to talk about the significance and breadth of applications of bioinformatics.



2. Mutational process and methods of its study (Alexey Kondrashov, MGU)
[Video] [Слайды]

Mutation - the first of the two essential factors of Darwinian evolution. The lecture examines the causes and mechanisms of mutation, methods of measuring the parameters of the mutation process in small, medium and large time, data on the rates of mutation and the simplest models of the effect of mutations in the genetic structure of populations.

3. Natural selection and methods of its study (Alexey Kondrashov, MGU)
[Video] [Слайды]

Natural selection - the second of the two necessary factors of Darwinian evolution. The lecture examines the causes and mechanisms of selection, methods and parameters used to describe and study data on the nature and selection of the simplest models of the impact of screening on the population.



4. Child development and bioinformatics: Challenges and Solutions (Elena Grigorenko, Yale University)
[Video] [Слайды]

The lecture talked about several "joints" of the development of science and bioinformatics.
The problems of prenatal diagnosis and prenatal sequencing and sequencing ekzomnogo newborns.

The book is about the study of the influence of the early environment of the state metiloma, and genomic etiology of childhood disorders. Finally, we consider the ethical issues associated with the use of genomic information in making diagnostic and individualized decisions about the child's development.

5. The next-generation sequencing: principles, possibilities and prospects (Maria Logacheva, MGU)
[Video] [Slides]

Next-generation sequencing (NGS) has transformed many areas of biological and biomedical research. It allows a relatively quick and inexpensive to obtain a sequence of genes and genomes of previously studied species, and - on a material of a large number of individuals of one species - reveal intraspecific variation, to search for genes associated with traits of interest. In addition to the sequencing of genomes NGS allows a detailed analysis of gene expression in different tissues of the body, or under different conditions, are widely used in epigenetic research.

The lecture provides an overview of the main methods of sequencing, their physical and chemical principles, particularly the sample preparation, characterization of the data, their value and common mistakes. Particular attention is given to the applicability of the different methods to solve biological problems and recommendations for planning of experiments related to the NGS.

6. Structural biology of protein: a review of issues and approaches (Pavel Yakovlev, BIOCAD)
[Video] [Слайды]

Using only the primary sequence can solve most issues related to the nucleic acids (DNA and RNA). In the study of protein functions only knowledge of the primary sequence does not allow to solve most problems. Which proteins interact with each other and how much? Will cause any amino acid substitution change protein function? How to remove the side effects of the drug, or a protein to increase its effectiveness? These questions are designed to answer bioinformatics, which develops algorithms for modeling the spatial form of proteins and their interactions.



7. De novo transcriptome assembly (Artem Kasyanov MIPT)
[Video] [Слайды]

Due to the significant reduction in the cost and productivity improvement technologies the number of projects dealing with de novo genome sequencing nemodelnyh organisms has increased significantly. In some cases, de novo genome sequencing and assembly difficult - for example, if it is of considerable size. In such cases resort to the study of the transcriptome. Also, de novo transcriptome analysis may be required in the case of the species studied with a lot of alternatively spliced ​​genes, because even if the genome is quite difficult to determine the full list of isoforms.

The lecture is devoted to issues of assembly transcriptomic data in the absence of the genome. Considers topics such as splice graphs and program trinity newbler, comparison and analysis of assemblies, assembly transcriptome polyploid organisms.

8. The evolution of the genome assembly algorithms (Anton Bankevich, SPbAU RAS)
[Video] [Слайды]

Currently, there are several generations of DNA sequencing methods. However, new technology is meaningless without algorithms capable to process the results. Constantly emerging new sequencing methods give all new algorithmic problems. One of the most important of these challenges is to build the genome. The lecture talked about the evolution of sequencing methods and algorithmic approaches to the assembly of the genome that have arisen and continue to arise at every step of this evolution.



9. Introduction to Molecular Biology and Genetics (Paul Dobrynin, St. Petersburg State University)
[Video] [Слайды]

The lecture deals with the structure and organization of DNA in prokaryotes and eukaryotes, the molecular mechanisms responsible for the maintenance and reproduction of the genetic material. Collated basic mechanisms behind genetic variability, and embodiments of the genetic material.

10. The problem of multiple local alignment and the construction of synteny blocks (Ilya Minkin, Pennsylvania State University)
[Video] [Слайды]

The lecture examines two similar algorithmic problems in comparative genomics: multiple local alignment and the construction of synteny blocks. These algorithms play a crucial role in the comparison of complete genome sequences. We talked about setting goals and basic ideas on which were built a few modern algorithms.



11. Why and how to do a presentation (Andrey Afanasiev, iBinom)
[Video] [Слайды]

The lecture discusses the types of presentations, what they really need, and tells you how to make so that all students understand and do not fall asleep, and what mistakes to avoid, and to take one example, in preparing his speech.

12. Business in bioinformatics (Andrey Afanasiev, iBinom)
[Video] [Слайды]

The lecture described what bioinformatics companies exist in Russia and in the world, who created them and what they earn money.
They discussed plans for the major players and trends in the industry.

In the final part of the lecture Andrew gives food for thought about the organization of their own startup or choosing a new job.

13. Opportunities and Challenges for Systems Biology (Ilya Serebriysky, Fox Chase Cancer Center)
[Video] [Слайды]

The lecture aims to give an overview of the system properties of biological objects. Ilya Serebriysky tells about the basic components of systems biology, an interaktomike and construction of models, the main problems in systems biology and trying to resolve them. We discuss some of the achievements of systems biology (mainly from the field of oncology). It also discusses public resources for Systems Biology (TCGA / cBioPortal, CCLE).



14. Laboratory for Systems Biology (Ilya Serebriysky, Fox Chase Cancer Center)
[Video] [Слайды]

The session focuses on building networks of cooperation on the basis of publicly available databases. To use such databases and web services as Entrez, GeneMANIA, BioGRID and others. Various imaging techniques networking, in particular through Cytoscape.

15. metagenomics (Alla Lapidus, SPbAU RAS)
[Video] [Слайды]

Germs are everywhere, microbes rule the world, but not with all of them we can meet in the laboratory. Most of them we do not know how to grow, which means that they must be somehow removed from their natural habitat - the land, the water from the roots of trees, etc., where they live in large groups.

Metagenomics and helps in these very complicated research. And it helps to feed, warm, heal people and catch criminals. All this in metagenomics and bioinformatics and was dedicated to this lecture.



16. The problem many statistical testing of hypotheses (Anton Korobeynikov, St. Petersburg State University, SPbAU RAS)
[Video] [Слайды]

The lecture is considered the classic problem of testing multiple hypotheses simultaneously. Such problems arise very often, such as genome-wide search for associations or analysis of microarray data. Possible solutions to this problem, ranging from the classical approach Bonferroni and finishing techniques, allows to control the FDR (false discovery rate).

17. How to use statistics and wrong (Nikita Alexeev, St. Petersburg State University, George Washington University)
[Video] [Slides]

The lecture deals with errors in the use of statistics and methods of prevention. In particular, given the answer to the question: in what situations you can use the standard criteria for comparing the typical representatives of the sample, and what to do if the standard criteria do not fit?



18. Mathematical models of regulation of gene expression (Maria Samsonov STU)
[Video] [Слайды]

Understanding the subtle mechanisms of regulation of gene activity - a necessary condition to decipher the mechanisms of human disease. Unfortunately, to date there is no such understanding that we can not satisfactorily explain how any group of transcription factors interact with chromatin proteins, adapter proteins and other complex RNA polymerase, or how and why this or that portion of the DNA sequence can control complex, limited in space and time deterministic pattern of gene expression.

Mathematical modeling helps to understand the mechanisms of gene regulation by mechanistic and quantitative description of this process. The lecture discussed the two most common approaches to modeling of gene expression - based nonlinear reaction-diffusion equations and thermodynamic equilibrium. Consistently the stages of the construction of such models and examples of their use for the generation of new knowledge.



19. Semi-local and local sequence alignment (Alexander Tiskin, University of Warwick)
[Video] [Слайды]

Calculating the longest common subsequence (longest common subsequence, LCS) of two strings - one of the classic algorithmic problems, which has wide application in computer science, as well as in computational biology, where it is known as "global alignment of sequences". In many applications it is necessary to generalize this problem, which we call the calculation semilocal LCS (semi-local LCS), or "semilocal alignment." In this case, the need to compute the line between the LCS and all substrings of the other strings, and / or between one line of all prefixes and suffixes all different. In addition to the important role of the generalized problem in string algorithms, she found an unexpected connection with the semigroup algebra and computational geometry, with networks of comparisons (comparison networks), as well as practical applications in computational biology. In addition, the problem of calculating semilocal LCS can be used as a flexible and efficient approach to (fully) local alignment of biological sequences.

The lecture presents an effective solution to the problem of calculating semilocal LCS and an overview of the main results and related applications. These include dynamic support LCS; rapid calculation cliques in some special graphs; a quick comparison of the compressed rows; parallel computing on strings.



20. Analysis of the families of molecular sequences (Sergey Nurk, SPbAU RAS)
[Video] [Слайды]

When solving a variety of problems, from finding regulatory motifs to predict the functions of proteins, bioinformatics have to work with the whole "family" evolutionarily related nucleotide or amino acid sequences. The lecture discussed the different ways of presenting such families used in popular bioinformatics tools and databases. It told how to decipher and interpret the PROSITE pattern sequence logo, what is the difference from the profile HMM PSSM, as well as how to avoid mistakes in their construction and analysis of the results.



21. Epigenomics, RNA, and all (Andrey Mironov, IITP)
[Video] [Слайды]

The lecture provides an overview of the concepts of epigenetics. We consider the levels of structural organization of chromatin, talked about various epigenetic modifications: modifications of histones, methylation of CpG-motifs. Discussed their impact on gene expression.
It is also considered the role of epigenetic modifications in splicing, imprinting, etc.

Talked about system XIST (X-inactivation specific transcript), antisense RNA splicing, RNA-dependent regulation.
It is also considered a model for the study of epigenetic modifications.

22. Data quality control NGS (Constantine Okonechnikov, Max Planck Institute for Infection Biology)
[Video] [Слайды]

The lecture describes the sequencing errors that are typical of technologies NGS. Examples of such errors are PCR amplification, sequence-specific reading errors, uneven distribution of GC-ended and others. We break down the different methods for evaluating these errors and accounting for them in the analysis. Raised the question of practices and solutions of existing software tools.



23. Data quality control NGS, seminar (Konstantin Okonechnikov, Max Planck Institute for Infection Biology)
[Video] [Слайды]

During the workshop, participants learned how to use the programming skills for quality control NGS. We examined data formats BAM / SAM, library and pysam pyplot, fundamental concepts. In particular, the analyzed examples of counting GC-composition estimates of the frequency of duplication, distribution, insertion length, calculation covering the windows.

24. Practical RNA sequencing (Konstantin Okonechnikov, Max Planck Institute for Infection Biology)
[Video] [Слайды 1] [Slide 2]

The seminar versed practical task of data analysis RNA sequencing.
The format of the presentation and practice were discussed and demonstrated methods: alignment Reed, the initial quality control Pipeline for studying gene expression and DESeq Cufflinks, finding isoform transcripts, search for hybrid genes.



25.











Thank you for your attention.