Computational Concepts in Biology

Computational Concepts in Biology I

At the University of Vienna, an interesting new Master's programme was introduced only a couple of years ago (in 2013). This Master's programme is called "Computational Science" and it is highly interdisciplinary. To be admitted to this programme, you need a Bachelor's degree in Computer Science, Mathematics, Biology, Physics, Chemistry, Astronomy, Geology or a related field. The Master's programme has a minimum duration of two years, after which you can enrol in a PhD programme.

"Computational Science" is all about research in the natural sciences that is done using computers, and above all self-written computer programs. It is thus an ideal study programme for people who are interested in both computers and the natural sciences. Depending on the type of Bachelor's degree they hold, students have to attend either basic lectures in mathematics, basic lectures in computer science or advanced lectures in these fields. In addition, they have to attend lectures on all of the aforementioned branches of science (physics, chemistry, biology, geology and astronomy), especially lectures on computational approaches to scientific problems. Finally, students have to complete a Master's thesis.

The lecture "Computational Concepts in Biology I" is a new lecture that had not been held at the University of Vienna before this Master's programme was introduced. It is mandatory for all students of the programme and is held in the winter term, two academic hours a week. Since in the winter term 2017/2018 it was held in the late afternoon, I was able to attend it although I am already working in the software industry and am no longer a student. There were a couple of times when I decided to skip the lecture because it was very cold outside and I preferred to stay at home, but most of the time I was there. In this article I am going to tell you what I learned in the lecture. I would also like to mention that there is a follow-up lecture called "Computational Concepts in Biology II", which is held in the summer term; we will see whether I find time to attend it as well.

The lecturers involved in "Computational Concepts in Biology I" are Thomas Rattei, Bojan Zagrovic, Andrea Tanzer, Gerhard Ecker, Arndt von Haeseler, Ivo Hofacker and Christoph Bock. Most of them are employees of the University of Vienna. The exception is Christoph Bock, who works at CeMM, a research institute that belongs to the Austrian Academy of Sciences.

Computational Biology is quite a broad field and it basically consists of two components: Bioinformatics and Computational Systems Biology. The lecture "Computational Concepts in Biology I" is more about the former subfield, while the latter subfield will be dealt with in the lecture "Computational Concepts in Biology II".

Bioinformatics is primarily concerned with nucleic acids (DNA, RNA) and proteins. Nucleic acids are the molecules in which the genetic information of a cell is stored. In cells that have a nucleus, DNA is located mainly in the nucleus, where nucleic acids were first discovered, which is why they are called nucleic acids. Proteins are the products of gene expression: DNA is transcribed into RNA, and messenger RNA is translated into proteins. Inside an organism, two of the main roles of proteins are the following: First, many of them are enzymes that catalyse biochemical reactions. Second, some of them are so-called structural proteins, which means that they contribute to the structure of cells and tissues.

There are huge databases of DNA sequences, and these databases have to be processed by computer programs. That's what bioinformatics is all about. One particular application is sequence alignment, whose purpose is to discover relationships between genetic sequences. For example, two organisms of different species may be related to each other but differ slightly in some of their DNA sequences. Sequence alignment algorithms such as the Needleman-Wunsch algorithm (global alignment) and the Smith-Waterman algorithm (local alignment) make it possible to detect such relationships between DNA sequences. This allows researchers to draw conclusions about the phylogeny, i.e. how these organisms are related to each other. In this context, the tools BLAST and FASTA are also well known; they use heuristics to search large sequence databases much faster than the exact algorithms can.
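To make the dynamic-programming idea behind these algorithms more concrete, here is a minimal Python sketch of Needleman-Wunsch global alignment with traceback, plus the Smith-Waterman local alignment score. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) and the function names are my own assumptions chosen for readability; real tools use substitution matrices (e.g. BLOSUM62 for proteins) and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )
    # Traceback from the bottom-right cell recovers one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment score: cells are floored at 0 and the best cell anywhere wins."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(2, "ACGT", "A-GT")`: three matches and one gap. The only difference in the local variant is the floor at zero, which lets an alignment start and end anywhere, so `smith_waterman_score("TTTACGTTT", "GGACGGG")` scores just the shared `ACG` (3).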

What I found especially interesting was Bojan Zagrovic's part of the lecture. He works on the computational biophysics of proteins. Proteins are large molecules composed of hundreds or even thousands of amino acids. It is difficult to predict the function of a protein in an organism from its amino acid sequence alone unless its three-dimensional structure is known. But determining how a protein folds requires enormous computational power. According to Zagrovic, even with today's computers it is hard work just to correctly simulate the folding behaviour of a protein over a period of a hundred nanoseconds. That is also why distributed computing is often used for this purpose. You can download the program "Folding@home" from Stanford University (it is available for Windows as well as other operating systems), install it on your computer and let it run whenever your computer has nothing else to do. In this way you can actively support research in the computational biophysics of proteins simply by providing computing power.
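To get a feeling for why such simulations are so expensive, consider the arithmetic. The sketch below assumes a time step of 2 femtoseconds, a common choice in molecular dynamics, and a hypothetical system of 50,000 atoms (a protein plus surrounding water). Real simulation packages use cut-offs and particle-mesh methods instead of evaluating every atom pair, so the naive pair count only illustrates the scale of the problem.

```python
FS_PER_NS = 1_000_000  # 1 nanosecond = 10**6 femtoseconds


def md_steps(simulated_ns, timestep_fs=2):
    """Number of integration steps needed to cover simulated_ns of simulated time."""
    return simulated_ns * FS_PER_NS // timestep_fs


def naive_pair_count(n_atoms):
    """Atom pairs a naive all-pairs force calculation would evaluate in every step."""
    return n_atoms * (n_atoms - 1) // 2


steps = md_steps(100)             # 100 ns at an assumed 2 fs time step
pairs = naive_pair_count(50_000)  # hypothetical system size
print(f"{steps:,} steps, {pairs:,} pair interactions per step (naive)")
```

Even under these simplified assumptions, 100 ns means 50 million sequential steps, each requiring a full force evaluation, which is why the work is spread over many machines.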

Andrea Tanzer's part of the lecture was a basic introduction to modern molecular biology and genetics. It was not new to me, not only because of my medical studies but also because I had already learned about these things at grammar school. For some of the other students, however, it was quite a tough part of the lecture.

Gerhard Ecker talked about pharmacoinformatics. It is interesting that artificial intelligence is already being used to identify potential drug candidates.

According to the lecturers, "Computational Concepts in Biology II" will focus on Computational Systems Biology, the science of creating computer models of biological processes and simulating entire organisms and ecosystems on the computer.

I am already curious about what we are going to learn in this lecture in the upcoming summer term, and I am happy and grateful that I was able to attend the lecture in the winter term although I am officially no longer a student.

Computational Concepts in Biology II

In the summer term 2018, I attended most of the sessions of "Computational Concepts in Biology II". Since the lecture was held in the late afternoon, I was able to attend it although I am already working in the software industry and am no longer a student. In this article I am going to tell you what I learned in the lecture.

The lecturers involved in "Computational Concepts in Biology II" were, among others, Thomas Rattei, Jörg Menche, Filipa L. Sousa and Andras Aszodi. Unfortunately, there were three other lecturers whose names I do not remember.

Computational Biology is quite a broad field and it basically consists of two components: Bioinformatics and Computational Systems Biology. The lecture "Computational Concepts in Biology I" was more about the former subfield, while the latter subfield was supposed to be dealt with in the lecture "Computational Concepts in Biology II".

In the first lecture of "Computational Concepts in Biology II", Thomas Rattei talked about exact string matching algorithms, including the Boyer-Moore algorithm, suffix trees and suffix arrays. Computer science students at the Vienna University of Technology usually learn about these in their mandatory algorithms courses, so it was mostly a repetition for me.
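For readers who have not seen these algorithms, here is a compact Python sketch of two of them. It shows the Horspool simplification of Boyer-Moore, which uses only the bad-character rule (the full Boyer-Moore algorithm adds the good-suffix rule), and a naive suffix array searched by binary search. The naive construction sorts whole suffixes, which is fine for illustration but far too slow for genomes; practical tools use linear or near-linear construction algorithms. The function names are my own.

```python
def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: compare right-to-left, shift by the bad-character table."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Shift for each character = distance from its last occurrence
    # (ignoring the final pattern position) to the end of the pattern.
    shift = {pattern[k]: m - 1 - k for k in range(m - 1)}
    hits, i = [], 0
    while i <= n - m:
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(i)
        # Characters absent from the table allow a full shift of m
        i += shift.get(text[i + m - 1], m)
    return hits


def suffix_array(text):
    """Naive construction: sort suffix start positions by the suffixes themselves."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_array_search(text, sa, pattern):
    """Binary-search the block of sorted suffixes that start with pattern."""
    m = len(pattern)

    def prefix(k):
        return text[sa[k]:sa[k] + m]

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose prefix >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose prefix > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For example, `suffix_array("BANANA")` gives `[5, 3, 1, 0, 4, 2]`, and searching it for `"ANA"` returns the start positions `[1, 3]`. The suffix array pays off when the same text is searched for many different patterns, because each query then costs only a binary search.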

The next lecture dealt with microbial ecology. In this field, computers are mainly used to identify species based on DNA sequences and to predict which other species they might be phylogenetically related to. The lecturer presented a couple of algorithms for estimating the probability of a sequencing error and for splitting sequences by minimizing entropy. The purpose of the latter type of algorithm is to split a long sequence into possible partial sequences and thus increase the probability that a homology will be found.
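I do not remember exactly which error models the lecturer presented. A common starting point, though, is the Phred quality score that sequencers store for every base in FASTQ files: a score Q corresponds to an error probability of 10^(-Q/10). The sketch below assumes the Sanger/Illumina 1.8+ encoding (ASCII offset 33) and computes the expected number of errors in a read, a quantity some pipelines use to filter out low-quality reads. The function names are my own.

```python
def phred_to_error_prob(q):
    """Phred quality Q encodes an error probability P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_quality(qual_string, offset=33):
    """Decode a FASTQ quality string (assumed Sanger / Illumina 1.8+ encoding)."""
    return [ord(c) - offset for c in qual_string]


def expected_errors(qual_string):
    """Expected number of sequencing errors in a read: sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in decode_quality(qual_string))


# Q=10, 20 and 30 mean a 10 %, 1 % and 0.1 % chance of a wrong base call
print(decode_quality("+5?"), expected_errors("+5?"))
```

Summing probabilities rather than averaging scores means that a few very poor bases dominate the estimate, which is usually what one wants when deciding whether to keep a read.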

I was unable to attend the third lecture because of work. As far as I recall, the topics of the fourth lecture included diversity (richness, evenness), the Shannon index, Hill numbers (a family of diversity indices), principal component analysis and clustering.
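The diversity measures listed above can all be computed from a vector of species counts. This is a minimal sketch, assuming abundance counts per species (or per operational taxonomic unit) and natural logarithms; some textbooks use log base 2 or 10 for the Shannon index. Pielou's evenness is one of several evenness measures and is my choice here, as are the function names.

```python
import math


def richness(counts):
    """Number of species observed at least once."""
    return sum(1 for c in counts if c > 0)


def proportions(counts):
    """Relative abundances p_i of the observed species."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def shannon_index(counts):
    """H' = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in proportions(counts))


def pielou_evenness(counts):
    """J' = H' / ln(S); 1.0 means all species are equally abundant."""
    s = richness(counts)
    if s < 2:
        raise ValueError("evenness is undefined for fewer than two species")
    return shannon_index(counts) / math.log(s)


def hill_number(counts, q):
    """Effective number of species of order q.

    q = 0 gives richness, q = 1 gives exp(H'), q = 2 gives the inverse Simpson index.
    """
    if q == 1:
        return math.exp(shannon_index(counts))  # limit of the general formula as q -> 1
    return sum(p ** q for p in proportions(counts)) ** (1 / (1 - q))
```

For four equally abundant species, all Hill numbers equal 4. For counts `[90, 10]`, they decrease from 2 (q = 0) to about 1.22 (q = 2), because higher orders give more weight to the dominant species. This is what makes Hill numbers a convenient single framework for both richness and evenness.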

In the next lecture, Jörg Menche from CeMM (the Research Center for Molecular Medicine of the Austrian Academy of Sciences) talked about network medicine. He mainly presented the basics of graph theory, which I already knew from my computer science studies, and mentioned a recent paper by a researcher named Gerstein.
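In network medicine, proteins can be represented as nodes and their interactions as edges. Two of the most basic graph-theoretic quantities in this setting are a node's degree and the shortest-path distance between two nodes. The sketch below computes both with breadth-first search on a small hypothetical network; the node names are placeholders, and the code is not taken from the lecture or from the paper mentioned above.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected graph as a dict mapping each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degrees(adj):
    """Degree = number of interaction partners of each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def bfs_distances(adj, source):
    """Shortest-path length (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit is via a shortest path in an unweighted graph
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical toy interaction network; "G1" to "G5" are placeholders, not real genes.
toy_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G2", "G5")]
adj = build_adjacency(toy_edges)
print(degrees(adj))
print(bfs_distances(adj, "G1"))
```

Here `G2` is the hub with degree 3, and `G4` is three steps away from `G1`. Breadth-first search is sufficient because the edges are unweighted; weighted networks would call for Dijkstra's algorithm instead.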

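The week after, Filipa L. Sousa talked about homology, orthology, paralogy and xenology, i.e. the ways in which genes can be related to each other: homologous genes share a common ancestor, orthologs diverged through speciation, paralogs arose through the duplication of a gene and xenologs through horizontal gene transfer.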

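Then it was Thomas Rattei's turn again. He held an exercise session to practise suffix arrays. Afterwards he talked about genome sequence assembly and graph theory (Hamiltonian paths and cycles, Eulerian paths and cycles, De Bruijn graphs). The connection is that finding a Hamiltonian path through a graph of overlapping reads is NP-hard, whereas an Eulerian path through a De Bruijn graph can be found in linear time.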
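To illustrate the De Bruijn approach to assembly, here is a toy sketch. Every k-mer from the reads becomes an edge from its (k-1)-prefix to its (k-1)-suffix, and the sequence is read off an Eulerian path found with Hierholzer's algorithm. It assumes error-free reads, a single linear sequence and no repeated k-mers (duplicate k-mers are simply collapsed); real assemblers must also deal with sequencing errors, coverage and repeats. The function names are my own.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Each distinct k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    seen = set()
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm for a directed graph; assumes an Eulerian path exists."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, vs in graph.items():
        out_deg[u] += len(vs)
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # A path starts at the node with one more outgoing than incoming edge, if any
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), None)
    if start is None:
        start = next(iter(graph))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: node is final in this sub-tour
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence: first node, then the last character of every following node."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```

With k = 4, `assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4)` reconstructs `ATGGCGTGCA`. With k = 3, the same reads produce a graph in which the nodes `TG` and `GC` each have two outgoing edges, so there are two valid Eulerian paths (`ATGGCGTGCA` and `ATGCGTGGCA`); this is a small example of why repeats make assembly ambiguous and why the choice of k matters.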

In the two following lectures, Andras Aszodi talked about the theory of evolution, artificial life, cellular automata and the artificial-life systems Tierra and Avida. The final lecture was about systems theory, but I had to skip it because I was ill.
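An elementary cellular automaton is perhaps the simplest example of the kind of system studied in artificial-life research. It consists of a row of cells, each 0 or 1, which are all updated simultaneously. Each cell's new value depends on its own state and its two neighbours' states according to a rule number between 0 and 255 (Wolfram's numbering). The sketch below is a generic illustration that assumes wrap-around edges, not a reconstruction of the lecture's material; Tierra and Avida are far richer systems in which self-replicating programs evolve.

```python
def ca_step(cells, rule):
    """One synchronous update of an elementary cellular automaton with wrap-around edges."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighbourhood = (left << 2) | (centre << 1) | right  # value 0..7
        nxt.append((rule >> neighbourhood) & 1)  # look up that bit of the rule number
    return nxt


def run(cells, rule, generations):
    """Return the initial row followed by `generations` updated rows."""
    rows = [cells]
    for _ in range(generations):
        rows.append(ca_step(rows[-1], rule))
    return rows


# Rule 30 grown from a single live cell
for row in run([0] * 7 + [1] + [0] * 7, rule=30, generations=6):
    print("".join("#" if c else "." for c in row))
```

Despite the one-line update rule, rule 30 produces a pattern that looks random, which is a classic illustration of how complex behaviour can emerge from simple local rules.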

All in all, it was a nice lecture. If you are interested in working in this field, I can also recommend the book "Bioinformatics and Functional Genomics" by Jonathan Pevsner, published by Wiley-Blackwell.

Claus D. Volko
