Algorithms and biological data - meet Dmitry Kobak

Following a PhD in bioengineering from Imperial College London and a postdoc at the Champalimaud Centre in Lisbon, Dmitry Kobak set up his own group at the Hertie AI Institute at Tübingen University. In 2026, he moved to the VIB Center for AI & Computational Biology. Time to ask him some questions.

Hi, Dmitry, and welcome to VIB! Can you briefly introduce us to your research?

Hi! Sure – I study machine-learning algorithms for self-supervised learning and apply them to various biological data, often in the context of exploratory data analysis. In recent years I’ve mostly been working with single-cell RNA-sequencing data and multi-modal single-cell data from the brain.

Together with my students, I have also worked on image data and on text data, and these are data modalities that I want to continue working with, now in a more biological context. Finally, I am really looking forward to joining forces with my colleagues at VIB.AI and diving into genomic, epigenomic, and protein-related data.

Unsupervised and self-supervised learning are very exciting for me because exploratory data analysis is really ubiquitous in modern data-hungry biology, and so it is important to do it right and to develop good tools for it. Self-supervised learning is also very interesting from the machine-learning point of view, with many new ideas and developments happening in the field in recent years.

So you come at it more from the machine learning than the biology angle?

I guess I am a bit of an outlier at VIB because I am more interested in algorithms than in biological systems. Most scientists at VIB try to understand how a certain biological system works. I am happy to work together with them, but biological systems are so complex and can be so intimidating! I am often interested in understanding how a certain machine-learning system works. Such systems can also be intimidating, but it is easier to get an answer.

Nowadays people can take a lot of biological data (for example, genomic and epigenomic), train a machine-learning model to make predictions (on genomic regulation, for example), and end up with a system that works, although it is not entirely clear how. So now we have a new question: how does our model work? I want to be able to start answering such questions.


When did you realize you wanted to be a scientist?

Actually, I have always wanted to be a scientist; I can remember thinking about that as a kid. For a very long time I was not sure exactly what kind of scientist I wanted to be, but I knew that I would be some kind of scientist. I am not sure why: my parents are not scientists, so it was not “in the air” when I was a child. Maybe it was the books I was reading. In any case, science, and also academia as an institution, always felt very attractive to me. I think I had a very romantic view of academia, and in some sense I still do! Later, during my PhD and postdoc years, many of my friends, colleagues, and relatives left academia to go into industry or finance, and some became very successful. But I could never imagine doing that.

You’ve done really interesting work on LLM forensics, which showed that LLM-assisted scientific writing contains a lot more ‘style words’. How do you feel about using AI to help write scientific papers?

Good question. If we are talking specifically about the process of writing, then I must say that I am very skeptical about using LLM assistance for that. Of course I am aware of the many advantages that it could bring, such as helping scientists with weak knowledge of English and removing some of the disadvantage that they may experience due to poorly phrased English texts. But at the same time, I believe that the process of writing is actually the process of thinking, and the experience of struggling to write a clear manuscript is one of the essential parts of academic training. Asking an LLM to write things up from a bullet-point list can feel like a mere time saver, but I worry that in the long run it can undermine one’s ability to think clearly.

In principle I have no problem with using LLMs (or any other software) to correct grammar or to translate a text from one language to another. But I would avoid delegating higher-level writing to them. Would you have wanted to read this interview if I had used an LLM to generate the answers?

You also work on making complex datasets easier to understand and explore through data visualization. What are some key principles you use here?

I like to say that the key principle here is “all data visualizations are wrong but some are useful”. This paraphrases the famous aphorism of George Box, who said it about models. The thing is, people like to criticize visualizations of high-dimensional data because they inevitably introduce distortions. And this is true. But at the same time it can be very useful to lay out the data in front of your eyes and to just look at it: our eyes are really excellent at noticing interesting things. Of course one should be aware of possible distortions and one should always follow up and confirm what one has noticed, but this is always the scientist’s job.

Visualization of the CIFAR-100 dataset, a collection of 60,000 images spanning 100 object categories commonly used to evaluate image-recognition systems. Images with similar visual features are grouped together in a two-dimensional map, revealing clusters that correspond to different categories.

We all get stuck sometimes. When you feel stuck in a project, is there anything that helps you get unstuck?

I wish I knew! Do something else. Go for a walk. Go for a run. Meet some friends. Sometimes just put a project away for some time and work on something else. Maybe its time will come later. In science, one never knows whether a project will succeed or not, so one should be ready to try other things.

What do you hope your research will be the foundation for fifty years from now?

Whoa! I was often asked at interviews where I want to be in five years, but this is the first time somebody has asked me where I want to be in fifty years!

I honestly do not know; this timescale exceeds my imagination. Our world is changing so rapidly that it is hard to say where we are all going to be in ten years, let alone fifty... I hope to contribute to the ways machine learning is used in biology in a way that brings us more understanding of biology (and also of machine learning). I hope this perspective will still be relevant in fifty years. Nowadays people are often talking about “prediction”: about using AI to predict this or that in biology. And this is great. But prediction is not everything. We also need understanding.

If there's one book that has changed your view on life, which one is it?

I don’t think I can select one: either too many come to mind or none at all, depending on the threshold for what counts! Above I said something about romantic views of science and academia… maybe one book that could have contributed to that is Hesse’s The Glass Bead Game, which I loved as a teenager. I haven’t read it since then though, so I am not sure how much I would like it now. Maybe I should give it a try.

If you could give one piece of advice to a current graduate student working at the interface of machine learning and life sciences, what would it be?

Try to find a healthy balance between not using AI tools at all and relying on them too much. This is difficult, plus nobody really knows what is healthy here :) So at least be mindful and keep reflecting on what you are doing!

Thanks, Dmitry!

The Dmitry Kobak lab is hiring! Please email Dmitry if you are interested.


Gunnar De Winter

Science Communications Expert, VIB

 
