Proteomics project faster with grids and lightpaths

Lead applicant: Péter Horvatovich
Organisation: University of Groningen

The research

Dr Péter Horvatovich, assistant professor in bio-informatics at the University of Groningen, is developing software to process large quantities of data generated during proteomics research. Horvatovich is an expert in the field of proteomics, the study of the proteome. He explains: “The proteome is the collective term for all the proteins in an organism that are produced on the basis of the genome, the genetic information present. These proteins differ from cell to cell and they are constantly changing over the course of life due to all kinds of biochemical interactions, for example with the environment.” In many medical conditions, defects in and between proteins play an important role. “Proteomics research can contribute to new medical insights, for example in the form of new therapies,” says Dr Horvatovich. “The research focuses, for example, on what proteins occur in a cell and in what quantities (the protein profile), what changes proteins undergo, and what interactions occur between them. We aim to be able to identify cancer cells, for example, with the aid of biomarkers, which are proteins that function as indicators.”

The challenge

Proteomics studies are data-intensive. “Using a mass spectrometer produces a particularly large amount of information,” says Dr Horvatovich. “It generates between ten and a hundred thousand results from just a single sample. Bio-informatics is the discipline that brings together biology and IT. We use IT to store data in databases, to analyse information, and to convert data into useful knowledge.”

Distributing the growing quantities of information to computer centres and storage locations was causing long delays. In addition, processing the information generated during the research requires a lot of computing capacity.

The solution

To address the challenges faced by Dr Horvatovich and many other life science researchers, a multi-faceted solution was created that combines better network connections with access to computer facilities. To speed up the research process, lightpaths were used to transmit the data faster to researchers at other locations, and to process and store it on supercomputers, which are only available in a few places in the Netherlands. “Using SURFnet’s dynamic lightpaths has alleviated those bottlenecks,” says Dr Horvatovich. “We can now share data between the various computer centres. And we can also process it significantly faster and cheaper. Those locations were connected by lightpaths. That created opportunities for our research.”

The computing capacity and the associated storage capacity are available in Groningen, at the SURFsara High Performance Compute center in Amsterdam, and also in Utrecht, Rotterdam and Delft. The Life Science Grid (LSG) is a network of 10 compute clusters intended specifically for researchers in the life sciences. Each member institution has access to an on-site compute cluster managed by SURFsara. The grid structure gives researchers access not only to their own local LSG cluster, but also to the other LSG clusters, so institutions can scale up to greater computational power or storage capacity when performing large-scale analyses.