"Identification of domain shuffling in closely related prokaryotes"
To find candidate diagnostic markers, we want to investigate protein changes that could distinguish closely related bacteria. A prominent example is E. coli, where some strains are harmless while others cause severe illness. The trivial approach would be to look for unique genes. In addition, we want to investigate genomic changes that shuffle the internal structure of proteins. We will use PFAM domain signatures to define a domain architecture for each protein. For every protein whose architecture is non-trivial (containing more than one domain), we check whether the same architecture is still detectable in any protein of the closely related species. We can also look for gene fusion events.
The aim is to compile a list of altered proteins against which specific antibodies could be raised to distinguish two related strains.
We will use the following species:
E. coli: standard strain vs. EHEC strain, EPEC vs. EHEC strain
Staphylococcus aureus: multi-resistant (MRSA) vs. normal
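The architecture comparison described above could be sketched roughly as follows. This is a minimal illustration and not the project's pipeline: it assumes the PFAM hits have already been reduced to an ordered list of domain names per protein, it treats "still detectable" as an exact match of the ordered domain tuple, and the function name, protein IDs and domain labels (domA, domB, ...) are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Architecture = Tuple[str, ...]


def missing_architectures(
    strain_a: Dict[str, List[str]],
    strain_b: Dict[str, List[str]],
) -> Dict[str, Architecture]:
    """Return proteins of strain_a whose multi-domain architecture
    does not occur in any protein of strain_b.

    Assumption: an architecture is the ordered tuple of domain names,
    and two architectures match only if the tuples are identical.
    """
    seen_in_b = {tuple(domains) for domains in strain_b.values()}
    candidates: Dict[str, Architecture] = {}
    for protein_id, domains in strain_a.items():
        arch = tuple(domains)
        # Single-domain (trivial) architectures are skipped.
        if len(arch) > 1 and arch not in seen_in_b:
            candidates[protein_id] = arch
    return candidates


# Toy input with hypothetical protein IDs and domain labels.
strain_a = {
    "a1": ["domA", "domB"],
    "a2": ["domC"],
    "a3": ["domB", "domA"],
}
strain_b = {
    "b1": ["domA", "domB"],
    "b2": ["domA"],
    "b3": ["domB"],
}

candidates = missing_architectures(strain_a, strain_b)
```

In this toy input only a3 is reported: its domains both exist in the other strain, but not in this order within one protein. A looser definition of a match (for example, allowing extra flanking domains) would be a different design choice and would shorten the candidate list.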
"Yeast genome Variation and phenotype"
In this project, we will investigate the relationship between genome re-arrangements and phenotype (here: the ability to grow under a certain condition). First, we will compare our in-house data against a publicly available data set of growth rates in different conditions and determine whether it provides the same or different information on the relationship between yeast strain and growth rate (e.g. by a plot). Then we will try to correlate single re-arrangements with growth using R, to hunt for influential mutations that might explain, to some extent, the different growth in different conditions. The data for the re-arrangements is also generated in-house and will be provided.
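The step of correlating single re-arrangements with growth can be illustrated with a small sketch. The project itself names R for this; Python is used here only for illustration, and the same calculation is a few lines in R. The sketch assumes each re-arrangement is coded as present (1) or absent (0) per strain and that one growth rate per strain is available for a given condition. The function names and toy values are hypothetical, and a plain Pearson coefficient is only one possible choice of statistic.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple


def pearson(x: List[float], y: List[float]) -> Optional[float]:
    """Pearson correlation; None if either variable has no variation."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def rank_rearrangements(
    presence: Dict[str, List[int]],
    growth: List[float],
) -> List[Tuple[str, float]]:
    """Rank re-arrangements by absolute correlation with growth rate."""
    scored = []
    for name, calls in presence.items():
        r = pearson([float(c) for c in calls], growth)
        if r is not None:  # skip re-arrangements shared by all or no strains
            scored.append((name, r))
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Toy data: four strains, three hypothetical re-arrangements.
presence = {
    "r1": [1, 1, 0, 0],
    "r2": [1, 0, 1, 0],
    "r3": [1, 1, 1, 1],  # no variation, cannot be correlated
}
growth = [2.0, 2.0, 1.0, 1.0]

ranking = rank_rearrangements(presence, growth)
```

With many re-arrangements and few strains, some high correlations are expected by chance, so any real analysis would need a correction for multiple testing.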
"Intrinsic epitope prediction"
In the long-established approach of Kolaskar and Tongaonkar (1990), B-cell epitopes are predicted from amino-acid propensities. We would like to know whether these predictions relate to structural features (such as a residue being part of a secondary structure element rather than a disordered region). For this, we compute epitope predictions for a large set of sequences. We then run a secondary structure prediction and a disorder prediction and try to correlate both with the epitope prediction. If predicted epitope regions correlate with, e.g., secondary structure elements, then the method already captures some secondary structure features. This will help to answer the question of whether adding secondary structure prediction to an epitope-finding algorithm would be beneficial or redundant.
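One way to set up the comparison described above is sketched below. It is an illustration under stated assumptions, not the published method: the propensity scale is a toy placeholder (not the Kolaskar and Tongaonkar values), the window length and threshold are free parameters, and the secondary structure is assumed to be given as one class letter per residue (H, E, C) from some external predictor. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def window_scores(seq: str, scale: Dict[str, float], window: int = 7) -> List[float]:
    """Average per-residue propensity over a centred sliding window.

    Windows are truncated at the sequence ends.
    """
    half = window // 2
    scores = []
    for i in range(len(seq)):
        lo = max(0, i - half)
        hi = min(len(seq), i + half + 1)
        values = [scale[aa] for aa in seq[lo:hi]]
        scores.append(sum(values) / len(values))
    return scores


def class_enrichment(
    scores: List[float], ss: str, threshold: float
) -> Dict[str, float]:
    """Enrichment of each secondary structure class among residues
    scoring at or above the threshold, relative to its overall frequency.

    A value of 1.0 means no association; an empty dict means no residue
    reached the threshold.
    """
    if len(scores) != len(ss):
        raise ValueError("scores and ss must have the same length")
    total = Counter(ss)
    hits = Counter(c for c, s in zip(ss, scores) if s >= threshold)
    n_hits = sum(hits.values())
    if n_hits == 0:
        return {}
    return {
        c: (hits[c] / n_hits) / (total[c] / len(ss))
        for c in total
    }


# Toy example: placeholder scale and a made-up structure string.
toy_scale = {"A": 1.0, "G": 0.0}
toy_seq = "AAAAGGGG"
toy_ss = "HHHHCCCC"

toy_scores = window_scores(toy_seq, toy_scale, window=1)
enrichment = class_enrichment(toy_scores, toy_ss, threshold=0.5)
```

If the enrichment values stay close to 1.0 across a large sequence set, the propensity-based prediction carries little secondary structure information, which would argue for adding such a prediction; strong enrichment would suggest it is largely redundant. The same function can be applied to a per-residue disorder annotation by replacing the class string.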