Monday, 21 April 2008

Predicting RNA binding from protein sequences

The first Birkbeck seminar of the new term was given by Sue Jones, from the University of Sussex. Sue is no stranger to Birkbeck as she did her Ph.D. with Janet Thornton at University College, and later worked with her at the European Bioinformatics Institute and the biotech company Inpharmatica. Today she described a piece of software that she and her colleagues have developed for predicting motifs in protein sequences that are likely to bind to RNA.

Proteins function largely by interacting with other molecules - they are "social" molecules. Protein interaction partners include other proteins, carbohydrates, "small" molecules and ions, and, the focus of today's talk, nucleic acids. The structures and functions of RNA molecules are diverse and include protein coding (mRNA), protein synthesis (tRNA and ribosomal RNA), splicing, hydrolysis of nucleic acid bonds (in RNA enzymes or "ribozymes") and control of gene expression (the so-called "micro-RNAs" or miRNAs). RNA-binding domains in proteins include RNP domains, dsRNA binding domains, and K homology (KH) domains - all these are mixed (alpha and beta) structures.

Jones and her colleagues surveyed known structures of protein-RNA complexes and marked residues that were in close contact (through van der Waals contacts or hydrogen bonds) with the RNA. They described each amino acid in terms of predicted accessible surface area, conservation within the family of homologous proteins, and chemical properties. Not surprisingly, positively charged and polar amino acids were favoured in binding to the negatively charged nucleic acid over negatively charged and hydrophobic ones; glycine, which is flexible, and tryptophan, which can form base stacking interactions, were also favoured.
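For readers who like to see ideas as code, here is a minimal Python sketch of how residues "in close contact" with RNA might be marked in a known structure. It is not the method used in the talk: the function name `contact_residues`, the simple distance test, the 4.0 Å cutoff and the toy coordinates are all assumptions made purely for illustration.

```python
import math

def contact_residues(protein_atoms, rna_atoms, cutoff=4.0):
    """Return sorted residue numbers with any atom within `cutoff` of any RNA atom.

    protein_atoms: list of (residue_number, (x, y, z)) tuples, one per atom.
    rna_atoms:     list of (x, y, z) tuples, one per RNA atom.
    cutoff:        distance threshold in angstroms (4.0 is an assumed value).
    """
    contacts = set()
    for res_num, coord in protein_atoms:
        if res_num in contacts:
            continue  # this residue has already been marked as contacting
        if any(math.dist(coord, rna_coord) <= cutoff for rna_coord in rna_atoms):
            contacts.add(res_num)
    return sorted(contacts)

# Toy coordinates (hypothetical, not taken from any real structure).
protein = [
    (1, (0.0, 0.0, 0.0)),
    (1, (1.5, 0.0, 0.0)),
    (2, (10.0, 0.0, 0.0)),
    (3, (0.0, 3.0, 0.0)),
]
rna = [(0.0, 0.0, 3.5), (20.0, 20.0, 20.0)]

print(contact_residues(protein, rna))
```

A plain distance cutoff is only a rough stand-in for the van der Waals and hydrogen-bond criteria mentioned above, which also depend on the types of atoms involved.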

Jones then built these features, averaged over a "window" of 5-25 amino acids, into a support vector machine to predict RNA-binding features in proteins of unknown function. (This technique is a form of "machine learning"; you don't need to know about it for this course, but if you're interested in knowing more and can cope with maths at a relatively high level, see the Wikipedia entry.) This was found to be at least as reliable as any similar tool that is publicly available.
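The idea of averaging a feature over a "window" of residues can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software described in the talk: the crude charge scale (`CHARGE`), the function names and the example sequence are all hypothetical, and a real predictor would combine several features before passing them to the support vector machine.

```python
# Assumed, highly simplified charge scale: +1 for Lys/Arg, -1 for Asp/Glu,
# 0 for everything else (histidine is ignored for simplicity).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

def residue_charges(sequence):
    """Map a one-letter protein sequence to a list of per-residue charges."""
    return [CHARGE.get(aa, 0.0) for aa in sequence]

def window_average(values, window):
    """Average each value with its neighbours in a centred window.

    The window is truncated at the ends of the sequence, so the output
    has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    averaged = []
    for i in range(len(values)):
        start = max(0, i - half)
        end = min(len(values), i + half + 1)
        segment = values[start:end]
        averaged.append(sum(segment) / len(segment))
    return averaged

# Hypothetical example sequence, smoothed with the smallest window size
# mentioned in the talk (5 residues).
sequence = "GKRKDEL"
smoothed = window_average(residue_charges(sequence), 5)
for aa, score in zip(sequence, smoothed):
    print(aa, round(score, 2))
```

Averaging in this way means that each residue's score reflects its local environment in the sequence rather than its own identity alone, which is presumably why a window is used at all.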

There will be more about protein-nucleic acid binding in the next section of course material, Protein Interactions and Function, which is due to be released at the end of April.
