Principles of Protein Structure: May 2014

Every year, Birkbeck hosts a lecture by a distinguished scientist to honour the memory of the founder of its Crystallography Department, J.D. Bernal. “Sage”, as he was called by all who worked with him, had an enormous range of research interests spanning both science and society; he is widely considered one of the most brilliant scientists never to have won a Nobel Prize. The 2014 Bernal Lecture, held on March 27, was given by Professor Janet Thornton, the director of the European Bioinformatics Institute (EBI) at Hinxton near Cambridge.

Professor Dame Janet Thornton, © BBSRC 2014

Introducing the lecture, Professor David Latchman, Master of Birkbeck, described it as a unique occasion: the only time he has introduced as a guest lecturer someone whom he had interviewed for a job. Thornton includes both Birkbeck and UCL on her CV: appropriately, her last post in London was that of Bernal Professor, held jointly at both colleges. She moved on to “even greater heights” as director of one of Europe’s top bioinformatics institutions in 2003.

Thornton began her lecture with a quote from Bernal: “We [academics] can go on being useless up to a point, with confidence that sooner or later some use will be found for our studies”. That quote is of particular relevance to the subject that she has made her own: bioinformatics. She had already begun her research career in 1977, when Fred Sanger invented the process that was used to obtain the DNA sequence of the human genome. That endeavour, which was completed in 2003, took over ten years and cost billions of dollars. Sequencing a human-sized genome, which has about 3 billion base pairs of DNA, now takes maybe 10 minutes and costs about a thousand dollars. While a decade ago we had one “Human Genome”, we now have many. Mega-sequencing projects already planned or in progress include projects to sequence about 8,000 Finns and the entire 50,000 population of the Faeroe Islands; one to sequence paired tumour and normal genomes from 20,000 cancer patients; and the UK10K project, which is investigating the genetic causes of rare diseases.

It is now remarkably simple and cheap to obtain genomic data, but real challenges remain in interpreting and understanding it so that it can be used in medicine. This is the province of bioinformatics, and Thornton devoted much of her presentation to explaining five ways in which gene (and protein) sequence information is being applied to both basic and clinical medical research:

Understanding the molecular basis of disease
Investigating differences in disease risk caused by human genetic variation
Understanding the genomics of cancer
Developing drugs for infectious diseases, including neglected diseases
Investigating susceptibility to infectious disease

There are rather more than 20,000 genes in the human genome, far fewer than were originally predicted. Tiny differences between individuals in many of these either directly cause a genetic disorder or confer an increased – or in some cases decreased – risk of developing a disease. The genetic causes of some diseases, such as the bleeding disorder haemophilia, were known many years before the “genome era”; others have been discovered more recently. Mapping known mutations onto the structure of the enzyme copper, zinc superoxide dismutase has helped to explain the cause of inherited forms of amyotrophic lateral sclerosis, a type of motor neurone disease. And knowing the genome sequence has already made an enormous contribution to our understanding of the mechanisms of disease development, contributing to improvements in diagnosis and the design of novel drugs.

We now understand that cancer is a genetic disease: it arises when mutations in a group of cells cause them to grow and divide excessively. A cancer is no longer classified just by its location (for example, a breast or lung cancer) but by the particular spectrum of genetic variations in its cells. About 500 different genes are known to be mutated in cancer, some much more often than others. For example, about 60% of cases of melanoma, a type of skin cancer, contain one specific mutation in the gene BRAF. This gene codes for a protein that can direct cells to grow and divide, and the cancer-causing mutation locks this protein in the ON position, so the signal is always sent. Scientists in a company called Plexxikon used their knowledge of this mutation and the structure of the protein to design a drug, vemurafenib, which prevents the BRAF protein from signalling. This can cause a dramatic, if short-term, improvement in melanoma patients, but, crucially, it only works in patients whose cancers carry this mutation. It is one of the first examples of a “personalised medicine” that is only used alongside a diagnostic test for a genetic variation. There will soon be many more.

Genomics is also proving very useful in the fight against infectious disease. Antibiotic resistance is one of the greatest emerging threats to human health, and scientists have to use all the tools at their disposal, including genomics and bioinformatics, as they try to stay one step ahead of rapidly mutating pathogens. Sequencing is widely used to track the sources of outbreaks of infection and of resistant bacteria such as methicillin-resistant Staphylococcus aureus (MRSA) in hospitals, and it is the only way of determining the exact nature of an infection. One of the most dramatic examples of the use of genomics in infectious disease control occurred in 2011, when a novel strain of E. coli O104 caused about 4,000 cases of serious food-borne illness and 50 deaths in Germany. The outbreak was originally linked to cucumbers imported from Spain, but a global effort to trace its specific sequence variants proved that the source of the infection was beansprouts grown on a farm near Hamburg.
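
The idea of tracing an outbreak through sequence variants can be illustrated with a very small sketch. It is not the method used in the 2011 investigation: it simply assumes that the sequences have already been aligned to the same length and counts the positions at which they differ, so that the candidate source with the fewest differences is the closest match. The function names and the toy sequences are invented for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_source(outbreak: str, candidates: dict) -> tuple:
    """Return (name, differences) for the candidate closest to the outbreak sequence."""
    name = min(candidates, key=lambda n: hamming(outbreak, candidates[n]))
    return name, hamming(outbreak, candidates[name])


# Invented toy sequences, not real E. coli data.
outbreak_isolate = "ACGTTGCAAC"
candidate_sources = {
    "source_A": "ACGTTGCAAC",
    "source_B": "ACGATGCTAC",
}

print(closest_source(outbreak_isolate, candidate_sources))
```

Real outbreak tracing compares whole genomes and builds phylogenetic trees from the variants, but the underlying question is the same: which candidate differs least from the isolates taken from patients?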

There was much more to Thornton’s wide-ranging lecture than simply bioinformatics and medicine: more, indeed, than it is possible to do justice to in a single blog post. She went on to describe some of the benefits of genomics for agriculture and food security. These included designing new strategies for controlling pests and diseases, maximising the efficiency of biomass processing, and even managing biodiversity. It is necessary to measure biodiversity in order to manage it properly; it is now possible to define a short stretch of DNA sequence that fully identifies a species or sub-species (a so-called “DNA barcode”) and these are beginning to be used to track some very diverse organisms, including the 400,000 known species of beetle.
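
A DNA barcode lookup can be sketched in a few lines. The example below is a toy, not a description of any real barcoding pipeline: it assumes that the query and reference barcodes are the same length and already aligned, and it uses an arbitrary identity threshold of 0.9. The species names, sequences and threshold are all invented for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences match."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def identify(query: str, reference: dict, threshold: float = 0.9):
    """Return (best species or None, identity score) for a query barcode."""
    best = max(reference, key=lambda name: identity(query, reference[name]))
    score = identity(query, reference[best])
    # Below the (assumed) threshold, the query is reported as unidentified.
    return (best, score) if score >= threshold else (None, score)


# Invented 20-base "barcodes"; real barcodes are several hundred bases long.
reference_barcodes = {
    "species_1": "ATGGCATTAGCCTTAGGCAT",
    "species_2": "ATGGCTTTGGCATTGGGAAT",
}

print(identify("ATGGCATTAGCCTTAGGCAA", reference_barcodes))  # one mismatch to species_1
print(identify("CCCCCCCCCCCCCCCCCCCC", reference_barcodes))  # matches nothing well
```

In practice the comparison would be made with an alignment tool against a curated reference library, but the principle is the one described above: a short, well-chosen stretch of sequence is enough to tell one species from another.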

The lecture ended with a short discussion of some of the challenges facing bioinformatics and genomics in the second decade of this century, largely relating to difficulties with storing, manipulating and understanding the enormous quantity of data that is being generated. Mining this data mountain for the benefit of mankind is a task that is beyond either the academic community or the biotech industry alone. It will require novel ways of doing science that involve governments and charities as well as academia and industry. The new Centre for Therapeutic Target Validation, launched at Hinxton on the same day as Thornton’s Bernal Lecture, is a pioneering example of such a partnership. It has been set up by the EBI, the Sanger Institute (where a third of the original human genome sequence was obtained) and pharmaceutical giant GSK, and its scientists aim to use the whole range of available genomic data to select and evaluate new targets for novel drugs.

Bioinformatics is covered in section 6 of the PPS course. Students who take the second-year option Techniques in Structural Molecular Biology will return to it then, where the material focuses on selecting protein targets for structural genomics initiatives: a task that is linked to that of selecting drug discovery targets.

This post will be cross-posted on the Birkbeck Events blog.

Wednesday, 7 May 2014

The Many Uses of Bioinformatics
