Thursday, 21 March 2013

E. coli, turbocharging the workhorse


Escherichia coli, or E. coli, has long been the trusty workhorse of the structural biologist.  It is by far the most popular expression host, that is, an organism used to translate introduced DNA into target proteins: an astonishing 90% of structures deposited in the Protein Data Bank (PDB) have at least one subunit, if not the entire protein, produced by this bacterium.  Targets are becoming more ambitious, however, as the importance of larger protein complexes in biological pathways becomes increasingly apparent.  A recent review (Vincentelli, R. and Romier, C. (2013)) examines whether E. coli is fit to tackle these new challenges.  Can the workhorse be taught to jump fences?

E. coli as host for single expression

The virtual monopoly held by E. coli as a host is due to a number of factors, not least its ease of use and low cost.  It is amenable to several different methods of genetic engineering, grows rapidly and benefits from an ever-expanding range of host-specific tools.

There are mature methodologies available for the production of a single protein.  Reliable and straightforward screening processes, which can identify the conditions required for optimal protein solubility, are available even for challenging targets.  Since the solubility of a protein has a direct impact on the size and quality of the crystals that can be grown, this is a critical aim.

Genomics laboratories tend to use robotic platforms which can work with thousands of cultures side by side in order to optimise the culture conditions for their long shopping list of proteins.  This wealth of experience now allows biologists to make a rational selection of a smaller set of expression parameters when targeting proteins which are more difficult to produce.  If this restricted method fails then the team can revert to the broader set of conditions.

Supporting this trend towards more efficient screening processes are studies that have identified the parameters with the greatest impact on expression and solubility, a key example being specific fusion protein tags which enhance solubility.

Another recent development that has increased the complexity of the proteins E. coli can express is the ability to co-express post-translational modifying factors.  These factors can be critical for protein folding, complex assembly and catalysis, for example by promoting glycosylation and the formation of disulphide bridges.

Streamlining for maximum solubility and quality

The traditional approach to protein production was to perform small scale expression tests designed to discover the conditions for the highest soluble yield.  These conditions would be scaled up and only at this stage would consideration be given to the quality of the sample, specifically protein aggregation, oligomeric states, protein stability and correct folding.

Initial yields have been increased by using growth media that support high cell densities, particular E. coli strains and protein fusions designed to enhance solubility.  At the same time, biophysical characterization has improved to the point where assays of protein quality can be performed on just micrograms of sample.  Together, these advances mean that culture conditions can be optimised for both solubility and quality at the first stage, streamlining the expression protocols.

E. coli and the expression of macromolecular complexes

Most biological processes involve macromolecular complexes alongside single proteins and there is an increasing desire to understand the structural and biochemical basis of these larger structures.  The co-expression of partner proteins has been demonstrated to be advantageous for complex formation as protein-protein interactions are often the platform which allows co-folding and co-stabilization.

One example is the fimbrial tip complex of E. coli.  Many Gram-negative and some Gram-positive bacteria are covered in a fringe of short, thin fimbriae, which they use to attach both to eukaryotic cells and to each other, although the mechanism was previously unknown.  Co-expression of the Fim proteins of the fimbrial tip established that each subunit inserts a β strand into its neighbouring subunit, such that allosteric changes in the tip protein trigger signals which can be passed down the fimbria.


Image adapted from Le Trong, I. et al. (2010).  A view of a fimbrial tip complex of E. coli.  (PDB 3JWN)

Birkbeck’s head of biological sciences, Prof Gabriel Waksman, has also used this technique to elucidate the interaction of FimH with its transmembrane translocation channel.  His work has been the subject of previous posts on this blog in June 2011 and May 2008.


Another impressive example, illustrating the size of complex that can be achieved through co-expression, is the 1.8 MDa baseplate of the lactococcal phage TP901-1.  The baseplate is responsible for adhesion of the phage to the host and for delivery of the genome at infection; this particular version consists of 6 subunits of DIT, 18 of BppU and 54 of RBP.

 

Image adapted from Veesler, D. et al. (2012).  The baseplate of the lactococcal phage TP901-1.  (PDB 4DIW)

Despite the advantages of co-expression, only a small percentage of the large complexes listed in the PDB have been produced entirely this way.  Often, subunits are produced using co-expression but labour-intensive in vitro reconstitution strategies are then employed to form the complete complex.

In some cases, this is because the protein complexes have critical interactions with nucleic acids, but this is not always so.  This raises the question of what barriers are discouraging the formation of complexes by co-expression in E. coli.

Approaching the jumps

Studies have found various parameters that influence the quality and yield of co-expression in E. coli.  Results can vary depending on whether the target genes are introduced on a single vector or on multiple vectors, on whether multiple genes are used rather than cis and trans copies of the same gene, and on the precise location of the affinity tag.

The best way to tackle this large number of possible conditions is to use the high-throughput technologies that have been refined so effectively for single-protein expression.  These will need to be combined with the tools being developed for miniaturized biophysical characterization, so that sample quality can be considered during the initial stages rather than creating a further bottleneck when a second round of tests is performed with greater quantities.
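To get a feel for how quickly the number of conditions grows, the short Python sketch below multiplies out a set of screening factors. Only the first three factors echo parameters mentioned above; the strain and temperature factors, and all of the levels, are hypothetical placeholders chosen for illustration rather than values taken from the review.

```python
from itertools import product

# Hypothetical screening factors. The first three loosely echo the parameters
# named in the text; the levels, strains and temperatures are illustrative only.
factors = {
    "vectors": ["single vector", "multiple vectors"],
    "gene arrangement": ["distinct genes", "cis/trans copies"],
    "affinity tag": ["N-terminal", "C-terminal", "on partner subunit"],
    "strain": ["strain A", "strain B", "strain C"],
    "temperature_C": [18, 25, 37],
}

# Every combination of one level from each factor is a separate culture condition.
conditions = list(product(*factors.values()))
print(len(conditions))  # 2 * 2 * 3 * 3 * 3 = 108
```

Adding one more three-level factor would triple the total to 324, which is why robotic high-throughput platforms, rather than hand-run cultures, are the natural tool for this kind of screen.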

As techniques for performing characterization assays on minute samples improve further, and, hopefully, a co-expression system is developed that allows the production of protein/RNA and protein/DNA complexes, there is optimism that E. coli can extend its hosting duties to ever larger and more intricate protein complexes.

Protein-protein interactions and protein expression for structural biology are covered in detail in the TSMB course.

Tuesday, 19 February 2013

Molecular Graphics: Before and Beyond Jmol

A quarter of a century ago, molecular graphics was a slow process. It was possible to generate high quality images of protein structures, but the programs were all expensive, command line driven and very slow. They were, therefore, almost out of the reach of students and of biologists working at the lab bench.

What changed everything - and, incidentally, made courses like Principles of Protein Structure possible - was a little program called Rasmol. This program was the first free tool to offer real-time manipulation (rotation and zooming) of structures as complex as proteins on "ordinary" desktop PCs. Its author, Roger Sayle, wrote the first version as a final-year project for his BSc in Computer Science at Imperial College; at the time it was, remarkably, the second fastest molecular graphics algorithm in the world. It was released to the worldwide biomolecular research community in 1993 and at its peak had over half a million users.

When the Principles of Protein Structure course was first launched in the late 1990s, Rasmol was the obvious molecular graphics program for us to choose. In the first years of the course, however, there were no movable molecules embedded in the course web pages. Instead, students had to download molecular structure files from the PDB or Birkbeck's own server, save them and open them using Rasmol. This worked very well for a number of years; in 2000, soon after the MSc course started, Roger Sayle was awarded the Heatley Medal from the Biochemical Society for "exceptional work that makes biochemistry widely accessible and usable".

But even by 2000 Rasmol was beginning to lose its popularity. Students and others with little experience of command lines were finding it increasingly "clunky" to use. Roger moved on from Glaxo - now part of GSK - where he had been continuing to develop the code; he founded, and is still CEO of, the cheminformatics company NextMove Software, based in Cambridge, UK. Several new developers then spawned different Rasmol versions. The original algorithms also lived on in two programs that allowed the software to be embedded into web pages: first Chime and then Jmol, which, as you know very well, is widely used today.

But even Jmol has its disadvantages, particularly when it comes to publication. As PPS students, you should already have experimented with using Jmol to create still images showing protein structures in a particular orientation and format. These images look great if presented electronically, but their resolution is 72 dpi (dots per inch, a unit that is still widely used in publishing), which is not enough for print publication. Most journals insist on all figures being of at least 300 dpi. There is much more about this in Section 4 of the PPS course, under "Writing a paper or report".
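The arithmetic behind the 72 dpi problem is simple: the number of pixels an image needs across its width is the printed width in inches multiplied by the resolution in dpi. The Python sketch below assumes an illustrative figure width of 3.5 inches; check your own journal's instructions for the sizes it actually uses.

```python
def pixels_needed(width_inches: float, dpi: int) -> int:
    """Pixels required across a printed figure of the given width at the given resolution."""
    return round(width_inches * dpi)


def printed_width_inches(width_pixels: int, dpi: int) -> float:
    """How wide (in inches) an image of width_pixels prints at the given resolution."""
    return width_pixels / dpi


# Illustrative figure width of 3.5 inches (an assumption, not a journal rule).
width = 3.5
print(pixels_needed(width, 72))   # 252
print(pixels_needed(width, 300))  # 1050

# A 252-pixel-wide screen image shrinks to well under an inch if printed at 300 dpi.
print(printed_width_inches(252, 300))  # 0.84
```

Put another way, a print-ready figure needs 300/72, or roughly 4.2, times as many pixels in each direction as a 72 dpi screen image of the same physical size.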

Your projects for PPS (and for TSMB, if you go on to do that course) are to be written as Web based dissertations, so this will not necessarily be an issue for you. We will be perfectly happy if you generate your images using Jmol or Rasmol. (We will be considerably less happy if you copy them from external sources, however high quality they are, but that is for another occasion.) You may, however, want to try out some more advanced software that generates high quality images, and we will be delighted if you do!

There are many programs available for molecular graphics and modelling, and we will be describing some of the "modelling" aspects of these in much more detail in section 9 of the course, Molecular Forces in Proteins. For high quality graphics only, however, I would like to recommend three programs that are all "more or less" free: PyMol, Chimera and CCP4mg. All these programs allow users to make publication-quality images of molecules in a wide variety of formats, and, interestingly, all also allow users to make simple movies. If you have ever been in a lecture and wondered how the speaker could automatically rotate and zoom into a structure to show the active site in detail, you need do so no longer.

These two pictures should illustrate the difference. The top one is taken from the PPS course material and shows a close-up of an image generated in Rasmol; it is clearly pixelated. The lower one shows part of a space-filling image generated using PyMol at a rather similar scale; it is of much better quality, although still not perfect (if you look carefully you will see that the spheres are not quite spherical). Similar quality images can be produced using Chimera and CCP4mg.

Zoomed image of part of a protein molecule saved using Rasmol

Zoomed image of part of a protein molecule saved using PyMol

PyMol was written by Warren Lyford DeLano, a young researcher at the University of California San Francisco, who founded his company DeLano Scientific LLC to promote it as "an experiment in the commercial viability of an open source software company". This lasted only until DeLano's death in 2009 at the age of 37; the program is now supported by Schrödinger and is only available free to "bona fide students and educators" - a category that, of course, includes PPS tutors and students.

Chimera also hails from the University of California San Francisco. It has been developed within the university's biocomputing department and, thanks to NIH support, remains free to all but industrial users. It has been developed alongside DOCK, a public domain program for "docking" small molecules such as drugs into protein active sites, and it makes creating the complex input files that are needed to run DOCK a lot easier.

CCP4mg is the UK's main contribution to the field of high quality, public domain molecular graphics programs. It is part of the CCP4 software suite for protein crystallography and includes facilities for displaying the electron density maps that that technique generates as the first step towards solving a structure. You will learn a lot more about this if you take the second year protein crystallography option in the MSc, and some if you take the more general course, TSMB.

Do explore any of these programs if you like and if you have time. But I must end by reassuring you that we are not expecting you to use any of them - Rasmol or Jmol will be fine for the Web based dissertations that form part of this MSc.

(A version of this post will be appearing as the Cyberbiochemist feature of the Biochemical Society's membership magazine, The Biochemist, in April 2013.)

Wednesday, 9 January 2013

From Genome to Proteome: BCA Winter Meeting 2012

The British Crystallographic Association is the main UK organisation supporting the science of crystallography in all its forms. Every year, its Biological Structures Group holds a meeting in the run-up to Christmas to discuss and celebrate recent developments in structural biology research. In 2012, this Winter Meeting was held at the MRC Laboratory of Molecular Biology in Cambridge.

The LMB, as it is usually known, is one of the birthplaces of modern structural and molecular biology. It moved into its current building in 1962, the year when four of its most famous scientists were awarded two Nobel Prizes for some of the most important discoveries in twentieth-century biology: James Watson and Francis Crick for the structure of DNA, and Max Perutz and John Kendrew for the very first three-dimensional structures of proteins (haemoglobin and myoglobin, respectively).

It was appropriate, therefore, that the theme of this year's Winter Meeting was "From Genome to Proteome". The basic molecular processes that underlie all of life - DNA replication, transcription of DNA into RNA and translation of RNA into protein - are all, now, quite well understood. These processes are all very complicated and require numerous proteins, many of which interact together to form complexes and "molecular machines" that are quite large, at least in molecular terms. Scientists presenting at the meeting discussed recent, innovative studies of the structures of many of these proteins and the nucleic acids that they interact with. Many of these processes will be discussed in some detail in section 8 of the PPS course, "The Protein Lifecycle".

The meeting programme was divided into three sections, corresponding respectively to DNA synthesis and repair, RNA transcription and protein translation.

DNA Replication and Repair

DNA synthesis and repair are not even mentioned in the famous Central Dogma of Molecular Biology (put very simplistically, DNA makes RNA makes protein) but they are, of course, essential for it. The first speaker in this session, and therefore in the meeting as a whole, was Luca Pellegrini from the University of Cambridge. He described structural studies of the first part of this process: the initiation of DNA synthesis. In all organisms, this process involves an enzyme called primase, which is found at the DNA replication fork - the point at which the strands of the original DNA helix divide so that a new strand can be synthesised on each of the template strands. Pellegrini and his group have solved the structure of several of the subunits of yeast primase, alone and bound to part of the DNA polymerase Pol alpha, and are using these structures to deduce the precise mechanism of this vitally important process.

Then Neil Kad of the University of Essex described the techniques he has developed for visualising individual molecules, and how he is applying them to the study of DNA repair by nucleotide excision. Briefly, this technique involves stretching a single molecule of DNA between two positively charged silica beads, and tagging individual molecules of DNA-binding proteins using fluorescent quantum dots so that their binding to and progress along this DNA "tightrope" can be monitored. He has discovered that although single subunits of the Uvr DNA repair protein complex may bind DNA and search it for errors, a complex between the subunits UvrA and UvrB is required for quick and efficient searching.


Schematic diagram of a "DNA tightrope" with labelled proteins bound. (c) Neil Kad, from the Kad Lab homepage

Transcription

The spliceosome is a "molecular machine", composed of protein and small nuclear RNA (snRNA) subunits, that is found only in eukaryotes and that catalyses the removal of introns from the messenger RNA precursor molecules initially transcribed from DNA. Chris Oubridge, a member of Kiyoshi Nagai's group at the MRC Laboratory of Molecular Biology in Cambridge (and therefore one of the "home team"), described an atomic resolution structure of a complex known as U1 that forms a major part of the spliceosome. This "small nuclear ribonucleoprotein" (snRNP) comprises the snRNA molecule U1 bound to ten proteins. This technically challenging exercise in X-ray crystallography is yielding important insights into the function and mechanism of this key part of the spliceosome.

Structure of the U1 ribonucleoprotein, from Kiyoshi Nagai's web pages at the MRC-LMB.

Another interesting presentation in the Transcription section was given by David Lilley from the University of Dundee, who described the structures of kink turns in RNA molecules, and how these structural motifs interact with proteins.

Translation

Since the modern Laboratory of Molecular Biology was constituted as the "Unit for Research on the Molecular Structure of Biological Systems" in 1947, nine Nobel prizes have been awarded to scientists working there. Its most recent laureate, Venki Ramakrishnan, shared the 2009 chemistry prize with Tom Steitz from the US and Ada Yonath from Israel for determining the first atomic resolution structure of the ribosome. Israel Sanchez from Ramakrishnan's lab at the LMB gave a presentation on the mechanism by which stop codons, which give the signal to terminate protein synthesis, are decoded on the ribosome. This process, which occurs when one of the stop codons (UAA, UAG and UGA in the standard genetic code) occupies the ribosomal A site, is still less well understood than the process through which "sense" codons are decoded into amino acids. Sanchez and his colleagues are studying the structure and function of ribosomes bound to modified RNA in which the uridine in the first position of a stop codon has been replaced by pseudouridine. They have discovered that the decoding centre of the ribosome is more flexible than they had originally thought, an insight that may further our understanding of how protein synthesis is terminated.
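The idea of reading a message codon by codon until a stop codon turns up can be illustrated with a toy Python function that scans an mRNA sequence in frame and reports the first UAA, UAG or UGA it finds. This is only a sequence-level sketch using a made-up message; it says nothing about the structural mechanism of decoding that Sanchez and colleagues are investigating.

```python
from typing import Optional, Tuple

# Stop codons of the standard genetic code, as listed in the text.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop_codon(mrna: str, start: int = 0) -> Optional[Tuple[int, str]]:
    """Return (position, codon) of the first in-frame stop codon at or after `start`,
    or None if the reading frame runs off the end without finding one."""
    seq = mrna.upper()
    # Step three bases at a time so that only codons in this reading frame are examined.
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return i, codon
    return None


# A made-up message: AUG (Met) GCU (Ala) UUC (Phe) UAG (stop) GCC
print(first_stop_codon("AUGGCUUUCUAGGCC"))  # (9, 'UAG')
```

Because the scan steps three bases at a time, a stop sequence that lies out of frame (for example the UAA spanning positions 2 to 4 of AAUAAA) is correctly ignored.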

The final speaker was Birkbeck's own Cara Vaughan. She discussed some of her recent research using a combination of X-ray crystallography and electron microscopy to decipher the assembly of the kinetochore. This is a structure that forms in eukaryotic cells during cell division and that links the dividing chromosome to the mitotic spindle. Vaughan's research concerns a protein called Hsp90 that activates many signalling proteins. This protein is a member of a class of proteins termed the chaperones, which are generally involved in the folding, unfolding and activation of other proteins. Vaughan and her co-workers have solved the structure of two interacting proteins found in yeast, Sgt1 and Skp1, which together seem to hold Hsp90 in an open conformation that enables other kinetochore proteins to bind.

Image of a dividing eukaryotic cell. The chromosomes are shown in blue, the microtubules of the mitotic spindle in green, and the kinetochores in pink. Image from Wikimedia Commons.

The annual Winter Meeting is the most high-profile event organised by the Biological Structures Group of the BCA. The association as a whole organises many other events, including, this year, the annual European Crystallographic Meeting. ECM 28 will be held at the University of Warwick from 25-29 August 2013; it will provide an opportunity for British and European crystallographers to celebrate the origins of their science in the pioneering X-ray diffraction work of father and son William Henry and William Lawrence Bragg, almost exactly a hundred years ago.

Monday, 3 December 2012

Epigenetics for Beginners



Before I start, I’d like to briefly introduce myself and say hello.  My name is Jill Faircloth and I studied the PPS course in 2009-10 and then went on to take the TSMB course the year after.  I posted a proper introduction of myself in my first blog post in March this year.  I also gave some tips from previous years on managing the revision and project work during the summer and thoughts on the two follow-on courses.

Since that time I have started writing for The Brain Tumour Charity.  The challenge there is to write explanations of the highly technical research being sponsored in language that can be understood by a non-scientist.  This PPS blog is a fantastic opportunity for me to continue to explore protein structure research, which I enjoyed very much during my MSc, and hopefully to bring interesting and relevant pieces of work to your attention.  Since you have only just begun to look at protein structure, however, I thought I’d begin with a beginner's guide to a hot topic: epigenetics.



The nature versus nurture argument has long been a fertile source of entertaining and/or heated debate.  As team nature pinned their colours to the mast of the all powerful genome, the nurture camp would gleefully point out that identical twins often exhibit different personalities, proof that the genome is not the ultimate dictator.  A compromise was generally agreed upon whereby an individual’s personal traits were thought to be formed by a nurture overlay on a nature foundation.

Epigenetics is the emergent science which is poised to provide a more sophisticated answer both to the origins of individuality and to the question of how dividing cells in a developing foetus have the ability and apparent programming to become brain cells or liver cells or any of the other very many different types of cells in a mature organism.
  
To understand the epigenome, we must first look at the storage of DNA in the nucleus.  The familiar double helix of DNA is tightly wound around histone proteins, which are packaged in groups of eight.  The helix wraps almost twice around each of these histone packages to form a nucleosome, so that the DNA appears as a string with nucleosome beads along its length.  This thread of nucleosomes is woven into a rope called chromatin which, in turn, is woven into a chromosome.
  
Each of the diverse cells of the body carries the same DNA, which encodes the entire genome, that is, the code for every protein required throughout the organism.  The epigenome is the system of molecules that acts upon the packaged DNA to determine which of the encoded genes are activated in any particular cell.

There are two key methods employed by the epigenome, both of which can act to “turn off” particular genes:

  • the histone proteins can be post-translationally modified so that they bind the DNA more tightly, thereby shielding some genes from transcription, and
  • small chemical groups, predominantly the methyl group, are attached to the DNA at CG sites (a cytosine followed by a guanine).  A gene with several methyl caps will be blocked from transcription, but removal of the caps will allow re-activation of the gene.
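The methylation targets described above are usually called CpG sites: a cytosine followed directly by a guanine along the same DNA strand, with the methyl group attached to the cytosine. The Python sketch below simply lists where such sites occur in a made-up sequence; it is purely illustrative and cannot tell you which sites are actually methylated in a given cell.

```python
def cpg_sites(dna: str) -> list:
    """Return the 0-based positions of every C that is immediately followed by a G."""
    seq = dna.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


promoter_fragment = "TTCGACGGATCCGCG"  # made-up sequence, for illustration only
print(cpg_sites(promoter_fragment))  # [2, 5, 11, 13]
```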


Illustration of epigenetic mechanisms adapted from Wikipedia



Through these mechanisms, cells with exactly the same genome can behave differently, since they can have different combinations of histone modifications and methyl groups, i.e. different epigenomes, controlling which genes are expressed and which are dormant.  The processes that govern how cells acquire these different epigenomes are not well understood, although it is known that the basic patterns are encoded in the genome and are then altered by environmental factors.  These include signals from neighbouring cells, so that a cell’s location is a key factor in determining its unique epigenome.


Cell location is not the only factor, however, as demonstrated in a study by Fraga et al. (2005), which showed that although identical twins begin life with the same epigenome, wide differences accumulate over time in the acetylation of their histones and the methylation of different genes.  This is thought to account for the clear differences seen in the personalities and disease susceptibilities of monozygotic twins.


Another illustration of an environmental factor is provided by some intriguing research into the epigenome of honey bees.  This study (Lyko, F. et al. (2010)) found that all of the honey bees in a swarm have the same genome, but that as larvae they are given different diets, with future queens receiving royal jelly, and that this creates a difference in the methylation of more than 550 genes, including those for histones.


Another study not only demonstrates that diet has a strong epigenetic effect but also shows that this can have transgenerational ramifications.  Kaati et al. (2002) surveyed the long-term health of the residents of a sparsely populated region of Sweden called Överkalix, which has been prone to cycles of high harvest yields followed by years of famine.  The results showed that the paternal (but not maternal) grandsons of men who experienced famine in preadolescence were less likely to die of cardiovascular disease, whilst paternal grandsons of men who enjoyed plentiful food were more likely to die of diabetes.  Interestingly, the opposite effect was found with women.  Women whose paternal (but not maternal) grandmothers were exposed to famine whilst in the womb, i.e. when their eggs were forming, were found on average to have a shorter lifespan.



Epigenetic changes are usually considered to apply to the genome within the lifetime of the organism, but the Överkalix study demonstrated that changes occurring in a sperm or egg which is then fertilized can have a transgenerational effect.  In other words, environmental effects on the epigenome are potential evolutionary drivers.


Epigenetics also excites a great deal of interest from a clinical point of view.  A great many cancers are associated with both aberrant DNA methylation patterns and a histone variant called H2A.Z.  These abnormalities cause the inactivation of certain genes, which could provide vital clues to the mechanisms of malignancies and their possible treatment.
  

The National Institutes of Health in the USA allocated $190m to epigenetics research between 2008 and 2013, recognizing its potential to explain the molecular processes behind human development and many important human diseases.  This investment is bearing fruit, with a recent announcement that variations associated with several common diseases have been found in non-coding regions of DNA.  Of these regions, 88% are responsible for regulating genes during foetal development and are known to be susceptible to environmental exposures.  In other words, environmental factors experienced in utero produce epigenetic changes which can manifest decades later as adult-onset diseases (Maurano, M.T. et al. (2012)).


There is still much to discover in this advancing field, but between the potential for a whole new regime of therapies and the possibility of an explanation of the interaction between nature and nurture, there is also much to be excited about.