Bioinformatics and Bioengineering Summer Institute (2003)
Problems in producing large amounts of an enzyme
Threading one protein through another

I. The Scenario (revisited)

Here's the situation in brief. You've cloned and overexpressed an enzyme, UDP-glucose dehydrogenase (UDPGDH), from Mesorhizobium loti, but it forms inclusion bodies and shows low activity. To overcome this, you've generated random mutations in the protein. Now you have several mutants that are somewhat more soluble. You intend to analyze the positions of the mutations to predict which region or regions of the protein govern solubility. The sequence of your protein has been determined, but not its three-dimensional structure. The 3-D structure of a similar enzyme from Streptococcus pyogenes has been determined.

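You need to get the following:

The amino acid sequences of UDPGDH from M. loti and S. pyogenes
The coordinate (PDB) file for the S. pyogenes structure
ClustalX
Protein Explorer (and Chime)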

Then, given the alignment between your version of the enzyme and the enzyme from S. pyogenes, you want to superimpose the amino acid sequence from M. loti on the three-dimensional structure of the S. pyogenes enzyme and display the result in Protein Explorer, coloring those amino acid residues that were mutated in the relatively soluble mutant forms of UDPGDH. In this way, you can examine which regions of the protein appear important in increasing the solubility of the enzyme.

II. Get what you need
II.A. Get the sequences and coordinate file

All the sequences you need can be obtained from GenBank through the National Center for Biotechnology Information (NCBI) web site. Go there, click on Protein in the Search box, then enter UDP-glucose dehydrogenase Mesorhizobium, and click Go. You'll pull up six sequences. Ignore the ones marked "homolog" and the ones from Yersinia. That leaves two. They look equivalent to me, so I advise saving the one with the later date stamp. Save it with an HTML extension so that Windows can interpret the file correctly later.

Then go back and search for UDP-glucose dehydrogenase Streptococcus pyogenes. This will get you seven sequences. Two of them are identified as structures. Which should you take? Unfortunately, the files are not well annotated, so the choice is difficult. However, if you go into one of the files and get the abstract of the article that describes the structures (click on the number next to Medline), you find that the authors obtained two structures: one of the wild-type enzyme and the other of a mutant in which the cysteine at position 260 was mutated to serine. Now if you go back and examine the two sequences, you'll find that one has Cys at position 260 and the other has Ser. Choose the wild-type sequence to save, and note its 4-letter code.
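If you would rather not hunt for position 260 by eye, a few lines of Python can make the comparison for you. This is only a sketch: the function name and the toy fragments are made up for illustration, and it assumes the two sequences have the same length and numbering, as a wild type and its point mutant should.

```python
def differences(seq_a, seq_b):
    """List (position, residue_a, residue_b) wherever two equal-length
    sequences differ. Positions are 1-based, as in GenBank records."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


if __name__ == "__main__":
    # Toy fragments for illustration, NOT the real UDPGDH sequences.
    wild_type = "MKCAV"
    mutant = "MKSAV"
    print(differences(wild_type, mutant))
```

With the two real S. pyogenes sequences pasted in, the abstract leads you to expect a single difference, at position 260, with Cys in one sequence and Ser in the other.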

To get the structure of UDPGDH from S. pyogenes, go back to the NCBI site and click on Structure in the Search box. Enter the 4-letter code you found above or UDP-glucose dehydrogenase. Click on the 4-letter code you found above, then click on it again (after PDB:). Click on Download/Display File in the right hand column, then choose TEXT as the PDB file format, complete with coordinates. This will bring up the PDB file. Save it to your disk.

II.B. Get ClustalX

You'll need to align the amino acid sequences of the two UDPGDH proteins to determine which positions are constant and which differ, and the program ClustalX will do this. To get a copy of this program, go to the ClustalX site of the Institut de Genetique et de Biologie Moleculaire et Cellulaire.

II.C. Get Protein Explorer (and Chime)

Protein Explorer displays protein structures from coordinate files (PDB = Protein Data Bank) and gives you considerable flexibility in how the structures are colored and otherwise displayed. The program is relatively easy to use but demands: (a) that you use early versions of Netscape (4.7; not 6.0 or 7.0) or Explorer (5.5 or 6.0) and (b) that you have a plug-in called Chime, which is used to display chemical structures. So long as you've installed Chime, you can run Protein Explorer off the web or download it to your own computer. Either way, go to the Protein Explorer site.

To test whether your browser is compatible and whether you have Chime properly installed, scroll down to Find Any Molecule, and enter 1A3N into the window next to the Go button. When you click Go, if all is well, you should obtain a rotating image of human hemoglobin.

III. How to run the programs
III.A. Running ClustalX

ClustalX expects all the sequences to be aligned to be in a single file in FASTA format, using the one-letter amino acid code:

>Mesorhizobium loti UDP-glucose dehydrogenase
MNLTVFGIG...

>Streptococcus pyogenes UDP-glucose dehydrogenase
MKIAVAGSG...

You can get the sequences from the GenBank files you downloaded earlier. Clustal does not require that you rid the sequence of its numbers and embedded spaces, but it is good practice to do so (since you may use the file later for a program that is more particular). You can do the editing conveniently in Word, since it allows blocking out and deleting columns of text (i.e. where the numbers are). Save the file in text format.
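If you prefer a script to column editing in Word, the cleanup can be sketched in Python as below. The function names and the toy fragment are hypothetical, and the sketch assumes you paste in only the numbered sequence lines from the GenBank record (any stray word, such as a section label, would be mistaken for residues).

```python
def clean_sequence(raw):
    """Keep only letters, so position numbers, spaces, and line breaks
    from a GenBank sequence block are dropped."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def to_fasta(title, raw, width=60):
    """Return one FASTA record: a '>' title line, then the cleaned
    sequence wrapped at `width` residues per line."""
    seq = clean_sequence(raw)
    rows = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + title + "\n" + "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Made-up fragment laid out like a GenBank sequence block.
    raw = "        1 acdefghikl mnpqrstvwy\n       21 acde\n//"
    print(to_fasta("toy sequence", raw, width=10))
```

Run it once per protein and paste the two records into one text file, one after the other, to get the input that ClustalX asks for.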

To align the sequences, start ClustalX and click File, then Load Sequences. Find the file you just made. Once it is in, click Alignment and then Do Complete Alignment. Save the alignment by clicking File, then Save Sequences as. Choose NBRF/PIR as the format.
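To check what you saved, you can read the alignment file back in. The sketch below assumes the usual NBRF/PIR layout (a header line beginning with '>' and containing a two-letter code, a semicolon, and the name; one description line; then sequence lines ending in '*'). The function names and the toy alignment are made up for illustration, and real output may differ in details.

```python
def parse_pir(text):
    """Parse NBRF/PIR text into {name: aligned_sequence}.

    ASSUMPTION: each record is a '>XX;name' line, one description
    line, then sequence lines, the last of which ends with '*'.
    """
    records = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if not line.startswith(">"):
            i += 1
            continue
        name = line.split(";", 1)[-1].lstrip(">").strip()
        i += 2  # skip the header line and the description line
        chunks = []
        while i < len(lines) and not lines[i].startswith(">"):
            chunks.append(lines[i].replace(" ", "").strip())
            i += 1
        records[name] = "".join(chunks).rstrip("*")
    return records


def percent_identity(seq_a, seq_b):
    """Percent of identical residues among columns with no gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b)
    return 100.0 * same / len(pairs)


if __name__ == "__main__":
    # Toy alignment, NOT the real UDPGDH sequences.
    toy = ">P1;target\ntoy target\nMK-LI*\n>P1;template\ntoy template\nMKALV*\n"
    aligned = parse_pir(toy)
    print(aligned)
    print(percent_identity(aligned["target"], aligned["template"]))
```

The gap character '-' marks positions where one protein has a residue and the other does not; those columns are what make the later position bookkeeping necessary.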

III.B. Running Protein Explorer

If you are running a locally installed copy of Protein Explorer, start a compatible browser (e.g. Netscape 4.7) and click Open Page. Browse to find where you put Protein Explorer and open the file called PE Startup.html. Scroll down to near the end of the column entitled Find Any Molecule and click on Bare Protein Explorer. A black screen should come up on the right. In the upper left window, you'll find a way of browsing for files on your disk, or you can specify a PDB ID, for example the one for UDPGDH that you found in Section II.A. For now, however, enter the ID for human hemoglobin, 1A3N, and press Load. (You'll be asked to confirm that you really want to go all the way to the internet to get the sequence -- press OK -- and whether you want to keep the name of the sequence on a list -- why not?).

Now you have a protein spinning in space. Here are things you can do with Protein Explorer. First of all, click on Explore More with QuickViews to get to a menu-driven interface. From there you can click on Water to get rid of the extraneous balls. Then click on Select and then Chain A, and on Color and blue to color the first of hemoglobin's four chains blue. You can continue on with this, coloring each chain a different color, or you can accomplish the task all at once by clicking on Color and Chain (chime). Try selecting different classes of amino acids (e.g. hydrophobic or acidic) and coloring them different colors. Or you can select specific amino acid residues. Try this sequence:

Select All
Display ball+stick
[in command line type] Select Val1   (valine is the first amino acid of chains A and C)
Display Spacefill
This accentuates the first amino acids of chains A and C. There are many such games you can play.

III.C. Running ProteinCompare.Tru

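This program is designed to take as input:

The alignment of the two sequences (saved from ClustalX in Section III.A)
The PDB file of the protein whose structure is known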

It combines the information to produce a new PDB file, in which the amino acids of the protein whose structure is unknown replace those of the known protein, according to the given alignment.
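To make the idea concrete, here is a minimal Python sketch of that kind of residue substitution. It is not a port of ProteinCompare.Tru, whose internals are not described here. All function names are hypothetical, and it assumes that residue names sit in columns 18-20 and residue numbers in columns 23-26 of each ATOM record (the standard PDB layout), that the template sequence in the alignment starts at the residue numbered first_resnum, and that numbering is consecutive. Only residue names are changed; atoms and coordinates stay those of the template.

```python
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def template_to_target(aligned_target, aligned_template):
    """Map each template position (1-based, gaps not counted) to the
    target residue aligned with it. Columns with a gap in either
    sequence produce no entry."""
    mapping = {}
    position = 0
    for tgt, tmpl in zip(aligned_target, aligned_template):
        if tmpl == "-":
            continue  # target insertion: no template coordinates exist
        position += 1
        if tgt != "-":
            mapping[position] = tgt
    return mapping


def thread(pdb_lines, mapping, first_resnum=1):
    """Rename residues in ATOM records according to `mapping`.

    ASSUMPTION: the template sequence in the alignment begins at the
    residue numbered `first_resnum` and numbering is consecutive.
    Residues without a mapped partner are left unchanged.
    """
    out = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            position = int(line[22:26]) - first_resnum + 1
            aa = mapping.get(position)
            if aa in ONE_TO_THREE:
                # Residue name occupies columns 18-20 of an ATOM record.
                line = line[:17] + ONE_TO_THREE[aa] + line[20:]
        out.append(line)
    return out


if __name__ == "__main__":
    # Toy alignment and toy ATOM records, NOT real UDPGDH data.
    mapping = template_to_target("MK-LI", "MKALV")
    toy_pdb = [
        "ATOM      1  CA  MET A   1      11.000  12.000  13.000  1.00 20.00",
        "ATOM      2  CA  VAL A   5      14.000  12.000  13.000  1.00 20.00",
    ]
    for line in thread(toy_pdb, mapping):
        print(line)
```

Leaving unmatched template residues unchanged is just one choice; you could equally drop them or relabel them, depending on what you want to see in the display.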

You will want to modify the program so that it takes an additional file, consisting of the positions of the mutations, and creates a PDB file in which these positions are specially marked, so that you can easily see where they fall on the three-dimensional structure.
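One way such marking might work is sketched below in Python. The tricky step is that mutation positions are numbered along the M. loti sequence, while the PDB file is numbered along the S. pyogenes sequence, so each position must be translated through the alignment. The function names, the flag value of 99.00, and the choice of the temperature-factor field (columns 61-66 of an ATOM record) as the place to put the mark are all assumptions made for illustration, as is the consecutive residue numbering starting at first_resnum.

```python
def target_to_template(aligned_target, aligned_template):
    """Map target positions to template positions (both 1-based,
    gaps not counted) for columns where neither sequence has a gap."""
    mapping = {}
    tgt_pos = tmpl_pos = 0
    for tgt, tmpl in zip(aligned_target, aligned_template):
        if tgt != "-":
            tgt_pos += 1
        if tmpl != "-":
            tmpl_pos += 1
        if tgt != "-" and tmpl != "-":
            mapping[tgt_pos] = tmpl_pos
    return mapping


def mark_residues(pdb_lines, template_positions, flag=99.0, first_resnum=1):
    """Write `flag` into the temperature-factor field of every atom in
    the listed residues and 0.00 into all other ATOM records.

    ASSUMPTION: template position 1 is the residue numbered
    `first_resnum`, and numbering is consecutive.
    """
    marked = set(template_positions)
    out = []
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 66:
            position = int(line[22:26]) - first_resnum + 1
            value = flag if position in marked else 0.0
            line = line[:60] + "{:6.2f}".format(value) + line[66:]
        out.append(line)
    return out


if __name__ == "__main__":
    # Toy alignment, mutation list, and ATOM records: NOT real data.
    mapping = target_to_template("MK-LI", "MKALV")
    mutated_in_target = [3]
    # Mutations aligned to a gap have no place on the structure.
    on_template = [mapping[p] for p in mutated_in_target if p in mapping]
    toy_pdb = [
        "ATOM      1  CA  MET A   1      11.000  12.000  13.000  1.00 20.00",
        "ATOM      2  CA  LEU A   4      14.000  12.000  13.000  1.00 20.00",
    ]
    for line in mark_residues(toy_pdb, on_template):
        print(line)
```

A viewer that can color by temperature factor should then make the marked residues stand out. Overwriting the original temperature factors loses that information, so keep the unmodified PDB file as well.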

We'll do all this Tuesday.