NIGMS Protein Structure Initiative (PSI)
Protein Production Workshop
March 7 - 8, 2002

    The NIGMS (National Institute of General Medical Sciences, NIH) Protein Structure Initiative is part of an international effort in structural proteomics.  The aim of this effort is to provide one or more representative structures for each of the several thousand protein domain families in nature.  Driven by the rapidly expanding database of genomic sequences, the ultimate goals of this structural proteomics research are to understand the evolutionary implications of protein structure, and the biochemical and biophysical functions that three-dimensional structural information can elucidate.  These protein structures, and the corresponding reagents and technologies generated in the course of the International Structural Proteomics Initiative, will provide a scientific and technological infrastructure with a broad and important impact on biomedical and biomolecular research well into the 21st century.

    It is generally recognized that a major challenge to high-throughput (HTP) protein structure analysis is the production of protein samples suitable for X-ray diffraction and/or solution NMR analysis.  Some experts have suggested that this is the only real barrier to high-throughput protein structure production.  In any case, the nascent protein structure initiative is already making important advances in the technology of protein production for structural biology.  The resulting technology will have broad value to the molecular bioscience community, and is certain to be one of the clear and valuable successes of the initiative.

    Protein production for structural proteomics includes aspects of target protein selection, cloning and recombinant expression, high-density fermentation, labeling with selenium (for X-ray phasing) and/or the stable isotopes required for NMR studies, purification, analytical characterization, and the process of organizing these results and reagents into databases and reagent libraries.  This NIGMS Protein Production Workshop brought together representatives from nine P50 Center Grants and two P01 Program Project Grants that have been funded by NIH as pilot projects in structural proteomics, together with representatives from some of the international efforts, to discuss high-throughput methods for cloning, production, and purification of proteins suitable for X-ray crystallography and NMR studies.  The primary goal of this workshop was to nurture the development of this technology by sharing ideas, problems, and progress among the experts representing these several projects.  In particular, the meeting was organized to encourage the establishment of contacts and collaborations among the participating groups.  Although critical to the overall process, the issues of target selection, crystallization, structural data collection, structure analysis, and database integration were not included in this workshop.

The following sections provide brief summaries of the talks presented.

J. Norvell, NIGMS Program Director:  Opening Remarks.
The Protein Structure Initiative (PSI) was launched in 1999 to develop a national support program for the field of structural genomics.  Following recommendations from several workshops, NIGMS announced this program and subsequently funded nine research centers as pilot projects.  Dr. Norvell opened the workshop with a summary of the overall goals of the PSI.  It is a cooperative, large-scale effort for high-throughput determination of unique, non-redundant protein structures that will lead to an inventory of protein structures and complete coverage of protein structure space.  The targets of the PSI include proteins of both known and unknown function.  Dr. Norvell pointed out that one of the important goals of this initiative is the creation of a large research network and public resource.  He emphasized the importance of genome-directed target selection, target registration, and timely deposition of coordinates and data release, and discussed various options for rapid electronic publication.  He also introduced several issues related directly to the Protein Production Workshop, such as new technology, new high-throughput (HTP) methods, and the sharing of protocols, materials and samples.

Session I.  Invited Speakers 

D. Waugh, NCI, Frederick, MD:  Gateway production vectors and maltose-binding protein fusions.  Dr. Waugh described the development of new Gateway production vectors based on maltose-binding protein (MBP) fusions.  MBP fusion appears to be superior to the other fusion systems tested (GST, TRX), because it provides efficient translation initiation, better stability and protection against proteolysis, increased solubility of the target protein, and an affinity tag to aid purification.  Interestingly, after cleavage of the target protein from the MBP fusion, some proteins are recovered in an inactive form.  As a result, the "nativeness" of the protein structure needs to be evaluated (e.g., by CD or NMR) after expression.  After testing MBPs from several sources, the MBP from P. furiosus was observed to be most effective in solubilizing "passenger" proteins.  It was pointed out that expression of fusion proteins at lower temperature (25 °C) enhanced solubility.  Two of the best constructs evaluated to date consisted of MBP fusion proteins with a TEV cleavage site and a hexaHis (His6) affinity tag next to the protein target at either the N- or C-terminus (MBP-attB-TEVsite-His6-passenger and MBP-attB-TEVsite-passenger-His6).  These vectors, when combined with Gateway cloning technology, provide a powerful system for high-level and high-throughput production of proteins in E. coli.
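The construct layouts above can be made concrete with a short Python sketch that locates a TEV protease site in a fusion-protein sequence and splits the sequence into the carrier and the released product. The sketch assumes the canonical TEV recognition sequence ENLYFQ followed by G or S, with cleavage between Q and that residue. The function name `tev_cleave` and all sequences are hypothetical placeholders: they are heavily truncated and are not the actual vector sequences. The attB linker is omitted for brevity.

```python
import re

# Canonical TEV protease recognition site: E-N-L-Y-F-Q followed by G or S (P1').
# The lookahead keeps the P1' residue out of the match, so match.end() is the cut.
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")

def tev_cleave(fusion: str) -> tuple[str, str]:
    """Split a fusion protein at its first TEV site.

    Returns (carrier_fragment, released_fragment); raises ValueError
    if no ENLYFQ[G/S] site is present.
    """
    match = TEV_SITE.search(fusion)
    if match is None:
        raise ValueError("no TEV site (ENLYFQ[G/S]) found")
    cut = match.end()  # position just after the Q residue
    return fusion[:cut], fusion[cut:]

# Hypothetical, truncated sequences for illustration only.
mbp = "MKIEEGKLVIWING"          # placeholder for the MBP moiety
tev = "ENLYFQG"
his6 = "HHHHHH"
passenger = "MSTNPKPQRKTKRN"    # placeholder passenger protein

# Layout analogous to MBP-TEVsite-His6-passenger (attB linker omitted).
fusion = mbp + tev + his6 + passenger
carrier, released = tev_cleave(fusion)
print(released)  # GHHHHHHMSTNPKPQRKTKRN
```

In the layout sketched here, the His6 tag lies downstream of the cut. The released product therefore keeps both the P1' glycine and the tag.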

D. Stuart, Oxford University, Oxford, UK:  Large scale protein production for structural biology.  Dr. Stuart heads one of the largest UK structural genomics efforts and presented plans for a newly constructed protein production facility.  This effort is part of a larger consortium that includes synchrotron beamlines.  In addition to X-ray crystallography and NMR, cryo-EM will also be used for structural studies.  Cloning will use microarray and SAGE (Serial Analysis of Gene Expression) technologies, as well as GFP (green fluorescent protein) fusions to track proteins.  Expression vectors with two affinity tags will be used to facilitate protein purification, including a three-residue tag carrying an epitope for a monoclonal antibody.  Because of the diverse set of targets, the facility plans to express proteins in bacterial, insect and mammalian cells.  The facility's goal is to process 1000 clones/year.  The resulting proteins will serve as a general resource for many projects.  All processes, from bioinformatics to experimentation and data analysis, will be integrated using a LIMS (NAUTILUS) and an ORACLE database.  Targets include proteins from herpes viruses, proteins involved in immune cell function, Zn fingers and transcription factors, protein-DNA complexes, and human proteins involved in cancer.

S. Yokoyama, RIKEN Institute, Yokohama, Japan: Protein production and isotope enrichment in cell-free systems.  Dr. Yokoyama described how cell-free protein synthesis offers many important advantages, including easy manipulation and small reaction volumes.  PCR-amplified linear DNA can be used as a template (nonspecific DNA is added to prevent degradation of PCR templates), gene transcription can be coupled with protein translation in E. coli extracts, and the process is easily automated.  It is also possible to express integral membrane proteins in the presence of detergents; however, the choice of detergent is critical.  Dr. Y. Endo at Ehime University has developed a wheat-germ cell-free protein expression system, from which translation inhibitors have been eliminated, that is suitable for eukaryotic proteins.  A number of small proteins from A. thaliana were expressed in this system with a good success rate.  A new workshop is planned in Japan to disseminate this information to all interested researchers.  The E. coli cell-free expression system was used to produce protein domains predicted by homology-based methods, as well as to label proteins with Cl-Tyr using suppressor tRNAs.  In one example, it was possible to go from protein production using Se-Met labeling in a cell-free expression system to structure determination in two weeks.  His tag removal, as well as small adjustments in the predicted locations of domain boundaries, significantly improved NMR spectra.  Scientists at RIKEN are also developing a mammalian expression system for mammalian proteins.

C. Arrowsmith, Ontario Cancer Institute, Toronto, Canada:  Protein production and purification.  This institute is associated with two pilot centers in the U.S. (NESG and MCSG).  Data were presented on large-scale protein production and purification.  This group has cloned over 2,000 genes and expressed and purified over 1,200 proteins from five organisms for structure determination.  Several standard protocols have been developed to improve the success rate and efficiency.  These include vectors with a hexaHis tag cleavable with thrombin or TEV protease, and use of the BL21-Gold I (DE3) strain and 2xM9 media supplemented with ZnCl2, thiamine, and biotin for protein expression.  Up to 36 cultures are grown in parallel at 37 °C, with protein induction at 15 °C.  Purification uses Ni-NTA resin in batch mode, allowing purification of up to 18 proteins at a time.  This rather "low tech" approach is highly successful.  Over 30 crystals and 14 NMR structures have been obtained thus far.  Dr. Arrowsmith pointed out that genomes of multiple organisms must be used for target selection in order to cover protein fold space.  Results were presented suggesting that it is valuable to use both NMR and X-ray crystallography in these efforts; for example, out of 32 proteins with poor HSQC spectra, 7 crystallized.  Conversely, several proteins that could not be crystallized were suitable for structure determination by NMR methods.  It was pointed out that mining these large data sets may reveal important trends and facilitate better target selection.

D. Freemont, Washington University, St. Louis, MO:  Protein production in baculovirus systems.  Dr. Freemont (MCSG member) described some of the advantages of the baculovirus system for production of eukaryotic proteins.  Eukaryotic proteins are properly folded, disulfide bridges are correctly formed, and prolines undergo correct isomerization.  Many posttranslational modifications are also handled properly, including certain proteolytic processing and glycosylation.  On the other hand, protein expression in baculovirus is not really a high-throughput approach: viral stocks for expression need to be maintained, protein expression levels are low and often require elaborate optimization, and baculovirus expression is quite expensive.  Because the system is based on protein secretion, extracellular proteins are obvious targets.  Several protein expression strategies have been developed, including addition of various fusion tags, expression of proteins with signal sequences, and expression of heterodimeric proteins from a single vector using dual promoters.  Good progress has also been made in labeling secreted proteins with Se-Met for MAD experiments, with over 98% incorporation.  Despite this progress and the unquestionable value of these baculovirus production methods, Freemont recommended that efforts first be made to express eukaryotic proteins in E. coli.  Baculovirus vectors should be considered only if this approach does not produce satisfactory results.

Session II.  Presentations from NIH Structural Genomics Centers

     The first set of presentations focused on protein sample requirements for high-throughput crystallography and NMR analysis.  A. Joachimiak (MCSG and Argonne National Laboratory) described protein sample requirements for X-ray crystallography, particularly when using anomalous diffraction of selenium-labeled proteins.  For this approach, proteins should incorporate at least 1 ordered Se atom per 100 - 150 residues.  Crystallization generally requires folded, homogeneous protein samples at concentrations of 5 - 25 mg/ml.  Constructs should minimize flexible polypeptide segments and affinity tags where possible, as such flexibility can frustrate crystallization efforts.  Although absolute protein purity is not a requirement for crystallization, it is critical to choose methods for concentrating and storing protein samples that minimize aggregation.  T. Szyperski (SUNY, Buffalo) described protein sample preparation requirements for NMR studies.  In the context of structural proteomics, such studies are limited to proteins with molecular weights less than about 25 kDa.  Samples are generally prepared with uniform isotope enrichment with 15N, 13C, and (sometimes) 2H, which places special requirements on expression and fermentation systems.  Protein samples must be generated at concentrations of 5 - 25 mg/ml and at pH < 7.5 (ideally pH 6.5).  They must also remain stable over several weeks with respect to chemical degradation, slow precipitation, and aggregation.  The NMR samples must be highly (> 97%) homogeneous.
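The numeric guidelines above lend themselves to a quick screen at the sequence level. The Python sketch below makes several assumptions:

- Every Met is replaced by SeMet and is ordered in the crystal. In practice, the N-terminal Met is often disordered or removed.
- Molecular weight is approximated as 110 Da per residue.
- The function names and the 150-residue cutoff (the lenient end of the quoted 100 - 150 range) are illustrative choices, not a published tool.

```python
from dataclasses import dataclass
from typing import Optional

# Rough average residue mass; a real molecular weight depends on composition.
AVG_RESIDUE_MASS_DA = 110.0

@dataclass
class SequenceCheck:
    length: int
    met_count: int
    residues_per_met: Optional[float]  # None if the sequence has no Met
    est_mw_kda: float
    se_phasing_ok: bool  # at most ~150 residues per (assumed ordered) SeMet
    nmr_size_ok: bool    # below the ~25 kDa practical limit quoted for NMR

def check_sequence(seq: str, max_residues_per_se: float = 150.0,
                   nmr_limit_kda: float = 25.0) -> SequenceCheck:
    """Screen a construct against the SeMet-density and NMR size guidelines."""
    seq = seq.upper()
    n = len(seq)
    mets = seq.count("M")
    per_met = n / mets if mets else None
    mw_kda = n * AVG_RESIDUE_MASS_DA / 1000.0
    return SequenceCheck(
        length=n,
        met_count=mets,
        residues_per_met=per_met,
        est_mw_kda=mw_kda,
        se_phasing_ok=per_met is not None and per_met <= max_residues_per_se,
        nmr_size_ok=mw_kda < nmr_limit_kda,
    )

def nmr_conditions_ok(conc_mg_ml: float, ph: float, homogeneity_pct: float) -> bool:
    """Sample conditions quoted for NMR: 5-25 mg/ml, pH < 7.5, > 97% homogeneous."""
    return 5.0 <= conc_mg_ml <= 25.0 and ph < 7.5 and homogeneity_pct > 97.0

# Hypothetical 200-residue construct with two Met residues.
construct = "M" + "A" * 99 + "M" + "A" * 99
report = check_sequence(construct)
print(report.residues_per_met, report.est_mw_kda)  # 100.0 22.0
print(nmr_conditions_ok(10.0, 6.5, 98.0))           # True
```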

Berkeley Structural Genomics Center (R. Kim, D. Busso, J. Jancarik).  BSGC focuses its efforts on proteins from "minimal organisms," including Mycoplasma genitalium (479 ORFs) and Mycoplasma pneumoniae (677 ORFs), as well as homologues of these proteins.  The group has explored several approaches to vector construction, including Gateway and TOPO cloning.  With these systems, the group reports a high rate of PCR mutations, particularly within the primer regions.  To screen a large number of expression vectors, the group explored different approaches to extracting soluble protein from E. coli cells.  Significant numbers of false positives were encountered when using chemical lysis reagents (e.g. B-PER II, Pierce).  A Misonix 96-well sonicator was more effective for evaluating protein solubility.  Better results were generally obtained using no salt in these solubilization studies.  Efforts were also made to compare in vivo and in vitro (cell-free) expression and solubility screening.  Screening is much faster with the cell-free system and dot blots.  It was confirmed that MBP fusion improves the solubility of passenger proteins, and that cleavage of the fusion could be improved by adding six glycines between the TEV protease site and the target protein.  GFP was found to be too sensitive a solubility reporter to correlate accurately with activity/solubility.  Finally, it was noted that initial light scattering results from crystallization screening can give critical clues as to which additives to use when attempting to make a protein sample more monodisperse.

Center for Eukaryotic Structural Genomics (B. Fox).  This project's focus is on proteins from Arabidopsis.  A major challenge in eukaryotic structural proteomics is access to suitable cDNA reagents for cloning.  Some 51 cDNAs have been provided by collaborators.  However, these targets reflect the collaborators' own interests rather than bioinformatic target selection.  For this reason, the group has undertaken the production of cDNAs by reverse-transcription PCR (RT-PCR), using mRNA from undifferentiated T87 cells.  Based on GeneChip analysis, up to 80% of Arabidopsis genes are expressed in these T87 lines.  Over 50% of some 705 targeted genes have been cloned and amplified to date.  Commercial polymerases differed markedly in their ability to amplify in this system.  The best results were obtained with ExTaq and Yieldbase (1 - 2 errors / kbase).  The current effort uses restriction endonuclease / ligase vector construction with NdeI and BamHI sites, which are compatible with a large number of Arabidopsis targets.  The group also described experience with "fed-batch labeling" high-density fermentation and with cell-free expression using wheat-germ-derived media.  15N enrichment was demonstrated using the wheat germ cell-free system.  During the first 5 months of the project, 441 cDNAs were generated using RT-PCR and 190 expression plasmids were produced.
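The quoted polymerase error rate translates directly into the number of clones that must be sequenced. This sketch assumes, as a simplification, that errors are independent and Poisson-distributed along the insert. The fraction of error-free clones for an insert of L kb at r errors/kb is then exp(-rL). The number of clones n needed to find at least one correct clone with confidence c satisfies 1 - (1 - p)^n >= c, where p is the error-free fraction. The Python sketch below, with hypothetical function names, applies this model to a 1 kb insert at the rates quoted above.

```python
import math

def fraction_error_free(insert_kb: float, errors_per_kb: float) -> float:
    """Expected fraction of clones with no PCR-introduced errors.

    Assumes errors are independent and Poisson-distributed along the insert,
    so P(0 errors) = exp(-rate * length).
    """
    return math.exp(-errors_per_kb * insert_kb)

def clones_to_sequence(insert_kb: float, errors_per_kb: float,
                       confidence: float = 0.95) -> int:
    """Clones to pick so that, with the given confidence, at least one is error-free."""
    p = fraction_error_free(insert_kb, errors_per_kb)
    if p >= 1.0:
        return 1  # error-free amplification: one clone suffices
    if p <= 0.0:
        raise ValueError("error-free fraction underflowed to zero")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

# A 1 kb insert at the 1-2 errors/kb quoted in the text.
for rate in (1.0, 2.0):
    print(rate, round(fraction_error_free(1.0, rate), 3), clones_to_sequence(1.0, rate))
# 1.0 0.368 7
# 2.0 0.135 21
```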

Joint Center for Structural Genomics (S. Lesley).  Progress in high-throughput expression, purification, and crystallization of Thermotoga maritima proteins (1877 genes) was described.  The technologies being developed have a throughput of 10,000 clones/year, using processes such as:

(1) the GNFermentor system for parallel 40 ml fermentations of many samples;
(2) optimization of fermentation parameters;
(3) optimization of automation for cell lysis;
(4) robotic purification of proteins with hexaHis tags; and
(5) secondary purification by FPLC where needed.

Robotic cell lysis combines lysozyme treatment, freeze/thaw, polymyxin, and sonication.  Progress was described with the arabinose promoter, which exhibits much better control of basal protein production.  This provides better synchronization of protein induction, which is important for parallel fermentation processes.  Pichia and baculovirus protein production systems have been set up, though it was pointed out that these require large investments of time and resources.  The group also described the use of gene expression analysis in E. coli to monitor cell conditions characteristic of overproduction of misfolded proteins.  GeneChip expression profiles look different for cells producing folded and misfolded proteins.  Overall, the group produces 96 - 192 proteins/week.  This is followed by robotic crystallization screens using 50 nanoliters of protein per drop.  Many of the 172 crystals that were sent to the SSRL synchrotron showed good diffraction.  Nearly all of these proteins had hexaHis affinity tags.

Midwest Center for Structural Genomics (A. Savchenko, F. Collart, I. Dementieva, P. Laible).  The protein production pipeline was discussed, including different approaches to high-throughput gene cloning, protein expression and large-scale production.  Standard protocols have been established and are being implemented at MCSG sites.  The center has cloned over 1,000 targets from Bacillus subtilis, Escherichia coli, Haemophilus influenzae, Methanobacterium thermoautotrophicum, and Thermotoga maritima.  The strategy employs a multiplex approach that includes parallel manual and automated strategies for the high-throughput generation of soluble expression clones.  Automation is being developed using a ligation-independent cloning (LIC) system in a 96-well format, together with assessment of protein production level and solubility.  Thus far, over 300 protein targets have been moved through the protein production pipeline.  Expressed proteins are purified in parallel using a semi-automated method implementing IMAC and gel filtration.  Purification tags are removed by cleavage with TEV protease prior to crystallization screens.  This step is important, since affinity tags were observed to be detrimental to crystal growth and diffraction quality.  Proteins produced using this approach are of high quality and yield good quality crystals.  In addition, the MCSG approach offers a potentially significant increase in efficiency and speed.  This soluble protein pipeline is integrated with a membrane protein expression group that uses a Rhodobacter-based expression system.  This organism has been engineered to coordinate synthesis of foreign membrane proteins with synthesis of new membrane.  The new membrane provides a matrix for incorporation of the newly synthesized target proteins.  The system is being optimized with a set of 150 membrane targets from multiple organisms.
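Several centers run cloning and screening in 96-well format: MCSG here, and the NESG, NYSGRC and TBSGC groups described later in this report. As a hypothetical illustration of the bookkeeping involved, the Python sketch below assigns target IDs to plate and well positions. The row-by-row fill order and the target naming are assumptions, since liquid handlers differ in their conventions.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]                 # A-H
COLS = range(1, 13)                        # 1-12
WELLS_PER_PLATE = len(ROWS) * len(COLS)    # 96

def well_name(index: int) -> str:
    """Well label for a 0-based position on a 96-well plate, filled row by row."""
    row, col = divmod(index, len(COLS))
    return f"{ROWS[row]}{col + 1}"

def plate_layout(target_ids: list[str]) -> dict[str, tuple[int, str]]:
    """Assign each target to (plate_number, well), starting at plate 1, well A1."""
    layout = {}
    for i, target in enumerate(target_ids):
        plate, pos = divmod(i, WELLS_PER_PLATE)
        layout[target] = (plate + 1, well_name(pos))
    return layout

# Hypothetical target IDs; 100 targets spill onto a second plate.
targets = [f"T{i:03d}" for i in range(100)]
layout = plate_layout(targets)
print(layout["T000"], layout["T095"], layout["T096"])  # (1, 'A1') (1, 'H12') (2, 'A1')
```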

Northeast Structural Genomics Consortium (G. Montelione, M. Inouye, L. Ma).  The group described progress in producing proteins targeted from eukaryotic protein domain families.  Some 2100 proteins have been targeted from eukaryotic or prokaryotic "reagent genomes."  Each of these is a representative from a large domain family ("Rost clusters") that includes at least one representative from the proteome of a multicellular organism.  Multiple members of each family are then targeted from one or several of the reagent genomes.  Cloning efforts focus on a modified version of the pET production system (a "multiplex vector" system) that provides a set of different constructs from a single PCR product.  The process has been implemented in a 96-well format using a Qiabot 8000 robot.  Between the Rutgers and Toronto nodes, approximately 800 of these targets have been cloned and screened for expression and solubility.  About 40% show good expression and solubility, and some 220 have been scaled up and purified in tens-of-milligram quantities for crystallization screening and NMR studies.  Shortcomings of the Gateway vector system were discussed.  Prof. M. Inouye also described progress with a novel "cold shock vector," using the E. coli cold shock promoter, to selectively express and isotope-enrich protein targets.  Progress was described in generating large numbers of cDNAs for eukaryotic targets by RT-PCR, and in stabilizing linear DNA templates for cell-free protein expression screening.

New York Structural Genomics Consortium (S. Burley).  The NYSGRC has recently merged its efforts with Structural GenomiX, Inc. (SGX), a structural genomics company.  Target selection is carried out by researchers at Rockefeller University, and protein production and crystallization will be done largely at SGX.  This relationship brings significant commercial resources to the effort.  Cloning efforts focus on ligation-independent topoisomerase cloning of affinity-tagged proteins, using vectors developed by Chris Lima's lab at Cornell.  High tag-cleavage efficiency is observed using a "polioviral protease" site instead of the TEV protease cleavage site.  The NYSGRC group has also developed a double-tagged production system combining Smt3 and His6 affinity tags for rapid purification.  These cloning technologies have been implemented in a 96-well format on a Qiabot 8000 robot.  The BL21(pLysIce) strain was described as providing more efficient cell lysis and solubilization.  By taking a multigenomic approach (multiple targets from each family), the group has obtained soluble proteins for ~80% of targets.  Mutagenesis and intragenic shuffling of GFP fusion proteins to improve solubility were explored with mixed success.  Dr. Burley pointed out that the development of a LIMS system is very resource intensive, and is perhaps best done by a commercial entity.  An integrated storage system for images generated in robotic screening has the capacity to archive some 1,000,000 images and to evaluate up to 100,000 crystallization trials per day.  A dedicated beamline, SGX-CAT, has been constructed at the Advanced Photon Source.

Session III.  Presentations from NIH Structural Genomics Centers 

Southeast Collaboratory for Structural Genomics (M. Adams, M. Luo and H. Dailey).  This center is involved in production of proteins from P. furiosus, C. elegans and human genes.  For P. furiosus, the group has cloned 1465 ORFs, expressed 242 proteins, purified 200 proteins, and obtained 24 crystals.  ICP-MS was used to determine the metal ion content of protein samples.  The Gateway system has been chosen for C. elegans genes, and an ELISA-based assay is being used to determine small-scale expression/solubility levels of target proteins.  For C. elegans, the group has cloned 1130 genes, expressed 369 proteins, purified 32 proteins, and crystallized 5 proteins.  The pTrcHis vector has been selected for cloning human proteins.

Structure 2 Function Pilot Project at CARB / TIGR (O. Herzberg).  Statistics were presented on the performance of various vectors and the influence of tags on crystallization efficiency and crystal quality.  Genes from Haemophilus influenzae were cloned in several different expression systems, including an intein system, T7 promoters, Gateway, and Directional Topo.  Native proteins expressed without a hexaHis tag were more likely to form diffraction-quality crystals (12 structures / 29 native proteins).  By comparison, proteins produced with hexaHis tags that were then removed by proteolysis yielded 8 structures / 41 proteins.  The overall success rate for obtaining a structure was almost twice as high with native proteins produced without hexaHis tags (41%) as with proteins from which the tags had been removed (25%).

Structural genomics of integral membrane proteins (R. Nakamoto).  This project focuses on technology development for structural studies of integral membrane proteins.  An M. tuberculosis (Mtb) membrane protein expression library is being constructed.  The group will use 2D X-ray crystallography and electron microscopy, solid and solution state NMR, and 3D X-ray crystallography.  Some 1160 potential membrane proteins of Mtb have been identified.  These membrane proteins have been divided into target groups based on the predicted number of transmembrane alpha-helices.  The Gateway system will be used for expression.  Because membrane protein expression should be slow, minimal media, lower copy number plasmids, and lower temperatures will be used.  One major bottleneck in this membrane protein sample production pipeline is the isolation of membrane fractions by high-speed ultracentrifugation.
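The report does not say how transmembrane helices were predicted. As a stand-in, the Python sketch below uses the classic Kyte-Doolittle hydropathy method (19-residue window, threshold 1.6) to count hydrophobic segments and bin proteins by that count. This method is far cruder than dedicated transmembrane predictors. The toy sequences and function names are hypothetical.

```python
from collections import defaultdict

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; unknown residues are scored as 0 (assumption)."""
    seq = seq.upper()
    return [
        sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

def count_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> int:
    """Count runs of consecutive windows whose mean hydropathy exceeds the threshold."""
    count, inside = 0, False
    for score in hydropathy_profile(seq, window):
        if score > threshold and not inside:
            count += 1
            inside = True
        elif score <= threshold:
            inside = False
    return count

def group_by_tm_count(proteins: dict[str, str]) -> dict[int, list[str]]:
    """Bin proteins by their predicted number of transmembrane helices."""
    groups = defaultdict(list)
    for name, seq in proteins.items():
        groups[count_tm_segments(seq)].append(name)
    return dict(groups)

# Toy sequences (not real Mtb proteins): hydrophilic loops flanking Leu stretches.
loop, helix = "K" * 20, "L" * 22
toy = {
    "soluble": "KDE" * 20,
    "one_tm": loop + helix + loop,
    "two_tm": loop + helix + loop + helix + loop,
}
print(group_by_tm_count(toy))  # {0: ['soluble'], 1: ['one_tm'], 2: ['two_tm']}
```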

Structural Genomics of Pathogenic Protozoa Consortium (M. Dumont, C. Mehlin). 
This consortium has two protein expression centers, one in Seattle, WA and a second in Rochester, NY.  Both groups will work on soluble protozoan proteins, and each will take on a set of uniquely challenging targets.  The Seattle group will produce proteins from P. falciparum, an organism with an extremely AT-rich genome and ill-defined gene start and stop points.  The Rochester node will focus on membrane protein production.  The Seattle protein expression group tentatively plans to use a directional Topo cloning approach.  The conventional Topo vectors (Invitrogen) place large, hydrophobic amino acids at the N-terminus of the expressed protein.  The Seattle group therefore outlined a custom vector that avoids this problem by inverting the direction of the insert in the vector.  The Rochester expression group will explore ligation-independent cloning technology because of its speed, low background, directionality, and low cost.  They also have extensive experience and interest in using Pichia pastoris as a host system for expressing membrane proteins.  This system allows rapid cloning and post-translational modification, is inexpensive, and has been used for expression of membrane proteins.  Both centers plan to use a hexaHis affinity tag system.
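An extreme AT bias like that of P. falciparum (widely reported at roughly 80% A+T) complicates primer design and amplification. A minimal Python screen for overall and local AT content might look like the following. The 40-base window and 0.85 cutoff are arbitrary assumptions, and the ORF fragment is invented for illustration.

```python
def at_content(seq: str) -> float:
    """Fraction of A+T among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return at / (at + gc)

def at_rich_windows(seq: str, window: int = 40, cutoff: float = 0.85) -> list[int]:
    """Start positions of windows whose A+T fraction is at or above `cutoff`."""
    hits = []
    for start in range(len(seq) - window + 1):
        try:
            if at_content(seq[start:start + window]) >= cutoff:
                hits.append(start)
        except ValueError:  # window contains only ambiguous bases; skip it
            continue
    return hits

# Hypothetical ORF fragment: a GC-balanced stretch followed by an AT-rich tract.
orf = "ATGGCTGCAGCCGGTCGA" + "ATAATTAAATATTTAATAAT" * 3
print(round(at_content(orf), 2))  # 0.85
```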

TB Structural Genomics Consortium (M. Park, C. Kim).  A large-scale Mycobacterium tuberculosis protein production platform was presented.  It uses a 96-well format cloning strategy with simple modular liquid handling systems, such as the Hydra 96, a vacuum manifold system, and a plate sonicator for 96-well microtiter plates.  The group reported a GFP-superfolder vector for screening positive expression clones, as well as the use of "Terrific Broth" fermentation to obtain high bacterial cell density.  GFP was also used as a reporter for cell disruption.  The group also described a perfusion chromatography platform for rapid screening of ion exchange chromatography conditions, and the development of a low- to medium-cost parallel gel filtration chromatography system.

NIGMS Protein Structure Initiative Materials Repository (C. Lewis).  A plan was presented to develop an NIGMS Materials Repository that will store and distribute materials generated by PSI Centers.  A tentative schedule has been proposed to award a contract by August 2003.  The NIGMS Human Genetic Cell Repository was presented as a model, and advice was requested on how to set up the contract.  NIGMS seeks input on the following areas:

- defining scope and need;
- the materials to be stored;
- quality control;
- an efficient data tracking and storage system; and
- a list of potential contractors.

Session IV.  General Discussion

An informal discussion took place in which the following were proposed:

1. A web-based bulletin board or chat room devoted to protein production, so that scientists can share information.

2. A full issue in the Journal of Structural and Functional Genomics devoted to collected papers from this meeting will be recommended to the editors of the journal.

3. Participants liked the format of this meeting and would like to see it repeated next year.  There was discussion as to whether biotechnology vendors should be included.

The following points were raised during the open discussion and represent the comments of individual scientists:
1. Statistics (as opposed to anecdotal statements):  A large data set is being generated and it should be mined to extract significant trends or procedures.  A controlled vocabulary for this purpose needs to be developed.

2. Cloning:  The Gateway system leaves several extra amino acids on a protein.  The Intein system has not produced good results so far.  Baculovirus expression can be good but the lines are not stable.  The Drosophila system does produce stable lines but the yields are low.

3. cDNA libraries:  These libraries have been found upon propagation to undergo significant degradation.  The original cDNA library was found to be the most reliable.  It was suggested that NIH could facilitate these efforts by assisting researchers in accessing human tissues for generation of cDNA libraries and cDNA clones.

4. Protein solubility:  Growing cells at 15 °C, 18 °C, or 30 °C slows down protein production and thus often increases the solubility of the proteins.  The T7 promoter is very strong, and rapid production of protein at higher incubation temperatures may lead to protein aggregation.  For insoluble proteins, some success has been reported with a refolding procedure.  The protein is first completely denatured in 6 M urea, 20 mM DTT.  It is then dialyzed in the presence of oxidized / reduced glutathione, arginine, and NaCl.  Where possible, it is very important to determine whether a target protein requires cofactors or metals.  Trace metals can be added to the media, and this may enhance the solubility of the protein.

5. DNA sequencing: Different error rates have been reported by the different centers.  One center sequences the gene before sending purified protein to another facility for crystallization, and the protein is also sent for mass spec analysis.  Another center sequences only the genes of proteins that have been crystallized.

6. High density fermentation:  Fermentors are of great benefit.  It was found that lactose can be used as an inducer when using BL21 cells (glucose suppresses this induction).

7. Storage of proteins:  Store protein in two separate freezers, either with or without glycerol.  When a protein is shipped to another site, it was suggested that it be shipped bound to the IMAC resin.

8. Recordkeeping:  It would be advantageous to have a controlled vocabulary and a common LIMS across the centers.
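Points 1 and 8 both call for a controlled vocabulary. The minimal Python sketch below shows one thing that could mean in practice. Pipeline stages are drawn from a fixed enumeration, so free-text variants such as misspellings are rejected. A funnel count is then computed that could be pooled across centers. The stage names, record fields and center labels are hypothetical, not an agreed PSI standard.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class Stage(Enum):
    """Hypothetical shared vocabulary for the furthest pipeline stage reached."""
    CLONED = "cloned"
    EXPRESSED = "expressed"
    SOLUBLE = "soluble"
    PURIFIED = "purified"
    CRYSTALLIZED = "crystallized"
    STRUCTURE = "structure"

@dataclass(frozen=True)
class TargetRecord:
    target_id: str
    center: str
    stage: Stage

def parse_stage(label: str) -> Stage:
    """Map a free-text label onto the vocabulary; unknown terms raise ValueError."""
    return Stage(label.strip().lower())

def reached_counts(records: list[TargetRecord]) -> dict[str, int]:
    """Number of targets that reached at least each stage (a simple funnel)."""
    order = list(Stage)  # definition order doubles as pipeline order
    counts = Counter()
    for rec in records:
        for stage in order[: order.index(rec.stage) + 1]:
            counts[stage] += 1
    return {stage.value: counts[stage] for stage in order}

records = [
    TargetRecord("T001", "center_a", parse_stage("Purified")),
    TargetRecord("T002", "center_a", parse_stage("cloned")),
    TargetRecord("T003", "center_b", parse_stage("structure")),
]
print(reached_counts(records))
# {'cloned': 3, 'expressed': 2, 'soluble': 2, 'purified': 2, 'crystallized': 1, 'structure': 1}
```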

Final Comments: This NIGMS Protein Production Workshop was one of the first Protein Structure Initiative meetings to address a very specific subject identified as a significant bottleneck in the implementation of key program goals.  Structure determination of proteins targeted in structural genomics can clearly succeed only with a continuous supply of high-purity protein samples in milligram quantities.  Requirements for X-ray crystallography and NMR are somewhat different, but both methods require high-quality protein samples.  Moreover, this is a high-impact field, since protein production goes far beyond the goals of structural genomics and addresses the needs of functional proteomics, biology and biotechnology.

    This workshop accomplished the goals that were put forward prior to the meeting.  In particular, the workshop:

(1) focused on issues associated with protein expression and production;
(2) highlighted the state of the art of large-scale high-throughput protein production;
(3) examined the different approaches taken by pilot projects and reported their successes and failures;
(4) showed new directions that should be pursued for successful implementation of the pilot projects' goals;
(5) brought together researchers working on protein production at all pilot centers, potentially creating new personal interactions; and
(6) facilitated open and quite generous interactions between researchers, including data sharing and exchange of procedures.

All these are key factors in intensifying progress at the structural genomics pilot centers.  In fact, workshop participants emphasized that this format provided an excellent platform for openness and exchange of information.

    Although most participants agreed that this was an excellent choice of format for the first Protein Production Workshop, a number of contributors suggested increasing the size of future meetings.  The main reason is that public interest far exceeds the capacity of the current workshop format.  This expansion should therefore be seriously considered for future workshops.  Contributions from relevant industrial technology platforms should also be considered.

    One of the key ideas behind establishing several pilot centers was to undertake and test different approaches to structure determination.  The structural genomics centers are following quite diverse approaches to gene cloning, protein expression and production, and some of these approaches appear more successful than others.  It is therefore important for the centers to present data and exchange ideas on a regular basis, to discuss future directions, and to avoid overlaps.  Participants of the Protein Production Workshop strongly suggested that the next Protein Production Workshop be scheduled in early spring of 2003 and that its format be expanded.