1st EMSL Workshop on Structural Genomics
June 25 - 26, 1999
Richland, WA





June 25th
 
 

Teresa Fryberger (PNL)
 
 
- Overview of Pacific Northwest Labs. (PNL)

- Time available on instruments at no cost (in general) to scientific community
 
 

website: www.emsl.pnl.gov

Paul D. Ellis

NO NOTES

Charles Edmonds (DOE)
 
 

Goals:
- Specifying the relationships of sequence, structure, and function

- Understanding the complexity of living systems

Genomes -> Gene Products -> Structure/Function -> Pathways/Physiology -> Population & Evolution -> Ecosystems
 
 

Showed flowchart of structural genomics process and called it a "historical document"
 
 

Structural biology at DOE is "zero sum" - wherefore structural genomics?
 
 

- Future of Structural Genomics at DOE would be a component of a broader program in structural biology. Message - time for lab directors to make direct appeals to DOE to generate support for an expanded program in structural biology. Failing such a move, funding for structural biology by DOE would be the same in the future as it is now.

- The principal sponsor at DOE would be willing to support "computational aspects" of structural genomics. Proposal deadline of Oct. 9 - should contact appropriate DOE officials.

John Norvell (NIH)
 

SEE HANDOUT

- NIH encouraged at early stage by letter from John Moult and another NIST official

- NIH supports both crystallography and NMR
 
 

Crystallography

primary: 225 grants $51M total costs

secondary: 400 grants $96M total costs

NIGMS ~50%

NMR

500 grants: $85M total costs

NIGMS is 60% of total

Other funding for infrastructure, NCRR, etc.:

~$30M crystallography

~$30M NMR

Summarized 3 NIH Workshops: reports of all three are on the NIH website

April 1998  NIH Workshop:
Conclusions - project is feasible and worthwhile

November 1998 - contradiction of discussions at

- Avalon meeting on organizing national meeting

- Advice to aim pilot projects at a larger scale - research centers - which NIGMS has not done much in the past

February 1999

- Focused on target selection, progress on pilot projects

- Discussion of collaboration among groups; developed WEB sites

Research Center Grants: P50
- pilots for subsequent integrated, large scale research networks

- test strategies for high throughput

- must contain all constituent tasks of structural genomics - cost saving, efficiency, methodology development

Constituent components

- Family classification and targeted selections

- Generation of samples,

- Sample preparation; encourage innovative projects for membrane proteins

- Structure determination

Access to state-of-the-art synchrotron and/or NMR facilities

- Analysis / dissemination of results

- deposition of coordinates

- target selection

- methodology, high throughput efficiency, cost analysis

- business-like approach to this

- management / administration are crucial

- Special requirements

- Intellectual property available to the public

- Annual meetings / website

- Sharing materials, sharing protein samples

- External SAB

- Research training may not be appropriate and must be justified

- Integrated, coordinated, interdependent subprojects

- Collaboration/ Consortia encouraged

- Foreign connections allowed

Review and Award

letter October 15, 1999; Applications: Feb 15, 2000

Reviews: April - June 2000

Awards September 2000

NIGMS will consider 3 - 6 centers

Ends discussion of RFA for Research Centers. A one-time offering. May be repeated, but not planned to be repeated.
 
 
 
 

Program Projects (PA-99-116)

- Technology and Methodology Development - support for one or more of these tasks. Can focus on one task (e.g. target selection; crystallization)

- NIGMS, NCRR, NIEHS

- R01's and P01's

Aimed at methodology and technology underpinning structural genomics.

SBIR Grants

- Constituent tasks of structural genomics

- Partnerships between small business and centers

- Connections with small business greatly encouraged

See website info with these notes

Abraham Levy (NIH - Research Resources)
 
 

RR supports infrastructure for NIH-wide research efforts
 
 

Biomedical Technology Program - supports research to discover, create, develop and disseminate innovative technologies for a broad spectrum of research activities.
 
 

P41 - Resource Center Grant

S10 - Shared Instrumentation Grant (SIG)

R01 - Research Project Grant

R21 - Innovative (High Risk) Research Grant - like Phase I SBIR - does not need preliminary data - exploratory grant * good for RNA grant ~$75K/year for 2 years

R43 - SBIR - set-aside mandated by Congress

R41 - STTR
 
 

FY99

SIG = $38 M

Centers = $67 M (Mandated by congress: 2.5% SBIR, 0.3% STTR)

R21 & R01 = $16 M

TOTAL = $121 M

Biomedical Technology RR Centers

~63 around country
- structural / functional biology

- drug design

- computational / molecular

- imaging from molecule to organism

Five principal functions
- R & D

- Collaborative research

- Service

- Dissemination

Biomedical Technology Resources - RR Centers

Features:

- at cutting edge

- high risk / high payoff

- often one-of-a-kind, scarce, expensive

- accessible to research needs of a significant part of biomedical research community

- cost-saving, efficient, shared

- multidisciplinary and collaborative

- cannot be done on an R01 - if it can be done by an R01 it cannot be funded.

Difficult to do without a center-type environment.

New concept for centers - virtual laboratories / collaborators

"Laboratory working together and apart"
 
 

Stanford - complete crystallographic experiment over the web

SIG (shared instrument)

- shared by at least 3 PI's (typically ~15) For NMR ~10 users/instruments

- MOU with NSF for >$500K

- funding level is very high at the moment - in '92-'95 it was very low

- most $ for NMR and NMR imaging

R01/R21

- new technologies or instruments

- broad applications to biomedical community

- significant improvements

R21

- test ? technologies

SBIR/STTR

- not enough good applications to spend the mandated set asides

- limit ~$100K Phase I

~$750K Phase II

- two new announcements

- high-throughput synchrotron detectors - increase $$

- developing tools for high field NMR

- can you go straight to Phase II?

J. Markley pointed out that there is no funding mechanism for high field NMR. Plans are being made to provide such mechanisms. How much $$ for structure determination? The cost effectiveness must be addressed.
 
 

Bill Studier
 
 

I prefer to call this a broad "Human Proteome Project". Need genome scale approaches
- chip technologies

- structural genomics

Need informatics to manage and access data
 
 

Key goal - understand functions of human proteins
 
 

Determine ~10,000 well-chosen structures to produce a catalog of protein folds.
 
 

Parallels to Human Genome Project

- large scale cooperative effort

- model organisms

- production centers

- technology is approaching feasibility

- leadership roles for both DOE and NIH (though DOE funding is not allowing it to take as strong a role as it would like to)

Nice slide of each step in the process. Once crystals are available - can be very quick
 
 

Potentially one protein structure per day
 
 

Bottlenecks:

- target selection

- get from purified protein to crystals

- going from X-ray data to refined structure (~2mos)

Develop web-based proteome database

- harvest information from multiple databases

Have initiated pilot project - progress on web: proteome.bnl.gov
 
 

Target selection:

- Yeast proteins as initial targets
- eukaryotes, few introns

- many human homologues

- active research community

- Family / function information
- evolutionary distribution

- human homologue

- disease, cancer
- functional information
- signaling pathway, stress response, repair
- Likely to be soluble

Expressing full length proteins
 
 

Using ProDom database. Showed a bunch of ProDom outputs.
 
 

Developing db similar to our plan, #AA, #Met annotation,

- currently in an Excel format, 17% of this protein target has homology with something in PDB. Have other groups selected - T4-very similar to T project. Is there a human, C. elegans, bacterial, or archaebacterial homologue?
OTH - what other groups are doing
 
 

More on BNL website

Also on web - 116 targets that have been selected. For 18 original targets - some give poor solubility - one is subunit of complex, one transmembrane, bottleneck in going from microcrystals to structures - two structures have been solved. May do proteolysis and mass spec on those that did not provide good crystals. Developing 96-well processes to go from clone to solubility analysis. Now have 116 targets so you have more targets for developing 96-well format.
Then - described 1st 3D structure. Unknown function initially. Turned out to be a TIM barrel. This is an enzyme. Looks like another protein - Ala racemase. Initial biochemical analysis shows it does have racemase activity. Sali did not predict - second try Adam Godzik had predicted this structure as exercise for class - had found that this would bind pyridoxal phosphate. Godzik had submitted this prediction before the pyridoxal phosphate was observable in the crystal structure.

Andrzej Joachimiak (Argonne Nat'l Labs)
 
 

NMR and crystallography are very complementary - ~20% of structures determined by NMR; but you can go much faster if you have crystals.
 
 

Credit must go to DOE for building synchrotrons and funding beam lines.
 
 

Hardware/software must be integrated. Not yet fully done.
 
 

Advantages/Disadvantages of MAD

- no errors from non-isomorphism

- all measurements from single crystal

- rapid data collection

- signal is very weak - requires very accurate measurements

- requires specialized synchrotron beam

- crystal freezing protocol must be established

T. thermophilus Hsp60 Peptide-hmb domain

- ~120 AA; structure known in advance

- 23 min to collect all data

- 45 min total collection time

- data at 2.3 Å

- automatic tracing using wARP. Traced 93% of backbone, 23% of sidechain

DNA decamer w/ Br.
- structures not known in advance.

- refined at 1.2 Å

- did not use wARP, electron density analyzed in 8 hours after data collection

Showed list of proteins with full MAD data collection in minutes to a few hours.

Projected data collection rates:

10 -15 data sets / day; >1000 data sets / year

current records 31 data sets / day; 37 data sets / 3 days

Ultrafast MAD Data Collection is a Reality.
 
 

Magpie Project

- 141 targets

- collaboration with Arrowsmith group

- collaboration with Israel Womm - looks for unique fold

1000 residue Cyanase

-total time of experiment - 150 min.

- calc. phases at 2.4 Å

- very high quality electron density

- apply wARP. Largest trace solved using wARP. 12000 atoms. 90 refinement cycles. No sequence homology. N-terminal domain homology to DNA-binding proteins: C-terminal domain has new fold. - solved decamer - see oxalate, deduce mechanism.

Bottlenecks:
- sub-optimal data collections

- need automatic analysis of MAD phase data.

- wARP needs high resolution data

Todd Yeates (UCLA)
 
 

Pyrobaculum aerophilum consortium
- UCLA

- Los Alamos

- UC - Berkeley

P. aerophilum

- easier to grow than other hyperthermophiles

- complete sequence available (done by J. Miller at UCSD)

- ~50% of ORFs are unique to this genome

- ~10 crystals from project; ~3 completed structures

- very many steps - essentially every step is a bottleneck; a lot of work is needed on every step except data collection.

ycac_celeg f35G2.Z

ycac_ecoli 1yac.

Dali identifies 1nba as structural homologue.
Looked at substrates of homologues identified by comparative genomics. Made query motifs for analysis with DOCK. Have hypothesis on molecule that might bind; but no validated ligands or substrates.
 
 

Thermophiles do not like SeMet.

Comment by Wim Hol - is there a consensus of going for thermophiles (Jmart) or mesophiles (or eukaryotes) - which may be much tougher. Some feeling that thermophiles are "smarter"
 
 

~50% of attempts give expression.

Paul Bash (Genome-Directed Structural Biology)

 

Process of structural genomics is a big engineering project - not appropriate to a university effort.
 
 

Parse genome into families of proteins - homologues with similar structures.
 
 

Identify a "basis set" of structures.
 
 

10,000 families of 25 - 35% pair-wise similarity
 
 

Claims it was his idea to do "one structure from each sequence family"
 
 

Somehow this is really "evolutionary genomics".
 
 

Identify non-trivial structure-function relationships.
 
 

Talked about COGs idea - proteins universal to life
 
 

860 COGs
-> remove membrane proteins
741 COGs
-> BLAST against SCOP to remove known folds
552 COGs
-> PSI-BLAST to pull out more SCOP representation (~200 more folds)
328 COGs
-> Koonin filter - likely to be folds
206 COGs
-> removal of "uninteresting" COGs
155 COGs
-> ??
65 COGs final list
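
The filtering steps above can be tallied with a short script. This is only a bookkeeping sketch: the counts are the ones recorded in these notes, the step labels are paraphrased, and the names `FUNNEL` and `attrition` are made up for illustration.

```python
# COG counts remaining after each filter, as recorded in the notes.
# Labels are paraphrased; the "??" step was not captured by the note taker.
FUNNEL = [
    ("all COGs", 860),
    ("membrane proteins removed", 741),
    ("known folds removed (BLAST vs SCOP)", 552),
    ("more known folds removed (PSI-BLAST)", 328),
    ("Koonin filter", 206),
    ('"uninteresting" COGs removed', 155),
    ("?? (step not recorded)", 65),
]


def attrition(funnel):
    """Return (label, remaining, removed, fraction_kept) for each filter step."""
    rows = []
    for (_, before), (label, after) in zip(funnel, funnel[1:]):
        rows.append((label, after, before - after, after / before))
    return rows


if __name__ == "__main__":
    for label, remaining, removed, kept in attrition(FUNNEL):
        print(f"{label:40s} {remaining:4d} left, {removed:4d} removed ({kept:.0%} kept)")
    print(f"overall: {FUNNEL[-1][1] / FUNNEL[0][1]:.1%} of COGs reach the final list")
```

Keeping the counts as data rather than hard-coded differences makes it easy to update the list if the unrecorded step is ever filled in.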
Use whatever method works best -
NMR or X-ray

Clone - by - phone


Contacted 100 investigators
48 COGs - structure work in progress

35 constructs committed to Bank (mostly E. coli)

11 structures already determined

ATG - company to make expression constructs. Juli Kalihara

Use bacterial genes (E. coli) with a His tag
[193, 527, 152, 315, 166, 176, 430, 231, 710] COGs with known structure
 
 

X-ray Structures 424, 854, 858, 353, 557, 566, 009 - targets Bash is working on

566, 009, 023 (NMR Structure), 251 - other three are isotopically labeled
 
 

424-Mat-nice crystals

solved structures using MAD

dimer in XTAL, from light scattering appears to be dimer in solution

Said he went to P & G
 
 

Michael Kennedy
 
 

Follows Bash
 
 

COGs 009

023

251 (all have known function)

566

109-229 AA

Deinococcus radiodurans

mcrA-like protein - 119 AA ->with Eugene Koonin, Ken Minton

LEA14 - 160 AA

HRD domain - 76 AA

All expression constructs generated by ATG Laboratories
 
 

Determined structures of COG 023 from E. coli

111 AA with hexaHis tag

~120 mg/L expression soluble at ~3 - 4mM

Stable at 25° C, beautiful HSQC spectra

Appears to be new superfamily based on CATH or SCOP. Dali picks up other hits.
 
 

Function of this protein YciH is in translation initiation. Has one face of molecule that is positively charged. But could learn a lot about biophysical function even though the function was annotated in the COG database.
 
 

Homologous to eIF1 (human) - structure determined by Gerhard Wagner. - mutation data map the active site.
 
 

Painting conserved residues onto YciH allows identification of the same active site.
 
 

15N-1H NOE measurements provide information about flexibility and function.
 
 

Total time for data collection: 4.5 weeks
 
 

6 weeks from clone to 3D fold.
 
 

COG 009 - 176 AA- gives beautiful spectra;
 
 

XJGF - nice HSQC

MrcA - nice HSQC

Success seems to be very high because they are using full length proteins.

Rob Clubb (UCLA)
 
 

80 - 100 K sequences of human genes

~60,000 will be soluble (25% are membrane proteins)

~42,000 will not be homologous to any protein of known structure (~75%)

~21,000 will have domains <150 amino acids (~50%)

only ~ 10,000 may constitute unique structures

only ~500 unique NMR structures in PDB; ~10% from Wüthrich's lab
 
 

NMR can be used to very rapidly evaluate suitability for structure analysis

Primary bottlenecks - sample preparation; data analysis

Data collection - 45 days

Assignments - 21 days

Analysis of NOESY - 100 days

Final Refinement ~10 days

4 - 6 MONTHS

Suppose we have steady-state supply of sample.
 
 

~38 days of data collection required to get lots of data.

38 - 45 days of data collection
8 - 10 structures / year
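
The 8 - 10 structures / year figure follows from dividing a year by the data collection time per structure. A minimal sketch, assuming a single spectrometer that collects data every day of the year with a steady supply of samples (the no-downtime assumption and the function name are illustrative, not from the talk):

```python
def structures_per_year(days_of_data_collection, days_per_year=365):
    """Structures per spectrometer-year if data collection is the only limit."""
    return days_per_year / days_of_data_collection


if __name__ == "__main__":
    for days in (38, 45):
        print(f"{days} days per structure -> {structures_per_year(days):.1f} structures / year")
```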
 
 

Estimate $1 billion for 20,000 structures
 
 

$1 x 10^9 / 2 x 10^4 structures = $5 x 10^4 per structure
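
As a check on the arithmetic, dividing the $1 billion estimate by 20,000 structures gives about $50,000 per structure. The sketch below also converts the structure count into spectrometer-years, assuming the 8 - 10 structures / year rate quoted above; the function names are illustrative only.

```python
def cost_per_structure(total_dollars, n_structures):
    """Average cost per structure for a fixed total budget."""
    return total_dollars / n_structures


def spectrometer_years(n_structures, structures_per_year):
    """Spectrometer-years needed at a given per-instrument throughput."""
    return n_structures / structures_per_year


if __name__ == "__main__":
    print(f"${cost_per_structure(1e9, 2e4):,.0f} per structure")
    for rate in (8, 10):
        years = spectrometer_years(20_000, rate)
        print(f"{rate} structures / year -> {years:,.0f} spectrometer-years")
```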
 
 

No assumptions regarding speed up due to automation.
 
 

Manual analysis - required ~95 days to analyze NOESY spectra.
 
 

Can get good structures with less NOEs using N-H and C-H residual dipolar couplings.
 
 

Magnets may have lifetimes of 20 -30 years. Affects how you ? Could collect data on two samples simultaneously.

David Wemmer
 
 

Advantages
- work at higher temperatures

- ease of purification

- good stability

- several genomes available

Disadvantages:

- degree of homology

- codon usage for expression

- no guarantee of good properties

- possible conformational equilibria - is structure at low temperature biologically relevant

Chose samples which did not crystallize from Sung-Hou Kim's project: MJ307, MJ1469. Both grew on minimal media.
 
 

Using trp leader sequence to force poisonous proteins into inclusion bodies.

LW60 / LW20 = 0.41

LW60 / LW40 = 0.67

LW80 / LW40 = 0.47

Big improvement due to change in viscosity of water.
 
 

1D spectra of 20 kD protein look like 5 kD protein at 60° C.
 
 

At pH 6.3 amide resonances of backbone are not lost at 60°. Side chain NH2's do drop out.
 
 

Routine analysis of effective molecular weight:

- mass spec for chain mw

- size exclusion chromatography (monomer, dimer)

- ultracentrifugation (more precise, distribution)

- light scattering

Using CBCANH, CBCA(CO)NH, CCCONH-TOCSY,

GARANT 1st pass > 40% assignments

In ~1 week had 83% complete assignments

Using C[C]CONH-TOCSY to fill out sidechain assignments. Can define these experiments for GARANT; optimizes using genetic algorithm.
 
 

Key issue - how do you optimize conditions for NMR data collection, how can you systemize this. e.g. lowering conc. made sample behave much better.
 
 

For sample that was not stable - prepare matched samples and collect quick 3D TR experiments on each sample.
 
 

For one thermophile - see two conformers at low temp.
 
 

Thermophiles have very useful properties for a structural genomics effort.
 
 

One of the limiting issues in using these programs is that they are very poorly documented.
 
 

--Key problem for people to use AutoAssign - need more complete documentation. Wemmer's lab had a Wüthrich student who brought expertise with GARANT to lab.
 
 

Stratagene has just announced a slew of tRNA plasmids - these can be used to supplement tRNA's at low natural abundance in these thermophiles.
 
 

Used GARANT / Dyana combination for automated analysis of NOESY spectra.

Wim Hol (Protein Crystallography in Drug Design)
 
 

Need to solve all human proteins; and proteins in pathogens not represented in humans.
 
 

Need 10^6 to 10^7 structure determinations over the next 50 years. Complexes. Structural genomics of HIV not even done yet.
 
 

It is as important to do chemical genomics as structural genomics in parallel

e.g. adding tight binders can make crystallization easier

GM - should think about using CDs to screen ligand binding on affinity resins.
 
 

Plasmodium falciparum (malaria) -

- very difficult target for structural genomics

- 80% AT content

- 'warts' - apparently random insertions of hydrophilic peptide sequences

John Moult

NO NOTES

Shigeyuki Yokoyama

NO NOTES

June 26th
 
 

John Markley
For small biomolecules NMR is competitive with X-ray in terms of speed of structure determination
 
 

Data collection times (now 3 - 4 weeks) can be shortened (~3 days) by employing multiple spectrometers and/or cryoprobes.
 
 

Advantages over crystallography: disordered regions can be observed, multiple conformations can be studied
 
 

Brazzein - sweet tasting protein

Caldwell (1998) Nature Structural Biology 5: 427
 
 

Insect Cytokine Peptide - ~40 residues

4 - 6 weeks for 3D structure determination

Showed example of plastocyanin from cyanobacterium - structure determined by novice getting help from the lab.
 
 

Should be thinking about how to alter solvent viscosity to improve spectral quality.
 
 

Design principles

- use computers / robotics where possible

- trade efficiency of protein production against expensive labeling strategy

- minimize the number of labeled samples

- build in validation of each step - minimize wasted resources

- software should be modular, use commercial components where possible

- use standards for data representation

Use combination of crystallography and NMR where possible
 
 

Using H/D fractional occupancy to estimate strength of H-bond
 
 

Installed 720 MHz or higher - ask Markley for these numbers
 
 

STAR (Self-defining Text Archive and Retrieval) - good format for relational database

XML (eXtensible Markup Language) - should ask Hunter to look into this
 
 

Conversions of STAR to XML

StarDom, software package for this conversion. Linge, Nilges & Ehrlich
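
To make the STAR-to-XML idea concrete, here is a toy sketch of the conversion for the simplest case only: `data_` block headers and single-line `_tag value` pairs. It is not the StarDom implementation and does not follow its schema; loops, save frames and multi-line text fields are ignored, and the XML element names (`star`, `data`, `item`) and the example tags are invented for illustration.

```python
import shlex
import xml.etree.ElementTree as ET


def star_to_xml(star_text):
    """Convert simple STAR tag-value pairs into an ElementTree element.

    Handles only 'data_<name>' headers and single-line '_tag value' items.
    """
    root = ET.Element("star")
    block = root
    for raw in star_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("data_"):
            block = ET.SubElement(root, "data", name=line[len("data_"):])
        elif line.startswith("_"):
            parts = shlex.split(line)  # keeps quoted values together
            item = ET.SubElement(block, "item", name=parts[0].lstrip("_"))
            item.text = " ".join(parts[1:])
    return root


if __name__ == "__main__":
    example = "data_demo\n_Entry.Title 'Test protein'\n_Entry.Residue_count 111\n"
    print(ET.tostring(star_to_xml(example), encoding="unicode"))
```

A real converter would need a proper STAR tokenizer; the point here is only that STAR's tag-value structure maps naturally onto nested XML elements.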
 
 

SESAME - NMR project database
 
 

Also, scheduling database
 
 

Ouroborus Applet

catalogs expression vectors

database can be queried

Software for converting PDB restraint files (DIANA, XPLOR) into NMR-STAR format has been developed and will soon be available

Cheryl Arrowsmith
Collaboration with Aled Edwards
 
 

Structural Genomics - should be part of a holistic approach to functional genomics - share reagents to develop bigger picture
 
 

Microarray facility (Gene Chips) at Toronto

- have yeast array

- building a human array

Primary bottleneck - generation of a sample suitable for data collection
 
 

Goals of Pilot Project

- Develop high throughput (htp) cloning

- Feasibility of getting good spectra

- etc.

M. thermoautotrophicum 1855 genes MT

- genomic DNA available

- GC-rich. PCR is very reproducible

- no introns

- archaea - many eukaryotic features

Thermophile
- easy to purify

- thermostable

~400 genes cloned (60% big, 40% small)

- current cloning rate - 20 clones/wk/person

- pET15b (His6 - thrombin cleavage site)

- BL21(DE3) "Gold" cells (Stratagene)

- PCR screen for positive clones

375 genes tested for protein expression

- "magic" plasmid for rare tRNAs

- screen total lysate for expression

In general - removing His tag did not make much difference. They removed His tags for about 1/2 of samples
 
 

She hands her protein out to find person to work on structures of "unknown" proteins. All proteins being evaluated by CD and thermal melts to characterize thermal stability. 6X parallel purification of proteins for NMR screening

Use timer to start shaking at 5 AM
 
 

For proteins that aggregate, remove His-tag. But only helps in 1 of 10 samples.
 
 

Strategy for Small Proteins

cloned - 150

express - 120

tested for solubility (in soluble fraction of cell lysate) - 100

soluble - 82

attempted purification - 67 -> 44 NMR samples evaluated (66%)

Outcome of the 44 NMR samples:

- 15 good HSQC

- 4 promising

- 9 aggregated

- 4 unfolded

- 12 too dilute
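
The small-protein numbers above can be checked with a short script that computes the yield at each stage. Reading the five outcome counts (good HSQC, promising, aggregated, unfolded, too dilute) as a breakdown of the 44 evaluated NMR samples is an interpretation of the slide layout, supported by the fact that they sum to 44. Stage labels and names are illustrative.

```python
# Stage counts for the small-protein pipeline, as recorded in the notes.
STAGES = [
    ("cloned", 150),
    ("expressed", 120),
    ("tested for solubility", 100),
    ("soluble", 82),
    ("purification attempted", 67),
    ("NMR samples evaluated", 44),
]

# Assumed breakdown of the 44 evaluated samples (interpretation of the slide).
OUTCOMES = {
    "good HSQC": 15,
    "promising": 4,
    "aggregated": 9,
    "unfolded": 4,
    "too dilute": 12,
}


def step_yields(stages):
    """Return (label, fraction of the previous stage that reached this one)."""
    return [
        (label, count / previous)
        for (_, previous), (label, count) in zip(stages, stages[1:])
    ]


if __name__ == "__main__":
    for label, fraction in step_yields(STAGES):
        print(f"{label:25s} {fraction:.0%} of previous stage")
    good = OUTCOMES["good HSQC"]
    print(f"good HSQC: {good / STAGES[0][1]:.0%} of cloned targets")
```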

No gel filtration step.
 
 

NMR structures underway

MT1699

MT1048

MT0040 - L McIntosh

Strategy for Large Proteins

Soluble -> Crystal trials -> Crystal -> SeMet protein -> Structure

Insoluble -> Limited Proteolysis -> Stable Domains -> Reclone

Insoluble -> Protein - Protein Interaction

Progress with Large Proteins (> 20 kD)

cloned - 225

express - 150

tested for solubility - 100

soluble - 38

purified - 24

crystals - 12

Have not done a lot of informatics ahead of time - just check whether it is an integral membrane protein or is in PDB. First crystal structure - TIM barrel (Emil Pai lab). Can do Western to His6 tag.
 
 

not doing heat step - except in preparing samples for crystallization. For small proteins -- lose a few when heated
 
 

Sykes Group - Cheryl now works with Brian Sykes to develop interactive NMRView tools.

- NMRView interface for backbone assignments

- avoid blackbox concept, ultimate assignments left to user

- interactive assignment tool

Direction

- Automation, Robotics of cloning / expression

- International coordination

- Dedicated protein production facilities

- Improved methods for htp NMR / crystallography

- Dedicated NMR/beamline facilities

Note: can use Zn-affinity resins in place of Ni for Hex-His purification

Stanley Opella
 
 

30% of genes in M. genitalium are membrane proteins

Vast majority of membrane proteins done are 1 - 3 TM domains

Arkin et al. 1997 Proteins: Structure, Function, Genetics, 28: 465
 
 

8 small membrane proteins in the PDB. One by X-ray - 7 by NMR (25 - 122 AA chains). Five structures in micelles from Opella's lab.
 
 

Now that we can get good results of peaks using PISEMA - need to develop methods for determining resonance assignments - using 2D 15N-15N spin exchange to make these assignments. 3D 13C shift / 1H-15N coupling / 15N shift expand

can get i -> i + 1 dipolar couplings

Cai -> 15Ni+1 - working with low efficiency

merP - family (includes Menkes) bind Hg, Cu, Cd, Zn
 
 

Chemical shift changes upon Hg+2 binding localized to binding loop

Binding constants in mM ranges. Two Cys ligands. Also binds Ni, Ag.

Ann McDermott

 

Backbone assignments

Heteronuclear transfers
S. Strauss, T. Bremi, R. Ernst, 1998
C-N, N-C -> Cia - Ni

C-C-N -> Ci-1a - Ci-1 - Ni

N-C-C -> Ni - Ci-1 - Ci-1a; Ni - Cia - Ci

Sidechain assignments

C-C Melodrama

C-C RFDR


C-N - heteronuclear transfer

N-C-C - correlation

(key types of experiments that she is using)
 
 

13C - 13C correlation

at 400 MHz - poor transfer ~ 2 mg

300 MHz - weak signal ~ 1 mg

800 MHz - spectra start to look good

She is dealing with too little sample (2 mg) - has space to accommodate 20 mg.

Microcrystalline sample - may be crucial

C' -> Ca connections

C' -> Cb connections

Aromatic -> Ca,b connections

compared RFDR and Spin Diffusion experiments.

RFDR works best with things far from diagonal

Spin Diffusion works best with things near diagonal (similar σ)

* should use much more sample; use resolution enhancement

* need to do a complex
 
 

C'sc -> Cbsc -> Casc -> C'bb of Asn

C' -> Ca region very strong in the sample

Cbi -> C'i

Cai -> C'i-1
 
 

15N - 13C transfer using ubiquitin

N -> Ca

N -> C'

Done with lyophilized material - line widths with lyophilized ubiquitin much poorer than crystalline BPTI; need to do work with crystalline BPTI.

Homo and heteronuclear correlation spectra of the C-subunit of ATPase

Also looking at light harvesting complexes.
 
 

Strategy

solid state NMR - get assignments

make complex (very large)

microcrystals of complex

(can be done with BPTI / trypsin)

measure 13C shifts in solid state to determine chemical shift perturbation

Asp, Glu, Arg - weak or missing - may be related to ionization state.

Robert Wind
 
 

Challenges
1. Increase throughput 1 -3 orders of magnitude

2. Larger proteins

1. Improve NMR sensitivity

- high magnetic fields

- increase sample concentrations

- improve probe design

- enhance magnetic capabilities

- reduce sample losses

Sample losses:

- dielectric losses
- associated with intrinsic dielectric of H2O

- associated with conductivity of solvent

- induction losses
- due to conductivity of sample

Inductive losses - long thin samples better than thick samples
 
 

Assume measuring time

60 days at 500 MHz

9.2 days at 800 MHz (measured at PNL - includes)

7.8 days by eliminating sample losses

2.5 days use 8 mm sample tube (possible if other losses are)

21 hours Eliminate all circuit losses except for coil losses

10 hours Improve coil effect

39 minutes use super-conducting coil

10 minutes increase protein conc. by factor of 2.
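
The list above relies on the usual signal-averaging relation: S/N grows as the square root of the number of scans, so the measuring time needed for a fixed S/N falls as the inverse square of the per-scan sensitivity. A minimal sketch of that relation, using the times recorded in these notes (the step labels and function names are illustrative):

```python
import math

MIN_PER_DAY = 24 * 60

# Measuring times from the notes, converted to minutes.
STEPS = [
    ("500 MHz", 60 * MIN_PER_DAY),
    ("800 MHz", 9.2 * MIN_PER_DAY),
    ("sample losses eliminated", 7.8 * MIN_PER_DAY),
    ("8 mm sample tube", 2.5 * MIN_PER_DAY),
    ("only coil losses remain", 21 * 60),
    ("improved coil", 10 * 60),
    ("superconducting coil", 39),
    ("2x protein concentration", 10),
]


def time_after_gain(minutes_before, sensitivity_gain):
    """Measuring time for the same S/N after a per-scan sensitivity gain."""
    return minutes_before / sensitivity_gain ** 2


def implied_gain(minutes_before, minutes_after):
    """Per-scan sensitivity gain implied by a reduction in measuring time."""
    return math.sqrt(minutes_before / minutes_after)


if __name__ == "__main__":
    for (_, before), (label, after) in zip(STEPS, STEPS[1:]):
        print(f"{label:28s} implied sensitivity gain x{implied_gain(before, after):.2f}")
    # Doubling concentration doubles the signal: 39 min -> about 10 min.
    print(f"{time_after_gain(39, 2):.2f} minutes")
```

The last step is the easiest to verify by hand: a factor of 2 in concentration gives a factor of 4 in time, taking 39 minutes to roughly 10.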

Design magnets with magnetic homogeneity over a larger volume and/or handle more than one sample at a time

Proposes a program to develop NMR as a tool for protein studies as the main objective.

Use waiting time between scans to collect the second spectrum.

Toby Zens (Nalorac)
 
 

A lot of little things can be done to improve sensitivity along the way.

Advent of "cold metal" or "superconducting" probes.
 
 

SWAT - can provide square root of 2 increase in sensitivity. Zens has demonstrated this at 12kHz sampling rate
 
 

Long coils
 
 

Should run steady-state experiments - then retune probe - will gain ~10% in S/N
 
 

                                3 mm       5 mm

1H S/N                          690:1      1300:1

XRF 720/0                       90%        85%

Gradient recovery at 30 G/cm    N/A        10 microsec

Salt Tolerance (0.5 M)          1.12       1.42
 
 

In measuring gradient profile, the top of the shape should be very flat.
 
 

Cold probe - ID-Triple - 600 5mm

0.1% ETB - 3600:1

1H-RF 810/90 : >70%

H-PW : < 10 microsec

X-RF 720/0 : > 55% real problem

13C PW: <15 microsec

Gradient Recovery: < 5 msec @ 30 G/cm

Salt Tolerance (0.5 M): > 50% spin lock performance unknown

Not clear that you can get the full range of performance in these superconducting TR probes, especially in area of spin locking.
 
 

Worst-case Scenario Comparison

Normal metal
1300:1 x XRF x salt tolerance = 774:1
Cold Probe
3600:1 x XRF x salt tolerance = 990:1
time savings = (990 / 774)^2
but if you get rid of salt - can do much better with cryoprobe
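
The worst-case comparison can be reproduced from the probe figures listed earlier. In this sketch the cold-probe inputs (3600:1, X-RF > 55%, salt tolerance > 50%) are taken at their stated limits and give 990:1. For the normal-metal probe, the salt factor of 0.70 is an assumption, inferred as roughly 1/1.42 from the 5 mm salt tolerance entry; it gives about 774:1. Function names are illustrative.

```python
def effective_sn(raw_sn, rf_factor, salt_factor):
    """Worst-case S/N: raw S/N scaled by RF homogeneity and salt tolerance."""
    return raw_sn * rf_factor * salt_factor


def time_savings(sn_new, sn_old):
    """Factor by which measuring time drops for equal final S/N."""
    return (sn_new / sn_old) ** 2


if __name__ == "__main__":
    # 0.70 is an assumed salt factor (about 1 / 1.42), not a quoted number.
    normal = effective_sn(1300, 0.85, 0.70)
    cold = effective_sn(3600, 0.55, 0.50)
    print(f"normal metal: {normal:.1f}:1")
    print(f"cold probe:   {cold:.1f}:1")
    print(f"time savings: x{time_savings(990, 774):.2f}")
```

With salt present the cold probe saves only about a factor of 1.6 in time, which is why the note adds that removing salt makes the cryoprobe much more attractive.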

Lengthen coil window by factor of 2 - can get ~1.6 fold improvement in S/N
 
 

3 mm long probe 895:1 (115 microlit)

5 mm long probe 959:1 (345 microlit)

These numbers are S/N x XRF x salt tolerance

3 mm probe is wonderful when you can use high solubility sample.

Joshua Wand
 
 

Big proteins > 50 kD remain a challenge

Can gain enough by temperature dependence of viscosity of water

reverse micelles - Na his blah blah
 
 

Solvent - ethane, propane, butane, pentane, water

Viscosity (in µPa·s) 35, 97, 158, 220, 850

but need to be at high pressure to keep them as liquids - but pressure is modest
 
 

MW (kD)   Water   Butane   Propane   Ethane

10        3       4.5      2.8       1.0

25        8       7.3      4.5       1.6

50        15      11.1     6.8       2.5

100       30      17.6     10.8      3.9

(tau m in ns)

Protein solubilization by simple phase transfer - lots of literature on this

Kevin Gardner
 
 

Compared ORFs in bacterial genomes with NMR structures in PDB - size distribution is limited.
 
 

Maltose-binding protein (MBP) - 42 kD - largest monomeric protein assigned to date; tau c - 16 ns
 
 

Showed tremendous enhancement in MBP HN-15N spectrum - small upfield 2H shift on 15N is observed.
 
 

MBP - good test sample for large MW NMR methods development. BPTI of the CO's.
 
 

Did backbone assignments in ~6 days using NMRView.
 
 

Using Metzler's method for Ca/Cb chemical shifts to identify 2° structures
 
 

Ile, Val, Leu - selective protonation of methyl groups: a-ketobutyrate.

also stereospecific 2H of the Cb
 
 

NH + CH3 - still a long way to go to get good quality structures.
 
 

With TALOS - getting phi, psi within 10° - so 2° structure is essentially done

Lewis Kay has described NOEs to get Methyl/Methyl NOEs for non deuterated proteins - very useful for setting initial folds

James Prestegard
 
 

Residual Dipolar Couplings Contributions to Structural Genomics
- Rapid determination of folds within protein domains

- Structural relationships of domains in multiple domain proteins

Usually uses intensity-based experiments for measuring residual dipolar couplings - or IPAP experiments - convenient to have one peak per site.
 
 

Order matrix analysis

Idea - use residual dipolar couplings early - and then use NOEs to refine

Can get sufficient measurements in fragments

1) amide bond

2) 2° structure

3) domain orientations

Applying this to 3 helical bundle (NodF).

HNHA

TOCSY - HSQC

NOESY - HSQC

Identify alpha helices

use Ca -Ha and N-H dipolar couplings

Ca -Ha couplings especially valuable for this - wider spread

Plots on "globe" projection

Then - rotate helix frames (2-axes) to align helix order frames
 
 

From diffusion measurements, Cyto C (12 kDa) in AOT micelles diffuses more rapidly than ubiquitin.
 
 

At these pressures (moderate) 50 atm, proteins are not unfolding - claims that structure not affected over this temperature range.
 
 

In different solvents - HSQC spectra of ubiquitin in AOT does not change. - behaves like 25 kD protein.
 
 

Shows linear dependence of 1/T2 on the bulk solvent viscosity.
 
 

Minimizes "lossy" problem in cold probes.
 
 

May be a good way to deal with proteins that aggregate - minimize aggregate.

Gaetano T. Montelione

NO NOTES

Discussion
 
 

Address limitations

Unique capabilities, structures known only by NMR

Much of the technology will come after investment.

Investments in magnets always worthwhile.
 
 

Sample preparation is common to NMR and crystallography
 
 

Can we consider a distributed data collection enterprise. People say that they are willing to do so if funds are available from central source. It is hard to get people to collect data on unknown protein.
 
 
 

We really don't know the right technology -- how can you invest in one method over another without pilot project data? Need pilot projects that can address in a serious way "what are limitations, what are costs?"
 
 

Opella: Many key proteins can only be addressed by solid-state NMR. High-throughput methods may be appropriate to only a segment of genome - but majority of genome cannot be addressed by existing methods - need basic method development.
 
 

To do pilot projects on a reasonable scale - must be significant instrumentation funding. - need this in order to be competitive.
 
 

David Cowburn - need to select targets that fit mission of DOE.
 
 

Bill Studier - technology development may be best done in a distributed way. R01's may be proper mechanism.

May not make sense to bring integral membrane protein structure analysis into Genomic Center

US labs lagging behind -- should do a survey of state-of-the-art of NMR data collection resources. - need
 
 

Pilot project must be set up to integrate approaches into a single efficient system. Early stage investments to test integration.
 
 

Need to develop group analogous to "Biosym" to monitor usage, make recommendations.

Need software to harvest info from multiple sites to summarize target selections and progress. Data harvesting of target selection progress

HANDOUTS: