Second International Structural Genomics Meeting
Airlie Center, Warrenton, Virginia
April 4 - 6, 2001


International Organization -- Tom Terwilliger

* Proposes that we create an International Structural Genomics Organization
* Later in the meeting, the group identified T. Terwilliger, S. Yokoyama, and U.
        Heineman to head this Organization.
* Heineman is also organizing the next International Structural Genomics
        Meeting, for Nov 2002 in Berlin.

Report of Task Force: 1. Data Capture (Helen Berman)

* Task Force report attached
* Aim to capture in the PDB all the information provided in the Methods and
        Materials section of a good protein crystallography or NMR paper,
        including extensive information about protein production
* A complete set of data items for archiving will be developed by Helen's
        committee by May 2001.

Report of Task Force: 2. Target Tracking (Steve Bryant)

* Task Force report attached
* Open exchange of target lists and progress on each target
* Data exchange must be simple
* Each SG Project would provide the following information in a common format
     - lab assigned target name
     - lab name
     - date of creation, date of last modification
     - target sequence
     - status of work
* Discussion later in the meeting (including S. Bryant, A Godzik, G Montelione,
        M Gerstein, S Brenner and others) refined this list and the "status" definitions.
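
The common-format target record listed above could be sketched as follows. This is only an illustration: the field names, the JSON encoding, and the example status value are assumptions, not the format the task force agreed on (the notes say the list and the "status" definitions were refined later).

```python
from dataclasses import dataclass, asdict
from datetime import date
import json


@dataclass
class TargetRecord:
    """One target entry, using the five items listed in the notes."""
    target_name: str     # lab-assigned target name
    lab_name: str        # lab name
    created: date        # date of creation
    last_modified: date  # date of last modification
    sequence: str        # target sequence (one-letter codes assumed)
    status: str          # status of work; vocabulary here is hypothetical

    def to_json(self) -> str:
        """Serialize to a simple text form that is easy to exchange."""
        d = asdict(self)
        d["created"] = self.created.isoformat()
        d["last_modified"] = self.last_modified.isoformat()
        return json.dumps(d, sort_keys=True)


def from_json(text: str) -> TargetRecord:
    """Rebuild a record from the text form produced by to_json."""
    d = json.loads(text)
    d["created"] = date.fromisoformat(d["created"])
    d["last_modified"] = date.fromisoformat(d["last_modified"])
    return TargetRecord(**d)


# Hypothetical example entry (all values invented for illustration).
example = TargetRecord(
    target_name="XX0001",
    lab_name="Example Lab",
    created=date(2001, 4, 4),
    last_modified=date(2001, 4, 6),
    sequence="MKTAYIAKQR",
    status="cloned",
)
round_tripped = from_json(example.to_json())
```

Keeping every field as plain text or an ISO date is one way to meet the "data exchange must be simple" requirement; any flat, line-oriented format would serve equally well.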

Report of Task Force: 3. Data Quality Assurance (Randy Read)

* Task Force report attached
* Is automatic triggering feasible?  For X-ray, maybe.  For NMR, not yet.
        Methods to validate structure (e.g. R_free) are still in flux.  Recommend
        against "automatic-triggered release".
* Proposes some guidelines for when structure is "done", but we must leave
        it to investigators to decide when the structure is complete.
* Encourage rapid (< 3 weeks) deposition of structure into public domain once
        it is "complete".  In some cases, can put 6 month hold on structure to
        evaluate scientific and/or IP issues.
* Recommend depositing also raw data (diffraction data, NMR FIDs).  Also
        recommend depositing chemical shift data as soon as it is available.

Project Bottlenecks (W Hendrickson, M Linial, W Studier, chairs)

General bottlenecks:

I. Organization / Administration
     - Process Management
     - Administration

I.1 -- Process Management
     - LIMS
     - Automated Progress Reports
     - Identify bottlenecks
     - Improve process
         - Tom Terwilliger described Integrated database for coordination
         - A Joachimiak and Montelione/Gerstein described similar db's
         - BC Wang requires weekly written progress reports

I.2. -- Administration
     - Project coordination
     - hiring
     - motivation of key scientists
     - physical coordination; central and remote sites
         - some people concerned about long-term personal motivation
         - tracking of credit for people's roles in each structure is important

II. Protein Expression / Solubility

Some comments made in the discussion:

- Small project with Thermotoga maritima.  7 ORFs gave good expression, 5
        provided crystals
- Success on each project is enhanced by expending a lot of time on each target
- Berlin group - every SG project should be on the same floor in the same building
        -- when in separate institutions, not enough communication; need to put
            people as close together as possible to ensure success
- ~ 20% of proteins provide crystals; but only a portion of these can be
        optimized for data collection

Relationships Between Industry and Academia 1. The Structural Genomics
    Consortium (B Skeen, Burroughs Wellcome Foundation)

Structural Genomics Consortium - not yet a reality.
Idea - 3D structures of important human targets could be shared in a
        pre-competitive way
2000 ORFs -> 1000 protein samples -> 200 structures in 5 years
~ 15 companies at ~ $1 million each = ~ $15 million total budget
Most of activities will be in a single dedicated center
Unlikely to make target lists available
Michelle Browner (Roche, Palo Alto) -- said that Roche wants to partner
        with any structural genomics efforts that can enhance drug discovery
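
The consortium's planning figures quoted above can be checked with a few lines of arithmetic. A minimal sketch, using only the numbers in the notes; the variable names are made up for illustration, and the notes do not say whether the ~$1 million per company is a one-time or yearly contribution, so no cost per structure is derived here.

```python
# Planned five-year funnel, as quoted in the notes.
stages = [("ORFs", 2000), ("protein samples", 1000), ("structures", 200)]


def stage_yields(stages):
    """Return the fraction surviving each step of the funnel."""
    return [
        (f"{name_a} -> {name_b}", count_b / count_a)
        for (name_a, count_a), (name_b, count_b) in zip(stages, stages[1:])
    ]


yields = stage_yields(stages)             # per-step success fractions
overall = stages[-1][1] / stages[0][1]    # structures per starting ORF

# Budget as stated: ~15 companies at ~$1 million each.
companies = 15
contribution_usd = 1_000_000
budget_usd = companies * contribution_usd

for step, fraction in yields:
    print(f"{step}: {fraction:.0%}")
print(f"overall: {overall:.0%}, budget: ${budget_usd:,}")
```

The plan therefore assumes that half of the ORFs yield protein samples and one in five samples yields a structure, which is in line with the ~20% crystal rate mentioned in the bottleneck discussion.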

Relationships Between Industry and Academia 2.  Structural GenomiX
    (Tim Harris)

Aim - get more structures so that more good drugs can be developed
Can access NMR as needed (Peter Wright on SAB)
Strategy - express many homologues from the family
His Tags, Ni affinity / gel filtration.  Not removing tags prior to crystallization
New SGX beamline, next door to COM-CAT
     - on line in November
     - $8 MM cost
4.5 people at SGX developing "joining" software to integrate data analysis efforts.
8 hrs from data collection to structure (result of S Burley)
SGX acquires Prospect Genomics
     - ab initio modeling by David Baker
     - MODBASE by Sali
     - docking software by Tac Kuntz
     - Dan Santi also involved
     - Bill Rutter will join board.  Philip Chambon (Sprout) already on board.
     - want to forward integrate to drug design
     - nuclear receptors
     - kinases
     - phosphatases
     - proteases
     - ion channels, GPCRs
Therapeutic Areas
     - cancer
     - inflammation
     - metabolic disease
     - infectious disease (pathogens)
Have generated ~ 20 X-ray crystal structures.  Have learned more about function
        from these structures than we expected to
Bacterial Genomes
     - at least 40 sequences completed
     - over 100 in progress
     - express targets from 5 different genomes per target.  Get at least one
        soluble protein
     - 1852 bacterial targets to date
Business partners:
     - Cystic Fibrosis Foundation
              - 5 yr, $13 MM agreement to solve CFTR structure
     - Caliper Technologies
              - joint development of HT microfluid systems
     - Yale - collaboration on membrane protein structure determinations
     - Argonne - beam line construction
Business development plan:
     - $85 MM in private capital; Argonne on steroids.
Partnerships with NIH Structural Genomics Centers are in progress
Discussion - how to improve synergies between commercial and academic activities.
    - Syrrx representative -- crystallization robot (~ $4 MM) will be
        shared with academic group at Scripps.
    - Marv Cassman (head of NIH General Medical Sciences): "Process of
        academic and industrial efforts will converge.  The distinction is the
        target selection."


Breakfast Meeting of P50 PIs

Marv Cassman (head of NIH General Medical Sciences)
    - we will need to put a lot more money into the centers in the coming years.
    - the goal of the Protein Structure Initiative is "completeness"

John Norvell  (head of NIH Structural Genomics Initiative)
    -  Needs summary of Organizational Process, Mid Year Report. Due April 23.
    - The goal is to get representative structure from families with no known

NIH Common Public Archive of All Target Sequences
     Items - for each target
          - Center id
          - Target id
          - Canonical sequence
          - Date stamp
          - Hot link to the group
          - GenBank link if appropriate
Guy to send around Mark's site to all center PIs.  NIH will ask PDB to provide
    public archive of all targets.

Intellectual Property in Structural Genomics (Joseph Strauss)

Joseph Strauss, Patent Lawyer and Professor, George Washington University

The decision of whether to put coordinates into the public domain may have
    important legal consequences.

USA - "First to Invent", based on laboratory documentation
Most of World - "First to File"

Some things that can be patented:

    * Gene Sequences.
    - identification of function is necessary - but it does not need to be
        a biological function
    - there are more than 3000 patents on human gene sequences

    * Amino Acid Sequences

    * Proteins - if isolated, technically produced, modified, etc

    * Methods of isolation

    * Uses of these products, if it is an inventive step

USA
- "who first made the invention and when".  Priority is based on date invention was made, based on laboratory documentation, not on the date of filing.
- also applies to patent rights in the US for all member states of World Trade Organization
- priority is a matter of proof - laboratory notebooks, witnesses, publications

Rest of World
- priority date is determined by date of patent filing

Paris Convention Right of Priority - can file in all states within 12 months; does not control "first to invent" principle
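
The difference between the two priority rules described above can be illustrated with a toy comparison. This is a deliberately simplified sketch, not a statement of patent law: it ignores grace periods, disputes over proof, and the Paris Convention 12-month window, and all names and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    claimant: str
    invented: date  # date of invention, as provable from lab documentation
    filed: date     # date the patent application was filed


def priority_holder(claims, system):
    """Return the claimant with priority under the given rule.

    system is "first_to_invent" (USA, per the talk) or
    "first_to_file" (most of the rest of the world, per the talk).
    """
    if system == "first_to_invent":
        return min(claims, key=lambda c: c.invented).claimant
    if system == "first_to_file":
        return min(claims, key=lambda c: c.filed).claimant
    raise ValueError(f"unknown system: {system}")


# Hypothetical case: lab A invents first, lab B files first.
claims = [
    Claim("Lab A", invented=date(2000, 1, 10), filed=date(2001, 3, 1)),
    Claim("Lab B", invented=date(2000, 6, 1), filed=date(2000, 9, 15)),
]
us_winner = priority_holder(claims, "first_to_invent")
row_winner = priority_holder(claims, "first_to_file")
```

In this example the same pair of claims gives a different priority holder under each rule, which is why dated laboratory documentation matters so much under "first to invent".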

Patentability Requirements

1. Novelty - generally an invention is novel if it does not form part of the state-of-the-art

Relevant state of prior art:

USA: - 12 month grace period.  Can apply for patent within 12 months of invention
        - use or oral disclosure abroad do not constitute prior art
        - do not have a standard for disclosure on the internet

Europe - everything, - no matter where - made available to the public before
    filing date forms part of the prior art

Any substance (e.g. protein) comprised in the state-of-the-art, for use in
    therapeutic, diagnostic, or surgical methods is considered "new" if
    such USE is not comprised in the state-of-the-art.

Grace period in the "first to file" system provides immunity only against
    OWN disclosures, not immunity against 3rd party disclosures.  But - in
    the US - no matter what others disclose, you have priority based on date
    of invention.

Novelty examination -- comparison 1:1 (all features must correspond)

2. Obviousness - whether, in view of the prior art, it is obvious to try the
    invention with reasonable chance of success.

A "surprising" function identification would not be obvious, even if the
    methods used to provide that functional information or structure are trivial.

3. Sufficiency of Disclosure  - must disclose in a way that allows a person
    skilled in the art to review the disclosed invention at will.

Types of Patents

1.  Product Patents
     - if the product meets patentability requirements
     - one indicated use is sufficient
     - first patent applicant to describe ANY first medical use, gets
        product patent on ALL medical uses

2. Process Patents - cover not only the process, but also what you get
    from the process.

3.  Use Patents - additional uses of products or patents.

Scope of Protection

Patents claiming genetic information cover "anything derived" using that
    genetic information

Special Dependency Rule:  If the overlapping sequence is not essential to the
    invention, the two patents will be regarded as independent.  Relevant to
    splicing issues.

Research Exemption: Statutory in Europe.  In USA, not clear.  Provides
    freedom to use the product for R&D work prior to the time when you start
    to commercialize a product.  Provides right to use inventions to improve
    them, but not to use them as research tools.

National laws override these intellectual property agreements.

Conclusion: Want to get product patents on a medical use of each protein
    structure, but to do this need to identify biological function.   Perhaps,
    academic groups should try to at least guess a function and medical use
    for each protein structure released.

Report of Task Force: 4. Intellectual Property Rights (Marv Cassman)

Marv Cassman - The goal of the program is to get the coordinates out to
    the community as soon as possible.  If you want to characterize function
    for IP, you do it on your own time.

If it is not high throughput - it is nothing.

J Strauss:
Stevenson-Wydler Act and Technology Transfer Act of US - require efforts
    to protect intellectual property.  Should look carefully at SNP consortium
    to see how well the pre-competitive strategy of free data release works.

S Burley:
Pointed out that there are lost opportunity costs to investigators and
    institutes if patents are not pursued.  Brought up concept of "software
    patent" - patenting coordinates as "machine readable code for drug discovery
    and design".

Report of Task Force: 5. Publication (Guy Dodson)

* Task Force report attached
* Encourage publication of at least a short "Structure Note", in format
    like Acta Cryst. C uses for small molecules.  Several structural biology
    journals will support these.
* People expect 100 - 200 structures from SG projects by end of 2001
* People will not be obliged to publish as they release coordinates.  Could
    hold back publication until enough data is available for full paper - but
    would have to release coordinates in 3 wks - 6 months after "completing"

cDNA  Repository (Josh Labaer, Harvard Medical School)

Complete (extensive?) set of cDNAs for:
     - human
     - yeast
     - fly
Prevalidated and sequenced.  Enable rapid transfer to GateWay vectors
    with tags at either end, no tags, etc.

Automated NMR Structure Determination and Refinement (M. Nilges)

Excellent progress on automated NOESY analysis.
Proposes all data analysis could be done in 1 day.

HMMs based on SCOP (Cyrus Chothia)

Can identify folds for 45% of bacterial genomes and 30% of metazoan genomes.
Claims 1% false positive rate.

Summary of International Structural Genomics Projects (3 - 10 min reports)

I Bertini (Italy) - provided nice summary of SG around the world.
A Joachimiak (USA) - developed web-based data base for accessing SG data
S Burley (USA) - claims 80% of targets were soluble.  Developed "automated
    bioinformatics pipeline" to generate targets.  Test - place protein at 1 mg/mL
    in low salt - proteins which are monodisperse under these conditions have 70%
    likelihood to provide crystals
G Montelione / M Gerstein - ~ 20 structures from NESG so far.  SPINE db
    provides approach for integrating efforts across project and for data mining.
     CryoProbes and automated analysis methods provide resonance assignments
    for BPTI in 4 hrs of data collection plus 2 hrs of processing -- can expect
    major breakthrough in NMR for SG using cryoprobes
BC Wang (USA) aiming to have 3D structure in 30 min.  Focus on
    single-wavelength anomalous dispersion using S-S or S groups in proteins on home
Ian Wilson (USA) - effort at Scripps is a close collaboration with Syrrx.  No
    structures yet.  Impressive robotization effort.  Pilot project on expressing
    proteins in yeast in progress at Salk.
Berlin Structure Factory (Germany) - expressing in both E coli and yeast.
    Nice effort in HTP crystallization with robotics.  Have effort in SAR-by-NMR.
RIKEN (Japan) - see progress report at,  Found much better
    expression in pET11a without IPTG induction (this is funny??). Using "normal
    L broth".  Get much better solubilization of some proteins at pH 6.  Suggests
    it helps to try solubilization with different pH values as some proteins have
    pH dependent solubility.  Claims to now have 15,000 human cDNAs.  They are
    putting ~ 1000 of these into GateWay.  Sequence data is being released in
    PDBJ? (check this and provide info to Burkhard).

Conclusions of Meeting

1. A policy statement outlining policy conclusions will be released as Press Release
    and Document in mid April.

2. Policy calls for rapid release of protein structures determined in publicly
    funded SG efforts.  In general, release into PDB would follow soon (~ 3
    weeks) after completion of the structure.  Some of these may be put "on
    hold" for up to 6 months to evaluate scientific and IP issues.

3. Policy discourages patenting of coordinates without a clear use.  This seems
    like a vague statement, since use is a requirement for patenting.

4. Policy encourages relationships between publicly funded structural genomic
    centers and private entities.  This is a turnaround from previous statements.

5. Major obstacles to structural genomics remain protein production and
