The Merck Genome Research Institute
DNA Microarray Gene Expression Program
October 21, 1999

Mark Boguski

Connecting expression data with Entrez
    connect with Medline and other data, including 3D protein structure data

The problem with a database is that it must be constantly maintained/updated

novel, important - throw away words

Medical subject heading
Title terms
Abstract terms
Full text terms

Can use gene seed - get PubMedUIDs which mention that gene
Neighboring finds "related" papers (very relevant to listing potential functions)
seed gene --> gene neighbor
    get clusters of functionally related genes
Only 19% of automatically identified neighbors were included in expert lists
(Can consider to do this analysis on each protein in PDB - or each DALI domain)
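
A minimal sketch of the seed gene --> gene neighbor idea, and of the comparison against an expert list. The notes do not describe the actual Entrez implementation, so everything here is an assumption for illustration: the function names, the dictionary layout, and the toy gene and paper identifiers are all hypothetical.

```python
from collections import defaultdict


def gene_neighbors(seed, gene_to_papers, related_papers):
    """Genes mentioned in the seed's papers or in papers related to them."""
    # Invert the gene -> papers map so genes can be looked up by paper.
    paper_to_genes = defaultdict(set)
    for gene, papers in gene_to_papers.items():
        for paper in papers:
            paper_to_genes[paper].add(gene)

    # Start from the seed's papers and follow one step of "related" links.
    papers = set(gene_to_papers.get(seed, ()))
    for paper in list(papers):
        papers.update(related_papers.get(paper, ()))

    neighbors = set()
    for paper in papers:
        neighbors |= paper_to_genes[paper]
    neighbors.discard(seed)
    return neighbors


def fraction_in_expert_list(automated, expert):
    """Fraction of automated neighbors that also appear in the expert list."""
    automated = set(automated)
    if not automated:
        return 0.0
    return len(automated & set(expert)) / len(automated)


# Hypothetical toy data: gene -> paper IDs, and paper -> related paper IDs.
gene_to_papers = {"geneA": {1, 2}, "geneB": {2}, "geneC": {3}, "geneD": {4}}
related_papers = {1: {3}}

found = gene_neighbors("geneA", gene_to_papers, related_papers)
print(sorted(found))                                   # ['geneB', 'geneC']
print(fraction_in_expert_list(found, {"geneB", "geneX"}))  # 0.5
```

The overlap function measures the same kind of quantity as the 19% figure above, but how that figure was actually computed is not recorded in these notes.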

Steve Gullans
HuFL Gene Chip
6606 genes currently in HuGe Index (for 5000 - full length cDNA info available)
                                   Human Genes
Database Curation
    - locus link IDs
    - gene functions - only a fraction (a few hundred) have annotated functions; trying to develop this classification

Most of the genes are expressed at very low levels in various tissues.


Would be interesting to know which of those have structures in PDB.
            look at Locus Link ID

now 5 chips; 30,000 human genes, mostly unannotated ESTs

Jim Eberwine
Some people believe that schizophrenia is a developmental disease; can we correlate genes differentially expressed in brain tissue from schizophrenia patients with developmental pathways?

Barbara Dunn    (Pat Brown/Botstein groups)
~6000 ORFs
~7000 intergenic regions
        Differential expression
                1000 experiments X 6000 genes - many of these are on Pat Weber website - effects of stress etc.
        e.g. response to stress, look at gene induction, gene repression

Common Stress Response

    Response of cell is common (up or down) for many different stresses, ranging from gamma irradiation, to N2, to hydrogen peroxide, etc...
            Need to figure out what the genes are - how they are clustered
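
A minimal sketch of how genes with a common response might be picked out: keep the genes that move in the same direction under every stress. The two-fold cutoff (log2 ratio of 1.0), the function name, and the toy values are assumptions for illustration, not the method used by the Brown/Botstein groups.

```python
def common_response(log_ratios, threshold=1.0):
    """Split genes into those induced or repressed under every stress.

    log_ratios maps gene -> list of log2(stressed / unstressed) values,
    one value per stress condition.
    """
    induced, repressed = [], []
    for gene, values in log_ratios.items():
        if not values:
            continue  # no measurements, cannot classify
        if all(v >= threshold for v in values):
            induced.append(gene)
        elif all(v <= -threshold for v in values):
            repressed.append(gene)
    return induced, repressed


# Hypothetical values for three stresses (e.g. irradiation, N2, peroxide).
toy = {
    "geneA": [1.5, 2.0, 1.2],     # up in all three
    "geneB": [-1.1, -2.3, -1.0],  # down in all three
    "geneC": [1.5, -0.2, 0.1],    # stress-specific, not common
}
print(common_response(toy))  # (['geneA'], ['geneB'])
```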

**Controlled Vocabulary**

    Gene Ontology- A language for annotation of genes -

    Process, Function, Cell Component

Now can figure out in clusters of genes why they are clustered

        Process: electron transport
        Function: cytic reductase
        Cell Component: mitochondrion  ? membrane
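
A minimal sketch of using the three annotation categories to explain a cluster: count the most frequent term in each category among the cluster's genes. The data layout, function name, and gene names are hypothetical; real Gene Ontology annotations are more structured than a flat dictionary.

```python
from collections import Counter, defaultdict


def summarize_cluster(cluster, annotations):
    """Return the most common term, with its count, for each category."""
    counts = defaultdict(Counter)
    for gene in cluster:
        # Unannotated genes contribute nothing.
        for category, term in annotations.get(gene, {}).items():
            counts[category][term] += 1
    return {cat: c.most_common(1)[0] for cat, c in counts.items()}


# Hypothetical annotations using the example terms from the notes.
annotations = {
    "geneA": {"process": "electron transport", "component": "mitochondrion"},
    "geneB": {"process": "electron transport"},
    "geneC": {},  # not yet annotated
}
print(summarize_cluster(["geneA", "geneB", "geneC"], annotations))
# {'process': ('electron transport', 2), 'component': ('mitochondrion', 1)}
```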

Quantitative Linguistics - guy at EBI started this

    almost done ?? but only ~2000 see function annotated

Gerry Rubin
Full length cDNAs from fly; then clone into vectors that allow rapid transfer to other vectors

Vector- Gateway system; Ed Harlow's lab most expert in this.

Generate a unique set of full-ORF clones of half of all Drosophila genes.

12,000 clones - distributed by Research Genetics

Have ~7000 unique cDNA clones

Programs using HMMs work better than others such as GRAIL:
    - Compared gene finding programs - GASP
        very successful experiment to compare gene prediction programs
    - New - validating the gene predictions by PCR-ing the cDNAs.

Celera - whole genome shotgun (10X coverage) + 28 Mb finished sequence from BDGP
        By end of year - will have ?
        sequence - Dec. 1999
        paper - Feb. 2000
        annotations - Feb 2000 distributed
    Half of sequence done in last 2 weeks
    Initial assemblies look good - working well
    Same technology will be used for human

Affymetrix will make a fly chip.

Marc Vidal
Genome     ----------------------------------->    Signal

Sequence   ----------------------------------->    Transduction
                Standardized Functional Assays

From microarray analysis - 500 genes in sporulation

Large scale 2-hybrid analysis; focused on C. elegans.  Other supporting evidence: GST pull-down, related phenotypes, etc.

All functional genomics methods provide a way to formulate hypotheses.
All these methods require ORFs cloned into specific vectors - want to clone each open reading frame into a "universal" vector.

ORFeome - not proteome, since proteins can be phosphorylated, etc.

- Bacteriophage Lambda Recombination in Phase Landing
- Full Length cDNAs from C. elegans

Gateway PCR Cloning



    ORF                DEATH

    Select for integrating
Have very extensive cDNA library.

Pick gene from Ace DB
    PCR success rate ~80%
    Gateway success rate ~97%
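
A rough worked example of what the two rates imply per plate, assuming the PCR and Gateway steps are sequential and independent (the notes do not say this explicitly). The function name is hypothetical.

```python
def combined_success(pcr_rate, gateway_rate):
    """Overall yield when a clone must succeed at PCR and then at Gateway."""
    return pcr_rate * gateway_rate


rate = combined_success(0.80, 0.97)
print(f"{rate:.1%}")                        # 77.6%
print(f"{96 * rate:.1f} clones per plate")  # 74.5 clones per plate
```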

Fold induction typically > 100.  Of 35 clonings, 20 (ie 100%) gave correct inserts.  Sequencing these clones is BIG effort - corresponds to 1/4 of genome sequencing project

Using these for RNAs, 2-hybrid - focus on ? development

    30 (clone proteins) X 30     2-hybrid analysis         ---> Paper just accepted - data not yet out on web, maybe can get preprint
        55% of interactions in the literature confirmed
        2 new ones

The C. elegans ORFeome project

    Throughput:     One 96-well plate / day
    Goal:               80% of ORFs in one year
    Collaborators:  Research Genetics (Primer)
                          Life Technologies (Gateway)
    Priorities:       1) cloned genes
                         2) ORFs from EST projects
                         3) ORFs predicted by GeneFinder - seems these predictions are not very good
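
A back-of-the-envelope check of the throughput against the one-year goal. The notes do not give the total number of ORFs; the figure of 19,000 below is an assumed round number for the predicted C. elegans gene count, and the function name is hypothetical. Failed reactions and repeats are ignored.

```python
import math


def plates_needed(total_orfs, target_fraction=0.80, wells_per_plate=96):
    """Number of full plates needed to attempt the target share of ORFs."""
    return math.ceil(total_orfs * target_fraction / wells_per_plate)


# Assumed total of ~19,000 predicted ORFs (not stated in the notes).
print(plates_needed(19000))  # 159 plates, i.e. about 159 working days
```

At one plate per day this fits within a year on paper, leaving room for failed or repeated reactions.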
Doing PCR on 96-well plate:

Microarray - Stuart Kim
RNAi - Tony Hyman                 2000 RNAi expert - cell div. early lethals
Deletions - Alan Coulson
2-Hybrids   - Marc Vidal

Vectors will be in public domain.
Appears that there are no proprietary issues related to using these.

Research Genetics will have distribution rights to these vectors

Stuart Kim
DNA microarrays with 11,990 genes on chip

Have 150 microarrays with 12,000 genes; once you can make 1, you can make tons of chips. Has full genome chips - can send RNA to S. Kim - ~40 labs have signed up to do this analysis with Kim

Looking at germ line development.  Using these chips to look at changes in gene expression during development.  Done 5X each - so you have good statistics.
        653    sperm-line genes expressed
        258    oocyte genes expressed
                 intrinsic genes - expressed in both
Sperm-line specific expression       14%    kinases/phosphatases (RNAi does not work in sperm - may help to explain mechanism)

Oocytes (Good targets for RNAi; good target for structure analysis) - 258 total showed 4 cell embryo notch, others in notch pathway ? - other ??? pathway

Have done RNAi on J; they have function

oocyte genes - on all chromosomes
sperm genes - few on X - on other chromosomes
intrinsic genes - hardly any on X - on other chromosome
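
A minimal sketch of how the chromosomal bias could be quantified: for each expression class, compute the fraction of genes that map to X. The gene names, chromosome assignments, and function name are hypothetical; the real class lists would come from the array data.

```python
from collections import Counter


def x_fraction(genes, chromosome_of):
    """Fraction of the mapped genes in a class that lie on chromosome X."""
    located = [chromosome_of[g] for g in genes if g in chromosome_of]
    if not located:
        return 0.0
    return Counter(located)["X"] / len(located)


# Hypothetical gene -> chromosome map and expression classes.
chromosome_of = {"g1": "X", "g2": "I", "g3": "II", "g4": "X", "g5": "V", "g6": "IV"}
classes = {
    "oocyte": ["g1", "g2", "g3", "g4"],
    "sperm": ["g2", "g5", "g6"],
}
for name, genes in classes.items():
    print(name, x_fraction(genes, chromosome_of))
# oocyte 0.5
# sperm 0.0
```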

On web site (VALUABLE DATA FOR SG PROJECT) - find which worm genes are co-regulated and which genes are differentially expressed.

Connection with Rosetta - Stew Scherer (Rosetta) - helping with Bioinformatics

Yeast and worm the furthest ahead in this ? of array data and RNAi.

also collab. with Steve Jones (Sanger Center), Pat Brown (Stanford)

This grant is a service grant (should be able to get S. Kim to run maternal RNA analysis - but see some competitions possible.) - 40 investigating labs need to screen against his chip.

Should be able to make MGRI grant that leverages this investment by MGRI - Tom Caskey very interested

S. Tilham asked when chips would be available - S. Kim said that oligo printers (Rosetta, HP) will open this up.

Research Genetics has primers - perhaps these would soon be released.  Everyone has asked him for primers, but this is something he cannot amplify - so the ? is very valuable.
