Development of Spectral Archive of NMR Data
The Montelione lab is working to automate the process of NMR analysis.
Automated NMR analysis techniques will play an essential role in the field
of structural genomics. As a member of the Northeast Structural Genomics
Consortium, the Montelione lab will be working to solve protein structures
for a representative of every protein family in the human genome.
As one can imagine, a project of this magnitude will generate an enormous
amount of spectral data. The purpose of the SAND project is to design
and implement a database to store the lab's spectral data.
The Spectral Archive of NMR Data (SAND) was developed using Oracle 8i on a Linux 6.2 platform. The primary job of the database is to track NMR spectral files and their related experimental data. One of the main concerns was whether the database could hold the large volume of data expected. Because of this concern, the spectral files themselves are stored in the file system rather than in the database. To organize the directory structure, a dynamic directory creation system was implemented. SAND is unique in that it creates directories on the fly within the file system, each corresponding to a particular NMR file. The program then physically moves the file from the directory to which it was uploaded into the newly created directory. This allows users to access data either by querying the database or by simply browsing a logically organized directory structure. The dynamic directory creation and file migration are coded in Java servlets, JavaServer Pages (JSP), PL/SQL, SQL, and Perl.
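The create-then-move step described above can be sketched as follows. This is a minimal illustration in Python, not SAND's actual Java/Perl code. The function name archive_spectrum, the directory layout (protein, experiment, run), and all argument names are hypothetical and chosen only for the example.

```python
import shutil
from pathlib import Path


def archive_spectrum(uploaded_file, archive_root, protein, experiment, run_id):
    """Create the destination directory on demand and move the file into it.

    The <archive_root>/<protein>/<experiment>/<run_id>/ layout is an
    assumption made for this sketch; the source does not describe
    SAND's real directory naming scheme.
    """
    target_dir = Path(archive_root) / protein / experiment / run_id
    # Create the whole directory chain "on the fly" if it is missing.
    target_dir.mkdir(parents=True, exist_ok=True)

    destination = target_dir / Path(uploaded_file).name
    if destination.exists():
        # Refuse to overwrite data that has already been archived.
        raise FileExistsError(f"{destination} is already archived")

    # Physically move the file out of the upload directory.
    shutil.move(str(uploaded_file), str(destination))
    # The returned path is what a database row would record.
    return destination
```

Storing the returned path in the database keeps the two access routes consistent: a query result points at the same location a user would reach by browsing the directory tree.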
SAND features a fully web-based user interface, which can be accessed at SANDContents.html. To enter data, the user simply enters the directory where their experimental data is stored. SAND then AutoFills most of the data entry fields and moves the user's experimental files to the appropriate directory. The interface allows one to enter new spectral data or query the database for existing datasets. Every table in the database can be queried, and queries can use relational operators. For example, users can search for particular experiments conducted at specific temperatures.
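A query of the kind just mentioned, with a relational operator applied to temperature, might look like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database purely for illustration; SAND itself runs on Oracle. The table experiment, its columns, the sample rows, and the function names are all hypothetical.

```python
import sqlite3

# Only these operators and columns may be placed into the SQL text;
# the comparison value is always passed as a bound parameter.
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}
ALLOWED_COLUMNS = {"experiment_type", "temperature_k"}


def find_experiments(conn, column, operator, value):
    """Return rows of the (hypothetical) experiment table matching a comparison."""
    if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
        raise ValueError("unsupported column or operator")
    sql = (
        "SELECT experiment_type, temperature_k, directory "
        f"FROM experiment WHERE {column} {operator} ? ORDER BY id"
    )
    return conn.execute(sql, (value,)).fetchall()


def demo_connection():
    """Build an in-memory database holding three made-up experiments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment ("
        "id INTEGER PRIMARY KEY, experiment_type TEXT, "
        "temperature_k REAL, directory TEXT)"
    )
    conn.executemany(
        "INSERT INTO experiment (experiment_type, temperature_k, directory) "
        "VALUES (?, ?, ?)",
        [
            ("HSQC", 298.0, "proteinA/hsqc/run001"),
            ("NOESY", 283.0, "proteinA/noesy/run002"),
            ("HSQC", 310.0, "proteinB/hsqc/run003"),
        ],
    )
    return conn
```

Calling find_experiments(demo_connection(), "temperature_k", ">=", 298.0) returns the two made-up experiments recorded at 298 K and 310 K. Restricting the column and operator to fixed sets is one way a web form can expose relational operators without passing user input directly into the SQL text.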
Lastly, users can search the procpar variable values for the FIDs. The procpar file is parsed by a Perl program, which uses the DBI module to connect to the database and insert the procpar variables along with their values. To call the Perl program from within Oracle, it is invoked by a Java program, "wrapped" in PL/SQL, that makes a Linux system call.
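The parse-and-insert step can be sketched as follows. SAND does this in Perl through DBI; the sketch uses Python and an in-memory SQLite database purely for illustration. It assumes a simplified procpar layout in which each parameter occupies three lines: a header line beginning with the parameter name, a line holding a value count followed by the values, and a line of enumerated values that is ignored here. Real procpar files contain further cases, such as arrayed string values spread over several lines, that this sketch does not handle. The sample text, the table procpar_value, and the function names are hypothetical, and the numeric fields on the sample header lines are placeholders.

```python
import shlex
import sqlite3

# Made-up excerpt in the simplified three-lines-per-parameter layout.
SAMPLE = """sfrq 1 1 1000000000 0 0 2 1 0 1 64
1 599.947
0
temp 1 1 200 -150 0.1 2 1 0 1 64
1 25
0
tn 4 2 4 0 0 2 1 0 1 64
1 "H1"
0
"""


def parse_procpar(text):
    """Return {parameter name: [values as strings]} for the simplified layout."""
    lines = [line for line in text.splitlines() if line.strip()]
    params = {}
    for start in range(0, len(lines) - 2, 3):
        name = lines[start].split()[0]
        # shlex.split keeps quoted strings together and drops the quotes.
        tokens = shlex.split(lines[start + 1])
        count = int(tokens[0])
        params[name] = tokens[1:1 + count]
    return params


def store_procpar(conn, fid_id, params):
    """Insert one row per parameter value into a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS procpar_value ("
        "fid_id INTEGER, name TEXT, position INTEGER, value TEXT)"
    )
    rows = [
        (fid_id, name, position, value)
        for name, values in params.items()
        for position, value in enumerate(values)
    ]
    conn.executemany("INSERT INTO procpar_value VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

Once the values are stored one per row, a search such as "all FIDs whose temp value is 25" becomes an ordinary query on the name and value columns.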
Future directions for SAND include a multi-AutoFill tool for inserting multiple experiments at once, a tool to repopulate the database from the directory structure, a tight security system, automated backup and recovery procedures, an interface between SAND and the lab's reagents database, and possibly functionality to store structures.