Preparing Datasets


Dataset Creation Graphical User Interface

The Dataset Creation Graphical User Interface is started with the "Convert/Create Dataset" File menu option.  This interface guides the user through the steps needed to create a usable AutoAssign dataset and prompts for the information needed to perform each step automatically.  The whole process is broken down into four major steps:  Peak List Conversion, Registration, Dataset Properties, and Dataset Completion.

Views :


Main Window

When a user starts the Dataset Creation interface, the main window appears as follows.

Data Conversion Main Window

There are four window tabs or screens representing the four major steps in the process to create a complete AutoAssign dataset.  The tabs are listed left to right.  The rightmost button at the top of each screen takes you to the next screen or part of the process.  At the end of the process, the user will have AutoAssign-formatted peak lists and a control file describing the dataset.


1. Peak List Conversion Tab

A user will see the Peak List Conversion tab when the interface is started.  The peak list conversion process reformats the given peak lists into AutoAssign format.  The reformatted files are used to obtain registration and tolerance values in subsequent steps/tabs by running scripts in the background.

Note:  A user should open only one Dataset Creation GUI window, since the interface does not support multiple processes running at the same time.

Working directory [text field]:  it is a base directory that the interface will use to create a subdirectory with a time stamp. This subdirectory stores intermediate files, logs, and the completed dataset generated during the process.  The default is set to the current working directory.

Auto Fill [button]:  when a user decides to reuse the same peak lists or the same type of configuration based on the previous analysis, this function allows a user to import the data previously used.  A user only needs to specify the time-stamped subdirectory created by the interface that preserves the previous configuration information.  Note: The configuration file should NOT be edited by a user.  This option allows easy regeneration of a dataset from raw peak lists so that the user can easily move back and forth between a spectral visualization tool like Sparky and AutoAssign in order to improve the quality of the peak lists.   

Add Experiment [button]: at startup, the GUI displays only 5 peak lists indexed from #1 to #5.  When a user needs to add more experiments or peak lists for analysis, this function displays an additional blank peak list panel.

Clear All Entries [button]: this function clears populated data fields displayed on screens.

Convert Peak Lists [button]: this function runs a set of AutoAssign Perl scripts to convert the peak lists entered by a user. It generates intermediate files used to compute registration/tolerance values for the subsequent steps.  All script executions run by the interface are logged in the event.log file, which is found along with other intermediate files under the time-stamped subdirectory.

Exit [button]: this button closes the interface.

1a. Peak List Information

Peak List Conversion Tab

Type [drop down menu-required]: a user needs to choose one of the experiment types, such as NH-HSQC.  Depending on the choice, the interface displays additional fields that a user needs to fill in about the experiment. To leave a particular peak list unused, simply choose “N/A” from the Type menu.  When “N/A” is selected in the Type field, the GUI ignores any other information entered for that peak list.

Filename [text field-required]: it specifies a filename and a file directory.

Format [drop down menu-required]: it specifies a type of peaklist format that the file was produced with.

Select Columns [drop down menu-required]: it specifies the column number in the given peak list for the corresponding label (H, N, intensity, etc.).  For example, if the first column in your peak list is an intensity, choose one (1) for Intensity.  When a user clicks the “View File” button, the content of the peak list is displayed in a table format.  This view is only used to inspect the content and to select columns/rows.  Currently, only one (1) Note column is accepted.  The Note column is the mechanism for pulling in peak-based assignment constraints from the raw peak lists.  These assignment constraints are normally entered into some "Notes" field inside the spectral visualization tool used to generate the raw peak lists.

Column/Row Selection Window

Select Rows [drop down menu-optional]: it specifies the top/bottom rows to delete.
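
The effect of the column and row selections can be pictured with a short Python sketch. This is not the code the interface runs; the function name select_columns and its parameters are hypothetical, and the sketch assumes a whitespace-separated raw peak list with 1-based column numbers, as in the Select Columns menus.

```python
def select_columns(lines, columns, skip_top=0, skip_bottom=0):
    """Return rows reduced to the chosen 1-based columns, in the given order.

    lines       -- raw peak list lines (whitespace-separated fields)
    columns     -- 1-based column numbers to keep, e.g. [2, 3, 1]
    skip_top    -- number of leading rows to delete (headers, etc.)
    skip_bottom -- number of trailing rows to delete
    """
    end = len(lines) - skip_bottom
    rows = []
    for line in lines[skip_top:end]:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        rows.append([fields[c - 1] for c in columns])
    return rows


# Made-up raw file: Intensity is column 1, H is column 2, N is column 3.
raw = [
    "Intensity  w1     w2",
    "3242120    8.871  110.859",
    "724463     8.870  110.898",
]
rows = select_columns(raw, columns=[2, 3, 1], skip_top=1)
print(rows)
```

Here the header row is removed with skip_top=1 and the columns are reordered to H, N, Intensity.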

Phase [drop down menu-optional]: it specifies the type of phase with which the experiment was conducted.

Flip Intensity [radio button-optional]: it specifies the necessity of intensity sign conversion [from positive to negative or from negative to positive]. A user can choose this option depending on how the experiment data was collected.  It only allows a global change to flip sign for values on the intensity column.  
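
As a minimal illustration of what a global sign flip means, the following Python sketch negates every value in the intensity column. The function name flip_intensity and the in-memory row layout are assumptions made for this example; the interface performs the conversion with its own scripts.

```python
def flip_intensity(rows, intensity_col):
    """Negate the value in the 0-based intensity column of every row."""
    flipped = []
    for row in rows:
        new_row = list(row)                      # leave the input untouched
        new_row[intensity_col] = -float(row[intensity_col])
        flipped.append(new_row)
    return flipped


peaks = [[8.871, 110.859, 3242120.0], [8.870, 110.898, -724463.0]]
print(flip_intensity(peaks, intensity_col=2))
```

Because the change is global, positive peaks become negative and negative peaks become positive; individual peaks cannot be excluded.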


2. Registration Tab

Registration Tab

A user will be taken to the Registration tab when the “Convert Peak Lists” button is clicked, which indicates that the peak list conversion process has completed.  The main objectives of this window are to:

1) allow a user to calculate registration values, and

2) apply those values to shift peaks by creating new / temporary peak list files.

Calculate Registration [button]: this function starts calculating registration values based on the information provided on the Peak List Conversion tab. It opens a progress monitor window that shows how many calculations are completed. If an error occurs, it may be recorded in the .reg file(s) under the subdirectory.

Progress Monitor Window

View Registration [button]: the interface creates an HTML page by consolidating the generated .reg files and tries to open it with a browser.  When an error occurs and the .reg files are improperly generated by the scripts, an empty page might be produced. In that case, go back and check your peak lists, or look for more information in event.log or the .reg files.  The "Weighted Registration" values are used to register peak lists.  The "Full Std" values are used in calculating tolerances included in the control file.

Registration Result Page

Revert to Original Value [button]: this function refills the registration value text fields with the originally calculated registration values.

Apply Registration [button]: this function applies registration values to each peak list.  It creates new peak list files with a _shifted.pks extension.

Registration Status [text field]: it starts with “N/A” when no action has been taken.  It indicates how far the calculation has progressed. When a user shifts peaks with a registration value, it reports that shifting has been completed.

Global Shift Values [text field]: it specifies values for the corresponding resonances to shift globally across all peak list files, including the root file.  For example, if you specify values for H and N, the column values for H and N in all peak list files, including an NH-HN root file, will be shifted.

Registration Results [text field-required]: it displays the calculation result, indicating which experiment is the root for a particular resonance.  A user is allowed to modify the registration values used for shifting.


3. Dataset Properties Tab


Dataset Properties Tab

A user is asked to provide additional information to complete the control file creation.  The values entered by a user on this tab are used in the properties, tolerances, and sequence sections of the control file.

Open Sequence File [button]: this function opens a window for a user to select a sequence file.

Save Properties/Create Control File [button]: this function saves dataset property information into an internal data structure and uses it to produce a control file.  This button should be clicked to produce an initial control file; the result is displayed on the next tab.

Sequence:

Starting Residue Number [text field-required]: it specifies the starting residue number of a given sequence.

Sequence [text field-required]: it specifies the amino acid sequence of a given protein.

Protein Name [text field-required]: it specifies the name of the protein.  The default value is given as XYZ.

Deuteration [drop down menu]: it specifies the type of deuteration. 

Percent(%) [text field]:  it specifies the percent of deuteration.  The valid value ranges between 0 and 100.     

Radio Buttons of Analysis Properties: These buttons specify whether to run AutoAssign in a given analysis mode.  A list of all property keywords can be found in the Control File Format.

Match Tolerance Settings [text field-required]: it specifies the matching tolerance for each resonance.   These tolerances are used in grouping peaks into spin systems and for linking nearest neighbor spin systems together into segments.
 


4. Dataset Completion Tab

Dataset Completion Tab

A user can review the control file generated by the interface, manually edit its content, and then save it.  To save a revised control file select the "Rename Control File" button. A user should check the content and the format of the control file before the file is submitted to the AutoAssign server.  At the end, a user can open the control file by selecting the "Open Control File" button in this tab or by selecting the "Open Control File" File menu option in the main AutoAssign window.  

View Quality Report [button]: this function opens a window for a user to view a quality assessment report on the dataset.  It contains a summary across the dataset and information on each detectable spin system.  This report is immensely valuable in improving the quality of the dataset.  A user can use the "Error List" as a guide for improving the raw peak lists.  The report gives detailed information on each spin system that can guide the user through refining all their peak lists simultaneously on a spin system by spin system basis.

Quality Assessment Report

Read Control File [button]: this function allows a user to import an existing control file to view.

Revert to Original [button]: this function restores the original control file on the screen.

Rename Control File [button]: this function allows a user to save a control file with a different name.

Open Control File [button]: this function opens a control file for analysis.  The default control file is the one with the original values.  Once a user saves a control file, that file becomes the default.

Control File [text area]: it displays the control file generated based on the user input. A user should review its content to make sure that all information is correct.  See AutoAssign help page for more details. 



Manually Preparing Peak Files

The easiest way to prepare peak lists is to use the Dataset Creation Graphical User Interface.  However, if the interface cannot handle the strange experiment you are giving it or you need to create an automated process of your own, you can use the manual approach described here which uses the extract_columns.pl and create_peak_list.pl Perl programs.  If this does not suffice, then you can write your own tools based on the file format given below:

Peak File Format:

<Index>     <dim1>     <dim2>     [dimX]     <Intensity>     <Label>[.PeakNotes]
...
*

Individual fields are separated by spaces and/or tabs.
Comments are indicated by a "#" sign at the beginning of a line.
An asterisk is used to indicate the end of a peak file. This asterisk is required. Extended comments may be placed after the asterisk.

- The index field (first field) should list an integer that, along with the spectrum name, provides a unique handle for the peak. Thus in the example below, the first peak might be identified as "hnca 126".

- Peak resonance frequencies should be given in ppm, and values for frequencies and intensities may be in floating point or exponential format.

- The intensity of the peak should follow the frequency fields.

- The last field provides a label that is included in various output routines, and helps the user to identify the peak.  A period can be used to add a comment or note to this field.  Such a note can be used to add assignment constraints.  The form of the assignment constraints is:

peak_label.;<adghilmrs>[-]<list>...

Here is an example HNCA peak list file:

#Index   Xppm     Yppm     Zppm     Intensity    Label.notes
126     8.871      110.859   50.247   3242120       HNCA
125     8.870      110.898   62.529   724463         HNCA.;g45
73       8.744      116.161   56.614   2287600       HNCA
:
:
145     9.153    112.004     57.415   2788050       HNCA   
*
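
For users who write their own tools, the format rules above can be sketched as a small Python reader. The function name parse_peak_file and the dictionary layout are illustrative assumptions, not part of AutoAssign; the sketch also assumes the note is attached to the label with a period and contains no whitespace.

```python
def parse_peak_file(text):
    """Parse AutoAssign-style peak file text into a list of dicts."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue              # blank line or comment
        if line.startswith("*"):
            break                 # end marker; anything after it is comment
        fields = line.split()     # fields are separated by spaces and/or tabs
        # The last field is "<Label>[.PeakNotes]"; split at the first period.
        label, _, notes = fields[-1].partition(".")
        peaks.append({
            "index": int(fields[0]),
            "shifts": [float(v) for v in fields[1:-2]],   # ppm values
            "intensity": float(fields[-2]),
            "label": label,
            "notes": notes,
        })
    return peaks


example = """\
#Index   Xppm     Yppm     Zppm     Intensity    Label.notes
126     8.871      110.859   50.247   3242120       HNCA
125     8.870      110.898   62.529   724463         HNCA.;g45
*
anything after the asterisk is an extended comment
"""
for peak in parse_peak_file(example):
    print(peak["index"], peak["shifts"], peak["intensity"],
          peak["label"], peak["notes"])
```

Because the shifts are taken as everything between the index and the intensity, the same reader handles 2-, 3-, and higher-dimensional peak lists.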

Assignment Constraints

There are eight types of assignment constraints: RACs, GACs, SACs, TACs, LACs, MACs, HMACs, and PACs, which allow the user to override the resonance interpretation, resonance grouping, sidechain classification, spin system typing, spin system linking, and segment mapping steps performed by AutoAssign.  This formulation of assignment constraints allows both gentle and hard use by the user.  Gentle use allows the user to guide the assignment process without forcing errors into the assignments.  Hard use is as it sounds: the user can force AutoAssign to absolutely trust certain information that the user provides, which can in turn force errors into the assignments if the information is not 100% correct.  In general, it is safer to limit to a few possibilities rather than forcing the program to try the single most probable one.  Assignment constraints are listed in a peak note of a peak in a given peak list.  Having assignment constraints directly associated with peaks allows the user to use their spectral visualization tool of choice (e.g. Sparky) to enter them into the peak lists generated by that tool (e.g. the Notes field in Sparky).

Assignment constraints are given in a simple mnemonic that starts with a ";" followed by a single character identifying the type of assignment constraint.  This is followed by assignment constraint specific information.  The text is case insensitive.  Any text after an underscore "_" character is ignored.   The general form of the assignment constraints given in a peak note field is as follows:

peak_label.;<adghilmrs>[-]<list>...[_]ignored

where

;r[#]<rn>[,rn]... - RAC to limit the possible resonance interpretations in a peak's dimension.  The dimension specification is optional when using 3-dimensional peaks.
;g<##>[,##]... - GAC to include peak in a specific GS.  The group_id is an arbitrary positive number used by the user.  All peaks in the GS must be given the same group_id.
;a - SAC to specify that the given GS is not a sidechain GS and should be in the usable list of GSs.
;d - SAC to specify that the given GS is a sidechain GS and should not be in the usable list of GSs.
;i[-]<aa>[aa]... - i-TAC to only include (or exclude) the specified amino acids as possible ones for the given intraresidue ladder of the GS.
;s[-]<aa>[aa]... - s-TAC to only include (or exclude) the specified amino acids as possible ones for the given sequential residue ladder of the GS.
;l<n/c>[-]<##>[,##]... - LAC to limit the possible neighbors in the n/c direction to (or exclude) those with a given link_id.  The link_id is an arbitrary number given by the user.  LACs are used in pairs of ;ln[-]## and ;lc[-]##.
;m[-]<ss>[,ss]... - MAC to include (or exclude) the specified sequence site as a possible mapping for the given GS.
;h<ss> - HMAC to immediately assign the given GS to the given SS.
;p[-] - PAC to keep (or delete) this peak even if it is a duplicate of another peak.

An example is:

126     8.871    110.859  50.247   3242120       HNCA.;g5;iad;a;m-a32,d50;rca-1;ln50_this_is_ignored_text

where
126 - peak index.
8.871 - peak amide hydrogen dimension.
110.859 - peak amide nitrogen dimension.
50.247 - peak aliphatic carbon dimension.
3242120 - peak intensity
HNCA - peak label.
Assignment Constraints:
;g5 - GAC indicates that this peak should be grouped into group 5.
;iad - i-TAC indicates that the intra ladder can only be amino acid types A or D.
;a - SAC indicates that this is not a sidechain GS.
;m-a32,d50 - MAC indicates that the GS containing this peak should not be mapped to A32 nor D50.
;rca-1 - RAC indicates that the carbon dimension (3rd dimension) is limited to sequential CA resonances. The explicit dimension declaration is ";r3ca-1".
;ln50 - n-LAC indicates that the n-linked neighbor GS must be identified by link_id 50 (i.e. the n-linked neighbor GS must have a c-LAC of ";lc50" as well).
_this_is_ignored_text - ignored text in the peak note.
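
The breakdown above can be reproduced with a few lines of Python. This is only a sketch of the mnemonic as described here: parse_constraints is a hypothetical helper, and it assumes that everything from the first underscore onward in the note is ignored text.

```python
def parse_constraints(note):
    """Split a peak note into (type_letter, body) pairs.

    The note is lower-cased (the text is case insensitive) and anything
    from the first underscore onward is dropped as ignored text.
    """
    note = note.lower().split("_", 1)[0]
    # The first piece is whatever precedes the first ";" and is discarded.
    pieces = note.split(";")[1:]
    constraints = []
    for item in pieces:
        item = item.strip()
        if item:
            constraints.append((item[0], item[1:]))
    return constraints


note = ";g5;iad;a;m-a32,d50;rca-1;ln50_this_is_ignored_text"
for kind, body in parse_constraints(note):
    print(kind, repr(body))
```

The body of each constraint is returned unparsed, since its meaning depends on the constraint type (for example, the "-1" in "ca-1" is part of a resonance name, while the "-" in "-a32,d50" marks exclusion).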

Resonance Assignment Constraint (RAC):  
A resonance assignment constraint limits the possible resonance interpretations for the chemical shift or dimension of a peak.  For example, in an HNCACB type experiment the aliphatic carbon dimension can represent chemical shifts for the intra and sequential CA and CB resonances of a GS.  The general form of a RAC is as follows:

;r[#]<rn>[,rn]...

where
";r" indicates that this is a RAC.
"#" indicates the dimension (number) referred to for a given peak.  This is not needed for 3-dimensional peak lists.
"rn" indicates a list of possible resonances.  For an HNCACB peak, the possible resonances are: ca, cb, ca-1, and cb-1.

In the following example for an HNCACB peak:

;rca,ca-1

A RAC is given that limits the aliphatic carbon dimension to intra and sequential CA resonances.

Grouping Assignment Constraint (GAC):
A grouping assignment constraint limits which GS a peak is grouped into.  This is probably the most valuable type of assignment constraint since it allows the user to disambiguate overlapped GSs.  The general form of a GAC is as follows:

;g<##>[,##]...

where
";g" indicates that this is a GAC.
"##" indicates a group_id (positive number) that the peak is associated with.  The peak will be grouped only in GSs comprised of this group_id.

In the following example:

;g34,35

The peak will only be grouped in a GS associated with either the group_id 34 or group_id 35.

Sidechain Assignment Constraint (SAC):

A sidechain assignment constraint indicates if the given GS is a sidechain or not.  In a practical sense, a SAC either deletes or keeps AutoAssign from deleting the given GS from the list of usable GSs.  The general form of a SAC is as follows:

;<a/d>

where
";a" indicates a SAC that will include this GS in the list of usable GSs.
";d" indicates a SAC that will exclude this GS from the list of usable GSs.

In the following example:

;d

The GS is deleted from the list of usable GS.

Typing Assignment Constraint (TAC):  
A typing assignment constraint limits the possible amino acids allowed in the typing of the intra or sequential ladder of a GS (remember, a GS represents a dipeptide spin system). The general form of a TAC is as follows:

;<i/s>[-]<aa>[aa]...
where

";i" indicates an intra TAC or i-TAC that limits the possible amino acid types for the intra ladder of a GS.
";s" indicates a sequential TAC or s-TAC that limits the possible amino acid types for the sequential ladder of a GS.
"-" (minus sign) indicates that the TAC is excluding the following amino acids from the list of possible ones.
"aa" indicates an amino acid type.

In the following example:

;iilv;sg

A pair of TACs are given: i-TAC limiting intra ladder typing to I, L, and V; s-TAC limiting sequential ladder typing to G.
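
The include/exclude behaviour of a TAC can be illustrated by expanding its body into the set of amino acid types that remain allowed. The helper allowed_types is hypothetical, and the sketch assumes the 20 standard one-letter codes.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")   # assumed: 20 standard codes


def allowed_types(tac_body):
    """Return the amino acid types allowed by a TAC body such as 'ilv' or '-g'."""
    exclude = tac_body.startswith("-")
    listed = set(tac_body.lstrip("-").upper())
    unknown = listed - AMINO_ACIDS
    if unknown:
        raise ValueError("unknown amino acid code(s): " + "".join(sorted(unknown)))
    # An excluding TAC removes the listed types; otherwise only they remain.
    return AMINO_ACIDS - listed if exclude else listed


print(sorted(allowed_types("ilv")))   # body of ";iilv"
print(sorted(allowed_types("g")))     # body of ";sg"
print(len(allowed_types("-p")))       # every type except proline
```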

Linking Assignment Constraint (LAC):
A linking assignment constraint limits which neighbor GSs that the given GS can be linked to.  The general form of a LAC is as follows:

;l<n/c>[-]<##>[,##]...

where
";l" indicates that this is a LAC.
"n" indicates that this is a n-linked LAC or n-LAC which limits the possible n-linked GS neighbors.
"c" indicates that this is a c-linked LAC or c-LAC which limits the possible c-linked GS neighbors.
"-" (minus sign) indicates that the following link_ids should be excluded.
"##" indicates a link_id (positive number) identifying possible links that the GS can have (or not have) in the identified direction.

Normally, LACs come in pairs that associate a link_id from both the n-linked and c-linked direction.  Given this form of an assignment constraint, it is usually easier to indicate an excluding LAC (one with a minus sign).  In the following example:

;ln45,46

A n-LAC is given which limits the n-linking of the given GS to other GSs that have c-LAC link_ids of 45 or 46.
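
Since including LACs are meant to pair an n-LAC with a c-LAC that shares a link_id, a quick consistency check over the notes of a peak list can catch link_ids that appear in only one direction. The following Python sketch is an assumption-laden illustration (the helper unpaired_link_ids is hypothetical, and excluding LACs are simply skipped); AutoAssign itself is not known to perform this check.

```python
def unpaired_link_ids(lacs):
    """Report link_ids that appear in only one direction.

    lacs -- iterable of LAC bodies without the leading ";l",
            e.g. "n45,46" or "c45". Excluding LACs ("n-45") are skipped.
    """
    ids = {"n": set(), "c": set()}
    for body in lacs:
        direction, rest = body[0].lower(), body[1:]
        if rest.startswith("-"):
            continue
        ids[direction].update(int(v) for v in rest.split(","))
    return {"n_only": ids["n"] - ids["c"], "c_only": ids["c"] - ids["n"]}


print(unpaired_link_ids(["n45,46", "c45", "c50"]))
```

In this made-up input, link_id 45 is paired, while 46 has no c-LAC partner and 50 has no n-LAC partner.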

Mapping Assignment Constraint (MAC):

A mapping assignment constraint limits the possible sequence sites (SS) that a GS can map to.  MACs are generally safer to use in excluding possible SSs, because a user can generally tell where a GS does not belong.  Even if the user limits a GS mapping to only one SS, AutoAssign may not map it there if other GSs may also be mapped to the same SS. The general form of a MAC is as follows:

;m[-]<ss>[,ss]...

where
";m" indicates that this is a MAC.
"-" (minus sign) indicates that the following SSs should be excluded as possible mapping sites for the given GS.
"ss" sequence site to limit mapping for.

In the following example:

;m-r30,r35

A MAC is given that limits the GS not to be mapped to R30 or R35.
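
Before adding MACs (or HMACs) to a peak list, it can help to confirm that each sequence site actually exists in the protein sequence. The Python sketch below does this for sites written as a one-letter residue type followed by a residue number; the helper names are hypothetical, and the sequence shown is the first 35 residues of the JR19 sequence from the example control file in this document.

```python
import re


def parse_site(site):
    """Split a sequence site such as 'r30' into ('R', 30)."""
    match = re.fullmatch(r"([A-Za-z])(\d+)", site.strip())
    if match is None:
        raise ValueError("not a sequence site: " + repr(site))
    return match.group(1).upper(), int(match.group(2))


def site_in_sequence(site, sequence, start=1):
    """True if the residue type at that position matches the sequence."""
    residue, number = parse_site(site)
    offset = number - start       # start is the number of the first residue
    return 0 <= offset < len(sequence) and sequence[offset] == residue


sequence = "MAEDEGYPAEVIEIIGRTGTTGDVTQVKVRILEGR"
for site in ["r30", "r35", "e35"]:
    print(site, site_in_sequence(site, sequence))
```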

Hammer Mapping Assignment Constraint (HMAC):

A hammer mapping assignment constraint immediately maps the given GS to a specific SS.  This is a very forceful assignment constraint that should be used with extreme caution.  This assignment constraint has been added because sometimes nothing quite works except a hammer.  The general form of a HMAC is as follows:

;h<ss>

where

";h" indicates that this is a HMAC.
"ss" sequence site to immediately map the GS to.

In the following example:

;he35

A HMAC is given that maps the given GS to E35.

Peak picking Assignment Constraint (PAC):

A peak picking assignment constraint either causes the given peak to be used or deleted even if it is a duplicate of another peak in a given peak list.  The general form of a PAC is as follows:

;p[-]

where

";p" indicates that this is a PAC.
"-" (minus sign) indicates that this peak should be deleted and not used in the assignment process.

In the following example:

;p

A PAC is given that indicates this peak is to be used even if it is a duplicate of another peak in the peak list.

Manually Preparing a Control File

The easiest way to prepare a control file is to use the Dataset Creation Graphical User Interface.  However, if you need to create a control file by hand, then copy one from the example data sets in the distribution and modify it based on the file format given below.

Control File Format:

Here is an example Control File for a protein called JR19 from the NESG project:

#
# ER14  AutoAssign Table file
#
Protein: JR19

Properties: no_referencing no_sequential_intras

Sequence: 1 MAEDEGYPAEVIEIIGRTGTTGDVTQVKVRILEGRDKGRVIRRNVRGPVRVGDILILRETEREAREIKSRRAAALEHHHHHH*

Tolerances: HN 0.016 N15 0.2 CA 0.3 CB 0.3 HA 0.03 CO 0.4

Spectra:
HSQC  ROOT      hsqc.pks  1   0   0 phase: {} {
{HN       4  12   0  .02 unfolded }
{N15    100 130   0  .3 unfolded }
}
HNCO  HSQC     hnco.pks  0   1   0 phase: {} {
{HN       4  12   0  .02 unfolded }
{N15    100 130   0  .3 unfolded }
{CO     170 180   0  .3 unfolded }
}
HNcoCA  HSQC     hncoca.pks  0   1   0 phase: {} {
{HN       4  12   0  .02 unfolded }
{N15    100 130   0  .3 unfolded }
{CA      40  70   0  .50 unfolded }
}
HNCA  HSQC       hnca.pks  1   1   0 phase: {} {
{HN       4  12   0  .02 unfolded }
{N15    100 130   0  .3 unfolded }
{CA      40  70   0  .50 unfolded }
}
HNcoCACB HSQC    hncocacb.pks  0   1   0 phase: {} {
{HN       4  12   0  .02 unfolded }
{N15    100 130   0  .3 unfolded }
{{CA CB } 10  80  0 .5 unfolded }
}
HNCACB   HSQC    hncacb.pks  1  1   0  phase: {CB {ACDEFGHIKLMNPQRSTVWY}} {
{HN       4  12   0  .02 unfolded }
{N15    100 130   0  .3 unfolded }
{{CA CB } 10  80  0 .5 unfolded }
}
HNcoHA  HSQC   hncoha.pks 0   1   0 phase: {} {
{HN       4  12   0  .02 unfolded }
{N15    100 130   0  .30 unfolded }
{HA       2   8   0  .05 unfolded }
}
HNHA HSQC   hnha.pks 1   1   0 phase: {} {
{HN       4  12   0  .02 unfolded }
{N15    100 130   0  .30 unfolded }
{HA       2   8   0  .05 unfolded }
}
:
STD: HNHA.3 HNCOHA.3 0.005
STD: HNCACB.3 HNCOCACB.3 0.0758873 HNCOCA.3 0.0623896
STD: HNCO.1 HNHA.1 0.00296171 HNCOCA.1 0.002 HNCACB.1 0.00335142 HNCOHA.1 0.00221347 HNCOCACB.1 0.002 HNCA.1 0.00324182
STD: HNCO.2 HNHA.2 0.0368455 HNCOCA.2 0.02 HNCACB.2 0.042985 HNCOHA.2 0.0250925 HNCOCACB.2 0.02 HNCA.2 0.0449021
STD: HNCOCA.3 HNCA.3 0.0928806
STD: HSQC.1 HNHA.1 0.00278925 HNCO.1 0.002 HNCOCACB.1 0.00272964 HNCACB.1 0.00399468 HNCOHA.1 0.00205141 HNCOCA.1 0.00258827 HNCA.1 0.00413856
STD: HSQC.2 HNHA.2 0.0388324 HNCO.2 0.02 HNCOCACB.2 0.0226115 HNCACB.2 0.0492007 HNCOHA.2 0.0263674 HNCOCA.2 0.0238823 HNCA.2 0.0494308
$

Control File Sections:

The sections "Protein:", "Sequence:", "Tolerances:", and "Spectra:" must occur exactly as shown and in that order. "Properties:" is an optional section that must come before the "Sequence:" section.  The "STD:" section is optional and comes after a "Spectra:" section that is ended with a ":".  The file should be terminated with a "$".  You can put comments between sections by using a "#" at the beginning of the line.
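
When a control file is edited by hand, the ordering rules above are easy to break. The following Python sketch checks only those layout rules; check_sections is a hypothetical helper, not an AutoAssign tool, and it does not validate the content of any section.

```python
REQUIRED = ["Protein:", "Sequence:", "Tolerances:", "Spectra:"]


def check_sections(text):
    """Return a list of problems found in the section layout of a control file."""
    problems = []
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    found = [ln.split()[0] for ln in lines
             if ln.split()[0] in REQUIRED + ["Properties:"]]
    if [s for s in found if s in REQUIRED] != REQUIRED:
        problems.append("required sections missing or out of order")
    if "Properties:" in found and "Sequence:" in found:
        if found.index("Properties:") > found.index("Sequence:"):
            problems.append('"Properties:" must come before "Sequence:"')
    if not lines or lines[-1] != "$":
        problems.append('file is not terminated with "$"')
    return problems


good = """\
# comment
Protein: JR19
Properties: no_referencing
Sequence: 1 MAEDEG*
Tolerances: HN 0.016 N15 0.2
Spectra:
HSQC ROOT hsqc.pks 1 0 0 phase: {} {
{HN 4 12 0 .02 unfolded }
{N15 100 130 0 .3 unfolded }
}
:
$
"""
print(check_sections(good))
```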

"Protein:" is used to specify the name of the protein and has no effect on execution.

"Sequence:" is used to specify the amino acid sequence of the protein.  The sequence information should include the position in the sequence of the first residue ("1" for Met 1 in this example), followed by a space and the letters of the sequence. The residue characters may have any number of intervening spaces, tabs, or new lines, but must be terminated with an asterisk.  "B" represents non-disulfide-bonded cysteines.  "U" represents cysteines of unknown state.
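
As a sketch of how this section could be read by a user-written tool, the Python function below takes the text following "Sequence:" and returns the starting residue number and the residue string. The name parse_sequence is an assumption for this example.

```python
def parse_sequence(section):
    """Parse the text following 'Sequence:' into (start_number, residues)."""
    body, star, _ = section.partition("*")
    if not star:
        raise ValueError("sequence must be terminated with an asterisk")
    tokens = body.split()          # spaces, tabs, and new lines are all allowed
    start = int(tokens[0])         # position of the first residue
    residues = "".join(tokens[1:]).upper()
    return start, residues


start, residues = parse_sequence("1 MAEDEG YPAEV\n  IEIIG*")
print(start, residues)
# Residue numbering follows from the starting number:
numbering = {start + i: aa for i, aa in enumerate(residues)}
print(numbering[7])
```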
 

"Properties:" is an optional section and can have different keywords to alter basic AutoAssign behavior.  The keywords are as follows:

"deuterated" - indicates that the spectra were collected on 100% deuterated samples. 

"deuterated_%%" - indicates percent of deuteration for fractionally labeled samples.

"ILV_deuterated_%%" - indicates percent of deuteration for samples that have selective ILV protonation.

"no_sequential_intras" - prevents "Match Degenerate", which allows sequential shifts to be used as intra shifts, from being run in "Default Execution".

"loose_sequential_intra_matching" - allows a sequential peak to match multiple intra peaks in determining which intra peaks are sequential.  This mimics older, less conservative, AutoAssign behavior.

"no_referencing" - indicates that the spectra should not be internally referenced in their comparable dimensions.

"ignore_constraints" - causes AutoAssign to ignore all assignment constraints (both peak based and constraint file based).

"alt_root" - indicates an alternate root spectrum to use instead of the "HNCO" spectrum.  A good candidate is the "HNcoCA".

"root_matching_std_units" - indicates the number of standard deviation units to use in root HN matching for building spin systems (default is 4).

"link_matching_std_units" - indicates the number of standard deviation units to use in link matching for linking spin systems together (default is 3).

"max_matching_multiplier" - indicates the maximum tolerance to allow in matching (i.e. std * num_std_units * multiplier).  The default is 1.5.

"min_matching_multiplier" - indicates the minimum tolerance to allow in matching (i.e. std * num_std_units * multiplier).  The default is 0.75.

"tolerance_bound_grouping" - makes spin system grouping assignment constraints only work if the peaks they group are within the amide grouping tolerances.

"keep_duplicate_peaks" - causes AutoAssign not to automatically delete duplicate peaks found in a given spectrum.  Duplicate peaks exactly match in all dimensions and intensity values.

Examples of valid "Properties:" sections are:

Properties: deuterated
Properties: deuterated_50 override override1.aao
Properties: deuterated_75
Properties: deuterated_25
Properties: ILV_deuterated_90
Properties: no_sequential_intras
Properties: ignore_constraints
Properties: alt_root HNcoCA
Properties: root_matching_std_units 5.0
Properties: link_matching_std_units 3.5
Properties: max_matching_multiplier 2.0
Properties: min_matching_multiplier 0.5
Properties: tolerance_bound_grouping
Properties: keep_duplicate_peaks
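
The expression std * num_std_units * multiplier given for the matching multiplier keywords can be evaluated directly. The Python sketch below uses one standard deviation from the example "STD:" lines and the documented defaults; how AutoAssign applies the resulting bounds internally is not described here, so the sketch only evaluates the expression.

```python
def matching_tolerance(std, num_std_units, multiplier=1.0):
    """Evaluate std * num_std_units * multiplier."""
    return std * num_std_units * multiplier


std = 0.0928806    # HNCOCA.3 / HNCA.3 value from the example control file
units = 3          # default link_matching_std_units

print(matching_tolerance(std, units))          # without a multiplier
print(matching_tolerance(std, units, 1.5))     # default max_matching_multiplier
print(matching_tolerance(std, units, 0.75))    # default min_matching_multiplier
```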


"Tolerances:" specifies default intraresidue (in the root dimensions for inclusion in a spin system) and sequential match tolerances for the atom types listed. THESE TOLERANCES ARE NOW SUPERSEDED BY THE VALUES IN THE "STD" SECTION.  All atoms detected in the spectra should be listed here, even if that atom type does not participate in any "matching" per se. In this example, the sequential carbonyl frequency is detected in the HNCO spectrum, but there is no corresponding experiment to detect intraresidue CO frequencies. Even so, the CO atom is listed here, with a default tolerance arbitrarily set to 0.0.
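
The "Tolerances:" line is a flat list of atom/value pairs, which a user-written tool could read as follows. This is a minimal Python sketch; parse_tolerances is a hypothetical name, and the input is the line from the example control file.

```python
def parse_tolerances(line):
    """Turn 'Tolerances: HN 0.016 N15 0.2 ...' into {'HN': 0.016, ...}."""
    tokens = line.split()
    if not tokens or tokens[0] != "Tolerances:":
        raise ValueError('line does not start with "Tolerances:"')
    values = tokens[1:]
    if len(values) % 2:
        raise ValueError("expected atom/tolerance pairs")
    return {atom: float(tol) for atom, tol in zip(values[0::2], values[1::2])}


tolerances = parse_tolerances(
    "Tolerances: HN 0.016 N15 0.2 CA 0.3 CB 0.3 HA 0.03 CO 0.4")
print(tolerances)
```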


"Spectra:" specifies the start of the spectra section which specifies the peak lists used in this dataset.  This section is preceded by the keyword "Spectra:" on a line by itself.  Then each peak list file is specified as follows.  The section is ended with a ":" on a line by itself.

line 1: <name> <ref-spec> <file> <intraresidue> <sequential> <through-space> [phase: {}] [NH2_PHASE] {

Dimension Specifications:
For each dimension of a spectrum, the following information is provided on a single line:

{ <atom> <sw 0> <sw 1> <correction> <tolerance> <folded/unfolded> [intra] [seq] [any] [phase: {}] [print_order val] [print_ref val] }

<atom>
may be a single atom name or a list of atoms delimited by curly braces. The atom names should correspond to those listed for "Tolerances:".
<sw 0> and <sw 1>
should specify the lower and upper bounds of the sweepwidth respectively, in ppm.
<correction>
is used to specify an external correction to the frequencies in that dimension. For example, if the reference molecule used for the C13 dimensions is non-standard, a correction value should be specified to adjust the chemical shift frequencies in those dimensions.
<tolerance>
a match tolerance is again specified for atoms in this dimension.  This is now deprecated and not used; however, a value still must be placed here.
<folded/unfolded>
this field uses a string to indicate whether or not the values in this dimension are folded. If so, AutoAssign will use the specified sweepwidths for this spectrum to locate another (unfolded) spectrum which "contains" the sweepwidth of this spectrum and attempt to "unfold" the values.
[intra] [seq] [any]
these are the "intraresidue", "sequential", and "through-space" keywords that override the values given for the spectrum as a whole.
[phase: {}]
the optional "phase" field specifies how negative intensities should be interpreted for this dimension.
[print_order val]
the value is an integer indicating the order of the dimension when printing assigned peak lists.
[print_ref val]
absolute reference shift to apply to this dimension when printing assigned peak lists.

Be careful to have a space between unfolded/folded and the ending "}".  Unfortunately this version of AutoAssign is sensitive to it and will flag it as an error in the table file.
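
A dimension line can be taken apart with a short Python sketch such as the one below. The function parse_dimension and its result layout are assumptions for illustration; the optional trailing keywords are returned as raw tokens rather than interpreted.

```python
import re


def parse_dimension(line):
    """Parse '{ <atom> <sw 0> <sw 1> <correction> <tolerance> <folded/unfolded> ... }'."""
    inner = line.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("dimension line must be wrapped in curly braces")
    inner = inner[1:-1].strip()
    match = re.match(r"\{([^}]*)\}\s*(.*)", inner)
    if match:                      # atom list such as {CA CB }
        atoms, rest = match.group(1).split(), match.group(2).split()
    else:                          # single atom name
        tokens = inner.split()
        atoms, rest = tokens[:1], tokens[1:]
    return {
        "atoms": atoms,
        "sw": (float(rest[0]), float(rest[1])),   # lower and upper bound, ppm
        "correction": float(rest[2]),
        "tolerance": float(rest[3]),              # deprecated but required
        "folded": rest[4] == "folded",
        "options": rest[5:],                      # left unparsed in this sketch
    }


print(parse_dimension("{HN       4  12   0  .02 unfolded }"))
print(parse_dimension("{{CA CB } 10  80  0 .5 unfolded }"))
```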


"STD:" specifies the standard deviations for matching dimensions between a base dimension and other dimensions.  These values are derived from the calculate_registration program and represent the standard deviation of matching that includes any covariance present between the two peak lists.  The format is as follows:

STD:  base_spectrum_name.dim#  spectrum_name1.dim# std_value  spectrum_name2.dim# std_value ...
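
For completeness, an "STD:" line in this format can be read with the following Python sketch. The name parse_std is hypothetical, and the input is one of the lines from the example control file.

```python
def parse_std(line):
    """Parse an 'STD:' line into (base_dimension, {other_dimension: std})."""
    tokens = line.split()
    # Expect "STD:", the base dimension, then name/value pairs.
    if len(tokens) < 4 or tokens[0] != "STD:" or len(tokens) % 2:
        raise ValueError("expected: STD: base name std [name std ...]")
    base = tokens[1]
    pairs = tokens[2:]
    stds = {name: float(value) for name, value in zip(pairs[0::2], pairs[1::2])}
    return base, stds


base, stds = parse_std("STD: HNCACB.3 HNCOCACB.3 0.0758873 HNCOCA.3 0.0623896")
print(base, stds)
```

Each name has the form spectrum_name.dim#, so "HNCACB.3" refers to the third dimension of the HNCACB spectrum.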