9.2 Database explorer

The Database explorer is a tool for managing molecular databases. Its basic operations fall into two groups: database operations (open, synchronize, close and close all) and molecule management functions (find, get, put, remove, rename and update). These functions are accessible through buttons and popup menus.
 

9.2.1 Usage

The Database explorer is shown automatically when a database is opened with the Open/create file requester, or when you select the File -> Database -> Explorer main menu item. Its window is fully resizable, and the Database and Molecule boxes can also be resized inside the window.

 

9.2.2.1 Database management

The left box contains the gadgets for managing databases: the Database list to select the current database, the Open button to open or create a database, the Close button to close the selected database and the Close all button to close all databases. You can also open a database by drag & drop: drop one or more database files onto the Database box to open them. Right-clicking a database item shows a popup menu that replicates the button functions (Open, Close and Close all) and adds the Synchronize item, which synchronizes large databases (LDB, see below).

 

9.2.2.2 Molecule extraction

The right box includes all controls for manipulating molecule data. To extract a structure from the database, select one or more molecules in the Molecule list (multiple selection is allowed), choose the Get mode to either add/replace the molecule in the current workspace or place it in a new workspace, and finally click the Get button. Instead of clicking the Get button, you can double-click the molecule name, use the popup menu or press the Return key, and the molecule is extracted automatically. The Next and Previous buttons are useful for scanning the database, because they get the structures in the database sequentially.

 

9.2.2.3 Molecule insertion

To insert a molecule in the database, open its structure in the current workspace and then click the Put button: a dialog box is shown in which you specify the name of the molecule. When you click the Ok button, the molecule is added to the database using the default parameters (molecule format, compression, connectivity and constraints) defined when the database was created. Alternatively, you can drag & drop one or more molecule files onto the Molecule box to add molecules that are not open in the current workspace. It is also possible to copy one or more molecules from one database to another by dragging them from the Molecule box and dropping them onto a previously opened database in the Database box. Dragging & dropping a database onto another one copies all its molecules to the destination database. During the copy operation, a progress bar is shown at the bottom of the VEGA ZZ main window. To stop the operation, click the Abort button next to the progress bar.
When you insert or update a molecule, it can be pre-processed before insertion. In particular, it can be converted to 3D, optimized by molecular mechanics (MM, provided by AMMP) and completed by adding the missing hydrogens. To enable, disable or configure these features, expand the Database explorer window by clicking the slim > button on the right of the window.

To restore the compact window, click the < button. The 2D to 3D box lets you disable (Never), enable (Always) or enable only if needed (When needed) the structure conversion. In the last case, the conversion is performed only if the starting structure is 2D; if the molecule is already 3D, the conversion step is skipped automatically. Energy minimization is strongly recommended after a 2D to 3D conversion. For this reason, Do steepest descent is automatically checked when you select Always or When needed.
The Add hydrogens box lets you disable (Never), enable (Always) or enable only if needed (When needed) the hydrogen addition procedure. The last option adds hydrogens only if they are missing. The best algorithm is selected automatically on the basis of the molecule type (protein, nucleic acid or generic organic molecule).
The Minimization box sets the parameters for the energy minimization phase. If Do steepest descent is checked, a steepest descent minimization is performed until the specified number of Steps is reached or the gradient falls below the tolerance (Toler value). In the same way, checking Do conjugate gradients enables the conjugate gradients minimization, which stops when either the number of Steps or the gradient tolerance (Toler) is reached. If Normalize the coordinates is checked, the molecule is translated to the origin of the Cartesian axes.
If you need to revert to the default parameters, click the Default button.
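
The two stopping criteria of the Minimization box (number of Steps and gradient tolerance) can be illustrated with a minimal sketch. This is not the AMMP implementation: the function name steepest_descent, the fixed step_size and the toy energy function are assumptions made only to show how a step limit and a gradient tolerance interact.

```python
import math


def steepest_descent(grad, x, steps=100, toler=0.01, step_size=0.1):
    """Follow the negative gradient from the starting point x.

    Stops as soon as the gradient norm drops below `toler`, or after
    `steps` iterations, whichever comes first.
    Returns (final point, iterations used, converged flag).
    """
    for iteration in range(steps):
        g = grad(x)
        norm = math.sqrt(sum(c * c for c in g))
        if norm < toler:
            return x, iteration, True
        # Move against the gradient by a fixed step (hypothetical choice).
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x, steps, False


def toy_gradient(p):
    """Gradient of the toy energy E(x, y) = x^2 + 2*y^2 (illustration only)."""
    return [2.0 * p[0], 4.0 * p[1]]


point, used, converged = steepest_descent(
    toy_gradient, [1.0, 1.0], steps=500, toler=1e-3
)
print(point, used, converged)
```

With a generous step limit the loop ends because the gradient tolerance is met; with a very small step limit it ends because the steps run out, which mirrors the behaviour described for the Steps and Toler fields.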

 

9.2.2.4 Filtering a database

Databases containing millions of molecules are very hard to manage. For this reason, a function to filter/query a database was implemented. It is more sophisticated than the Find feature (see the next section), because it can filter molecules on the basis of physicochemical properties, composition and so on. It is available only for SQL databases (e.g. SQLite), because it requires pre-calculated properties that aren't supported by other database formats such as SDF and Zip. To open the Database SQL filter window, click the Filter button.

In the Filter tab, the graphical interface lets you compose the query easily. Select the property to filter by clicking it in the Fields column. The following pre-calculated properties are available:

Field Type Description
Angles I Number of angles.
Atoms I Number of atoms.
Bonds I Number of bonds.
Charge F Total charge.
ChiralAtms I Number of chiral atoms.
Description S Molecule description.
Dipole F Dipole moment (Debye).
EzBonds I Number of bonds with E/Z geometry.
FlexTorsions I Number of flexible torsion angles, i.e. the number of rotatable bonds.
Formula S Molecular formula.
FuncGroups S Functional group list. The string has the following format (for internal use):
NUM_1 GRP_1 NUM_2 GRP_2 ... NUM_N GRP_N

where NUM is the number of functional groups of kind GRP. The functional groups are detected by the GROUPS.tem ATDL template (see the Data directory). The following table shows the functional group types that are detected:

Group Description
COOH Carboxylic acid.
COOR Ester.
CHO Aldehyde.
CON2 Urea.
CON Amide.
OCOO Carbonate.
COCl Acyl chloride.
COBr Acyl bromide.
CNH Aldimine.
CNR Imine.
CO Ketone.
OCN Cyanate.
NCO Isocyanate.
NCS Thiocyanate.
CN Nitrile.
N1 Primary amine.
N2 Secondary amine.
N3 Tertiary amine.
N+ Ammonium salt.
NP Aromatic planar nitrogen.
NO3 Nitrate.
NO2 Nitrite.
NNN Azide.
NO Nitroso.
NC Isocyanide.
OH2 Water.
OH Alcohol.
PhOH Phenol.
OR2 Ether.
2O2 Peroxide.
SO3H Sulfonic acid.
SO2 Sulfone.
SO Sulfoxide.
SH Thiol.
SR2 Thioether.
2S2 Disulfide.
PO4 Phosphate.
P3 Phosphine.
F Fluoride.
Cl Chloride.
Br Bromide.
I Iodide.

When you click this field, the list of functional groups is automatically enabled to help you build the query.

GroupID I Group identification number. The molecules in a database can be grouped and each group has a unique identification number.
Gyrrad F Gyration radius (Å).
HbAcc I Number of H-bond acceptor atoms.
HbDon I Number of H-bond donor atoms.
HeavyAtoms I Number of heavy atoms.
ID I Molecule identification number (primary key).
Impropers I Number of improper angles (out of plane).
Inchi S InChI string.
Lipole F Lipole (lipophilicity moment).
Mass F Molecular weight (Daltons).
Molecules I Number of molecules.
Name S Name of the molecule.
Ovality F Ovality (Å).
Psa F Polar surface area (Ų).
Rings I Number of rings in the molecule.
Sas F Solvent accessible surface (Ų).
Sav F Solvent accessible volume (ų).
Sdiam F Surface diameter (Å).
Smiles S Smiles string (not yet used).
Surface F Molecular surface (Ų).
Torsions I Number of torsion angles.
Vdiam F Volume diameter (Å).
VirtualLogP F Virtual logP.
Volume F Molecular volume (ų).

where the Type column indicates the type of field: F = floating point number, I = integer number, S = character string.
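
The FuncGroups string format (NUM_1 GRP_1 NUM_2 GRP_2 ... NUM_N GRP_N) can be decoded with a few lines of code. The following is a minimal sketch, assuming that the tokens are separated by whitespace; the function name parse_func_groups and the sample string are hypothetical and are not part of VEGA ZZ.

```python
def parse_func_groups(value):
    """Convert a 'NUM GRP NUM GRP ...' string into a {group: count} dict.

    Assumption: tokens are whitespace-separated and come in NUM/GRP pairs.
    """
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected NUM/GRP pairs, got an odd number of tokens")
    counts = {}
    for num, group in zip(tokens[0::2], tokens[1::2]):
        counts[group] = counts.get(group, 0) + int(num)
    return counts


# Hypothetical field value: one carboxylic acid and two alcohol groups.
groups = parse_func_groups("1 COOH 2 OH")
print(groups)
```

Since the manual describes this format as intended for internal use, treat any external parsing of the field as fragile.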

More properties can be added by the user through SQL operations or by other software; they automatically become available in the Fields column. After selecting the property to filter, select the operator, enter the value and click the Add button: the expression is added to the Conditions column. The following pre-defined operators are available:

Operator Description
= Equal.
<> Not equal.
< Less than.
> Greater than.
<= Less than or equal.
>= Greater than or equal.
GLOB Pattern matching with the same syntax as the Unix shell:
 
Expression Description
* Wildcard character: it represents any string in the pattern.
? It means any single character.
[ ] A range of characters. Example:
[a-b]*

The first character must be in the a-b range and the remainder of the string can be anything.

 

NOT GLOB Not glob (see above).
LIKE Same function as GLOB, but with the SQL pattern syntax:
 
Expression Description
% Wildcard character: it matches any string, like * in DOS pattern matching.
_ Matches any single character, like ? in DOS.

 

NOT LIKE Not like (see above).
REGEXP Same function as GLOB, but with regular expressions (RegExp). For more information, see http://www.regular-expressions.info/.
NOT REGEXP Not regexp (see above).
IS NULL The field must be empty.
NOT IS NULL The field mustn't be empty.
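
The difference between the pattern operators can be tried outside VEGA ZZ with Python's sqlite3 module. This is only a sketch: the table name Structures and the sample rows are hypothetical (only the Name and Mass field names come from the list above), and the REGEXP function is registered here from Python because plain SQLite does not supply one by default.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout, used only to exercise the operators.
conn.execute("CREATE TABLE Structures (ID INTEGER PRIMARY KEY, Name TEXT, Mass REAL)")
conn.executemany(
    "INSERT INTO Structures (Name, Mass) VALUES (?, ?)",
    [("Atropine", 289.37), ("aspirin", 180.16), ("benzene", 78.11), ("caffeine", None)],
)


def regexp(pattern, value):
    """Backs the SQL expression 'value REGEXP pattern' (pattern comes first)."""
    return int(value is not None and re.search(pattern, value) is not None)


conn.create_function("REGEXP", 2, regexp)


def names(where):
    """Return the sorted molecule names matching a WHERE expression."""
    rows = conn.execute(f"SELECT Name FROM Structures WHERE {where}")
    return sorted(row[0] for row in rows)


glob_hits = names("Name GLOB '[a-b]*'")     # GLOB is case-sensitive
like_hits = names("Name LIKE 'a%'")         # LIKE ignores ASCII case by default
single_hits = names("Name LIKE '_enzene'")  # _ matches exactly one character
regexp_hits = names("Name REGEXP 'ine$'")   # names ending in "ine"
null_hits = names("Mass IS NULL")           # rows with an empty Mass field

print(glob_hits, like_hits, single_hits, regexp_hits, null_hits)
```

In SQLite, GLOB is case-sensitive while LIKE is case-insensitive for ASCII characters by default, so the two operators can return different rows for patterns that look equivalent.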

You can add more than one condition; the conditions are combined with the AND logical operator. To edit an existing condition, click it in the Conditions column and change the value and/or the operator. To remove a condition, click it and press the Remove button.
In the SQL tab, you can run more complex queries by typing the SQL code directly.

Finally, click the Apply button to run the query: the molecules satisfying the user-defined conditions are shown in the Database explorer. To remove the filter, click the Rescan button.
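
The way several conditions combine with AND can be sketched as follows. The helper build_where, the table name Structures and the sample rows are hypothetical; the sketch only shows one possible way to turn a list of (field, operator, value) conditions into a single WHERE clause, and it is not how VEGA ZZ builds its queries internally.

```python
import sqlite3

# Operators accepted by this sketch (a subset of the table above).
ALLOWED_OPERATORS = {
    "=", "<>", "<", ">", "<=", ">=",
    "GLOB", "NOT GLOB", "LIKE", "NOT LIKE", "IS NULL",
}


def build_where(conditions):
    """Join (field, operator, value) conditions with AND.

    Returns the WHERE expression with ? placeholders and the parameter list.
    """
    clauses, params = [], []
    for field, operator, value in conditions:
        if not field.isidentifier():
            raise ValueError(f"invalid field name: {field!r}")
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator!r}")
        if operator == "IS NULL":
            clauses.append(f"{field} IS NULL")  # takes no value
        else:
            clauses.append(f"{field} {operator} ?")
            params.append(value)
    return " AND ".join(clauses), params


where, params = build_where([("Mass", "<", 500), ("HbDon", "<=", 5)])

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE Structures (Name TEXT, Mass REAL, HbDon INTEGER)")
conn.executemany(
    "INSERT INTO Structures VALUES (?, ?, ?)",
    [("mol_a", 180.0, 1), ("mol_b", 812.0, 3), ("mol_c", 300.0, 8)],
)
hits = [row[0] for row in conn.execute(
    f"SELECT Name FROM Structures WHERE {where} ORDER BY Name", params
)]
print(where, params, hits)
```

Passing the values as parameters rather than pasting them into the SQL text avoids quoting problems with molecule names that contain apostrophes.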

 

9.2.2.5 Other operations

A structure in the database can be updated by selecting it in the molecule list and clicking the Update button. The structure is replaced by the molecule in the current workspace.
A molecule in the database can be renamed by selecting it and clicking the Rename button, but remember that molecules included in a Zip file can't be renamed.
Similarly, a molecule can be removed from the database by selecting it and clicking the Remove button. Multiple selection is allowed.
The Rescan button is useful to force the update of the molecule list when the database has been changed by another application.
To find molecules in the active database, use the find function (Find button, or Find item in the popup menu). The search is case-insensitive and accepts wildcards (the * and ? characters). You can also find molecules from the keyboard: when you type a character, the molecule whose name starts with that character is automatically selected.
The status bar at the bottom of the window shows the total number of open databases, the total number of molecules, the number of molecules in the active database and the database type indicator (SDB, LDB, see below).
The Done button closes the Database explorer.
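
The behaviour of the Find function (case-insensitive search with * and ? wildcards) can be approximated with Python's fnmatch module. This is a sketch of the matching rule only: the function find_molecules and the sample names are hypothetical, and fnmatch also gives special meaning to [ ] sequences, which the Find function is not documented to support.

```python
from fnmatch import fnmatchcase


def find_molecules(names, pattern):
    """Return the names matching a wildcard pattern, ignoring case."""
    folded = pattern.lower()
    # Lower-case both sides so the comparison is case-insensitive on any OS.
    return [name for name in names if fnmatchcase(name.lower(), folded)]


# Hypothetical molecule list.
molecules = ["Benzene", "benzoic acid", "Toluene", "Phenol"]

starts_with_benz = find_molecules(molecules, "BENZ*")
one_char = find_molecules(molecules, "?henol")
ends_with_ene = find_molecules(molecules, "*ene")
print(starts_with_benz, one_char, ends_with_ene)
```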

 

9.2.2 Context menu

Right-clicking the Database list or the Molecule list shows the context menu:

Database context menu
Item Accelerator Description
Open Ctrl+O Open a new database.
Close Ctrl+C Close the database.
Close all - Close all databases.
Synchronize Ctrl+S Synchronize the database (see the next section).

 

Molecule context menu
Item Accelerator Description
Get Ctrl+G Get the molecule.
Put Ctrl+P Put the molecule in the database.
Update - Update the molecule.
Rename F2 Rename the molecule.
Find Ctrl+F Find a molecule in the list.
Rescan - Rescan the database to update the molecule list.
Save list - Save the molecule list to a file.
Copy list - Copy the molecule list to the clipboard.

 

9.2.3 Small database and large database

VEGA ZZ can operate on molecular databases in two modes: small database (SDB) and large database (LDB).

The status bar indicates the operation mode by showing the SDB or LDB label. The LDB label may carry a star (*), indicating that the selected database isn't synchronized. If the database is read-only, the Remove, Rename, Put, Synchronize and Update functions are disabled and the RO label appears in the status bar to the left of the SDB/LDB label.

 

9.2.4 Notes: