
  

1. Introduction
This Web service is based on PHP code generated by the Tree2C program and classifies molecules as permeant or non-permeant to the blood-brain barrier (BBB) through a decision tree. All attributes are calculated by VEGA; no other program is used.

2. Usage
Enter the molecule name in the Molecule name field, or copy and paste the structure of the molecule into the dedicated text box. Then click the Predict button.
Because the training set used in the learning phase to build the model contains molecules in neutral form, the molecules for which you want to predict BBB permeation must also be in neutral form.

3. About the decision tree model
To derive the model, Li's dataset (J. Chem. Inf. Model., 2005, 45, 1376-1384) was used as the learning set with the Weka 3.8 software. All molecules were converted from SMILES to 3D structures by VEGA ZZ and optimized by MOPAC 2016 (keywords: PM7 PRECISE GEO-OK SUPER), keeping them in neutral form. A total of 129 properties/attributes were calculated with VEGA ZZ and MOPAC 2016. The most significant attributes were selected with the BestFirst search algorithm (direction = Forward; lookupCacheSize = 1; searchTermination = 5) and the WrapperSubsetEval attribute evaluator (classifier = RandomForest with default settings; doNotCheckCapabilities = False; evaluationMeasure = accuracy, RMSE; folds = 5; seed = 1; threshold = 0.01), as implemented in Weka. This selection kept only 9 attributes, namely:

  • Bonds = number of bonds
  • Charge = total charge
  • HeavyAtoms = number of heavy atoms
  • Mass = molecule mass
  • Vdiam = volume diameter
  • VirtualLogP = molecular lipophilicity calculated as log P according to Testa's method
  • FG_aaNH = Kier-Hall E-state descriptor
  • FG_sCH3 = Kier-Hall E-state descriptor
  • FG_sssN = Kier-Hall E-state descriptor

The Charge descriptor appears in the list because the learning set includes quaternary ammonium compounds that were not neutralized with a counterion. None of the electronic descriptors calculated by MOPAC 2016 with the SUPER keyword was judged meaningful by the selection algorithm, so none was carried into the next phase.
The final model was obtained with the Random Forest machine learning algorithm implemented in Weka, using default parameters (bagging with 100 iterations and base learner) and 10-fold cross-validation. The results are summarized here:
 

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances    346    83.3735 %
Incorrectly Classified Instances   69    16.6265 %
Kappa statistic                   0.6179
Mean absolute error               0.2655
Root mean squared error           0.3647
Relative absolute error          59.5558 %
Root relative squared error      77.2667 %
Total Number of Instances       415

=== Detailed Accuracy By Class ===

              TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
              0.705    0.101    0.778      0.705   0.740      0.620  0.865     0.787     0
              0.899    0.295    0.858      0.899   0.878      0.620  0.865     0.912     1
Weighted Avg. 0.834    0.230    0.831      0.834   0.832      0.620  0.865     0.870

=== Confusion Matrix ===

  a   b <-- classified as
 98  41 | a = 0
 28 248 | b = 1
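
The headline figures in the report can be re-derived from the confusion matrix alone. The following is a minimal Python sketch, not part of the service: the function name `summarize` is hypothetical, and class 1 is treated as the positive class purely as a convention for the MCC formula.

```python
from math import sqrt

# Confusion matrix from the report above:
# rows = actual class (0, 1), columns = predicted class (0, 1).
MATRIX = [[98, 41],
          [28, 248]]


def summarize(matrix):
    """Recompute accuracy, Cohen's kappa, MCC and per-class TP rates.

    Class 1 is treated as "positive" here; this is only a convention
    chosen for the sketch.
    """
    tn, fp = matrix[0]
    fn, tp = matrix[1]
    n = tn + fp + fn + tp

    accuracy = (tp + tn) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "instances": n,
        "accuracy": accuracy,
        "kappa": kappa,
        "mcc": mcc,
        "tp_rate_0": tn / (tn + fp),
        "tp_rate_1": tp / (tp + fn),
    }


if __name__ == "__main__":
    for key, value in summarize(MATRIX).items():
        print(f"{key}: {value:.4f}" if isinstance(value, float) else f"{key}: {value}")
```

Rounded to the precision Weka prints, these values agree with the summary (accuracy 83.3735 %, kappa 0.6179, MCC 0.620, TP rates 0.705 and 0.899).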

Finally, the model was converted to PHP code by the Tree2C software.

4. How the service works

  1. The service connects to PubChem to search for and download the 3D structure of the molecule specified by the user. The PUG REST API is used to interface the service with PubChem. If the molecule is provided as text instead, that structure is used as input.
  2. The VEGA command-line program calculates the molecular descriptors.
  3. Each property value is checked against the range of values in the training set used to derive the model. If any property falls outside this prediction domain, the number of violations is shown.
  4. Finally, the classification is performed according to the PHP decision tree.
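
Steps 1 and 3 can be illustrated with a short Python sketch. It is not the service's code (the service itself is written in PHP): the function names are hypothetical, and the ranges in `TRAINING_RANGES` are placeholder values invented for illustration, not the real limits of the training set. The URL follows the general PUG REST pattern for requesting a 3D SDF record by compound name; no network request is made.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_3d_sdf_url(name):
    """Build a PUG REST URL asking for the 3D SDF record of a named compound."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/SDF?record_type=3d"


# Placeholder (min, max) ranges, assumed only for this example.
# The real ranges come from the training set and are not given here.
TRAINING_RANGES = {
    "HeavyAtoms": (3.0, 60.0),
    "Mass": (40.0, 900.0),
    "VirtualLogP": (-5.0, 8.0),
}


def domain_violations(descriptors, ranges=TRAINING_RANGES):
    """Return the names of descriptors whose values fall outside their range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if name in descriptors and not low <= descriptors[name] <= high
    ]


if __name__ == "__main__":
    print(pubchem_3d_sdf_url("acetyl salicylic acid"))
    sample = {"HeavyAtoms": 13, "Mass": 1200.0, "VirtualLogP": 1.2}
    violations = domain_violations(sample)
    print(f"{len(violations)} violation(s): {violations}")
```

Returning the list of offending descriptor names, instead of only a count, keeps the count available through `len()` while also showing which properties lie outside the prediction domain.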
  
