PeptBuilder
A tool to build libraries of peptides

Main topics:

1. Introduction
2. System requirements
3. Installation
4. Usage
5. History
6. Copyright and disclaimers


1. Introduction

This program was developed as a versatile tool to build large libraries of 3D peptide structures starting from their primary amino acid sequences. In detail, it can:


2. System requirements

PeptBuilder is compatible with both Linux (x86, x64, and ARM) and Windows (x86 and x64) operating systems. It requires the VegaBase and HyperDrive libraries, which are also used in the development of the VEGA and VEGA ZZ programs. Thanks to its parallel architecture, PeptBuilder can generate millions of structures in a short time. For instance, it can build the 3D structures of all possible tetrapeptides of the twenty natural amino acids (20^4 = 160,000 peptides) in less than 2.8 seconds (57,204 peptides/sec.) on a Windows 10 PC with a 12-core/24-thread CPU (AMD Ryzen 9 3900X, 3.8-4.2 GHz). The main bottleneck is the storage system, which must be fast enough to sustain this throughput: if you plan to build very large peptide libraries, a high-end disk system is essential.
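
When every position has the same number of alternatives, the library size is that number raised to the peptide length, which is why throughput and storage matter so quickly. The short Python sketch below (not part of PeptBuilder) reproduces the arithmetic of the tetrapeptide benchmark; the 57,204 peptides/sec. figure is the one quoted above, and a constant throughput is assumed.

```python
# Sketch: size of a combinatorial peptide library and the time needed to
# build it at a given (measured or assumed) throughput.
from math import prod


def library_size(choices_per_position):
    """Number of peptides = product of the alternatives at each position."""
    return prod(choices_per_position)


def build_time_seconds(n_peptides, peptides_per_second):
    """Build time, assuming a constant throughput."""
    return n_peptides / peptides_per_second


# All tetrapeptides of the 20 natural amino acids: 20^4 sequences.
n = library_size([20] * 4)
print(n)                                        # 160000
print(round(build_time_seconds(n, 57_204), 2))  # 2.8
```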


3. Installation

PeptBuilder is a command-line tool, which is included in both VEGA ZZ and VEGA packages:


4. Usage

If you run this utility from the command prompt without arguments, it prints the following help:

PeptBuilder V1.0.0.7 - (c) 2020-2025, Alessandro Pedretti

Usage: PeptBuilder -o[OUTFILE] -f[OUTFORMAT] -w[OUTFILTER] -j[SOLVENT SHAPE SIZE]
                   -i[CLUSTMAXITER] -k[CLUSTNUM] -l[SEQ_FILE] -m[MAXSEQ] -n[MODE]
                   -p[THREADS] -r[VALUE] -s[SECSTRUCT] -t[PHI PSI OMEGA] -z[NTERM CTERM]
                   -a[ALLELEMASK] -c[CLUSTFILE] -d[DIR] -e[TEMPLATE] -x PATTERN

PATTERN supports standard amino acids:
 L- ACDEFGHIKLMNPQRSTVWY
 D- acdefGhiklmnpqrstvwy
 as well as non-natural ones. For the complete list, see the "Amino acid.csv" file in the "Data\Rotamers"
 directory. Labels of 3-4 characters are permitted by placing them between parentheses and can be used
 together with the single-character labels, e.g.:
  (ALA)(CCS)V(PYZ1)

Amino acid patterns:
 (!NAT) or . -> L-, D- natural
 (!POS) or + -> L- positively charged (KHR)
 (!NEG) or - -> L- negatively charged (DE)
 (!ARO) or @ -> L- aromatic (FHWY)
 (!APL) or # -> L- apolar (ACFGILMPVWY)
 (!POL) or % -> L- polar (DEHKNQRST)
 (!SUL) or $ -> L- sulfured (CM)
 (!XXX) or X -> L- natural (ACDEFGHIKLMNPQRSTVWY)
 (!DXX) or x -> D- natural (acdefGhiklmnpqrstvwy)
 (!GLX) or Z -> L- GLX (EQ)
 (!DGL) or z -> D- glx (eq)
 (!ALL) -> L-, D- natural and non-natural
 (!LAL) -> L- natural and non-natural
 (!DAL) -> D- natural and non-natural
 (!NNN) -> L-, D- non-natural
 (!LNN) -> L- non-natural
 (!DNN) -> D- non-natural

Options:
 a -> MHC class I allele mask for immunogenicity prediction (default: 1, 2, C-Term)
      Available alleles:
      H-2-Db   , H-2-Dd   , H-2-Kb   , H-2-Kd   , H-2-Kk   , H-2-Ld,
      HLA-A0101, HLA-A0201, HLA-A0202, HLA-A0203, HLA-A0206, HLA-A0211,
      HLA-A0301, HLA-A1101, HLA-A2301, HLA-A2402, HLA-A2601, HLA-A2902,
      HLA-A3001, HLA-A3002, HLA-A3101, HLA-A3201, HLA-A3301, HLA-A6801,
      HLA-A6802, HLA-A6901, HLA-B0702, HLA-B0801, HLA-B1501, HLA-B1502,
      HLA-B1801, HLA-B2705, HLA-B3501, HLA-B3901, HLA-B4001, HLA-B4002,
      HLA-B4402, HLA-B4403, HLA-B4501, HLA-B4601, HLA-B5101, HLA-B5301,
      HLA-B5401, HLA-B5701, HLA-B5801
 b -> Build the 3D structures and save them in the specified format:
      Alchemy, AMMP, Biosym, ChemSol, CIF, CML, CML2, CPMDXYZ, CRD,
      CRT, CSSR, Fasta, GAMESS, GaussIn, Gromos, GromosNm, IFF,
      InChI, InChIAux, InChIKey, Indigo, MdlMol, MdlMol3, mmCIF,
      Mol2, MopCar, MopInt, MSF, NamdBin, OldBiosym, PDB, PDB2,
      PDBQ, PDBA, PDBF, PDBL, PDBNOTSTD, PDBQT, PQR, PQRXML, PSFX,
      QMC, RIFF, SMILES, SpilloRBS, TEST, VINA, XYZ.
      (TEST builds the structures, but does not save them)
 c -> Save the clusters in csv format
 d -> Output directory of the 3D structures (default: current)
 e -> 3D backbone template file. The supported file formats are:
      Alchemy, AMMP, Arc, CAR, CHARMM CRD, CIF, CML, CML 2.0,
      CPMD XYZ, CRT, Chem3D, ChemDraw CDX, ChemSol, CSSR, EMPIRE,
      ESCHER NG, GAMESS, Gaussian In/Out, Gromacs/Gromos mol,
      HIN, IFF, MDL, MDL V3000, Mol2, Mopac, MSF, NAMD binary,
      PDB, PDBA, PDBF, PDBL, PDBQT, PQR, PQRXML, PSFX, QMC,
      Quanta CSR, RIFF, SDF, TINKER XYZ, XYZ.
 f -> Output file format (CSV, FASTA, TEXT, default CSV)
 j -> solvate the peptide:
      SOLVENT = solvent to use:
                ACETONE, AMMONIA, CCL4, CH2CL2, CHCL3, DMSO, ETHANOL,
                FORMALDEHYDE, KETOPROFEN_RACEMATE, METHANE, METHANOL,
                OCTANOL-WATER, POPC_CRYSTAL, POPC_LIQUID, WATER, WATER_100,
                WATER_200, WATER_35_MOPAC
      SHAPE   = BOX, BOXLAYER, LAYER and SPHERE
      SIZE    = box size (XxYxZ), box increment (value), layer
                thickness and sphere radius
 i -> Maximum number of iterations for cluster analysis (default 100)
 k -> Number of clusters (default 0 = no cluster analysis)
 l -> Load the sequences from file
 m -> Limit the maximum number of sequences to build (0 = unlimited,
      default 0)
 n -> Neutralization method:
      EXPLICIT = place the counterions (Na+ or Cl-) to neutralize
      IMPLICIT = scale all charges to 0
 o -> Output file (optional)
 p -> Number of threads (0 = all, default 0)
 r -> Use rotamer library when you build the 3D structures:
      - Positive integer number > 1 sets the maximum number
        of rotamers per residue
      - Positive floating point number > 0 and <= 1 sets the
        probability threshold of the rotamers
 s -> Secondary structure (optional)
      Residue-by-residue mode:
        H = Alpha helix
        L = Left-handed helix
        3 = 3.10 helix
        P = Pi helix
        E = Beta strand
        A = Anti-parallel beta strand
        B = Parallel beta strand
        U = Default dihedral values
      Automatic mode: AUTO METHOD
        GORIV = use GOR IV for the prediction
 t -> Set the default values of PHI PSI and OMEGA torsions
      (default: -135.0, 135.0, 180.0)
 w -> Filter to save sequences (ALL, CLUSTCTRS, default ALL)
 x -> Exclude/remove metabolically unstable peptides
 z -> protection of N-term and C-term (default: H3N+ O-):
      NTERM: NONE, H3N+, HCONH, H3CCONH
      CTERM: NONE, O-, OH, OCH3, OC2H5, NH2

Examples:
 PeptBuilder -k 10 AGX+(SEP)
 PeptBuilder [+-]VS@[NQ](!LNN)
 PeptBuilder -b mol2 -s HHH -k 100 -w CLUSTCTRS -x XXX
 PeptBuilder -b pdb -s EEE -r 2 XXX
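
The following Python sketch illustrates how a PATTERN could be expanded into sequences, based only on the rules listed in the help text above. It is not PeptBuilder's implementation: all function names are hypothetical, only the natural-residue classes are modeled (classes that include non-natural residues, such as (!ALL) or (!LNN), need the list in "Amino acid.csv" and are rejected here), and the bracket notation of the second example ([+-], [NQ]) is not handled because it is not described above. The max_seq argument mimics the -m option.

```python
# Hypothetical sketch of the PATTERN syntax: tokenizing, counting and
# enumerating the sequences generated by a pattern (natural residues only).
import itertools
import re
from math import prod

L_NAT = "ACDEFGHIKLMNPQRSTVWY"
D_NAT = "acdefGhiklmnpqrstvwy"  # glycine is achiral, so it stays "G"

# Wildcards and their (!XXX) aliases, taken from the help text.
CLASSES = {}
for symbol, alias, residues in [
    (".", "!NAT", L_NAT + D_NAT),
    ("+", "!POS", "KHR"),
    ("-", "!NEG", "DE"),
    ("@", "!ARO", "FHWY"),
    ("#", "!APL", "ACFGILMPVWY"),
    ("%", "!POL", "DEHKNQRST"),
    ("$", "!SUL", "CM"),
    ("X", "!XXX", L_NAT),
    ("x", "!DXX", D_NAT),
    ("Z", "!GLX", "EQ"),
    ("z", "!DGL", "eq"),
]:
    CLASSES[symbol] = CLASSES[alias] = set(residues)

# One parenthesized 3-4 character label, or one single character.
TOKEN = re.compile(r"\(([^()]{3,4})\)|([^()])")


def tokenize(pattern):
    tokens, pos = [], 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"cannot parse pattern at position {pos}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


def choices(token):
    """Set of residues allowed at one position."""
    if token in CLASSES:
        return CLASSES[token]
    if len(token) == 1 and token in L_NAT + D_NAT:
        return {token}
    if len(token) > 1 and not token.startswith("!"):
        return {token}  # named residue, e.g. (ALA) or a non-natural one
    # (!ALL), (!NNN), ... need the list in "Amino acid.csv"; not modeled here.
    raise ValueError(f"unsupported token: {token}")


def library_size(pattern):
    return prod(len(choices(t)) for t in tokenize(pattern))


def enumerate_sequences(pattern, max_seq=0):
    """Lazily yield sequences as tuples; max_seq=0 means unlimited (cf. -m)."""
    per_position = [sorted(choices(t)) for t in tokenize(pattern)]
    return itertools.islice(itertools.product(*per_position), max_seq or None)


print(tokenize("(ALA)(CCS)V(PYZ1)"))       # ['ALA', 'CCS', 'V', 'PYZ1']
print(library_size("AGX+(SEP)"))           # 60
print(list(enumerate_sequences("A+", 2)))  # [('A', 'H'), ('A', 'K')]
```

Enumerating lazily keeps memory use flat even for patterns that generate millions of sequences.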

All parameters are optional except the amino acid pattern (PATTERN). The meaning of the other parameters is summarized in the following table:

Option Argument Description
-a ALLELEMASK Predicts the immunogenicity of the peptide. The prediction algorithm is based on the one used by the Class I Immunogenicity online tool available at http://tools.iedb.org/immunogenicity/.
-b FORMAT Builds the 3D structures of the peptides and saves them in one of the formats supported by VEGA and VEGA ZZ.
-c CLUSTFILE Saves the clusters in CSV format.
-d DIR Output directory of the 3D structures (default: current).
-e TEMPLATE Peptide structure file used as a template for the backbone. It must be in one of the file formats supported by VEGA and VEGA ZZ.
-f OUTFORMAT Output format of the sequences: CSV (default), FASTA or TEXT.
-j SOLVENT SHAPE SIZE Solvates the peptide:
Value Description
SOLVENT Solvent to use: ACETONE, AMMONIA, CCL4, CH2CL2, CHCL3, DMSO, ETHANOL,
FORMALDEHYDE, KETOPROFEN_RACEMATE, METHANE, METHANOL,
OCTANOL-WATER, POPC_CRYSTAL, POPC_LIQUID, WATER, WATER_100,
WATER_200, WATER_35_MOPAC
SHAPE BOX, BOXLAYER, LAYER and SPHERE
SIZE Its meaning depends on the shape type:
  • box size (XxYxZ) for BOX
  • box increment (floating point) for BOXLAYER
  • layer thickness (floating point) for LAYER
  • sphere radius (floating point) for SPHERE


-i CLUSTMAXITER Sets the maximum number of iterations allowed during cluster analysis with the k-means algorithm (default 100).
-k CLUSTNUM Specifies the number of clusters to be created by the k-means clustering algorithm (default 0 = no cluster analysis).
-l SEQ_FILE Sequences are loaded from a file rather than being provided via the command line.
-m MAXSEQ Limits the maximum number of sequences to build (0 = unlimited). The default value is 0.
-n MODE When you build the 3D structures, you can neutralize the peptides explicitly (EXPLICIT keyword), placing counterions (Na+ or Cl-), or implicitly (IMPLICIT keyword), scaling the total charge to 0.
-o OUTFILE Output file (optional).
-p THREADS Sets the number of threads for the calculation. The default value is 0 (= all possible threads).
-r VALUE Uses the rotamer library to add the side chains of the amino acids. The value selects the rotamers used to search for the lowest-energy structure: a positive integer greater than 1 sets the maximum number of rotamers per residue, while a positive floating-point number greater than 0 and less than or equal to 1 sets the probability threshold of the rotamers.
-s SECSTRUCT With this option, you can specify the secondary structure of each residue using these single-character codes:
Character Secondary structure
H Alpha helix
L Left-handed helix
3 3.10 helix
P Pi helix
E Beta strand
A Anti-parallel beta strand
B Parallel beta strand
U Default dihedral values


-t PHI PSI OMEGA Sets the default Phi, Psi and Omega dihedral angles. The default values are respectively -135.0, 135.0 and 180.0, which correspond to a beta-sheet secondary structure.
-w OUTFILTER Filters the sequences to save according to these values (default ALL):
  • ALL = saves all sequences
  • CLUSTCTRS = saves only the sequences closest to the cluster center
-x - Excludes metabolically unstable peptides.
-z NTERM CTERM Type of N-term and C-term capping (default: H3N+ O-):

NTERM: NONE, H3N+, HCONH, H3CCONH
CTERM: NONE, O-, OH, OCH3, OC2H5, NH2

*All options are case-insensitive.
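
The meaning of the -j SIZE argument depends on SHAPE. The Python sketch below shows one way to validate the three -j arguments against the lists in the table above; the function name and dictionary keys are hypothetical, and how PeptBuilder actually splits and parses these arguments on its command line is an assumption.

```python
# Hypothetical sketch: validating the -j SOLVENT SHAPE SIZE arguments.
SOLVENTS = {
    "ACETONE", "AMMONIA", "CCL4", "CH2CL2", "CHCL3", "DMSO", "ETHANOL",
    "FORMALDEHYDE", "KETOPROFEN_RACEMATE", "METHANE", "METHANOL",
    "OCTANOL-WATER", "POPC_CRYSTAL", "POPC_LIQUID", "WATER", "WATER_100",
    "WATER_200", "WATER_35_MOPAC",
}
# Meaning of SIZE for the shapes that take a single floating-point value.
SIZE_MEANING = {"BOXLAYER": "increment", "LAYER": "thickness", "SPHERE": "radius"}


def parse_solvation(solvent, shape, size):
    solvent, shape = solvent.upper(), shape.upper()  # options are case-insensitive
    if solvent not in SOLVENTS:
        raise ValueError(f"unknown solvent: {solvent}")
    if shape == "BOX":
        dims = tuple(float(v) for v in size.lower().split("x"))
        if len(dims) != 3:
            raise ValueError("BOX size must be given as XxYxZ")
        return {"solvent": solvent, "shape": shape, "box": dims}
    if shape in SIZE_MEANING:
        return {"solvent": solvent, "shape": shape, SIZE_MEANING[shape]: float(size)}
    raise ValueError(f"unknown shape: {shape}")


print(parse_solvation("water", "box", "30x30x40"))
# {'solvent': 'WATER', 'shape': 'BOX', 'box': (30.0, 30.0, 40.0)}
print(parse_solvation("DMSO", "sphere", "15"))
# {'solvent': 'DMSO', 'shape': 'SPHERE', 'radius': 15.0}
```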

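The table states that -k and -i control a k-means cluster analysis and that -w CLUSTCTRS keeps the sequences closest to the cluster centers, but not which descriptors are clustered or how the centers are seeded. The Python sketch below is a plain Lloyd's k-means under those unknowns: the function names and the 2-D descriptor vectors are hypothetical, the evenly spaced seeding is our own choice, and PeptBuilder's results may differ.

```python
# Hypothetical sketch of -k/-i/-w CLUSTCTRS: k-means on descriptor vectors,
# then one representative (closest to the center) per cluster.
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; stops on convergence or after max_iter iterations."""
    if not 1 <= k <= len(points) or max_iter < 1:
        raise ValueError("need 1 <= k <= len(points) and max_iter >= 1")
    # Simple deterministic seeding: k points evenly spaced through the input.
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def cluster_representatives(names, points, centers, labels):
    """For each cluster, the sequence whose vector is closest to its center."""
    best = {}
    for name, p, lab in zip(names, points, labels):
        d = math.dist(p, centers[lab])
        if lab not in best or d < best[lab][1]:
            best[lab] = (name, d)
    return {c: name for c, (name, _) in sorted(best.items())}


# Hypothetical 2-D descriptors for six tripeptides.
names = ["AAA", "AAG", "GAA", "WWW", "WYW", "YWW"]
points = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.1)]
centers, labels = kmeans(points, k=2)
print(labels)                                                   # [0, 0, 0, 1, 1, 1]
print(cluster_representatives(names, points, centers, labels))  # {0: 'GAA', 1: 'WWW'}
```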

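The -r VALUE is interpreted in two ways depending on its range. The Python sketch below mirrors that rule; the function names, rotamer names and probabilities are made up, and keeping each rotamer whose own probability reaches the threshold (rather than, for example, a cumulative cut-off) is an assumption, since the table does not specify it.

```python
# Hypothetical sketch of how the -r VALUE could be interpreted.
def parse_rotamer_value(text):
    """Integer > 1: max rotamers per residue; 0 < value <= 1: probability threshold."""
    value = float(text)
    if value > 1:
        if not value.is_integer():
            raise ValueError("values greater than 1 must be integers")
        return ("max_rotamers", int(value))
    if value > 0:
        return ("probability_threshold", value)
    raise ValueError("VALUE must be positive")


def select_rotamers(rotamers, mode):
    """rotamers: (name, probability) pairs for one residue (hypothetical data)."""
    kind, limit = mode
    ranked = sorted(rotamers, key=lambda r: r[1], reverse=True)
    if kind == "max_rotamers":
        return ranked[:limit]
    # Assumption: a rotamer is kept if its own probability reaches the threshold.
    return [r for r in ranked if r[1] >= limit]


rotamers = [("rot2", 0.3), ("rot1", 0.5), ("rot3", 0.2)]
print(select_rotamers(rotamers, parse_rotamer_value("2")))
# [('rot1', 0.5), ('rot2', 0.3)]
print(select_rotamers(rotamers, parse_rotamer_value("0.4")))
# [('rot1', 0.5)]
```
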
4.1 Command line examples

Here are some examples to clarify the use of PeptBuilder:


5. History


6. Copyright and disclaimers

All trademarks and software directly or indirectly referred to in this document are copyrighted by their legal owners. PeptBuilder is a freeware program and can be distributed through the Internet, BBS, CD-ROM and other electronic formats. The Author of this program accepts no responsibility for hardware/software damage resulting from the use of this package. No warranty is made about the software or its performance. Use and copying of this software and the preparation of derivative works based on this software are permitted, so long as the following conditions are met:


PeptBuilder
is a software developed in 2025
by Alessandro Pedretti
All rights reserved.

Alessandro Pedretti
Dipartimento di Scienze Farmaceutiche
Università degli Studi di Milano
Via Luigi Mangiagalli, 25
I-20133 Milano - Italy
Tel. +39 02 503 19332
Fax. +39 02 503 19359
E-Mail: info@vegazz.net
WWW: http://www.vegazz.net