Note: Vaxign and Yongqun "Oliver" He have been cited in the news article "Reverse vaccinology on the cusp" by Dan Jones. Nat Rev Drug Discov. 2012 Feb 10;11(3):175-6. doi: 10.1038/nrd3679. PMID: 22322255. Please click HERE to find more papers that have cited Vaxign.
In the post-genomic era, vaccine development strategies have progressed dramatically from Pasteur’s traditional principles of isolating, inactivating, and injecting the causative agent of an infectious disease to reverse vaccinology, which starts from bioinformatics analysis of genome information. Vaxign is the first freely available web-based vaccine design program developed to facilitate reverse vaccinology. Here we provide relevant documentation about Vaxign:
Edison Ong, Michael F Cooke, Anthony Huffman, Zuoshuang Xiang, Mei U Wong, Haihe Wang, Meenakshi Seetharaman, Ninotchka Valdez, Yongqun He, Vaxign2: the second generation of the first Web-based vaccine design program using reverse vaccinology and machine learning, Nucleic Acids Research, 2021 [Journal Link]
He Y, Xiang Z, Mobley HLT. Vaxign: the first web-based vaccine design program for reverse vaccinology and an application for vaccine development. Journal of Biomedicine and Biotechnology. Volume 2010 (2010), Article ID 297505, 15 pages. [PMID: 20671958] (Note: this paper introduces the Vaxign program and a use case showing how to use Vaxign for uropathogenic E. coli vaccine target prediction. Please use this paper as your formal Vaxign citation.)
He Y, Xiang Z. Bioinformatics analysis of Brucella vaccines and vaccine targets using VIOLIN. Immunome Research. 2010 Sep 27;6 Suppl 1:S5. PMID: 20875156. PMCID: PMC2946783. (Note: this paper introduces a use case showing how to use Vaxign for Brucella vaccine target prediction.)
Papers that cited Vaxign through collaboration with the Vaxign team:
McNamara L, He Y, Yang Z. Using epitope predictions to evaluate efficacy and population coverage of the Mtb72f vaccine for tuberculosis. BMC Immunology. 2010 Mar 30;11(1):18. [PMID: 20353587] [Journal Link]
Ma J, He Y, Hu B, Luo ZQ. Genome sequence of an environmental isolate of the bacterial pathogen Legionella pneumophila. Genome Announc. 2013 Jun 27;1(3). PMID: 23792742. PMCID: PMC3675512.
For more papers that have cited Vaxign, please check HERE.
Vaxign2 Pipeline for Vaccine Target Prediction:
Vaxign2 includes a pipeline of software programs that predicts possible vaccine targets based on various vaccine design criteria, using microbial genomic and protein sequences as input data. The features predicted in the Vaxign2 pipeline include subcellular localization, adhesin probability, epitope binding to MHC class I and class II, and sequence similarity to human, mouse, and/or pig proteins. The pipeline integrates both existing open-source tools and an internally developed program (Vaxitope) behind user-friendly web interfaces. Vaxign2 can predict vaccine targets from protein sequences at the whole-genome level or from individual protein sequences. The pipeline includes the following steps:
Subcellular localization: Surface-exposed proteins, such as outer membrane proteins (especially adhesins) and secreted proteins, are usually ideal targets for vaccine development. Non-surface proteins, such as cytoplasmic or inner membrane proteins, may not be good targets for vaccine development.
Topology and Transmembrane helices: It is very difficult to clone, express, and purify proteins with more than one transmembrane spanning region. Therefore, it may be better to exclude proteins with multiple transmembrane spanning regions at the outset.
Adhesin probability: Adhesins are often good vaccine targets.
Epitope prediction: This step predicts both MHC class I and class II binding epitopes using Vaxitope, an internally developed program.
Similarity to host genome sequences: A vaccine candidate whose sequence is similar to a host protein (e.g., human, mouse, or pig) may induce autoimmunity in that host and is therefore a less desirable target.
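As a rough illustration of the transmembrane-helix criterion in the pipeline, the Python sketch below counts candidate transmembrane segments with a Kyte-Doolittle hydropathy sliding window (19 residues, mean hydropathy above 1.6). This is a swapped-in heuristic that is much cruder than the TMHMM-based prediction referenced in the parameter settings, so its counts will not match Vaxign's; the function name `count_tm_segments` and its thresholds are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def count_tm_segments(sequence: str, window: int = 19, threshold: float = 1.6) -> int:
    """Count runs of consecutive windows whose mean hydropathy exceeds threshold.

    Hypothetical, crude stand-in for a transmembrane helix predictor such as
    TMHMM. Residues not in the KD table (e.g. X) are scored as 0.
    """
    seq = sequence.upper()
    if len(seq) < window:
        return 0
    count = 0
    in_segment = False
    for start in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[start:start + window]) / window
        if mean > threshold:
            if not in_segment:
                count += 1  # entering a new hydrophobic run
                in_segment = True
        else:
            in_segment = False
    return count


if __name__ == "__main__":
    # Two 22-residue hydrophobic stretches separated by hydrophilic linkers.
    protein = "K" * 20 + "L" * 22 + "K" * 20 + "L" * 22 + "K" * 20
    print(count_tm_segments(protein))
```

With a maximum of one transmembrane helix, the example protein above (two hydrophobic stretches) would be dropped at this step.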
Vaxign-ML Pipeline for Machine Learning-based Vaccine Target Prediction:
Vaxign-ML is a supervised machine learning classification method for predicting protective antigens. To identify the best machine learning method and optimize its conditions, five machine learning algorithms (logistic regression, support vector machine, k-nearest neighbors, random forest, and extreme gradient boosting) were tested with biological and physicochemical features extracted from the Protegen database. Nested five-fold cross-validation and leave-one-pathogen-out validation were used to ensure an unbiased performance assessment and to evaluate the ability to predict vaccine candidates for a newly emerging pathogen. The best-performing model, Vaxign-ML (extreme gradient boosting trained on all Protegen data), was compared with three publicly available reverse vaccinology programs on a high-quality benchmark dataset and showed superior performance in predicting protective antigens.
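The two validation schemes described above can be sketched as index-splitting routines. In nested cross-validation, the inner folds (drawn only from the outer training portion) would be used for model and hyperparameter selection, while each outer test fold is used only for performance estimation. In leave-one-pathogen-out validation, all proteins from one pathogen are held out together to mimic a newly emerging pathogen. The function names are hypothetical and this sketch is not taken from the Vaxign-ML source code.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold(indices: Sequence[int], k: int, seed: int = 0) -> Iterator[Split]:
    """Shuffle indices and yield k (train, test) splits with disjoint test folds."""
    idx = list(indices)
    if k < 2 or k > len(idx):
        raise ValueError("k must be between 2 and the number of samples")
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def nested_cv(n: int, outer_k: int = 5, inner_k: int = 5, seed: int = 0):
    """Yield (outer_train, outer_test, inner_splits); inner splits never see outer_test."""
    for outer_train, outer_test in k_fold(range(n), outer_k, seed):
        inner_splits = list(k_fold(outer_train, inner_k, seed + 1))
        yield outer_train, outer_test, inner_splits


def leave_one_pathogen_out(pathogens: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """For each pathogen, hold out all of its proteins as the test set."""
    for held_out in sorted(set(pathogens)):
        train = [i for i, p in enumerate(pathogens) if p != held_out]
        test = [i for i, p in enumerate(pathogens) if p == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    labels = ["A", "A", "B", "C", "C", "C"]  # pathogen of origin per protein
    for name, train, test in leave_one_pathogen_out(labels):
        print(name, train, test)
```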
(Paper in preparation)
A standalone version of Vaxign-ML is available as a Docker image. The source code is available on GitHub.
Installation (Docker version >= 1.13.1, API version >= 1.26):
$ docker pull e4ong1031/vaxign-ml:v1.0
$ wget https://raw.githubusercontent.com/VIOLINet/Vaxign-ML-docker/master/VaxignML.sh
$ chmod a+x VaxignML.sh
$ ./VaxignML.sh [INPUT_FASTA] [OUTPUT_DIRECTORY] [ORGANISM_TYPE]
(You may need root privileges to run Docker commands.)
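For scripted use, the documented command can be wrapped from Python. The sketch below only assembles the three positional arguments of `VaxignML.sh` and runs it with `subprocess.run`; the helper names are hypothetical, and the accepted `ORGANISM_TYPE` values are not listed on this page, so check the Vaxign-ML-docker repository before filling in that argument.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_vaxignml_command(input_fasta, output_dir, organism_type,
                           script: str = "./VaxignML.sh") -> List[str]:
    """Assemble argv in the documented order: INPUT_FASTA OUTPUT_DIRECTORY ORGANISM_TYPE."""
    for label, value in (("input_fasta", input_fasta),
                         ("output_dir", output_dir),
                         ("organism_type", organism_type)):
        if not str(value).strip():
            raise ValueError(f"{label} must be non-empty")
    return [script, str(input_fasta), str(output_dir), str(organism_type)]


def run_vaxignml(input_fasta, output_dir, organism_type,
                 script: str = "./VaxignML.sh") -> subprocess.CompletedProcess:
    """Run the downloaded wrapper script; raises CalledProcessError on failure."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker was not found on PATH")
    if not Path(input_fasta).is_file():
        raise FileNotFoundError(str(input_fasta))
    cmd = build_vaxignml_command(input_fasta, output_dir, organism_type, script)
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with file paths that contain spaces.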
Open-Source Software Programs/Databases used in Vaxign2 and their Licenses:
PSORTb: PSORTb (v3.0.2) is a bacterial protein subcellular localization prediction tool and is regarded as one of the most precise tools of its kind. PSORTb is licensed under the GNU General Public License (GNU GPL).
SPAAN: Prediction of adhesins and adhesin-like proteins. According to the SPAAN publication, the SPAAN program is freely available.
BLAST: NCBI's sequence similarity search and alignment program. BLAST is a U.S. NCBI program in the public domain.
IEDB: The Immune Epitope Database and Analysis Resource. Immune epitope data obtained from IEDB are used to train Vaxitope, our epitope prediction program. Please check the IEDB Terms of Use.
XGBoost: An optimized distributed gradient boosting library implementing machine learning algorithms under the Gradient Boosting framework. XGBoost is licensed under the Apache 2.0 license.
Rationale, Parameters, and Options for Consideration and Filtering:
Our automatic Vaxign2 pipeline allows users to select and/or modify the following parameters:
Subcellular localization: Please select the localizations you wish to include. The options are (1) Cell Wall, (2) Cytoplasmic, (3) Cytoplasmic Membrane, (4) Extracellular, (5) Outer Membrane, (6) Periplasmic, (7) Unknown, and (8) Any Localization (the default). Please see the PSORTb help page for more details.
Transmembrane helices: Please enter the maximum number of transmembrane helices. The default value is 1 (see the TMHMM help page).
Adhesin probability: Please specify the minimum adhesin probability. The default value is 0.51 (Sachdeva et al.).
No similarity to human proteins: Check this option if you wish to exclude any protein that shows any similarity to a human protein.
No similarity to mouse proteins: Check this option if you wish to exclude any protein that shows any similarity to a mouse protein.
No similarity to pig proteins: Check this option if you wish to exclude any protein that shows any similarity to a pig protein.
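Taken together, the options above act as a conjunction of filters over per-protein predictions. The sketch below assumes those predictions (localization label, transmembrane helix count, adhesin probability, and host-similarity flags) have already been computed by the tools listed earlier; the class and field names are hypothetical, and leaving the host-similarity exclusions unchecked by default is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class ProteinPrediction:
    """Hypothetical record of precomputed Vaxign-style predictions for one protein."""
    protein_id: str
    localization: str            # PSORTb-style label, e.g. "Outer Membrane"
    tm_helices: int              # predicted number of transmembrane helices
    adhesin_probability: float   # SPAAN-style adhesin probability
    similar_to_human: bool = False
    similar_to_mouse: bool = False
    similar_to_pig: bool = False


@dataclass
class FilterParams:
    """Filter settings mirroring the documented options and defaults."""
    localizations: Optional[FrozenSet[str]] = None  # None means "Any Localization"
    max_tm_helices: int = 1
    min_adhesin_probability: float = 0.51
    exclude_human_similar: bool = False  # assumption: unchecked by default
    exclude_mouse_similar: bool = False
    exclude_pig_similar: bool = False


def passes_filters(p: ProteinPrediction, params: FilterParams) -> bool:
    """Return True only if the protein satisfies every selected criterion."""
    if params.localizations is not None and p.localization not in params.localizations:
        return False
    if p.tm_helices > params.max_tm_helices:
        return False
    if p.adhesin_probability < params.min_adhesin_probability:
        return False
    if params.exclude_human_similar and p.similar_to_human:
        return False
    if params.exclude_mouse_similar and p.similar_to_mouse:
        return False
    if params.exclude_pig_similar and p.similar_to_pig:
        return False
    return True


def filter_candidates(preds: List[ProteinPrediction], params: FilterParams) -> List[str]:
    """Return IDs of proteins passing all filters, preserving input order."""
    return [p.protein_id for p in preds if passes_filters(p, params)]
```

For example, `FilterParams(localizations=frozenset({"Outer Membrane", "Extracellular"}), exclude_human_similar=True)` keeps only surface or secreted proteins with at most one transmembrane helix, sufficient adhesin probability, and no similarity to human proteins.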
Vaxign & Vaxign-ML Benchmark:
Benchmarking performance of Vaxign and Vaxign-ML compared with other open-source reverse vaccinology tools
Vaxitop (previously named Vaxitope, renamed to avoid a name conflict) is an MHC class I and class II binding epitope prediction tool developed in Dr. Yongqun "Oliver" He's laboratory. Vaxitop is a position-specific scoring matrix (PSSM)-based epitope prediction program. Rather than using a percentage or a fixed number of top-ranked peptides as the cutoff, Vaxitop uses a statistical P value. Our studies indicate that a P value cutoff of 0.05 provides high and balanced sensitivity and specificity. Vaxitop also supports genome-wide queries against MHC molecules of different host species.
To evaluate the performance of Vaxitop, a receiver operating characteristic (ROC) curve was generated using the HLA-A*0201-specific PSSM. The result is shown below. The Area Under the ROC Curve (AUC) was 0.929 for predicting 9-amino-acid epitopes for the allele HLA-A*0201. The positive and negative testing datasets were obtained from IEDB. The positive HLA-A*0201 epitopes were used to calculate the True Positive Rate (sensitivity), and negative peptides of the same length for this allele were used to calculate the False Positive Rate (1 - specificity).
We have recently updated the program, and the 2012 version performs better than the original 2009 version (as shown below). In addition, we have installed the IEDB MHC class I and II epitope prediction programs on our system, allowing users to compare Vaxitop-predicted results with those from the IEDB tools.
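The PSSM-plus-P-value approach and the ROC evaluation described above can be sketched as follows. A peptide's score is the sum of the matrix entries for its residue at each position; its P value is estimated empirically against the scores of random peptides; and a peptide is reported as a predicted binder when P < 0.05. The AUC is computed as the probability that a randomly chosen positive peptide outscores a randomly chosen negative one (ties count as one half). The uniform random-peptide background, the toy matrix, and all function names are assumptions for illustration; Vaxitop's actual matrices and null model are not described on this page.

```python
import random
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Pssm = List[Dict[str, float]]  # one {residue: score} dict per peptide position


def score_peptide(peptide: str, pssm: Pssm) -> float:
    """Sum position-specific scores; residues missing from the matrix score 0."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must equal PSSM length")
    return sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(peptide.upper()))


def null_scores(pssm: Pssm, n_random: int = 2000, seed: int = 0) -> List[float]:
    """Score random peptides drawn uniformly from the 20 amino acids (assumed background)."""
    rng = random.Random(seed)
    k = len(pssm)
    return [score_peptide("".join(rng.choice(AMINO_ACIDS) for _ in range(k)), pssm)
            for _ in range(n_random)]


def empirical_p_value(score: float, null: Sequence[float]) -> float:
    """Fraction of null scores >= score, with a +1 correction so P is never 0."""
    hits = sum(1 for s in null if s >= score)
    return (hits + 1) / (len(null) + 1)


def predict_binders(sequence: str, pssm: Pssm, alpha: float = 0.05,
                    n_random: int = 2000, seed: int = 0) -> List[Tuple[int, str, float]]:
    """Slide the PSSM along a protein and keep peptides with empirical P < alpha."""
    null = null_scores(pssm, n_random, seed)
    k = len(pssm)
    hits = []
    for start in range(len(sequence) - k + 1):
        peptide = sequence[start:start + k]
        p = empirical_p_value(score_peptide(peptide, pssm), null)
        if p < alpha:
            hits.append((start, peptide, p))
    return hits


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy 3-position matrix favouring L, L, V (a real MHC PSSM would be 9 positions wide).
    toy = [{aa: (2.0 if aa == fav else -1.0) for aa in AMINO_ACIDS} for fav in "LLV"]
    print(predict_binders("AAALLVAAA", toy))
```

Because the cutoff is a P value rather than a top-N rank, the number of reported peptides varies with how many genuinely high-scoring windows a protein contains.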
Peters B, Sidney J, Bourne P, Bui HH, Buus S, Doh G, Fleri W, Kronenberg M, Kubo R, Lund O, Nemazee D, Ponomarenko JV, Sathiamurthy M, Schoenberger S, Stewart S, Surko P, Way S, Wilson S, Sette A. The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol. 2005 Mar;3(3):e91. PMID: 15760272.
Jones D. Reverse vaccinology on the cusp. Nat Rev Drug Discov. 2012 Feb 10;11(3):175-6. doi: 10.1038/nrd3679. PMID: 22322255. (News and Analysis)
He Y, Rappuoli R, De Groot A, Chen RT. Emerging vaccine informatics. Journal of Biomedicine and Biotechnology. 2010;2010:218590 (26 pages). Epub 2011 Jun 15. PMID: 21772787.
Siadat SD, Salmani AS, Aghasadeghi MR. Brucellosis Vaccines: An Overview. In: Lorenzo-Morales J, editor. Zoonosis. InTech; published online 4 April 2012. ISBN 978-953-51-0479-7. (Book chapter)
Tomar N, De RK. Immunoinformatics: an integrated scenario. Immunology. 2010 Oct;131(2):153-68. doi: 10.1111/j.1365-2567.2010.03330.x. Epub 2010 Aug 16. Review. PMID: 20722763.
AntiJen: A database of quantitative binding data for peptides and proteins, covering MHC ligands, TCR-MHC complexes, T cell epitopes, TAP, B cell epitopes, and immunological protein-protein interactions.