Vaxign: Vaccine Design

Table of ROC Areas Under Curve (AUCs)

To evaluate the performance of the Vaxitope epitope prediction program in Vaxign, the areas under the curve (AUCs) of the ROC analysis were calculated using individual allele-specific PSSMs. The positive and negative testing datasets were obtained from IEDB. The allele-specific positive data were used to calculate the true positive rates, and the negative data were used to calculate the false positive rates. A leave-one-out approach was applied to test whether a known epitope can be predicted when that epitope is excluded from the initial generation of the PSSMs.
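The evaluation described above can be illustrated with a minimal Python sketch. It is not the Vaxitope implementation: the log-odds PSSM with a uniform background and a pseudocount, the function names, and the toy peptides are all assumptions made for illustration. Each positive peptide is scored by a PSSM built without it (leave-one-out), negatives are scored by the PSSM built from all positives, and the AUC is computed as the fraction of positive/negative pairs ranked correctly.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length peptides.

    Assumption: uniform background frequencies and an additive pseudocount;
    the real Vaxitope scoring scheme may differ.
    """
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        column = {
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score(pssm, peptide):
    """Sum the per-position log-odds values for one peptide."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))


def leave_one_out_scores(positives):
    """Score each positive with a PSSM trained on all other positives."""
    scores = []
    for i, held_out in enumerate(positives):
        training = positives[:i] + positives[i + 1:]
        scores.append(score(build_pssm(training), held_out))
    return scores


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win (the Mann-Whitney formulation of the ROC AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy data, invented for illustration only.
positives = ["ALAKAAAAV", "ALAKAAAAL", "GLAKAAAAV", "ALGKAAAAV"]
negatives = ["WWWWWWWWW", "PPPPPPPPP"]

pos_scores = leave_one_out_scores(positives)
full_pssm = build_pssm(positives)
neg_scores = [score(full_pssm, pep) for pep in negatives]
auc = roc_auc(pos_scores, neg_scores)
```

Sweeping a score cutoff over `pos_scores` and `neg_scores` would give the individual sensitivity and specificity values that make up the ROC curve; the pairwise formula above yields the same area without building the curve explicitly.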

---------------------------

Sept 18-27, 2012: Using new training data from IEDB, we have updated our Vaxitope program.

Training Data: The training data was obtained by extracting the IEDB CSV data file downloaded on September 18, 2012, and is available HERE.

Results: The AUCs, specificities, and sensitivities of our predictions for each MHC class I allele are available HERE, and those for each MHC class II allele are available HERE.

Notes: The differences between this version and the previous version:

We used the newest training data collected from IEDB. The following describes how we use the training data:
For each MHC class I or II allele, the training data is treated as follows: if more than 200 peptides are labeled "positive high" in the IEDB database, we use only these "positive high" peptides and ignore those labeled "positive". Otherwise, we also include the peptides labeled "positive".
We do not use any IEDB peptides labeled "positive intermediate" for training our program.
For any allele that has fewer than 50 peptides labeled "positive" or "positive high" in the IEDB database, we do not include the allele in our prediction website.
The rationale for the above training data selection is that we can make very reliable predictions of MHC class I or II epitopes if we have over 200 peptides as training data. If not, we would like to have at least 50 positive peptides as training data for a sound prediction.
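The selection rules above can be summarized in a short Python sketch. The function name, the `(peptide, label)` record layout, and the exact label strings are assumptions for illustration; the actual column names and label spellings in the IEDB CSV export may differ.

```python
def select_training_peptides(records, high_threshold=200, min_positives=50):
    """Pick the positive training peptides for one MHC allele.

    records: iterable of (peptide, label) pairs for a single allele, where
    label is assumed to be one of "positive high", "positive",
    "positive intermediate", or some other value.
    Returns the selected peptides, or None if the allele has too little
    data to be offered for prediction.
    """
    records = list(records)
    high = [pep for pep, label in records if label == "positive high"]
    positive = [pep for pep, label in records if label == "positive"]
    # "positive intermediate" peptides are never used for training.

    # More than 200 "positive high" peptides: use only those.
    if len(high) > high_threshold:
        return high

    # Otherwise, add the peptides labeled "positive".
    combined = high + positive

    # Fewer than 50 usable positives: the allele is not included.
    if len(combined) < min_positives:
        return None
    return combined


# Hypothetical example: 30 "positive high" plus 40 "positive" peptides.
example = [("PEP%03d" % i, "positive high") for i in range(30)]
example += [("PEP%03d" % i, "positive") for i in range(30, 70)]
selected = select_training_peptides(example)
```

In this example `selected` holds all 70 peptides, since the allele has no more than 200 "positive high" peptides but at least 50 usable positives overall.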

Notes: The differences between MHC Class I and II epitope prediction:

For MHC class I allele epitope prediction: we match the length, i.e., we can predict epitopes of various lengths (e.g., 9, 10, 11, ...) if we have sufficient training data.
For MHC Class II allele prediction: we always predict 9-mer amino acid epitopes. However, we do use all the training data with different lengths from IEDB. The selection of the positive training data for MHC class II follows the same procedure as described above.
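To make the length handling concrete, the following Python sketch enumerates the candidate peptide windows that would be scored in a protein sequence. It is an illustration only: the function names and the `trained_lengths` argument (the peptide lengths for which a class I allele has enough training data) are hypothetical, and the sketch does not cover how class II training peptides of different lengths are reduced to 9-mers, which the notes above do not specify.

```python
def candidate_windows(sequence, length):
    """Yield (start, peptide) for every window of the given length."""
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]


def windows_for_allele(sequence, mhc_class, trained_lengths=(9,)):
    """Enumerate candidate epitopes for one allele.

    mhc_class: "I" or "II".
    trained_lengths: lengths with sufficient class I training data
    (hypothetical parameter; ignored for class II).
    """
    # Class II predictions are always 9-mers; class I matches the lengths
    # for which training data is available.
    lengths = (9,) if mhc_class == "II" else tuple(trained_lengths)
    return [
        (start, peptide)
        for length in lengths
        for start, peptide in candidate_windows(sequence, length)
    ]


# Toy 11-residue sequence, invented for illustration.
toy = "MKTAYIAKQRQ"
class_ii = windows_for_allele(toy, "II")
class_i = windows_for_allele(toy, "I", trained_lengths=(9, 10))
```

For the 11-residue toy sequence, the class II call produces three 9-mers, while the class I call with lengths 9 and 10 produces three 9-mers and two 10-mers.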

---------------------------

2009: The initial calculation of AUCs was performed in 2009. Below are the results calculated from the IEDB training data provided.

Training Data: The training data was downloaded from the IEDB dataset site (accessed in 2009), and is available HERE.

Results: The AUCs, specificities, and sensitivities of our predictions for each MHC allele are available HERE.

The table below includes results for the MHC alleles that we randomly chose for AUC calculation; it does not include all MHC alleles available in the Vaxign database. The results were obtained using the training data downloaded from IEDB in 2009.

More details about the method and results are available in our original Vaxign paper.