Comparison and scaling

Based on these datasets, a number of computational models could be developed to address different problems. Some models focus on distinguishing MHC-binding peptides from non-binding peptides, while others predict the binding affinity between MHC molecules and peptides. Different strategies should therefore be employed to evaluate their performance.

For the assessment of classification accuracy, the area under the ROC curve (AROC) could be used. The ROC curve plots the true positive rate TP/(TP+FN) on the vertical axis against the false positive rate FP/(TN+FP) on the horizontal axis over the complete range of decision thresholds. Values of AROC >= 0.9 indicate excellent predictions, 0.9 > AROC >= 0.8 good, 0.8 > AROC >= 0.7 marginal, and AROC < 0.7 poor predictions.
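The AROC calculation and the rating bands described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`roc_points`, `aroc`, `rate_aroc`) and the toy labels and scores are assumptions made for this example, and the area is obtained with the trapezoidal rule over the points of the ROC curve.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    labels: 1 for binder, 0 for non-binder (hypothetical encoding).
    scores: predicted scores, higher meaning "more likely a binder".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one binder and one non-binder")
    # Visit predictions from the highest score to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))  # FP/(TN+FP), TP/(TP+FN)
    return points


def aroc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def rate_aroc(value):
    """Map an AROC value to the qualitative bands given in the text."""
    if value >= 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "marginal"
    return "poor"


# Toy data, invented for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = aroc(labels, scores)
print(round(area, 3), rate_aroc(area))
```

In this toy set, eight of the nine binder/non-binder pairs are ranked correctly, so the area is 8/9 and falls in the "good" band.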

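To assess the accuracy of binding affinity predictions, the Pearson correlation coefficient could be used:

$$ r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}} $$

where $x_i$ and $\bar{x}$ are the experimental individual and average affinities; $y_i$ and $\bar{y}$ are the individual and average peptide predictions.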
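As a rough sketch of how the Pearson correlation coefficient between experimental affinities and predictions might be computed, the following Python function uses the standard definition of the coefficient. The function name `pearson_r` and the numbers in the usage lines are assumptions made for this example.

```python
import math


def pearson_r(measured, predicted):
    """Pearson correlation between measured and predicted affinities."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    # Sum of products of deviations from the means (numerator).
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    # Sums of squared deviations (denominator terms).
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    if ss_m == 0 or ss_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_m * ss_p)


# Invented values for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(pearson_r(measured, predicted))
```

A coefficient close to 1 means the predictions track the measured affinities linearly; the function raises an error when either series is constant, since the coefficient is undefined in that case.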

 

To enable visual comparison of predictions, it would be helpful to bring all the data to a common scale, e.g., 0-100, using a linear transformation:

$$ y' = 100 \cdot \frac{y - y_{\min}}{y_{\max} - y_{\min}} $$

where $y'$ is the scaled value, $y$ is the original value, $y_{\min}$ is the minimum and $y_{\max}$ is the maximum value.
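The linear scaling to a common 0-100 range can be illustrated with a short Python sketch. The function name `scale_linear` and its `low`/`high` parameters are assumptions made for this example; the defaults correspond to the 0-100 range mentioned in the text.

```python
def scale_linear(values, low=0.0, high=100.0):
    """Linearly map values so the minimum becomes `low` and the maximum `high`."""
    y_min = min(values)
    y_max = max(values)
    if y_max == y_min:
        raise ValueError("cannot scale constant data: ymax equals ymin")
    span = y_max - y_min
    return [low + (high - low) * (y - y_min) / span for y in values]


# Invented scores for illustration only.
print(scale_linear([2.0, 4.0, 6.0]))  # [0.0, 50.0, 100.0]
```

Because the transformation is linear, it preserves the ranking of the predictions and their relative spacing, so outputs from different predictors can be plotted on the same axis.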