Comparison and scaling
Based on these datasets, a number of computational models could be developed to address different problems. Some models focus on distinguishing MHC-binding peptides from non-binding peptides, while others predict the binding affinity between MHC molecules and peptides. Different strategies should therefore be employed to evaluate their performance.
For the assessment of classification accuracy, the area under the ROC curve (AROC) could be used. This curve is a plot of the true positive rate TP/(TP+FN) on the vertical axis versus the false positive rate FP/(TN+FP) on the horizontal axis over the complete range of decision thresholds, where TP, FN, FP and TN are the numbers of true positives, false negatives, false positives and true negatives. Values of AROC>=0.9 indicate excellent, 0.9>AROC>=0.8 good, 0.8>AROC>=0.7 marginal and AROC<0.7 poor predictions.
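As an illustration, the AROC and the quality bands above can be sketched in a few lines of Python. This is a minimal sketch, not a method taken from any particular prediction server: instead of sweeping the decision thresholds explicitly, it uses the equivalent rank-based (Mann-Whitney) identity, in which the AROC is the fraction of binder/non-binder pairs where the binder receives the higher score. The function names `roc_auc` and `aroc_category`, the 1/0 label encoding, and the convention that a higher score means a more likely binder are all assumptions made for this example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-based (Mann-Whitney) identity.

    labels: 1 for binders, 0 for non-binders (assumed encoding).
    scores: predicted scores; higher is assumed to mean "more likely binder".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    # Count binder/non-binder pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def aroc_category(aroc):
    """Map an AROC value to the qualitative bands given in the text."""
    if aroc >= 0.9:
        return "excellent"
    if aroc >= 0.8:
        return "good"
    if aroc >= 0.7:
        return "marginal"
    return "poor"


# Toy data (made up for illustration): two binders, two non-binders.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(example_auc, aroc_category(example_auc))  # 0.75 marginal
```

In the toy example, three of the four binder/non-binder pairs are ranked correctly, which gives an AROC of 0.75 and places the prediction in the marginal band.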
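To assess the accuracy of binding affinity predictions, the Pearson correlation coefficient could be used:
$$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} $$
where $x_i$ and $\bar{x}$ are the experimental individual and average affinities, and $y_i$ and $\bar{y}$ are the individual and average peptide predictions.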
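The Pearson correlation between experimental and predicted affinities can be computed directly from its definition. The following is a minimal sketch in plain Python; the function name `pearson_r` and its argument names are assumptions made for this example, and both inputs are assumed to be on comparable scales (for instance, both log-transformed) with one entry per peptide, in the same order.

```python
import math


def pearson_r(experimental, predicted):
    """Pearson correlation between experimental and predicted affinities."""
    if len(experimental) != len(predicted):
        raise ValueError("both lists must have one value per peptide")
    n = len(experimental)
    if n < 2:
        raise ValueError("need at least two peptides")
    x_mean = sum(experimental) / n
    y_mean = sum(predicted) / n
    # Sum of products of deviations from the means (numerator).
    cov = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(experimental, predicted))
    # Sums of squared deviations (terms under the square root).
    ss_x = sum((x - x_mean) ** 2 for x in experimental)
    ss_y = sum((y - y_mean) ** 2 for y in predicted)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)


# Toy values (made up): predictions that rise linearly with the measurements.
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A coefficient near 1 indicates that predictions increase linearly with the measured affinities, a value near -1 indicates an inverse linear relationship, and a value near 0 indicates no linear relationship.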
To enable visual comparison of the predictions, it would be helpful to bring all the data onto a common scale, e.g., 0-100, using a linear transformation:
$$ y_{scaled} = 100 \times \frac{y - y_{min}}{y_{max} - y_{min}} $$
where $y$ is the original value, $y_{scaled}$ is the scaled value, $y_{min}$ is the minimum and $y_{max}$ is the maximum value.
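A linear rescaling of this kind can be sketched as follows. This is an illustrative example only: the function name `scale_to_range` and its parameters are assumptions, and the minimum and maximum are taken from the list of values being scaled.

```python
def scale_to_range(values, new_min=0.0, new_max=100.0):
    """Linearly map values so that min(values) -> new_min and max(values) -> new_max."""
    y_min, y_max = min(values), max(values)
    if y_max == y_min:
        raise ValueError("cannot scale constant data: ymax equals ymin")
    span = y_max - y_min
    return [new_min + (v - y_min) * (new_max - new_min) / span for v in values]


# Toy scores (made up): the smallest maps to 0, the largest to 100.
print(scale_to_range([2.0, 4.0, 6.0]))  # [0.0, 50.0, 100.0]
```

Because the transformation is linear, it preserves the ranking of the peptides and the relative spacing of their scores, so neither the AROC nor the Pearson correlation of a single predictor is changed by it.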