ShenLab Webtools

Change Log

v0.0.3

  • Update MAGPIE task query page.

v0.0.2

  • Add results from other prediction tools.
  • Add prediction results for variants in GRCh37/hg19 assembly.

v0.0.1

  • Add pathogenicity classification.
  • Add ClinVar VUS prediction results.

Todo List

  • Add MAGPIE scores for all possible exonic variants as soon as possible. Prediction is currently in progress.


Introduction

MAGPIE was trained to predict pathogenicity scores for multi-type variants in three steps. First, candidate variants were annotated with high-dimensional features covering six different modalities. Second, automatic feature engineering and separate feature selection were carried out step by step. Finally, a gradient boosting method with controllable tuning was used to train the model and predict the pathogenicity of variants.
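
To make the final step more concrete, here is a minimal sketch of gradient boosting for binary classification, written in plain Python with decision stumps as weak learners. It is a simplified stand-in and not MAGPIE's actual implementation: the function names, the two toy features, the labels, and the values of `n_rounds` and `learning_rate` are all assumptions made for illustration, and the annotation, feature engineering, and feature selection steps are not reproduced.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= cut]
            right = [r for row, r in zip(X, residuals) if row[j] > cut]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, cut, left_mean, right_mean)
    if best is None:
        # Every feature is constant: fall back to a stump that predicts the mean.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def stump_predict(stump, row):
    j, cut, left_value, right_value = stump
    return left_value if row[j] <= cut else right_value

def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Boost stumps on the log-loss gradient. Assumes both classes occur in y."""
    pos_rate = sum(y) / len(y)
    base = math.log(pos_rate / (1.0 - pos_rate))  # initial log-odds
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the raw score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps

def predict_score(model, row):
    """Return a score in (0, 1); higher means more likely to be the positive class."""
    base, learning_rate, stumps = model
    raw = base + learning_rate * sum(stump_predict(s, row) for s in stumps)
    return sigmoid(raw)

# Toy data (hypothetical): two made-up features per variant, label 1 = pathogenic.
X_train = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.7, 0.2], [0.8, 0.3], [0.9, 0.1]]
y_train = [0, 0, 0, 1, 1, 1]
model = fit_gradient_boosting(X_train, y_train)
print(predict_score(model, [0.85, 0.15]))
```

In practice a tuned library implementation of gradient boosting would replace this loop; the sketch only shows how each round fits a weak learner to the current residuals and adds it to the running score.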

Workflow of MAGPIE
Fig. 1 | Workflow of MAGPIE. The model was trained to predict pathogenicity scores for multi-type variants in three steps: (1) annotation of candidate variants with high-dimensional features covering six different modalities; (2) automatic feature engineering and separate feature selection, carried out step by step; (3) training with a gradient boosting method with controllable tuning to predict the pathogenicity of variants.


Threshold Test
Fig. 2 | (A) Threshold evaluation on a balanced independent test set. (B) Threshold evaluation on an imbalanced orthogonal validation set. In both panels, line charts show the Matthews correlation coefficient (MCC), accuracy, precision, recall, F-beta score, and geometric mean (G-mean) at different thresholds.

For imbalanced datasets, we suggest a higher threshold to achieve more accurate predictions. However, when the composition of the dataset cannot be clearly estimated, MAGPIE with the default parameter still outperformed other methods and achieved accurate predictions on both imbalanced and balanced datasets, as shown before.
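
As an illustration of how such a threshold evaluation can be run on your own labeled variants, the sketch below computes the six metrics shown in Fig. 2 over a range of thresholds. It is not the code used to produce the figure. The function names, the example labels and scores, and the threshold grid are hypothetical, `beta=1.0` is an assumed default because the page does not state which beta was used, and G-mean is assumed to be the geometric mean of recall and specificity.

```python
import math

def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else 0.0

def threshold_metrics(y_true, scores, threshold, beta=1.0):
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    b2 = beta * beta
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_beta": safe_div((1 + b2) * precision * recall, b2 * precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }

# Hypothetical imbalanced example: 2 pathogenic (1) and 8 benign (0) variants.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.92, 0.71, 0.65, 0.55, 0.40, 0.33, 0.21, 0.15, 0.08, 0.03]

for threshold in [0.3, 0.5, 0.7, 0.9]:
    metrics = threshold_metrics(y_true, scores, threshold)
    summary = ", ".join(f"{name}={value:.2f}" for name, value in metrics.items())
    print(f"threshold={threshold:.1f}: {summary}")
```

Comparing the rows of such a sweep on a dataset with a known class ratio is one way to pick a threshold; which metric to optimize depends on whether false positives or false negatives are more costly in the application.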