mlscorecheck: testing the consistency of machine learning performance scores
- Preliminaries
- Binary classification
  - 1 testset with no k-fold
  - 1 dataset with k-fold, mean-of-scores (MoS)
  - 1 dataset with k-folds, score-of-means (SoM)
  - n testsets without k-folding, SoM over the testsets
  - n testsets without k-folding, MoS over the testsets
  - n datasets with k-folds, SoM over datasets and SoM over folds
  - n datasets with k-folds, MoS over datasets and SoM over folds
  - n datasets with k-folds, MoS over datasets and MoS over folds
  - Not knowing the mode of aggregation
  - Not knowing the k-folding scheme
- Multiclass classification
  - 1 testset, no k-fold, micro-averaging
  - 1 testset, no k-fold, macro-averaging
  - 1 dataset, known k-folds, score of means aggregation, micro-averaging
  - 1 dataset, known k-folds, score of means aggregation, macro-averaging
  - 1 dataset, known k-folds, mean of scores aggregation, micro-averaging
  - 1 dataset, known k-folds, mean of scores aggregation, macro-averaging
- Regression