Table 9 Strengths and limitations of commonly used models

From: Artificial intelligence-driven radiomics study in cancer: the role of feature engineering and modeling

| Type | Method | Strengths | Limitations |
|------|--------|-----------|-------------|
| ML | PCA | Retains most of the principal information; simple to compute | Some important information may be lost; interpretability is poor |
| ML | mRMR | Well suited to multi-class classification tasks | Correlations between feature interactions and the target variable are ignored |
| ML | LASSO | Handles multicollinearity well; results are easy to interpret | Tends to select only one of a set of highly correlated features |
| ML | CV | Evaluates a model more reasonably and accurately; extracts more useful information from limited data | Increases computational cost |
| ML | SMOTE | Mitigates the overfitting caused by simple over-sampling | Requires repeated tuning of important parameters |
| ML | LR | Low computational cost, fast, and easy to understand and implement | Handles only binary classification in its basic form; prone to underfitting |
| ML | SVM | Solves high-dimensional problems; strong generalization ability | Conventional SVM handles only binary classification; training on large samples is inefficient |
| ML | KNN | Suitable for nonlinear classification; high Acc | Memory-intensive; predictions are strongly biased when classes are imbalanced |
| ML | DT | Can be analyzed visually; runs fast | Prone to overfitting; ignores correlations between attributes in a dataset |
| ML | RF | Suitable for high-dimensional data; adapts well to diverse datasets | Poor at handling low-dimensional data; much slower than a single DT |
| ML | Cox regression model | Highly flexible; no requirement on the data distribution | May not achieve the best fit for every dataset |
| ML | Naïve Bayes | Results are easy to interpret; performs well on small datasets | Sensitive to the form of the input data |
| DL | 3D-CNN | Handles high-dimensional data easily; feature extraction is automatic | Results are hard to interpret; much valuable information may be lost |
| DL | ANN | High classification Acc; strong robustness and fault tolerance | Results are hard to interpret; requires many parameters |
| SM | t-test | Easy to explain; robust; controls individual differences well | Cannot be used for multiple comparisons; only tests whether the difference between two means is significant |
| SM | Mann–Whitney U test | No requirement on the data distribution | Less efficient than the t-test when the data are normally distributed with homogeneous variance |
| SM | Spearman correlation analysis | Suitable for nonlinear relations and for both continuous and discrete data | Less efficient than the Pearson correlation coefficient |
| SM | Kaplan–Meier analysis | Provides a variety of test methods; easy to implement | Can only perform univariate analysis |
| SM | Log-rank test | Analyzes the data across all time points jointly | Requires the proportional-hazards assumption; only performs univariate analysis |
| SM | Fisher's exact test | Suitable for small samples; exactly computes the significance of deviations from the null hypothesis | Applicable only when the sample size n < 40 or a theoretical frequency T < 1 |
| SM | Chi-square test | Convenient, concise, and widely used | More complex than the t-test, with lower test efficiency |

Abbreviations: ML machine learning, SM statistical method, DL deep learning, PCA principal component analysis, mRMR maximum relevance minimum redundancy, LASSO least absolute shrinkage and selection operator, CV cross-validation, SMOTE synthetic minority over-sampling technique, LR logistic regression, SVM support vector machine, KNN K-nearest neighbors, DT decision tree, RF random forest, CNN convolutional neural network, ANN artificial neural network, Acc accuracy
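
To make the ML rows concrete, below is a minimal sketch, not taken from the source article, of how several of the tabulated methods are commonly chained in a radiomics workflow: standardization, SMOTE over-sampling, LASSO-based feature selection, an SVM classifier, and stratified cross-validation. The feature matrix is synthetic, and the `alpha` value and RBF kernel are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # imblearn's Pipeline accepts samplers

# Synthetic stand-in for a radiomics feature matrix (patients x features)
# with an imbalanced binary outcome.
X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           weights=[0.8, 0.2], random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),                    # put features on one scale
    ("smote", SMOTE(random_state=0)),               # over-sample the minority
                                                    # class (training folds only)
    ("lasso", SelectFromModel(Lasso(alpha=0.01))),  # sparse feature selection;
                                                    # alpha is an assumed value
    ("svm", SVC(kernel="rbf")),                     # binary classifier
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(f"5-fold CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Running SMOTE inside the pipeline, rather than on the full dataset, ensures synthetic samples are generated only from training folds, which is what lets the CV estimate stay honest.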
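For the DL rows, here is a minimal PyTorch sketch of a 3D-CNN binary classifier illustrating automatic feature extraction from image volumes. The patch size (32³ voxels), channel counts, and single convolutional block are illustrative assumptions, not an architecture from the article.

```python
import torch
import torch.nn as nn

# Minimal 3D-CNN: one convolutional block plus a linear classifier head.
model = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1),  # learn local 3D texture filters
    nn.ReLU(),
    nn.MaxPool3d(2),                            # 32^3 -> 16^3 (information is
                                                # discarded here, per the table)
    nn.Flatten(),
    nn.Linear(8 * 16 * 16 * 16, 2),             # two-class output
)

x = torch.randn(4, 1, 32, 32, 32)               # batch of 4 hypothetical volumes
logits = model(x)                               # shape: (4, 2)
```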
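The univariate statistical methods in the SM rows map directly onto `scipy.stats`. The sketch below runs each test on hypothetical two-group data; the 2×2 contingency table is invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=30)  # e.g., a feature in group A
group_b = rng.normal(loc=0.5, scale=1.0, size=30)  # the same feature in group B

# t-test: compares two means (assumes approximate normality)
t_stat, p_t = stats.ttest_ind(group_a, group_b)

# Mann-Whitney U test: distribution-free alternative to the t-test
u_stat, p_u = stats.mannwhitneyu(group_a, group_b)

# Spearman correlation: rank-based, captures monotone nonlinear relations
rho, p_rho = stats.spearmanr(group_a, group_b)

# Fisher's exact test on a 2x2 contingency table (small samples)
odds_ratio, p_fisher = stats.fisher_exact([[8, 2], [1, 9]])

# Chi-square test of independence: the large-sample counterpart
chi2, p_chi2, dof, expected = stats.chi2_contingency([[8, 2], [1, 9]])

print(p_t, p_u, p_rho, p_fisher, p_chi2)
```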
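Finally, for the survival-analysis entries (Kaplan–Meier, log-rank, Cox regression), a sketch using the `lifelines` package on a hypothetical ten-patient dataset; the `score` covariate and the 0.5 grouping threshold are assumptions made for illustration.

```python
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

# Hypothetical survival data: follow-up time, event indicator, one covariate.
df = pd.DataFrame({
    "time":  [5, 8, 12, 3, 9, 14, 7, 11, 6, 10],
    "event": [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    "score": [0.2, 0.8, 0.5, 0.9, 0.1, 0.4, 0.7, 0.3, 0.6, 0.2],
})

# Kaplan-Meier: univariate survival curve estimate
kmf = KaplanMeierFitter().fit(df["time"], event_observed=df["event"])
print(kmf.median_survival_time_)

# Log-rank test: compare survival between two groups (univariate only)
high = df["score"] > 0.5
res = logrank_test(df.loc[high, "time"], df.loc[~high, "time"],
                   event_observed_A=df.loc[high, "event"],
                   event_observed_B=df.loc[~high, "event"])
print(res.p_value)

# Cox regression: multivariable, with no assumption on the baseline hazard
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.params_)
```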