Type | Method | Strengths | Limitations |
---|---|---|---|
ML | PCA | It retains most of the principal information and is computationally simple | Some important information may be lost, and the resulting components are hard to interpret |
 | mRMR | It is well suited to multi-class classification tasks | The correlation between feature interactions (crosses) and the target variable is ignored |
 | LASSO | It handles multicollinearity well, and the results are easy to interpret | It tends to select only one feature from a group of highly correlated features |
 | CV | It evaluates models more reasonably and accurately and extracts more useful information from limited data | It increases the computational cost |
 | SMOTE | It overcomes the overfitting problem of simple over-sampling | Important parameters require repeated tuning |
 | LR | It has low computational cost, runs fast, and is easy to understand and implement | It handles only binary classification tasks and is prone to underfitting |
 | SVM | It can solve high-dimensional problems and has strong generalization ability | Conventional SVM handles only binary classification, and training on large samples is inefficient |
 | KNN | It is suitable for nonlinear classification and achieves high Acc | It requires a lot of memory, and prediction bias is large when the samples are imbalanced |
 | DT | It can be analyzed visually and runs fast | It overfits easily and ignores correlations among attributes in a dataset |
 | RF | It is suitable for high-dimensional data and adapts well to diverse datasets | It handles low-dimensional data poorly and is much slower than DT |
 | Cox regression model | It is highly flexible and makes no assumption about the data distribution | It may not achieve the best fit for every dataset |
 | Naïve Bayes | Its results are easy to interpret, and it performs well on small datasets | It is sensitive to the form of the input data |
DL | 3D-CNN | It handles high-dimensional data easily, and feature extraction is automatic | Results are difficult to interpret, and much valuable information may be lost |
 | ANN | It achieves high classification Acc with strong robustness and fault tolerance | Results are difficult to interpret, and many parameters must be set |
SM | t-test | It is easy to explain, robust, and controls for individual differences well | It cannot be used for multiple comparisons; it only tests whether the difference between two means is significant |
 | Mann–Whitney U test | It makes no assumption about the data distribution | When the data are normally distributed with homogeneous variance, it is less efficient than the t-test |
 | Spearman correlation analysis | It is suitable for nonlinear relationships and for both continuous and discrete data | It is less efficient than the Pearson correlation coefficient |
 | Kaplan–Meier analysis | It supports a variety of test methods and is easy to implement | It can only perform univariate analysis |
 | Log-rank test | It combines data from all time points in the analysis | It requires the proportional hazards assumption to hold and only performs univariate analysis |
 | Fisher’s exact test | It is suitable for small samples and can accurately calculate the significance of deviations from the null hypothesis | It is applicable only when the sample size n < 40 or a theoretical frequency T < 1 |
 | Chi-square test | It is convenient, concise, and widely used | It is more complex than the t-test, and its test efficiency is lower |
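
To make the ML rows of the table concrete, the sketch below chains several of the listed methods (SMOTE, PCA, LASSO, CV, and three of the classifiers) on synthetic data. It is a minimal sketch, assuming scikit-learn and imbalanced-learn are installed; the dataset and all variable names are illustrative, not drawn from any study reviewed here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from imblearn.over_sampling import SMOTE

# Synthetic, imbalanced binary dataset standing in for real radiomic features.
X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           weights=[0.8, 0.2], random_state=0)

# SMOTE: synthesize minority-class samples instead of duplicating them.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)

# PCA and LASSO are alternative dimension-reduction routes from the table;
# PCA keeps enough components to explain 95% of the variance ...
X_pca = PCA(n_components=0.95).fit_transform(X_res)

# ... while LASSO keeps only features with nonzero coefficients.
lasso = LassoCV(cv=5).fit(X_res, y_res)
X_sel = X_res[:, lasso.coef_ != 0]

# CV: 5-fold cross-validated Acc for three classifiers from the table,
# run on the LASSO-selected features.
for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                  ("SVM", SVC()),
                  ("RF", RandomForestClassifier(random_state=0))]:
    print(name, cross_val_score(clf, X_sel, y_res, cv=5).mean().round(3))
```

In practice, SMOTE and feature selection should be refit inside each CV fold (e.g., with imblearn's Pipeline) to avoid information leakage; they are applied globally above only for brevity.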
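The hypothesis tests and correlation analysis in the SM rows map directly onto scipy.stats. A minimal sketch follows; the two group samples and the 2×2 contingency table are made-up numbers chosen only to exercise each call.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 30)  # e.g., a feature measured in group A
b = rng.normal(0.5, 1.0, 30)  # the same feature measured in group B

t_stat, p_t = stats.ttest_ind(a, b)     # t-test: difference between two means
u_stat, p_u = stats.mannwhitneyu(a, b)  # Mann-Whitney U: no distribution assumed
rho, p_s = stats.spearmanr(a, b)        # Spearman: rank-based correlation

table = np.array([[8, 2], [1, 9]])      # small 2x2 contingency table
odds, p_f = stats.fisher_exact(table)   # Fisher's exact: exact p for small n
chi2, p_c, dof, _ = stats.chi2_contingency(table)  # chi-square: large-sample test

print(f"t-test p={p_t:.3f}  Mann-Whitney p={p_u:.3f}  Spearman rho={rho:.2f}")
print(f"Fisher exact p={p_f:.3f}  chi-square p={p_c:.3f} (dof={dof})")
```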
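Kaplan–Meier analysis, the log-rank test, and the Cox regression model are all available in the lifelines package. The sketch below assumes lifelines is installed; the follow-up times, event flags, and covariate values are made up, and the tiny dataset is only meant to show the API.

```python
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

# Made-up follow-up data: time in months, event=1 if the event was observed
# (0 = censored), two groups to compare, and one covariate for the Cox model.
df = pd.DataFrame({
    "time":  [5, 8, 12, 14, 20, 21, 25, 30, 31, 40],
    "event": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
    "group": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "age":   [60, 55, 70, 62, 48, 66, 59, 73, 51, 68],
})

# Kaplan-Meier: univariate estimate of the survival function.
kmf = KaplanMeierFitter().fit(df["time"], event_observed=df["event"])
print(f"median survival: {kmf.median_survival_time_}")

# Log-rank test: compares the two groups using all time points.
g0, g1 = df[df["group"] == 0], df[df["group"] == 1]
res = logrank_test(g0["time"], g1["time"],
                   event_observed_A=g0["event"], event_observed_B=g1["event"])
print(f"log-rank p = {res.p_value:.3f}")

# Cox proportional hazards: multivariate, no assumption on the baseline hazard.
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
cph.print_summary()
```

Note how the Cox model accepts multiple covariates (here, group and age) while Kaplan–Meier and the log-rank test remain univariate, matching the limitations listed in the table.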