Accuracy is low on the test set (split from the same distribution as the training set) but high when performing K-fold validation on the training set.
#1895
Thanks to KFoldDataset and tools/kfold-cross-valid.py, we can now do k-fold validation easily. However, I ran into a strange problem when evaluating a network on my own dataset. The dataset stores the training set in the folder 'train' and the test set in the folder 'test', and annotations are available for the samples in both sets. The training and test sets were split from the same original dataset, so they can be assumed to have the same distribution.
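For context, the data section of my_config.py looks roughly like the sketch below. This is a simplified sketch with placeholder paths, assuming the mmcls CustomDataset-style config layout; it is not the exact file.

```python
# Simplified sketch of the data section in configs/resnet/my_config.py.
# Paths are placeholders; train_pipeline / test_pipeline are defined
# earlier in the config and omitted here.
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=2,
    train=dict(
        type='CustomDataset',
        data_prefix='data/my_dataset/train',
        ann_file='data/my_dataset/train_ann.txt',
        pipeline=train_pipeline),
    val=dict(
        type='CustomDataset',
        data_prefix='data/my_dataset/test',
        ann_file='data/my_dataset/test_ann.txt',
        pipeline=test_pipeline),
    test=dict(
        type='CustomDataset',
        data_prefix='data/my_dataset/test',
        ann_file='data/my_dataset/test_ann.txt',
        pipeline=test_pipeline))
```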
Now I want to train and test a model, e.g. a ResNet-50, on this dataset, and I compare two strategies:
1. Train on 'train' and test on 'test'.
2. Run 5-fold cross-validation on 'train' (split 'train' into 5 folds, then train on 4 folds and test on the remaining one, rotating through all 5 folds). A toy illustration of both protocols is sketched after this list.
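To make the comparison concrete, here is a small, self-contained toy version of the two protocols using scikit-learn on synthetic data. It has nothing to do with mmcls internals; it only shows the evaluation logic, and all names are local to this snippet.

```python
# Toy illustration: when the hold-out split and the training set really do
# come from the same distribution, strategy 1 (train/test) and strategy 2
# (5-fold CV on the training set) should give similar accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Strategy 1: train on 'train', evaluate once on the held-out 'test' split.
acc_holdout = LogisticRegression(max_iter=1000).fit(X_train, y_train) \
                                               .score(X_test, y_test)

# Strategy 2: 5-fold cross-validation inside 'train' only.
acc_cv = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train,
                         cv=KFold(n_splits=5, shuffle=True,
                                  random_state=0)).mean()

print(f"hold-out accuracy:  {acc_holdout:.3f}")
print(f"5-fold CV accuracy: {acc_cv:.3f}")
# With an i.i.d. split the two numbers end up close, which is why the
# large gap in my real experiment looks suspicious to me.
```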
For strategy 1, I run:
python tools/train.py configs/resnet/my_config.py --auto-scale-lr
The resulting accuracy on 'test' is about 34.56.
For strategy 2, I run:
python tools/kfold-cross-valid.py configs/resnet/my_config.py --num-splits 5 --auto-scale-lr
The resulting fold accuracy is about 89.26.
From my understanding, there should not be such a large gap between the accuracies of these two experiments (34.56 against 89.26). Does anyone have any idea what could cause this?
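In case it helps narrow things down, this is the kind of sanity check I can run on the same-distribution assumption: compare the class frequencies of the two splits. It assumes the annotation files are plain-text lists with the class label as the last column on each line; the paths are placeholders.

```python
# Rough check that 'train' and 'test' have similar class frequencies.
# Assumes one "filename label" pair per line; file paths are placeholders.
from collections import Counter

def class_frequencies(ann_file):
    counts = Counter()
    with open(ann_file) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                counts[parts[-1]] += 1  # label is assumed to be the last column
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

train_freq = class_frequencies('data/my_dataset/train_ann.txt')
test_freq = class_frequencies('data/my_dataset/test_ann.txt')

for cls in sorted(set(train_freq) | set(test_freq)):
    print(f"class {cls}: train={train_freq.get(cls, 0):.3f} "
          f"test={test_freq.get(cls, 0):.3f}")
```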