Questions about fine-tuning with the TAL_OCR_MATH elementary-school arithmetic formula dataset #14491
Bestboy125 asked this question in Q&A (unanswered)
-
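From your description, there are several possible reasons why the math symbols are not recognized correctly during fine-tuning:
Cause analysis
Solutions
Based on the possible causes above, you can try the following:
1. Check that the dictionary and the symbol mapping are correct
2. Expand the dataset
3. Try retraining the model
4. Adjust the fine-tuning configuration
5. Optimize the loss function
6. Check the training logs and visualizations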
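Step 1 can be sketched as a small script. This is a minimal sketch, assuming the label file uses the usual `image_path<TAB>label` layout and the dictionary holds one character per line; `find_missing_chars` and the sample data are hypothetical. As far as I recall, PaddleOCR's recognition label encoder skips characters that are missing from the dictionary without raising an error, so a gap here can go unnoticed during training.

```python
from collections import Counter


def find_missing_chars(label_lines, dict_lines, use_space_char=True):
    """Count label characters that do not appear in the dictionary."""
    charset = {line.rstrip("\r\n") for line in dict_lines}
    charset.discard("")
    if use_space_char:
        charset.add(" ")  # mirrors use_space_char: true in the config
    missing = Counter()
    for line in label_lines:
        parts = line.rstrip("\r\n").split("\t", 1)
        if len(parts) != 2:
            continue  # skip malformed lines
        for ch in parts[1]:
            if ch not in charset:
                missing[ch] += 1
    return missing


# Hypothetical sample data: the dictionary lacks the division sign.
labels = ["img/0001.jpg\t12÷3=4", "img/0002.jpg\t(5+7)x2=24"]
dictionary = list("0123456789+-=()x")
print(find_missing_chars(labels, dictionary))  # Counter({'÷': 1})
```

For real files, pass `open("dict.txt", encoding="utf-8")` and `open("rec_gt_train.txt", encoding="utf-8")` in place of the sample lists.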
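Example adjustments (for the config file)
The following is aimed at your configuration file:
# Learning-rate adjustment
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001 # raise the learning rate
    warmup_epoch: 5
# Data augmentation
Train:
  dataset:
    transforms:
      - RecConAug:
          prob: 0.7 # raise the augmentation probability
          image_shape: [48, 320, 3]
      - RecAug:
          aug_prob: 0.5
# Post-processing parameters
PostProcess:
  name: CTCLabelDecode
  ctc_beam_search: True # enable beam search decoding
Summary
I hope these methods help you solve the problem!
Response generated by feifei-bot | chatgpt-4o-latest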
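For context on the PostProcess step, greedy CTC decoding takes the top class at each time step, merges consecutive repeats, and drops blanks. The sketch below is an illustration only, assuming class index 0 is the blank; it is not PaddleOCR's `CTCLabelDecode` implementation, and `ctc_greedy_decode` is a hypothetical name. It shows how a symbol that scores below blank at every step disappears from the output.

```python
def ctc_greedy_decode(step_probs, charset, blank=0):
    """Greedy CTC decode.

    step_probs: one list of class probabilities per time step.
    charset: charset[i - 1] is the character for class index i.
    """
    # Pick the most probable class at each time step.
    best = [max(range(len(p)), key=p.__getitem__) for p in step_probs]
    out = []
    prev = None
    for idx in best:
        # Emit only non-blank classes that differ from the previous step.
        if idx != blank and idx != prev:
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)


# Hypothetical probabilities for an image showing "1÷2".
charset = ["1", "÷", "2"]
probs = [
    [0.10, 0.80, 0.05, 0.05],  # "1"
    [0.10, 0.80, 0.05, 0.05],  # "1" again, merged with the previous step
    [0.60, 0.10, 0.30, 0.00],  # "÷" loses to blank, so it is dropped
    [0.10, 0.10, 0.10, 0.70],  # "2"
]
print(ctc_greedy_decode(probs, charset))  # 12
```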
-
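The dataset format is as follows:
The dataset maps math symbols such as (, ), x, ÷, and the remainder sign to characters, and I used these mapped characters directly as the ground truth (GT) and as the dictionary. There are 30,000 samples. After fine-tuning the recognition model, inference skips most of the math symbols entirely; only digits and the unmapped symbols such as -, =, and + are recognized.
In this situation, should I retrain the model from scratch instead of fine-tuning, or is something wrong with my fine-tuning process?
Below are my dictionary and the labels of the recognition training set: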
dict.txt
rec_gt_train.txt
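Two properties of these files may be worth checking; the following is a minimal sketch under stated assumptions. To my understanding, PaddleOCR encodes labels one character at a time, so a dictionary line holding a multi-character token would never match, and labels longer than `max_text_length` (25 in the configuration of this post) are dropped rather than truncated. Whether the symbol mapping produces either case is not known from the post; both helper names and the `\div` token are hypothetical.

```python
def find_multichar_entries(dict_lines):
    """Return dictionary entries that are longer than one character."""
    entries = (line.rstrip("\r\n") for line in dict_lines)
    return [e for e in entries if len(e) > 1]


def count_overlong_labels(label_lines, max_text_length=25):
    """Count labels whose text exceeds max_text_length characters."""
    overlong = 0
    for line in label_lines:
        parts = line.rstrip("\r\n").split("\t", 1)
        if len(parts) == 2 and len(parts[1]) > max_text_length:
            overlong += 1
    return overlong


# Hypothetical sample data; "\\div" stands in for a multi-character mapping.
sample_dict = ["0", "1", "2", "=", "\\div"]
sample_labels = ["img/0001.jpg\t2\\div1=2", "img/0002.jpg\t" + "1" * 30]
print(find_multichar_entries(sample_dict))   # ['\\div']
print(count_overlong_labels(sample_labels))  # 1
```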
Below is the training configuration file:
Global:
  debug: false
  use_gpu: true
  epoch_num: 200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/no_f_math_paddle_v4
  save_epoch_step: 10
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model: /opt/data/private/envs/paddle_ocr/ch_PP-OCRv4_rec_train/student.pdparams
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: /opt/data/private/envs/paddle_ocr/PaddleOCR/dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt
  d2s_train_image_shape: [3, 48, 320]
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length
Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - NRTRLoss:
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./train_data/rec
    ext_op_transform_idx: 1
    label_file_list:
      - /opt/data/private/envs/paddle_ocr/PaddleOCR/train_data/rec/rec_gt_train_no_f.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecConAug:
          prob: 0.5
          ext_data_num: 2
          image_shape: [48, 320, 3]
          max_text_length: *max_text_length
      - RecAug:
      - MultiLabelEncode:
          gtc_encode: NRTRLabelEncode
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32], [320, 48], [320, 64]]
    first_bs: &bs 192
    fix_bs: false
    divided_factor: [8, 16] # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data
    label_file_list:
      - /opt/data/private/envs/paddle_ocr/PaddleOCR/train_data/rec/rec_gt_train_no_f.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
          gtc_encode: NRTRLabelEncode
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4
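In the configuration above, Train and Eval point to the same label file, so the accuracy reported during training is measured on samples the model has already seen. A minimal sketch of holding out a validation split, assuming one sample per line in the label file; `split_labels`, the 10% ratio, and the fixed seed are illustrative choices, not part of the original setup.

```python
import random


def split_labels(lines, val_ratio=0.1, seed=0):
    """Shuffle label lines deterministically and return (train, val)."""
    lines = [ln for ln in lines if ln.strip()]  # drop empty lines
    rng = random.Random(seed)
    rng.shuffle(lines)
    n_val = max(1, int(len(lines) * val_ratio)) if lines else 0
    return lines[n_val:], lines[:n_val]


# Hypothetical sample data standing in for the real label file.
sample = [f"img/{i:04d}.jpg\t{i}+1={i + 1}" for i in range(100)]
train, val = split_labels(sample)
print(len(train), len(val))  # 90 10
```

The two lists can then be written to separate files and referenced from `Train.dataset.label_file_list` and `Eval.dataset.label_file_list` respectively.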