[ Paper ] |
[ Website ] |
---|
Teaser Video (Click to view) |
---|
To make the code run, install the necessary libraries preferably using conda or else pip environment manager.
conda create -n seamformer python=3.7.11
conda activate seamformer
pip install -r requirements.txt
Overall Two-stage Architecture: Stage-1 generated binarised output with just text content along with a scribble map. Stage-2 uses these two intermediate outputs to generate Seams and finally the required text-line segmentation.
Stage - 1: Uses Encoder-Decoder based multi-task vision transformer to generate binarisation result in one branch and scribble(strike-through lines) in another branch.
Stage - 2: Uses binarisation and scribble output from previous stage to create custom energy map for Seam generation. Using which final text-line segments are produced
The SeamFormer is split into two parts:
- Stage-1: Binarisation and Scribble Generation [ Requires Training ]
- Stage-2: Seam generation and final segmentation prediction [ No Training ]
To train the model dataset should be in a folder following the hierarchy:
In case of references to datacode , it is simply a codeword for dataset name .
For example , Sundanese Manuscripts in short form is known as SD
.
├── DATASET
│ ├── <DATASET>_Train
│ │ ├── images/
│ │ ├── binaryImages/
│ │ ├── <DATASET>_TRAIN.json
│ ├── <Dataset>_Test
│ │ ├── images/
│ │ ├── binaryImages/
│ │ ├── <DATASET>_TEST.json
│
├── ...
For each experiment, internal parameters have been extracted to an external configuration JSON file. To modify values for your experiment, please do so here. For quick reference , check out Sample_Exp_Configuration.
Parameters | Description | Default Value |
---|---|---|
dataset_code | Codename for dataset | I2 |
data_path | Dataset Folder | /data/ |
model_weights_path | Location to store trained weights | /weights/ |
visualisation_folder | Folder path to store visualisation results | /vis_results/ |
learning_rate | Initial learning rate of optimizer (scheduler applied) | 0.005-0.0009 |
weight_logging_interval | Epoch interval to store weights, i.e 3 -> Store weight every 3 epoch | 3 |
img_size | ViT input size | 256 x 256 |
patch_size | ViT patch size | 8 x 8 |
encoder_layers | Number of encoder layers in stage-1 multi-task transformer | 6 |
encoder_heads | Number of heads in MHSA | 8 |
encoder_dims | Dimension of token in encoder | 768 |
batch_size | Batch size for training | 4 |
num_epochs | Total epochs for training | 30 |
mode | Flag to train or test. Either use "train"/"test" | "train" |
train_scribble | Enables scribble branch train | false |
train_binary | Enables binary branch train | true |
pretrained_weights_path | Path location for pretrained weights(either for scribble/binarisation) | /weights/ |
enableWandb | Enable it if you have wandB account, else the results are stored locally in visualisation_folder |
false |
wid | WandB experiment Name (optional) | I2_V0_Train_lr_0.001 |
Stage 1 comprises of a multi-task vision transformer for binarisation and scribble generation. You can refer to our sample Sundanese dataset JSON here.
[
{"imgPath": "./ICDARTrain/SD_DATA/SD_TRAIN/images/palm_leaf_1.jpg",
"imgDims": [2000,1000],
"gdPolygons": [[11,20]..[13,25],....[[101,111]..[1121,2111]]],
},
...
{"imgPath": "./ICDARTrain/SD_DATA/SD_TRAIN/images/palm_leaf_2.jpg",
"imgDims": [1111,7777],
"gdPolygons": [[77,21]..[11,21],....[[222,233]..[1121,1111]]],
...
},
]
The Stage I architecture of the SeamFormer pipeline is dependent on image patches (default : 256 x 256 pixels). Therefore, by providing the path folder and relevant parameters, the following script arranges the patch data within their corresponding folders. For the provided polygon/box annotations within datapreparation.py
we have generate the scribbles.
Note : The argument binaryFolderPath
is optional , and in case if your dataset does not have a binarisation ground truth , it will rely Sauvola-Niblack technique to create the binarisation images.
python datapreparation.py \
--datafolder './data/' \
--outputfolderPath './SD_patches' \
--inputjsonPath './data/ICDARTrain/SD_DATA/SD_TRAIN/SD_TRAIN.json' \
--binaryFolderPath './data/ICDARTrain/SD_DATA/SD_TRAIN/binaryImages'
For the SeamFormer pipeline , we first start out by training the encoder and binarisation branch (while freezing scribble branch ) . To start the process , you can optionally initialise with DocENTR's pretrained weights.
python train.py --exp_json_path 'Sample_Exp_Configuration.json' --mode 'train' --train_binary
After every epoch , we perform validation and we store the train loss , average PSNR and few randomly selected image patches along with their ground truth in visualisation_folder
. If enableWandB
is configured to be 'true' , then they automatically get synced to corresponding WandB account's dashboard and will be tracked across experiment runs. Additionally , you can override the parameter by specifying --wandb
flag while executing the above command.
For training of the binarisation branch , we initialise the branch weights with the prior binary branch weights for better text localisation.
python train.py --exp_json_path 'Sample_Exp_Configuration.json' --mode 'train' --train_scribble
For leveraging Stage II - Scribble conditioned seam generation independantly , we need to provide set of image path , binary image path and the corresponding scribbles as input . If these are avaiable in a JSON file , then the following command can be executed :
python3 seam_conditioned_scribble_generation.py --jsonPath 'XYZ.json' --outputjsonPath './OUT_XYZ.json'
The script will return the predicted text lines and store the results in the configured outputjsonPath.Please note that internal parameters like alpha , beta and gamma are configured to the optimal value.
Download our existing modelcheckpoints for SeamFormer network via the following commands , additionally you have to override pretrained_weights_path
in experiment configuration file accordingly.
pip install gdown
For Indiscapes2 Dataset Checkpoint
gdown 1O_CtJToNUPrQzbMN38FsOJwEdxCDXqHh
For Balinese/Sundanese/Khmer Checkpoint
gdown 1nro1UjYRSlMIaYUwkMTrfZzrE_kz0QDF
Alternatively , you can also run the downloadWeights.sh
file for one shot download of all the various dataset
pretrained weights . Please configure the destination path inside the bash script.
bash downloadWeights.sh
For our pipeline infrence, we have provided two options for our users : via an input json file & input image folder path . In the former case , we expect details of imgPath of the test samples present in the JSON File . Please note , you will have to enable flag input_json
or input_folder
accordingly.
Case I : Via JSON File
python3 inference.py --exp_name "v0" --input_image_json 'test.json' --output_image_folder './output' --model_weights_path 'BKS.pt' --input_json
Case I : Via Image Folder Path
python3 inference.py --exp_name "v0" --input_image_folder './test/images/' --output_image_folder './output' --model_weights_path 'BKS.pt' --input_folder
Please note , by default we store all the visualisations - binary image , raw scribble image and scribble overlaid images in the sub-directories of visualisation_folder
, you can turn it off via 'vis
flag.
For leveraging SeamFormer for your custom dataset , these points can be useful in deciding the parameters and model checkpoint that would be optimal for you .
- Parameters that are to carefully configured.
- In
datapreparation.py
- THICKNESS - This parameter defines the thickness of scribble ground truth. Reduce this as needed if the predicted scribbles are so thick so that two scribbles merge into one.
- OVERLAP - If you think you have fewer number of images in your dataset , increase overlap to 0.50 or 0.75 so that you get more training patches.
- In
- Choose learning rate and a finetuning strategy (refer topic 'When and how to fine-tune' in CS231n Notes) based on available data at hand and its closeness to pretrained data.
- You can choose to unfreeze decoder(for binariser/scribble branch) alone and train decoder alone. Additionally , you can choose to unfreeze few layers of the decoders as well . Or you can choose to unfreeze both decoder and encoder for binarisation, but it is preferred to always freeze encoder during scribble generation and only finetune its decoder.
- Freezing and unfreezing parameters can be configured in
builModel()
intrain.py
using the commandparam.requires_grad = False
appropriately- By default
- During Binarisation: Scribble decoder branch's weights is freezed
- During Scribble Generation: Binary branch's decoder and also the encoder is freezed.
- By default
- Configuration of GPU that we used and typical training time to achieve these results :
- 1 NVIDIA GeForce GTX 1080 Ti GPU, 12 GB of GDDR5X VRAM , 20 CPUs.
- Refer sample training setup that we used for Sundanese Dataset here.
Attached is a collated diagram , starting top (clockwise ) from Bhoomi , Penn-In-Hand (PIH) , Khmer Palm Leaf Manuscript and Jain Manuscript . Of particular significance is the intrinsic precision exhibited by the predicted polygons depicted within, handling the presence of considerable image degradation, a complex multi-page layout, and an elevated aspect ratio, etc.
Please use the following BibTeX entry for citation .
@inproceedings{vadlamudiniharikaSF,
title = {SeamFormer: High Precision Text Line Segmentation for Handwritten Documents},
author = {Vadlamudi,Niharika and Rahul,Krishna and Sarvadevabhatla, Ravi Kiran},
booktitle = {International Conference on Document Analysis and Recognition,
{ICDAR}},
year = {2023},
}
For any queries, please contact Dr. Ravi Kiran Sarvadevabhatla
This project is open sourced under MIT License.