Release for CHART annotation tools used for ICDAR CHART 2019 competition

benetech/ChartInfo_annotation_tools

Chart Infographics - Tools for chart annotations

The official release of the tools developed at the University at Buffalo, which were previously used to annotate charts for the ICDAR 2019 CHART-Info competition (https://chartinfo.github.io/). This code is

Note. This software is in an early stage of development. It is distributed as is, without any warranty, under the GNU General Public License version 3.0. A copy of the license is included in the package.

This tool was tested with, and is intended for use with, Python 3.6+.

Main Library Requirements:

  • Pygame
  • OpenCV
  • Numpy
  • Scipy
  • Shapely
  • PyTesseract
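
As a quick sanity check before launching the tools, the following minimal sketch reports which of the libraries above cannot be found in the current Python environment. The import names are assumptions based on how these packages are normally imported (for example, OpenCV is imported as cv2); this script is not part of the released tools.

```python
import importlib.util

# Usual import names for the libraries listed above (assumption: OpenCV -> cv2).
REQUIRED_MODULES = ["pygame", "cv2", "numpy", "scipy", "shapely", "pytesseract"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All main requirements were found.")
```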

Main Annotation Tool

The main annotation tool, chart_annotator.py, is used to annotate chart images. Currently, only bar charts, line charts, and box plots are supported for full annotation. Other images can still be annotated at the level of panels (for multi-panel figures), chart classes (including chart types not yet supported for data annotation), and text (for chart images and other figures).

Usage:

 python chart_annotator.py config [small/large]

Where:

  • config: Path to the Configuration File
  • size: Window Size (normal by default)
    • small: for smaller window
    • large: for larger window

Example:

python chart_annotator.py config.txt large

Note. The pygame window created by the annotation tool is sensitive to the local OS font scaling settings used to improve readability on high-resolution displays. On Windows, we recommend adding python.exe to the list of exceptions for these settings to avoid scaling issues.

Note. A parameter can be added to or removed from the config file to enable or disable Administrator mode. Only administrators can validate annotations.

For administrators or supervisors of the annotation process use:

ENABLE_ADMIN_MODE = 1

For the rest of the annotators, just delete this line or set it to 0 to disable it.

ENABLE_ADMIN_MODE = 0      
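
The behavior described above (a missing line or a value of 0 both disable Administrator mode) can be illustrated with a small sketch. It assumes the config file consists of simple "KEY = VALUE" lines, as in the ENABLE_ADMIN_MODE example. The helper names read_config_flags and admin_mode_enabled are hypothetical and are not taken from the tool's source code.

```python
def read_config_flags(text):
    """Parse 'KEY = VALUE' lines into a dict (assumed format, for illustration)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and lines without an assignment
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def admin_mode_enabled(settings):
    """Admin mode is on only when the flag is present and set to 1."""
    return settings.get("ENABLE_ADMIN_MODE", "0") == "1"
```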

Chart Annotation Stats tool

An overview of the status of the annotation process can be obtained using the chart_stats.py program. This tool reports how many images have been annotated per class and per stage.

Usage:

python chart_stats.py config

Where:

  • config: Path to the Configuration File

Example:

python chart_stats.py config.txt

Tool for batch-merging annotations

This tool merges two existing sets of annotations for a single set of images. It assumes that one annotation source (the source) is the newer version and the other (the destination) holds the original annotations. Instead of overwriting every annotation at the destination with the one from the source, the tool checks, for each file at the destination, whether the source contains a more complete annotation. It only overwrites annotations for which the source has a strictly higher completion status (e.g. number of tasks annotated/validated).

Usage:

python chart_update_annotations.py src_config dst_config

Where:

  • src_config: Source Configuration (newer annotations)
  • dst_config: Destination Configuration (annotations to update)

Example:

python chart_update_annotations.py config_bar_charts.txt config_all_charts.txt
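
The overwrite rule described in this section can be sketched as follows. This is an illustration only, not the logic of chart_update_annotations.py itself. It assumes each annotation set is a dict mapping an image file name to a (completed_tasks, annotation) pair, where completed_tasks is a hypothetical integer completion score.

```python
def merge_annotations(src, dst):
    """Merge src (newer) into dst (original) using a strict-improvement rule.

    A destination entry is replaced only when the source has an entry for the
    same file with a strictly higher completion count; ties keep the original.
    """
    merged = {}
    for name, (dst_done, dst_annotation) in dst.items():
        src_entry = src.get(name)
        if src_entry is not None and src_entry[0] > dst_done:
            merged[name] = src_entry
        else:
            merged[name] = (dst_done, dst_annotation)
    return merged
```

With this rule, a source annotation that is merely equal in completion never replaces the destination, so re-running the merge does not change the result.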

Tool for exporting XML annotations to JSON format used by CHART-Info 2019

Our annotation tool uses a custom XML-based file format to store all annotations. However, the Competition on HArvesting Raw Tables from Infographics (CHART-Infographics) at ICDAR 2019 used a different, JSON-based file format designed to provide only the information required as input and output for each task of the competition. A synthetic dataset was also released in this file format. The chart_json_export.py program generates JSON annotations from the XML files produced by our tool. This allows the creation of new annotated datasets, which can then be evaluated using the evaluation tools from the competition.

Usage:

python chart_json_export.py img_folder xml_folder json_folder task_num test_mode

Where:

  • img_folder: Path to Directory containing annotated images
  • xml_folder: Path to Directory containing XML based annotations
  • json_folder: Path to Directory where the generated JSON annotations will be stored.
  • task_num: Competition task to export. Note that each task includes the outputs from some of the previous tasks automatically.
  • test_mode: Determines if the JSON files are being generated for a testing dataset or not. Normally, the JSON files will contain both the input and outputs for the indicated task, but for testing datasets, only the inputs will be included for this task.

Example:

# This will generate a testing dataset for Task 3  
python chart_json_export.py data/images data/annotations data/task3_json 3 0
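
The effect of test_mode described above can be pictured with the sketch below. The function build_task_json and the "input"/"output" keys are assumptions made for illustration; the exact JSON layout written by chart_json_export.py should be checked against its output.

```python
import json


def build_task_json(task_name, task_input, task_output, test_mode):
    """Build a per-task record; outputs are omitted for testing datasets."""
    record = {task_name: {"input": task_input}}
    if not test_mode:
        # Non-testing datasets carry the outputs (ground truth) as well.
        record[task_name]["output"] = task_output
    return record


if __name__ == "__main__":
    example = build_task_json("task3", {"text_blocks": []}, {"text_roles": []}, True)
    print(json.dumps(example, indent=2))
```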

Update (July 28, 2020)

  • Extended, re-factored and improved JSON export
    • New validations added for Task 4
    • Added parsing-based generation of GT for Tasks 6a, 6b and 7 for:
      • Bar Charts
      • Box Charts
      • Line Charts
      • Scatter Charts
  • Added partial support for additional Chart types included in ICPR 2020 - CHART-Infographics
    • These can have annotations of text, axes and legends, but data annotation is not yet implemented
  • Added the "Auto-Check" function to help detect annotation errors
    • This feature is based on the same parsing functions used by the exporter
    • Full chart parsing based on user annotations is attempted
    • Errors found by the exporter are then reported to the annotator
    • Annotations now record if Auto-check has been used and if it was a Success or a Failure
    • The Chart Stats tool can be used to get a report on Auto-checks for a given set of charts

Update (July 7, 2020)

  • Minor improvements on Line Chart Data Annotation
    • For text regions containing data series names (e.g. from the legend):
      • Double-clicking them opens the edit mode menu for that line.
  • Added Inverted color view modes.
  • Fixed a bug in the Box plot annotator so that charts without categorical values are handled properly using a single unnamed default category.

Update (July 3, 2020)

  • Minor improvements on Axis Annotation
    • Zoom-in shown on right side panel:
      • when setting or editing the axes bounding box
      • when setting the new position of a given tick
    • Ticks are now displayed in red by default if they do not have associated labels. Otherwise, they are shown in dark orange.
  • Minor improvements on Line Chart Data Annotation
    • It is now possible to swap data points between lines
    • For text regions containing line names (e.g. from the legend or from data mark labels):
      • Moving the mouse over them highlights the corresponding line.
      • Double-clicking them opens the edit mode menu for that line.
    • Double right-click can now be used to stop the "line point edition" mode.

Update (June 28, 2020)

  • Minor improvements to data annotations
    • In some cases, legend boxes will be automatically inferred from the image
    • Reduced the number of intermediate zoom levels between 100% and 400%
    • Added heuristics to help annotate line and scatter charts by right-clicking in "Add Points" mode.
      • For Scatter Charts with solid-colored marks, the user can save time by placing the cursor over a data mark and right-clicking. The new point is then added at the centroid of the current mark, as shown in the right-side panel of the annotation tool.
      • For Line Charts with solid-colored lines, the user can save time by placing the cursor close enough to the line and right-clicking to add the suggested line point, shown in green in the right-side panel.
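
To illustrate the idea behind the scatter chart heuristic, the sketch below computes the centroid of the solid-colored region under the cursor using a simple flood fill. It is a simplified stand-in written in plain Python, not the implementation used by the annotation tool. The function name mark_centroid and the image representation (a list of rows of color values) are assumptions.

```python
from collections import deque


def mark_centroid(image, x, y):
    """Return the (cx, cy) centroid of the 4-connected region that shares
    the color of pixel (x, y). image is a list of rows of color values."""
    height, width = len(image), len(image[0])
    target = image[y][x]
    seen = {(x, y)}
    queue = deque([(x, y)])
    sum_x = sum_y = 0
    while queue:
        px, py = queue.popleft()
        sum_x += px
        sum_y += py
        # Visit the four direct neighbors that have the same color.
        for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in seen and image[ny][nx] == target):
                seen.add((nx, ny))
                queue.append((nx, ny))
    count = len(seen)
    return sum_x / count, sum_y / count
```

A real chart image would likely need a color tolerance rather than exact equality, since anti-aliasing blurs the edges of data marks.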

Update (June 17, 2020)

  • Major improvements to data annotations
    • Tools for Bar Charts and Box Charts now use draggable tools to quickly adjust bar/box parameters
    • Tool for Bar Charts now includes semi-automatic bar height adjustment, which greatly simplifies annotation of bar charts that use solid-color bars.
    • Tools for Axes and Line Charts now use draggable tools to make adjusting points much easier/faster

Update (June 05, 2020)

  • Major improvements to text annotation including:
    • Improved Text Box Behavior
    • Copy/Paste Options
    • Two new text roles: Tick Grouping and Data Mark Label
  • Data Mark labels are now used as the next default name for data series on charts which have them but do not include legends
  • Other minor improvements on the UI
