- pull request #2 of Transkribus
- pull request #1 of Transkribus
- bugfix T2I: also ignore small regions, which are already ignored in HTR jobs
- make parent 1.8
- bugfix T2I: use all transcripts of pageXML (instead of only special T2I-regions) when no transcripts are given
- feature LA: if regions overlap, baselines will not be separated or deleted.
- feature HTR(x): If the process gets as property "use_lr" = "true", a language resource in the HTR folder is assumed. If "use_lr" = , this language resource is taken.
- better error handling/notification/logging for OutOfMemoryException
- feature T2I: if no path to a text file is given, the method uses the special reference text lines in the XML file
- feature T2I: new property: T2I_IGNORE_LB = "ig_lb": if "true", line breaks are substituted by spaces.
- feature LA: leaves special reference text lines in XML-file.
- feature LA: if property LA_DELETESCHEME = "la_deletescheme" is set to DEL_ALL = "all", also the special reference text lines are deleted
- bugfix for HTR+: ignore very small line regions
- bugfix for HTR+: delete old *.pb in export folder
- bugfix for HTR+: base model training
- needs tf_htsr version 3.0.5
- training snippets are stored as greyscale to reduce disk usage
- make planet and private dependencies 'provided'
- make dependent on new tokenizer
- bugfix in Text2Image is used
- bugfix format characters in transcripts: ignore lines and characters from an unknown category and report them to the observer.
- bugfix T2I: do not delete the folder if input and output are the same
- bugfix B2P: ignore text lines with invalid baselines during B2P.
- bugfix training: charmap cannot be in htrOut
- bugfix language model: exception for empty transcript caught and reported via ErrorNotification
- feature: slightly better preproc
- bugfix training: shuffle training list so that the order is random
- bugfix surrogates: if a line contains surrogates, it is ignored in training and validation
- bugfix empty validation: if a validation set is given but contains no lines, throw an exception
- feature notification: errors while creating training/validation data are reported via notification
- bugfix LA Advanced: when a region is given and rotated text lines are found, lines are no longer rotated out of the image
- update to newer versions for bugfix in planets sources
- bugfix training of HTR+ with base model
- faster TextAlignment and base technology moved to CITlabTextAlignment
- HTR+ can be trained further - also with changed CharMap
- bugfix load dictionary: now also possible for ";"-separator with headlines
- feature-request: logging in multi-process-use possible.
- newest versions of all citlab and planet libs
- bugfix fail-safe Baseline2Polygon: two fallbacks for failures
- MOVE REPOSITORY WITHOUT HISTORY FROM Transkribus TO CITlabRostock AND MAKE OPEN SOURCE
- bugfix: check System.getenv() instead of System.getProperty() for $PYTHONPATH
- bugfix: do not set train_size_per_epoch instead of "-1"
- bugfix: add process listener to TrainHTRPlus to make status/process observable
- feature: language resource can be created when training data are created (set property "create_lr" to "true" or "path_to/file")
- bugfix: master-conflict
- add new HTR+ which requires CUDA and Tensorflow
- switch from file-structure to folder-structure for HTR
- switch from 1 big planet jar to planet artifacts
- feature request: make Filename of ConfMatContainer configurable via properties
- bugfix erroneous calculation of coords for very small baselines
- patch ignore bidi control characters
- image type: indexed_byte is supported
- bugfix LA module (array out of bounds error)
- memory leak fixed?!
- fix concerning erroneous baselines
- reduced number of points for short baselines
- bugfix erroneous split of baseline in LA module
- adapted the scaling prior to LA module
- BaseLine2Polygon uses angle of baseline, not angle of region for calculation
- \n characters are ignored when getting TextLine->TextEquiv->Unicode
- minor bugfixes
- bugfix #27: switch from CenterOfMass to Average
- better logging and statistic for Text2Image
- bugfix #27: lines are sorted correctly
- make kws-group (?ABC) optional
- bugfix charmap for advanced ATR can be set, if charmap is equal
- make dependent on TranskribusErrorRate 2.2.3
- enhance KWS-quality
- update advanced HTR
- make KWS useable for large sets
- improve KWS (calculate maximum of PRE_KW_POST, PRE and POST group)
- improve GT-creation for KWS
- KWS improvement:
- Property "kws_upper" can be set to true
- Property "kws_expert" can be used to directly set regular expressions
- Property "kws_part" can be set to find keyword in words
- additional properties "kws_min_conf", "kws_threads" and "kws_max_anz" can be set
- bugfix #26: only throw exception when
- bugfix T2I: do not throw exception if no baseline for matching is found
- bugfix LA: resources available in jar
- bugfix B2P: if region too small, nullpointer can be handled
- add tests for advanced Layout Analysis
- update dependencies to PageXmlExtractor-0.3
- delete unresolved dependencies to log4j
- bugfix: KWS is thread-safe
- bugfix HTR: missing baselines are added using BaselineGeneartionHist
- update GetTrainFile to separate train and test files
- bugfixes on creating traindata (traindata will be written into subfolders)
- save jar-version into metadata of PageXml-file
- bugfix KWS: missing lineID does not result in empty result
- adding advanced Layout Analysis
- adding advanced HTR (without training)
- make module dependent on tensorflow and GCOC
- Bugfix number of threads in T2I-Train-Workflow
- delete deprecated T2I-Workflow
- do not allow B2P in T2I-Workflow
- delete old T2I-methods
- make LayoutAnalysisParser::process with PcGtsType public
- switch to java-version 1.8
- KWS improvements
- save LineID in ConfmatContainer
- use TRANSKRIBUS_HTR-properties to generate name for xml metadata
- Bugfix Bidi in Text2Image
- switch to new Interface in KWS
- Bugfix maxAnz KWS
- T2I is available with BIDI
- Hyphenation is more configurable via T2I_Hyphenation-Properties
- property "train_status" can be set to train only on pages with specific statuses (e.g. "GT;DONE")
- Semi-supervised training adds channels to Network, if channels are not there
- Accept all images that are accepted by Transkribus 1.3.*
- bugfix LA: LA works also for binary images
- making Text2Image available via Interfaces
- continue Text2Image
- add Test-Method de.uro.citlab.module.util.DictionaryTest to test external dictionaries
- make dependent on stable versions of TranskribusTokenizer and TranskribusErrorRate
- Integrate TranskribusErrorRate in WER-calculation while training
- Make HTR Training observable
- Cleanup of dictionary works
- cleanup (test-) resources - only corresponding planet_jar-x.x.x.jar is needed
- feature #22
- solve bugfix #23
- solve bugfix #20
- more accurate handling of Unicode characters
- start of changelog use after this version