Releases: IBM/unitxt

1.5.1

18 Jan 14:59

What's Changed

Full Changelog: 1.5.0...1.5.1

Unitxt 1.5.0

18 Jan 13:57

What's Changed

  • Add Perplexity by @assaftibm in #442
  • Balance demos in fair_tos datasets by @yoavkatz in #473
  • postprocessors become extensions of FieldOperator by @dafnapension in #468
  • Initial UI by @Roni-Friedman in #475
  • introducing text at the top of loaders.py by @dafnapension in #474
  • Improve Unitxt UI and add support for multiple catalogs by @Roni-Friedman in #476
  • Improve ExtractMostCommonFieldValues performance by building a value counter on the fly instead of creating a stream-length list of values by @dafnapension in #471
  • Add a new Perturbate operator, handy for faking a prediction as a perturbed version of the target or for any other perturbation use case by @dafnapension in #456
  • add a card for HF xsum, a summary dataset by @dafnapension in #479
  • Add QA, NER, Targeted sentiment and Generation tasks, Llama and Alpaca formats and instructions and whitespace augmentor by @matanor in #483
  • Cfpb product by @ilyashnil in #485
  • Improve catalog UI on documentation website by separating catalog objects to files and adding information per catalog item by @matanor in #461
  • Update QA templates by @matanor in #486
  • Length balancer docstring by @matanor in #487
  • Add a process_instance function to every multi stream operator for easier testing and usage at the instance level by @elronbandel in #488
  • Add LRU caching for catalog artifact loading to minimize IO overhead and enhance performance by @elronbandel in #489
  • Make postprocessors a general operator that operates on 'prediction' and on 'references' which enables the use of every possible operator as postprocessor by @dafnapension in #484
  • Improve UI code presentation and organization by @Roni-Friedman in #491
  • Simplify UI launching with the console command unitxt-explore by @elronbandel in #492
  • Xmmlu template multilingual by @gitMichal in #493
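
The ExtractMostCommonFieldValues item above (#471) describes replacing a stream-length list of values with a counter that is updated per instance. A minimal sketch of that idea in plain Python follows. It is not the Unitxt implementation; the function name `most_common_field_values` and the sample stream are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def most_common_field_values(
    stream: Iterable[Dict[str, Any]], field: str, top_k: int
) -> List[Any]:
    """Return the top_k most frequent values of `field` without storing every value."""
    counts: Counter = Counter()
    for instance in stream:
        # Update the tally per instance; memory grows with the number of
        # distinct values, not with the length of the stream.
        counts[instance[field]] += 1
    return [value for value, _ in counts.most_common(top_k)]


# Hypothetical stream of instances, consumed lazily from a generator.
stream = ({"label": label} for label in ["a", "b", "a", "c", "a", "b"])
top_two = most_common_field_values(stream, "label", top_k=2)
```

Because the stream is consumed once and never materialized, the same function works on generators that are too long to hold in memory.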

New Contributors

Full Changelog: 1.4.6...1.5.0

Unitxt 1.4.6

11 Jan 16:01

What's Changed

  • Fix automatic dataset and metric uploads to the Hugging Face hub by @elronbandel in #466

Full Changelog: 1.4.5...1.4.6

Unitxt 1.4.4

11 Jan 15:40

What's Changed

  • Fix HuggingFace integration by changing logging.py to logging_utils.py to comply with hf dataset conventions by @elronbandel in #457
  • Add FilterByQuery and ExecuteQuery as simple operators receiving textual python query by @dafnapension in #449
  • New multi label template by @yoavkatz in #462
  • Delete old formats and make SystemFormat the only existing format by @elronbandel in #458

Full Changelog: 1.4.3...1.4.4

Unitxt 1.4.3

09 Jan 18:36

What's Changed

Full Changelog: 1.4.2...1.4.3

Unitxt 1.4.2

08 Jan 15:09

Enhancements

  • Added ability to calculate confidence interval for additional scores beyond the main_score by @assaftibm in #431
  • Improved error messaging in test cards and added option to disable via environment variable by @yoavkatz in #440
  • Added support for Japanese language in sacrebleu by @yoavkatz in #433
  • Introduced binary recall, precision metrics, and advanced filtering operators by @lilacheden in #424
  • Added text completion task and modified lm harness template by @perlitz in #429

Internal code changes

Bug fixes

Documentation

  • Documented release notes process by @matanor in #444
  • Added guidelines for updating Unitxt documentation by @matanor in #446

Non backward compatible changes

  • Implemented SystemFormat instance operator and updated BasicRecipe to use all operators, removing renderers and ICLFormat by @dafnapension in #423
    (this change can break any code using the deprecated ICLFormat)
  • Additional verification introduced, aiming for uniquely determined results by @dafnapension in #435

New Contributors

Full Changelog: 1.4.1...1.4.2

Unitxt 1.4.1

31 Dec 10:45

(Same as 1.4.0 - rereleasing due to release process error)

Enhancements

  • New random generation mechanism to remove dependency between different random generators by @matanor in #414
  • New MultipleChoiceTemplate which changes all QA datasets by @elronbandel in #405
  • New MAP, MRR, and Retrieval@K metrics by @assaftibm in #422
  • New LoadFromKaggle loader which allows direct loading of datasets from Kaggle by @ilyashnil in #413
  • New StringContainment metric that checks whether one of the references is contained in the prediction by @ellarabi in #394
  • New ConvertToBoolean post processor that changes the prediction to either TRUE or FALSE by @ellarabi in #394
  • 15 new open source classification datasets by @ilyashnil in #410, #418
  • Documentation is now automatically generated in each release (#384)
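
The StringContainment metric and the ConvertToBoolean post processor listed above can be pictured with a short plain-Python sketch. This is an illustration of the described behavior, not the Unitxt code: the function names are hypothetical, and both the set of recognized answers and the handling of unrecognized input are assumptions.

```python
from typing import List


def string_containment(prediction: str, references: List[str]) -> float:
    """Score 1.0 if any reference appears as a substring of the prediction, else 0.0."""
    return 1.0 if any(ref in prediction for ref in references) else 0.0


def convert_to_boolean(prediction: str) -> str:
    """Map a yes/no style answer to "TRUE" or "FALSE".

    Assumption: answers outside the recognized set are returned unchanged.
    """
    normalized = prediction.strip().lower()
    if normalized in {"yes", "true"}:
        return "TRUE"
    if normalized in {"no", "false"}:
        return "FALSE"
    return prediction


contained = string_containment("The answer is Paris.", ["Paris", "paris"])
missing = string_containment("London", ["Paris"])
as_boolean = convert_to_boolean(" Yes ")
```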

Bug fixes

  • Fixed and improved error checking in multi label F1 by @yoavkatz in #390
  • Changed translations to use normalized_sacrebleu by @gitMichal in #426

Non backward compatible changes

  • MultipleChoice QA datasets need to move to new MultipleChoiceTemplates
  • Translation BLEU metric is now between 0 and 1, not 0 and 100
  • New randomization mechanism may change the selection of demos, randomized text augmentation, or any other random choice.
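
Because of the scale change listed above, translation scores recorded before this release are not directly comparable with new ones. A small sketch of bringing an old 0-100 score onto the 0-1 scale, assuming the only difference is a division by 100; the function name `to_unit_scale` and the sample score are hypothetical.

```python
def to_unit_scale(old_bleu: float) -> float:
    """Convert a BLEU score reported on the 0-100 scale to the 0-1 scale."""
    if not 0.0 <= old_bleu <= 100.0:
        raise ValueError(f"expected a score in [0, 100], got {old_bleu}")
    return old_bleu / 100.0


old_score = 25.0  # hypothetical score recorded before the change
new_scale_score = to_unit_scale(old_score)
```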

What's Changed

New Contributors

Full Changelog: 1.3.1...1.4.0

Unitxt 1.4.0

31 Dec 10:39
30ba648

Enhancements

  • New random generation mechanism to remove dependency between different random generators by @matanor in #414
  • New MultipleChoiceTemplate which changes all QA datasets by @elronbandel in #405
  • New MAP, MRR, and Retrieval@K metrics by @assaftibm in #422
  • New LoadFromKaggle loader which allows direct loading of datasets from Kaggle by @ilyashnil in #413
  • New StringContainment metric that checks whether one of the references is contained in the prediction by @ellarabi in #394
  • New ConvertToBoolean post processor that changes the prediction to either TRUE or FALSE by @ellarabi in #394
  • 15 new open source classification datasets by @ilyashnil in #410, #418
  • Documentation is now automatically generated in each release (#384)
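
For readers unfamiliar with the ranking metrics named above, here is a sketch of the standard textbook definitions of MRR and MAP in plain Python. It is not the Unitxt implementation. The helper names and sample rankings are hypothetical, and average precision is normalized by the number of relevant items that appear in the ranking, which assumes every relevant item was retrieved.

```python
from typing import List


def reciprocal_rank(relevance: List[int]) -> float:
    """1 / rank of the first relevant item in a ranked list, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(relevance: List[int]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical rankings for two queries: 1 marks a relevant item, 0 an irrelevant one.
rankings = [[0, 1, 0], [1, 0, 1]]
mrr = mean([reciprocal_rank(r) for r in rankings])
map_score = mean([average_precision(r) for r in rankings])
```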

Bug fixes

  • Fixed and improved error checking in multi label F1 by @yoavkatz in #390
  • Changed translations to use normalized_sacrebleu by @gitMichal in #426

Non backward compatible changes

  • MultipleChoice QA datasets need to move to new MultipleChoiceTemplates
  • Translation BLEU metric is now between 0 and 1, not 0 and 100
  • New randomization mechanism may change the selection of demos, randomized text augmentation, or any other random choice.
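
One general way to keep random generators independent of each other is to give each consumer its own generator, seeded from a stable name. The sketch below illustrates that idea only; it is not the mechanism introduced in #414, and `generator_for`, the consumer names, and the base seed are all hypothetical.

```python
import hashlib
import random


def generator_for(name: str, base_seed: int = 42) -> random.Random:
    """Create a generator whose seed depends only on base_seed and name."""
    digest = hashlib.sha256(f"{base_seed}:{name}".encode("utf-8")).hexdigest()
    return random.Random(int(digest, 16))


demo_rng = generator_for("demo_selection")
augment_rng = generator_for("text_augmentation")

demos = demo_rng.sample(range(100), k=3)

# Drawing from the augmentation generator does not disturb demo selection:
# a fresh demo generator reproduces the same sample.
for _ in range(10):
    augment_rng.random()
demos_again = generator_for("demo_selection").sample(range(100), k=3)
```

With a single shared generator, adding or removing one random call shifts every later draw, which is the kind of dependency the release note describes removing.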

What's Changed

New Contributors

Full Changelog: 1.3.1...1.4.0

Unitxt 1.3.2

19 Dec 05:42

Enhancements

  • Add classification datasets by @ilyashnil in #410
  • Added StringContainment metric and convert_to_boolean post processor that normalizes "yes/no/true/false" by @ellarabi in #394

Bug fixes

  • Fix for AugmentPrefixSuffix that ignored labels field by @yoavkatz in #409

What's Changed

New Contributors

Full Changelog: 1.3.1...1.3.2

Unitxt 1.3.1

18 Dec 14:48

Enhancements:

Fixes:

  • DiverseLabelSampler, used to balance demonstrations in NER, binary, and multi-label classification, now receives an input field to balance on. (@yoavkatz) #399
  • Fix to allow overriding the empty_label of the multi_label template, as used in non-English templates. (@yoavkatz) #403
  • Fix AugmentorPrefixSuffix so it does not return the same prefix/suffix repeatedly (@yoavkatz) in #407

Possible changes

  • NER results will improve due to better balancing (@yoavkatz) #399

Documentation

What's Changed

New Contributors

Full Changelog: 1.3.0...1.3.1