Releases: IBM/unitxt
1.5.1
What's Changed
- 1.5.0 by @elronbandel in #494
- Fix all datasets by @elronbandel in #495
- fix manifest by @elronbandel in #496
Full Changelog: 1.5.0...1.5.1
Unitxt 1.5.0
What's Changed
- Add Perplexity by @assaftibm in #442
- Balance demos in fair_tos datasets by @yoavkatz in #473
- postprocessors become extensions of FieldOperator by @dafnapension in #468
- Initial UI by @Roni-Friedman in #475
- introducing text at the top of loaders.py by @dafnapension in #474
- Improve Unitxt UI and add support for multiple catalogs by @Roni-Friedman in #476
- Improve ExtractMostCommonFieldValues performance by building a counter of values on the fly instead of creating a list of values as long as the stream by @dafnapension in #471
- Add a new Perturbate operator, handy for faking a prediction as a perturbed version of the target or for any other perturbation use case, by @dafnapension in #456
- add a card for HF xsum, a summary dataset by @dafnapension in #479
- Add QA, NER, Targeted sentiment and Generation tasks, Llama and Alpaca formats and instructions and whitespace augmentor by @matanor in #483
- Cfpb product by @ilyashnil in #485
- Improve catalog UI on documentation website by separating catalog objects to files and adding information per catalog item by @matanor in #461
- Update QA templates by @matanor in #486
- Length balancer docstring by @matanor in #487
- Add process_instance function to every multi-stream operator for easier testing and usage at the instance level by @elronbandel in #488
- Add LRU caching for catalog artifact loading to minimize IO overhead and enhance performance by @elronbandel in #489
- Make postprocessors a general operator that operates on 'prediction' and on 'references' which enables the use of every possible operator as postprocessor by @dafnapension in #484
- Improve UI code presentation and organization by @Roni-Friedman in #491
- Simplify ui launching with the console command unitxt-explore by @elronbandel in #492
- Xmmlu template multilingual by @gitMichal in #493
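The ExtractMostCommonFieldValues change in #471 describes counting values on the fly rather than first collecting them into a list. A minimal sketch of that general idea follows; it is not the Unitxt implementation, and the function name `most_common_field_values` and its parameters are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def most_common_field_values(
    stream: Iterable[Dict[str, Any]], field: str, top_k: int
) -> List[Any]:
    """Return the top_k most frequent values of `field` in a stream of instances.

    Hypothetical helper for illustration; not part of the Unitxt API.
    """
    counter: Counter = Counter()
    for instance in stream:
        # Update the counts as each instance arrives; no full list of values is kept.
        counter[instance[field]] += 1
    return [value for value, _ in counter.most_common(top_k)]


# A generator stands in for a stream that may be too long to materialize.
stream = ({"label": label} for label in ["a", "b", "a", "c", "a", "b"])
print(most_common_field_values(stream, "label", top_k=2))  # ['a', 'b']
```

Memory use then grows with the number of distinct values rather than with the length of the stream.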
New Contributors
- @Roni-Friedman made their first contribution in #475
Full Changelog: 1.4.6...1.5.0
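The LRU caching added for catalog artifact loading in #489 can be pictured with Python's standard `functools.lru_cache`. The sketch below only illustrates the caching technique; the function `load_artifact`, the cache size, and the file name are assumptions, not the code from the PR.

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=128)  # cache size is an arbitrary illustrative choice
def load_artifact(path: str) -> str:
    """Read an artifact file; repeated calls with the same path skip the disk.

    Hypothetical loader for illustration; not the Unitxt catalog API.
    """
    return Path(path).read_text()


with tempfile.TemporaryDirectory() as tmp:
    artifact_path = Path(tmp) / "card.json"
    artifact_path.write_text(json.dumps({"type": "card"}))
    first = load_artifact(str(artifact_path))   # cache miss: reads the file
    second = load_artifact(str(artifact_path))  # cache hit: served from memory
    info = load_artifact.cache_info()
    print(info.hits, info.misses)  # 1 1
```

One trade-off of this approach is that a cached entry is not refreshed if the file changes on disk; `load_artifact.cache_clear()` resets the cache when that matters.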
Unitxt 1.4.6
What's Changed
- Fix automatic dataset and metric uploads to Hugging Face hub by @elronbandel in #466
Full Changelog: 1.4.5...1.4.6
Unitxt 1.4.4
What's Changed
- Fix HuggingFace integration by changing logging.py to logging_utils.py to comply with hf dataset conventions by @elronbandel in #457
- Add FilterByQuery and ExecuteQuery as simple operators receiving textual python query by @dafnapension in #449
- New multi label template by @yoavkatz in #462
- Delete old formats and make SystemFormat the only remaining format by @elronbandel in #458
Full Changelog: 1.4.3...1.4.4
Unitxt 1.4.3
What's Changed
- Change formats to be backward compatible by @elronbandel in #453
Full Changelog: 1.4.2...1.4.3
Unitxt 1.4.2
Enhancements
- Added ability to calculate confidence interval for additional scores beyond the main_score by @assaftibm in #431
- Improved error messaging in test cards and added option to disable via environment variable by @yoavkatz in #440
- Added support for Japanese language in sacrebleu by @yoavkatz in #433
- Introduced binary recall, precision metrics, and advanced filtering operators by @lilacheden in #424
- Added text completion task and modified lm harness template by @perlitz in #429
Internal code changes
- Converted instructions to operators by @elronbandel in #450
Bug fixes
- Fixed split definition in debater datasets by @ilyashnil in #443
- Added new requirements for sacrebleu by @gitMichal in #448
Documentation
- Documented release notes process by @matanor in #444
- Added guidelines for updating Unitxt documentation by @matanor in #446
Non backward compatible changes
- Implemented SystemFormat instance operator and updated BasicRecipe to use all operators, removing renderers and ICLFormat by @dafnapension in #423 (this change can break any code using the deprecated ICLFormat)
- Additional verification introduced, aiming for uniquely determined results by @dafnapension in #435
New Contributors
- @lilacheden made their first contribution in #424
Full Changelog: 1.4.1...1.4.2
Unitxt 1.4.1
(Same as 1.4.0 - rereleasing due to release process error)
Enhancements
- New random generation mechanism to remove dependency between different random generators by @matanor in #414
- New MultipleChoiceTemplate which changes all QA datasets by @elronbandel in #405
- New MAP, MRR, and Retrieval@K metrics by @assaftibm in #422
- New LoadFromKaggle loader which allows direct loading of datasets from Kaggle by @ilyashnil in #413
- New StringContainment metric that checks if one of the references is contained in the prediction by @ellarabi in #394
- New ConvertToBoolean post processor that changes the prediction to either TRUE or FALSE by @ellarabi in #394
- 15 new open source classification datasets by @ilyashnil in #410, #418
- Documentation is now automatically generated in each release (#384)
Bug fixes
- Fixed and improved error checking in multi label F1 by @yoavkatz in #390
- Changed translations to use normalized_sacrebleu by @gitMichal in #426
Non backward compatible changes
- MultipleChoice QA datasets need to move to new MultipleChoiceTemplates
- Translation BLEU metric is now between 0-1 and not 0-100
- New randomization mechanism may change the selection of demos, randomized text augmentation, or any other random choice.
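To see why independent random generators can change previously observed random choices while making them more stable afterwards, consider the following sketch. It assumes one plausible design, deriving a separate generator per purpose from a global seed plus a sub-seed; the names `GLOBAL_SEED` and `make_generator` are hypothetical and the actual mechanism in #414 may differ.

```python
import random

GLOBAL_SEED = 42  # illustrative value, not taken from the release notes


def make_generator(sub_seed: str) -> random.Random:
    """Create a generator whose sequence depends only on the global seed and sub_seed.

    Hypothetical helper for illustration; not the Unitxt API.
    """
    return random.Random(f"{GLOBAL_SEED}:{sub_seed}")


# Sequence drawn for demo selection with no other random activity.
demos_rng = make_generator("demos")
baseline = [demos_rng.random() for _ in range(3)]

# Draw from an unrelated generator first; the "demos" sequence is unaffected.
demos_rng = make_generator("demos")
augment_rng = make_generator("augmentation")
augment_rng.random()
interleaved = [demos_rng.random() for _ in range(3)]

print(baseline == interleaved)  # True
```

With a single shared generator, the extra augmentation draw would have shifted every later demo choice, which is the kind of coupling the new mechanism is described as removing.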
What's Changed
- metric and processor related to robustness evaluation by @ellarabi in #394
- Removed variable set in debugging by @yoavkatz in #409
- Adding simple datasets by @ilyashnil in #410
- Add multiple choice template and fix mmlu by @elronbandel in #405
- Fixed error check in multi label F1 by @yoavkatz in #390
- Fix multiple choice template by @perlitz in #416
- example of kaggle loader by @ilyashnil in #413
- Medical abstract by @ilyashnil in #418
- Multilabel and jsons by @ilyashnil in #419
- Improve dataset addition tutorial + Fix auto uploads of the catalog to the web by @elronbandel in #384
- Improve docs by @elronbandel in #420
- Fix docs and docs compilation tests by @elronbandel in #421
- Hard code classes names by @elronbandel in #425
- Retrieval metrics by @assaftibm in #422
- modify to different bleu impl. by @gitMichal in #426
- Improve docs by @elronbandel in #427
- Independent random generators by @matanor in #414
Full Changelog: 1.3.1...1.4.0
Unitxt 1.4.0
Enhancements
- New random generation mechanism to remove dependency between different random generators by @matanor in #414
- New MultipleChoiceTemplate which changes all QA datasets by @elronbandel in #405
- New MAP, MRR, and Retrieval@K metrics by @assaftibm in #422
- New LoadFromKaggle loader which allows direct loading of datasets from Kaggle by @ilyashnil in #413
- New StringContainment metric that checks if one of the references is contained in the prediction by @ellarabi in #394
- New ConvertToBoolean post processor that changes the prediction to either TRUE or FALSE by @ellarabi in #394
- 15 new open source classification datasets by @ilyashnil in #410, #418
- Documentation is now automatically generated in each release (#384)
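For readers unfamiliar with the retrieval metrics listed above, here is a minimal sketch of Mean Reciprocal Rank (MRR) using its standard textbook definition. It is not the Unitxt metric implementation from #422, and the function names and data layout are assumptions made for the example.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the reciprocal rank over all (ranking, relevant set) pairs."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in queries) / len(queries)


queries = [
    (["d3", "d1", "d2"], {"d1"}),  # first relevant hit at rank 2 -> 0.5
    (["d5", "d4"], {"d5"}),        # first relevant hit at rank 1 -> 1.0
    (["d7", "d8"], {"d9"}),        # no relevant hit             -> 0.0
]
print(mean_reciprocal_rank(queries))  # 0.5
```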
Bug fixes
- Fixed and improved error checking in multi label F1 by @yoavkatz in #390
- Changed translations to use normalized_sacrebleu by @gitMichal in #426
Non backward compatible changes
- MultipleChoice QA datasets need to move to new MultipleChoiceTemplates
- Translation BLEU metric is now between 0-1 and not 0-100
- New randomization mechanism may change the selection of demos, randomized text augmentation, or any other random choice.
What's Changed
- metric and processor related to robustness evaluation by @ellarabi in #394
- Removed variable set in debugging by @yoavkatz in #409
- Adding simple datasets by @ilyashnil in #410
- Add multiple choice template and fix mmlu by @elronbandel in #405
- Fixed error check in multi label F1 by @yoavkatz in #390
- Fix multiple choice template by @perlitz in #416
- example of kaggle loader by @ilyashnil in #413
- Medical abstract by @ilyashnil in #418
- Multilabel and jsons by @ilyashnil in #419
- Improve dataset addition tutorial + Fix auto uploads of the catalog to the web by @elronbandel in #384
- Improve docs by @elronbandel in #420
- Fix docs and docs compilation tests by @elronbandel in #421
- Hard code classes names by @elronbandel in #425
- Retrieval metrics by @assaftibm in #422
- modify to different bleu impl. by @gitMichal in #426
- Improve docs by @elronbandel in #427
- Independent random generators by @matanor in #414
Full Changelog: 1.3.1...1.4.0
Unitxt 1.3.2
Enhancements
- Add classification datasets by @ilyashnil in #410
- Added StringContainment metric and convert_to_boolean post processor that normalizes "yes/no/true/false" by @ellarabi in #394
What's Changed
- metric and processor related to robustness evaluation by @ellarabi in #394
- Removed variable set in debugging by @yoavkatz in #409
- Adding simple datasets by @ilyashnil in #410
Full Changelog: 1.3.1...1.3.2
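The StringContainment metric and the boolean-normalizing post processor from #394 are described only briefly above. The sketch below shows one way such logic could look; it is not the Unitxt code. The function names are hypothetical, and returning unrecognized predictions unchanged is an assumption, since the notes do not say how other values are handled.

```python
from typing import List


def string_containment(prediction: str, references: List[str]) -> float:
    """Score 1.0 if any reference appears as a substring of the prediction, else 0.0."""
    return float(any(reference in prediction for reference in references))


def convert_to_boolean(prediction: str) -> str:
    """Map yes/true to TRUE and no/false to FALSE, ignoring case and outer whitespace."""
    normalized = prediction.strip().lower()
    if normalized in {"yes", "true"}:
        return "TRUE"
    if normalized in {"no", "false"}:
        return "FALSE"
    # Assumption: anything else is passed through unchanged.
    return prediction


print(string_containment("The capital is Paris.", ["Paris", "Lyon"]))  # 1.0
print(convert_to_boolean(" Yes "))  # TRUE
```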
Unitxt 1.3.1
Enhancements:
- FilterByValue can filter out instances that contain a given value (@elronbandel) #402
- Added ag_news (@ilyashnil) in #400
Fixes:
- DiverseLabelsSampler, used for balancing demonstrations in NER, binary, and multi-label classification, receives the input field to balance on. (@yoavkatz) #399
- Fix to allow overriding empty_label of the multi_label template, used in non-English templates. (@yoavkatz) #403
- Fix AugmentorPrefixSuffix so that it does not return the same prefix/suffix repeatedly (@yoavkatz) in #407
Documentation
- Expanded code coverage and documentation of refiner classes (@dafnapension) #396
What's Changed
- added ag_news by @ilyashnil in #400
- peek at the first instance by @dafnapension in #401
- DiverseLabelsSampler fix by @yoavkatz in #399
- Enhance FilterByValue to have disallowed_values by @elronbandel in #402
- expand code coverage and documentation of refiner classes by @dafnapension in #396
- Also added more tests and documentation to DiverseLabelsSampler by @yoavkatz in #404
- Fix to allow override empty_label of multi_label template. by @yoavkatz in #403
- Fix suffix prefix not return same prefix/suffix repeated by @yoavkatz in #407
New Contributors
- @ilyashnil made their first contribution in #400
Full Changelog: 1.3.0...1.3.1
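The FilterByValue enhancement in #402 adds `disallowed_values`, so instances containing a given value can be filtered out. A rough sketch of that filtering behavior follows. The function `filter_instances` and the `required_values` parameter are hypothetical names for illustration, and the real operator's signature may differ.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def filter_instances(
    stream: Iterable[Dict[str, Any]],
    required_values: Optional[Dict[str, Any]] = None,
    disallowed_values: Optional[Dict[str, Any]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield instances that match all required values and none of the disallowed ones."""
    required_values = required_values or {}
    disallowed_values = disallowed_values or {}
    for instance in stream:
        if any(instance.get(field) != value for field, value in required_values.items()):
            continue  # a required field value is missing or different
        if any(instance.get(field) == value for field, value in disallowed_values.items()):
            continue  # the instance contains a disallowed value
        yield instance


instances = [
    {"label": "spam", "lang": "en"},
    {"label": "ham", "lang": "en"},
    {"label": "ham", "lang": "fr"},
]
kept = list(filter_instances(instances, disallowed_values={"label": "spam"}))
print(len(kept))  # 2
```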