NVFlare 2.1 Release
Please see the documentation for more details.
What's Changed
- Fix create_analytic_dxo method calling by @YuanTingHsieh in #141
- Fix poc permission by @IsaacYangSLA in #143
- Add protection to prevent loop stream logging. by @yhwen in #146
- Remove the log streaming codes. by @yhwen in #159
- Enhanced the client_runner to handle the abort command, return Return… by @yhwen in #161
- Enhanced the DXO data and meta default to empty dict. by @yhwen in #169
- added hello-pt-tb example with Learner API and tb streaming by @SYangster in #166
- Fix issue 156 on numpy default casting by @IsaacYangSLA in #172
- Coding style by @IsaacYangSLA in #171
- add aggregation helper class by @holgerroth in #165
- Add to the documentation to clarify about specific cases by @nvkevlu in #175
- Fix private/fed/server docstrings by @YuanTingHsieh in #174
- Improve docstrings and coding style for fuel by @nvkevlu in #176
- Private docstring fixes by @yhwen in #181
- Standardize error messages by @SYangster in #178
- Update admin client command line interface to be based on AdminAPI by @YuanTingHsieh in #162
- Fix cross_validation locate_model should only return the model weight… by @yhwen in #186
- Fix CredentialType by @YuanTingHsieh in #188
- improve docstrings for apis and widgets by @SYangster in #187
- Fix show_stats command by @YuanTingHsieh in #191
- Fix all issues of isort/black after including nvflare/private folder by @IsaacYangSLA in #154
- Add default FL app validator by @YuanTingHsieh in #179
- Fix admin shutdown client issue by @YuanTingHsieh in #194
- Add information in documentation to help explain fed events by @nvkevlu in #195
- Fix full model shareable generator by @YuanTingHsieh in #196
- Quit admin after timeout dev by @SYangster in #244
- HA: overseer/overseer_agent/overseer_agent_app/overseer_agent_gui by @IsaacYangSLA in #246
- Ha support by @yhwen in #261
- add dummy overseer agent by @SYangster in #262
- adding filesystem and s3 storage and persistor implementations by @SYangster in #255
- updated cli.py to enable HA. by @yhwen in #263
- remove old nfs_storage (filesystem_storage works for nfs) by @SYangster in #264
- Dev 2.1 cli change by @yhwen in #265
- Support HA in POC. by @yhwen in #270
- Add initialize by @IsaacYangSLA in #267
- Add json files and update templates by @IsaacYangSLA in #271
- Fix poc client start by @IsaacYangSLA in #272
- get open port for client_executor. by @yhwen in #275
- Minor changes to master-template.yml to support older versions of bash by @kkersten in #278
- Update with fed_admin config by @nvkevlu in #274
- Change fl_admin launch script to use fed_admin.json config for poc mode by @nvkevlu in #283
- update fed_admin_HA.json in poc to not use SSL by @nvkevlu in #284
- Update dev by @YuanTingHsieh in #285
- Fix intime accumulated aggregators and add tests by @YuanTingHsieh in #288
- Change log levels for some logs in private by @YuanTingHsieh in #289
- Fix license issue by @YuanTingHsieh in #287
- support SAG HA. by @yhwen in #290
- Bugfix HE aggregator by @holgerroth in #292
- Fix isort/black/flake8 issues and CI/CD by @YuanTingHsieh in #296
- Add props to SP dataclass by @IsaacYangSLA in #304
- Persist wf_index in server_runner by @YuanTingHsieh in #305
- make task name configurable by @holgerroth in #254
- Refactor overseer_agent to AdminAPI from cli, add ha commands by @nvkevlu in #302
- Enable poc command in 'pip install -e .' environment. by @IsaacYangSLA in #318
- support data kind WEIGHTS in HE by @holgerroth in #326
- Fix set_run_number for snapshot restore. by @yhwen in #312
- Clean up communicator by @YuanTingHsieh in #315
- Re-organise widgets and handlers by @YuanTingHsieh in #309
- Fix FLComponent docstring typo by @YuanTingHsieh in #329
- Add event_type to AnalyticsSender by @YuanTingHsieh in #330
- disable byoc in provisioning by @holgerroth in #327
- Clean up codes in nvflare/private/fed/app by @YuanTingHsieh in #286
- add storage tests by @SYangster in #325
- Add persist/restore to cyclic workflow by @IsaacYangSLA in #331
- Initial checkin of code with job management and admin commands by @nvkevlu in #332
- 343 fixes svt_privacy by @wyli in #348
- Improve fed_(server|client).json readability by @IsaacYangSLA in #347
- Storage improvements by @SYangster in #341
- move job def related into apis, and add get_apps by @nvkevlu in #350
- Fix persistable docstring by @YuanTingHsieh in #351
- Remove pickle dump/load calls from handling signature file by @IsaacYangSLA in #361
- Show warning after running poc command by @IsaacYangSLA in #363
- Rename study to project in all files in lighter by @IsaacYangSLA in #365
- Runner process by @yhwen in #369
- Make study spec consistent with study tool by @IsaacYangSLA in #359
- study manager by @IsaacYangSLA in #368
- Add study config user app by @IsaacYangSLA in #345
- Requires implementation of fire_event method for ServerEngineSpec by @YuanTingHsieh in #352
- Fix import issues by @IsaacYangSLA in #373
- Yaml loader known to be unsafe. Switch to yaml's safe_loader to reduce … by @IsaacYangSLA in #380
- Fix some uncaught yaml loader codes. Replace them with safe_load by @IsaacYangSLA in #383
- Remove refresh api as it wipes out all records. by @IsaacYangSLA in #385
- Fix a few issues on study manager by @IsaacYangSLA in #375
- removed pickle from storage by @SYangster in #381
- Remove pickle from job commands by @nvkevlu in #382
- Update docstrings by @YuanTingHsieh in #390
- Multi run support by @yhwen in #388
- add to docstrings and add method to get app content from Job by @nvkevlu in #394
- Fix job def and manager specs by @YuanTingHsieh in #396
- fix upload and download job commands, add clone_job by @nvkevlu in #397
- fixes the issue with keys in meta not being str by @nvkevlu in #399
- Add Job scheduler by @YuanTingHsieh in #400
- update the commands in the FLAdminAPI by @nvkevlu in #401
- Fix gunicorn cert issues by @IsaacYangSLA in #402
- Fix docstrings by @YuanTingHsieh in #408
- The privilege yaml file must be signed and loaded by secure content s… by @IsaacYangSLA in #403
- Add system state into overseer reply. by @IsaacYangSLA in #410
- Re-factor unit test's folder by @YuanTingHsieh in #405
- Scheduler integration by @yhwen in #411
- Added __init__.py, made the ABORT command invisible. by @yhwen in #413
- Added components for poc mode. by @yhwen in #414
- Enable components in fed_server.json by @IsaacYangSLA in #404
- keep running clients for job. by @yhwen in #417
- Fix exception messages by @YuanTingHsieh in #415
- Fix store in study manager as it's now a component by @IsaacYangSLA in #418
- Replace pickle in state persistence in provision cert with json by @IsaacYangSLA in #412
- Fix error when running provision with different overseer by @IsaacYangSLA in #409
- Added snapshot write lock. by @yhwen in #419
- Update CI/CD requirements and tests by @YuanTingHsieh in #376
- Add a new entry in fed_server.json by @IsaacYangSLA in #420
- Add coverage report to unit test by @YuanTingHsieh in #421
- Fixed the cross_site_validation for multi_run. by @yhwen in #422
- Add required section for server to start normally in provision (secure) mode by @YuanTingHsieh in #424
- Add default env var PYTHONPATH if it is not set by @YuanTingHsieh in #426
- Enhance error message reporting by @YuanTingHsieh in #425
- Admin command jobs for multi-jobs by @nvidianz in #431
- Update hello-numpy-sag from app to job structure and update docs by @YuanTingHsieh in #430
- Fixed relative path problem with submit_job by @nvidianz in #433
- Update integration test cases by @YuanTingHsieh in #429
- FLARE-128: Allow missing/empty study_name by @nvidianz in #435
- Update commands by @nvkevlu in #432
- Clean up fuel/hci/server/login.py by @YuanTingHsieh in #441
- Fix typos by @YuanTingHsieh in #439
- FLARE-136: Fixed the exception in list_jobs when study is None by @nvidianz in #445
- Tests run with PASSED by @IsaacYangSLA in #427
- Remove result processor by @YuanTingHsieh in #440
- Clean up job scheduler, resource manager, and resource consumers by @YuanTingHsieh in #436
- Remove tee in sub_start.sh by @IsaacYangSLA in #447
- Add to FLAdminAPI, fix FLAdminAPI runner by @nvkevlu in #443
- Add Ditto helper to app_common, update prostate example learners by @ZiyueXu77 in #437
- fix issue of admin API not having poc_key in poc mode by @nvkevlu in #449
- clean up experience with admin client for poc mode by @nvkevlu in #450
- Move job scheduler from private to app_common by @YuanTingHsieh in #451
- Log separation by @yhwen in #446
- Enhance Job_runner logging. by @yhwen in #453
- Do not allow shutdown while jobs are still running. by @yhwen in #454
- Fixed codestyle. by @yhwen in #457
- Enable read the docs build by @IsaacYangSLA in #458
- Fix integration tests by @YuanTingHsieh in #455
- Add a white space to log info when deploying an application by @holgerroth in #459
- update FLAdminAPI runner by @nvkevlu in #456
- Add back HA test cases by @YuanTingHsieh in #460
- Update storage and tests by @YuanTingHsieh in #448
- Fixed the missing min_sites and required_sites for Job. by @yhwen in #461
- Generate jws-like compact serialized study info by @IsaacYangSLA in #444
- Update cifar10 data setup procedure in integration tests apps by @YuanTingHsieh in #464
- Remove wrong params local_logging by @YuanTingHsieh in #465
- Wait for admin login by @holgerroth in #463
- Changed the job scheduler to use event. by @yhwen in #468
- Only persist the FLComponent snapshot. by @yhwen in #469
- Added SecurityContentService for runner_process. by @yhwen in #473
- Add flask and gunicorn by @IsaacYangSLA in #471
- Add PyJWT license by @IsaacYangSLA in #481
- Update study def and study manager to fit study cli and all required … by @IsaacYangSLA in #480
- Clean up server deployer by @YuanTingHsieh in #479
- Clean up server engine internal spec by @YuanTingHsieh in #478
- Clean up training cmds by @YuanTingHsieh in #477
- Clean up server runner by @YuanTingHsieh in #476
- Clean up fed server by @YuanTingHsieh in #475
- add HA specific commands to FLAdminAPI by @nvkevlu in #474
- Add junitxml to unit test and update exclude var test by @YuanTingHsieh in #483
- Update job scheduler and add unit tests by @YuanTingHsieh in #472
- Update install requires by @IsaacYangSLA in #482
- Update setup.py for release by @IsaacYangSLA in #423
- Support abs path and sort list_jobs result by @nvidianz in #467
- To support Tensor used in the model weights for cross_validation. by @yhwen in #486
- Add unittests for lighter project class by @IsaacYangSLA in #488
- sync clients in runner_process. by @yhwen in #487
- FLARE-197: Fixed the problem with empty deploy_map by @nvidianz in #494
- Add unittest cases on lighter Participant class by @IsaacYangSLA in #491
- Ensure the job exit for abort_job. by @yhwen in #496
- Added a missing check in. by @yhwen in #499
- Update dummy oa by @IsaacYangSLA in #500
- Update fed analysis example to 2.1 by @holgerroth in #495
- cifar10 example update for 2.1 by @holgerroth in #497
- Fix the list_sp index issue by @IsaacYangSLA in #503
- Fix DummyOverseerAgent overseer_info and sp_list by @IsaacYangSLA in #504
- Added job FAILED_TO_RUN status. Changed the multi-run run_number format. by @yhwen in #492
- use makedirs to auto make intermediate folders if needed by @YuanTingHsieh in #505
- Enhanced the file storage write process. by @yhwen in #502
- Fix list resource manager and add tests by @YuanTingHsieh in #485
- Modify existing premerge CI to runs-on temp-ci label by @pxLi in #507
- Enhance filesystem_storage file writing. by @yhwen in #509
- CIFAR-10 example: update readme and utility scripts by @holgerroth in #511
- Move example np codes into app_common by @YuanTingHsieh in #510
- Restructure integration tests by @YuanTingHsieh in #470
- Update 2.1 docs by @nvkevlu in #498
- Add hello examples into CI integration tests by @YuanTingHsieh in #513
- Init blossom-ci workflow by @pxLi in #517
- Fix event recorder by @YuanTingHsieh in #515
- Fix np codes by @YuanTingHsieh in #514
- update provisioning helper ui to work for 2.1.0 release by @nvkevlu in #522
- Clean up CI integration tests by @YuanTingHsieh in #516
- Blossom test by @IsaacYangSLA in #523
- Rewrote job validation and added test cases by @nvidianz in #520
- Updated the abort_task command usage. by @yhwen in #489
- pass the SP target info to the client process. by @yhwen in #528
- Add license files for tools needed for prostate by @ZiyueXu77 in #525
- Update readme by @YuanTingHsieh in #526
- Fix client not online deploy error by @yhwen in #529
- Removed studies from submit_job and list_jobs by @nvidianz in #532
- Removed study_name from meta.json for all examples by @nvidianz in #534
- address issues from QA by adding to docs, removing delete_job by @nvkevlu in #530
- remove study from docs and provisioning UI to match project.yml updates by @nvkevlu in #536
- Change the default value of ignore_result_error in SAG workflows by @YuanTingHsieh in #535
- Remove Study-related info from lighter/apis/poc/setup.py/requirements… by @IsaacYangSLA in #533
- Snapshot concurrent persist by @yhwen in #512
- Heartbeat jobs by @yhwen in #524
- Changed to use ClientEngineSpec for type hint. by @yhwen in #537
- Added mock package. by @yhwen in #541
- Add shutdown system admin api by @IsaacYangSLA in #543
- Remove study manager from cifar10 example by @holgerroth in #542
- Add additional datasets with preprocess and testing for prostate example by @ZiyueXu77 in #452
- Add a helper function to overseer agent by @IsaacYangSLA in #544
- cifar-10: update poc multi-task instructions by @holgerroth in #550
- add validation for shutdown_system by @nvkevlu in #548
- Enable Overseer shutdown system command. by @yhwen in #547
- Fixed the issue abort_job command could not abort all running jobs. by @yhwen in #549
- Update HE exception message by @holgerroth in #552
- Update integration tests for HA by @YuanTingHsieh in #540
- CIFAR-10: use exact match to find run by @holgerroth in #551
- Update plot_tensorboard_events.py by @ZiyueXu77 in #554
- Change tmp path under tmp/nvflare by @YuanTingHsieh in #531
- Set dummy info in both init and initialize methods (to overcome a… by @IsaacYangSLA in #557
- Limit # of servers to 2 at most by @IsaacYangSLA in #555
- Remove minio server in test scripts and fix format issues by @YuanTingHsieh in #556
- Fixed the missing client after running the system for a while. by @yhwen in #559
- Update supervised_learner.py by @ZiyueXu77 in #560
- Add overseer integration tests by @IsaacYangSLA in #493
- logout from admin api after submitting job by @holgerroth in #565
- Update plot_tensorboard_events.py by @ZiyueXu77 in #564
- Remove minio by @YuanTingHsieh in #568
- add check to throw exception if Python version is >= 3.9 by @nvkevlu in #567
- Sort the jobs in scheduler based on submit time by @YuanTingHsieh in #566
- remove study store in cifar10 example project by @holgerroth in #569
- Job validator additional checks by @chesterxgchen in #546
- Update hello-pt and hello-pt-tb example by @YuanTingHsieh in #561
- fix issue with FLAdminAPI stalling by @nvkevlu in #572
- Add execution exception status to a job by @YuanTingHsieh in #558
- Remove reference implementation of sql/redis stores from nvflare.ha.ov… by @IsaacYangSLA in #573
- Remove S3 storage unit test by @YuanTingHsieh in #574
- Add set -e to integration test script by @YuanTingHsieh in #577
- runtests scripts refactoring by @chesterxgchen in #571
- Split snapshot file by @yhwen in #582
- Update project.yml by @IsaacYangSLA in #585
- update hello-monai README.md by @chesterxgchen in #584
- added 30 seconds delay to terminate the child process. by @yhwen in #579
- Fix handling of config exception during job running by @YuanTingHsieh in #580
- revise wording of validation for show_stats by @nvkevlu in #588
- add tests for running through FLAdminAPI commands by @nvkevlu in #581
- Fix snapshot persistor with storage by @YuanTingHsieh in #589
- hide download commands with issues that will be worked on in next rel… by @nvkevlu in #590
- Added authz for submit_job and list_jobs by @nvidianz in #587
- Init pre-merge scripts by @YanxuanLiu in #527
- Update cifar10 project file by @holgerroth in #586
- avoid swallowing exception by @chesterxgchen in #591
- Fixed the snapshot log path info. by @yhwen in #592
- Remove restriction of create inside preexisting in storage by @YuanTingHsieh in #595
- Add cancelled resources to front by @YuanTingHsieh in #594
- Update jobs storage location in cifar-10 project by @holgerroth in #598
- Use PCI_BUS_ID as CUDA_DEVICE_ORDER in GPUResourceConsumer by @YuanTingHsieh in #596
- Support @ALL in site lists of deploy_map by @nvidianz in #599
- fixed the aux communication error under load. by @yhwen in #603
- dev-2.1 docs restructure by @kkersten in #601
- Support @ALL in authz_func by @nvidianz in #606
- hide command scope on CLI interface to reduce confusion by @nvkevlu in #605
- Fixed the abort_job random EOFError. by @yhwen in #597
- Fix job schedule and deploy with @ALL by @YuanTingHsieh in #607
- cherrypick the monitor parent process exit to dev-2.1. by @yhwen in #610
- Fix resource manager by @YuanTingHsieh in #604
- fix links, add information to docs raised from QA by @nvkevlu in #609
- Added proper system shutdown to avoid random Bad file descriptor errors. by @yhwen in #611
- update provisioning ui to generate updated project.yml by @nvkevlu in #593
- Created test-cases for authorization by @nvidianz in #615
- fix shutdown_system command by @nvkevlu in #612
- Fix resource manager clean up thread calling by @YuanTingHsieh in #613
- add HA integration tests with 2 servers by @nvkevlu in #616
- Fix check client replies by @YuanTingHsieh in #620
- Corrected authorization behavior for abort_job and added test cases by @nvidianz in #622
- cifar-10: don't set cuda visible devices by @holgerroth in #617
- Add specific versions to requirements-min.txt by @YuanTingHsieh in #621
- Fix abort job when server execution exception by @YuanTingHsieh in #623
- fixed jobs run race condition issue. by @yhwen in #624
- Treat several messages as NON-ERROR execution by @YuanTingHsieh in #627
- Refactor Brats to 2.1 by @ZiyueXu77 in #563
- Fixed the run_number type hint. by @yhwen in #632
- Fix a situation where the server starts successfully but the client fails to start by @YuanTingHsieh in #630
- fed_analysis: support new dataset with added masks by @holgerroth in #633
- make additions in docs to address various issues by @nvkevlu in #635
- Fix admin logout issue when overseer is dead by @IsaacYangSLA in #628
- Enhanced the child_process logging. by @yhwen in #637
- Enhance the abort_train with retry. by @yhwen in #634
- Add missing import by @YuanTingHsieh in #639
- Fix unit test run number issue by @YuanTingHsieh in #640
- dev-2.1 docs update - overview, examples, programming guide by @kkersten in #642
- Update job meta validator test by @YuanTingHsieh in #644
- cifar-10: Use unique dir for datasplits by @holgerroth in #643
- Fix issue #638 by removing DistributionBuilder by @IsaacYangSLA in #645
- Deprecate accumulate weighted aggregator by @YuanTingHsieh in #647
- Remove fl_ctx in init method of job meta validator by @YuanTingHsieh in #646
- update links in the examples to the location for 2.1.0 docs by @nvkevlu in #641
- Show better messages to users when starting overseer in POC mode by @IsaacYangSLA in #650
- Enable per participant component specification by @IsaacYangSLA in #651
- Replace run_number with job_id in most occurrences by @nvkevlu in #654
- Added support for name in meta.json by @nvidianz in #653
- Fix hello examples by @YuanTingHsieh in #656
- Add new test case to security unit test by @YuanTingHsieh in #652
- Clean up secure content check method by @YuanTingHsieh in #648
- Remove overseer from poc by @IsaacYangSLA in #658
- Save job workspace by @yhwen in #655
- import mock from unittest by @YuanTingHsieh in #661
- Added back delete_job command, removed the delete_workspace command. by @yhwen in #662
- update run_number to job_id in examples by @nvkevlu in #657
- CIFAR-10: add download job instructions and plot downloaded results by @holgerroth in #667
- update cifar-10 project file by @holgerroth in #666
- fed_analysis example: use download_job command; don't specify cuda devices by @holgerroth in #669
- Fix integration tests by @YuanTingHsieh in #663
- Add download_job_url to project.yml and template by @IsaacYangSLA in #671
- Delete job command enhance by @yhwen in #670
- Improve response on poc command by @IsaacYangSLA in #664
- make commands consistent with updates for FLAdminAPI and docs by @nvkevlu in #665
- update links for 2.1.1 by @nvkevlu in #668
- Use custom StorageException instead of RuntimeError by @YuanTingHsieh in #672
- Remove some unused rights from master template yaml file by @IsaacYangSLA in #675
- Add unit test for fuel/hci/zip_utils.py by @YuanTingHsieh in #659
- Readme updates on brats and prostate by @ZiyueXu77 in #673
- Unregister client itself when client shutdown by stop_fl.sh by @IsaacYangSLA in #674
- Revert stop_fl.sh PR and add warnings when users run stop_fl.sh by @IsaacYangSLA in #679
- update provisioning UI by @nvkevlu in #678
- Update example access result part and update documentation by @YuanTingHsieh in #681
- Added run_duration for the job. by @yhwen in #680
- Removal of download_folder and env and security fix for ls by @nvidianz in #682
- Remove env command in doc by @YuanTingHsieh in #683
New Contributors
- @kkersten made their first contribution in #278
- @wyli made their first contribution in #348
- @chesterxgchen made their first contribution in #546
- @YanxuanLiu made their first contribution in #527
Full Changelog: 2.0.6...2.1.1