Releases: opendilab/DI-engine
v0.4.3
Env
- add rule-based gomoku expert (#465)
Algorithm
- fix a2c policy batch size bug (#481)
- enable activation option in CollaQ attention and mixer
- minor fix about IBC (#477)
Enhancement
- add IGM support (#486)
- add tb logger middleware and demo
Fix
- the type conversion in ding_env_wrapper (#483)
- di-orchestrator version bug in unittest (#479)
- data collection errors caused by shallow copies (#475)
- gym==0.26.0 seed args bug
Style
- add README tutorial links (environment & algorithm) (#490) (#493)
- adjust location of the default_model method in policy (#453)
New Repo
- DI-sheep: Deep Reinforcement Learning + 3 Tiles Game
Contributors: @PaParaZz1 @nighood @norman26625 @ZHZisZZ @cpwan @mahuangxu
v0.4.2
API Change
- `config` will be deep-copied by default in the `compile_config` function
- After calling the `compile_config` function, the current code repo's `git log` and `git diff` information will be saved in the `exp_name` directory (see the sketch below)
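A minimal usage sketch of the new `compile_config` behavior, assuming dizoo's standard CartPole DQN config module; the surrounding entry code is illustrative:

```python
from ding.config import compile_config
# Illustrative config module from dizoo; any main/create config pair works.
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import (
    cartpole_dqn_config, cartpole_dqn_create_config
)

cfg = compile_config(cartpole_dqn_config, create_cfg=cartpole_dqn_create_config, auto=True)
# The input config is deep-copied before compilation, so cartpole_dqn_config
# itself stays untouched, and the current repo's `git log` / `git diff`
# snapshot is saved under the cfg.exp_name directory for reproducibility.
```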
Env
- add rocket env (#449)
- updated pettingzoo env and improved related performance (#457)
- add mario env demo (#443)
- add MAPPO multi-agent config (#464)
- add mountain car (discrete action) environment (#452)
- fix multi-agent mujoco gym compatibility bug
- fix gfootball env save_replay variable init bug
Algorithm
- add IBC (Implicit Behaviour Cloning) algorithm (#401)
- add BCO (Behaviour Cloning from Observation) algorithm (#270)
- add continuous PPOPG algorithm (#414)
- add PER in CollaQ (#472)
- add activation option in QMIX and CollaQ
Enhancement
- update ctx to dataclass (#467)
Fix
- base_env FinalMeta bug about gym 0.25.0-0.25.1
- config inplace modification bug
- ding CLI no-argument problem
- import errors after running setup.py (jinja2, markupsafe)
- conda py3.6 and cross platform build bug
Style
- add project state and datetime in log dir (#455)
- polish notes for q-learning model (#427)
- revision to mujoco dockerfile and validation (#474)
- add dockerfile for cityflow env
- polish default output log format
Contributors: @PaParaZz1 @ZHZisZZ @zjowowen @song2181 @zerlinwang @i-am-tc @hiha3456 @nighood @kxzxvbk @Weiyuhong-1998 @RobinC94
v0.4.1
API Change
- upgrade Python version from `3.6-3.8` to `3.7-3.9`
- upgrade gym version from `0.20.0` to `0.25.0`; plenty of `env_id` values need updating (e.g., `Pendulum-v0` to `Pendulum-v1`) (#434) (see the sketch after this list)
- upgrade torch version from `1.10.0` to `1.12.0`
- upgrade mujoco bin from `2.0.0` to `2.1.0`
- add new task pipeline demo (DDPG/TD3/D4PG/C51/QRDQN/IQN/SQIL/TREX/PDQN) (#374, #380, #384, #407)
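A minimal sketch of the `env_id` bump required by gym `0.25.0`; the `EasyDict` snippet mirrors the dizoo config style, with surrounding fields omitted:

```python
from easydict import EasyDict

pendulum_env_cfg = EasyDict(dict(
    # env_id='Pendulum-v0',  # valid up to gym 0.20.0
    env_id='Pendulum-v1',    # required from gym 0.25.0 on
))
```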
Env (dizoo)
- add gym anytrading env (#424)
- add board games env (tictactoe, gomoku, chess) (#356)
- add sokoban env (#397) (#429)
- add BC and DQN demo for gfootball (#418) (#423)
- add discrete pendulum env (#395)
Enhancement
- add final result saving in training pipeline
Fix
- random policy randomness bug
- action_space seed compatibility bug
- discard message sent by self in redis mq (#354)
- remove pace controller (#400)
- import error in serial_pipeline_trex (#410)
- unittest hang and fail bug (#413)
- DREX collect data bug
- remove unused import cv2
- ding CLI env/policy option bug
Style
- add buffer api description (#371)
- polish VAE comments (#404)
- unittest for FQF (#412)
- add metaworld dockerfile (#432)
- remove opencv requirement in default setting
- update long description in setup.py
New Repo
- InterFuser: Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
- awesome-decision-transformer: A curated list of Decision Transformer resources
- awesome-exploration-RL: A curated list of awesome exploration RL resources
Contributors: @PaParaZz1 @zjowowen @sailxjx @puyuan1996 @ZHZisZZ @lixl-st @Cloud-Pku @Weiyuhong-1998 @karroyan @kxzxvbk @song2181 @nighood @zhangpaipai @Hcnaeg
v0.4.0
API Change
- refactor DI-engine doc and update doc links (English doc | Chinese doc)
- refactor default logging lib and add DI-toolkit (ditk) requirement (just run `pip install DI-toolkit`) (see the sketch below)
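A minimal sketch of the new default logging path, assuming `ditk.logging` mirrors the stdlib `logging` interface (the pattern used across DI-engine entry files):

```python
from ditk import logging  # DI-toolkit's drop-in replacement for stdlib logging

logging.getLogger().setLevel(logging.INFO)
logging.info('pipeline started')  # rendered by the refactored default handler
```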
Env (dizoo)
- add MAPPO/MASAC all configs in SMAC (#310) (SOTA results in SMAC!!!)
- add dmc2gym env (#344) (#360)
- remove DI-star requirements of dizoo/smac, use official pysc2 (#302)
- add latest GAIL mujoco config (#298)
- polish procgen env (#311)
- add MBPO ant and humanoid config for mbpo (#314)
- fix slime volley env obs space bug when agent_vs_agent
- fix smac env obs space bug
- fix import path error in lunarlander (#362)
Algorithm
- add Decision Transformer algorithm (#327) (#364)
- add on-policy PPG algorithm (#312)
- add DDPPO & add model-based SAC with lambda-return algorithm (#332)
- add infoNCE loss and ST-DIM algorithm (#326)
- add FQF distributional RL algorithm (#274)
- add continuous BC algorithm (#318)
- add pure policy gradient PPO algorithm (#382)
- add SQIL + SAC algorithm (#348)
- polish NGU and related modules (#283) (#343) (#353)
- add marl distributional td loss (#331)
Enhancement
- add new worker middleware (#236) (new DRL programming model and pipeline example)
- refactor model-based RL pipeline (ding/world_model) (#332)
- refactor logging system in the whole DI-engine (#316)
- add env supervisor design (#330)
- support async reset for envpool env manager (#250)
- add log videos to tensorboard (#320)
- refactor impala cnn encoder interface (#378)
Fix
- env save replay bug
- transformer mask inplace operation bug
- transition_with_policy_data bug in SAC and PPG
Style
- add dockerfile for ding:hpc image (#337)
- upgrade to mpire 2.3.5, which handles default processes more elegantly (#306)
- use FORMAT_DIR instead of ./ding (#309)
- update quickstart colab link (#347)
- polish comments in ding/model/common (#315)
- update mujoco docker download path (#386)
- fix protobuf new version compatibility bug
- fix torch1.8.0 torch.div compatibility bug
- update doc links in readme
- add outline in readme and update wechat image
- update head image and refactor docker dir
Contributors: @PaParaZz1 @sailxjx @puyuan1996 @ZHZisZZ @Will-Nie @zjowowen @HansBug @zerlinwang @Weiyuhong-1998 @davide97l @hiha3456 @LuciusMos @kxzxvbk @lixl-st @zhangpaipai @song2181 @karroyan
v0.3.1
API Change
- Substitute `gym.wrappers.RecordVideo` for `gym.wrappers.Monitor` to save video replay (see the sketch after this list)
- Substitute `policy/bc.py` for `policy/il.py` and update the relevant serial_pipeline and unittest
- Polish all the configurations in dizoo with our new config guideline
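A minimal sketch of the wrapper swap, assuming a gym version where `RecordVideo` has replaced the deprecated `Monitor`:

```python
import gym
from gym.wrappers import RecordVideo

env = gym.make('CartPole-v0')
# Old: env = gym.wrappers.Monitor(env, './video')
env = RecordVideo(env, video_folder='./video')  # new replay-saving path
```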
Env (dizoo)
- polish and standardize dizoo config (#252) (#255) (#249) (#246) (#262) (#261) (#266) (#273) (#263) (#280) (#259) (#286) (#277) (#290) (#289) (#299)
- add GRF academic env and config (#281)
- update env inferface of GRF (#258)
- update D4RL offline RL env and config (#285)
- polish PomdpAtariEnv (#254)
Algorithm
- DREX Inverse RL algorithm (#218)
Feature
- separate mq and parallel modules, add redis (#247)
- rename env variables; fix attach_to parameter (#244)
- env implementation check (#275)
- adjust and set the max column number of tabulate in log (#296)
- speed up GTrXL forward method + GRU unittest (#253) (#292)
- add drop_extra option for sample collect
Fix
- add act_scale in DingEnvWrapper; fix envpool env manager (#245)
- auto_reset=False and env_ref bug in env manager (#248)
- data type and deepcopy bug in RND (#288)
- share_memory bug and multi_mujoco env (#279)
- some bugs in GTrXL (#276)
- update gym_vector_env_manager and add more unittest (#241)
- mdpolicy random collect bug (#293)
- gym.wrapper save video replay bug
- collect abnormal step format bug and add unittest
Test
- add buffer benchmark & socket test (#284)
Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @davide97l @hiha3456 @zjowowen @Weiyuhong-1998 @LuciusMos @kxzxvbk @lixl-st @YinminZhang @song2181 @Hcnaeg @norman26625 @jayyoung0802 @RobinC94 @HansBug
v0.3.0
API Change
- add new `BaseEnv` definition (see the sketch after this list; Env English doc | Env Chinese doc):
  - remove `info` method
  - add `random_action` method
  - add `observation_space`, `action_space`, `reward_space` properties
- modify the return value of the `eval` method in the `InteractionSerialEvaluator` class from `Tuple[bool, float]` to `Tuple[bool, dict]`
- move the default logger to rich logger; you can set an env variable like `export ENABLE_RICH_LOGGING=False` to disable it
- add `train_iter` and `env_step` arguments in ding CLI
  - you can use them like `ding -m serial -c pendulum_sac_config.py -s 0 --train-iter 1e3`
- remove default `n_sample`/`n_episode` value in policy default config
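A minimal sketch of the new `BaseEnv` surface; `ToyEnv`, its spaces, and its one-step dynamics are placeholders for illustration, not a real dizoo environment:

```python
import gym
import numpy as np
from ding.envs import BaseEnv, BaseEnvTimestep


class ToyEnv(BaseEnv):

    def __init__(self, cfg=None):
        self._observation_space = gym.spaces.Box(-1., 1., shape=(4, ), dtype=np.float32)
        self._action_space = gym.spaces.Discrete(2)
        self._reward_space = gym.spaces.Box(-1., 1., shape=(1, ), dtype=np.float32)

    def reset(self) -> np.ndarray:
        return np.zeros(4, dtype=np.float32)

    def step(self, action) -> BaseEnvTimestep:
        obs = np.zeros(4, dtype=np.float32)
        return BaseEnvTimestep(obs, np.array([0.], dtype=np.float32), True, {})

    def random_action(self):
        # added in v0.3.0 (while the old `info` method is removed)
        return self._action_space.sample()

    def seed(self, seed: int, dynamic_seed: bool = True) -> None:
        self._seed = seed

    def close(self) -> None:
        pass

    @property
    def observation_space(self) -> gym.spaces.Space:
        return self._observation_space

    @property
    def action_space(self) -> gym.spaces.Space:
        return self._action_space

    @property
    def reward_space(self) -> gym.spaces.Space:
        return self._reward_space
```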
Algorithm
- Gated TransformXL (GTrXL) algorithm (#136)
- TD3 + VAE(HyAR) latent action algorithm (#152)
- stochastic dueling network (#234)
- use log prob instead of using prob in ACER (#186)
Feature
- support envpool env manager (#228)
- add league main and other improvements in new framework (#177) (#214)
- add pace controller middleware in new framework (#198)
- add auto recover option in new framework (#242)
- add k8s parser in new framework (#243)
- support async event handler and logger (#213)
- add grad norm calculator (#205)
- add gym vector env manager (#147)
- add train_iter and env_step in serial pipeline (#212)
- add rich logger handler (#219) (#223) (#232)
- add naive lr_scheduler demo
Refactor
- new BaseEnv and DingEnvWrapper (#171) (#231) (#240) (Env English doc | Env Chinese doc)
Polish
Improve configurations in dizoo and add more algorithm benchmarks (English doc example | Chinese doc example):
- MAPPO and MASAC smac config (#209) (#239)
- QMIX smac config (#175)
- R2D2 atari config (#181)
- A2C atari config (#189)
- GAIL box2d and mujoco config (#188)
- ACER atari config (#180)
- SQIL atari config (#230)
- TREX atari/mujoco config
- IMPALA atari config
- MBPO/D4PG mujoco config
Fix
- random_collect compatible to episode collector (#190)
- remove default n_sample/n_episode value in policy config (#185)
- PDQN model bug on gpu device (#220)
- TREX algorithm CLI bug (#182)
- DQfD JE computation bug and move to AdamW optimizer (#191)
- pytest problem for parallel middleware (#211)
- mujoco numpy compatibility bug
- markupsafe 2.1.0 bug
- framework parallel module network emit bug
- mpire bug and disable algotest in py3.8
- lunarlander env import and env_id bug
- icm unittest repeat name bug
- buffer throughput close bug
Test
- resnet unittest (#199)
- SAC/SQN unittest (#207)
- CQL/R2D3/GAIL unittest (#201)
- NGU td unittest (#210)
- model wrapper unittest (#215)
- MAQAC model unittest (#226)
Style
- add doc docker (#221) (LaTeX support)
Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @Weiyuhong-1998 @davide97l @zjowowen @LuciusMos @kxzxvbk @Hcnaeg @jayyoung0802 @simonat2011 @jiaruonan
v0.2.3
API Change
- move `actor_head_type` to `action_space` field (related to DDPG/TD3/SAC)
- add multiple seeds in CLI: `ding -m serial -c cartpole_dqn_config.py -s 0 -s 1 -s 2`
- add new replay buffer (which separates algorithm and storage); users can refer to the buffer doc (see the sketch after this list)
- add new main pipeline for the async/parallel framework tutorial
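A minimal sketch of the storage-separated buffer, assuming the `DequeBuffer` implementation this release introduces; the import path has moved between versions, so treat it as illustrative:

```python
from ding.data.buffer import DequeBuffer  # path may differ by DI-engine version

buffer = DequeBuffer(size=1000)             # storage backend: a bounded deque
for i in range(10):
    buffer.push({'obs': i, 'reward': 0.0})  # algorithm-agnostic payload
samples = buffer.sample(4, replace=False)   # returns a list of BufferedData entries
```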
Env (dizoo)
- add multi-agent mujoco env (#146)
- add delay reward mujoco env (#145)
- fix port conflict in gym_soccer (#139)
Algorithm
- MASAC algorithm (#112)
- TREX IRL algorithm (#119) (#144)
- H-PPO hybrid action space algorithm (#140)
- residual link in R2D2 (#150)
- gumbel softmax (#169)
- move actor_head_type to action_space field
Feature
- new main pipeline and async/parallel framework (#142) (#166) (#168)
- refactor buffer, separate algorithm and storage (#129)
- CLI in new pipeline (ditask) (#160)
- add multiprocess tblogger, fix circular reference problem (#156)
- add multiple seed cli
- polish eps_greedy_multinomial_sample in model_wrapper (#154)
Fix
- R2D3 abs priority problem (#158) (#161)
- multi-discrete action space policies random action bug (#167)
- doc generate bug with enum_tools (#155)
Style
- more comments about R2D2 (#149)
- add doc about how to migrate a new env link
- add doc about env tutorial in dizoo link
- add conda auto release (#148)
- update zh doc link
- update kaggle tutorial link
New Repo
- awesome-model-based-RL: A curated list of awesome Model-Based RL resources
- DI-smartcross: Decision AI in Traffic Light Control
Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @Weiyuhong-1998 @LikeJulia @RobinC94 @LuciusMos @mingzhang96 @shgqmrf15 @zjowowen
v0.2.2
Algorithm
- Guided Cost IRL algorithm (#57)
- ICM exploration algorithm (#41)
- MP-DQN hybrid action space algorithm (#131)
- add loss statistics and polish r2d3 pong config (#126)
Fix
- async subprocess env manager reset bug (#137)
- keepdims name bug in model wrapper
- on-policy ppo value norm bug
- GAE and RND unittest bug
- hidden state wrapper h tensor compatibility
- naive buffer auto config create bug
Style
- add supporters list
Contributors: @PaParaZz1 @puyuan1996 @RobinC94 @LikeJulia @Will-Nie @Weiyuhong-1998 @timothijoe @davide97l @lichuminglcm @YinminZhang
v0.2.1
API Change
- remove torch in all envs (numpy array is the basic data format in env; see the sketch after this list)
- remove `on_policy` field in all the configs
- change `eval_freq` from 50 to 1000
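A minimal sketch of the numpy-only env boundary, assuming `DingEnvWrapper` can wrap a raw gym env directly:

```python
import gym
import numpy as np
import torch
from ding.envs import DingEnvWrapper

env = DingEnvWrapper(gym.make('CartPole-v0'))
obs = env.reset()
assert isinstance(obs, np.ndarray)    # numpy comes out of the env ...
action = torch.tensor(0)              # ... while the policy side still uses torch,
timestep = env.step(action.numpy())   # so convert before stepping
assert isinstance(timestep.obs, np.ndarray)
```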
Env (dizoo)
- gym-hybrid env (#86)
- gym-soccer (HFO) env (#94)
- Go-Bigger env baseline (#95)
- SAC and PPO config for bipedalwalker env (#121)
Algorithm
- DQfD Imitation Learning algorithm (#48) (#98)
- TD3BC offline RL algorithm (#88)
- MBPO model-based RL algorithm (#113)
- PADDPG hybrid action space algorithm (#109)
- PDQN hybrid action space algorithm (#118)
- fix R2D2 bugs and produce benchmark, add naive NGU (#40)
- self-play training demo in slime_volley env (#23)
- add example of GAIL entry + config for mujoco (#114)
Enhancement
- enable arbitrary policy num in serial sample collector
- add torch DataParallel for single machine multi-GPU
- add registry force_overwrite argument
- add naive buffer periodic thruput seconds argument
Fix
- target model wrapper hard reset bug
- fix learn state_dict target model bug
- ppo bugs and update atari ppo offpolicy config (#108)
- pyyaml version bug (#99)
- small fix on bsuite environment (#117)
- discrete cql unittest bug
- release workflow bug
- base policy model state_dict overlap bug
- remove on_policy option in dizoo config and entry
- remove torch in env
Test
- add pure docker setting test (#103)
- add unittest for dataset and evaluator (#107)
- add unittest for on-policy algorithm (#92)
- add unittest for ppo and td (MARL case) (#89)
Style
- gym version == 0.20.0
- torch version >= 1.1.0, <= 1.10.0
- ale-py == 0.7.0
New Repo
- Go-Bigger: OpenDILab Multi-Agent Decision Intelligence Environment
- GoBigger-Challenge-2021: Basic code and description for GoBigger challenge 2021
Contributors: @PaParaZz1 @puyuan1996 @Will-Nie @YinminZhang @Weiyuhong-1998 @LikeJulia @sailxjx @davide97l @jayyoung0802 @lichuminglcm @yifan123 @RobinC94 @zjowowen
v0.2.0
API Change
- `SampleCollector` renamed to `SampleSerialCollector` (a migration sketch follows this list)
- `EpisodeCollector` renamed to `EpisodeSerialCollector`
- `BaseSerialEvaluator` renamed to `InteractionSerialEvaluator`
- `ZerglingCollector` renamed to `ZerglingParallelCollector`
- `OneVsOneCollector` renamed to `MarineParallelCollector`
- `AdvancedBuffer` registry name changed from `priority` to `advanced`
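A minimal migration sketch for these renames, assuming the call signatures themselves are unchanged (only the class names and the registry key moved):

```python
# Old (v0.1.x):
# from ding.worker import SampleCollector, BaseSerialEvaluator
# New (v0.2.0):
from ding.worker import SampleSerialCollector, InteractionSerialEvaluator
```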
Env (dizoo)
- overcooked env (#20)
- procgen env (#26)
- modified predator env (#30)
- d4rl env (#37)
- imagenet dataset (#27)
- bsuite env (#58)
- move atari_py to ale-py
Algorithm
- SQIL algorithm (#25) (#44)
- CQL algorithm (discrete/continuous) (#37) (#68)
- MAPPO algorithm (#62)
- WQMIX algorithm (#24)
- D4PG algorithm (#76)
- update multi-discrete policy (dqn, ppo, rainbow) (#51) (#72)
Enhancement
- image classification supervised training pipeline (#27)
- add force_reproducibility option in subprocess env manager
- add/delete/restart replicas via cli for k8s
- add league metric (trueskill and elo) (#22)
- add tb in naive buffer and modify tb in advanced buffer (#39)
- add k8s launcher and di-orchestrator launcher, add related unittest (#45) (#49)
- add hyper-parameter scheduler module (#38)
- add plot function (#59)
Fix
- acer weight bug and update atari result (#21)
- mappo nan bug and dict obs cannot unsqueeze bug (#54)
- r2d2 hidden state and obs pre-processing bug (#36) (#52)
- PPO bug when using dual_clip and adv > 0
- qmix double_q hidden state bug
- spawn context problem in interaction unittest (#69)
- formatted config no eval bug (#53)
- catch statements that would never succeed, and system proxy bug (#71) (#79)
- lunarlander config polish
- c51 head dimension mismatch bug
- mujoco config typo bug
- ppg atari config multi buffer bug
- max use and priority update special branch bug in advanced_buffer
Style
- add docker deploy in github workflow (#70) (#78) (#80)
- support PyTorch 1.9.0
- add algo/env list in README
- rename advanced_buffer register name to advanced
New Repo
- DI-treetensor: Tree Nested PyTorch Tensor Lib
Contributors: @PaParaZz1 @YinminZhang @Will-Nie @puyuan1996 @Weiyuhong-1998 @HansBug @sailxjx @simonat2011 @konnase @RobinC94 @LikeJulia @LuciusMos @jayyoung0802 @yifan123 @davide97l @garyzhang99