Releases: opendilab/DI-engine
v0.4.3
Env
- add rule-based gomoku expert (#465)
Algorithm
- fix a2c policy batch size bug (#481)
- enable activation option in CollaQ attention and mixer
- minor fix about IBC (#477)
Enhancement
- add IGM support (#486)
- add tb logger middleware and demo
Fix
- the type conversion in ding_env_wrapper (#483)
- di-orchestrator version bug in unittest (#479)
- data collection errors caused by shallow copies (#475)
- gym==0.26.0 seed args bug
Style
- add README tutorial links (environment & algorithm) (#490) (#493)
- adjust location of the default_model method in policy (#453)
New Repo
- DI-sheep: Deep Reinforcement Learning + 3 Tiles Game
Contributors: @PaParaZz1 @nighood @norman26625 @ZHZisZZ @cpwan @mahuangxu
v0.4.2
API Change
- `config` will be deep-copied by default in the `compile_config` function
- After calling the `compile_config` function, the current code repo's `git log` and `git diff` information will be saved in the `exp_name` directory (see the sketch below)
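A minimal usage sketch of the new `compile_config` behavior, assuming dizoo's standard CartPole DQN config module; the surrounding entry code is illustrative:

```python
from ding.config import compile_config
# Illustrative config module from dizoo; any main/create config pair works.
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import (
    cartpole_dqn_config, cartpole_dqn_create_config
)

cfg = compile_config(cartpole_dqn_config, create_cfg=cartpole_dqn_create_config, auto=True)
# The input config is deep-copied before compilation, so cartpole_dqn_config
# itself stays untouched, and the current repo's `git log` / `git diff`
# snapshot is saved under the cfg.exp_name directory for reproducibility.
```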
Env
- add rocket env (#449)
- updated pettingzoo env and improved related performance (#457)
- add mario env demo (#443)
- add MAPPO multi-agent config (#464)
- add mountain car (discrete action) environment (#452)
- fix multi-agent mujoco gym compatibility bug
- fix gfootball env save_replay variable init bug
Algorithm
- add IBC (Implicit Behaviour Cloning) algorithm (#401)
- add BCO (Behaviour Cloning from Observation) algorithm (#270)
- add continuous PPOPG algorithm (#414)
- add PER in CollaQ (#472)
- add activation option in QMIX and CollaQ
Enhancement
- update ctx to dataclass (#467)
Fix
- base_env FinalMeta bug about gym 0.25.0-0.25.1
- config inplace modification bug
- ding CLI no-argument problem
- import errors after running setup.py (jinja2, markupsafe)
- conda py3.6 and cross platform build bug
Style
- add project state and datetime in log dir (#455)
- polish notes for q-learning model (#427)
- revision to mujoco dockerfile and validation (#474)
- add dockerfile for cityflow env
- polish default output log format
Contributors: @PaParaZz1 @ZHZisZZ @zjowowen @song2181 @zerlinwang @i-am-tc @hiha3456 @nighood @kxzxvbk @Weiyuhong-1998 @RobinC94
v0.4.1
API Change
- upgrade Python version from `3.6-3.8` to `3.7-3.9`
- upgrade gym version from `0.20.0` to `0.25.0`; plenty of `env_id` values need updating (e.g., `Pendulum-v0` to `Pendulum-v1`) (#434) (see the sketch after this list)
- upgrade torch version from `1.10.0` to `1.12.0`
- upgrade mujoco bin from `2.0.0` to `2.1.0`
- add new task pipeline demo (DDPG/TD3/D4PG/C51/QRDQN/IQN/SQIL/TREX/PDQN) (#374, #380, #384, #407)
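A minimal sketch of the `env_id` bump required by gym `0.25.0`; the `EasyDict` snippet mirrors the dizoo config style, with surrounding fields omitted:

```python
from easydict import EasyDict

pendulum_env_cfg = EasyDict(dict(
    # env_id='Pendulum-v0',  # valid up to gym 0.20.0
    env_id='Pendulum-v1',    # required from gym 0.25.0 on
))
```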
Env (dizoo)
- add gym anytrading env (#424)
- add board games env (tictactoe, gomoku, chess) (#356)
- add sokoban env (#397) (#429)
- add BC and DQN demo for gfootball (#418) (#423)
- add discrete pendulum env (#395)
Enhancement
- add final result saving in training pipeline
Fix
- random policy randomness bug
- action_space seed compatibility bug
- discard message sent by self in redis mq (#354)
- remove pace controller (#400)
- import error in serial_pipeline_trex (#410)
- unittest hang and fail bug (#413)
- DREX collect data bug
- remove unused import cv2
- ding CLI env/policy option bug
Style
- add buffer api description (#371)
- polish VAE comments (#404)
- unittest for FQF (#412)
- add metaworld dockerfile (#432)
- remove opencv requirement in default setting
- update long description in setup.py
New Repo
- InterFuser: Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
- awesome-decision-transformer: A curated list of Decision Transformer resources
- awesome-exploration-RL: A curated list of awesome exploration RL resources
Contributors: @PaParaZz1 @zjowowen @sailxjx @puyuan1996 @ZHZisZZ @lixl-st @Cloud-Pku @Weiyuhong-1998 @karroyan @kxzxvbk @song2181 @nighood @zhangpaipai @Hcnaeg
v0.4.0
API Change
- refactor DI-engine doc and update doc links (English doc | Chinese doc)
- refactor default logging lib and add DI-toolkit (ditk) requirement (just run `pip install DI-toolkit`) (see the sketch below)
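A minimal sketch of the new default logging path, assuming `ditk.logging` mirrors the stdlib `logging` interface (the pattern used across DI-engine entry files):

```python
from ditk import logging  # DI-toolkit's drop-in replacement for stdlib logging

logging.getLogger().setLevel(logging.INFO)
logging.info('pipeline started')  # rendered by the refactored default handler
```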
Env (dizoo)
- add MAPPO/MASAC all configs in SMAC (#310) (SOTA results in SMAC!!!)
- add dmc2gym env (#344) (#360)
- remove DI-star requirements of dizoo/smac, use official pysc2 (#302)
- add latest GAIL mujoco config (#298)
- polish procgen env (#311)
- add MBPO ant and humanoid config for mbpo (#314)
- fix slime volley env obs space bug when agent_vs_agent
- fix smac env obs space bug
- fix import path error in lunarlander (#362)
Algorithm
- add Decision Transformer algorithm (#327) (#364)
- add on-policy PPG algorithm (#312)
- add DDPPO & add model-based SAC with lambda-return algorithm (#332)
- add infoNCE loss and ST-DIM algorithm (#326)
- add FQF distributional RL algorithm (#274)
- add continuous BC algorithm (#318)
- add pure policy gradient PPO algorithm (#382)
- add SQIL + SAC algorithm (#348)
- polish NGU and related modules (#283) (#343) (#353)
- add marl distributional td loss (#331)
Enhancement
- add new worker middleware (#236) (new DRL programming model and pipeline example)
- refactor model-based RL pipeline (ding/world_model) (#332)
- refactor logging system in the whole DI-engine (#316)
- add env supervisor design (#330)
- support async reset for envpool env manager (#250)
- add log videos to tensorboard (#320)
- refactor impala cnn encoder interface (#378)
Fix
- env save replay bug
- transformer mask inplace operation bug
- transition_with_policy_data bug in SAC and PPG
Style
- add dockerfile for ding:hpc image (#337)
- upgrade to mpire 2.3.5, which handles default processes more elegantly (#306)
- use FORMAT_DIR instead of ./ding (#309)
- update quickstart colab link (#347)
- polish comments in ding/model/common (#315)
- update mujoco docker download path (#386)
- fix protobuf new version compatibility bug
- fix torch1.8.0 torch.div compatibility bug
- update doc links in readme
- add outline in readme and update wechat image
- update head image and refactor docker dir
Contributors: @PaParaZz1 @sailxjx @puyuan1996 @ZHZisZZ @Will-Nie @zjowowen @HansBug @zerlinwang @Weiyuhong-1998 @davide97l @hiha3456 @LuciusMos @kxzxvbk @lixl-st @zhangpaipai @song2181 @karroyan
v0.3.1
API Change
- Substitute `gym.wrappers.RecordVideo` for `gym.wrappers.Monitor` to save video replay (see the sketch after this list)
- Substitute `policy/bc.py` for `policy/il.py` and update the relevant serial_pipeline and unittest
- Polish all the configurations in dizoo with our new config guideline
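A minimal sketch of the wrapper swap, assuming a gym version where `RecordVideo` has replaced the deprecated `Monitor`:

```python
import gym
from gym.wrappers import RecordVideo

env = gym.make('CartPole-v0')
# Old: env = gym.wrappers.Monitor(env, './video')
env = RecordVideo(env, video_folder='./video')  # new replay-saving path
```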
Env (dizoo)
- polish and standardize dizoo config (#252) (#255) (#249) (#246) (#262) (#261) (#266) (#273) (#263) (#280) (#259) (#286) (#277) (#290) (#289) (#299)
- add GRF academic env and config (#281)
- update env inferface of GRF (#258)
- update D4RL offline RL env and config (#285)
- polish PomdpAtariEnv (#254)
Algorithm
- DREX Inverse RL algorithm (#218)
Feature
- separate mq and parallel modules, add redis (#247)
- rename env variables; fix attach_to parameter (#244)
- env implementation check (#275)
- adjust and set the max column number of tabulate in log (#296)
- speed up GTrXL forward method + GRU unittest (#253) (#292)
- add drop_extra option for sample collect
Fix
- add act_scale in DingEnvWrapper; fix envpool env manager (#245)
- auto_reset=False and env_ref bug in env manager (#248)
- data type and deepcopy bug in RND (#288)
- share_memory bug and multi_mujoco env (#279)
- some bugs in GTrXL (#276)
- update gym_vector_env_manager and add more unittest (#241)
- mdpolicy random collect bug (#293)
- gym.wrapper save video replay bug
- collect abnormal step format bug and add unittest
Test
- add buffer benchmark & socket test (#284)
Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @davide97l @hiha3456 @zjowowen @Weiyuhong-1998 @LuciusMos @kxzxvbk @lixl-st @YinminZhang @song2181 @Hcnaeg @norman26625 @jayyoung0802 @RobinC94 @HansBug
v0.3.0
API Change
- add new `BaseEnv` definition (see the sketch after this list; Env English doc | Env Chinese doc):
  - remove `info` method
  - add `random_action` method
  - add `observation_space`, `action_space`, `reward_space` properties
- modify the return value of the `eval` method in the `InteractionSerialEvaluator` class from `Tuple[bool, float]` to `Tuple[bool, dict]`
- move the default logger to rich logger; you can set an env variable like `export ENABLE_RICH_LOGGING=False` to disable it
- add `train_iter` and `env_step` arguments in ding CLI
  - you can use them like `ding -m serial -c pendulum_sac_config.py -s 0 --train-iter 1e3`
- remove default `n_sample`/`n_episode` value in policy default config
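A minimal sketch of the new `BaseEnv` surface; `ToyEnv`, its spaces, and its one-step dynamics are placeholders for illustration, not a real dizoo environment:

```python
import gym
import numpy as np
from ding.envs import BaseEnv, BaseEnvTimestep


class ToyEnv(BaseEnv):

    def __init__(self, cfg=None):
        self._observation_space = gym.spaces.Box(-1., 1., shape=(4, ), dtype=np.float32)
        self._action_space = gym.spaces.Discrete(2)
        self._reward_space = gym.spaces.Box(-1., 1., shape=(1, ), dtype=np.float32)

    def reset(self) -> np.ndarray:
        return np.zeros(4, dtype=np.float32)

    def step(self, action) -> BaseEnvTimestep:
        obs = np.zeros(4, dtype=np.float32)
        return BaseEnvTimestep(obs, np.array([0.], dtype=np.float32), True, {})

    def random_action(self):
        # added in v0.3.0 (while the old `info` method is removed)
        return self._action_space.sample()

    def seed(self, seed: int, dynamic_seed: bool = True) -> None:
        self._seed = seed

    def close(self) -> None:
        pass

    @property
    def observation_space(self) -> gym.spaces.Space:
        return self._observation_space

    @property
    def action_space(self) -> gym.spaces.Space:
        return self._action_space

    @property
    def reward_space(self) -> gym.spaces.Space:
        return self._reward_space
```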
Algorithm
- Gated TransformXL (GTrXL) algorithm (#136)
- TD3 + VAE(HyAR) latent action algorithm (#152)
- stochastic dueling network (#234)
- use log prob instead of using prob in ACER (#186)
Feature
- support envpool env manager (#228)
- add league main and other improvements in new framework (#177) (#214)
- add pace controller middleware in new framework (#198)
- add auto recover option in new framework (#242)
- add k8s parser in new framework (#243)
- support async event handler and logger (#213)
- add grad norm calculator (#205)
- add gym vector env manager (#147)
- add train_iter and env_step in serial pipeline (#212)
- add rich logger handler (#219) (#223) (#232)
- add naive lr_scheduler demo
Refactor
- new BaseEnv and DingEnvWrapper (#171) (#231) (#240) (Env English doc | Env Chinese doc)
Polish
Improve configurations in dizoo and add more algorithm benchmarks (English doc example | Chinese doc example):
- MAPPO and MASAC smac config (#209) (#239)
- QMIX smac config (#175)
- R2D2 atari config (#181)
- A2C atari config (#189)
- GAIL box2d and mujoco config (#188)
- ACER atari config (#180)
- SQIL atari config (#230)
- TREX atari/mujoco config
- IMPALA atari config
- MBPO/D4PG mujoco config
Fix
- random_collect compatible to episode collector (#190)
- remove default n_sample/n_episode value in policy config (#185)
- PDQN model bug on gpu device (#220)
- TREX algorithm CLI bug (#182)
- DQfD JE computation bug and move to AdamW optimizer (#191)
- pytest problem for parallel middleware (#211)
- mujoco numpy compatibility bug
- markupsafe 2.1.0 bug
- framework parallel module network emit bug
- mpire bug and disable algotest in py3.8
- lunarlander env import and env_id bug
- icm unittest repeat name bug
- buffer throughput close bug
Test
- resnet unittest (#199)
- SAC/SQN unittest (#207)
- CQL/R2D3/GAIL unittest (#201)
- NGU td unittest (#210)
- model wrapper unittest (#215)
- MAQAC model unittest (#226)
Style
- add doc docker (#221) (LaTeX support)
Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @Weiyuhong-1998 @davide97l @zjowowen @LuciusMos @kxzxvbk @Hcnaeg @jayyoung0802 @simonat2011 @jiaruonan
v0.2.3
API Change
- move `actor_head_type` to `action_space` field (related to DDPG/TD3/SAC)
- add multiple seeds in CLI: `ding -m serial -c cartpole_dqn_config.py -s 0 -s 1 -s 2`
- add new replay buffer (which separates algorithm and storage); users can refer to the buffer doc (see the sketch after this list)
- add new main pipeline for the async/parallel framework tutorial
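A minimal sketch of the storage-separated buffer, assuming the `DequeBuffer` implementation this release introduces; the import path has moved between versions, so treat it as illustrative:

```python
from ding.data.buffer import DequeBuffer  # path may differ by DI-engine version

buffer = DequeBuffer(size=1000)             # storage backend: a bounded deque
for i in range(10):
    buffer.push({'obs': i, 'reward': 0.0})  # algorithm-agnostic payload
samples = buffer.sample(4, replace=False)   # returns a list of BufferedData entries
```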
Env (dizoo)
- add multi-agent mujoco env (#146)
- add delay reward mujoco env (#145)
- fix port conflict in gym_soccer (#139)
Algorithm
- MASAC algorithm (#112)
- TREX IRL algorithm (#119) (#144)
- H-PPO hybrid action space algorithm (#140)
- residual link in R2D2 (#150)
- gumbel softmax (#169)
- move actor_head_type to action_space field
Feature
- new main pipeline and async/parallel framework (#142) (#166) (#168)
- refactor buffer, separate algorithm and storage (#129)
- CLI in new pipeline (ditask) (#160)
- add multiprocess tblogger, fix circular reference problem (#156)
- add multiple seed cli
- polish eps_greedy_multinomial_sample in model_wrapper (#154)
Fix
- R2D3 abs priority problem (#158) (#161)
- multi-discrete action space policies random action bug (#167)
- doc generate bug with enum_tools (#155)
Style
- more comments about R2D2 (#149)
- add doc about how to migrate a new env link
- add doc about env tutorial in dizoo link
- add conda auto release (#148)
- update zh doc link
- update kaggle tutorial link
New Repo
- awesome-model-based-RL: A curated list of awesome Model-Based RL resources
- DI-smartcross: Decision AI in Traffic Light Control
Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @Weiyuhong-1998 @LikeJulia @RobinC94 @LuciusMos @mingzhang96 @shgqmrf15 @zjowowen
v0.2.2
Algorithm
- Guided Cost IRL algorithm (#57)
- ICM exploration algorithm (#41)
- MP-DQN hybrid action space algorithm (#131)
- add loss statistics and polish r2d3 pong config (#126)
Fix
- async subprocess env manager reset bug (#137)
- keepdims name bug in model wrapper
- on-policy ppo value norm bug
- GAE and RND unittest bug
- hidden state wrapper h tensor compatibility
- naive buffer auto config create bug
Style
- add supporters list
Contributors: @PaParaZz1 @puyuan1996 @RobinC94 @LikeJulia @Will-Nie @Weiyuhong-1998 @timothijoe @davide97l @lichuminglcm @YinminZhang
v0.2.1
API Change
- remove torch in all envs (numpy array is the basic data format in env; see the sketch after this list)
- remove `on_policy` field in all the configs
- change `eval_freq` from 50 to 1000
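A minimal sketch of the numpy-only env boundary, assuming `DingEnvWrapper` can wrap a raw gym env directly:

```python
import gym
import numpy as np
import torch
from ding.envs import DingEnvWrapper

env = DingEnvWrapper(gym.make('CartPole-v0'))
obs = env.reset()
assert isinstance(obs, np.ndarray)    # numpy comes out of the env ...
action = torch.tensor(0)              # ... while the policy side still uses torch,
timestep = env.step(action.numpy())   # so convert before stepping
assert isinstance(timestep.obs, np.ndarray)
```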
Env (dizoo)
- gym-hybrid env (#86)
- gym-soccer (HFO) env (#94)
- Go-Bigger env baseline (#95)
- SAC and PPO config for bipedalwalker env (#121)
Algorithm
- DQfD Imitation Learning algorithm (#48) (#98)
- TD3BC offline RL algorithm (#88)
- MBPO model-based RL algorithm (#113)
- PADDPG hybrid action space algorithm (#109)
- PDQN hybrid action space algorithm (#118)
- fix R2D2 bugs and produce benchmark, add naive NGU (#40)
- self-play training demo in slime_volley env (#23)
- add example of GAIL entry + config for mujoco (#114)
Enhancement
- enable arbitrary policy num in serial sample collector
- add torch DataParallel for single machine multi-GPU
- add registry force_overwrite argument
- add naive buffer periodic thruput seconds argument
Fix
- target model wrapper hard reset bug
- fix learn state_dict target model bug
- ppo bugs and update atari ppo offpolicy config (#108)
- pyyaml version bug (#99)
- small fix on bsuite environment (#117)
- discrete cql unittest bug
- release workflow bug
- base policy model state_dict overlap bug
- remove on_policy option in dizoo config and entry
- remove torch in env
Test
- add pure docker setting test (#103)
- add unittest for dataset and evaluator (#107)
- add unittest for on-policy algorithm (#92)
- add unittest for ppo and td (MARL case) (#89)
Style
- gym version == 0.20.0
- torch version >= 1.1.0, <= 1.10.0
- ale-py == 0.7.0
New Repo
- Go-Bigger: OpenDILab Multi-Agent Decision Intelligence Environment
- GoBigger-Challenge-2021: Basic code and description for GoBigger challenge 2021
Contributors: @PaParaZz1 @puyuan1996 @Will-Nie @YinminZhang @Weiyuhong-1998 @LikeJulia @sailxjx @davide97l @jayyoung0802 @lichuminglcm @yifan123 @RobinC94 @zjowowen
v0.2.0
API Change
- `SampleCollector` renamed to `SampleSerialCollector` (a migration sketch follows this list)
- `EpisodeCollector` renamed to `EpisodeSerialCollector`
- `BaseSerialEvaluator` renamed to `InteractionSerialEvaluator`
- `ZerglingCollector` renamed to `ZerglingParallelCollector`
- `OneVsOneCollector` renamed to `MarineParallelCollector`
- `AdvancedBuffer` registry name changed from `priority` to `advanced`
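A minimal migration sketch for these renames, assuming the call signatures themselves are unchanged (only the class names and the registry key moved):

```python
# Old (v0.1.x):
# from ding.worker import SampleCollector, BaseSerialEvaluator
# New (v0.2.0):
from ding.worker import SampleSerialCollector, InteractionSerialEvaluator
```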
Env (dizoo)
- overcooked env (#20)
- procgen env (#26)
- modified predator env (#30)
- d4rl env (#37)
- imagenet dataset (#27)
- bsuite env (#58)
- move atari_py to ale-py
Algorithm
- SQIL algorithm (#25) (#44)
- CQL algorithm (discrete/continuous) (#37) (#68)
- MAPPO algorithm (#62)
- WQMIX algorithm (#24)
- D4PG algorithm (#76)
- update multi-discrete policy (dqn, ppo, rainbow) (#51) (#72)
Enhancement
- image classification supervised training pipeline (#27)
- add force_reproducibility option in subprocess env manager
- add/delete/restart replicas via cli for k8s
- add league metric (trueskill and elo) (#22)
- add tb in naive buffer and modify tb in advanced buffer (#39)
- add k8s launcher and di-orchestrator launcher, add related unittest (#45) (#49)
- add hyper-parameter scheduler module (#38)
- add plot function (#59)
Fix
- acer weight bug and update atari result (#21)
- mappo nan bug and dict obs cannot unsqueeze bug (#54)
- r2d2 hidden state and obs pre-processing bug (#36) (#52)
- PPO bug when using dual_clip and adv > 0
- qmix double_q hidden state bug
- spawn context problem in interaction unittest (#69)
- formatted config no eval bug (#53)
- catch statements that would never succeed, and system proxy bug (#71) (#79)
- lunarlander config polish
- c51 head dimension mismatch bug
- mujoco config typo bug
- ppg atari config multi buffer bug
- max use and priority update special branch bug in advanced_buffer
Style
- add docker deploy in github workflow (#70) (#78) (#80)
- support PyTorch 1.9.0
- add algo/env list in README
- rename advanced_buffer register name to advanced
New Repo
- DI-treetensor: Tree Nested PyTorch Tensor Lib
Contributors: @PaParaZz1 @YinminZhang @Will-Nie @puyuan1996 @Weiyuhong-1998 @HansBug @sailxjx @simonat2011 @konnase @RobinC94 @LikeJulia @LuciusMos @jayyoung0802 @yifan123 @davide97l @garyzhang99