```bash
docker pull registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-developer:deeprec-dev-cpu-py36-ubuntu18.04
# download or clone the DeepRec code to the current directory
docker run -it --rm -v $PWD/DeepRec:/root/DeepRec registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-developer:deeprec-dev-cpu-py36-ubuntu18.04 /bin/bash
# inside the container:
cd /root/DeepRec
mkl_opts="--config=mkl_threadpool \
          --copt=-march=skylake-avx512"
default_opts="--cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0 \
              --copt=-O3 \
              --copt=-Wformat \
              --copt=-Wformat-security \
              --copt=-fstack-protector \
              --copt=-fPIC \
              --copt=-fpic \
              --linkopt=-znoexecstack \
              --linkopt=-zrelro \
              --linkopt=-znow \
              --linkopt=-fstack-protector"
# accept all configure defaults, build, then package the wheel
yes "" | bash ./configure && bazel build ${default_opts} ${mkl_opts} tensorflow/tools/pip_package:build_pip_package \
  && mkdir -p ./wheels/tensorflow \
  && bazel-bin/tensorflow/tools/pip_package/build_pip_package --project_name tensorflow ${_}   # ${_} = ./wheels/tensorflow
```

The wheel `tensorflow-1.15.5+deeprec2110-cp36-cp36m-linux_x86_64.whl` is produced in `$PWD/DeepRec/wheels/tensorflow/`.
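To sanity-check the build, the wheel can be installed and imported in a clean Python 3.6 environment (a minimal sketch):

```bash
# Install the freshly built wheel and verify that it imports.
pip install ./wheels/tensorflow/tensorflow-1.15.5+deeprec2110-cp36-cp36m-linux_x86_64.whl
python -c "import tensorflow as tf; print(tf.__version__)"
```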
- Install Python 3.6 (note: the version must be 3.6).
- Install stock TF 1.15.5:
  ```bash
  pip install tensorflow==1.15.5
  ```
- Prepare the DeepRec whl package and install it when testing DeepRec:
  ```bash
  pip uninstall tensorflow
  pip install $(DEEPREC_WHL_PATH)
  ```
- Prepare the dataset; refer to the README in the `data` folder.
- Prepare `jemalloc.so` in both versions 2.4.4 and 2.5.1 (available in this repo) and put them into the working directory where `train.py` exists.
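  A quick layout check (the `.so` file names are assumptions inferred from the commands used later in this document):
  ```bash
  # Both jemalloc libraries should sit next to train.py (names assumed).
  ls -l ./train.py ./libjemalloc.so.2.4.4 ./libjemalloc.so.2.5.1
  ```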
- Metrics
  - global steps per second (gsteps/sec)
  - memory usage (a sketch for sampling it appears after this list)
  - accuracy and AUC
- Tested objects
  - Stock TF 1.15.5
  - DeepRec FP32 and BF16: enable all optimizations in DeepRec as described below
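For the memory-usage metric, one possible way to sample the trainer's resident memory once per second (finding the PID by command name is an assumption; adjust to your setup):

```bash
# Sample the training process's resident set size (RSS, in KiB) once per second.
PID=$(pgrep -f "python train.py" | head -n1)
while kill -0 "$PID" 2>/dev/null; do
  ps -o rss= -p "$PID"
  sleep 1
done
```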
- Clone the DeepRec repo and `cd` into `modelzoo/$(TEST_MODEL)`. Or just copy the `train.py` script into your workspace; for DIEN and DIN, `./script/` is also needed.
- Prepare the TF environment, dataset and jemalloc lib. For convenience, please use docker or a Python virtual env to create the TF environment, for example as sketched below.
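  A minimal virtual-env sketch (paths and env name are illustrative assumptions):
  ```bash
  # Create and activate an isolated Python 3.6 environment.
  python3.6 -m venv ~/deeprec-env
  source ~/deeprec-env/bin/activate
  pip install tensorflow==1.15.5   # stock TF baseline
  # or, for DeepRec:
  # pip install /path/to/tensorflow-1.15.5+deeprec2110-cp36-cp36m-linux_x86_64.whl
  ```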
- Enable DeepRec optimizations (only for DeepRec!); a consolidated snippet follows this list.
  - Enable the TensorPoolAllocator option:
    ```bash
    export START_STATISTIC_STEP=100
    export STOP_STATISTIC_STEP=110
    ```
  - Set the jemalloc (version 2.5.1) environment:
    ```bash
    export MALLOC_CONF="background_thread:true,metadata_thp:auto,dirty_decay_ms:60000,muzzy_decay_ms:60000"
    LD_PRELOAD=./libjemalloc.so.2.5.1 $(TEST_CMD)
    # for example:
    # LD_PRELOAD=./libjemalloc.so.2.5.1 python train.py --steps 3000 --bf16 --no_eval
    ```
  - Set the env for DeepRec:
    ```bash
    export TF_LAYOUT_PASS_GRAPH_CAST_FUSION=1
    ```
    `TF_LAYOUT_PASS_GRAPH_CAST_FUSION` is not yet merged into the master branch.
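  For convenience, the settings above can be collected into a single invocation (a sketch; the training command is one example from this document):
  ```bash
  # All DeepRec-only settings from this list, applied in one shot.
  export START_STATISTIC_STEP=100
  export STOP_STATISTIC_STEP=110
  export TF_LAYOUT_PASS_GRAPH_CAST_FUSION=1
  export MALLOC_CONF="background_thread:true,metadata_thp:auto,dirty_decay_ms:60000,muzzy_decay_ms:60000"
  LD_PRELOAD=./libjemalloc.so.2.5.1 python train.py --steps 3000 --no_eval
  ```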
- Run the Python script. Record gsteps data after operation stabilizes and drop abnormal data; for example, in the 3000-steps case, record the last ten values and compute their average. Record memory usage at the same step for each test, e.g. step 2500. Please limit the CPU cores to a specified value, for example as sketched below.
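  One way to pin training to a fixed set of cores (the core list `0-15` is an illustrative assumption):
  ```bash
  # Restrict the process to cores 0-15.
  taskset -c 0-15 python train.py --steps 3000 --no_eval
  # numactl can additionally bind memory to one NUMA node:
  # numactl -C 0-15 -m 0 python train.py --steps 3000 --no_eval
  ```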
- Only benchmark gsteps/sec:
  - FP32
    ```bash
    # Don't forget to enable the DeepRec features for the DeepRec test!
    python train.py --steps 3000 --no_eval
    # For the stock TF test:
    python train.py --steps 3000 --no_eval --tf
    # For DeepRec after enabling jemalloc:
    LD_PRELOAD=./libjemalloc.so.2.5.1 python train.py --steps 3000 --no_eval
    ```
  - BF16 (only for DeepRec!)
    ```bash
    # For DeepRec after enabling jemalloc:
    LD_PRELOAD=./libjemalloc.so.2.5.1 python train.py --steps 3000 --no_eval --bf16
    ```
- Benchmark ACC, AUC and gsteps/sec:
  - FP32
    ```bash
    # Don't forget to enable the DeepRec features for the DeepRec test!
    python train.py
    # For the stock TF test:
    python train.py --tf
    ```
  - BF16 (only for DeepRec!)
    ```bash
    # Don't forget to enable the DeepRec features for the DeepRec test!
    python train.py --bf16
    ```
`kubectl get nodes -o wide` shows the nodes of the K8s cluster. Place the image where each node can access it, such as Docker Hub's official registry, for example as sketched below. Please keep the container settings of Chief, Worker and PS the same.
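One way to publish the image to a registry every node can reach (the image and registry names are illustrative assumptions):

```bash
# Tag the local image and push it to a shared registry.
docker tag deeprec-benchmark:latest my-registry/deeprec-benchmark:latest
docker push my-registry/deeprec-benchmark:latest
```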
Pod template (the fields below need attention; a consolidated `test.yaml` sketch appears after the settings below):

- `replicas`: the number of copies of the pod (worker and ps).
- `affinity:podAntiAffinity`: anti-affinity setting to distribute pods onto different nodes as much as possible.
- `volumes`: the declaration of shared storage that will be mounted into the container to store checkpoints.
- `containers`: the settings of the docker containers.
  - `env`: environment variables set when the container is created.
  - `image`: where the docker image is.
  - `args`: the command executed after the container is created.
  - `volumeMounts`: the volume mounted in the container.
    - `mountPath`: the path of the volume.
    - `name`: the name of the mounted volume.
- `replicas`:
  - Worker: set to 10.
  - PS: set to 4.
- `image`: set the docker image.
- `env`:
  - For model training, refer to the stand-alone settings. Note that the jemalloc setting changes to `MEM_USAGE_STRATEGY`: disable it with `close` (for stock TF 1.15.5) and enable it with `251` (for DeepRec).
  - For `launch.py`:
    - `JEMALLOC_PATH`: path to the jemalloc `.so` file.
    - `TF_SCRIPT`: the model training Python script name.
    - `TF_WORKSPACE`: the workspace.
- `args`: run `launch.py` to call the model training script; it sets some distributed training config, and the args are passed through to the model training script. So refer to the stand-alone settings for the parameters of `launch.py`. In addition, the following need to be set:
  - `--save_steps=5000`: the interval in steps between checkpoint saves; it cannot be too small, because every save takes time.
  - `--output_dir=/pvc`: set to the mounted shared volume.
  - `--protocol=grpc`: set to the grpc protocol.
  - `--input_layer_partitioner=8`: slice size of the input layer partitioner, in MB. Set to 8.
  - `--dense_layer_partitioner=16`: slice size of the dense layer partitioner, in MB. Set to 16.
  - `--inter` and `--intra`: both set to 8.
- Different settings for stock TF and DeepRec:
  - Stock TF:
    - Set env `MEM_USAGE_STRATEGY` to `close`.
    - Add `--tf` to the args.
  - DeepRec:
    - Enable the DeepRec features from stand-alone training, except for jemalloc, which is enabled in a different way here.
    - Set env `MEM_USAGE_STRATEGY` to `251`.
    - Test the FP32 and BF16 cases. Enable BF16 by adding `--bf16` to the args for `launch.py`.
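Putting the settings above together, a minimal `test.yaml` sketch (the image, registry, PVC name and label selector are illustrative assumptions; the Chief spec, which should mirror the Worker spec, and the PS container settings are elided for brevity):

```bash
cat > test.yaml <<'EOF'
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: trainer
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 10
      template:
        spec:
          affinity:
            podAntiAffinity:          # spread pods across nodes as much as possible
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                    - {key: job-name, operator: In, values: ["trainer"]}
                  topologyKey: kubernetes.io/hostname
          volumes:
          - name: checkpoint
            persistentVolumeClaim:
              claimName: my-pvc       # assumption: an existing PVC for checkpoints
          containers:
          - name: tensorflow
            image: my-registry/deeprec-benchmark:latest   # assumption
            env:
            - {name: MEM_USAGE_STRATEGY, value: "251"}    # "close" for stock TF
            - {name: JEMALLOC_PATH, value: /workspace/libjemalloc.so.2.5.1}
            - {name: TF_SCRIPT, value: train.py}
            - {name: TF_WORKSPACE, value: /workspace}
            args: ["python", "launch.py",
                   "--save_steps=5000", "--output_dir=/pvc", "--protocol=grpc",
                   "--input_layer_partitioner=8", "--dense_layer_partitioner=16",
                   "--inter=8", "--intra=8"]
            volumeMounts:
            - {mountPath: /pvc, name: checkpoint}
    PS:
      replicas: 4
      template:
        spec: {}                      # container settings identical to Worker (elided)
EOF
```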
- Run `kubectl create -f test.yaml` to create a KubeFlow TFJob.
- Run `kubectl get tfjob -o wide` to check the TFJob's status.
- Run `kubectl get pods -o wide` to see the status of the TFJob's pods; run `watch -n 1 kubectl get pods -o wide` to auto-refresh the status.
- After all pods are running, run `kubectl logs -f trainer-worker-0` to get the training log. Run `kubectl logs -f trainer-worker-0 | tee test.log` to save the log to a file while showing it on screen.
- After training completes, run `kubectl delete tfjob trainer` to delete the TFJob. Note: the training log won't tell you that training is complete; worker-0 and the PS are still running when training completes, but the other workers' status turns to Completed.