[feat] creat fg json doc add upload fg json to mc method (#95)
chengaofei authored Jan 23, 2025
1 parent d382062 commit 5f5ebce
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion docs/source/feature/data.md
@@ -137,14 +137,26 @@ sample_weight_fields: 'col_name'
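- This mode gives the best training speed, but the data must be FG-encoded in advance. Currently only the MaxCompute approach is provided. The steps are as follows:
- Generate the fg json config in a DLC/DSW/Local environment and upload it to the DataWorks resources. If fg_output_dir contains other files such as vocab_file, upload them to the resources as well.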
```shell
cat <<EOF>> odps_conf
access_id=${ACCESS_ID}
access_key=${ACCESS_KEY}
end_point=http://service.${region}.maxcompute.aliyun-inc.com/api
EOF
ODPS_CONFIG_FILE_PATH=odps_conf \
python -m tzrec.tools.create_fg_json \
--pipeline_config_path ${PIPELINE_CONFIG_PATH} \
--fg_output_dir fg_output \
--reserves ${COLS_YOU_WANT_RESERVE} \
--fg_resource_name ${FG_RESOURCE_NAME} \
--odps_project_name ${PROJECT_NAME}
```
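- --pipeline_config_path: the model config file.
- --fg_output_dir: the output directory for the fg json.
- --reserves: columns to pass through to the output table, with column names separated by commas. The Label column generally needs to be kept; the request_id, user_id, and item_id columns can be kept as well. Note: if the model's feature_config uses user_id or item_id as features, the feature_name must not clash with the user_id and item_id column names in the samples.
- --fg_resource_name: optional. The resource name of the fg json in MaxCompute. Defaults to fg.json.
- --odps_project_name: optional. The MaxCompute project to upload the fg json file to. This parameter must be used together with the fg_resource_name parameter and the ODPS_CONFIG_FILE_PATH environment variable.
- ODPS_CONFIG_FILE_PATH: an environment variable (not a command-line flag) that points to the odpscmd config file.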
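Because the heredoc in the command above expands `${ACCESS_ID}`, `${ACCESS_KEY}`, and `${region}` when the file is written, an unset variable silently produces an empty value in `odps_conf`. A minimal sketch of a pre-flight check follows. `check_odps_conf` is a hypothetical helper, not part of tzrec, and it assumes the config file needs exactly the three keys shown in the command above.

```shell
# Hypothetical helper (not part of tzrec): verify that the odpscmd-style
# config file exists and defines the three keys written by the heredoc.
check_odps_conf() {
    conf_file="$1"
    if [ ! -f "$conf_file" ]; then
        echo "missing config file: $conf_file" >&2
        return 1
    fi
    for key in access_id access_key end_point; do
        # A key counts as set only if at least one character follows the '='.
        if ! grep -q "^${key}=." "$conf_file"; then
            echo "missing or empty key: $key" >&2
            return 1
        fi
    done
    return 0
}
```

One way to use it is to gate the upload on the check, for example `check_odps_conf odps_conf && ODPS_CONFIG_FILE_PATH=odps_conf python -m tzrec.tools.create_fg_json ...` with the same arguments as in the command above.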
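- Install pyfg in an exclusive resource group in [DataWorks](https://workbench.data.aliyun.com/): open "Resource Groups", then in the "Actions" column of a scheduling resource group click "O&M Assistant" > "Create Command" (choose manual input) > "Run Command".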
```shell
/home/tops/bin/pip3 install http://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg039-0.3.9-cp37-cp37m-linux_x86_64.whl