Problems building the serving image, registering it in the Kuscia container, and using it #128

Open
neaos opened this issue Dec 6, 2024 · 49 comments

neaos commented Dec 6, 2024

image

I changed the base image in the Dockerfile to FROM secretflow/ubuntu-base-ci:latest and packaged the result as the image file serving-anolis8_sv-12-05.tar. This makes the image much larger than the original.

I then registered it in the Kuscia container:
image
The output shows that registration in the Kuscia container succeeded.
Checking again inside the Kuscia container:
image

This is how I registered it:
1. Copy sf-serving-0.yaml out of the Kuscia container: docker cp 851cfd0a2bbc:/home/kuscia/sf-serving-0.yaml ./
2. Edit the image section at the end of sf-serving-0.yaml.
Original:
image:
  id: 91d26a38f00e
  name: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/serving-anolis8
  sign: abc13mnjh1olkkp1
  tag: 0.3.1b0
Changed to:
image:
  name: serving-anolis8
  tag: sv-12-05
3. Run:
bash ./register_app_image/register_app_image.sh -u idata -d com2023011620063473637 -m p2p -n sf-serving-image -f ./register_app_image/sf-serving-0.yaml -i serving-anolis8:sv-12-05

After registration, my online inference request fails with an error, and there are no logs.
image

neaos commented Dec 6, 2024

Querying the serving task returns the following:
<200,{"status":{"code":0,"message":"success","details":[]},"data":{"servings":[{"serving_id":"serving-2024120611252796246","status":{"state":"Progressing","reason":"","message":"","total_parties":2,"available_parties":0,"create_time":"2024-12-06T03:25:27Z","party_statuses":[{"domain_id":"com2023011620063473637","role":"","state":"Progressing","replicas":1,"available_replicas":0,"unavailable_replicas":1,"updatedReplicas":1,"create_time":"2024-12-06T03:25:27Z","endpoints":[{"port_name":"communication","scope":"Cluster","endpoint":"serving-2024120611252796246-communication.com2023011620063473637.svc"},{"port_name":"internal","scope":"Domain","endpoint":"serving-2024120611252796246-internal.com2023011620063473637.svc:53510"},{"port_name":"brpc-builtin","scope":"Domain","endpoint":"serving-2024120611252796246-brpc-builtin.com2023011620063473637.svc:53511"},{"port_name":"service","scope":"Domain","endpoint":"serving-2024120611252796246-service.com2023011620063473637.svc:53508"}]},{"domain_id":"com2023011620072311739","role":"","state":"Progressing","replicas":1,"available_replicas":0,"unavailable_replicas":1,"updatedReplicas":1,"create_time":"2024-12-06T03:25:34Z","endpoints":[]}]}}]}},[Content-Type:"application/json; charset=utf-8", Date:"Fri, 06 Dec 2024 04:47:38 GMT", Content-Length:"1193"]>

It looks normal.
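A note on reading the response above: it reports state Progressing with available_parties 0 out of total_parties 2, and each party shows available_replicas 0 out of replicas 1, so no replica is serving yet. That check can be sketched as below. The struct and function names are hypothetical and only mirror the JSON fields shown; treating "ready" as "all counts match" is an assumption, not Kuscia's documented definition.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the per-party fields in the query response.
struct PartyStatus {
  std::string domain_id;
  std::string state;
  int replicas = 0;
  int available_replicas = 0;
};

// Hypothetical mirror of the top-level "status" object.
struct ServingStatus {
  std::string state;
  int total_parties = 0;
  int available_parties = 0;
  std::vector<PartyStatus> party_statuses;
};

// Returns true only when every party reports all of its replicas available.
// Assumption: readiness is judged purely from the counters, not from "state".
bool IsServingReady(const ServingStatus& s) {
  // Sanity check on the input: available can never exceed total.
  assert(s.available_parties <= s.total_parties);
  if (s.total_parties == 0 || s.available_parties != s.total_parties) {
    return false;
  }
  for (const PartyStatus& p : s.party_statuses) {
    if (p.replicas == 0 || p.available_replicas != p.replicas) {
      return false;
    }
  }
  return true;
}
```

Filled with the values from the response above (total_parties 2, available_parties 0), IsServingReady returns false, which is consistent with the 404 seen when calling the prediction endpoint.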

neaos commented Dec 6, 2024

But calling the inference (prediction) endpoint fails with:
org.springframework.web.client.HttpClientErrorException$NotFound: 404 Not Found: [no body]

wangzul commented Dec 6, 2024

But calling the inference (prediction) endpoint fails with: org.springframework.web.client.HttpClientErrorException$NotFound: 404 Not Found: [no body]

Run kubectl get appimage [name] -oyaml and check the configuration.

wangzul commented Dec 6, 2024

If you are not sure of the name, first run kubectl get appimage to find the name of the serving AppImage.

neaos commented Dec 6, 2024

image

The serving name should be sf-serving-image, right?

neaos commented Dec 6, 2024

When registering, I used:
bash ./register_app_image/register_app_image.sh -u idata -d com2023011620063473637 -m p2p -n sf-serving-image -f ./register_app_image/sf-serving-0.yaml -i serving-anolis8:sv-12-05

The value after -n is sf-serving-image.

wangzul commented Dec 6, 2024

image

The serving name should be sf-serving-image, right?

Yes. Run kubectl get appimage sf-serving-image -oyaml to check its configuration and status.


neaos commented Dec 6, 2024

[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]# kubectl get appimage sf-serving-image -oyaml
apiVersion: kuscia.secretflow/v1alpha1
kind: AppImage
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kuscia.secretflow/v1alpha1","kind":"AppImage","metadata":{"annotations":{},"name":"sf-serving-image"},"spec":{"configTemplates":{"serving-config.conf":"{\n "serving_id": "{{.SERVING_ID}}",\n "input_config": "{{.INPUT_CONFIG}}",\n "cluster_def": "{{.CLUSTER_DEFINE}}",\n "allocated_ports": "{{.ALLOCATED_PORTS}}",\n "oss_meta": "{{.MODEL_OSS_META}}"\n}\n"},"deployTemplates":[{"name":"secretflow","replicas":1,"spec":{"containers":[{"command":["sh","-c","./secretflow_serving --flagfile=conf/gflags.conf --config_mode=kuscia --serving_config_file=/etc/kuscia/serving-config.conf"],"configVolumeMounts":[{"mountPath":"/etc/kuscia/serving-config.conf","subPath":"serving-config.conf"}],"livenessProbe":{"httpGet":{"path":"/health","port":53511}},"name":"secretflow","ports":[{"name":"service","port":53508,"protocol":"HTTP","scope":"Domain"},{"name":"communication","port":53509,"protocol":"HTTP","scope":"Cluster"},{"name":"internal","port":53510,"protocol":"HTTP","scope":"Domain"},{"name":"brpc-builtin","port":53511,"protocol":"HTTP","scope":"Domain"}],"readinessProbe":{"httpGet":{"path":"/health","port":53511}},"startupProbe":{"failureThreshold":30,"httpGet":{"path":"/health","port":53511},"periodSeconds":10,"successThreshold":1,"timeoutSeconds":1},"workingDir":"/root/sf_serving"}]}}],"image":{"name":"serving-anolis8","tag":"sv-12-05"}}}
  creationTimestamp: "2024-08-23T02:42:02Z"
  generation: 2
  name: sf-serving-image
  resourceVersion: "14266295"
  uid: 1b007fa9-98b3-4772-ac06-5358880e5a64
spec:
  configTemplates:
    serving-config.conf: |
      {
        "serving_id": "{{.SERVING_ID}}",
        "input_config": "{{.INPUT_CONFIG}}",
        "cluster_def": "{{.CLUSTER_DEFINE}}",
        "allocated_ports": "{{.ALLOCATED_PORTS}}",
        "oss_meta": "{{.MODEL_OSS_META}}"
      }
  deployTemplates:
  - name: secretflow
    replicas: 1
    spec:
      containers:
      - command:
        - sh
        - -c
        - ./secretflow_serving --flagfile=conf/gflags.conf --config_mode=kuscia --serving_config_file=/etc/kuscia/serving-config.conf
        configVolumeMounts:
        - mountPath: /etc/kuscia/serving-config.conf
          subPath: serving-config.conf
        livenessProbe:
          httpGet:
            path: /health
            port: 53511
        name: secretflow
        ports:
        - name: service
          port: 53508
          protocol: HTTP
          scope: Domain
        - name: communication
          port: 53509
          protocol: HTTP
          scope: Cluster
        - name: internal
          port: 53510
          protocol: HTTP
          scope: Domain
        - name: brpc-builtin
          port: 53511
          protocol: HTTP
          scope: Domain
        readinessProbe:
          httpGet:
            path: /health
            port: 53511
        startupProbe:
          failureThreshold: 30
          httpGet:
            path: /health
            port: 53511
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        workingDir: /root/sf_serving
  image:
    name: serving-anolis8
    tag: sv-12-05
[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]#

wangzul commented Dec 6, 2024

Run kubectl get pod -A and take a look.


neaos commented Dec 6, 2024

image
image

image
image

neaos commented Dec 6, 2024

My new inference task is "serving_id":"serving-2024120611252796246"


neaos commented Dec 6, 2024

[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]# kubectl get pod -A |grep serving-2024120611252796246
com2023011620063473637 serving-2024120611252796246-f4464687d-84w4q 0/1 ErrImagePull 0 125m
[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]#

neaos commented Dec 6, 2024

The pod is in the ErrImagePull state.


neaos commented Dec 6, 2024

[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]# kubectl describe pod serving-2024120611252796246-f4464687d-84w4q -n com2023011620063473637
Name: serving-2024120611252796246-f4464687d-84w4q
Namespace: com2023011620063473637
Priority: 0
Service Account: default
Node: idata-kuscia-autonomy-com2023011620063473637/172.19.0.2
Start Time: Fri, 06 Dec 2024 11:25:28 +0800
Labels: kuscia.secretflow/app-type=serving
kuscia.secretflow/communication-role-client=true
kuscia.secretflow/communication-role-server=true
kuscia.secretflow/controller=KusciaDeployment
kuscia.secretflow/deployment-name=serving-2024120611252796246
kuscia.secretflow/kd-name=serving-2024120611252796246
kuscia.secretflow/kd-uid=014a9335-f2f7-4bd7-a6f4-eac2a49e10b4
kuscia.secretflow/owner_namespace=cross-domain
pod-template-hash=f4464687d
Annotations: kuscia.secretflow/config-template-volumes: config-template
Status: Pending
IP: 10.88.5.38
IPs:
IP: 10.88.5.38
Controlled By: ReplicaSet/serving-2024120611252796246-f4464687d
Containers:
secretflow:
Container ID:
Image: serving-anolis8:sv-12-05
Image ID:
Ports: 53508/TCP, 53509/TCP, 53510/TCP, 53511/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Command:
sh
-c
./secretflow_serving --flagfile=conf/gflags.conf --config_mode=kuscia --serving_config_file=/etc/kuscia/serving-config.conf
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Liveness: http-get http://:53511/health delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:53511/health delay=0s timeout=1s period=10s #success=1 #failure=3
Startup: http-get http://:53511/health delay=0s timeout=1s period=10s #success=1 #failure=30
Environment:
KUSCIA_DOMAIN_ID: com2023011620063473637
CLUSTER_DEFINE: {"parties":[{"name":"com2023011620063473637","role":"","services":[{"portName":"service","endpoints":["serving-2024120611252796246-service.com2023011620063473637.svc:53508"]},{"portName":"communication","endpoints":["serving-2024120611252796246-communication.com2023011620063473637.svc"]},{"portName":"internal","endpoints":["serving-2024120611252796246-internal.com2023011620063473637.svc:53510"]},{"portName":"brpc-builtin","endpoints":["serving-2024120611252796246-brpc-builtin.com2023011620063473637.svc:53511"]}]},{"name":"com2023011620072311739","role":"","services":[{"portName":"communication","endpoints":["serving-2024120611252796246-communication.com2023011620072311739.svc"]}]}],"selfPartyIdx":0,"selfEndpointIdx":0}
ALLOCATED_PORTS: {"ports":[{"name":"service","port":53508,"scope":"Domain","protocol":"HTTP"},{"name":"communication","port":53509,"scope":"Cluster","protocol":"HTTP"},{"name":"internal","port":53510,"scope":"Domain","protocol":"HTTP"},{"name":"brpc-builtin","port":53511,"scope":"Domain","protocol":"HTTP"}]}
INPUT_CONFIG: {"party_configs":{"com2023011620063473637":{"server_config":{"feature_mapping":{"AGE":"AGE","EDUCATION":"EDUCATION","DEFAULT":"DEFAULT","BALANCE":"BALANCE","HOUSING":"HOUSING","LOAN":"LOAN","DAY":"DAY","DURATION":"DURATION","CAMPAIGN":"CAMPAIGN","PDAYS":"PDAYS","PREVIOUS":"PREVIOUS","JOB_BLUE-COLLAR":"JOB_BLUE-COLLAR","JOB_ENTREPRENEUR":"JOB_ENTREPRENEUR","JOB_HOUSEMAID":"JOB_HOUSEMAID","JOB_MANAGEMENT":"JOB_MANAGEMENT","JOB_RETIRED":"JOB_RETIRED","JOB_SELF-EMPLOYED":"JOB_SELF-EMPLOYED","JOB_SERVICES":"JOB_SERVICES","JOB_STUDENT":"JOB_STUDENT","JOB_TECHNICIAN":"JOB_TECHNICIAN","JOB_UNEMPLOYED":"JOB_UNEMPLOYED","MARITAL_DIVORCED":"MARITAL_DIVORCED","MARITAL_MARRIED":"MARITAL_MARRIED","MARITAL_SINGLE":"MARITAL_SINGLE"}},"model_config":{"modelId":"model2024120609490673794-model-export-output","basePath":"/","sourcePath":"/home/kuscia/var/storage/data/model2024120609490673794-model-export-output","sourceType":"ST_FILE"},"feature_source_config":{"dbOpts":{"file_path":"{"connection_str":"DSN=dm;SERVER=172.16.0.217;UID=SYSDBA;PWD=SYSDBA001;TCP_PORT=52360","datasource_kind_sub":1,"datasource_kind":3,"table_name":"ALICE"}","id_name":"ID1"}},"channel_desc":{"protocol":"http"}},"com2023011620072311739":{"server_config":{"feature_mapping":{"CONTACT_CELLULAR":"CONTACT_CELLULAR","CONTACT_TELEPHONE":"CONTACT_TELEPHONE","CONTACT_UNKNOWN":"CONTACT_UNKNOWN","MONTH_APR":"MONTH_APR","MONTH_AUG":"MONTH_AUG","MONTH_DEC":"MONTH_DEC","MONTH_FEB":"MONTH_FEB","MONTH_JAN":"MONTH_JAN","MONTH_JUL":"MONTH_JUL","MONTH_JUN":"MONTH_JUN","MONTH_MAR":"MONTH_MAR","MONTH_MAY":"MONTH_MAY","MONTH_NOV":"MONTH_NOV","MONTH_OCT":"MONTH_OCT","MONTH_SEP":"MONTH_SEP","POUTCOME_FAILURE":"POUTCOME_FAILURE","POUTCOME_OTHER":"POUTCOME_OTHER","POUTCOME_SUCCESS":"POUTCOME_SUCCESS","POUTCOME_UNKNOWN":"POUTCOME_UNKNOWN"}},"model_config":{"modelId":"model2024120609490673794-model-export-output","basePath":"/","sourcePath":"/home/kuscia/var/storage/data/model2024120609490673794-model-export-output","sourceT
ype":"ST_FILE"},"feature_source_config":{"dbOpts":{"file_path":"{"connection_str":"DSN=dm;SERVER=172.16.0.219;UID=SYSDBA;PWD=SYSDBA001;TCP_PORT=52360","datasource_kind_sub":1,"datasource_kind":3,"table_name":"BOB"}","id_name":"ID2"}},"channel_desc":{"protocol":"http"}}}}
KUSCIA_PORT_SERVICE_NUMBER: 53508
KUSCIA_PORT_COMMUNICATION_NUMBER: 53509
KUSCIA_PORT_INTERNAL_NUMBER: 53510
KUSCIA_PORT_BRPC_BUILTIN_NUMBER: 53511
SERVING_ID: serving-2024120611252796246
Mounts:
/etc/kuscia/serving-config.conf from config-template (rw,path="serving-config.conf")
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-template:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: serving-2024120611252796246-configtemplate
Optional: false
QoS Class: BestEffort
Node-Selectors: kuscia.secretflow/namespace=com2023011620063473637
Tolerations: kuscia.secretflow/agent:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]#

wangzul commented Dec 6, 2024

The pod is in the ErrImagePull state.

My network is having problems, so I cannot see the images for now. Please try the following.

  1. Your image name is confirmed to be serving-anolis8.
  2. Run kubectl edit appimage sf-serving-image -oyaml
  3. Change name to docker.io/serving-anolis8 and try again.

neaos commented Dec 6, 2024

Inside the Kuscia container, the image name appears to be docker.io/library/serving-anolis8:

[root@idata-kuscia-autonomy-com2023011620072311739 kuscia]# crictl images|grep serving
docker.io/library/serving-anolis8 sv-12-05 7f1037eb17993 2.25GB

wangzul commented Dec 6, 2024

docker.io/library/serving-anolis8

Replace the name in the AppImage with docker.io/library/serving-anolis8.
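For context on why the short name and the stored name differ: Docker-style tooling expands a bare name such as serving-anolis8 to docker.io/library/serving-anolis8 when the image is imported, which matches the crictl output above. In this thread the AppImage carried the short name and the pod hit ErrImagePull, which suggests the lookup used the name as written. A minimal sketch of that expansion follows; NormalizeImageName is a hypothetical helper, it ignores tags, digests, and legacy hostnames, and it is not code from Kuscia or containerd.

```cpp
#include <cassert>
#include <string>

// Sketch of Docker-style "familiar name" expansion.
// Assumption: any ":tag" or "@digest" suffix has already been stripped.
std::string NormalizeImageName(const std::string& name) {
  assert(!name.empty());
  const std::string::size_type slash = name.find('/');
  if (slash == std::string::npos) {
    // Single component, e.g. "serving-anolis8": default registry and the
    // "library" namespace are both implied.
    return "docker.io/library/" + name;
  }
  const std::string first = name.substr(0, slash);
  const bool looks_like_registry =
      first.find('.') != std::string::npos ||
      first.find(':') != std::string::npos || first == "localhost";
  if (!looks_like_registry) {
    // e.g. "secretflow/ubuntu-base-ci": only the registry is implied.
    return "docker.io/" + name;
  }
  if (first == "docker.io" &&
      name.find('/', slash + 1) == std::string::npos) {
    // e.g. "docker.io/serving-anolis8": the namespace is still missing.
    return "docker.io/library/" + name.substr(slash + 1);
  }
  // Already fully qualified, e.g. a private registry path.
  return name;
}
```

Under these rules both serving-anolis8 and docker.io/serving-anolis8 expand to docker.io/library/serving-anolis8, so writing the fully qualified name into the AppImage avoids depending on whether the runtime performs the expansion.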

neaos commented Dec 6, 2024

After editing with kubectl edit appimage sf-serving-image -oyaml, how do I make the change take effect?

wangzul commented Dec 6, 2024

After editing with kubectl edit appimage sf-serving-image -oyaml, how do I make the change take effect?

It takes effect as soon as you save with :wq. You can run kubectl get xxx to verify that it succeeded.


neaos commented Dec 6, 2024

[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]# kubectl get pod -A |grep serving-2024120614172177844
com2023011620063473637 serving-2024120614172177844-76cdf7dbc8-94xq2 0/1 CrashLoopBackOff 1 (50s ago) 55s
[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]#

After the change, the newly submitted serving inference task is in the CrashLoopBackOff state.


neaos commented Dec 6, 2024

[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]# kubectl describe pod serving-2024120614172177844-76cdf7dbc8-94xq2 -n com2023011620063473637
Name: serving-2024120614172177844-76cdf7dbc8-94xq2
Namespace: com2023011620063473637
Priority: 0
Service Account: default
Node: idata-kuscia-autonomy-com2023011620063473637/172.19.0.2
Start Time: Fri, 06 Dec 2024 14:17:21 +0800
Labels: kuscia.secretflow/app-type=serving
kuscia.secretflow/communication-role-client=true
kuscia.secretflow/communication-role-server=true
kuscia.secretflow/controller=KusciaDeployment
kuscia.secretflow/deployment-name=serving-2024120614172177844
kuscia.secretflow/kd-name=serving-2024120614172177844
kuscia.secretflow/kd-uid=0c5d0049-0353-4107-a4bc-fb1747f4a130
kuscia.secretflow/owner_namespace=cross-domain
pod-template-hash=76cdf7dbc8
Annotations: kuscia.secretflow/config-template-volumes: config-template
Status: Running
IP: 10.88.5.47
IPs:
IP: 10.88.5.47
Controlled By: ReplicaSet/serving-2024120614172177844-76cdf7dbc8
Containers:
secretflow:
Container ID: containerd://8a327eca975f2e45408f20678251b2d55a0df3aee1eaaf39b67fc78d49b1cf8f
Image: docker.io/library/serving-anolis8:sv-12-05
Image ID: sha256:7f1037eb179930274ff671697b915cf391eb6d0768596cb5537c8a060ed363ef
Ports: 53508/TCP, 53509/TCP, 53510/TCP, 53511/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Command:
sh
-c
./secretflow_serving --flagfile=conf/gflags.conf --config_mode=kuscia --serving_config_file=/etc/kuscia/serving-config.conf
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Message: 2024-12-06 14:17:26.676 [info] [trace.cc:SetUpTracerProvider:137] no span processor configured, noop tracer will be used
2024-12-06 14:17:26.676 [info] [main.cc:main:96] version: 0.7.0b0
2024-12-06 14:17:26.676 [info] [main.cc:main:107] op list: PHE_2P_REDUCE, PHE_2P_MERGE_Y, PHE_2P_DECRYPT_PEER_Y, MERGE_Y, DOT_PRODUCT, PHE_2P_DOT_PRODUCT, ARROW_PROCESSING, TREE_SELECT, TREE_MERGE, TREE_ENSEMBLE_PREDICT
2024-12-06 14:17:26.677 [info] [config_parser.cc:KusciaConfigParser:56] raw kuscia serving config content: {
"serving_id": "serving-2024120614172177844",
"input_config": "{"party_configs":{"com2023011620063473637":{"server_config":{"feature_mapping":{"AGE":"AGE","EDUCATION":"EDUCATION","DEFAULT":"DEFAULT","BALANCE":"BALANCE","HOUSING":"HOUSING","LOAN":"LOAN","DAY":"DAY","DURATION":"DURATION","CAMPAIGN":"CAMPAIGN","PDAYS":"PDAYS","PREVIOUS":"PREVIOUS","JOB_BLUE-COLLAR":"JOB_BLUE-COLLAR","JOB_ENTREPRENEUR":"JOB_ENTREPRENEUR","JOB_HOUSEMAID":"JOB_HOUSEMAID","JOB_MANAGEMENT":"JOB_MANAGEMENT","JOB_RETIRED":"JOB_RETIRED","JOB_SELF-EMPLOYED":"JOB_SELF-EMPLOYED","JOB_SERVICES":"JOB_SERVICES","JOB_STUDENT":"JOB_STUDENT","JOB_TECHNICIAN":"JOB_TECHNICIAN","JOB_UNEMPLOYED":"JOB_UNEMPLOYED","MARITAL_DIVORCED":"MARITAL_DIVORCED","MARITAL_MARRIED":"MARITAL_MARRIED","MARITAL_SINGLE":"MARITAL_SINGLE"}},"model_config":{"modelId":"model2024120609490673794-model-export-output","basePath":"/","sourcePath":"/home/kuscia/var/storage/data/model2024120609490673794-model-export-output","sourceType":"ST_FILE"},"feature_source_config":{"dbOpts":{"file_path":"{\"connection_str\":\"DSN=dm;SERVER=172.16.0.217;UID=SYSDBA;PWD=SYSDBA001;TCP_PORT=52360\",\"datasource_kind_sub\":1,\"datasource_kind\":3,\"table_name\":\"ALICE\"}","id_name":"ID1"}},"channel_desc":{"protocol":"http"}},"com2023011620072311739":{"server_config":{"feature_mapping":{"CONTACT_CELLULAR":"CONTACT_CELLULAR","CONTACT_TELEPHONE":"CONTACT_TELEPHONE","CONTACT_UNKNOWN":"CONTACT_UNKNOWN","MONTH_APR":"MONTH_APR","MONTH_AUG":"MONTH_AUG","MONTH_DEC":"MONTH_DEC","MONTH_FEB":"MONTH_FEB","MONTH_JAN":"MONTH_JAN","MONTH_JUL":"MONTH_JUL","MONTH_JUN":"MONTH_JUN","MONTH_MAR":"MONTH_MAR","MONTH_MAY":"MONTH_MAY","MONTH_NOV":"MONTH_NOV","MONTH_OCT":"MONTH_OCT","MONTH_SEP":"MONTH_SEP","POUTCOME_FAILURE":"POUTCOME_FAILURE","POUTCOME_OTHER":"POUTCOME_OTHER","POUTCOME_SUCCESS":"POUTCOME_SUCCESS","POUTCOME_UNKNOWN":"POUTCOME_UNKNOWN"}},"model_config":{"modelId":"model2024120609490673794-model-export-output","basePath":"/","sourcePath":"/home/kuscia/var/storage/data/model2024120609490673794-model-export-o
utput","sourceType":"ST_FILE"},"feature_source_config":{"dbOpts":{"file_path":"{\"connection_str\":\"DSN=dm;SERVER=172.16.0.219;UID=SYSDBA;PWD=SYSDBA001;TCP_PORT=52360\",\"datasource_kind_sub\":1,\"datasource_kind\":3,\"table_name\":\"BOB\"}","id_name":"ID2"}},"channel_desc":{"protocol":"http"}}}}",
"cluster_def": "{"parties":[{"name":"com2023011620063473637","role":"","services":[{"portName":"service","endpoints":["serving-2024120614172177844-service.com2023011620063473637.svc:53508"]},{"portName":"communication","endpoints":["serving-2024120614172177844-communication.com2023011620063473637.svc"]},{"portName":"internal","endpoints":["serving-2024120614172177844-internal.com2023011620063473637.svc:53510"]},{"portName":"brpc-builtin","endpoints":["serving-2024120614172177844-brpc-builtin.com2023011620063473637.svc:53511"]}]},{"name":"com2023011620072311739","role":"","services":[{"portName":"communication","endpoints":["serving-2024120614172177844-communication.com2023011620072311739.svc"]}]}],"selfPartyIdx":0,"selfEndpointIdx":0}",
"allocated_ports": "{"ports":[{"name":"brpc-builtin","port":53511,"scope":"Domain","protocol":"HTTP"},{"name":"service","port":53508,"scope":"Domain","protocol":"HTTP"},{"name":"communication","port":53509,"scope":"Cluster","protocol":"HTTP"},{"name":"internal","port":53510,"scope":"Domain","protocol":"HTTP"}]}",
"oss_meta": ""
}

2024-12-06 14:17:26.677 [info] [retry_policy.cc:RetryPolicy:48] Create RetryPolicy:backoff_time:10ms
2024-12-06 14:17:26.677 [info] [retry_policy.cc:RetryPolicy:48] Create RetryPolicy:backoff_time:10ms
2024-12-06 14:17:26.677 [info] [retry_policy.cc:SetConfig:171] Regist retry policy: name=com2023011620072311739
2024-12-06 14:17:26.681 [info] [filesystem_source.cc:OnPullModel:37] copy model file from /home/kuscia/var/storage/data/model2024120609490673794-model-export-output to /serving-2024120614172177844/model2024120609490673794-model-export-output/model_bundle.tar.gz
2024-12-06 14:17:26.681 [info] [model_loader.cc:Load:37] begin load file: /serving-2024120614172177844/model2024120609490673794-model-export-output/model_bundle.tar.gz
2024-12-06 14:17:26.686 [info] [model_loader.cc:Load:82] end load model bundle, name: modelExport-model2024120609490673794_3c41761b-ba91-4b8d-9f55-0269c5a6318c, desc: , graph version: 0.1.0
2024-12-06 14:17:26.690 [info] [thread_pool.h:Start:94] Create and start thread pool with 16 threads
2024-12-06 14:17:26.691 [info] [execution_core.cc:ExecutionCore:73] create feature adapter, type:5
2024-12-06 14:17:26.694 [error] [main.cc:main:149] server startup failed, msg:[Enforce fail at ./secretflow_serving/feature_adapter/feature_adapter_factory.h:52] creator. no creator registered for operator type: 5
Stacktrace:
#0 secretflow::serving::ExecutionCore::ExecutionCore()+0x55ae14e8908f
#1 secretflow::serving::Server::Start()+0x55ae14e64847
#2 main+0x55ae14e5c4ae
#3 (unknown)+0x7f5afc192d90

  Exit Code:    255
  Started:      Fri, 06 Dec 2024 14:17:26 +0800
  Finished:     Fri, 06 Dec 2024 14:17:26 +0800
Ready:          False
Restart Count:  1
Liveness:       http-get http://:53511/health delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness:      http-get http://:53511/health delay=0s timeout=1s period=10s #success=1 #failure=3
Startup:        http-get http://:53511/health delay=0s timeout=1s period=10s #success=1 #failure=30
Environment:
  KUSCIA_DOMAIN_ID:                  com2023011620063473637
  CLUSTER_DEFINE:                    {"parties":[{"name":"com2023011620063473637","role":"","services":[{"portName":"service","endpoints":["serving-2024120614172177844-service.com2023011620063473637.svc:53508"]},{"portName":"communication","endpoints":["serving-2024120614172177844-communication.com2023011620063473637.svc"]},{"portName":"internal","endpoints":["serving-2024120614172177844-internal.com2023011620063473637.svc:53510"]},{"portName":"brpc-builtin","endpoints":["serving-2024120614172177844-brpc-builtin.com2023011620063473637.svc:53511"]}]},{"name":"com2023011620072311739","role":"","services":[{"portName":"communication","endpoints":["serving-2024120614172177844-communication.com2023011620072311739.svc"]}]}],"selfPartyIdx":0,"selfEndpointIdx":0}
  ALLOCATED_PORTS:                   {"ports":[{"name":"brpc-builtin","port":53511,"scope":"Domain","protocol":"HTTP"},{"name":"service","port":53508,"scope":"Domain","protocol":"HTTP"},{"name":"communication","port":53509,"scope":"Cluster","protocol":"HTTP"},{"name":"internal","port":53510,"scope":"Domain","protocol":"HTTP"}]}
  INPUT_CONFIG:                      {"party_configs":{"com2023011620063473637":{"server_config":{"feature_mapping":{"AGE":"AGE","EDUCATION":"EDUCATION","DEFAULT":"DEFAULT","BALANCE":"BALANCE","HOUSING":"HOUSING","LOAN":"LOAN","DAY":"DAY","DURATION":"DURATION","CAMPAIGN":"CAMPAIGN","PDAYS":"PDAYS","PREVIOUS":"PREVIOUS","JOB_BLUE-COLLAR":"JOB_BLUE-COLLAR","JOB_ENTREPRENEUR":"JOB_ENTREPRENEUR","JOB_HOUSEMAID":"JOB_HOUSEMAID","JOB_MANAGEMENT":"JOB_MANAGEMENT","JOB_RETIRED":"JOB_RETIRED","JOB_SELF-EMPLOYED":"JOB_SELF-EMPLOYED","JOB_SERVICES":"JOB_SERVICES","JOB_STUDENT":"JOB_STUDENT","JOB_TECHNICIAN":"JOB_TECHNICIAN","JOB_UNEMPLOYED":"JOB_UNEMPLOYED","MARITAL_DIVORCED":"MARITAL_DIVORCED","MARITAL_MARRIED":"MARITAL_MARRIED","MARITAL_SINGLE":"MARITAL_SINGLE"}},"model_config":{"modelId":"model2024120609490673794-model-export-output","basePath":"/","sourcePath":"/home/kuscia/var/storage/data/model2024120609490673794-model-export-output","sourceType":"ST_FILE"},"feature_source_config":{"dbOpts":{"file_path":"{\"connection_str\":\"DSN=dm;SERVER=172.16.0.217;UID=SYSDBA;PWD=SYSDBA001;TCP_PORT=52360\",\"datasource_kind_sub\":1,\"datasource_kind\":3,\"table_name\":\"ALICE\"}","id_name":"ID1"}},"channel_desc":{"protocol":"http"}},"com2023011620072311739":{"server_config":{"feature_mapping":{"CONTACT_CELLULAR":"CONTACT_CELLULAR","CONTACT_TELEPHONE":"CONTACT_TELEPHONE","CONTACT_UNKNOWN":"CONTACT_UNKNOWN","MONTH_APR":"MONTH_APR","MONTH_AUG":"MONTH_AUG","MONTH_DEC":"MONTH_DEC","MONTH_FEB":"MONTH_FEB","MONTH_JAN":"MONTH_JAN","MONTH_JUL":"MONTH_JUL","MONTH_JUN":"MONTH_JUN","MONTH_MAR":"MONTH_MAR","MONTH_MAY":"MONTH_MAY","MONTH_NOV":"MONTH_NOV","MONTH_OCT":"MONTH_OCT","MONTH_SEP":"MONTH_SEP","POUTCOME_FAILURE":"POUTCOME_FAILURE","POUTCOME_OTHER":"POUTCOME_OTHER","POUTCOME_SUCCESS":"POUTCOME_SUCCESS","POUTCOME_UNKNOWN":"POUTCOME_UNKNOWN"}},"model_config":{"modelId":"model2024120609490673794-model-export-output","basePath":"/","sourcePath":"/home/kuscia/var/storage/data/model20241206094906
73794-model-export-output","sourceType":"ST_FILE"},"feature_source_config":{"dbOpts":{"file_path":"{\"connection_str\":\"DSN=dm;SERVER=172.16.0.219;UID=SYSDBA;PWD=SYSDBA001;TCP_PORT=52360\",\"datasource_kind_sub\":1,\"datasource_kind\":3,\"table_name\":\"BOB\"}","id_name":"ID2"}},"channel_desc":{"protocol":"http"}}}}
  KUSCIA_PORT_BRPC_BUILTIN_NUMBER:   53511
  KUSCIA_PORT_SERVICE_NUMBER:        53508
  KUSCIA_PORT_COMMUNICATION_NUMBER:  53509
  KUSCIA_PORT_INTERNAL_NUMBER:       53510
  SERVING_ID:                        serving-2024120614172177844
Mounts:
  /etc/kuscia/serving-config.conf from config-template (rw,path="serving-config.conf")

Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-template:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: serving-2024120614172177844-configtemplate
Optional: false
QoS Class: BestEffort
Node-Selectors: kuscia.secretflow/namespace=com2023011620063473637
Tolerations: kuscia.secretflow/agent:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 2m11s kuscia-scheduler Successfully assigned com2023011620063473637/serving-2024120614172177844-76cdf7dbc8-94xq2 to idata-kuscia-autonomy-com2023011620063473637
Normal Pulled 2m7s (x2 over 2m11s) Agent Container image "docker.io/library/serving-anolis8:sv-12-05" already present on machine
Normal Created 2m7s (x2 over 2m11s) Agent Created container secretflow
Normal Started 2m7s (x2 over 2m11s) Agent Started container secretflow
Warning MissingClusterDNS 2m2s (x8 over 2m12s) Agent pod: "serving-2024120614172177844-76cdf7dbc8-94xq2_com2023011620063473637(3528c5d5-7477-4e31-939b-50549db1fdeb)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
Warning BackOff 2m2s (x3 over 2m6s) Agent Back-off restarting failed container
[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]#

wangzul commented Dec 6, 2024

Please collect the full logs from this path: /home/kuscia/var/stdout/pods/podName_xxxx/xxxx/x.log

Please provide them as uploaded files.


neaos commented Dec 6, 2024

0.log


neaos commented Dec 6, 2024

1.log

neaos commented Dec 6, 2024

These are the only two logs:
root@idata05:/home/idata/kuscia/autonomy/com2023011620063473637/pods/com2023011620063473637_serving-2024120614172177844-76cdf7dbc8-94xq2_3528c5d5-7477-4e31-939b-50549db1fdeb/secretflow# ll
total 24
drwxr-xr-x 2 root root 4096 Dec 6 14:17 ./
drwxr-xr-x 3 root root 4096 Dec 6 14:17 ../
-rwxrwxrwx 1 root root 7085 Dec 6 14:17 0.log*
-rwxrwxrwx 1 root root 7084 Dec 6 14:17 1.log*

neaos commented Dec 6, 2024

In feature_config.proto:
message FeatureSourceConfig {
  oneof options {
    MockOptions mock_opts = 1;
    HttpOptions http_opts = 2;
    CsvOptions csv_opts = 3;
    StreamingOptions streaming_opts = 4;
    DbOptions db_opts = 5;
  }
}
I changed this code by adding the line DbOptions db_opts = 5;

The serving request I then send is:
{
"serving_id": "serving-2024120611252796246",
"initiator": "com2023011620063473637",
"parties": [
{
"app_image": "sf-serving-image",
"domain_id": "com2023011620063473637"
},
{
"app_image": "sf-serving-image",
"domain_id": "com2023011620072311739"
}
],
"serving_input_config": "{"party_configs":{"com2023011620063473637":{"server_config":{"feature_mapping":{"AGE":"AGE","EDUCATION":"EDUCATION","DEFAULT":"DEFAULT","BALANCE":"BALANCE","HOUSING":"HOUSING","LOAN":"LOAN","DAY":"DAY","DURATION":"DURATION","CAMPAIGN":"CAMPAIGN","PDAYS":"PDAYS","PREVIOUS":"PREVIOUS","JOB_BLUE-COLLAR":"JOB_BLUE-COLLAR","JOB_ENTREPRENEUR":"JOB_ENTREPRENEUR","JOB_HOUSEMAID":"JOB_HOUSEMAID","JOB_MANAGEMENT":"JOB_MANAGEMENT","JOB_RETIRED":"JOB_RETIRED","JOB_SELF-EMPLOYED":"JOB_SELF-EMPLOYED","JOB_SERVICES":"JOB_SERVICES","JOB_STUDENT":"JOB_STUDENT","JOB_TECHNICIAN":"JOB_TECHNICIAN","JOB_UNEMPLOYED":"JOB_UNEMPLOYED","MARITAL_DIVORCED":"MARITAL_DIVORCED","MARITAL_MARRIED":"MARITAL_MARRIED","MARITAL_SINGLE":"MARITAL_SINGLE"}},"model_config":{"modelId":"model2024120609490673794-model-export-output","basePath":"/","sourcePath":"/home/kuscia/var/storage/data/model2024120609490673794-model-export-output","sourceType":"ST_FILE"},"feature_source_config":{"dbOpts":{"file_path":"{\"connection_str\":\"DSN=dm;SERVER=172.16.0.217;UID=SYSDBA;PWD=SYSDBA001;TCP_PORT=52360\",\"datasource_kind_sub\":1,\"datasource_kind\":3,\"table_name\":\"ALICE\"}","id_name":"ID1"}},"channel_desc":{"protocol":"http"}},"com2023011620072311739":{"server_config":{"feature_mapping":{"CONTACT_CELLULAR":"CONTACT_CELLULAR","CONTACT_TELEPHONE":"CONTACT_TELEPHONE","CONTACT_UNKNOWN":"CONTACT_UNKNOWN","MONTH_APR":"MONTH_APR","MONTH_AUG":"MONTH_AUG","MONTH_DEC":"MONTH_DEC","MONTH_FEB":"MONTH_FEB","MONTH_JAN":"MONTH_JAN","MONTH_JUL":"MONTH_JUL","MONTH_JUN":"MONTH_JUN","MONTH_MAR":"MONTH_MAR","MONTH_MAY":"MONTH_MAY","MONTH_NOV":"MONTH_NOV","MONTH_OCT":"MONTH_OCT","MONTH_SEP":"MONTH_SEP","POUTCOME_FAILURE":"POUTCOME_FAILURE","POUTCOME_OTHER":"POUTCOME_OTHER","POUTCOME_SUCCESS":"POUTCOME_SUCCESS","POUTCOME_UNKNOWN":"POUTCOME_UNKNOWN"}},"model_config":{"modelId":"model2024120609490673794-model-export-output","basePath":"/","sourcePath":"/home/kuscia/var/storage/data/model2024120609490673794-model-
export-output","sourceType":"ST_FILE"},"feature_source_config":{"dbOpts":{"file_path":"{\"connection_str\":\"DSN=dm;SERVER=172.16.0.219;UID=SYSDBA;PWD=SYSDBA001;TCP_PORT=52360\",\"datasource_kind_sub\":1,\"datasource_kind\":3,\"table_name\":\"BOB\"}","id_name":"ID2"}},"channel_desc":{"protocol":"http"}}}}"
}

Is it OK to pass dbOpts in the feature_source_config section? It worked in my own tests.

neaos commented Dec 6, 2024

I created db_adapter_test.cc and constructed the config in it like this:
FeatureSourceConfig config;
auto* db_opts = config.mutable_db_opts();
db_opts->set_file_path(db_dm);
db_opts->set_id_name("ID");

That test passes, but I have not tested a complete configuration document like the one above.

neaos commented Dec 6, 2024

From 0.log:
2024-12-06T14:17:24.869038575+08:00 stdout F 2024-12-06 14:17:24.868 [info] [thread_pool.h:Start:94] Create and start thread pool with 16 threads
2024-12-06T14:17:24.869916143+08:00 stdout F 2024-12-06 14:17:24.869 [info] [execution_core.cc:ExecutionCore:73] create feature adapter, type:5
2024-12-06T14:17:25.04669496+08:00 stdout F 2024-12-06 14:17:25.046 [error] [main.cc:main:149] server startup failed, msg:[Enforce fail at ./secretflow_serving/feature_adapter/feature_adapter_factory.h:52] creator. no creator registered for operator type: 5
2024-12-06T14:17:25.046707744+08:00 stdout F Stacktrace:
2024-12-06T14:17:25.046710903+08:00 stdout F #0 secretflow::serving::ExecutionCore::ExecutionCore()+0x55e09793508f
2024-12-06T14:17:25.046713611+08:00 stdout F #1 secretflow::serving::Server::Start()+0x55e097910847
2024-12-06T14:17:25.046716464+08:00 stdout F #2 main+0x55e0979084ae

Why does it report [main.cc:main:149] server startup failed, msg:[Enforce fail at ./secretflow_serving/feature_adapter/feature_adapter_factory.h:52] creator. no creator registered for operator type: 5?

@wangzul

wangzul commented Dec 6, 2024

Run kubectl get Deployment -A and take a look.

@neaos
Author

neaos commented Dec 6, 2024

[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]# kubectl get Deployment -A |grep serving-2024120614172177844
com2023011620063473637 serving-2024120614172177844 0/1 1 0 56m
[root@idata-kuscia-autonomy-com2023011620063473637 kuscia]#

@neaos
Author

neaos commented Dec 6, 2024

Uploading image.png…

@wangzul

wangzul commented Dec 6, 2024

From the 0.log log:
2024-12-06T14:17:24.869038575+08:00 stdout F 2024-12-06 14:17:24.868 [info] [thread_pool.h:Start:94] Create and start thread pool with 16 threads
2024-12-06T14:17:24.869916143+08:00 stdout F 2024-12-06 14:17:24.869 [info] [execution_core.cc:ExecutionCore:73] create feature adapter, type:5
2024-12-06T14:17:25.04669496+08:00 stdout F 2024-12-06 14:17:25.046 [error] [main.cc:main:149] server startup failed, msg:[Enforce fail at ./secretflow_serving/feature_adapter/feature_adapter_factory.h:52] creator. no creator registered for operator type: 5
2024-12-06T14:17:25.046707744+08:00 stdout F Stacktrace:
2024-12-06T14:17:25.046710903+08:00 stdout F #0 secretflow::serving::ExecutionCore::ExecutionCore()+0x55e09793508f
2024-12-06T14:17:25.046713611+08:00 stdout F #1 secretflow::serving::Server::Start()+0x55e097910847
2024-12-06T14:17:25.046716464+08:00 stdout F #2 main+0x55e0979084ae

Why does it report [main.cc:main:149] server startup failed, msg:[Enforce fail at ./secretflow_serving/feature_adapter/feature_adapter_factory.h:52] creator. no creator registered for operator type: 5?

image
Is the "feature_source_config":{"dbOpts": that you passed in actually defined?

@neaos
Author

neaos commented Dec 6, 2024

Yes, it is defined.
I am trying to add a new option type, db_opts, to FeatureSourceConfig.
I modified feature_config.proto in the serving repo and added the line DbOptions db_opts = 5;

along with
message DbOptions {
// Input file path, specifies where to load data
string file_path = 1;

// Id column name, associated with FeatureParam::query_datas
// Query datas is a subset of id column
string id_name = 2;

// Optional.
// This determines the size(byte) of each read batch.
int32 block_size = 12;
}

image

image

@neaos
Author

neaos commented Dec 6, 2024

image

It works fine in my own test.
FeatureSourceConfig config;
auto* db_opts = config.mutable_db_opts();
db_opts->set_file_path(db_dm);
db_opts->set_id_name("ID");

@neaos
Author

neaos commented Dec 6, 2024

What I defined is DbOptions db_opts = 5;

Should the request message also pass db_opts, rather than shortening it to dbOpts?
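
For context on the naming question: under the proto3 JSON mapping, a field declared as db_opts gets the lowerCamelCase JSON name dbOpts by default, and conforming parsers accept either spelling, so the key spelling alone is unlikely to be the cause. The sketch below imitates that name conversion. ToJsonName is a hypothetical helper written for illustration, not a function from the protobuf library.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not a protobuf API): drops each underscore and
// upper-cases the character that follows it, e.g. db_opts becomes dbOpts.
std::string ToJsonName(const std::string& field_name) {
  std::string out;
  bool capitalize_next = false;
  for (char c : field_name) {
    if (c == '_') {
      capitalize_next = true;  // skip the underscore itself
    } else if (capitalize_next) {
      out.push_back(
          static_cast<char>(std::toupper(static_cast<unsigned char>(c))));
      capitalize_next = false;
    } else {
      out.push_back(c);
    }
  }
  return out;
}
```

Calling ToJsonName("db_opts") yields the key used in the request above, which is consistent with the log line "create feature adapter, type:5": the option was parsed, and the failure happens later.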

@neaos
Author

neaos commented Dec 6, 2024

image

In my inference request message I wrote "feature_source_config":{"dbOpts":{"f

@wangzul

wangzul commented Dec 6, 2024

@wangzul

wangzul commented Dec 6, 2024

Was the service call initiated through the Kuscia API or through the all-in-one SecretPad?

@neaos
Author

neaos commented Dec 6, 2024

image

Is it here? Line 52 of feature_adapter_factory.h

@neaos
Author

neaos commented Dec 6, 2024

It was initiated through the Kuscia API.

@neaos
Author

neaos commented Dec 6, 2024

image

spec.options_case() here should already cover db_opts_

@wangzul

wangzul commented Dec 6, 2024

Have you registered your own DB feature adapter, the same way http_adapter and file_adapter are registered in the source code?

@oeqqwq
Member

oeqqwq commented Dec 6, 2024

Yes, it is defined. I am trying to add a new option type, db_opts, to FeatureSourceConfig. I modified feature_config.proto in the serving repo and added the line DbOptions db_opts = 5;

along with
message DbOptions {
// Input file path, specifies where to load data
string file_path = 1;

// Id column name, associated with FeatureParam::query_datas
// Query datas is a subset of id column
string id_name = 2;

// Optional.
// This determines the size(byte) of each read batch.
int32 block_size = 12;
}

image

image

Please check whether your adapter implementation registers itself. For reference: https://github.com/secretflow/serving/blob/main/secretflow_serving/feature_adapter/file_adapter.cc#L37

Also check whether your adapter's build target has been added to the module's BUILD file. See: https://github.com/secretflow/serving/blob/main/secretflow_serving/feature_adapter/BUILD.bazel#L19

@neaos
Author

neaos commented Dec 6, 2024

image

Oh, do I need to add a ":db_adapter", entry here?

@oeqqwq
Member

oeqqwq commented Dec 6, 2024

image

Oh, do I need to add a ":db_adapter", entry here?

Yes. Otherwise your module will not be included when the binary is packaged.
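
The failure mode discussed here can be sketched with a small self-registering factory. This is only an illustration under stated assumptions: AdapterFactory, Registrar, SourceCase, FileAdapter and TryCreate are hypothetical stand-ins, not the secretflow-serving classes, and only the value 5 for db_opts comes from this thread (the other enum value is made up). The point is that a creator is added by a static object in the adapter's own source file, so if that file is never built into the binary, the lookup for its type finds nothing.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical option cases; only kDbOpts = 5 mirrors "db_opts = 5" above.
enum class SourceCase { kFileOpts = 1, kDbOpts = 5 };

struct Adapter {
  virtual ~Adapter() = default;
  virtual std::string Name() const = 0;
};

// Hypothetical registry mapping an option case to a creator function.
class AdapterFactory {
 public:
  using Creator = std::function<std::unique_ptr<Adapter>()>;

  static AdapterFactory& Instance() {
    static AdapterFactory factory;  // constructed on first use
    return factory;
  }

  void Register(SourceCase type, Creator creator) {
    creators_[type] = std::move(creator);
  }

  std::unique_ptr<Adapter> Create(SourceCase type) const {
    auto it = creators_.find(type);
    if (it == creators_.end()) {
      throw std::runtime_error("no creator registered for operator type: " +
                               std::to_string(static_cast<int>(type)));
    }
    return it->second();
  }

 private:
  std::map<SourceCase, Creator> creators_;
};

// Constructing a Registrar registers a creator for adapter type T.
template <typename T>
struct Registrar {
  explicit Registrar(SourceCase type) {
    AdapterFactory::Instance().Register(
        type, [] { return std::make_unique<T>(); });
  }
};

struct FileAdapter : Adapter {
  std::string Name() const override { return "file"; }
};
// In a real project this line would sit in the adapter's own .cc file and
// would run only if that file is compiled and linked into the binary.
static Registrar<FileAdapter> file_registrar(SourceCase::kFileOpts);

// Returns the adapter name, or the error text if no creator is registered.
std::string TryCreate(SourceCase type) {
  try {
    return AdapterFactory::Instance().Create(type)->Name();
  } catch (const std::runtime_error& e) {
    return e.what();
  }
}
```

In this sketch no registrar exists for kDbOpts, so TryCreate(SourceCase::kDbOpts) returns the error text, which is the situation a missing ":db_adapter" dependency would produce.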

@neaos
Author

neaos commented Dec 6, 2024

OK, thank you very much!!!

@neaos
Author

neaos commented Dec 6, 2024

The problem is solved. Thanks, everyone.

@wangzul

wangzul commented Dec 12, 2024

When registering, I used bash ./register_app_image/register_app_image.sh -u idata -d com2023011620063473637 -m p2p -n sf-serving-image -f ./register_app_image/sf-serving-0.yaml -i serving-anolis8:sv-12-05

The value after -n is sf-serving-image

Could you tell me which documentation link you got this command from?
