
[BUG] I added a DataLoad, but after the data is updated on the MinIO source, neither JuiceFS nor Alluxio proactively updates the dataset #4179

Open
bonniechen1119 opened this issue Jun 26, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@bonniechen1119

What is your environment (Kubernetes version, Fluid version, etc.)?
Kubernetes cluster: v1.27.12
Fluid: 1.0.0-31f5433
Describe the bug
Deployed the JuiceFS runtime following https://github.com/fluid-cloudnative/fluid/blob/master/docs/zh/samples/juicefs/juicefs_runtime.md
Deployed the Alluxio runtime following https://github.com/fluid-cloudnative/fluid/blob/master/docs/zh/samples/accelerate_s3_minio.md
All the data shows up correctly, but after data is updated on the remote MinIO, the pods in Kubernetes do not see the update.
What you expect to happen:
When data is updated on the remote MinIO, the pods in Kubernetes should see the update.
How to reproduce it

Additional Information

@bonniechen1119 bonniechen1119 added the bug Something isn't working label Jun 26, 2024
@xliuqq
Collaborator

xliuqq commented Jul 2, 2024

@bonniechen1119 After you updated MinIO, did you create a DataLoad and wait for it to complete?
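
For reference, a minimal DataLoad sketch against the Dataset that appears later in this thread (s3-demo in namespace debug; the name s3-demo-load is illustrative):

apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: s3-demo-load   # illustrative name
  namespace: debug
spec:
  dataset:
    name: s3-demo
    namespace: debug
  loadMetadata: true   # also re-sync metadata from the under storage
  target:
    - path: /
      replicas: 1

After applying it, wait until the DataLoad's phase reaches Complete before re-checking the mounted volume.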

@tedli

tedli commented Aug 28, 2024

The same thing happens with Ceph S3: object creation, modification, and deletion are never synced back.
For example:

  1. Create the dataset and runtime, and create a pod to consume the generated PVC.
  2. Add a new object to the bucket, modify the content of an existing object, or delete an existing object using another S3 client.
  3. Check the volume mount inside the container - not updated.
  4. Create a DataLoad and wait for the data load job to complete.
  5. Check the volume mount inside the container again - still not updated.

Forward sync works as expected, e.g.:

  1. Create the dataset and runtime, and a pod with a volume mount using the generated PVC.
  2. Remove an object, modify the content of an existing object, or create a new object inside the container within the volume mount.
  3. Check the S3 bucket using another S3 client - updated.

@tedli

tedli commented Aug 29, 2024

Specifying loadMetadata: true doesn't change the behaviour; outside changes are still not synced back.

The log of the data load job:

ALLUXIO_JARS
ALLUXIO_JAVA_OPTS
ALLUXIO_MASTER_JAVA_OPTS
ALLUXIO_PROXY_JAVA_OPTS
ALLUXIO_RAM_FOLDER
ALLUXIO_USER_JAVA_OPTS
ALLUXIO_WORKER_JAVA_OPTS
ALLUXIO_JOB_MASTER_JAVA_OPTS
ALLUXIO_JOB_WORKER_JAVA_OPTS =~ ALLUXIO_MASTER_JAVA_OPTS ]]
+ echo 'export ALLUXIO_MASTER_JAVA_OPTS="-Dalluxio.master.hostname=${ALLUXIO_MASTER_HOSTNAME} -Xmx16G -XX:+UnlockExperimentalVMOptions "'
+ for keyvaluepair in '$(env)'
++ echo _=/usr/bin/env
++ cut -d= -f1
+ key=_
++ echo _=/usr/bin/env
++ cut -d= -f2-
+ value=/usr/bin/env
+ [[ ALLUXIO_CLASSPATH
ALLUXIO_HOSTNAME
ALLUXIO_JARS
ALLUXIO_JAVA_OPTS
ALLUXIO_MASTER_JAVA_OPTS
ALLUXIO_PROXY_JAVA_OPTS
ALLUXIO_RAM_FOLDER
ALLUXIO_USER_JAVA_OPTS
ALLUXIO_WORKER_JAVA_OPTS
ALLUXIO_JOB_MASTER_JAVA_OPTS
ALLUXIO_JOB_WORKER_JAVA_OPTS =~ _ ]]
+ echo 'export _="/usr/bin/env"'
+ main
+ needLoadMetadata=true
+ [[ true == \t\r\u\e ]]
+ [[ -d /data ]]
+ paths=/
+ paths=(${paths//:/ })
+ replicas=1
+ replicas=(${replicas//:/ })
+ (( i=0 ))
+ (( i<1 ))
+ local path=/
+ local replica=1
+ echo -e 'distributedLoad on / starts'
distributedLoad on / starts
+ distributedLoad / 1
+ local path=/
+ local replica=1
+ checkPathExistence /
+ local path=/
++ timeout 30s alluxio fs ls /
++ tail -1
+ local 'checkPathResult=              2       PERSISTED 08-29-2024 01:37:23:125  DIR /s3-demo'
+ local 'strUnexistence=does not exist'
+ [[               2       PERSISTED 08-29-2024 01:37:23:125  DIR /s3-demo =~ does not exist ]]
+ alluxio fs setReplication --max 1 -R /
Changed the replication level of /
replicationMax was set to 1

+ [[ true == \t\r\u\e ]]
+ needPreLoadMetadata
++ alluxio version
+ local alluxioVersion=2.9.0
++ echo '2.9.0 2.8.0'
++ tr ' ' '\n'
++ sort -rV
++ head -n 1
+ test 2.9.0 == 2.9.0
+ alluxio fs ls -Dalluxio.user.file.metadata.sync.interval=0 -R /
              2       PERSISTED 08-29-2024 01:42:33:127  DIR /s3-demo
          27336       PERSISTED 08-29-2024 01:42:31:790 100% /s3-demo/bin.tar.gz
             11       PERSISTED 08-29-2024 01:41:51:022   0% /s3-demo/lalala.txt

real	0m1.235s
user	0m2.235s
sys	0m0.160s
+ alluxio fs distributedLoad --replication 1 /
Please wait for command submission to finish..
Submitted successfully, jobControlId = 1724844063446
Waiting for the command to finish ...
Get command status information below:
Successfully loaded path /s3-demo/lalala.txt
Total completed file count is 1, failed file count is 0
Finished running the command, jobControlId = 1724844063446

real	0m2.658s
user	0m2.490s
sys	0m0.213s
+ echo -e 'distributedLoad on / ends'
+ (( i++ ))
distributedLoad on / ends
+ (( i<1 ))

After the data load job completed, nothing was updated inside the pod's volume mount.

root@node1:~/dive-fluid# kubectl exec -ti pod -n debug -- ls -al /volume
total 29
drwx------ 1 root root     3 Aug 28 11:39 .
dr-xr-xr-x 1 root root    54 Aug 28 11:22 ..
-rwx------ 1 root root 27336 Aug 28 11:32 bin.tar.gz
-rw-r--r-- 1 root root     7 Aug 28 11:39 lalala.txt
-rwx------ 1 root root   720 Aug 28 11:32 peer.jsonpb.json  <-- here
root@node1:~/dive-fluid# kubectl exec -ti pod -n debug -- cat /volume/lalala.txt
lalalal  <-- here

As shown above, peer.jsonpb.json was already deleted from the bucket, yet it still remains in the volume mount; what's more, the content of lalala.txt was modified to lalalalili, but the volume mount provided by Fluid still shows lalalal.
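
One hedged observation: the job above runs alluxio fs ls -Dalluxio.user.file.metadata.sync.interval=0 -R /, which forces a metadata sync only for that single command. The FUSE clients backing the volume mount still use the cluster-wide setting, and the Alluxio default (-1) means metadata is never re-synced from the under storage. A sketch of enabling periodic sync through the AlluxioRuntime's properties (the 30s interval is an assumption for illustration, not a recommendation):

apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: s3-demo
  namespace: debug
spec:
  replicas: 1
  properties:
    # Assumed value for illustration: re-check the under storage for changes
    # at most every 30 seconds; the Alluxio default of -1 disables re-sync.
    alluxio.user.file.metadata.sync.interval: "30s"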

@TrafalgarZZZ
Member

@tedli Could you check what the FUSE pod's spec looks like? Run kubectl get pod <dataset_name>-fuse-xxxxx -oyaml, replacing <dataset_name> with your dataset's name.

@tedli

tedli commented Aug 29, 2024

Hi @TrafalgarZZZ,
Thanks for the reply. Here is the FUSE pod; please have a look.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    sidecar.istio.io/inject: "false"
  creationTimestamp: "2024-08-28T11:22:10Z"
  generateName: s3-demo-fuse-
  labels:
    app: alluxio
    chart: alluxio-0.9.13
    controller-revision-hash: 76fd76d7bd
    heritage: Helm
    pod-template-generation: "1"
    release: s3-demo
    role: alluxio-fuse
  name: s3-demo-fuse-76lqb
  namespace: debug
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: s3-demo-fuse
    uid: 519c5839-8ede-4c0a-9a91-24bcd90e83c8
  resourceVersion: "2936235"
  uid: 4746b937-91f0-4332-8201-808ca2ba44e0
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - node2.k8s1.dev
  containers:
  - args:
    - fuse
    - --fuse-opts=kernel_cache,rw,allow_other
    - /runtime-mnt/alluxio/debug/s3-demo/alluxio-fuse
    - /
    command:
    - /entrypoint.sh
    env:
    - name: ALLUXIO_CLIENT_HOSTNAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    - name: ALLUXIO_CLIENT_JAVA_OPTS
      value: ' -Dalluxio.user.hostname=${ALLUXIO_CLIENT_HOSTNAME} '
    - name: FLUID_RUNTIME_TYPE
      value: alluxio
    - name: FLUID_RUNTIME_NS
      value: debug
    - name: FLUID_RUNTIME_NAME
      value: s3-demo
    envFrom:
    - configMapRef:
        name: s3-demo-config
    image: harbor.xxxxxxxx.net/alluxio/alluxio-dev:2.9.0
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - /opt/alluxio/integration/fuse/bin/alluxio-fuse
          - unmount
          - /runtime-mnt/alluxio/debug/s3-demo/alluxio-fuse
    name: alluxio-fuse
    resources: {}
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN
      privileged: true
      runAsGroup: 0
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /dev/fuse
      name: alluxio-fuse-device
    - mountPath: /runtime-mnt/alluxio/debug/s3-demo
      mountPropagation: Bidirectional
      name: alluxio-fuse-mount
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-n9kwk
      readOnly: true
    - mountPath: /proc/cpuinfo
      name: lxcfs-proc-cpuinfo
      readOnly: true
    - mountPath: /proc/diskstats
      name: lxcfs-proc-diskstats
      readOnly: true
    - mountPath: /proc/loadavg
      name: lxcfs-proc-loadavg
      readOnly: true
    - mountPath: /proc/meminfo
      name: lxcfs-proc-meminfo
      readOnly: true
    - mountPath: /proc/stat
      name: lxcfs-proc-stat
      readOnly: true
    - mountPath: /proc/swaps
      name: lxcfs-proc-swaps
      readOnly: true
    - mountPath: /proc/uptime
      name: lxcfs-proc-uptime
      readOnly: true
    - mountPath: /sys/devices/system/cpu
      name: lxcfs-sys-devices-system-cpu
      readOnly: true
    - mountPath: /sys/devices/system/cpu/online
      name: lxcfs-sys-devices-system-cpu-online
      readOnly: true
    - mountPath: /dev/shm/debug/s3-demo
      name: mem
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: false
  hostNetwork: true
  nodeName: node2.k8s1.dev
  nodeSelector:
    fluid.io/f-debug-s3-demo: "true"
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - hostPath:
      path: /dev/fuse
      type: CharDevice
    name: alluxio-fuse-device
  - hostPath:
      path: /runtime-mnt/alluxio/debug/s3-demo
      type: DirectoryOrCreate
    name: alluxio-fuse-mount
  - name: kube-api-access-n9kwk
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
  - hostPath:
      path: /var/lib/lxcfs/proc/cpuinfo
      type: ""
    name: lxcfs-proc-cpuinfo
  - hostPath:
      path: /var/lib/lxcfs/proc/diskstats
      type: ""
    name: lxcfs-proc-diskstats
  - hostPath:
      path: /var/lib/lxcfs/proc/loadavg
      type: ""
    name: lxcfs-proc-loadavg
  - hostPath:
      path: /var/lib/lxcfs/proc/meminfo
      type: ""
    name: lxcfs-proc-meminfo
  - hostPath:
      path: /var/lib/lxcfs/proc/stat
      type: ""
    name: lxcfs-proc-stat
  - hostPath:
      path: /var/lib/lxcfs/proc/swaps
      type: ""
    name: lxcfs-proc-swaps
  - hostPath:
      path: /var/lib/lxcfs/proc/uptime
      type: ""
    name: lxcfs-proc-uptime
  - hostPath:
      path: /var/lib/lxcfs/sys/devices/system/cpu
      type: ""
    name: lxcfs-sys-devices-system-cpu
  - hostPath:
      path: /var/lib/lxcfs/sys/devices/system/cpu/online
      type: ""
    name: lxcfs-sys-devices-system-cpu-online
  - hostPath:
      path: /dev/shm/debug/s3-demo
      type: DirectoryOrCreate
    name: mem
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-08-28T11:22:09Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2024-08-28T11:22:09Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-08-28T11:22:09Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-08-28T11:22:09Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-08-28T11:22:10Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://8e2c0b1f1b0591187e5c3079a8c4add68ae3f34ca19f33cc1fd37192d53e668e
    image: harbor.xxxxxxxx.net/alluxio/alluxio-dev:2.9.0
    imageID: harbor.xxxxxxxx.net/alluxio/alluxio-dev@sha256:e055fae2866e6abb310c72c95b9a70d447b31950fb828dd46b0b29cb7021566b
    lastState: {}
    name: alluxio-fuse
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-08-28T11:22:09Z"
    volumeMounts:
    - mountPath: /dev/fuse
      name: alluxio-fuse-device
    - mountPath: /runtime-mnt/alluxio/debug/s3-demo
      name: alluxio-fuse-mount
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-n9kwk
      readOnly: true
      recursiveReadOnly: Disabled
    - mountPath: /proc/cpuinfo
      name: lxcfs-proc-cpuinfo
      readOnly: true
      recursiveReadOnly: Disabled
    - mountPath: /proc/diskstats
      name: lxcfs-proc-diskstats
      readOnly: true
      recursiveReadOnly: Disabled
    - mountPath: /proc/loadavg
      name: lxcfs-proc-loadavg
      readOnly: true
      recursiveReadOnly: Disabled
    - mountPath: /proc/meminfo
      name: lxcfs-proc-meminfo
      readOnly: true
      recursiveReadOnly: Disabled
    - mountPath: /proc/stat
      name: lxcfs-proc-stat
      readOnly: true
      recursiveReadOnly: Disabled
    - mountPath: /proc/swaps
      name: lxcfs-proc-swaps
      readOnly: true
      recursiveReadOnly: Disabled
    - mountPath: /proc/uptime
      name: lxcfs-proc-uptime
      readOnly: true
      recursiveReadOnly: Disabled
    - mountPath: /sys/devices/system/cpu
      name: lxcfs-sys-devices-system-cpu
      readOnly: true
      recursiveReadOnly: Disabled
    - mountPath: /sys/devices/system/cpu/online
      name: lxcfs-sys-devices-system-cpu-online
      readOnly: true
      recursiveReadOnly: Disabled
    - mountPath: /dev/shm/debug/s3-demo
      name: mem
  hostIP: 192.168.10.192
  hostIPs:
  - ip: 192.168.10.192
  phase: Running
  podIP: 192.168.10.192
  podIPs:
  - ip: 192.168.10.192
  qosClass: BestEffort
  startTime: "2024-08-28T11:22:09Z"
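
Worth noting in the spec above: the FUSE container is started with --fuse-opts=kernel_cache,rw,allow_other. The kernel_cache FUSE option keeps file content in the kernel page cache across opens, so even after Alluxio itself re-syncs, previously read content (like the old lalalal) can still be served stale from the node's page cache. A sketch of overriding the mount options through the runtime, assuming the spec.fuse.args field; dropping kernel_cache here is a hypothesis to test, not a confirmed fix, and Fluid is assumed to append the mount paths itself:

apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: s3-demo
  namespace: debug
spec:
  fuse:
    args:
      - fuse
      # Hypothesis to test: without kernel_cache, the kernel revalidates
      # cached pages on open instead of serving possibly stale content.
      - --fuse-opts=rw,allow_other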
