Read corruption on large files (vmdk) #8

Open
ocgltd opened this issue Aug 4, 2012 · 6 comments
@ocgltd

ocgltd commented Aug 4, 2012

I've discovered that when reading large files from a VMFS volume, vmfs-tools does not consistently present the same data to the reading application (i.e. the reads are corrupted). For example, after copying a large vmdk file to an external disk, I ran sha1sum on the source and destination three times. As you can see below, the source (VMFS) sum never matches the destination, and it even changes between runs while the destination sum stays the same:

Compare 1 failed on file [/mnt/vmfs/dvr/dvr-flat.vmdk]. MD5/SHA source=c8809289e7e48549c9594400a66e1b987947c326, destination=15a78ae7140b3ab82009a35bc64f32bc32a60274
Compare 2 failed on file [/mnt/vmfs/dvr/dvr-flat.vmdk]. MD5/SHA source=c8809289e7e48549c9594400a66e1b987947c326, destination=15a78ae7140b3ab82009a35bc64f32bc32a60274
Compare 3 failed on file [/mnt/vmfs/dvr/dvr-flat.vmdk]. MD5/SHA source=7d1d7ce34910758ae75545da8b0decbafdcb2b02, destination=15a78ae7140b3ab82009a35bc64f32bc32a60274
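The same consistency check can be reproduced without an external disk by hashing the source file several times in a row: a healthy read path returns one distinct digest. A minimal Python sketch, equivalent in spirit to running `sha1sum` repeatedly; the function names `sha1_of` and `repeated_sha1` and the 1 MiB chunk size are illustrative assumptions, not part of vmfs-tools.

```python
import hashlib


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in fixed-size chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_sha1(path, runs=3):
    """Hash the same file `runs` times; stable reads give identical digests."""
    return [sha1_of(path) for _ in range(runs)]
```

For example, `len(set(repeated_sha1("/mnt/vmfs/dvr/dvr-flat.vmdk"))) > 1` would indicate that the mount returned different data on different passes.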

I ran debugvmfs under valgrind, piping the file into sha1sum, and valgrind reported only one error (I'm not sure it's related):

valgrind --leak-check=full --show-reachable=yes /usr/local/sbin/debugvmfs /dev/sda3 cat /PBX2/PBX2-flat.vmdk | sha1sum -b
==4945== Memcheck, a memory error detector
==4945== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==4945== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==4945== Command: /usr/local/sbin/debugvmfs /dev/sda3 cat /PBX2/PBX2-flat.vmdk
==4945==
==4945== Warning: noted but unhandled ioctl 0x5382 with no size/direction hints
==4945== This could cause spurious value errors to appear.
==4945== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==4945== Conditional jump or move depends on uninitialised value(s)
==4945== at 0x40A303: vmfs_vol_open (vmfs_volume.c:223)
==4945== by 0x407A9F: vmfs_fs_open (vmfs_fs.c:203)
==4945== by 0x40295D: main (debugvmfs.c:675)
==4945==

The same problem occurs on 3 different VMware ESXi 4.1 hosts (we are hoping to use vmfs-tools for our bare-metal backups). We are running vmfs-tools 0.2.5 on CentOS 6.2 x86_64.

@glandium
Owner

glandium commented Aug 6, 2012

Is /dev/sda3 SATA, SCSI, iSCSI, ...?
How big is the VMFS volume, and how big is dvr-flat.vmdk?

@ocgltd
Author

ocgltd commented Aug 7, 2012

Here's info on the file (68719476736 bytes, i.e. exactly 64 GiB):
-rw------- 1 root root 68719476736 Aug 7 00:54 dvr-flat.vmdk
It has also failed on 32 GB vmdk files.
The volume is 1.5 TB.
On this system there are 2 mirrored SATA drives behind an LSI RAID controller.
It's worth noting that I have only seen this error on vmdk files, so file size may be related.
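If size is involved, it helps to know where the copies disagree, not just that their checksums differ (for instance, whether mismatches only show up past some offset). A minimal Python sketch that compares two files block by block; `differing_blocks` and the 1 MiB default block size are illustrative assumptions, not a vmfs-tools feature.

```python
def differing_blocks(path_a, path_b, block_size=1 << 20):
    """Yield (offset, length) for each block where the two files differ.

    Both files are read in lockstep. If the files have different lengths,
    every block from the end of the shorter file onward is also reported.
    """
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            ca = a.read(block_size)
            cb = b.read(block_size)
            if not ca and not cb:
                return  # both files exhausted
            if ca != cb:
                yield offset, max(len(ca), len(cb))
            offset += max(len(ca), len(cb))
```

Printing each reported offset divided by `2**30` gives its position in GiB, which makes it easy to see whether the bad regions cluster at a particular boundary or are scattered across the file.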

@ocgltd
Author

ocgltd commented Jan 28, 2013

Any update on this issue? We're still using vmfs-tools for backup and still seeing the error above.

@ocgltd
Author

ocgltd commented Mar 6, 2013

Is anyone else verifying integrity with a checksum (e.g. MD5)? I can't believe I'm the only one experiencing this.

@moryb41

moryb41 commented Jun 4, 2015

You are definitely not alone.
My 1.9 TB file resides on a 3.0 TB NetApp SAN LUN presented to a CentOS server via iSCSI. I first encountered problems trying to rsync the VM files to another NAS device:

# rsync -av --progress --bwlimit=20000 /mnt/vmfs1/SRV-DCSPLUNK/ /mnt/qnap/SRV-DCSPLUNK
sending incremental file list
SRV-DCSPLUNK_3-flat.vmdk
1979120929792 100%   18.88MB/s   27:46:03 (xfer#1, to-check=8/22)
rsync: read errors mapping "/mnt/vmfs1/SRV-DCSPLUNK/SRV-DCSPLUNK_3-flat.vmdk": Input/output error (5)
WARNING: SRV-DCSPLUNK_3-flat.vmdk failed verification -- update discarded (will try again).
SRV-DCSPLUNK_3-flat.vmdk
1979120929792 100%   18.89MB/s   27:45:17 (xfer#2, to-check=8/22)
rsync: read errors mapping "/mnt/vmfs1/SRV-DCSPLUNK/SRV-DCSPLUNK_3-flat.vmdk": Input/output error (5)
ERROR: SRV-DCSPLUNK_3-flat.vmdk failed verification -- update discarded.

sent 3958725043956 bytes  received 52 bytes  19799812.66 bytes/sec
total size is 2062038396264  speedup is 0.52
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
[root@centos ~]# 

Other useful information:

netapp> lun show /vol/myvol/qt/lun
        /vol/myvol/qt/lun      3t (3298534883328) (r/w, online, mapped)

# parted -l
Model: NETAPP LUN (scsi)
Disk /dev/sdb: 3299GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  3299GB  3299GB

# df -h /mnt/vmfs1
Filesystem      Size  Used Avail Use% Mounted on
/dev/fuse       3.0T  2.0T  1.1T  64% /mnt/vmfs1

# ls -lh SRV-DCSPLUNK_3-flat.vmdk
-rw------- 1 root root 1.8T Apr 10 10:03 SRV-DCSPLUNK_3-flat.vmdk

# file SRV-DCSPLUNK_3-flat.vmdk
SRV-DCSPLUNK_3-flat.vmdk: ERROR: cannot read `SRV-DCSPLUNK_3-flat.vmdk' (Input/output error)

# md5sum SRV-DCSPLUNK_3-flat.vmdk
md5sum: SRV-DCSPLUNK_3-flat.vmdk: Input/output error

This tool is a great asset when you need it, but it seems large files are becoming the norm everywhere, so large-file support matters.

@jugleni

jugleni commented Oct 25, 2016

Friend, I have the same problem.
Has it been solved?
Thank you.
