about hdf5 #11

Open

shawnthu opened this issue May 13, 2019 · 3 comments

Comments

@shawnthu

I have several hundred GB of wav files on my disk (about 1,000 hours of audio). I found that reading the wav files directly is too slow for training, so I am considering LMDB and HDF5 as alternatives. However, I found that
HDF5 does not support concurrent reads, i.e. num_workers in DataLoader cannot be more than 1. How do you solve this problem? thx
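
A common workaround (not confirmed in this thread) is to open the HDF5 file lazily inside each worker, so that every worker process gets its own file handle instead of sharing one across the fork. A minimal sketch, assuming a hypothetical file features.h5 whose keys are utterance IDs:

import h5py
from torch.utils.data import Dataset

class H5Dataset(Dataset):
    def __init__(self, h5_path, keys):
        self.h5_path = h5_path
        self.keys = keys
        self.h5 = None  # opened lazily, once per worker process

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        # Each worker opens its own read-only handle on first access,
        # so no handle is ever shared across fork boundaries.
        if self.h5 is None:
            self.h5 = h5py.File(self.h5_path, 'r')
        return self.h5[self.keys[idx]][()]

# used as: torch.utils.data.DataLoader(H5Dataset('features.h5', keys), num_workers=4)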

@shawnthu
Author

https://pytorch.org/audio/datasets.html#yesno
torchaudio lists two example datasets, but they are very small, so they can be loaded into memory directly. That approach does not work for large datasets that cannot fit into memory!
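
For reference, the torchaudio pattern being referred to looks roughly like this (a sketch; the exact return tuple varies across torchaudio versions):

import torchaudio

# YESNO is a tiny corpus (~60 short recordings), so torchaudio can
# afford to download and decode it eagerly; this does not scale to
# 1,000 hours of audio.
yesno = torchaudio.datasets.YESNO('.', download=True)
waveform, sample_rate, labels = yesno[0]  # return tuple may vary by version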

@shawnthu
Author

shawnthu commented May 13, 2019

Besides, I found an absurd phenomenon. All my wav files are under the /wav folder, and I have a read-wav function like this:

from scipy.io import wavfile
from torch.utils.data import Dataset, DataLoader

def read_wav(wav_path):
    # Decode one wav file from disk; returns the raw sample array.
    rate, data = wavfile.read(wav_path)
    return data

class Dst(Dataset):
    def __init__(self, wav_path_list):
        self.wav_path_list = wav_path_list

    def __len__(self):
        return len(self.wav_path_list)

    def __getitem__(self, idx):
        return read_wav(self.wav_path_list[idx])

dst = Dst(wav_path_list)  # wav_path_list: paths of the files under /wav
loader = DataLoader(dst, batch_size=batch_size, shuffle=True,
                    num_workers=num_workers)

In fact, when I increase num_workers from 0 to 4 (my workstation has 8 CPUs), the speed does not
change! It looks like the read_wav function already occupies all the CPU cores, so adding workers brings no speedup.
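
One way to verify whether the workers help at all is to time one pass over the dataset at several num_workers settings. A rough sketch under the same setup (collate_fn=list sidesteps default batching, since the wavs have variable length):

import time
from torch.utils.data import DataLoader

# Time one epoch for each num_workers value; if decoding is the
# bottleneck and the workers are effective, wall time should drop
# as workers increase.
for n in (0, 1, 2, 4):
    loader = DataLoader(dst, batch_size=8, shuffle=False,
                        num_workers=n, collate_fn=list)
    start = time.time()
    for batch in loader:
        pass
    print('num_workers=%d: %.1fs' % (n, time.time() - start))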
