I have noticed that compression is not applied to pandas DataFrames when storing them with flammkuchen.
I have included a test example below where I store a pandas DataFrame and a NumPy array. According to ddls, the NumPy array ends up compressed while the DataFrame data does not:
import numpy as np
import pandas as pd
import flammkuchen as fl
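# Highly repetitive data, so compression should shrink it substantially
df = pd.DataFrame({'a':['B'] * 100000, 'b':np.repeat(1, 100000), 'c':np.repeat(1, 100000)})
# Save the DataFrame and a NumPy array side by side in the same file
fl.save('test.h5', {'df':df, 'npa':np.repeat(1, 100000)})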
I think pandas does not respect the handle's compression settings, and leaving complevel and complib set to None disables compression, as per the pandas documentation. I am not sure of the best way to extract the compression settings from the handle and apply them to this class.
Thanks in advance
I figured out how to add compression to the pandas data by passing the filters parameter. I have created a branch here that appears to solve the issue.
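For anyone reading along, here is a minimal sketch of how compression settings might be translated into the keyword arguments pandas understands. This is not the code in the branch. The helper name `pandas_compression_kwargs` is hypothetical, and it assumes the filters object exposes `complevel` and `complib` attributes, as a PyTables `tables.Filters` instance does. A `SimpleNamespace` stands in for the real filters object so the snippet runs without PyTables installed.

```python
from types import SimpleNamespace


def pandas_compression_kwargs(filters):
    """Map a PyTables-style filters object to pandas HDF5 keyword arguments.

    Hypothetical helper: `filters` is assumed to expose `complevel` and
    `complib` attributes. A missing filters object, or a compression
    level of 0, is treated as "no compression".
    """
    if filters is None or not getattr(filters, "complevel", 0):
        return {"complevel": None, "complib": None}
    return {"complevel": filters.complevel, "complib": filters.complib}


# Stand-in for something like tables.Filters(complevel=9, complib="blosc")
example_filters = SimpleNamespace(complevel=9, complib="blosc")
kwargs = pandas_compression_kwargs(example_filters)
print(kwargs)
```

The resulting dictionary could then be passed on to a pandas call that accepts `complevel` and `complib`, for example `df.to_hdf(path, key="df", **kwargs)`. How the filters are obtained from the open handle is left out here; with PyTables, the file object's `filters` attribute may be a reasonable starting point.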