Dask: a parallel computing library that integrates with pandas, NumPy, and scikit-learn. It can handle larger-than-memory datasets and can distribute the computation across multiple cores or even multiple machines.
import dask.dataframe as dd
import dask.array as da
# dask_ml's scaler and PCA operate lazily on dask arrays; the plain
# sklearn versions would pull the whole dataset into memory
from dask_ml.preprocessing import StandardScaler
from dask_ml.decomposition import PCA

# load data with dask (dd.read_csv has no index_col argument,
# so set the index in a separate step)
ddata = dd.read_csv(data_path)
ddata = ddata.set_index(ddata.columns[0])

# convert to dask array (lengths=True computes chunk sizes,
# which the downstream estimators need)
data_array = ddata.to_dask_array(lengths=True)

# standardize data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data_array)

# PCA transformation (n_components=None keeps all components)
pca_obj = PCA(n_components=None, random_state=42)
data_pca = pca_obj.fit_transform(data_scaled)
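For reference, the same lazy-evaluation idea can be sketched with only `dask.array` (no dask-ml dependency), using synthetic data in place of the CSV; the array shape and chunking here are illustrative assumptions, not part of the original snippet:

```python
import dask.array as da

# synthetic data: 1000 samples, 10 features, chunked into blocks of 250 rows
x = da.random.random((1000, 10), chunks=(250, 10))

# standardize lazily: nothing executes until .compute() is called
mean = x.mean(axis=0)
std = x.std(axis=0)
x_scaled = (x - mean) / std

# a PCA-style step via dask's lazy SVD (works for tall-and-skinny
# arrays where the column dimension fits in a single chunk)
u, s, vt = da.linalg.svd(x_scaled)

# only now does dask execute the task graph
result = x_scaled.compute()
print(result.shape)
```

Every step before `.compute()` just builds a task graph, which is what lets Dask spill work across cores or machines instead of holding everything in memory at once.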
Test it on, e.g., pca.py.