plotly.JSONDataset not saved as utf-8 #741

Open
Madnex opened this issue Jun 26, 2024 · 1 comment
Labels
good first issue (Good for newcomers), Hacktoberfest, help wanted (Contribution task, outside help would be appreciated!)

Comments

Madnex commented Jun 26, 2024

Description

A file saved by plotly.JSONDataset is not always encoded as UTF-8. Supplying the following fs_args in the catalog entry fixes the issue:

myfile:
  type: plotly.JSONDataset
  filepath: myfilepath
  fs_args:
    open_args_save:
      encoding: utf-8

However, UTF-8 should be the default behaviour. The open question is why saving goes wrong when the encoding is not set explicitly.

Context

I ran into an encoding problem with plotly plots saved as JSON through the Kedro data catalog. After saving the plots, I could no longer load them via the catalog. Loading failed with the error 'utf-8' codec can't decode byte 0xe8 in position 6570: invalid continuation byte. Investigating further, I managed to read the files with a different encoding (e.g. latin-1), but I did not understand why they were not valid UTF-8 in the first place. Adding the fs_args shown above solved the issue.

Steps to Reproduce

  1. Save a plotly plot containing special (non-ASCII) characters as a plotly.JSONDataset.
  2. Try to load that plot via the catalog again.
  3. The load should fail with the error mentioned above.
  4. Add the fs_args as indicated and repeat steps 1 and 2. It should now work without issues.
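
The failure mode in these steps can be sketched without Kedro, using only the standard library. This is an illustration under assumptions, not the dataset's actual code path: it assumes the save step ends up using a Windows-style locale encoding, which is simulated here by passing cp1252 explicitly. The payload and the helper name round_trip are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for a figure's JSON payload.
# "è" is the single byte 0xe8 in cp1252 and latin-1.
payload = json.dumps(
    {"layout": {"title": {"text": "Crème brûlée"}}}, ensure_ascii=False
)


def round_trip(save_encoding: str) -> str:
    """Write the payload with save_encoding, then read it back as UTF-8.

    Returns "ok" on success, otherwise the text of the decode error.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "plot.json"
        # Assumption: this mimics a save that does not use UTF-8.
        with open(path, "w", encoding=save_encoding) as f:
            f.write(payload)
        try:
            # The load side always decodes as UTF-8.
            with open(path, "r", encoding="utf-8") as f:
                json.load(f)
            return "ok"
        except UnicodeDecodeError as exc:
            return str(exc)


cp1252_result = round_trip("cp1252")  # simulated locale-encoded save
utf8_result = round_trip("utf-8")     # save with the encoding set explicitly

print(cp1252_result)
print(utf8_result)
```

The cp1252 round trip fails with the same kind of message as in the report (byte 0xe8, invalid continuation byte), while the UTF-8 round trip loads cleanly.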

Expected Result

There should be no encoding issues, because files are expected to be saved as UTF-8.

Actual Result

The file was not saved in utf-8.

'utf-8' codec can't decode byte 0xe8 in position 6570: invalid continuation byte

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.19.5
  • Kedro plugin and kedro plugin version used (pip show kedro-airflow): 3.0.0
  • Python version used (python -V): 3.11.7
  • Operating system and version: Windows 11
ElenaKhaustova added the Community (Issue/PR opened by the open-source community) label Jun 26, 2024

merelcht (Member) commented Jul 9, 2024

Hi @Madnex, thanks for flagging this! I can see that plotly.JSONDataset uses utf-8 when loading the dataset, but not when saving it. This does indeed seem strange. We'd be more than happy to accept a PR for this!

(cc @rashidakanchwala just double checking if adding utf-8 as the default save encoding would be okay for viz?)
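
A minimal sketch of what such a default could look like is given below. It is not taken from the kedro-datasets source: the names DEFAULT_OPEN_ARGS_SAVE and resolve_open_args_save are hypothetical, and the only assumption carried over from this thread is that fs_args may contain an open_args_save mapping. The idea is that UTF-8 applies unless the user overrides it.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults for the save path; UTF-8 unless the user says otherwise.
DEFAULT_OPEN_ARGS_SAVE: Dict[str, Any] = {"mode": "w", "encoding": "utf-8"}


def resolve_open_args_save(fs_args: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge user-supplied open_args_save over the defaults.

    Keys given by the user win; missing keys fall back to the defaults.
    """
    user_args = dict((fs_args or {}).get("open_args_save") or {})
    return {**DEFAULT_OPEN_ARGS_SAVE, **user_args}


# No fs_args at all: the UTF-8 default applies.
default_args = resolve_open_args_save(None)

# An explicit encoding in the catalog entry still takes precedence.
override_args = resolve_open_args_save(
    {"open_args_save": {"encoding": "latin-1"}}
)

print(default_args)
print(override_args)
```

With a merge like this, existing catalog entries that already set an encoding keep their behaviour, and entries without one would get UTF-8 on save to match the load side.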

merelcht added the good first issue (Good for newcomers) and help wanted (Contribution task, outside help would be appreciated!) labels Jul 9, 2024
merelcht removed the Community (Issue/PR opened by the open-source community) label Nov 1, 2024