
Encoding error #16

Open
ErikTromp opened this issue Jul 1, 2019 · 5 comments

Comments

@ErikTromp

Not sure where to put this, as it is a Rasa X error and not specific to this demo, but I get the following when I use a domain.yml file saved as UTF-8 on Windows that contains special characters (like é):

Traceback (most recent call last):
  File "c:\users\erik\anaconda3\lib\site-packages\rasa\cli\x.py", line 322, in run_locally
    local.main(args, project_path, args.data, token=rasa_x_token)
  File "c:\users\erik\anaconda3\lib\site-packages\rasax\community\local.py", line 190, in main
    project_path, data_path, session, args.port
  File "c:\users\erik\anaconda3\lib\site-packages\rasax\community\local.py", line 139, in _initialize_with_local_data
    domain_path, domain_service, COMMUNITY_PROJECT_NAME, COMMUNITY_USERNAME
  File "c:\users\erik\anaconda3\lib\site-packages\rasax\community\initialise.py", line 136, in inject_domain
    domain_yaml=read_file(domain_path),
  File "c:\users\erik\anaconda3\lib\site-packages\rasa\utils\io.py", line 125, in read_file
    return f.read()
  File "c:\users\erik\anaconda3\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 524: invalid continuation byte

What seems to happen is:

  • I store a domain.yml file as UTF-8.
  • Rasa X opens the domain.yml file, modifies it, and saves it with the system default encoding (ISO 8859-2 in my case, on Windows).
  • Later it opens the file again as UTF-8, which no longer works.
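The round trip described above can be reproduced in isolation. This is a minimal sketch, not Rasa's code: it assumes ISO 8859-2 as the system code page (as in the report), and the file name and YAML content are placeholders. Because 'é' becomes the single byte 0xe9 in ISO 8859-2, reading the file back as UTF-8 fails the same way as the traceback.

```python
import os
import tempfile

# Placeholder domain content containing a non-ASCII character.
TEXT = "templates:\n  utter_greet:\n  - text: Café\n"


def reproduce_round_trip() -> str:
    """Write as UTF-8, re-save with a legacy code page, read back as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "domain.yml")

        # Step 1: the user saves the file as UTF-8 ('é' -> b'\xc3\xa9').
        with open(path, "w", encoding="utf-8") as f:
            f.write(TEXT)

        # Step 2: the file is re-saved with the system code page
        # (ISO 8859-2 assumed here): 'é' -> the single byte b'\xe9'.
        with open(path, "r", encoding="utf-8") as f:
            content = f.read()
        with open(path, "w", encoding="iso8859_2") as f:
            f.write(content)

        # Step 3: reading it back as UTF-8 fails on the lone 0xe9 byte.
        try:
            with open(path, "r", encoding="utf-8") as f:
                f.read()
        except UnicodeDecodeError as exc:
            return str(exc)
        return "no error"


print(reproduce_round_trip())
```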
@kristiankolthoff

I am facing the same issue when domain.yml contains German umlauts.

@ErikTromp
Author

We ended up just not using Rasa X.

@Mohendran

I'm also facing the same issue.

@daniel-eder

daniel-eder commented Sep 10, 2019

I just ran into the same issue - did anyone find a possible solution or workaround so far?

EDIT: The underlying issue is that Python writes text files using the system code page unless an encoding is specified explicitly, and Rasa does not specify UTF-8. In addition, when loading domain.yml, Rasa first reformats and saves it before actually loading and parsing it. That save step loses the UTF-8 encoding, so by the time the file is loaded it is no longer UTF-8, which causes the error.
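A minimal sketch of the fix this implies: pass `encoding="utf-8"` explicitly on every read and write, so the platform default never applies. The helpers `write_file_utf8` and `read_file_utf8` are hypothetical names for illustration, not Rasa's actual `read_file`; the sketch also prints the default encoding Python would otherwise fall back to.

```python
import locale
import os
import tempfile


def write_file_utf8(path: str, content: str) -> None:
    """Hypothetical helper: always write text as UTF-8, whatever the OS default."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


def read_file_utf8(path: str) -> str:
    """Hypothetical helper: always read text as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# The encoding open() uses when none is given
# (for example a legacy code page such as 'cp1250' on some Windows installs).
print("default text encoding:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "domain.yml")
    original = "- text: Grüße, café\n"
    write_file_utf8(path, original)
    # The "reformat and save" step, with the encoding stated explicitly both ways.
    write_file_utf8(path, read_file_utf8(path))
    assert read_file_utf8(path) == original
```

Because both directions name the encoding, the reformat-and-save step is lossless regardless of the machine's locale.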

Workaround (Python 3.7+ only): set the environment variable PYTHONUTF8 to 1 before running Rasa. This enables Python's UTF-8 mode, which makes UTF-8 the default encoding. On Windows: set PYTHONUTF8=1
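To confirm the workaround takes effect, one option is to start a child interpreter with PYTHONUTF8 set and ask it which default it will use. This is a sketch, assuming Python 3.7+ (for UTF-8 mode and `capture_output`); `check_utf8_mode` is a hypothetical helper name.

```python
import os
import subprocess
import sys

# Code run in the child: report the UTF-8 mode flag and the normalized
# name of the default text encoding used by open().
CHECK = (
    "import codecs, locale, sys;"
    "print(sys.flags.utf8_mode, codecs.lookup(locale.getpreferredencoding(False)).name)"
)


def check_utf8_mode(env_value: str) -> str:
    """Hypothetical helper: run a child Python with PYTHONUTF8=env_value."""
    env = dict(os.environ, PYTHONUTF8=env_value)
    result = subprocess.run(
        [sys.executable, "-c", CHECK],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(check_utf8_mode("1"))  # expected on Python 3.7+: "1 utf-8"
```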

@ziligy

ziligy commented Dec 23, 2019

Solved? I ran into a similar issue and realized that my Mac had left dot-file debris in my Rasa data directory when I SSH-ed into it. Deleting these hidden files resolved the issue.

Main point: make sure there are no hidden files in the Rasa data directory!
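A small sketch for the check suggested above: list every file whose name starts with a dot under the data directory, so it can be reviewed before deletion. The function name `find_hidden_files` and the `data` path are assumptions. macOS `.DS_Store` and `._*` AppleDouble files are typical examples of such debris, but which files you actually have depends on your setup.

```python
import os
from typing import List


def find_hidden_files(root: str) -> List[str]:
    """Return sorted paths of files under `root` whose name starts with a dot."""
    hidden = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("."):
                hidden.append(os.path.join(dirpath, name))
    return sorted(hidden)


if __name__ == "__main__":
    # "data" is a placeholder for your Rasa data directory.
    for path in find_hidden_files("data"):
        # Review the list first; delete deliberately, e.g. with os.remove(path).
        print(path)
```

Listing rather than deleting automatically keeps the check safe to run on a directory you have not inspected yet.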
