Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-ASCII filenames #39

Open
bors-ltd opened this issue Sep 7, 2020 · 7 comments
Open

Non-ASCII filenames #39

bors-ltd opened this issue Sep 7, 2020 · 7 comments

Comments

@bors-ltd
Copy link
Contributor

bors-ltd commented Sep 7, 2020

Hi,

I ran into an issue when upgrading a project to Django 3.1 and bumping dependencies. But before sending a PR, I'd like to discuss the solution.

Serving files works fine, except in production when they contain non-ASCII characters. I printed the response generated to debug it::

Content-Type: text/plain
X-Sendfile: /example/Accentu\xe9.txt
Content-Disposition: inline; filename="Accentue.odt"; filename*=UTF-8''Accentu%C3%A9.txt
Content-length: 143543

The path is encoded in ISO-8859-1, which makes sense with HTTP, but the filesystem encoding is UTF-8.

So I tried to quote it using urllib.parse.quote::

X-Sendfile: /example/Accentu%C3%A9.txt

with no success. So let's just encode it ourselves adding os.fsencode(filename) in the xsendfile backend::

X-Sendfile: /example/Accentu\xc3\xa9.txt

and I could finally download the file.

I'm not sure why it used to work in Django 2.2 and/or older versions of sendfile2, I can't rollback to test and I don't really want to spend time on it.

But you may not consider that fsencode an elegant solution. I can see in other issues you want to support the case where the file is served by Apache from another server. It probably means a setting, with a fallback to the local filesystem encoding. Tell me and I'll tailor the PR.

If you reject that change altogether, I can just "fork" the xsendfile backend in my project and change the setting to point to it.

The nginx backend seems safe, as it quotes the path.

@moggers87
Copy link
Owner

I'd suggest you first write a test for unicode filenames and open a draft PR that just contains that. We will then see if Django 3.1 did really break something or not and go from there.

@bors-ltd
Copy link
Contributor Author

bors-ltd commented Sep 8, 2020

There are tests already covering unicode filenames. My issue may be more in Apache/X-Sendfile, but that won't be covered in a unit test.

My findings so far:

  • mod_xsendfile is unmaintained for like 10 years, good thing it still compiles against 2.4.
  • HTTP response headers might not actually accept non-ASCII characters, so my solution is just a hack, and nginx is right in urlquoting the path.
  • I found a fork adding that quoting https://github.com/nmaier/mod_xsendfile but unmaintained as well, and I use a shared hosting anyway.

So all in all, there is no "good" answer, no specification or documentation I can rely on to assert Django is producing the expected response header. mod_xsendfile never made it to the Apache official modules, that's surprising for such a useful module with no alternative after all this time.

@bors-ltd
Copy link
Contributor Author

After a refactor I could actually serve the files directly from a memoryview, a new feature of Django 3, and I didn't need sendfile for this project anymore.

Feel free to close this issue as you see fit, but I hope that little summary of the mess X-Sendfile is was useful for your, other users, and to answer future issues.

@stefan6419846
Copy link

What is the current state of this issue?

With a simple method-based view, I can still reproduce this issue (currently using version 0.5.1):

from django_sendfile import sendfile

def sendfile_test(request):
    return sendfile(request, '/absolute/path/to/äüö.pdf')

This will yield the following entry inside the Apache error log:

[Tue Jun 28 15:24:39.895816 2022] [:error] [pid 2988] (2)No such file or directory: [client 192.168.XXX.XX:57604] xsendfile: cannot open file: /absolute/path/to/\xe4\xfc\xf6.pdf

The runserver setup works fine, but this does not help in production environments.

Replacing/monkey-patching the backend will yield the correct result. The patch includes encoding the path before passing it to the header inside the xsendfile backend:

filename = filename.encode('utf-8')

Looking at response._headers, the current implementation yields /absolute/path/to/äüö.pdf for the X-Sendfile header, whereas /absolute/path/to/äüö.pdf is given by the encoded version (and appears to work correctly).

@moggers87
Copy link
Owner

mod_xsendfile is unmaintained for like 10 years, good thing it still compiles against 2.4.

@stefan6419846 that's the current status. I will add a warning to the docs before closing this issue, but it's well beyond the scope of this project to start maintaining an Apache module.

@stefan6419846
Copy link

Does this indicate that this should rather be reported against mod_xsendfile as this is an issue there?

While mod_xsendfile might not be actively maintained, it seems to still work without any real issues (at least given my requirements). For this reason, from my perspective it should not hurt to actually include the aforementioned patch - unless there are some other cases which will break and which I am not aware of. Unfortunately, for some user bases non-ASCII filenames are quite common, which includes Germany as in my case, rendering the default xsendfile backend useless.

@tidenhub
Copy link

Just stumbled over this thread while I was looking if there is any update for the xsendfile backend.

filename = filename.encode('utf-8')

We are using that workaround in production since 2020 and had no issues since then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants