-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
N°4281 - Add RFC-2045-Compliant Way of Handling Filenames in Attachments #16
base: master
Are you sure you want to change the base?
Conversation
Hi Combodo, short update:
Tests were ran successfully again. Same diffs for against results before the patch and same results for against my "first-after-the-patch"-tests. Cheers, |
… for "name" and "filename" attributes.
Rebased and Commits squashed and pushed again. @Molkobain : I hope, this is now ready for a review (when you got the time to do so). Martin |
Co-authored-by: Thomas Casteleyn <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the code itself, it seems ok to me. Just ran some test with more "complex" filenames having single quotes (happens a lot in french), spaces and all. It all worked.
Only change requested is vocabulary. PR will be review during the next technical review. Thanks for adding the test sample!
Co-authored-by: Molkobain <[email protected]>
Discussed during technical review, code itself is accepted.
If you need some help in adding the unit test (extract the regexp in a dedicated const/method, add the test file, ...), let us know :) |
Thank you for the follow up.
If you can pinpoint me to an example, where this is done already, I think I can figure it out on my own. But I don't want to prepare something, which isn't helpful the way I did it (e. g. usage of PHPUnit or alike). |
Sure, you can:
Mind that you will have to extract the RegExp in a class constant or method so that you can use it in both the business logic and the unit test. |
Moved to functional review to avoid loosing a month |
Accepted during functional review |
Well :( I didn't knew that. This explains things. Thank you for all the extra work you are putting in.
Yeah, don't get me wrong. I wasn't trying to critizise your processes, whatever they may look like (especially with the info above). And I think you are completely right, requesting "simple" improvements upfront to avoid loosing another month on each PR is a good practice. - But maybe it's worth to shortly discuss those internally upfront, aswell. I think, it could reduce the amount of artificial stress on all sides, coming around with a (for the moment) complete list of requested changes, as the amount of necessary iterations of coding-review-communicate might be reduced on all sides. Thank you for explaining. I'll work on the requested improvements in the next few days. |
Exactly we should try to discuss things internally before asking for changes multiple times which can be a pain in the ash for the PR initiator. Especially as you can get the feeling that it's a never ending situation 😅 We'll try to improve this in the future :) |
Hello, The usual process is only 1 Combodo dev is set as assignee to the PR. He reads it, check for Combodo requirements, and then present it to the rest of the team during reviews (technical, functional, as Guillaume said). We then give one consolidated answer. Your PR already passed both reviews, but we asked for more changes afterwards. @Molkobain tell me when available so we can agree on this PR status O:) |
I'm ok with changing the static property to a constant you are right. We just need to ensure that |
We discussed IRL with Guillaume to check every aspect of my review and the past remarks we made. We are proposing to :
As you already done lot of work, I can make the modifications if you wish ? (and also if you checked the "allow maintainers" PR checkbox) |
I'd be thankful, if you did the modifications, yes. Thank you. Unfortunately, I seem to be blind. Where can I find the "Allow Maintainers"-Checkbox on the PR? I even searched the page with Ctrl+F and can't find it. 😞 /edit:
No problem at all. Thank you for the explanation! |
@piRGoif : I've added you to our list of maintainers for this repo. I hope you got an invitation and you can then apply your changes. |
By the way, the "|allow edits from maintainers](https://docs.github.com/en/github-ae@latest/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork)" option is described in our contributing file Someone told rencently this option is only available during PR creation, and that is what I understand reading the doc :
|
@piRGoif : Looks totally fine for me. Thank you very much! - Thank you aswell for the info on when I need to set the checkbox to allow you to edit PRs. I just re-synched our branch on our fork to this master. Do I need to do something else before you can continue? |
Hello, You're welcome Martin, we are very glad you did this change, and again all our apologies for the confusion on our side (mine in particular O:) ) I see you made a merge commit from master to your fork branch... Not sure this will go well when merging the pr :/ We'll see. |
Accepted during technical review. |
Accepted during functional review. |
Actually maybe it has something to do with 2A (*) and 2B (+) being interpreted as regexp tokens, which would explain why 2D (-) is highlighted differently. So maybe this is just a matter of how to add the special regexp tokens as actual chars, not tokens. Moving 2A / 2B at the end of the regexp is not a solution in itself, it's just moving the issue somewhere else 😁 What do you all think? |
I would also use regular characters. I think maybe they all need to be double escaped? What do you get when you print the var FILENAME_REGEX? |
Hi all. To move on with this PR, I'd like to respond to the latest comments and summarize.
Regarding point 3) from Hipska: What I like especially is, that it's a pragmatic approach. I think, I even already tried it in the past, but noticed a big (at least for me) downside: The Regex contains "valid chars", which are easily mistaken for other chars or being not even recognized as chars, since they are somewhat special. But to be even more pracmatic: Does anything prevent us from mixing the representation? The char in question, which creates the problems, is the comma. A comma is easy to not to be mistaken (like space vs. tab or something alike). -> Let's just use the regular "comma"-Char and only use the Hex-Repr for chars, which can be easily mistaken. What do you think about that? |
Did you already try a simple print (to log or with var_dump) of the FILENAME_REGEX var? I think that will explain a lot.. I just tried in 3v4l and it shows they definitely need to be double escaped!! |
Reason for the bugfix
Content-Type: application/octet-stream; charset=UTF-8; name=example_attachment_mail.csv
orContent-Disposition: attachment; filename=example_attachment_mail.csv
(meaning: a filename or name attribute, where the filename is not quoted), are not considered as valid filesnames and thus, the filetype with an added integer value is used as the filename of the attachment.Explanation of the approach to fix it
filename=
orname=
attributes, (2) the same with single quotes (the RFC doesn't specify the method of quoting, so we take care of that aswell), (3) the case, where the filename isn't quoted.Implementationdetails and Caveats
$aMatches[1]
to get the result of the match, the content of$aMatches[1]
can now be different compared to before, since we are now using capture groups and thus another level of possible results is introduced. But what we can be sure about is:$aMatches[1]
will always contain a) the filename with quotes, or b) the filename without quotes. For that reason, we simply usestr_replace()
later on to remove all types of quotes from the filenames, since nobody wants filenames containing any kind of quotes, anyway. I think, no harm will be done by this slight change in behavior.$aMatches
:and now will be
(Note the difference in [1] and [1] from before and after.)
(Note the quotes in [1].)
(Note: No quotes in [1])
Testresults
test/emailsSample/email_with_and_without_quotes_around_attachment_name.eml
containing all cases of afilename
and aname
attribute a) with double quotes, b) with single quotes, c) without any quotes. It is contained in this PR aswell.utils/test.php
for all testmails before the patch and after and created a diff of the output: Only my testcases and attachments of three other testmails, which don't comply with the standard occur. The explanation of the other testcases occuring are:1544c1544
: In this case, no quotes are used for the filename, but the filename contains a space, which is not permitted as of the RFC.1618c1618
and1622c1622
: In this cases, the filename until now was wrongfully built, since no quotes, but containing a space resulting in a filename containing the attributessize
andcreation_date
by accident.All other attachment filenames stay the same. Please see the following screenshot for proof.
A word about the caveat
Yes, this PR will slightly change the behavior of the function (no filenames containing quotes are possible anymore), but I don't think, this will cause any harm (quite the opposite, I think). Furthermore, recreating the exact behavior with quoted filenames as before, will result in a way more complicated function, where we'd need to first check, which rule is applicable and then - based on the rule chosen - apply the right indexing into
$aMatches
. I think, this approach would overcomplicate things in an unnecessary way. - But of course we are free to discuss this./cc @larhip
Cheers,
Martin