Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Email address quotes are not preserved when phrase is not defined #52

Open
atoomic opened this issue Sep 6, 2017 · 12 comments
Open

Email address quotes are not preserved when phrase is not defined #52

atoomic opened this issue Sep 6, 2017 · 12 comments

Comments

@atoomic
Copy link

atoomic commented Sep 6, 2017

Notice after updating Email::MIME from 1.940 to version 1.945
The email address '<>' quotes are not preserved if no phrase/name is set

one liner to show the issue with 1.945

$>perl526 -MEmail::MIME::Header -MEmail::MIME::Encode -E 'say Email::MIME::Encode::maybe_mime_encode_header("To", "<iwrestled\@abear.once>", "utf-8")'
    iwrestled@abear.once

$>perl526 -MEmail::MIME::Encode -E 'say Email::MIME::Encode::maybe_mime_encode_header("To", "<iwrestled\@abear.once>", "utf-8")'                      
    <iwrestled@abear.once>

After investigation it bisects to the commit ed7c290 which introduces the usage of Email::Address::XS instead of Email::Address.
But Email::Address is not the source of the problem and the XS one is behaving the same way than the non XS one and preserve the brackets.

> perl -e 'use Email::Address; map { print $_->format . qq[\n] } Email::Address->parse( q[<[email protected]>] );'
[email protected]
> perl -e 'use Email::Address::XS; map { print $_->format . qq[\n] } Email::Address::XS->parse( q[<[email protected]>] );'
[email protected]

The problem comes from the shortcut in maybe_mime_encode_header which only occurs if Email::MIME::Header is loaded...

atoomic added a commit to atoomic/Email-MIME that referenced this issue Sep 6, 2017
GH rjbs#52

Show a regression introduced by commit ed7c290
which is using Email::Adress::XS to format the email
which result in stripping the 'angle brackets' around
an email when not using a 'phrase' for the email.

Previous behavior was preserving the '<quoted@email>'.
atoomic added a commit to atoomic/Email-MIME that referenced this issue Sep 6, 2017
@atoomic
Copy link
Author

atoomic commented Sep 6, 2017

commit 9107810 is adding a unit test to show the issue whereas 7986a6c is an attempt to add a naive fix

@pali
Copy link
Contributor

pali commented Sep 6, 2017

Apparently this is more complicated problem. Function maybe_mime_encode_header is for encoding Unicode header to 8bit MIME. And for To header with value <[email protected]>, correct MIME encoding is also [email protected].

Due to how MIME is working, for structures headers like addresses, it is first needed to fully decompose address-like header into internal structures, then every Unicode string needs to be encoded into 8bit MIME and after that header value is composed back to one structured string.

And it is better to guarantee RFC-compliant output which represent same input, rather then trying to hack around some special cases to not touch header, like in your case.

Your current bug report is just about different representation of encoded MIME header and both values have same meaning. There are plenty equivalent representation of your example (Unicode) input in MIME.

@pali
Copy link
Contributor

pali commented Sep 6, 2017

So question is, why you want to hack maybe_mime_encode_header? If you already prepared correctly formatted 8bit MIME header, why not rather use header_raw_set function?

@pali
Copy link
Contributor

pali commented Sep 6, 2017

Anyway, Email::MIME::Encode should load Email::MIME::Header as it uses variable $Email::MIME::Header::header_to_class_map.

@atoomic
Copy link
Author

atoomic commented Sep 6, 2017

Very good question, in the existing code I maintain, Email::MIME::Encode::maybe_mime_encode_header is used to encode email headers.
That portion of code only deal with header, I realize that it's probably incorrect as Email::MIME::Encode namespace is labelled as "private", using header_str on one Email::MIME object is probably more appropriate.

And Yes I agree this is not a bug, but a behavior change. Our main concern with this change is that this could affect the spam score from Mail::SpamAssassin, as an email without "" has a malus.

For your second message (hoped that github could handle reply to each message individually) I agree Email::MIME::Encode should always load Email::MIME::Header so it would provide a consistent behavior.

@atoomic
Copy link
Author

atoomic commented Sep 6, 2017

Email::Simple::Header is probably the one providing all the abstraction to manipulate headers, but do not provide the encoding layer

@pali
Copy link
Contributor

pali commented Sep 6, 2017

Very good question, in the existing code I maintain, Email::MIME::Encode::maybe_mime_encode_header is used to encode email headers.

Ah :-( Email::MIME::Encode is a private helper for MIME header encoding which is written in documentation https://metacpan.org/pod/Email::MIME::Encode but seems that it does not stop people to use it and then start seeing different behavior of private functions... Anyway, you are not alone some people are not stopped even when function starts with underline, which is perl way for private function rjbs/Email-MIME-ContentType#6

So I would suggest you to stop using private functions of random modules as they can be changed at anytime. If you have really good reason for it, try to ask if such function cannot be marked as public, define stable API and important: write documentation for it.

as an email without "" has a malus.

What does it mean without ""? If whole problem of this reported issue is just because we generate something which is not good for Mail::SpamAssassin, then we could change generated output to be more compatible with Mail::SpamAssassin.

pali added a commit to pali/Email-MIME that referenced this issue Sep 6, 2017
…t explicit "use Email::MIME::Header"

Email::MIME::Encode uses %Email::MIME::Header::header_to_class_map variable
but have not loaded Email::MIME::Header module. This patch fixes it.

Problem discovered by atoomic in report rjbs#52.
@pali
Copy link
Contributor

pali commented Sep 6, 2017

Fix for Email::MIME::Encode to work also without explicit Email::MIME::Header loading is there: #54

@atoomic
Copy link
Author

atoomic commented Sep 6, 2017

Email::MIME::Encode::maybe_mime_encode_header
Thanks I've noted that this is for now a private function. Might consider asking to promote it to public with some doc changes & stable API.

as an email without "" has a malus.
Sorry I mean an email without angle brackets <...>

Another idea would be patching Email::Address::XS (and also probably the non XS one) to add a boolean in the message_address struct [then use it] to know if angle brackets were used in the original version then preserve it if it were used ?

We could either make the default behavior to preserve the angle brackets or make it optional, so Email::MIME could set it or not depending on the option.

Problem is this would require patching the upstream dovecot code

diff --git a/Email-Address-XS/dovecot-parser.h b/Email-Address-XS/dovecot-parser.h
index b0263543d..d03a7ba1d 100644
--- a/Email-Address-XS/dovecot-parser.h
+++ b/Email-Address-XS/dovecot-parser.h
@@ -19,6 +19,7 @@ struct message_address {
        char *comment;
        /* there were errors when parsing this address */
        bool invalid_syntax;
+     bool use_anglebrackets;
 };

@pali
Copy link
Contributor

pali commented Sep 7, 2017

Email::MIME::Encode::maybe_mime_encode_header

Thanks I've noted that this is for now a private function. Might consider asking to promote it to public with some doc changes & stable API.

I'm not very happy for making this module public for another modules. What is your purpose? Or better question, why is not Email::MIME enough for you? Is there a bug or missing feature in Email::MIME?

Another idea would be patching Email::Address::XS (and also probably the non XS one)

Email::Address is dead and contains security issues. Rather do not use it on untrusted input. I have example of emails which cause DOS on servers which uses Email::Address->parse(). Reason why I created Email::Address::XS.

to add a boolean in the message_address struct [then use it] to know if angle brackets were used in the original version then preserve it if it were used ?

I do not like this idea. Two reasons:

  1. If you create Email::Address::XS object in any way (via ->new or via ->parse or via some parse_* function), then format method would depend only on tokens (user, host, address, phrase, comment) and when tokens are same, then it always provides same output. This is something important -- to provide unified, deterministic and immutable behavior.

  2. Original string passed to ->parse method still would be different from output of ->format function. E.g. any comments in phrase or email address are dropped, phrase is simplified, etc... So it does not make sense to add some special case for angle brackets to have some different behavior.

See example:

$ perl -Iblib/lib -Iblib/arch -MEmail::Address::XS -E 'say Email::Address::XS->parse(q((comment) "Name" (comment) <a.(comment)[email protected]> (last comment)))->format()'
Name <[email protected]> (last comment)

There are a couple of ways how to write into header address with user='a.b', host='host.com', phrase='Name' and with comment after address '(last comment)'. Format function choose one which is not so complicated, valid according to RFC2822 and is not hard to parse.

In new version of Email::Address::XS there would be method original which like in Email::Address would return original string used for parsing.

Maybe now for you it looks like it could be used in Email::MIME. But, this would not work if you modify some member of Email::Address::XS -- e.g. any MIME encoding/decoding.

Look at example: <[email protected]> - this does not need MIME re-encoding.
But e.g. IDN <iwrestled@abéár.once> needs it. And personally I do not like idea that for IDN domains output string would be different as without IDN. (IDN is not supported yet, but pull request is prepared). In my opinion such behavior would cause more problems and specially for those who in most cases uses plain ASCII strings.

So my suggestion is:

If you need to preformat some email header, then use method header_raw_set. Here it is guaranteed that it would not be MIME-reencoded or modified. But then you must correctly MIME encode it. Method header_str_set is really mean to set Unicode string value and cause lot of problems for structured headers. This is also reason why support for header objects comes into Email::MIME.

@atoomic
Copy link
Author

atoomic commented Sep 11, 2017

First thanks for your long & precise answer, and sorry for my late answer on this topic.

There are multiple reasons we are not using the full version of Email::MIME:

  • legacy code [Email::MIME was not considered at the time of the implementation]
  • memory usage: Email::MIME adds an extra 1.5Mo of memory on top of the Email::MIME::Encode (view stats at the end)
  • our Internal code to send emails was just trying to use a generic library to build encoded headers. As Email::Simple::Header do not provide that layer, Email::MIME::Encode was the logical choice.

here are some basic metrics [this is mainly working on linux system - not stable on macosx]

> perl -MEmail::MIME -e 'print qx{grep RSS /proc/$$/status}'
VmRSS:	    6980 kB

> perl -MEmail::MIME::Encode -e 'print qx{grep RSS /proc/$$/status}'
VmRSS:	    5584 kB

> perl -MHomeMade::Email::Object -e 'print qx{grep RSS /proc/$$/status}'
VmRSS:	    3240 kB

I trend to share your point of view where we should not use a private method in our codebase, and either use the whole enchilada or not... [ I've just opened an internal case for this ].

I also agree that both syntax are corrects. The root problem we were trying to solve here was to preserve the <quotes> to avoid a malus in the SpamAssassin score. This should probably be fixed upstream.

We were trying to add a rule saying: if the original email was using <quotes> then preserve them in the final version of the email.

From my point of view, we can close this ticket, I was mainly pointing to a behavior change. The commit in #54 would at least make it behave consistent.

@pali
Copy link
Contributor

pali commented Nov 27, 2017

Anyway, have you reported bug to SpamAssassin that prediction of spam based on missing <quotes> is wrong? I think this is something which needs to be fixed...

rjbs pushed a commit that referenced this issue May 9, 2020
…t explicit "use Email::MIME::Header"

Email::MIME::Encode uses %Email::MIME::Header::header_to_class_map variable
but have not loaded Email::MIME::Header module. This patch fixes it.

Problem discovered by atoomic in report #52.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants