smtplib doesn't handle unicode passwords #73936
Comments
|
Trying to use unicode passwords with smtplib fails miserably on python3. The failing line is (code, resp) = self.docmd(encode_base64(password.encode('ascii'), eol='')), which obviously dies when trying to handle unicode chars. |
|
I'm sorry I rushed my comment. The same thing happens on line 604: return encode_base64(s.encode('ascii'), eol=''). Changing both from 'ascii' to 'utf-8' works for me. |
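The failure described above can be reproduced without a mail server. The following is a minimal sketch, not the smtplib code itself: it uses base64.b64encode from the standard library in place of smtplib's internal encode_base64 helper, and the sample password is only an illustration.

```python
import base64

# Illustrative non-ASCII password; any string with a non-ASCII character behaves the same.
password = "contraseña"

# Encoding as ASCII, as the quoted smtplib lines do, fails on the 'ñ'.
try:
    password.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding as UTF-8 succeeds, and the resulting bytes can be base64-encoded
# for the AUTH exchange.
utf8_b64 = base64.b64encode(password.encode("utf-8")).decode("ascii")

print("ASCII encoding worked:", ascii_ok)
print("base64 of UTF-8 bytes:", utf8_b64)
```

Whether the server accepts the UTF-8 bytes is a separate question from whether the client can encode them.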
|
See msg253287. Someone should check the RFC. It is not obvious that just encoding using utf8 is correct; fundamentally passwords are binary data. But the auth methods don't currently accept binary data. UTF8 is a reasonable default these days, I think, but if we support more than ascii I think we need to support binary, with utf8 as the default encoding. |
|
Note: it is definitely the case, regardless of what the RFC says, that binary passwords need to be supported. utf-8 should probably be used as the default encoding for string passwords, rather than ascii. See also bpo-33741. |
|
It would be extremely helpful to have some test cases that actually work for users but fail with smtplib. So far we have no actual examples, likely due to these being passwords.
I'm not sure what you mean by "binary". Do you mean 8-bit characters, a.k.a. bytes?
It is also possible that the appropriate encoding here is "latin1" a.k.a. ISO-8859-1 encoding. This specifically includes many specialized versions of latin characters, e.g. those with German umlauts as mentioned in the duplicate issue bpo-33741. And it could even be the very common Windows-1252 encoding: "It is probably the most-used 8-bit character encoding in the world." (Wikipedia) |
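To make the encoding question concrete, here is a small sketch showing that the same password string becomes different bytes under the candidate encodings. The sample password is an assumption chosen for illustration; which byte sequence a given server expects depends on how the password was stored.

```python
# Illustrative password containing one non-ASCII character.
password = "contraseña"

# Candidate encodings discussed in this thread.
encodings = ("utf-8", "latin-1", "cp1252")

# Map each encoding name to the bytes that would be sent to the server.
encoded = {name: password.encode(name) for name in encodings}

for name, data in encoded.items():
    print(f"{name:8} {data!r}")
```

For this particular character, latin-1 and cp1252 agree with each other but differ from utf-8, so a server that stored one form will reject the other.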
|
In my case I was testing with "contraseña" (Spanish for "password"), and it failed.
|
|
While you are correct that latin1 may be common in this situation, I think it may still be better to have utf-8 be the default, since that is the (still emerging? :) standard. However, you are correct to call for examples: if in the *majority* of the real-world cases it turns out latin1 is what is used, then we could default to that (or not have a default, but instead document our observations). I don't know how we accumulate enough information to make that decision, though. Maybe we could look at what other mail programs do? Thunderbird, etc? David, which mail program(s) did you use that were able to successfully send that password? And yes, by binary passwords I mean that the module needs to support being passed a bytes-like object as the password, since clearly there are servers "in the wild" that support non-ascii passwords and the only way to be sure one can send the server the correct password is by treating it as a series of bytes. The library caller will have to be responsible for picking the correct encoding based on local knowledge. |
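A rough sketch of what "bytes-like passwords with a utf-8 default" could look like follows. The function name plain_auth_response and its encoding parameter are hypothetical and not part of smtplib; the sketch only builds the base64 payload for a PLAIN-style response (NUL, user, NUL, password) and does not talk to a server.

```python
import base64


def plain_auth_response(user, password, encoding="utf-8"):
    """Hypothetical helper: build a base64 PLAIN-style response.

    A str password is encoded with `encoding` (utf-8 by default);
    a bytes-like password is passed through unchanged, so the caller
    can pick whatever encoding the server actually expects.
    """
    if isinstance(password, str):
        password = password.encode(encoding)
    raw = b"\0" + user.encode(encoding) + b"\0" + bytes(password)
    return base64.b64encode(raw).decode("ascii")


# String password: encoded with the default.
from_str = plain_auth_response("user", "contraseña")

# Bytes password: the caller chose latin-1 based on local knowledge.
from_bytes = plain_auth_response("user", "contraseña".encode("latin-1"))

print(from_str)
print(from_bytes)
```

The two calls produce different payloads for the same visible password, which is why the caller needs a way to override the default.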
|
Thunderbird, sogo (web) and gmail (web).
|
|
For the web cases I presume you also set the password using the web interface, so that doesn't really tell us anything useful. Did you use thunderbird to access the mailbox that you set up via gmail and/or sogo? That would make what thunderbird does the interesting question. |
|
Yes, I used Thunderbird for both.
|
Perhaps we should make smtplib accept only bytes, passing on the responsibility of using an appropriate encoding to its users? This seems like the most straightforward and transparent choice. It would not be backwards-compatible, though. Alternatively, we could change smtplib to accept passwords as bytes or strings, but raise an informative exception when given strings with non-ASCII characters. As now, users could be surprised if they have been passing passwords as string and hadn't tested their use of smtplib with non-ASCII passwords. We'd just improve the exception and documentation to clarify the situation. |
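The second alternative above (accept bytes or strings, but reject non-ASCII strings with a clear message) might be sketched as follows. The function name password_to_bytes and the wording of the error are assumptions for illustration, not an existing smtplib API.

```python
def password_to_bytes(password):
    """Hypothetical helper: normalize a password to bytes.

    Bytes-like objects pass through unchanged. Strings must be ASCII;
    anything else raises a ValueError telling the caller to encode
    the password explicitly.
    """
    if isinstance(password, (bytes, bytearray, memoryview)):
        return bytes(password)
    try:
        return password.encode("ascii")
    except UnicodeEncodeError as exc:
        raise ValueError(
            "non-ASCII str passwords are ambiguous; encode the password "
            "to bytes with the encoding your server expects"
        ) from exc


print(password_to_bytes("secret"))
print(password_to_bytes("contraseña".encode("utf-8")))
```

Existing callers that pass ASCII strings keep working, and the exception points callers with non-ASCII passwords toward the bytes path instead of surfacing a bare UnicodeEncodeError.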
|
I would like to see the second option (allow both, warning on non-ascii).
|
|
We must continue to support at least ascii strings, for backward compatibility reasons. We can certainly improve the error messages, but the goal of this issue is to add support for bytes passwords. I lean toward continuing to only support ascii strings, and making it the responsibility of the program to do the encoding to bytes when dealing with non-ascii. However, I'd like to also be able to recommend in the docs what encoding is most likely to work, if someone can find out what encoding Thunderbird uses...however, it occurs to me that it may be using whatever encoding the OS is using (LC_LANG, oem codepage, etc), and that David's experiments worked because the same encoding was used for the same reason when the password was set. I'm not sure how browsers/webmail works in that regard, honestly. That's less important than just adding support for bytes passwords, though. |
|
I found the Thunderbird bugzilla issues where they appear to have dealt precisely with this issue (for a variety of protocols, including SMTP): |
|
This specifically seems relevant:
|
|
There's also some discussion there (from 3 years ago) of possibly needing to fall back to ISO-8859-1 to work with MS Exchange, despite the standards saying UTF-8 should be used. It's unclear to me whether that's actually the case. |
|
From reading the aforementioned discussion on Thunderbird's issue tracker, ISTM that encoding with UTF-8 is the way to go. |
|
I didn't think to look at the standards for the auth mechanisms, I only looked at the smtp standards. So, if the standard says utf-8, then we should use that. But we should still support bytes passwords so that an application could work around issues like the possible ms-exchange one, if they need to. Those could be two separate PRs, though. In fact, they probably should be. As a standards-compliance issue, we would be within our rules to backport the utf-8 standards-compliance fix. |
|
Are there any PRs already for this issue? I couldn't find any on GitHub. Also, is the plan to branch the fix down to at least 3.6? |
|
I have worked on this, almost ready for a PR. |
|
Never mind, I won't have time for this any time soon, better if someone else can do it. |
|
Hello. I would like to work on this. Should the issue be assigned to me, or is this comment enough? |
|
A comment here is all that is needed. |
|
@vadim Pushtaev I also want to work on it. If you want to work together, maybe we can talk on zulipchat. :D |
|
That's OK, you can do it. |
|
I added a patch to support utf-8. |
|
I hit the same issue. |
|
Sorry, I forgot about this PR. I will update the patch based on the review soon :D |
|
I just updated the PR |
|
Utf8 passwords are still broken on python 3.8. Patch works great on 3.8. |
|

