smtplib doesn't handle unicode passwords #73936

david · 2017-03-07T20:40:24Z

BPO	29750
Nosy	@warsaw, @taleinat, @giampaolo, @bitdancer, @Windsooon, @JustAnother1, @P403n1x87, @VadimPushtaev
PRs	#8938 #15064

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2017-03-07.20:40:24.455>
labels = ['3.8', 'type-feature', 'library', 'expert-email']
title = "smtplib doesn't handle unicode passwords"
updated_at = <Date 2019-12-26.23:46:13.786>
user = 'https://bugs.python.org/david'

bugs.python.org fields:

activity = <Date 2019-12-26.23:46:13.786>
actor = 'seblu'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)', 'email']
creation = <Date 2017-03-07.20:40:24.455>
creator = 'david__'
dependencies = []
files = []
hgrepos = []
issue_num = 29750
keywords = ['patch']
message_count = 30.0
messages = ['289184', '289185', '289186', '319222', '319497', '319499', '319514', '319515', '319521', '319522', '319827', '319828', '319830', '319862', '319863', '319864', '319866', '319891', '321172', '321174', '322431', '322447', '322450', '322457', '322468', '324480', '348831', '348832', '348837', '358892']
nosy_count = 10.0
nosy_names = ['barry', 'taleinat', 'giampaolo.rodola', 'r.david.murray', 'seblu', 'david__', 'Windson Yang', 'JustAnother1', 'Gabriele Tornetta', 'Vadim Pushtaev']
pr_nums = ['8938', '15064']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue29750'
versions = ['Python 3.8']

david · 2017-03-07T20:40:24Z

Trying to use unicode passwords on smtplib fails miserably on python3.
My particular issue arises on line 643 of said library:

(code, resp) = self.docmd(encode_base64(password.encode('ascii'), eol=''))

which raises a UnicodeEncodeError as soon as the password contains non-ASCII characters.

david · 2017-03-07T20:42:27Z

I'm sorry, I rushed my comment. The same thing happens on line 604:

return encode_base64(s.encode('ascii'), eol='')

Changing both from 'ascii' to 'utf-8' works for me.
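
A minimal sketch of the failure described above, not the actual smtplib code. It uses `base64.b64encode` as a stand-in for smtplib's internal `encode_base64` helper; the function names and the sample password are hypothetical and only serve the illustration.

```python
import base64


def encode_secret(secret: str, encoding: str = "ascii") -> str:
    """Convert a text secret to bytes, then base64-encode it."""
    return base64.b64encode(secret.encode(encoding)).decode("ascii")


def ascii_encoding_fails(secret: str) -> bool:
    """Return True if the secret cannot be encoded as ASCII."""
    try:
        encode_secret(secret, "ascii")
    except UnicodeEncodeError:
        return True
    return False


if __name__ == "__main__":
    sample = "contrase\u00f1a"  # hypothetical non-ASCII password
    print(ascii_encoding_fails(sample))    # does the ASCII path fail?
    print(encode_secret(sample, "utf-8"))  # the UTF-8 path succeeds
```

The point is only that the `.encode('ascii')` step is what fails; the base64 step never sees the data.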

bitdancer · 2017-03-07T21:12:13Z

See msg253287. Someone should check the RFC. It is not obvious that just encoding using utf8 is correct; fundamentally passwords are binary data. But the auth methods don't currently accept binary data. UTF8 is a reasonable default these days, I think, but if we support more than ascii I think we need to support binary, with utf8 as the default encoding.

bitdancer · 2018-06-10T14:17:55Z

Note: it is definitely the case, regardless of what the RFC says, that binary passwords need to be supported. utf-8 should probably be used as the default encoding for string passwords, rather than ascii. See also bpo-33741.

taleinat · 2018-06-14T06:36:30Z

It would be extremely helpful to have some test cases that actually work for users but fail with smtplib. So far we have no actual examples, likely due to these being passwords.

> Note: it is definitely the case, regardless of what the RFC says, that binary passwords need to be supported.

I'm not sure what you mean by "binary". Do you mean 8-bit characters, a.k.a. bytes?

> utf-8 should probably be used as the default encoding for string passwords, rather than ascii.

It is also possible that the appropriate encoding here is "latin1" a.k.a. ISO-8859-1 encoding. This specifically includes many specialized versions of latin characters, e.g. those with German umlauts as mentioned in the duplicate issue bpo-33741. And it could even be the very common Windows-1252 encoding: "It is probably the most-used 8-bit character encoding in the world." (Wikipedia)
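
To make the encoding question concrete, here is a small sketch showing that the same text password turns into different bytes depending on the encoding, so client and server have to agree on one. The helper name and the sample passwords are hypothetical.

```python
from typing import Dict, Iterable, Optional


def password_bytes_by_encoding(
    password: str,
    encodings: Iterable[str] = ("utf-8", "latin-1", "cp1252"),
) -> Dict[str, Optional[bytes]]:
    """Encode the password with each codec; None marks a codec that cannot represent it."""
    result: Dict[str, Optional[bytes]] = {}
    for name in encodings:
        try:
            result[name] = password.encode(name)
        except UnicodeEncodeError:
            result[name] = None
    return result


if __name__ == "__main__":
    # "p\u00e4ss" contains a German umlaut; "\u20ac" is the euro sign.
    for sample in ("p\u00e4ss", "\u20ac"):
        print(repr(sample), password_bytes_by_encoding(sample))
```

For an umlaut, latin-1 and Windows-1252 agree with each other but not with UTF-8. The euro sign cannot be encoded in latin-1 at all, while Windows-1252 and UTF-8 both can but produce different bytes.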

david · 2018-06-14T07:08:37Z

In my case I was testing with "contraseña" (Spanish for "password"), and it failed.

bitdancer · 2018-06-14T13:54:32Z

While you are correct that latin1 may be common in this situation, I think it may still be better to have utf-8 be the default, since that is the (still emerging? :) standard. However, you are correct to call for examples: if in the *majority* of the real-world cases it turns out latin1 is what is used, then we could default to that (or not have a default, but instead document our observations).

I don't know how we accumulate enough information to make that decision, though. Maybe we could look at what other mail programs do? Thunderbird, etc? David, which mail program(s) did you use that were able to successfully send that password?

And yes, by binary passwords I mean that the module needs to support being passed a bytes-like object as the password, since clearly there are servers "in the wild" that support non-ascii passwords and the only way to be sure one can send the server the correct password is by treating it as a series of bytes. The library caller will have to be responsible for picking the correct encoding based on local knowledge.

david · 2018-06-14T13:56:23Z

Thunderbird, SOGo (web), and Gmail (web).

bitdancer · 2018-06-14T15:14:31Z

For the web cases I presume you also set the password using the web interface, so that doesn't really tell us anything useful. Did you use thunderbird to access the mailbox that you set up via gmail and/or sogo? That would make what thunderbird does the interesting question.

david · 2018-06-14T15:15:29Z

Yes, I used Thunderbird for both.

taleinat · 2018-06-17T19:03:30Z

> And yes, by binary passwords I mean that the module needs to support being passed a bytes-like object as the password, since clearly there are servers "in the wild" that support non-ascii passwords and the only way to be sure one can send the server the correct password is by treating it as a series of bytes. The library caller will have to be responsible for picking the correct encoding based on local knowledge.

Perhaps we should make smtplib accept only bytes, passing on the responsibility of using an appropriate encoding to its users? This seems like the most straightforward and transparent choice. It would not be backwards-compatible, though.

Alternatively, we could change smtplib to accept passwords as bytes or strings, but raise an informative exception when given strings with non-ASCII characters. As now, users could be surprised if they have been passing passwords as string and hadn't tested their use of smtplib with non-ASCII passwords. We'd just improve the exception and documentation to clarify the situation.
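
A rough sketch of what the second option could look like: bytes pass through untouched, ASCII strings keep working, and non-ASCII strings get an error that tells the caller what to do. The function name and the error wording are hypothetical, and this is not taken from any smtplib patch.

```python
from typing import Union


def password_to_bytes_strict(password: Union[str, bytes, bytearray, memoryview]) -> bytes:
    """Accept bytes-like passwords as-is; accept str only if it is pure ASCII."""
    if isinstance(password, (bytes, bytearray, memoryview)):
        return bytes(password)
    try:
        return password.encode("ascii")
    except UnicodeEncodeError as exc:
        # Replace the bare codec error with guidance for the caller.
        raise ValueError(
            "password contains non-ASCII characters; encode it to bytes "
            "with the encoding your server expects and pass the bytes instead"
        ) from exc
```

With this shape, `password_to_bytes_strict("contraseña".encode("utf-8"))` succeeds, while passing the plain string raises the more descriptive `ValueError`.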

david · 2018-06-17T19:11:32Z

I would like to see the second option (allow both, warning on non-ascii)

bitdancer · 2018-06-17T21:46:20Z

We must continue to support at least ascii strings, for backward compatibility reasons. We can certainly improve the error messages, but the goal of this issue is to add support for bytes passwords. I lean toward continuing to only support ascii strings, and making it the responsibility of the program to do the encoding to bytes when dealing with non-ascii. However, I'd like to also be able to recommend in the docs what encoding is most likely to work, if someone can find out what encoding Thunderbird uses...however, it occurs to me that it may be using whatever encoding the OS is using (LC_LANG, oem codepage, etc), and that David's experiments worked because the same encoding was used for the same reason when the password was set. I'm not sure how browsers/webmail works in that regard, honestly.

That's less important than just adding support for bytes passwords, though.

taleinat · 2018-06-18T06:13:23Z

I found the Thunderbird Bugzilla issue where they appear to have dealt with precisely this problem (for a variety of protocols, including SMTP):

https://bugzilla.mozilla.org/show_bug.cgi?id=312593

taleinat · 2018-06-18T06:15:48Z

This specifically seems relevant:

> In order for Thunderbird to be standards-compliant-enough to interoperate with standards-compliant servers, it should use UTF-8 for the SASL PLAIN mechanism regardless of the underlying protocol (IMAP, POP and SMTP). That includes the POP3 "AUTH PLAIN" command and the SMTP "AUTH PLAIN" command.

taleinat · 2018-06-18T06:45:56Z

There's also some discussion there (from 3 years ago) of possibly needing to fall back to ISO-8859-1 to work with MS Exchange, despite the standards saying UTF-8 should be used. It's unclear to me whether that's actually the case.

taleinat · 2018-06-18T06:55:27Z

From reading the aforementioned discussion on Thunderbird's issue tracker, ISTM that encoding with UTF-8 is the way to go.

bitdancer · 2018-06-18T15:35:03Z

I didn't think to look at the standards for the auth mechanisms, I only looked at the smtp standards. So, if the standard says utf-8, then we should use that. But we should still support bytes passwords so that an application could work around issues like the possible ms-exchange one, if they need to. Those could be two separate PRs, though. In fact, they probably should be. As a standards-compliance issue, we would be within our rules to backport the utf-8 standards-compliance fix.
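
The combination described here (UTF-8 as the default for string passwords, bytes accepted unchanged as an escape hatch) could be sketched roughly as follows. The helper name and its `encoding` parameter are hypothetical, not part of smtplib.

```python
from typing import Union


def password_to_bytes(
    password: Union[str, bytes, bytearray, memoryview],
    encoding: str = "utf-8",
) -> bytes:
    """Encode str passwords (UTF-8 by default); pass bytes-like passwords through unchanged."""
    if isinstance(password, str):
        return password.encode(encoding)
    return bytes(password)
```

An application that has to talk to a server expecting ISO-8859-1 could then either pass `encoding="latin-1"` or hand over pre-encoded bytes such as `"contraseña".encode("latin-1")`.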

P403n1x87 · 2018-07-06T13:28:17Z

Are there any PRs already for this issue? I couldn't find any on GitHub. Also, is the plan to branch the fix down to at least 3.6?

taleinat · 2018-07-06T14:07:53Z

I have worked on this, almost ready for a PR.

taleinat · 2018-07-26T14:24:19Z

Never mind, I won't have time for this any time soon, better if someone else can do it.

VadimPushtaev · 2018-07-26T20:11:44Z

Hello. I would like to work on this. Should the issue be assigned to me, or is this comment enough?

taleinat · 2018-07-26T21:01:17Z

A comment here is all that is needed.

Windsooon · 2018-07-27T03:26:37Z

@Vadim Pushtaev I also want to work on it. If you want to work together, maybe we can talk on Zulip chat. :D

VadimPushtaev · 2018-07-27T06:47:49Z

That's OK, you can do it.

Windsooon · 2018-09-02T16:22:35Z

I added a patch to support utf-8.

seblu · 2019-08-01T01:31:13Z

I hit the same issue.
Do you have news about the patch review and its inclusion?

Windsooon · 2019-08-01T02:15:48Z

Sorry, I forgot about this PR. I will update the patch based on the review soon :D

Windsooon · 2019-08-01T04:14:30Z

I just updated the PR

seblu · 2019-12-26T23:46:14Z

UTF-8 passwords are still broken on Python 3.8.

The patch works great on 3.8.

petsuter · 2022-04-30T08:09:21Z

Thunderbird uses UTF-8, referencing rfc4616#section-2
.NET uses UTF-8
Go uses UTF-8 implicitly, same for various PHP, Perl and Rust libraries
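
For reference, RFC 4616 defines the PLAIN message as `[authzid] NUL authcid NUL passwd`, transmitted as UTF-8. A minimal sketch of building the base64 initial response for SMTP `AUTH PLAIN` under that rule follows; the function name is hypothetical, and the SASLprep preparation step that the RFC also asks for is deliberately left out.

```python
import base64


def sasl_plain_initial_response(user: str, password: str, authzid: str = "") -> str:
    """Build the base64 payload for AUTH PLAIN: authzid NUL authcid NUL passwd, as UTF-8."""
    message = "\0".join((authzid, user, password)).encode("utf-8")
    return base64.b64encode(message).decode("ascii")


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(sasl_plain_initial_response("user@example.com", "contrase\u00f1a"))
```

Encoding the whole message as UTF-8 before the base64 step is what lets a non-ASCII password through; the base64 output itself is always ASCII.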

david mannequin added the stdlib label Mar 7, 2017

bitdancer added expert-email 3.7 type-feature labels Mar 7, 2017

bitdancer added 3.8 and removed 3.7 labels Jun 10, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022
