The Wayback Machine - https://web.archive.org/web/20220605231508/https://github.com/python/cpython/issues/73936

smtplib doesn't handle unicode passwords #73936

Open
david mannequin opened this issue Mar 7, 2017 · 31 comments
Labels
3.8 expert-email stdlib type-feature

Comments


@david david mannequin commented Mar 7, 2017

BPO 29750
Nosy @warsaw, @taleinat, @giampaolo, @bitdancer, @Windsooon, @JustAnother1, @P403n1x87, @VadimPushtaev
PRs
  • #8938
  • #15064
    Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2017-03-07.20:40:24.455>
    labels = ['3.8', 'type-feature', 'library', 'expert-email']
    title = "smtplib doesn't handle unicode passwords"
    updated_at = <Date 2019-12-26.23:46:13.786>
    user = 'https://bugs.python.org/david'

    bugs.python.org fields:

    activity = <Date 2019-12-26.23:46:13.786>
    actor = 'seblu'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'email']
    creation = <Date 2017-03-07.20:40:24.455>
    creator = 'david__'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 29750
    keywords = ['patch']
    message_count = 30.0
    messages = ['289184', '289185', '289186', '319222', '319497', '319499', '319514', '319515', '319521', '319522', '319827', '319828', '319830', '319862', '319863', '319864', '319866', '319891', '321172', '321174', '322431', '322447', '322450', '322457', '322468', '324480', '348831', '348832', '348837', '358892']
    nosy_count = 10.0
    nosy_names = ['barry', 'taleinat', 'giampaolo.rodola', 'r.david.murray', 'seblu', 'david__', 'Windson Yang', 'JustAnother1', 'Gabriele Tornetta', 'Vadim Pushtaev']
    pr_nums = ['8938', '15064']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue29750'
    versions = ['Python 3.8']


    @david david mannequin commented Mar 7, 2017

    Trying to use Unicode passwords with smtplib fails miserably on Python 3.
    My particular issue arises on line 643 of the library:

    (code, resp) = self.docmd(encode_base64(password.encode('ascii'), eol=''))

    which obviously dies when the password contains non-ASCII characters (it raises UnicodeEncodeError).
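The failure can be reproduced without any SMTP server, since it comes down to the `str.encode('ascii')` call in the quoted line. A minimal sketch; `ascii_failure_position` is a hypothetical helper and the password is only an illustration:

```python
def ascii_failure_position(password: str):
    """Hypothetical helper: return the index of the first character that
    ASCII cannot encode, or None if the whole password is ASCII."""
    try:
        password.encode("ascii")  # the same call as in the quoted smtplib line
    except UnicodeEncodeError as exc:
        return exc.start  # index of the offending character
    return None


print(ascii_failure_position("password"))    # None: plain ASCII works
print(ascii_failure_position("contraseña"))  # 8: 'ñ' cannot be encoded
print("contraseña".encode("utf-8"))          # UTF-8 encodes it without error
```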

    @david david mannequin added the stdlib label Mar 7, 2017

    @david david mannequin commented Mar 7, 2017

    Sorry, I rushed my comment. The same thing happens on line 604:

    return encode_base64(s.encode('ascii'), eol='')

    Changing both from 'ascii' to 'utf-8' works for me.
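A sketch of the suggested change in isolation, assuming smtplib's `encode_base64` is `email.base64mime.body_encode` imported under that alias (as in the CPython source). The helper `encode_auth_token` is hypothetical; it simply turns the hard-coded `'ascii'` into a parameter defaulting to UTF-8:

```python
import base64
from email.base64mime import body_encode as encode_base64  # alias used by smtplib


def encode_auth_token(text: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper mirroring the quoted smtplib lines, with the
    text encoding made configurable instead of hard-coded to 'ascii'."""
    return encode_base64(text.encode(encoding), eol="")


token = encode_auth_token("contraseña")
print(token)
# Decoding the token shows the UTF-8 bytes the server would receive.
print(base64.b64decode(token))
```

Whether UTF-8 is the right default is exactly what the rest of the thread debates; the sketch only shows that the change is local to the encode step.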


    @bitdancer bitdancer commented Mar 7, 2017

    See msg253287. Someone should check the RFC. It is not obvious that simply encoding with UTF-8 is correct; fundamentally, passwords are binary data, but the auth methods don't currently accept binary data. UTF-8 is a reasonable default these days, I think, but if we support more than ASCII, I think we need to support binary passwords, with UTF-8 as the default encoding.

    @bitdancer bitdancer added expert-email 3.7 type-feature labels Mar 7, 2017

    @bitdancer bitdancer commented Jun 10, 2018

    Note: it is definitely the case, regardless of what the RFC says, that binary passwords need to be supported. UTF-8 should probably be used as the default encoding for string passwords, rather than ASCII. See also bpo-33741.

    @bitdancer bitdancer added 3.8 and removed 3.7 labels Jun 10, 2018

    @taleinat taleinat commented Jun 14, 2018

    It would be extremely helpful to have some test cases that actually work for users but fail with smtplib. So far we have no actual examples, likely because they involve passwords.

    > Note: it is definitely the case, regardless of what the RFC says, that binary passwords need to be supported.

    I'm not sure what you mean by "binary". Do you mean 8-bit characters, a.k.a. bytes?

    > utf-8 should probably be used as the default encoding for string passwords, rather than ascii.

    It is also possible that the appropriate encoding here is "latin1", a.k.a. ISO-8859-1. It includes many accented Latin characters, e.g. the German umlauts mentioned in the duplicate issue bpo-33741. It could even be the very common Windows-1252 encoding: "It is probably the most-used 8-bit character encoding in the world." (Wikipedia)
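To see why the choice of encoding matters, here is the same illustrative password under the three candidates mentioned. A server that stored the password under one encoding will receive different bytes under another:

```python
password = "contraseña"  # illustrative password with one non-ASCII letter

# Encode the same text under each candidate encoding from the discussion.
encoded = {enc: password.encode(enc) for enc in ("utf-8", "latin-1", "cp1252")}
for enc, raw in encoded.items():
    print(f"{enc:8} {raw!r}")

# Latin-1 and Windows-1252 are not identical either: '€' exists only in cp1252.
print("€".encode("cp1252"))  # b'\x80'
```

Latin-1 and Windows-1252 happen to agree on 'ñ', but UTF-8 does not, so guessing the wrong encoding shows up as an authentication failure rather than as an encoding error.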


    @david david mannequin commented Jun 14, 2018

    In my case I was testing with "contraseña" (Spanish for "password"), and it failed.




    @bitdancer bitdancer commented Jun 14, 2018

    While you are correct that latin1 may be common in this situation, I think it may still be better to have utf-8 be the default, since that is the (still emerging? :) standard. However, you are correct to call for examples: if in the *majority* of the real-world cases it turns out latin1 is what is used, then we could default to that (or not have a default, but instead document our observations).

    I don't know how we accumulate enough information to make that decision, though. Maybe we could look at what other mail programs do? Thunderbird, etc? David, which mail program(s) did you use that were able to successfully send that password?

    And yes, by binary passwords I mean that the module needs to support being passed a bytes-like object as the password, since clearly there are servers "in the wild" that support non-ascii passwords and the only way to be sure one can send the server the correct password is by treating it as a series of bytes. The library caller will have to be responsible for picking the correct encoding based on local knowledge.
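A minimal sketch of what accepting bytes-like passwords could mean in practice. `password_bytes()` is a hypothetical helper, not part of smtplib: `str` values are encoded with a caller-chosen encoding (UTF-8 by default), while bytes-like values pass through untouched because the caller has already picked the encoding:

```python
def password_bytes(password, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: normalize a password to the bytes sent on the wire.

    str passwords are encoded (UTF-8 unless the caller says otherwise);
    bytes-like passwords are passed through, since the caller already
    chose an encoding based on local knowledge of the server.
    """
    if isinstance(password, str):
        return password.encode(encoding)
    # memoryview() accepts any bytes-like object (bytes, bytearray, ...)
    # and raises TypeError for anything else, e.g. an int.
    return bytes(memoryview(password))


print(password_bytes("contraseña"))                 # UTF-8 default
print(password_bytes("contraseña", "latin-1"))      # caller-chosen encoding
print(password_bytes(bytearray(b"contrase\xf1a")))  # raw bytes pass through
```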


    @david david mannequin commented Jun 14, 2018

    Thunderbird, SOGo (web), and Gmail (web).




    @bitdancer bitdancer commented Jun 14, 2018

    For the web cases I presume you also set the password using the web interface, so that doesn't really tell us anything useful. Did you use thunderbird to access the mailbox that you set up via gmail and/or sogo? That would make what thunderbird does the interesting question.


    @david david mannequin commented Jun 14, 2018

    Yes, I used Thunderbird for both.




    @taleinat taleinat commented Jun 17, 2018

    > And yes, by binary passwords I mean that the module needs to support being passed a bytes-like object as the password, since clearly there are servers "in the wild" that support non-ascii passwords and the only way to be sure one can send the server the correct password is by treating it as a series of bytes. The library caller will have to be responsible for picking the correct encoding based on local knowledge.

    Perhaps we should make smtplib accept only bytes, passing on the responsibility of using an appropriate encoding to its users? This seems like the most straightforward and transparent choice. It would not be backwards-compatible, though.

    Alternatively, we could change smtplib to accept passwords as bytes or strings, but raise an informative exception when given strings with non-ASCII characters. As now, users who have been passing passwords as strings and hadn't tested with non-ASCII passwords could be surprised; we'd just improve the exception and documentation to clarify the situation.
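Roughly how the second option might look, as a sketch: `check_password` and its error message are hypothetical, not smtplib's API. Bytes pass through, ASCII strings behave as before, and non-ASCII strings fail with a message that tells the caller what to do:

```python
def check_password(password) -> bytes:
    """Hypothetical version of the 'bytes or ASCII-only str' option."""
    if isinstance(password, (bytes, bytearray)):
        return bytes(password)  # caller already picked an encoding
    if not isinstance(password, str):
        raise TypeError(
            f"password must be str or bytes, not {type(password).__name__}"
        )
    try:
        return password.encode("ascii")  # backward-compatible path
    except UnicodeEncodeError:
        raise ValueError(
            "password contains non-ASCII characters; pass it as bytes "
            "encoded the way your server expects (e.g. UTF-8)"
        ) from None


print(check_password("secret"))
print(check_password("contraseña".encode("utf-8")))
```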


    @david david mannequin commented Jun 17, 2018

    I would like to see the second option (allow both, warning on non-ASCII).




    @bitdancer bitdancer commented Jun 17, 2018

    We must continue to support at least ASCII strings, for backward compatibility reasons. We can certainly improve the error messages, but the goal of this issue is to add support for bytes passwords. I lean toward continuing to support only ASCII strings, and making it the responsibility of the program to encode non-ASCII passwords to bytes. However, I'd also like to be able to recommend in the docs which encoding is most likely to work, if someone can find out what encoding Thunderbird uses. It occurs to me, though, that Thunderbird may be using whatever encoding the OS is using (LC_LANG, OEM code page, etc.), and that David's experiments worked because the same encoding was used, for the same reason, when the password was set. Honestly, I'm not sure how browsers/webmail work in that regard.

    That's less important than just adding support for bytes passwords, though.


    @taleinat taleinat commented Jun 18, 2018

    I found the Thunderbird Bugzilla issue where they appear to have dealt with precisely this problem (for a variety of protocols, including SMTP):

    https://bugzilla.mozilla.org/show_bug.cgi?id=312593


    @taleinat taleinat commented Jun 18, 2018

    This specifically seems relevant:

    > In order for Thunderbird to be standards-compliant-enough to interoperate with standards-compliant servers, it should use UTF-8 for the SASL PLAIN mechanism regardless of the underlying protocol (IMAP, POP and SMTP). That includes the POP3 "AUTH PLAIN" command and the SMTP "AUTH PLAIN" command.
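For reference, SASL PLAIN (RFC 4616) sends an optional authorization identity, a NUL, the authentication identity, a NUL, and the password, all as UTF-8, and SMTP carries that base64-encoded in `AUTH PLAIN <response>`. A sketch of building the response; `plain_initial_response` is a hypothetical name, and the empty authorization identity matches the NUL-separated string smtplib's `auth_plain()` builds:

```python
import base64


def plain_initial_response(user: str, password: str, authzid: str = "") -> str:
    """Hypothetical helper: build a SASL PLAIN response (RFC 4616).

    The message is authzid NUL authcid NUL passwd, encoded as UTF-8 and
    then base64-encoded for the SMTP command 'AUTH PLAIN <response>'.
    """
    message = "\0".join((authzid, user, password)).encode("utf-8")
    return base64.b64encode(message).decode("ascii")


resp = plain_initial_response("david", "contraseña")
print("AUTH PLAIN " + resp)
print(base64.b64decode(resp))  # the UTF-8 bytes, NUL-separated
```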


    @taleinat taleinat commented Jun 18, 2018

    There's also some discussion there (from 3 years ago) of possibly needing to fall back to ISO-8859-1 to work with MS Exchange, despite the standards saying UTF-8 should be used. It's unclear to me whether that's actually the case.
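If a server really does expect ISO-8859-1, a caller that can pass bytes passwords could try UTF-8 first and fall back. This is only a sketch of that idea: `login_with_fallback` is hypothetical, and `authenticate` stands in for whatever actually sends the credentials; smtplib has no such fallback.

```python
from typing import Callable, Iterable, Optional


def login_with_fallback(
    password: str,
    authenticate: Callable[[bytes], bool],
    encodings: Iterable[str] = ("utf-8", "latin-1"),
) -> Optional[str]:
    """Hypothetical: try each encoding in order, return the one that worked.

    `authenticate` must accept the password as bytes and return True on
    success; None is returned if no encoding succeeds.
    """
    for enc in encodings:
        try:
            candidate = password.encode(enc)
        except UnicodeEncodeError:
            continue  # password not representable in this encoding
        if authenticate(candidate):
            return enc
    return None


# Simulate a server that stored the password as ISO-8859-1.
stored = "contraseña".encode("latin-1")
print(login_with_fallback("contraseña", lambda pw: pw == stored))  # latin-1
```

Each failed attempt is a real failed login on the server side, which some servers may count toward a lockout, so a fallback like this is better left to the application than built into the library.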


    @taleinat taleinat commented Jun 18, 2018

    From reading the aforementioned discussion on Thunderbird's issue tracker, ISTM that encoding with UTF-8 is the way to go.


    @bitdancer bitdancer commented Jun 18, 2018

    I didn't think to look at the standards for the auth mechanisms; I only looked at the SMTP standards. So, if the standard says UTF-8, then we should use that. But we should still support bytes passwords so that an application can work around issues like the possible MS Exchange one, if it needs to. Those could be two separate PRs, though; in fact, they probably should be. As a standards-compliance issue, we would be within our rules to backport the UTF-8 standards-compliance fix.


    @P403n1x87 P403n1x87 mannequin commented Jul 6, 2018

    Are there any PRs for this issue yet? I couldn't find any on GitHub. Also, is the plan to backport the fix to at least 3.6?


    @taleinat taleinat commented Jul 6, 2018

    I have worked on this and am almost ready for a PR.


    @taleinat taleinat commented Jul 26, 2018

    Never mind, I won't have time for this any time soon, better if someone else can do it.


    @VadimPushtaev VadimPushtaev mannequin commented Jul 26, 2018

    Hello. I would like to work on this. Should the issue be assigned to me, or is this comment enough?


    @taleinat taleinat commented Jul 26, 2018

    A comment here is all that is needed.


    @Windsooon Windsooon mannequin commented Jul 27, 2018

    @Vadim Pushtaev I also want to work on it. If you want to work together, maybe we can talk on Zulip chat. :D


    @VadimPushtaev VadimPushtaev mannequin commented Jul 27, 2018

    That's OK, you can do it.


    @Windsooon Windsooon mannequin commented Sep 2, 2018

    I added a patch to support UTF-8.


    @seblu seblu mannequin commented Aug 1, 2019

    I hit the same issue.
    Do you have news about the patch review and its inclusion?


    @Windsooon Windsooon mannequin commented Aug 1, 2019

    Sorry, I forgot about this PR. I will update the patch based on the review soon. :D


    @Windsooon Windsooon mannequin commented Aug 1, 2019

    I just updated the PR


    @seblu seblu mannequin commented Dec 26, 2019

    UTF-8 passwords are still broken on Python 3.8.

    The patch works great on 3.8.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    @petsuter petsuter commented Apr 30, 2022
