The Wayback Machine - https://web.archive.org/web/20210106131048/https://github.com/python-xlib/python-xlib/issues/84

UnicodeDecodeError in Python 3 #84

Closed
cykerway opened this issue Sep 3, 2017 · 5 comments

@cykerway cykerway commented Sep 3, 2017

return data.decode(), b''

This line caused a decoding error when run with Python 3.6:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 0: invalid continuation byte

I'm wondering whether data.decode() will ever work for values in range [128, 256).

data.decode('latin1') solves my issue. But I think there may be similar cases in other parts of the project.

Pull Request: #85

The default encoding in Python 3 is utf-8, which doesn't work in this case. ascii won't work, either.
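The failure can be reproduced outside the library. The following is a minimal sketch, not code from python-xlib: `try_decode` is a hypothetical helper, and the single byte `0xd4` is taken from the traceback above.

```python
# Hypothetical helper for illustration; not part of python-xlib.
def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


raw = b"\xd4"  # the byte reported in the traceback

# Try the three encodings discussed in this issue.
results = {enc: try_decode(raw, enc) for enc in ("utf-8", "ascii", "latin-1")}
print(results)
```

Only the Latin-1 attempt returns a string here, because Latin-1 maps every byte value to a code point; the UTF-8 and ASCII attempts both raise `UnicodeDecodeError`.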

Member

@benoit-pierre benoit-pierre commented Sep 3, 2017

You need to provide more context: what is the call that triggers this exception? Using data.decode() is the right thing, but only if the data is an actual string. If the actual call triggering this issue is the same as in your example in #86, then the real fix is this:

 Xlib/protocol/request.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git i/Xlib/protocol/request.py w/Xlib/protocol/request.py
index 578f4a6..7087273 100644
--- i/Xlib/protocol/request.py
+++ w/Xlib/protocol/request.py
@@ -1089,7 +1089,7 @@ class GetImage(rq.ReplyRequest):
         rq.ReplyLength(),
         rq.Card32('visual'),
         rq.Pad(20),
-        rq.String8('data'),
+        rq.Binary('data'),
         )
 
 class PolyText8(rq.Request):
Member

@benoit-pierre benoit-pierre commented Sep 3, 2017

See #88.

Author

@cykerway cykerway commented Sep 4, 2017

For more context, see #86.

Since class String8 isn't documented, I'm not sure what it is used for. If it is meant for ASCII strings in [0, 128), then the implementation is probably correct. But please note that a String8 field can contain 8-bit characters in the range [128, 256), which should be decoded using Latin-1 rather than ASCII or UTF-8.

I guess fixing the GetImage call should fix the problem at hand, but you may want to double check the encoding/decoding work with non-ascii 8-bit chars.

Member

@benoit-pierre benoit-pierre commented Sep 4, 2017

I don't see how #86 is related.

Historically, String8 was used for both text strings and binary data, as the code was Python 2 only. Now that Python 3 is also supported, we need to differentiate those 2 cases; in particular, we want proper Unicode strings for text data.

When not explicitly specified (see #52 (comment)), X11 text strings should be using the Host Portable Character Encoding (basically ASCII). So I agree that ending up with UTF-8 on Python 3 is not right, but neither would using Latin-1 be. Instead I think we should be using ASCII, as in Python 2, so we can detect any invalid use of String8 for binary data.
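To illustrate why a strict ASCII decode can act as a detector while Latin-1 cannot, here is a small sketch. It is independent of python-xlib, and `looks_like_text` is a hypothetical name used only for this example.

```python
# Latin-1 assigns a character to every byte value, so decoding never fails
# and arbitrary binary data passes through unnoticed.
all_bytes = bytes(range(256))
as_latin1 = all_bytes.decode("latin-1")
round_trip_ok = as_latin1.encode("latin-1") == all_bytes


# Hypothetical helper: strict ASCII decoding rejects any byte >= 0x80,
# which flags binary payloads that were declared as text fields.
def looks_like_text(data):
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


print(round_trip_ok)
print(looks_like_text(b"WM_NAME"))
print(looks_like_text(b"\x00\xd4\xff"))
```

Note that this check only catches bytes outside the ASCII range; binary data that happens to contain only values below 128 would still decode without error.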

Member

@benoit-pierre benoit-pierre commented Sep 4, 2017

Tentative patch:

 Xlib/protocol/rq.py | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git i/Xlib/protocol/rq.py w/Xlib/protocol/rq.py
index c945c62..f0962e9 100644
--- i/Xlib/protocol/rq.py
+++ w/Xlib/protocol/rq.py
@@ -34,6 +34,10 @@ from .. import X
 from ..support import lock
 
 
+def decode_string(bs):
+    return bs.decode('ascii')
+
+
 class BadDataError(Exception): pass
 
 # These are struct codes, we know their byte sizes
@@ -424,17 +428,14 @@ class String8(ValueField):
 
     def parse_binary_value(self, data, display, length, format):
         if length is None:
-            return data.decode(), b''
+            return decode_string(data), b''
 
         if self.pad:
             slen = length + ((4 - length % 4) % 4)
         else:
             slen = length
 
-        if sys.version_info < (3, 0):
-            data_str = data[:length]
-        else:
-            data_str = data[:length].decode()
+        data_str = decode_string(data[:length])
 
         return data_str, data[slen:]
 
@@ -903,7 +904,7 @@ class StrClass(object):
 
     def parse_binary(self, data, display):
         slen = byte2int(data) + 1
-        return data[1:slen].decode(), data[slen:]
+        return decode_string(data[1:slen]), data[slen:]
 
 Str = StrClass()
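For readers who want to try the patched logic in isolation, the following standalone sketch mirrors the length and padding arithmetic of `String8.parse_binary_value` from the diff above. The function `parse_string8` and its `pad` keyword are assumptions made for this example; the real method takes `display` and `format` arguments and lives on a field class.

```python
def decode_string(bs):
    # Same body as the helper added in the tentative patch.
    return bs.decode('ascii')


# Hypothetical free function; a simplified stand-in for the patched method.
def parse_string8(data, length=None, pad=True):
    """Return (decoded text, remaining bytes)."""
    if length is None:
        # No explicit length: the whole buffer is the string.
        return decode_string(data), b''

    if pad:
        # Round the consumed size up to the next multiple of 4 bytes.
        slen = length + ((4 - length % 4) % 4)
    else:
        slen = length

    return decode_string(data[:length]), data[slen:]


# A 5-byte string is followed by 3 padding bytes before the next field.
text, rest = parse_string8(b"abcde\x00\x00\x00rest", length=5)
print(text, rest)
```

With this version, passing image-like data such as `b"\xd4\x00"` raises `UnicodeDecodeError` instead of silently producing a string, which is the detection behaviour argued for above.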
 