'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)
Disclaimer: I'm not a Python developer :)
Yesterday I found a tool that I needed; it was written in Python.
Nothing special: it reads a list of links from a CSV file, logs in to a page, and opens a matching link on the page that comes up.
The script was old (2 years) and needed some refinement, but I managed to get it working and started grinding through my 7000+ links. About 30 minutes later I saw that it had crashed with a strange error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)
I started to investigate the issue and found some information on Stack Overflow, but I was confused, since I don't really know Python that well. I found two suggestions: make some changes to Python's config files, which I don't really know how to do and which was not recommended anyway, or call decode('utf-8') on the string variable that caused the issue.
As you can see in the full error log below, I traced the error to the innermost call, which is the last entry in the traceback:
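For anyone wondering what the error actually means, here is a minimal sketch that reproduces the same kind of message outside the script. The sample bytes are an assumption, not the page text that triggered the crash; 0xe2 is simply the first byte of many UTF-8 encoded punctuation marks, such as the curly apostrophe. Decoding those bytes as ASCII fails, while decode('utf-8') succeeds:

```python
# Hypothetical sample: "It's a course" with a curly apostrophe (U+2019),
# which UTF-8 encodes as the three bytes 0xe2 0x80 0x99.
raw = b"It\xe2\x80\x99s a course"

error_message = None
try:
    raw.decode("ascii")            # ASCII only covers byte values 0-127
except UnicodeDecodeError as exc:
    error_message = str(exc)       # same wording as the crash, different position

text = raw.decode("utf-8")         # the decode('utf-8') suggestion: works fine

print(error_message)
```

In Python 2 the ASCII decode usually happens implicitly, when a byte string gets mixed with a unicode string, which is why no explicit decode call shows up in the traceback.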
File "C:\Python27\lib\re.py", line 155, in sub return _compile(pattern, flags).sub(repl, string, count)
I opened the re.py file and changed the return statement of the sub function to:
return _compile(pattern, flags).sub(repl, string.decode('utf-8'), count)
It worked like a charm! No strange settings or other oddities were needed.
As I said, I'm not a pro Python programmer, and I'm sure there is a better solution. Because re.py is one of Python's own library files, my change probably has implications for overall performance and could cause trouble when upgrading Python, or have other side effects I don't know about. You can use it if you need a quick and dirty fix ;)
BTW, I didn't know that it is this easy to make changes to the default libs.
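One concrete risk of the patched line: in Python 2, calling decode('utf-8') on a string that is already unicode and contains non-ASCII characters raises a UnicodeEncodeError, so other callers of re.sub could break. Below is a minimal sketch of a more defensive variant, written as a standalone helper for your own script rather than as an edit to re.py. The name sub_utf8 is hypothetical, and it assumes the incoming bytes really are UTF-8:

```python
import re

def sub_utf8(pattern, repl, string, count=0, flags=0):
    """Like re.sub, but decode byte strings as UTF-8 first (hypothetical helper)."""
    if isinstance(string, bytes):
        # Decode only real byte strings; text that is already unicode is left alone.
        string = string.decode("utf-8")
    return re.sub(pattern, repl, string, count=count, flags=flags)

# Sample input is made up for illustration.
cleaned = sub_utf8(r"&amp;", u"&", b"Tom &amp; Jerry\xe2\x80\x99s course")
```

Note that this only covers re.sub calls made from your own code, not the call inside HTMLParser shown in the traceback, so it is a safer pattern rather than a drop-in replacement for the patch.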
Full error log:
Traceback (most recent call last):
File "C:\Users\n_lac\Documents\python\udemy coupon.py", line 40, in <module>
course_page = br.open(course_links)
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 254, in open
return self._mech_open(url_or_request, data, timeout=timeout)
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 284, in _mech_open
response = UserAgentBase.open(self, request, data)
File "C:\Python27\lib\site-packages\mechanize\_opener.py", line 206, in open
response = meth(req, response)
File "C:\Python27\lib\site-packages\mechanize\_urllib2_fork.py", line 467, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\site-packages\mechanize\_opener.py", line 224, in error
result = apply(self._call_chain, args)
File "C:\Python27\lib\site-packages\mechanize\_urllib2_fork.py", line 340, in _call_chain
result = func(*args)
File "C:\Python27\lib\site-packages\mechanize\_urllib2_fork.py", line 586, in http_error_302
return self.parent.open(new)
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 254, in open
return self._mech_open(url_or_request, data, timeout=timeout)
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 284, in _mech_open
response = UserAgentBase.open(self, request, data)
File "C:\Python27\lib\site-packages\mechanize\_opener.py", line 206, in open
response = meth(req, response)
File "C:\Python27\lib\site-packages\mechanize\_http.py", line 134, in http_response
self.head_parser_class())
File "C:\Python27\lib\site-packages\mechanize\_http.py", line 100, in parse_head
parser.feed(data)
File "C:\Python27\lib\HTMLParser.py", line 117, in feed
self.goahead(0)
File "C:\Python27\lib\HTMLParser.py", line 161, in goahead
k = self.parse_starttag(i)
File "C:\Python27\lib\HTMLParser.py", line 308, in parse_starttag
attrvalue = self.unescape(attrvalue)
File "C:\Python27\lib\HTMLParser.py", line 475, in unescape
return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
File "C:\Python27\lib\re.py", line 155, in sub
return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)