'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)


Disclaimer: I'm not a Python developer :)
Yesterday I found a tool that I needed; it was written in Python.
Nothing special: it gets a list of links from a CSV, logs in to a page, and opens a matching link from the page that comes up.

The script was old (2 years) and needed some refinement, but I managed to make it work and started grinding through my 7000+ links. About 30 minutes later I saw that it had crashed with a strange error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)

I started to investigate and found some information on Stack Overflow, but I was confused, since I don't really know Python that well. The suggestions I found were either to change Python's configuration files, which I didn't know how to do and which was not recommended anyway, or to call decode('utf-8') on the string variable that caused the issue.
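To make the decode('utf-8') suggestion a bit more concrete, here is a minimal sketch of what seems to be going on. The sample bytes are an assumption: I don't know which character was actually on the page, but 0xe2 is the first byte of the UTF-8 encoding of many punctuation marks, for example the curly apostrophe U+2019 (bytes E2 80 99). The ASCII codec only accepts bytes below 128, so it gives up on that byte, while the UTF-8 codec decodes it fine.

```python
# Assumed sample data: UTF-8 bytes for "it's" written with a curly
# apostrophe (U+2019), which is encoded as the three bytes E2 80 99.
raw = b"it\xe2\x80\x99s"

# Decoding as ASCII fails on the first byte that is 128 or higher.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding as UTF-8 turns the three bytes into one Unicode character.
text = raw.decode("utf-8")

print(ascii_error)
print(len(raw), len(text))
```

This only helps if the data really is UTF-8; for a page in another encoding, that encoding's name would have to be passed instead.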

As you can see in the full error log below, I traced the error to the most recent call in the call stack:
  File "C:\Python27\lib\re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
I opened the re.py file and changed that return statement to:
    return _compile(pattern, flags).sub(repl, string.decode('utf-8'), count)
It worked like a charm! No strange settings or other oddities were needed.

As I said, I'm not a pro Python programmer, and I'm sure there is a better solution. Because re.py is a Python library file, my change probably affects overall performance, may cause trouble when upgrading Python, and may have other consequences I don't know about. You can use it if you need a quick and dirty fix ;)
BTW, I didn't know it was this easy to make changes to the default libs.
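A less invasive variant of the same idea, as a rough sketch only: keep re.py untouched and do the decoding in a small helper inside your own script. The names to_text and safe_sub are hypothetical, and this assumes you can get at the string before it reaches re.sub, which was not the case in my traceback, where the call happens deep inside mechanize and HTMLParser. The isinstance check matters because, in Python 2, calling decode('utf-8') on a string that is already Unicode and contains non-ASCII characters raises an error of its own.

```python
import re


def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding it only if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def safe_sub(pattern, repl, string, count=0):
    """Call re.sub after making sure the input string is Unicode text."""
    return re.sub(pattern, repl, to_text(string), count=count)


# Assumed sample input: UTF-8 bytes containing an accented character
# and an HTML entity.
cleaned = safe_sub(r"&amp;", "&", b"caf\xc3\xa9 &amp; bar")
print(repr(cleaned))
```

Unlike the edit to re.py, this affects only the calls that go through safe_sub and survives a Python upgrade.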

Full error log:
Traceback (most recent call last):
  File "C:\Users\n_lac\Documents\python\udemy coupon.py", line 40, in <module>
    course_page = br.open(course_links)
  File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 254, in open
    return self._mech_open(url_or_request, data, timeout=timeout)
  File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 284, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "C:\Python27\lib\site-packages\mechanize\_opener.py", line 206, in open
    response = meth(req, response)
  File "C:\Python27\lib\site-packages\mechanize\_urllib2_fork.py", line 467, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\site-packages\mechanize\_opener.py", line 224, in error
    result = apply(self._call_chain, args)
  File "C:\Python27\lib\site-packages\mechanize\_urllib2_fork.py", line 340, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\site-packages\mechanize\_urllib2_fork.py", line 586, in http_error_302
    return self.parent.open(new)
  File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 254, in open
    return self._mech_open(url_or_request, data, timeout=timeout)
  File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 284, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "C:\Python27\lib\site-packages\mechanize\_opener.py", line 206, in open
    response = meth(req, response)
  File "C:\Python27\lib\site-packages\mechanize\_http.py", line 134, in http_response
    self.head_parser_class())
  File "C:\Python27\lib\site-packages\mechanize\_http.py", line 100, in parse_head
    parser.feed(data)
  File "C:\Python27\lib\HTMLParser.py", line 117, in feed
    self.goahead(0)
  File "C:\Python27\lib\HTMLParser.py", line 161, in goahead
    k = self.parse_starttag(i)
  File "C:\Python27\lib\HTMLParser.py", line 308, in parse_starttag
    attrvalue = self.unescape(attrvalue)
  File "C:\Python27\lib\HTMLParser.py", line 475, in unescape
    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "C:\Python27\lib\re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)
