Symptoms:
A response header contains Chinese text encoded as GBK, so requests cannot decode the header.
The HTTP packet is shown in the figure:
Python 3.4.3 (default, Aug 25 2017, 16:49:50)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> res = requests.get('http://down.chinaz.com/download.asp?id=35&dp=1&fid=22&f=yes',headers={'Referer':'http://down.chinaz.com/soft/12162.htm'},allow_redirects=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 510, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 655, in send
    r._next = next(self.resolve_redirects(r, request, yield_requests=True, **kwargs))
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 125, in resolve_redirects
    url = self.get_redirect_target(resp)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 116, in get_redirect_target
    return to_native_string(location, 'utf8')
  File "/usr/local/lib/python3.4/site-packages/requests/_internal_utils.py", line 25, in to_native_string
    out = string.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 28: invalid continuation byte
>>>
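As a result the request cannot be made at all. I could not find a matching report on Google, because most people run into encoding problems in the body of a response that was fetched successfully, whereas this error is raised while the request is still being made.
Testing shows that Python 2.7 does not have this problem.
The ipython traceback shows that the error comes from this part of the code: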
usr/local/lib/python3.4/site-packages/requests/sessions.py in get_redirect_target(self, resp)
    114     if is_py3:
    115         location = location.encode('latin1')
--> 116     return to_native_string(location, 'utf8')
    117     #return location
    118
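What lines 115 and 116 do to the header can be reproduced without any network access. The sketch below is only an illustration: the path is a made-up example rather than the real Location header returned by the site, and it relies on the fact that Python 3's http.client decodes header bytes as latin-1 before requests sees them.

```python
# Standalone reproduction of the decode step; no network access needed.
# The path is a made-up example, not the real header from the site.
gbk_location = "/down/软件.zip".encode("gbk")  # bytes the server sends

# On Python 3, http.client decodes header bytes as latin-1,
# so the header reaches requests as a garbled str.
header_value = gbk_location.decode("latin1")

# Line 115: re-encode as latin-1 to get the original bytes back.
raw = header_value.encode("latin1")
assert raw == gbk_location

# Line 116: assume the bytes are UTF-8. For GBK input this raises.
try:
    raw.decode("utf8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decode failed:", exc)

# Decoding with the codec the server actually used succeeds.
recovered = raw.decode("gbk")
print(recovered)
```

The latin-1 round trip is lossless, so the original bytes are still available at line 116; the failure comes only from the choice of codec.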
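Comparing the underlying requests code under Python 2.7 and Python 3.4 shows the difference:
The code that reads the response Location in requests under Python 3.4:
Everything is decoded as UTF-8 by default.
The Python 2.7 code:
Now look at the get_redirect_target function:
This largely confirms the cause: under Python 3.4, requests decodes the Location header as UTF-8 by default, so a Location containing GBK-encoded Chinese triggers the error shown at the start of this post. On Python 2.7 the header stays a byte string and is passed through without being decoded, so no error is raised.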
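The diagnosis above suggests that the decode step only needs to tolerate more than one codec. The following is a minimal sketch of that idea, not the requests implementation: decode_location is a hypothetical helper name, and the codec list is an assumption that fits sites serving either UTF-8 or GBK.

```python
def decode_location(header_value, encodings=("utf-8", "gbk")):
    """Hypothetical helper: recover a readable Location value.

    header_value is the str produced by decoding the raw header
    bytes as latin-1, which is what Python 3 hands to requests.
    """
    # latin-1 maps every byte to one code point, so this is lossless.
    raw = header_value.encode("latin1")
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # No codec matched: return the input unchanged instead of raising.
    return header_value


# Made-up example paths, encoded both ways.
utf8_header = "/soft/下载.htm".encode("utf-8").decode("latin1")
gbk_header = "/soft/下载.htm".encode("gbk").decode("latin1")

print(decode_location(utf8_header))
print(decode_location(gbk_header))
```

UTF-8 is tried first on purpose: many UTF-8 byte sequences also decode under GBK without an error, only as garbled text, so reversing the order would silently corrupt correct UTF-8 headers. The opposite mistake is still possible in rare cases where GBK bytes happen to form valid UTF-8.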
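A temporary workaround is to change utf-8 to GBK. The following two places, which are used when requesting the Location address, also need to be changed: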