Garbled text (mojibake) when scraping web pages

Date: 2014-05-09 13:23:23
import sys
import urllib2

def get_content():
    # Send a browser-like User-Agent with the request.
    user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36"
    headers = {"User-Agent": user_agent}
    url = "http://bj.58.com/"
    req = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(req)
    the_page = response.read()
    # Decode the UTF-8 bytes, then re-encode them in the local system
    # encoding so that print shows readable text instead of garbage.
    encoding = sys.getfilesystemencoding()
    the_page = the_page.decode("UTF-8").encode(encoding)
    print the_page
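
The snippet above hardcodes UTF-8 as the page encoding. A possible variation, shown here only as a minimal sketch, reads the charset from the Content-Type response header and falls back to UTF-8 when the header is missing or wrong. The helper names `charset_from_content_type` and `decode_page` are hypothetical and do not come from the original post; the header value is assumed to be passed in by the caller.

```python
import re


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or `default`.

    Hypothetical helper for illustration; not part of the original post.
    """
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type or "", re.IGNORECASE)
    return match.group(1) if match else default


def decode_page(raw_bytes, content_type=None):
    """Decode raw response bytes to text using the declared charset.

    Assumption: if the declared charset is unknown or does not match the
    bytes, falling back to UTF-8 with replacement characters is acceptable.
    """
    encoding = charset_from_content_type(content_type)
    try:
        return raw_bytes.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or bytes that do not fit the declared charset:
        # decode as UTF-8 and substitute U+FFFD for undecodable bytes.
        return raw_bytes.decode("utf-8", "replace")
```

With urllib2 in Python 2, the header value would typically come from `response.info().getheader("Content-Type")`, and the result of `decode_page(the_page, ...)` could then be encoded with `sys.getfilesystemencoding()` before printing, as in the original code.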