根据您的个性需求进行定制 先人一步 抢占小程序红利时代
小编给大家分享一下解决python写爬虫出现乱码的方法,希望大家阅读完这篇文章后大所收获,下面让我们一起去探讨吧!

施甸网站建设公司创新互联,施甸网站设计制作,有大型网站制作公司丰富经验。已为施甸上千提供企业网站建设服务。企业网站搭建\成都外贸网站建设公司要多少钱,请找那个售后服务好的施甸做网站的公司定做!
关于爬虫乱码有很多各式各样的问题,这里不仅是中文乱码,编码转换、还包括一些如日文、韩文 、俄文、藏文之类的乱码处理,因为解决方式是一致的,故在此统一说明。
网络爬虫出现乱码的原因
源网页编码和爬取下来后的编码格式不一致。
如源网页为gbk编码的字节流,而我们抓取下后程序直接使用utf-8进行编码并输出到存储文件中,这必然会引起乱码 即当源网页编码和抓取下来后程序直接使用处理编码一致时,则不会出现乱码; 此时再进行统一的字符编码也就不会出现乱码了。
注意区分
源网编码A、程序直接使用的编码B、统一转换字符的编码C。
乱码的解决方法
确定源网页的编码A,编码A往往在网页中的三个位置
1.http header的Content-Type
获取服务器 header 的站点可以通过它来告知浏览器一些页面内容的相关信息。 Content-Type 这一条目的写法就是 "text/html; charset=utf-8"。
2.meta charset
3.网页头中Document定义
基本 文件 流程 错误 SQL 调试
- 请求信息 : 2026-05-18 10:59:51 HTTP/1.1 GET : /article/jdsgeo.html
- 运行时间 : 2.3891s ( Load:0.0064s Init:1.7163s Exec:0.6573s Template:0.0091s )
- 吞吐率 : 0.42req/s
- 内存开销 : 2,232.19 kb
- 查询信息 : 12 queries 5 writes
- 文件加载 : 36
- 缓存信息 : 0 gets 0 writes
- 配置加载 : 130
- 会话信息 : SESSION_ID=vu6h72bhi52ik323d9mc305e31
- /www/wwwroot/tsicrk.com/index.php ( 1.09 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/ThinkPHP.php ( 4.61 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Think.class.php ( 12.26 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Storage.class.php ( 1.37 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Storage/Driver/File.class.php ( 3.52 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Mode/common.php ( 2.82 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Common/functions.php ( 53.56 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Hook.class.php ( 4.01 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/App.class.php ( 13.49 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Dispatcher.class.php ( 14.79 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Route.class.php ( 13.36 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Controller.class.php ( 11.23 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/View.class.php ( 7.59 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Behavior/BuildLiteBehavior.class.php ( 3.68 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Behavior/ParseTemplateBehavior.class.php ( 3.88 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Behavior/ContentReplaceBehavior.class.php ( 1.91 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Conf/convention.php ( 11.15 KB )
- /www/wwwroot/tsicrk.com/App/Common/Conf/config.php ( 2.14 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Lang/zh-cn.php ( 2.55 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Conf/debug.php ( 1.49 KB )
- /www/wwwroot/tsicrk.com/App/Home/Conf/config.php ( 0.31 KB )
- /www/wwwroot/tsicrk.com/App/Home/Common/function.php ( 3.33 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Behavior/ReadHtmlCacheBehavior.class.php ( 5.62 KB )
- /www/wwwroot/tsicrk.com/App/Home/Controller/ArticleController.class.php ( 6.02 KB )
- /www/wwwroot/tsicrk.com/App/Home/Controller/CommController.class.php ( 1.60 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Model.class.php ( 60.11 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Db.class.php ( 32.43 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Db/Driver/Pdo.class.php ( 16.74 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Cache.class.php ( 3.83 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Cache/Driver/File.class.php ( 5.87 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Template.class.php ( 28.16 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Template/TagLib/Cx.class.php ( 22.40 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Template/TagLib.class.php ( 9.16 KB )
- /www/wwwroot/tsicrk.com/App/Runtime/Cache/Home/7540f392f42b28b481b30614275e4e55.php ( 17.71 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Behavior/WriteHtmlCacheBehavior.class.php ( 0.97 KB )
- /www/wwwroot/tsicrk.com/ThinkPHP/Library/Behavior/ShowPageTraceBehavior.class.php ( 5.24 KB )
- [ app_init ] --START--
- Run Behavior\BuildLiteBehavior [ RunTime:0.000005s ]
- [ app_init ] --END-- [ RunTime:0.000029s ]
- [ app_begin ] --START--
- Run Behavior\ReadHtmlCacheBehavior [ RunTime:0.000274s ]
- [ app_begin ] --END-- [ RunTime:0.000292s ]
- [ view_parse ] --START--
- [ template_filter ] --START--
- Run Behavior\ContentReplaceBehavior [ RunTime:0.000049s ]
- [ template_filter ] --END-- [ RunTime:0.000070s ]
- Run Behavior\ParseTemplateBehavior [ RunTime:0.006177s ]
- [ view_parse ] --END-- [ RunTime:0.006202s ]
- [ view_filter ] --START--
- Run Behavior\WriteHtmlCacheBehavior [ RunTime:0.000159s ]
- [ view_filter ] --END-- [ RunTime:0.000173s ]
- [ app_end ] --START--
- 1064:You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ') LIMIT 1' at line 1 [ SQL语句 ] : SELECT `id`,`pid`,`navname` FROM `cx_nav` WHERE ( id= ) LIMIT 1
- 1064:You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ') LIMIT 1' at line 1 [ SQL语句 ] : SELECT `id`,`navname` FROM `cx_nav` WHERE ( id= ) LIMIT 1
- 1064:You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 1 [ SQL语句 ] : SELECT `id`,`navname` FROM `cx_nav` WHERE ( pid= )
- [8] Undefined index: pid /www/wwwroot/tsicrk.com/App/Home/Controller/ArticleController.class.php 第 47 行.
- [8] Undefined index: db_host /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Db.class.php 第 120 行.
- [8] Undefined index: db_port /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Db.class.php 第 121 行.
- [8] Undefined index: db_name /www/wwwroot/tsicrk.com/ThinkPHP/Library/Think/Db.class.php 第 122 行.
2.3891s

