Database Connections for Web Crawlers

Date: 2021-04-10 13:21:40

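  Scraped data usually needs to be stored in a database. This article covers connecting to three mainstream databases (MySQL, Redis, MongoDB). If your database servers run on Linux, first edit each server's configuration file and change bind 127.0.0.1 to bind 0.0.0.0 so that the database can be reached from other machines (the directive is called bind in redis.conf; MySQL uses bind-address and MongoDB uses net.bindIp). You also need to check the Linux firewall settings. If the firewall is running, stop it.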

 

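Check the firewall status
systemctl status firewalld.service

If the output shows Active: active (running), the firewall is on and needs to be stopped.

Stop the firewall
systemctl stop firewalld.service

Active: inactive (dead) means the firewall has been stopped.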
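
After changing the bind address and the firewall, it can help to confirm from the crawler machine that the database ports are actually reachable. The following is a minimal sketch using only the standard library; the helper name port_is_open is made up for this example, and the 3 second timeout is an arbitrary choice.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open('172.16.70.130', 3306) checks the MySQL port used in this article; 6379 and 27017 are the default Redis and MongoDB ports. A True result only shows that something accepts connections on that port, not that the login will succeed.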

 

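Connecting to MySQL:

First check that pymysql is installed

import pymysql

# host: address of the database server; port: defaults to 3306
# user / password: your login credentials
# To select a database, also pass the database parameter
conn = pymysql.connect(host='172.16.70.130', port=3306, user='user', password='passwd')

cur = conn.cursor()
cur.execute('select version()')
data = cur.fetchall()
print(data)  # print the server version

The output is as follows:

(('5.7.27',),)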
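
Once the connection works, scraped records can be written with a parameterized INSERT. The sketch below is only one possible approach: the helper names build_insert and save_row are invented for this example, and the table and column names are assumed to come from trusted code, since only the values are passed as parameters.

```python
def build_insert(table, row):
    """Build a parameterized INSERT statement for one scraped record.

    Values are returned separately so the driver can escape them;
    table and column names are assumed to be trusted.
    """
    columns = list(row)
    column_list = ", ".join("`{}`".format(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO `{}` ({}) VALUES ({})".format(table, column_list, placeholders)
    return sql, tuple(row[c] for c in columns)


def save_row(conn, table, row):
    """Insert one record through an open pymysql connection and commit."""
    sql, params = build_insert(table, row)
    with conn.cursor() as cur:
        cur.execute(sql, params)
    conn.commit()  # pymysql does not autocommit by default
```

With the conn object from above and a hypothetical table named pages, a call would look like save_row(conn, 'pages', {'url': 'http://example.com', 'title': 'Example'}). The connection must have been opened with the database parameter for the table to be found.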

 

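Connecting to Redis:

First check that redis is installed

import redis

# host: server address; port: defaults to 6379
# If a password is set, also pass the password parameter; db selects the database and defaults to 0
conn = redis.StrictRedis(host='172.16.70.130', port=6379, decode_responses=True, db=1)
print(conn.info())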

The result is as follows (redis_version is the server version):

{'redis_version': '5.0.5', 'redis_git_sha1': 0, 'redis_git_dirty': 0, 'redis_build_id': '6a23e5766d3175f5', 'redis_mode': 'standalone', 'os': 'Linux 3.10.0-1160.21.1.el7.x86_64 x86_64', 'arch_bits': 64, 'multiplexing_api': 'epoll', 'atomicvar_api': 'atomic-builtin', 'gcc_version': '4.8.5', 'process_id': 3702, 'run_id': '0b9a27d474df47866f2615cbb4c12a157c202d57', 'tcp_port': 6379, 'uptime_in_seconds': 6837, 'uptime_in_days': 0, 'hz': 10, 'configured_hz': 10, 'lru_clock': 7365539, 'executable': '/opt/redis-5.0.5/./src/redis-server', 'config_file': '/opt/redis-5.0.5/redis.conf', 'connected_clients': 2, 'client_recent_max_input_buffer': 2, 'client_recent_max_output_buffer': 0, 'blocked_clients': 0, 'used_memory': 5019560, 'used_memory_human': '4.79M', 'used_memory_rss': 11505664, 'used_memory_rss_human': '10.97M', 'used_memory_peak': 8052504, 'used_memory_peak_human': '7.68M', 'used_memory_peak_perc': '62.34%', 'used_memory_overhead': 858080, 'used_memory_startup': 791392, 'used_memory_dataset': 4161480, 'used_memory_dataset_perc': '98.42%', 'allocator_allocated': 5438560, 'allocator_active': 6467584, 'allocator_resident': 17625088, 'total_system_memory': 1019559936, 'total_system_memory_human': '972.33M', 'used_memory_lua': 37888, 'used_memory_lua_human': '37.00K', 'used_memory_scripts': 0, 'used_memory_scripts_human': '0B', 'number_of_cached_scripts': 0, 'maxmemory': 0, 'maxmemory_human': '0B', 'maxmemory_policy': 'noeviction', 'allocator_frag_ratio': 1.19, 'allocator_frag_bytes': 1029024, 'allocator_rss_ratio': 2.73, 'allocator_rss_bytes': 11157504, 'rss_overhead_ratio': 0.65, 'rss_overhead_bytes': -6119424, 'mem_fragmentation_ratio': 2.32, 'mem_fragmentation_bytes': 6549264, 'mem_not_counted_for_evict': 0, 'mem_replication_backlog': 0, 'mem_clients_slaves': 0, 'mem_clients_normal': 66616, 'mem_aof_buffer': 0, 'mem_allocator': 'jemalloc-5.1.0', 'active_defrag_running': 0, 'lazyfree_pending_objects': 0, 'loading': 0, 'rdb_changes_since_last_save': 0, 'rdb_bgsave_in_progress': 0, 'rdb_last_save_time': 1617974663, 'rdb_last_bgsave_status': 'ok', 'rdb_last_bgsave_time_sec': 0, 'rdb_current_bgsave_time_sec': -1, 'rdb_last_cow_size': 634880,
'aof_enabled': 0, 'aof_rewrite_in_progress': 0, 'aof_rewrite_scheduled': 0, 'aof_last_rewrite_time_sec': -1, 'aof_current_rewrite_time_sec': -1, 'aof_last_bgrewrite_status': 'ok', 'aof_last_write_status': 'ok', 'aof_last_cow_size': 0, 'total_connections_received': 14, 'total_commands_processed': 60113, 'instantaneous_ops_per_sec': 0, 'total_net_input_bytes': 7993087, 'total_net_output_bytes': 9155003, 'instantaneous_input_kbps': 0.0, 'instantaneous_output_kbps': 0.0, 'rejected_connections': 0, 'sync_full': 0, 'sync_partial_ok': 0, 'sync_partial_err': 0, 'expired_keys': 0, 'expired_stale_perc': 0.0, 'expired_time_cap_reached_count': 0, 'evicted_keys': 0, 'keyspace_hits': 15, 'keyspace_misses': 5, 'pubsub_channels': 0, 'pubsub_patterns': 0, 'latest_fork_usec': 321, 'migrate_cached_sockets': 0, 'slave_expires_tracked_keys': 0, 'active_defrag_hits': 0, 'active_defrag_misses': 0, 'active_defrag_key_hits': 0, 'active_defrag_key_misses': 0, 'role': 'master', 'connected_slaves': 0, 'master_replid': '545becf1f5f2952dcf76619dbc67fe7b95a03776', 'master_replid2': 0, 'master_repl_offset': 0, 'second_repl_offset': -1, 'repl_backlog_active': 0, 'repl_backlog_size': 1048576, 'repl_backlog_first_byte_offset': 0, 'repl_backlog_histlen': 0, 'used_cpu_sys': 20.550579, 'used_cpu_user': 8.305302, 'used_cpu_sys_children': 0.058768, 'used_cpu_user_children': 0.202806, 'cluster_enabled': 0, 'db1': {'keys': 1, 'expires': 0, 'avg_ttl': 0}}
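
The dictionary returned by conn.info() is long, so for a quick connection check it may be easier to print only a few fields. The sketch below assumes the dictionary shape shown above; the helper name summarize_info and the default field selection are choices made for this example.

```python
def summarize_info(info, fields=("redis_version", "redis_mode", "tcp_port",
                                 "connected_clients", "used_memory_human", "role")):
    """Return only the selected fields from the dict produced by conn.info()."""
    # Fields missing from the server's reply are skipped instead of raising KeyError.
    return {name: info[name] for name in fields if name in info}
```

With the connection from this section, print(summarize_info(conn.info())) would show just the version, mode, port, client count, memory usage and role.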

 

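Connecting to MongoDB:

First check that pymongo is installed

import pymongo

# host: server address; port: defaults to 27017
client = pymongo.MongoClient(host='172.16.70.130', port=27017)

# If access control is enabled, you need to authenticate first.
# Database.authenticate() only exists in PyMongo 3.x; it was removed in PyMongo 4.0,
# where credentials are passed to MongoClient(username=..., password=..., authSource='admin') instead.
client['admin'].authenticate('username', 'password')

db = client['databasename']  # select the database

You can then insert or query some data to verify that the connection works.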
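
One way to do that verification is a small write, read and delete round trip. This is a sketch under assumptions: check_round_trip is a helper name invented here, and connection_test is a hypothetical scratch collection that the function creates implicitly and leaves empty.

```python
def check_round_trip(db, collection_name="connection_test"):
    """Insert a document, read it back, then delete it.

    `db` is a pymongo Database object such as client['databasename'];
    `connection_test` is a hypothetical scratch collection name.
    """
    collection = db[collection_name]
    inserted_id = collection.insert_one({"ping": "ok"}).inserted_id
    found = collection.find_one({"_id": inserted_id})
    collection.delete_one({"_id": inserted_id})  # clean up the test document
    return found is not None and found.get("ping") == "ok"
```

Calling check_round_trip(db) with the db object from above returns True when the insert and the lookup both succeed. If the server rejects the credentials or is unreachable, pymongo raises an exception instead.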

 

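Once connected, you can create databases and tables as needed to store the data. You can also save the data locally by writing to a file handle.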
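
For the local-file option, a minimal sketch is to append each scraped item as one JSON object per line. The function name save_items and the JSON Lines layout are assumptions made for this example, not something the article prescribes.

```python
import json


def save_items(path, items):
    """Append each item as one JSON object per line and return the count written."""
    count = 0
    # "a" appends, so repeated crawler runs add to the same file
    with open(path, "a", encoding="utf-8") as handle:
        for item in items:
            # ensure_ascii=False keeps non-ASCII text (e.g. Chinese titles) readable
            handle.write(json.dumps(item, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The with statement closes the file handle even if writing one of the items fails, so the lines already written are not lost.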
