news 2025/8/18 3:21:51

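Table of Contents

Requesting a Page

urllib.request.urlopen(): Making an HTTP Request

The urlopen() Function API

The data parameter: urllib.parse.urlencode() converts a dictionary into a string; data only accepts bytes

The timeout parameter: sets the timeout in seconds; if no response arrives within that time, an exception is raised

urllib.request.Request(): Adding Request Headers, POST Requests, and More

Using a Proxy

Handling Cookies

Downloading Files

About HTTPS Requests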

Requesting a Page

urllib.request.urlopen(): Making an HTTP Request

import urllib.request

html = urllib.request.urlopen('http://www.baidu.com')
# html is an HTTPResponse object
print(html)
# Response status code
print(html.status)
# Response headers
print(html.getheaders())
# Response body
print(html.read().decode('utf-8'))
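
As a small variation on the example above, the response can be used as a context manager so that it is closed automatically, and the charset can be read from the response headers instead of being hard-coded. This is only a sketch: the helper name read_text is hypothetical, and the data: URL is an assumption chosen purely so the snippet runs without network access.

```python
import urllib.request


def read_text(url, timeout=10):
    """Fetch url and decode the body using the charset declared in the headers."""
    # The response object supports the context-manager protocol.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when no charset is declared.
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset)


# A data: URL is handled locally by urllib, so no network is needed here.
print(read_text('data:text/plain;charset=utf-8,hello'))
```

The same helper should work for an http:// URL such as the Baidu address used above, since HTTPResponse exposes the same headers attribute.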

The `urlopen()` Function API

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

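The data parameter: urllib.parse.urlencode() converts a dictionary into a URL-encoded string. data only accepts bytes, so the string has to be encoded before it is passed in. When data is supplied, the request is sent as a POST.

import urllib.parse
import urllib.request

data = bytes(urllib.parse.urlencode({'aa': 'bb', 'hello': '世界'}), encoding='utf-8')
html = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(html.read().decode())
# {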
#   "args": {}, 
#   "data": "", 
#   "files": {}, 
#   "form": {
#     "aa": "bb", 
#     "hello": "\u4e16\u754c"
#   }, 
#   "headers": {
#     "Accept-Encoding": "identity", 
#     "Connection": "close", 
#     "Content-Length": "30", 
#     "Content-Type": "application/x-www-form-urlencoded", 
#     "Host": "httpbin.org", 
#     "User-Agent": "Python-urllib/3.5"
#   }, 
#   "json": null, 
#   "origin": "111.197.18.20", 
#   "url": "http://httpbin.org/post"
# }
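
To make the two conversion steps visible, the following sketch runs urlencode() and the bytes encoding separately, without sending anything. The variable names are only illustrative.

```python
import urllib.parse

form = {'aa': 'bb', 'hello': '世界'}

# urlencode() returns a str; non-ASCII values are percent-encoded as UTF-8.
encoded = urllib.parse.urlencode(form)
print(encoded)

# urlopen() needs bytes, so the string is encoded before being sent.
data = encoded.encode('utf-8')
print(type(data).__name__, len(data))
```

The encoded body is 30 bytes long, which matches the Content-Length header echoed back by httpbin.org in the response above.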

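The timeout parameter: sets the timeout in seconds. If the request takes longer than this and no response has arrived, an exception is raised. If it is not specified, the global default timeout is used.

import urllib.parse
import urllib.request

data = bytes(urllib.parse.urlencode({'aa': 'bb', 'hello': '世界'}), encoding='utf-8')
html = urllib.request.urlopen('http://httpbin.org/post', data=data, timeout=0.3)
print(html.read().decode())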
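
Since a short timeout such as 0.3 seconds will often raise an exception, it usually makes sense to catch it. The following is a minimal sketch, not a definitive pattern: the function name fetch and its opener parameter are hypothetical, and the opener parameter exists only so the function can be exercised without a real network call.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout=0.3, opener=urllib.request.urlopen):
    """Return the decoded body, or None if the request fails or times out."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.read().decode('utf-8')
    except (urllib.error.URLError, socket.timeout) as exc:
        # A timeout while connecting is wrapped in URLError; a timeout while
        # reading the response may surface as socket.timeout directly.
        print('request failed:', exc)
        return None
```

A call such as fetch('http://httpbin.org/get', timeout=0.3) would then return either the page text or None, instead of stopping the script with a traceback.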

urllib.request.Request(): Adding Request Headers, POST Requests, and More

urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

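The first parameter, url, is the URL to request. It is the only required parameter; all the others are optional.
The second parameter, data, must be of type bytes if it is passed at all. If you have a dictionary, encode it first with urlencode() from the urllib.parse module.
The third parameter, headers, is a dictionary of request headers. Headers can be supplied through this parameter when the request is constructed, or added afterwards by calling add_header() on the request instance.
The most common use of request headers is to change the User-Agent so the request looks like it comes from a browser. The default User-Agent is Python-urllib. To imitate Firefox, for example, you can set it to:
Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11
The fourth parameter, origin_req_host, is the host name or IP address of the party making the request.
The fifth parameter, unverifiable, indicates whether the request is unverifiable; the default is False. An unverifiable request is one that the user had no option to approve. For example, if we request an image embedded in an HTML document and the user had no opportunity to approve the automatic fetching of that image, unverifiable should be True.
The sixth parameter, method, is a string naming the HTTP method to use, such as GET, POST, or PUT.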

import urllib.parse
import urllib.request

url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Hosts': '123123.org',
    'test': '666'
}
# The data parameter
data = bytes(urllib.parse.urlencode({'aa': 'bb', 'hello': '世界'}), encoding='utf-8')
rs = urllib.request.Request(url, data, headers, method='POST')
response = urllib.request.urlopen(rs)
print(response.read().decode('utf-8'))
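
A Request object can also be inspected before it is sent, which is a convenient way to check how the parameters described above interact. The sketch below builds requests without opening any connection; the Referer value is an arbitrary illustrative choice.

```python
import urllib.parse
import urllib.request

url = 'http://httpbin.org/post'
data = urllib.parse.urlencode({'aa': 'bb'}).encode('utf-8')

req = urllib.request.Request(
    url,
    data=data,
    headers={'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'},
)
# Headers can also be added after the request object is created.
req.add_header('Referer', 'http://httpbin.org/')

# With data present and no explicit method, the request defaults to POST.
print(req.get_method())
# Request stores header names via str.capitalize(), hence 'User-agent'.
print(req.get_header('User-agent'))

# An explicit method overrides the default.
put_req = urllib.request.Request(url, data=data, method='PUT')
print(put_req.get_method())

# Without data, the default method is GET.
get_req = urllib.request.Request('http://httpbin.org/get')
print(get_req.get_method())
```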

Using a Proxy

import random
import re
import urllib.request

proxy = [
    {
        "ip": "112.245.189.220",
        "port": 4243,
        "expire_time": "2018-11-22 17:57:02",
        "city": "山东省滨州市",
        "isp": "联通"
    },
    {
        "ip": "113.121.156.147",
        "port": 7889,
        "expire_time": "2018-11-22 18:08:10",
        "city": "山东省泰安市",
        "isp": "电信"
    },
    {
        "ip": "112.240.176.185",
        "port": 4213,
        "expire_time": "2018-11-22 17:57:01",
        "city": "山东省荷泽市",
        "isp": "联通"
    },
    {
        "ip": "122.4.40.196",
        "port": 3937,
        "expire_time": "2018-11-22 18:07:02",
        "city": "山东省济南市",
        "isp": "电信"
    },
    {
        "ip": "122.7.135.224",
        "port": 7889,
        "expire_time": "2018-11-22 18:11:33",
        "city": "山东省泰安市",
        "isp": "电信"
    }
]


def test(ip, port):
    proxy_handler = urllib.request.ProxyHandler({
        'http': 'http://%s:%s' % (ip, port),
        'https': 'https://%s:%s' % (ip, port),
    })
    opener = urllib.request.build_opener(proxy_handler)
    response = opener.open('http://www.baidu.com/s?wd=ip')
    # Extract the IP address that the result page reports for this client
    ip = re.search(r'本机IP:&nbsp;(.*?)</td>', response.read().decode('utf-8'), re.S)
    print(ip.group(1))


# Pick a random proxy five times and check which IP the page sees
for i in range(5):
    rdm = random.choice(proxy)
    test(rdm['ip'], str(rdm['port']))

Output (some of the proxy IPs are unstable):

122.7.135.224</span>山东省泰安市 电信
122.4.40.196</span>山东省济南市 电信
Traceback (most recent call last):
  File "E:\Python\安装目录\lib\urllib\request.py", line 1254, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "E:\Python\安装目录\lib\http\client.py", line 1107, in request
    self._send_request(method, url, body, headers)
  File "E:\Python\安装目录\lib\http\client.py", line 1152, in _send_request
    self.endheaders(body)
  File "E:\Python\安装目录\lib\http\client.py", line 1103, in endheaders
    self._send_output(message_body)
  File "E:\Python\安装目录\lib\http\client.py", line 934, in _send_output
    self.send(msg)
  File "E:\Python\安装目录\lib\http\client.py", line 877, in send
    self.connect()
  File "E:\Python\安装目录\lib\http\client.py", line 849, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "E:\Python\安装目录\lib\socket.py", line 712, in create_connection
    raise err
  File "E:\Python\安装目录\lib\socket.py", line 703, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
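
Because an unstable proxy ends the script with a traceback like the one above, one possible approach is to set a timeout and move on to the next proxy when a request fails. The sketch below is an assumption-laden illustration rather than the author's method: the names build_proxy_opener and try_proxies are hypothetical, the make_opener parameter exists only to make the loop testable offline, and the proxies are assumed to be plain HTTP proxies, so the same http:// address is used for both schemes.

```python
import urllib.request


def build_proxy_opener(ip, port):
    """Return an opener that routes HTTP and HTTPS traffic through ip:port."""
    address = 'http://%s:%s' % (ip, port)
    handler = urllib.request.ProxyHandler({'http': address, 'https': address})
    return urllib.request.build_opener(handler)


def try_proxies(proxies, url, timeout=5, make_opener=build_proxy_opener):
    """Try each proxy in turn; return (proxy, body) for the first that works."""
    for p in proxies:
        opener = make_opener(p['ip'], p['port'])
        try:
            with opener.open(url, timeout=timeout) as resp:
                return p, resp.read()
        except OSError as exc:
            # URLError and socket timeouts are both subclasses of OSError.
            print('proxy %s:%s failed: %s' % (p['ip'], p['port'], exc))
    # No proxy in the list produced a response.
    return None, None
```

With the proxy list from the example, try_proxies(proxy, 'http://www.baidu.com/s?wd=ip') would return the first proxy that answers within five seconds, or (None, None) if none does.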

Handling Cookies

import http.cookiejar
import urllib.request

cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('https://www.baidu.com')
# Print every cookie the server set on this response
for item in cookie:
    print(item.name + "=" + item.value)
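
If the cookies need to survive between runs, http.cookiejar.MozillaCookieJar can write them to a file and read them back. The following sketch is offline on purpose: instead of contacting a server it adds one made-up cookie by hand (the name, value, and domain are assumptions for illustration only), saves the jar to a temporary file, and reloads it.

```python
import http.cookiejar
import os
import tempfile
import urllib.request

cookie_file = os.path.join(tempfile.mkdtemp(), 'cookies.txt')

jar = http.cookiejar.MozillaCookieJar(cookie_file)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
# opener.open('https://www.baidu.com') would fill the jar from a real response.
# To keep this sketch offline, one made-up cookie is added by hand instead.
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name='session', value='abc123',
    port=None, port_specified=False,
    domain='example.com', domain_specified=False, domain_initial_dot=False,
    path='/', path_specified=True, secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={}))

# Session cookies (discard=True) are skipped unless ignore_discard is set.
jar.save(ignore_discard=True, ignore_expires=True)

# Load the file into a fresh jar to confirm that the cookie was persisted.
restored = http.cookiejar.MozillaCookieJar(cookie_file)
restored.load(ignore_discard=True, ignore_expires=True)
for item in restored:
    print(item.name + '=' + item.value)
```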

Downloading Files

urllib.request.urlretrieve(url,filename,reporthook,data)

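filename: the local path to save the file to. If it is omitted, urllib generates a temporary file to hold the data.
reporthook: a callback function that is triggered once when the connection to the server is established and again each time a block of data has been transferred. It can be used to display the current download progress.
data: the data to POST to the server.
The function returns a two-element tuple (filename, headers), where filename is the local path the file was saved to and headers is the server's response headers.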

from urllib import request


def Schedule(a, b, c):
    '''
    a: number of data blocks downloaded so far
    b: size of each data block
    c: size of the remote file
    '''
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print('%.2f%% ' % per, a, b, c)


url = 'https://www.python.org/ftp/python/3.7.1/python-3.7.1-amd64.exe'
local = 'python-3.7.1-amd64.exe'
request.urlretrieve(url, local, Schedule)
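
One detail the callback above does not cover is a response without a Content-Length header, in which case urlretrieve passes -1 as the total size. The sketch below separates the percentage calculation from the printing so that this case can be handled; the function names percent_done and report are hypothetical.

```python
def percent_done(block_count, block_size, total_size):
    """Return download progress in percent, or None if the size is unknown."""
    if total_size <= 0:
        # urlretrieve passes -1 when the total size is not reported.
        return None
    return min(100.0, 100.0 * block_count * block_size / total_size)


def report(block_count, block_size, total_size):
    """reporthook-compatible callback that prints the progress."""
    pct = percent_done(block_count, block_size, total_size)
    print('size unknown' if pct is None else '%.2f%%' % pct)


# The hook can be exercised directly, without downloading anything.
report(5, 1024, 10240)
report(20, 1024, 10240)
report(1, 1024, -1)
```

The report function takes the same three arguments as Schedule, so it could be passed to urlretrieve in its place.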

About HTTPS Requests

from urllib import request
# Skip certificate verification for HTTPS requests
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

base_url = 'https://www.12306.cn/mormhweb/'
response = request.urlopen(base_url)
print(response.read().decode('utf-8'))
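
The approach above replaces a private function in the ssl module, which affects every HTTPS request made by the process. As an alternative sketch, an unverified SSLContext can be built explicitly and passed through the context parameter of urlopen(), so that only the chosen requests skip verification. The helper name fetch_insecure is hypothetical, and disabling verification is unsafe outside of testing.

```python
import ssl
import urllib.request

# Build a context that skips certificate checks for selected requests only.
# check_hostname must be disabled before verify_mode can be set to CERT_NONE.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE


def fetch_insecure(url, timeout=10):
    """Fetch url without verifying the server certificate."""
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return resp.read().decode('utf-8')
```

Calling fetch_insecure('https://www.12306.cn/mormhweb/') would then correspond to the request in the example above, while other HTTPS requests in the same program keep normal verification.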
