Contents
- Fetch logic
- The code
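Today I was scouring the web for kaoyan (postgraduate entrance exam) questions and found a site that hosts past kaoyan papers.
Besides registering an account, you have to click "真题" (paper) and "答案" (answer) one at a time, and once the downloads finish you still have to rename every file by hand, which is hard on the fingers.
So I poked at the site a little and found that a crawler can do the downloading... hehe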
Fetch logic
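- Get the code of each subject. The API is https://college.koolearn.com/v1/past-exam-paper/exam-courses (a POST request).
- Use the code to request one subject, then parse out the URLs of the question and answer PDFs for the last ten or so years. For example, the API for Math II (数二) is https://college.koolearn.com/v1/past-exam-paper/exam-course/122 (a POST request). This request must carry a data dictionary as form parameters (startYear and endYear in the script). On success it returns JSON that lists, for each paper, its name and the URLs of its question PDF and answer PDF.
- Download and save the PDFs.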
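The steps above can be sketched offline, without touching the network. The response shapes below are inferred only from the field names the script reads (publicCourse, majorCourse, name, code, subjectName, pastExamPapers, examPaperName, examPaperUrl, examPaperAnswerUrl). The sample values are placeholders, not real API output, and the helper names subject_url, list_courses and list_pdf_urls are hypothetical. Whether code arrives as a number or a string, and which course group Math II belongs to, are also assumptions.

```python
API_PDF = "https://college.koolearn.com/v1/past-exam-paper/exam-course/{}"


def subject_url(code):
    """Build the per-subject endpoint from a subject code (step 2)."""
    return API_PDF.format(code)


def list_courses(payload):
    """Return (name, code) pairs from a subject-list response (step 1)."""
    body = payload["data"]
    courses = (body.get("publicCourse") or []) + (body.get("majorCourse") or [])
    return [(c["name"], c["code"]) for c in courses]


def list_pdf_urls(payload):
    """Return (paper name, paper URL, answer URL) triples from a subject response."""
    body = payload["data"]
    return [
        (p["examPaperName"], p["examPaperUrl"], p["examPaperAnswerUrl"])
        for p in body["pastExamPapers"]
    ]


# Placeholder responses: shapes inferred from the script, values invented.
sample_courses = {
    "data": {
        "publicCourse": [{"name": "Math II", "code": 122}],
        "majorCourse": [],
    }
}
sample_subject = {
    "data": {
        "subjectName": "Math II",
        "pastExamPapers": [
            {
                "examPaperName": "placeholder-paper",
                "examPaperUrl": "https://example.com/paper.pdf",
                "examPaperAnswerUrl": "https://example.com/answer.pdf",
            }
        ],
    }
}

for name, code in list_courses(sample_courses):
    print(name, "->", subject_url(code))
print(list_pdf_urls(sample_subject))
```

Keeping the parsing separate from the HTTP calls makes it easy to check the logic against a saved response before running the real download.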
The code
The script below can be run as-is (it needs the requests package):
# encoding: utf-8
# @time : 2022-01-17 11:53:00
# @file : LiNianZhenTi.py
# @software : PyCharm
# @author : Ading
# blog : https://blog.csdn.net/m0_46156900
import json
import os

import requests

api_code = 'https://college.koolearn.com/v1/past-exam-paper/exam-courses'  # lists the code of every subject
api_pdf = 'https://college.koolearn.com/v1/past-exam-paper/exam-course/{}'  # lists the past papers of one subject
pdf_dir = './历年真题'  # output directory ("past exam papers")
data = {  # default form data: the range of years to request
    'endYear': '2021',
    'startYear': '2010',
}


def getJson(url, method="POST", code=None):
    """Send the request and return the response body, or None on failure."""
    try:
        if code:
            # Subject request: fill in the code and send the year range
            url = url.format(code)
            r = requests.request(method=method, url=url, timeout=5, data=data)
        else:
            # Subject-list request: no form data needed
            r = requests.request(method=method, url=url, timeout=5)
        r.raise_for_status()
        return r.text
    except requests.RequestException:
        print("Request failed:", url)
        return None


def parseJson(text):
    if text is None:  # the request failed, nothing to parse
        return
    payload = json.loads(text)['data']
    if 'publicCourse' in payload or 'majorCourse' in payload:
        # Subject list: public courses first, then major courses
        courses = (payload.get('publicCourse') or []) + (payload.get('majorCourse') or [])
        print('Subject codes:')
        for course in courses:
            print(course['name'], ':', course['code'])
        print('Starting download...')
        for course in courses:
            parseJson(getJson(api_pdf, code=course['code']))  # recurse into each subject
    else:
        print(f'Preparing to download {payload["subjectName"]}:')
        for paper in payload['pastExamPapers']:
            saveJson(paper, course=payload["subjectName"])


def saveJson(exam, course):
    folder = os.path.join(pdf_dir, course)
    os.makedirs(folder, exist_ok=True)  # create the output and subject folders
    print(f'Downloading: {exam["examPaperName"]}')
    targets = [
        (exam["examPaperUrl"], f"{exam['examPaperName']}.pdf"),  # the paper
        (exam["examPaperAnswerUrl"], f"{exam['examPaperName']}_ans.pdf"),  # its answers
    ]
    for url, name in targets:
        if not url:  # skip a missing PDF instead of writing an empty file
            continue
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()
        except requests.RequestException:
            print("Download failed:", url)
            continue
        with open(os.path.join(folder, name), 'wb') as f:
            f.write(r.content)


if __name__ == '__main__':
    codeText = getJson(url=api_code)
    parseJson(codeText)
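The script uses examPaperName directly as a file name. Whether the site ever returns names containing characters that are invalid in file names is not known. Assuming it might, a small helper like the one below could be applied to the name before opening the file. The name safe_filename and the "untitled" fallback are hypothetical and not part of the original script.

```python
import re


def safe_filename(name, replacement="_"):
    """Replace characters that Windows or POSIX reject in file names."""
    # Path separators, reserved punctuation and control characters
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', replacement, name)
    # Windows also rejects names that end in a space or a dot
    cleaned = cleaned.strip(" .")
    return cleaned or "untitled"


print(safe_filename("2021/Math II: paper?"))
```

In saveJson the call would wrap exam['examPaperName'] where the two file names are built, leaving the rest of the download logic unchanged.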
Good luck with the kaoyan, friends!