Interviewing for a front-end role at a company (bonus question, Python, part 3)
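Basic interview question (simplified version)
- Open https://mifengcha.com/coin/ethereum?tab=announcement
- Crawl the announcements on that page (data fields: exchange, announcement type, title, related coin, publication time, source link)
- Export the result as a CSV file
- Use matplotlib to plot the distribution of announcement types (a bar chart counting announcements per coin also works)
- Hint: check in the browser's developer tools whether the page loads a related JSON response
- Requirement: turn pages automatically and stop at the last page (no infinite loop)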
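The export and chart steps in the list above could be sketched roughly as follows. This is a minimal sketch, not part of the original solution: the field names in `FIELDS`, the helper names, and the sample rows are all assumptions made for illustration. The plotting helper imports matplotlib lazily so that the CSV and counting parts work without it; Chinese labels would additionally need a CJK-capable font configured in matplotlib.

```python
import csv
from collections import Counter

# Hypothetical field names, following the column order in the task statement.
FIELDS = ["exchange", "type", "title", "coin", "published_at", "source_url"]


def export_csv(rows, path):
    """Write announcement dicts to a CSV file with a header row."""
    # utf-8-sig writes a BOM, which helps Excel detect the encoding.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def count_by(rows, field):
    """Count how many announcements share each value of `field`."""
    return Counter(row[field] for row in rows)


def plot_counts(counts, title, path):
    """Draw a bar chart of the counts and save it as an image file."""
    if not counts:
        return
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    labels, values = zip(*counts.most_common())
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_title(title)
    ax.set_ylabel("announcements")
    fig.savefig(path)
    plt.close(fig)


# Made-up sample rows, only to show the expected shape of the data.
sample = [
    {"exchange": "ExchangeA", "type": "Listing", "title": "ETH/USDT listed",
     "coin": "ETH", "published_at": "2020-01-01 10:00:00", "source_url": "https://example.com/1"},
    {"exchange": "ExchangeB", "type": "Maintenance", "title": "ETH wallet maintenance",
     "coin": "ETH", "published_at": "2020-01-02 11:00:00", "source_url": "https://example.com/2"},
    {"exchange": "ExchangeA", "type": "Listing", "title": "New ETH pair",
     "coin": "ETH", "published_at": "2020-01-03 12:00:00", "source_url": "https://example.com/3"},
]
```

Calling `export_csv(sample, "notices.csv")` and then `plot_counts(count_by(sample, "type"), "Announcement types", "types.png")` would produce the two deliverables; passing `"coin"` instead of `"type"` gives the per-coin variant.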
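Advanced interview question
- Analyze the sentiment of the crawled news content and classify each item as negative, neutral, or positive
- Add a column (label) to the database table
- Store the analysis results in the database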
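One possible shape for the advanced question is sketched below. It is only an illustration under loud assumptions: the keyword lists are hypothetical and a real answer would likely use a trained sentiment model, and an in-memory `sqlite3` database stands in for MySQL so the example runs anywhere. With pymysql the placeholders would be `%s` instead of `?`, and the column check would need a MySQL-specific query instead of `PRAGMA table_info`.

```python
import sqlite3

# Hypothetical keyword lists, for illustration only.
POSITIVE_WORDS = ("上线", "开放", "恢复")   # listing, opening, resuming
NEGATIVE_WORDS = ("下架", "暂停", "风险")   # delisting, suspension, risk


def classify(title):
    """Label a title by counting positive and negative keyword hits."""
    score = (sum(word in title for word in POSITIVE_WORDS)
             - sum(word in title for word in NEGATIVE_WORDS))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def add_label_column(conn):
    """Add the label column to pydata unless it already exists."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(pydata)")]
    if "label" not in columns:
        conn.execute("ALTER TABLE pydata ADD COLUMN label VARCHAR(16)")


def label_all(conn):
    """Classify every stored title and write the result back."""
    rows = conn.execute("SELECT rowid, title FROM pydata").fetchall()
    conn.executemany(
        "UPDATE pydata SET label = ? WHERE rowid = ?",
        [(classify(title), rowid) for rowid, title in rows],
    )
    conn.commit()
```

Keeping `classify` as a pure function means the keyword rule can later be swapped for a proper model without touching the database code.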
With limited skill and time I only did the first question, and the solution is incomplete.
import requests
import json
import time
import random
import pymysql
USER_AGENTS = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.2) Gecko/2008070208 Firefox/3.0.1",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Opera/9.80 (Windows NT 5.1; U; zh-cn) Presto/2.9.168 Version/11.50",
    "Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0",
    "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.122 Safari/534.30",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 SE 2.X MetaSr 1.0",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.2) Gecko/2008070208 Firefox/3.0.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20070309 Firefox/2.0.0.3",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20070803 Firefox/1.5.0.12",
]
conn = pymysql.connect(host='localhost',
                       user='root',
                       password='xxx',
                       database='mifengcha',
                       port=3306,
                       charset='utf8mb4')
# Create a cursor
cursor = conn.cursor()
# Store one row
def insert_pydata(data):
    # Pass the values as query parameters so the driver escapes them;
    # this removes the need for manual quote escaping and avoids SQL injection
    cursor.execute(
        'insert into pydata(from_name,columns,title,coin,datetime,sourceUrl) '
        'VALUES(%s,%s,%s,%s,%s,%s)', data)
    # Commit the insert
    conn.commit()


# Fetch one page of data
def getData(timestamp):
    headers = {
        'user-agent': random.choice(USER_AGENTS),
    }
    params = (
        ('t', '41bca481b3dc4aa789231966e2040e8a'),
        ('lan', 'zh'),
        ('timestamp', timestamp),
        ('size', '15'),
    )
    response = requests.get('https://mifengcha.com/api/new/v2/notice/index',
                            headers=headers,
                            params=params,
                            timeout=10)
    json_data = json.loads(response.text)
    # A non-zero code means the request failed; signal that with None
    if json_data['code'] != 0:
        return None
    return json_data


# Parse the data page by page
def dealData(data):
    # Exit condition: the earliest announcement is dated 2014-08-24 07:20:43
    # (timestamp 1408836043000), so a page with fewer than 15 items is the last one
    last_timestamp = None
    while data is not None:
        items = data['data']['list']
        for item in items:
            from_name = item['from']
            columns = item['column']['zh']
            title = item['title']
            try:
                coin = item['coin']['symbol']
            except (KeyError, TypeError):
                coin = '无'  # no related coin
            datetime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(item['datetime'] / 1000))
            sourceUrl = item['sourceUrl']
            insert_pydata([from_name, columns, title, coin, datetime, sourceUrl])
        if len(items) < 15:
            break
        # Request the next page, starting from the last item's timestamp
        timestamp = str(items[-1]['datetime'])
        if timestamp == last_timestamp:
            break  # the cursor did not advance; stop instead of looping forever
        last_timestamp = timestamp
        print(timestamp)
        data = getData(timestamp)


if __name__ == "__main__":
    dealData(getData(''))
    # Close the cursor
    cursor.close()
    # Close the database connection
    conn.close()
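The "stop at the last page" requirement can be checked without touching the network by separating the paging logic from the HTTP call. The sketch below is an assumption-laden illustration, not the original code: `iter_pages`, `make_fake_fetch`, and the `max_pages` cap are hypothetical names, and the fake fetcher assumes the API returns only items strictly older than the given timestamp, which the real endpoint may or may not do.

```python
def iter_pages(fetch, page_size=15, max_pages=10_000):
    """Yield pages until a short page, an empty result, or a stalled cursor."""
    timestamp = ""
    for _ in range(max_pages):  # hard cap as a last line of defence
        items = fetch(timestamp)
        if not items:
            return
        yield items
        next_timestamp = str(items[-1]["datetime"])
        if len(items) < page_size or next_timestamp == timestamp:
            return
        timestamp = next_timestamp


def make_fake_fetch(total, page_size):
    """Build a stand-in for the API: `total` items with descending timestamps."""
    all_items = [{"datetime": 1_000_000 - i} for i in range(total)]

    def fetch(timestamp):
        if timestamp == "":
            older = all_items
        else:
            older = [it for it in all_items if it["datetime"] < int(timestamp)]
        return older[:page_size]

    return fetch
```

With 40 fake items and a page size of 15, `iter_pages` yields pages of 15, 15, and 10 items and then stops. In a real run, `fetch` would wrap `getData` and return the `list` field of its response, or an empty list on failure.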