Front-end interview at a certain company (bonus question: Python, part 3)

Basic interview question (simplified version)

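  1. Open https://mifengcha.com/coin/ethereum?tab=announcement
  2. Scrape the announcements listed there (data columns: exchange, announcement type, title, related coin, publication time, source link)
  3. Export the data to a CSV file
  4. Use matplotlib to plot the distribution of announcement types (a bar chart counting announcements per coin is also acceptable)
  5. Hint: inspect the front end to see whether it loads a related JSON response
  6. Requirement: turn pages automatically and stop at the last page (no infinite loop)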
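
Steps 3 and 4 can be sketched with the standard library plus matplotlib. This is only a minimal illustration, not the solution submitted for the interview: the field names in `FIELDS`, the helper names `export_csv`, `count_by` and `plot_counts`, and the sample rows are all made up here, assuming each scraped announcement has already been turned into a dict.

```python
import csv
from collections import Counter

# Hypothetical column names mirroring the six fields listed in step 2
FIELDS = ["exchange", "type", "title", "coin", "published_at", "source_url"]


def export_csv(rows, path):
    """Write announcement rows (dicts keyed by FIELDS) to a CSV file."""
    # utf-8-sig writes a BOM, which helps Excel detect UTF-8 in Chinese titles
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def count_by(rows, field):
    """Count how many rows share each value of `field`."""
    return Counter(row[field] for row in rows)


def plot_counts(counts, title, path):
    """Save a bar chart of the counts; matplotlib is imported only when called."""
    import matplotlib.pyplot as plt

    labels, values = zip(*counts.most_common())
    plt.figure(figsize=(8, 4))
    plt.bar(labels, values)
    plt.title(title)
    plt.ylabel("announcements")
    plt.tight_layout()
    plt.savefig(path)
    plt.close()


# Made-up sample rows, only to show the expected shape of the data
sample_rows = [
    {"exchange": "ExchangeA", "type": "listing", "title": "Sample title 1",
     "coin": "ETH", "published_at": "2020-11-30 10:00:00",
     "source_url": "https://example.com/1"},
    {"exchange": "ExchangeB", "type": "listing", "title": "Sample title 2",
     "coin": "ETH", "published_at": "2020-11-29 09:00:00",
     "source_url": "https://example.com/2"},
    {"exchange": "ExchangeA", "type": "maintenance", "title": "Sample title 3",
     "coin": "BTC", "published_at": "2020-11-28 08:00:00",
     "source_url": "https://example.com/3"},
]
```

Calling `export_csv(sample_rows, "notices.csv")` writes the file, and `plot_counts(count_by(sample_rows, "type"), "Announcement types", "types.png")` saves the chart. Passing `"coin"` instead of `"type"` gives the per-coin variant mentioned in step 4.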

Advanced interview question

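  1. Run sentiment analysis on the scraped news content, classifying each item as negative, neutral, or positive
  2. Add a column (label) to the database table
  3. Store the analysis results in the database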
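
The three steps above could look roughly like the sketch below. It is a toy under loud assumptions: the keyword lists are hand-made and purely illustrative (a real answer would need a proper sentiment lexicon or a trained model), only titles are classified because that is the only text the scraper keeps, and an in-memory `sqlite3` database stands in for MySQL so the example runs without a server. The function names `classify`, `add_labels` and `demo` are hypothetical.

```python
import sqlite3

# Tiny illustrative keyword lists; NOT a real sentiment lexicon
POSITIVE_WORDS = ("上线", "launch", "reward", "airdrop")
NEGATIVE_WORDS = ("下架", "暂停", "delist", "suspend")


def classify(title):
    """Return 'negative', 'neutral' or 'positive' by counting keyword hits."""
    text = title.lower()
    score = (sum(w in text for w in POSITIVE_WORDS)
             - sum(w in text for w in NEGATIVE_WORDS))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def add_labels(conn):
    """Add a label column to pydata and fill it in for every row."""
    conn.execute("ALTER TABLE pydata ADD COLUMN label TEXT")
    rows = conn.execute("SELECT rowid, title FROM pydata").fetchall()
    conn.executemany(
        "UPDATE pydata SET label = ? WHERE rowid = ?",
        [(classify(title), rowid) for rowid, title in rows],
    )
    conn.commit()


def demo():
    """Build a throwaway table with made-up titles and label it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pydata (title TEXT)")
    conn.executemany(
        "INSERT INTO pydata (title) VALUES (?)",
        [("ETH 新交易对上线",),
         ("Exchange will delist token X",),
         ("System maintenance notice",)],
    )
    add_labels(conn)
    return conn.execute(
        "SELECT title, label FROM pydata ORDER BY rowid").fetchall()
```

With pymysql the same idea applies, but the placeholders are `%s` instead of `?`, and the `UPDATE` would key on the table's own primary key rather than SQLite's `rowid`.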

With limited ability and time, I only did the first question, and the solution is not very polished.


import requests
import json
import time
import random
import pymysql
USER_AGENTS = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.2) Gecko/2008070208 Firefox/3.0.1",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Opera/9.80 (Windows NT 5.1; U; zh-cn) Presto/2.9.168 Version/11.50",
    "Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0",
    "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.122 Safari/534.30",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 SE 2.X MetaSr 1.0",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.2) Gecko/2008070208 Firefox/3.0.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20070309 Firefox/2.0.0.3",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20070803 Firefox/1.5.0.12"
]

conn = pymysql.connect(host='localhost',
                       user='root',
                       password='xxx',
                       database='mifengcha',
                       port=3306,
                       charset='utf8mb4')
# Create a cursor
cursor = conn.cursor()


PAGE_SIZE = 15  # number of announcements requested per page


# Store one row
def insert_pydata(data):
    # Parameterized query: the driver escapes the values itself, so quotes
    # in a title need no manual escaping
    cursor.execute(
        'insert into pydata(from_name,columns,title,coin,datetime,sourceUrl) '
        'VALUES(%s,%s,%s,%s,%s,%s)', data)
    # Commit the insert
    conn.commit()


# Fetch one page of data
def getData(timestamp):
    headers = {
        'user-agent': random.choice(USER_AGENTS),
    }
    params = (
        ('t', '41bca481b3dc4aa789231966e2040e8a'),
        ('lan', 'zh'),
        ('timestamp', timestamp),
        ('size', str(PAGE_SIZE)),
    )
    response = requests.get('https://mifengcha.com/api/new/v2/notice/index',
                            headers=headers,
                            params=params,
                            timeout=10)
    json_data = json.loads(response.text)
    # Treat any non-zero code as a failed request
    if json_data['code'] != 0:
        return None
    return json_data


# Parse and store one page; return the timestamp to request next,
# or None once the last page has been handled
def dealData(data):
    if data is None:
        return None
    items = data['data']['list'] or []
    for item in items:
        from_name = item['from']
        columns = item['column']['zh']
        title = item['title']
        try:
            coin = item['coin']['symbol']
        except (KeyError, TypeError):
            coin = 'N/A'  # the announcement has no related coin
        datetime = time.strftime("%Y-%m-%d %H:%M:%S",
                                 time.localtime(item['datetime'] / 1000))
        sourceUrl = item['sourceUrl']
        insert_pydata([from_name, columns, title, coin, datetime, sourceUrl])
    # Stop condition: a page shorter than PAGE_SIZE is the last one. The
    # earliest announcement is dated 2014-08-24 07:20:43, i.e. timestamp
    # 1408836043000.
    if len(items) < PAGE_SIZE:
        return None
    return str(items[-1]['datetime'])


if __name__ == "__main__":
    try:
        # Start with an empty timestamp, then page in a loop rather than
        # recursively so that many pages cannot hit the recursion limit
        timestamp = ''
        while True:
            next_timestamp = dealData(getData(timestamp))
            # Also stop if the timestamp did not advance (no infinite loop)
            if next_timestamp is None or next_timestamp == timestamp:
                break
            print(next_timestamp)
            timestamp = next_timestamp
    finally:
        # Close the cursor
        cursor.close()
        # Close the database connection
        conn.close()
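
Requirement 6 (stop at the last page, never loop forever) is the easiest part to get wrong, so here is a minimal sketch of the paging logic on its own, kept apart from the network and the database so it can be checked offline. `iter_announcements` and `make_fake_fetch` are hypothetical names. The fake API assumes that a request returns items strictly older than the given timestamp, newest first; I have not verified that the real endpoint behaves this way.

```python
def iter_announcements(fetch_page, page_size=15, max_pages=10000):
    """Yield announcements page by page until the last page is reached.

    fetch_page(timestamp) must return a list of dicts with a 'datetime'
    key, newest first, or an empty list when nothing is left.
    """
    timestamp = ""
    for _ in range(max_pages):  # hard cap so the loop can never run forever
        items = fetch_page(timestamp)
        if not items:
            return
        yield from items
        next_timestamp = str(items[-1]["datetime"])
        # A short page is the last page; an unchanged timestamp means the
        # paging made no progress, so stop in that case as well
        if len(items) < page_size or next_timestamp == timestamp:
            return
        timestamp = next_timestamp


def make_fake_fetch(total):
    """Build a stand-in for the API that serves `total` items,
    with datetimes total, total - 1, ..., 1."""
    def fake_fetch(timestamp, size=15):
        upper = int(timestamp) if timestamp else total + 1
        # Assumed behavior: items strictly older than `timestamp`, newest first
        return [{"datetime": t}
                for t in range(upper - 1, max(upper - 1 - size, 0), -1)]
    return fake_fetch
```

For example, `list(iter_announcements(make_fake_fetch(40)))` walks two full pages and one short page and then stops. In the real script, `fetch_page` would be a thin wrapper that calls `getData` and returns `data['data']['list']`.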
