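[Doing Unscientific Things with Scientific Methods series] --- Analyzing the five-million jackpot: where will the Double Color Ball (双色球) prize land? (1)
Goals:
Take a look at the various data behind the Double Color Ball lottery.
Use Alibaba Cloud PAI to analyze Double Color Ball data.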
Getting the data
Draw announcements:
http://www.cwl.gov.cn/kjxx/ssq/
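The announcement index is paginated. Judging from the URLs the scraper in this post requests, the first page is list.shtml and the following pages are list_2.shtml through list_30.shtml. A minimal sketch of that URL scheme follows; the names BASE and list_page_urls and the default of 30 pages are illustrative assumptions, and the real page count will change as new draws are published.

```python
BASE = "http://www.cwl.gov.cn/kjxx/ssq/kjgg/"


def list_page_urls(pages=30):
    """Return the index page URLs: list.shtml, list_2.shtml, ..., list_<pages>.shtml."""
    # The first page has no numeric suffix.
    urls = [BASE + "list.shtml"]
    # Later pages are numbered starting from 2.
    urls += [f"{BASE}list_{n}.shtml" for n in range(2, pages + 1)]
    return urls
```

Building the list up front keeps the page count in one place, so it is easy to adjust when the site adds pages.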
Environment setup
Install python3
Install pip
Install third-party modules
pip install beautifulsoup4
pip install requests
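A quick way to confirm the modules are available before running anything is to ask Python whether it can locate them. This is only a sketch; the helper name missing_modules is made up for illustration. Note that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# beautifulsoup4 is imported as "bs4"; requests is imported as "requests".
# An empty list means both modules are installed.
print(missing_modules(["bs4", "requests"]))
```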
Enough talk; here is the code.
import json

import requests
from bs4 import BeautifulSoup


def url_find(url):
    """Return the URLs of all draw announcements linked from one list page."""
    r = requests.get(url)
    r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, 'html.parser')
    links = []
    for a in soup.find_all('a'):
        # Announcement links contain the text "期开奖公告" (draw announcement).
        if str(a).find("期开奖公告") > 0:
            links.append("http://www.cwl.gov.cn" + a.get('href'))
    return links


def cat_text(url):
    """Scrape one announcement page and return its fields as one CSV line."""
    r = requests.get(url)
    r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, 'html.parser')

    # Text of every table cell, in page order.
    y = [td.get_text() for td in soup.find_all('td')]
    # Issue identifier, sliced out of the page heading.
    x_id = str(soup.h2.get_text())[10:17]

    # Red balls: a JSON array embedded in the script that defines khHq.
    for i in soup.find_all("script"):
        if str(i).find("var khHq") > 0:
            qiu_h = json.loads(str(i)[24:55])
    # Blue ball.
    for i in soup.find_all("span"):
        if i.get("class") == ["qiuL"]:
            qiu_l = i.get_text()
    # Winning summary: text of the first <dd> in the zjqkzy block.
    for i in soup.find_all("div"):
        if i.get("class") == ["zjqkzy"]:
            address = i.find("dd").get_text()

    # Leading fields; amounts lose the " 元" unit and thousands separators.
    head = [
        x_id,
        y[0],
        y[1].rstrip(" 元").replace(",", ""),
        y[2].rstrip(" 元").replace(",", ""),
    ]
    # Trailing fields: red balls, blue ball, winning summary.
    tail = [
        str(qiu_h).replace("[", "").replace("]", "").replace(" ", "").replace("'", ""),
        qiu_l,
        address.replace(",", "--").replace("。", "").replace("共", "").replace("注", ""),
    ]

    # The prize table layout varies between pages, so the cell offsets differ.
    if y[3] == '- 元' or y[11] == '其中:一等奖复式投注':
        middle = [
            y[9], y[10].split("(")[0],
            y[12], y[13].split("(含")[0],
            y[15].split("(")[0], y[16],
            y[18], y[19],
            y[21], y[22],
            y[24], y[25],
        ]
    else:
        # Shorter layout: two fields are left empty.
        middle = [
            y[7], y[8].split("(")[0],
            "", "",
            y[10], y[11].split("(含")[0],
            y[13].split("(")[0], y[14],
            y[16], y[17],
            y[19], y[20],
            y[22], y[23],
        ]
    return ",".join(head + middle + tail)


def save_file(line):
    """Append one line to the output file."""
    with open('./data', 'a', encoding='utf-8') as f:
        f.write(line)


# The index is paginated: list.shtml, then list_2.shtml ... list_30.shtml.
url = 'http://www.cwl.gov.cn/kjxx/ssq/kjgg/list.shtml'
url_list = url_find(url)
for i in range(29):
    url = 'http://www.cwl.gov.cn/kjxx/ssq/kjgg/list_' + str(i + 2) + '.shtml'
    url_list = url_list + url_find(url)

# Scrape every announcement and append it to ./data, one line per draw.
for i in url_list:
    save_file(str(cat_text(i)) + "\n")
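Once ./data exists, a first look at the numbers can be done in plain Python before moving on to PAI. The sketch below counts how often each ball appears. It rests on a few assumptions: every record sits on a single line, each line ends with six red balls, one blue ball, and the winning summary (the order the scraper writes them in), and the summary contains no commas. Fields are counted from the end of the line because the number of leading columns differs between page layouts. The function name count_balls is made up for illustration.

```python
from collections import Counter


def count_balls(lines):
    """Count how often each red and blue ball appears in scraped CSV lines."""
    red, blue = Counter(), Counter()
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        # Counting from the end: ..., six red balls, blue ball, winning summary.
        red.update(fields[-8:-2])
        blue[fields[-2]] += 1
    return red, blue
```

To use it on the scraped file, open ./data with encoding='utf-8', pass the file object to count_balls, and call most_common() on either counter to list the balls by frequency.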
