首页 文章 精选 留言 我的

精选列表

搜索[系统工具],共10000篇文章
优秀的个人博客,低调大师

python人工智能机器学习工具书籍: scikit-learn Cookbook 2nd Edition

简介 本书包括对机器学习中常见问题和不常见问题的演练和解决方案,以及如何利用scikit-learn有效地执行各种机器学习任务。 第二版首先介绍评估数据统计属性的方法,并为机器学习建模生成合成数据。当您逐步完成这些章节时,您会遇到一些食谱,它们将教您实施数据预处理,线性回归,逻辑回归,KNN,NaïveBayes,分类,决策树,合奏等技术。 此外,您将学习使用多级分类,交叉验证,模型评估来优化您的模型,并深入学习使用scikit-learn实现深度学习。 除了涵盖模型部分,API和分类器,回归器和估算器等新功能的增强功能外,本书还包含评估和微调模型性能的方法。在本书的最后,您将探索用于Python的scikit-learn提供的众多功能,以解决您遇到的任何机器学习问题。 参考资料 下载:https://www.jianshu.com/p/c2

优秀的个人博客,低调大师

[雪峰磁针石博客]2018最佳人工智能数据采集(爬虫)工具书下载

Python网络数据采集 Python网络数据采集 - 2016.pdf 本书采用简洁强大的Python语言,介绍了网络数据采集,并为采集新式网络中的各种数据类型提供了全面的指导。第 1部分重点介绍网络数据采集的基本原理:如何用Python从网络服务器请求信息,如何对服务器的响应进行基本处理,以及如何以自动化手段与网站进行交互。第 二部分介绍如何用网络爬虫测试网站,自动化处理,以及如何通过更多的方式接入网络。 Web Scraping with Python 2nd - 2018.pdf https://github.com/REMitchell/python-scraping 2000左右星 精通Python爬虫框架Scrapy Scrapy是使用Python开发的一个快速、高层次的屏幕抓取和Web抓取框架,用于抓Web站点并从页面中提取结构化的数据。《精通Python爬虫框架Scrapy》以Scrapy 1.0版本为基础,讲解了Scrapy的基础知识,以及如何使用Python和三方API提取、整理数据,以满足自己的需求。 本书共11章,其内容涵盖了Scrapy基础知识,理解HTML和XPath,安装Scrapy并爬取一个网站,使用爬虫填充数据库并输出到移动应用中,爬虫的强大功能,将爬虫部署到Scrapinghub云服务器,Scrapy的配置与管理,Scrapy编程,管道秘诀,理解Scrapy性能,使用Scrapyd与实时分析进行分布式爬取。本书附录还提供了各种软件的安装与故障排除等内容。本书适合软件开发人员、数据科学家,以及对自然语言处理和机器学习感兴趣的人阅读。 源码 github星级 300左右 Learning Scrapy -2016.pdf 另有中文电子版本 因为版权已经在CSDN等网站下架,可以在qq群144081101等找到。 python3爬虫基础 在线教程 https://github.com/MorvanZhou/easy-scraping-tutorial 200 左右星 First web scraper 教程:https://first-web-scraper.readthedocs.io/en/latest/ https://github.com/ireapps/first-web-scraper/blob/master/docs/index.rst 200 左右星 Practical Web Scraping for Data Science -Best Practices and Examples with Python - 2018.pdf https://github.com/Apress/practical-web-scraping-for-data-science 星级 低于100 This book provides a complete and modern guide to web scraping, using Python as the programming language, without glossing over important details or best practices. Written with a data science audience in mind, the book explores both scraping and the larger context of web technologies in which it operates, to ensure full understanding. The authors recommend web scraping as a powerful tool for any data scientist’s arsenal, as many data science projects start by obtaining an appropriate data set. Starting with a brief overview on scraping and real-life use cases, the authors explore the core concepts of HTTP, HTML, and CSS to provide a solid foundation. Along with a quick Python primer, they cover Selenium for JavaScript-heavy sites, and web crawling in detail. The book finishes with a recap of best practices and a collection of examples that bring together everything you've learned and illustrate various data science use cases. 用Python写网络爬虫 第2版 《用Python写网络爬虫(第 2版》讲解了如何使用Python来编写网络爬虫程序,内容包括网络爬虫简介,从页面中抓取数据的3种方法,提取缓存中的数据,使用多个线程和进程进行并发抓取,抓取动态页面中的内容,与表单进行交互,处理页面中的验证码问题,以及使用Scarpy和Portia进行数据抓取,并在最后介绍了使用本书讲解的数据抓取技术对几个真实的网站进行抓取的实例,旨在帮助读者活学活用书中介绍的技术。 《用Python写网络爬虫(第 2版》适合有一定Python编程经验而且对爬虫技术感兴趣的读者阅读。 Python Web Scraping 2nd Edition - 2017.pdf 第一版中文 用Python写网络爬虫.pdf https://github.com/kjam/wswp < 100星 Python Web Scraping Cookbook - 2018.pdf 下载 Python Web Scraping Cookbook is a solution-focused book that will teach you techniques to develop high-performance Scrapers, and deal with cookies, hidden form fields, Ajax-based sites and proxies. You'll explore a number of real-world scenarios where every part of the development or product life cycle will be fully covered. You will not only develop the skills to design reliable, high-performing data flows, but also deploy your codebase to Amazon Web Services (AWS). If you are involved in software engineering, product development, or data mining or in building data-driven products, you will find this book useful as each recipe has a clear purpose and objective. Right from extracting data from websites to writing a sophisticated web crawler, the book's independent recipes will be extremely helpful while on the job. This book covers Python libraries, requests, and BeautifulSoup. You will learn about crawling, web spidering, working with AJAX websites, and paginated items. You will also understand to tackle problems such as 403 errors, working with proxy, scraping images, and LXML. By the end of this book, you will be able to scrape websites more efficiently and deploy and operate your scraper in the cloud. https://github.com/PacktPublishing/Python-Web-Scraping-Cookbook < 100星 参考资料 https://github.com/lorien/awesome-web-scraping/blob/master/python.md 最好用的Python爬虫推荐 https://www.jianshu.com/p/7da43c16dd87 https://www.zhihu.com/question/41277528

优秀的个人博客,低调大师

[雪峰磁针石博客]数据分析工具pandas快速入门教程4-数据汇聚

我们需要的所有信息可能记录在单独的文件和数据帧中。例如,可能有一个公司信息单独表和股票价格表,数据被分成独立的表格以减少冗余信息。 连接 添加行 4-1.py import pandas as pd df1 = pd.read_csv('data/concat_1.csv') df2 = pd.read_csv('data/concat_2.csv') df3 = pd.read_csv('data/concat_3.csv') print(df1) print(df2) print(df3) row_concat = pd.concat([df1, df2, df3]) print(row_concat) print(row_concat.iloc[3, ]) new_row_series = pd.Series(['n1', 'n2', 'n3', 'n4']) print(pd.concat([df1, new_row_series])) new_row_df = pd.DataFrame([['n1', 'n2', 'n3', 'n4']], columns=['A', 'B', 'C', 'D']) print(new_row_df) print(pd.concat([df1, new_row_df])) print(df1.append(df2)) print(df1.append(new_row_df)) data_dict = {'A': 'n1', 'B': 'n2', 'C': 'n3', 'D': 'n4'} print(df1.append(data_dict, ignore_index=True)) row_concat_i = pd.concat([df1, df2, df3], ignore_index=True) print(row_concat_i) 执行结果 $ python3 4-1.py A B C D 0 a0 b0 c0 d0 1 a1 b1 c1 d1 2 a2 b2 c2 d2 3 a3 b3 c3 d3 A B C D 0 a4 b4 c4 d4 1 a5 b5 c5 d5 2 a6 b6 c6 d6 3 a7 b7 c7 d7 A B C D 0 a8 b8 c8 d8 1 a9 b9 c9 d9 2 a10 b10 c10 d10 3 a11 b11 c11 d11 A B C D 0 a0 b0 c0 d0 1 a1 b1 c1 d1 2 a2 b2 c2 d2 3 a3 b3 c3 d3 0 a4 b4 c4 d4 1 a5 b5 c5 d5 2 a6 b6 c6 d6 3 a7 b7 c7 d7 0 a8 b8 c8 d8 1 a9 b9 c9 d9 2 a10 b10 c10 d10 3 a11 b11 c11 d11 A a3 B b3 C c3 D d3 Name: 3, dtype: object A B C D 0 0 a0 b0 c0 d0 NaN 1 a1 b1 c1 d1 NaN 2 a2 b2 c2 d2 NaN 3 a3 b3 c3 d3 NaN 0 NaN NaN NaN NaN n1 1 NaN NaN NaN NaN n2 2 NaN NaN NaN NaN n3 3 NaN NaN NaN NaN n4 A B C D 0 n1 n2 n3 n4 A B C D 0 a0 b0 c0 d0 1 a1 b1 c1 d1 2 a2 b2 c2 d2 3 a3 b3 c3 d3 0 n1 n2 n3 n4 A B C D 0 a0 b0 c0 d0 1 a1 b1 c1 d1 2 a2 b2 c2 d2 3 a3 b3 c3 d3 0 a4 b4 c4 d4 1 a5 b5 c5 d5 2 a6 b6 c6 d6 3 a7 b7 c7 d7 A B C D 0 a0 b0 c0 d0 1 a1 b1 c1 d1 2 a2 b2 c2 d2 3 a3 b3 c3 d3 0 n1 n2 n3 n4 A B C D 0 a0 b0 c0 d0 1 a1 b1 c1 d1 2 a2 b2 c2 d2 3 a3 b3 c3 d3 4 n1 n2 n3 n4 A B C D 0 a0 b0 c0 d0 1 a1 b1 c1 d1 2 a2 b2 c2 d2 3 a3 b3 c3 d3 4 a4 b4 c4 d4 5 a5 b5 c5 d5 6 a6 b6 c6 d6 7 a7 b7 c7 d7 8 a8 b8 c8 d8 9 a9 b9 c9 d9 10 a10 b10 c10 d10 11 a11 b11 c11 d11 添加列 4-2.py In [1]: from numpy import NaN, NAN, nan In [2]: print(NaN == True, NaN == False, NaN == 0, NaN == '', sep='|') False|False|False|False In [3]: print(NaN == NaN, NaN == nan, NaN == NAN, nan == NAN, sep='|') False|False|False|False In [4]: import pandas as pd In [5]: print(pd.isnull(NaN), pd.isnull(nan), pd.isnull(NAN), sep='|') True|True|True In [6]: print(pd.notnull(NaN), pd.notnull(99), pd.notnull("https://china-testing.github.io"), sep='|') False|True|True 执行结果 $ python3 4-2.py A B C D A B C D A B C D 0 a0 b0 c0 d0 a4 b4 c4 d4 a8 b8 c8 d8 1 a1 b1 c1 d1 a5 b5 c5 d5 a9 b9 c9 d9 2 a2 b2 c2 d2 a6 b6 c6 d6 a10 b10 c10 d10 3 a3 b3 c3 d3 a7 b7 c7 d7 a11 b11 c11 d11 A A A 0 a0 a4 a8 1 a1 a5 a9 2 a2 a6 a10 3 a3 a7 a11 A B C D A B C D A B C D new_col_list 0 a0 b0 c0 d0 a4 b4 c4 d4 a8 b8 c8 d8 n1 1 a1 b1 c1 d1 a5 b5 c5 d5 a9 b9 c9 d9 n2 2 a2 b2 c2 d2 a6 b6 c6 d6 a10 b10 c10 d10 n3 3 a3 b3 c3 d3 a7 b7 c7 d7 a11 b11 c11 d11 n4 A B C D A B C D A B C D new_col_list \ 0 a0 b0 c0 d0 a4 b4 c4 d4 a8 b8 c8 d8 n1 1 a1 b1 c1 d1 a5 b5 c5 d5 a9 b9 c9 d9 n2 2 a2 b2 c2 d2 a6 b6 c6 d6 a10 b10 c10 d10 n3 3 a3 b3 c3 d3 a7 b7 c7 d7 a11 b11 c11 d11 n4 new_col_series 0 n1 1 n2 2 n3 3 n4 0 1 2 3 4 5 6 7 8 9 10 11 0 a0 b0 c0 d0 a4 b4 c4 d4 a8 b8 c8 d8 1 a1 b1 c1 d1 a5 b5 c5 d5 a9 b9 c9 d9 2 a2 b2 c2 d2 a6 b6 c6 d6 a10 b10 c10 d10 3 a3 b3 c3 d3 a7 b7 c7 d7 a11 b11 c11 d11 合并不同区间 4-3.py import pandas as pd df1 = pd.read_csv('data/concat_1.csv') df2 = pd.read_csv('data/concat_2.csv') df3 = pd.read_csv('data/concat_3.csv') df1.columns = ['A', 'B', 'C', 'D'] df2.columns = ['E', 'F', 'G', 'H'] df3.columns = ['A', 'C', 'F', 'H'] print(df1) print(df2) print(df3) row_concat = pd.concat([df1, df2, df3]) print(row_concat) print(pd.concat([df1, df2, df3], join='inner')) print(pd.concat([df1,df3], ignore_index=False, join='inner')) df1.index = [0, 1, 2, 3] df2.index = [4, 5, 6, 7] df3.index = [0, 2, 5, 7] print(df1) print(df2) print(df3) col_concat = pd.concat([df1, df2, df3], axis=1) print(col_concat) print(pd.concat([df1, df3], axis=1, join='inner')) 执行结果 $ python3 4-3.py A B C D 0 a0 b0 c0 d0 1 a1 b1 c1 d1 2 a2 b2 c2 d2 3 a3 b3 c3 d3 E F G H 0 a4 b4 c4 d4 1 a5 b5 c5 d5 2 a6 b6 c6 d6 3 a7 b7 c7 d7 A C F H 0 a8 b8 c8 d8 1 a9 b9 c9 d9 2 a10 b10 c10 d10 3 a11 b11 c11 d11 A B C D E F G H 0 a0 b0 c0 d0 NaN NaN NaN NaN 1 a1 b1 c1 d1 NaN NaN NaN NaN 2 a2 b2 c2 d2 NaN NaN NaN NaN 3 a3 b3 c3 d3 NaN NaN NaN NaN 0 NaN NaN NaN NaN a4 b4 c4 d4 1 NaN NaN NaN NaN a5 b5 c5 d5 2 NaN NaN NaN NaN a6 b6 c6 d6 3 NaN NaN NaN NaN a7 b7 c7 d7 0 a8 NaN b8 NaN NaN c8 NaN d8 1 a9 NaN b9 NaN NaN c9 NaN d9 2 a10 NaN b10 NaN NaN c10 NaN d10 3 a11 NaN b11 NaN NaN c11 NaN d11 Empty DataFrame Columns: [] Index: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3] A C 0 a0 c0 1 a1 c1 2 a2 c2 3 a3 c3 0 a8 b8 1 a9 b9 2 a10 b10 3 a11 b11 A B C D 0 a0 b0 c0 d0 1 a1 b1 c1 d1 2 a2 b2 c2 d2 3 a3 b3 c3 d3 E F G H 4 a4 b4 c4 d4 5 a5 b5 c5 d5 6 a6 b6 c6 d6 7 a7 b7 c7 d7 A C F H 0 a8 b8 c8 d8 2 a9 b9 c9 d9 5 a10 b10 c10 d10 7 a11 b11 c11 d11 A B C D E F G H A C F H 0 a0 b0 c0 d0 NaN NaN NaN NaN a8 b8 c8 d8 1 a1 b1 c1 d1 NaN NaN NaN NaN NaN NaN NaN NaN 2 a2 b2 c2 d2 NaN NaN NaN NaN a9 b9 c9 d9 3 a3 b3 c3 d3 NaN NaN NaN NaN NaN NaN NaN NaN 4 NaN NaN NaN NaN a4 b4 c4 d4 NaN NaN NaN NaN 5 NaN NaN NaN NaN a5 b5 c5 d5 a10 b10 c10 d10 6 NaN NaN NaN NaN a6 b6 c6 d6 NaN NaN NaN NaN 7 NaN NaN NaN NaN a7 b7 c7 d7 a11 b11 c11 d11 A B C D A C F H 0 a0 b0 c0 d0 a8 b8 c8 d8 2 a2 b2 c2 d2 a9 b9 c9 d9 合并多个数据集 4-4.py import pandas as pd person = pd.read_csv('data/survey_person.csv') site = pd.read_csv('data/survey_site.csv') survey = pd.read_csv('data/survey_survey.csv') visited = pd.read_csv('data/survey_visited.csv') print(person) print(site) print(survey) print(visited) visited_subset = visited.iloc[[0, 2, 6], ] o2o_merge = site.merge(visited_subset, left_on='name', right_on='site') print(o2o_merge) m2o_merge = site.merge(visited, left_on='name', right_on='site') print(m2o_merge) ps = person.merge(survey, left_on='ident', right_on='person') vs = visited.merge(survey, left_on='ident', right_on='taken') print(ps) print(vs) 执行结果 $ python3 4-4.py ident personal family 0 dyer William Dyer 1 pb Frank Pabodie 2 lake Anderson Lake 3 roe Valentina Roerich 4 danforth Frank Danforth name lat long 0 DR-1 -49.85 -128.57 1 DR-3 -47.15 -126.72 2 MSK-4 -48.87 -123.40 taken person quant reading 0 619 dyer rad 9.82 1 619 dyer sal 0.13 2 622 dyer rad 7.80 3 622 dyer sal 0.09 4 734 pb rad 8.41 5 734 lake sal 0.05 6 734 pb temp -21.50 7 735 pb rad 7.22 8 735 NaN sal 0.06 9 735 NaN temp -26.00 10 751 pb rad 4.35 11 751 pb temp -18.50 12 751 lake sal 0.10 13 752 lake rad 2.19 14 752 lake sal 0.09 15 752 lake temp -16.00 16 752 roe sal 41.60 17 837 lake rad 1.46 18 837 lake sal 0.21 19 837 roe sal 22.50 20 844 roe rad 11.25 ident site dated 0 619 DR-1 1927-02-08 1 622 DR-1 1927-02-10 2 734 DR-3 1939-01-07 3 735 DR-3 1930-01-12 4 751 DR-3 1930-02-26 5 752 DR-3 NaN 6 837 MSK-4 1932-01-14 7 844 DR-1 1932-03-22 name lat long ident site dated 0 DR-1 -49.85 -128.57 619 DR-1 1927-02-08 1 DR-3 -47.15 -126.72 734 DR-3 1939-01-07 2 MSK-4 -48.87 -123.40 837 MSK-4 1932-01-14 name lat long ident site dated 0 DR-1 -49.85 -128.57 619 DR-1 1927-02-08 1 DR-1 -49.85 -128.57 622 DR-1 1927-02-10 2 DR-1 -49.85 -128.57 844 DR-1 1932-03-22 3 DR-3 -47.15 -126.72 734 DR-3 1939-01-07 4 DR-3 -47.15 -126.72 735 DR-3 1930-01-12 5 DR-3 -47.15 -126.72 751 DR-3 1930-02-26 6 DR-3 -47.15 -126.72 752 DR-3 NaN 7 MSK-4 -48.87 -123.40 837 MSK-4 1932-01-14 ident personal family taken person quant reading 0 dyer William Dyer 619 dyer rad 9.82 1 dyer William Dyer 619 dyer sal 0.13 2 dyer William Dyer 622 dyer rad 7.80 3 dyer William Dyer 622 dyer sal 0.09 4 pb Frank Pabodie 734 pb rad 8.41 5 pb Frank Pabodie 734 pb temp -21.50 6 pb Frank Pabodie 735 pb rad 7.22 7 pb Frank Pabodie 751 pb rad 4.35 8 pb Frank Pabodie 751 pb temp -18.50 9 lake Anderson Lake 734 lake sal 0.05 10 lake Anderson Lake 751 lake sal 0.10 11 lake Anderson Lake 752 lake rad 2.19 12 lake Anderson Lake 752 lake sal 0.09 13 lake Anderson Lake 752 lake temp -16.00 14 lake Anderson Lake 837 lake rad 1.46 15 lake Anderson Lake 837 lake sal 0.21 16 roe Valentina Roerich 752 roe sal 41.60 17 roe Valentina Roerich 837 roe sal 22.50 18 roe Valentina Roerich 844 roe rad 11.25 ident site dated taken person quant reading 0 619 DR-1 1927-02-08 619 dyer rad 9.82 1 619 DR-1 1927-02-08 619 dyer sal 0.13 2 622 DR-1 1927-02-10 622 dyer rad 7.80 3 622 DR-1 1927-02-10 622 dyer sal 0.09 4 734 DR-3 1939-01-07 734 pb rad 8.41 5 734 DR-3 1939-01-07 734 lake sal 0.05 6 734 DR-3 1939-01-07 734 pb temp -21.50 7 735 DR-3 1930-01-12 735 pb rad 7.22 8 735 DR-3 1930-01-12 735 NaN sal 0.06 9 735 DR-3 1930-01-12 735 NaN temp -26.00 10 751 DR-3 1930-02-26 751 pb rad 4.35 11 751 DR-3 1930-02-26 751 pb temp -18.50 12 751 DR-3 1930-02-26 751 lake sal 0.10 13 752 DR-3 NaN 752 lake rad 2.19 14 752 DR-3 NaN 752 lake sal 0.09 15 752 DR-3 NaN 752 lake temp -16.00 16 752 DR-3 NaN 752 roe sal 41.60 17 837 MSK-4 1932-01-14 837 lake rad 1.46 18 837 MSK-4 1932-01-14 837 lake sal 0.21 19 837 MSK-4 1932-01-14 837 roe sal 22.50 20 844 DR-1 1932-03-22 844 roe rad 11.25 参考资料 技术支持qq群144081101 591302926 567351477 钉钉免费群21745728 本文最新版本地址 本文涉及的python测试开发库 谢谢点赞! 本文相关海量书籍下载 源码下载 本文英文版书籍下载

资源下载

更多资源
Mario

Mario

马里奥是站在游戏界顶峰的超人气多面角色。马里奥靠吃蘑菇成长,特征是大鼻子、头戴帽子、身穿背带裤,还留着胡子。与他的双胞胎兄弟路易基一起,长年担任任天堂的招牌角色。

Nacos

Nacos

Nacos /nɑ:kəʊs/ 是 Dynamic Naming and Configuration Service 的首字母简称,一个易于构建 AI Agent 应用的动态服务发现、配置管理和AI智能体管理平台。Nacos 致力于帮助您发现、配置和管理微服务及AI智能体应用。Nacos 提供了一组简单易用的特性集,帮助您快速实现动态服务发现、服务配置、服务元数据、流量管理。Nacos 帮助您更敏捷和容易地构建、交付和管理微服务平台。

Spring

Spring

Spring框架(Spring Framework)是由Rod Johnson于2002年提出的开源Java企业级应用框架,旨在通过使用JavaBean替代传统EJB实现方式降低企业级编程开发的复杂性。该框架基于简单性、可测试性和松耦合性设计理念,提供核心容器、应用上下文、数据访问集成等模块,支持整合Hibernate、Struts等第三方框架,其适用范围不仅限于服务器端开发,绝大多数Java应用均可从中受益。

Rocky Linux

Rocky Linux

Rocky Linux(中文名:洛基)是由Gregory Kurtzer于2020年12月发起的企业级Linux发行版,作为CentOS稳定版停止维护后与RHEL(Red Hat Enterprise Linux)完全兼容的开源替代方案,由社区拥有并管理,支持x86_64、aarch64等架构。其通过重新编译RHEL源代码提供长期稳定性,采用模块化包装和SELinux安全架构,默认包含GNOME桌面环境及XFS文件系统,支持十年生命周期更新。

用户登录
用户注册