Search results for [学习] (learning) - 低调大师 personal blog

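TensorFlow 2.6.1 released, a machine learning platform

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets developers easily build and deploy ML-powered applications.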

2021-11-02

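Tensorflow Deep Learning Algorithms Roundup (Part 2)

Continued from "Tensorflow深度学习算法整理". Recurrent neural networks. Sequential problems. Why recurrent neural networks are needed. First, let's look at what an ordinary neural network looks like: the red part is the input, for example an image; the green part is the network itself, for example the convolutional and fully connected layers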

2021-10-14

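Author: Xie Zefan
Source: https://xiezefan.me/2017/05/01/redis_in_action_ziplist/

Before discussing Redis memory compression, we need some background on a few Redis internals.

The compressed list (ziplist)

A Redis ziplist is a data structure that stores list data in one contiguous block of memory. Its layout is shown below.

[Figure: example layout of a ziplist; screenshot from 《Redis设计与实现》 (Redis Design and Implementation)]

zlbytes: the total number of bytes the ziplist occupies
zltail: the offset in bytes from the start of the ziplist to its tail entry
zllen: the number of entries. Because it occupies only 2 bytes, its maximum value is 65535; once a ziplist holds more entries than that, the length can only be obtained by traversing the whole list
zlend: the end-of-list marker, fixed at 0xFF
entry1-N: the entries themselves, each laid out as shown below

[Figure: example layout of a ziplist entry; screenshot from 《Redis设计与实现》]

previous_entry_length: the length of the previous entry
encoding: the encoding and length of content
content: the entry data

Looking up an entry works as follows:

1. Use zltail to locate the last entry.
2. Check whether the current entry is the target.
3. If it is, return its data.
4. If it is not, use previous_entry_length to compute where the previous entry starts, then go back to step 2.

From this description we can see that each update to a ziplist costs roughly O(N), because memory may have to be reallocated for N entries, and a lookup also costs O(N), since in the worst case the whole list is traversed.

When is a ziplist used?

The Redis data types that use ziplist are Hash and List. A Hash is stored as a ziplist when both of these conditions hold:

- every key and value string is shorter than 64 bytes
- the hash holds fewer than 512 key-value pairs

As soon as either condition fails, Redis automatically converts the Hash from ziplist to hashtable. Both thresholds can be changed in the configuration file through hash-max-ziplist-value and hash-max-ziplist-entries.

A List uses ziplist under the same conditions and is converted to linkedlist when they no longer hold. Its thresholds are set through list-max-ziplist-value and list-max-ziplist-entries.

Why do Hash and List use ziplist? Because:

- a ziplist uses less memory than a hashtable or a linkedlist
- data stored in one contiguous block loads into the CPU cache faster than the pointer-linked structures used by hashtable and linkedlist
- while the ziplist is short, reading and writing it is not much slower than using a hashtable or a linkedlist

In essence, ziplist trades time for space. The time cost is small enough to be negligible while the memory saving is considerable, so Redis uses ziplist as the storage structure for Hash and List whenever the conditions are met.

In practice

Let's start with the problem. In programmatic ad trading we often need to build a custom audience package for an ad campaign, stored in this form:

    audience package ID => [device ID_1, device ID_2 ... device ID_N]

The audience package ID is a Long integer, and each device ID is an MD5 digest, 32 characters long. The business needs to decide whether a device ID belongs to an audience package in order to decide whether to serve an ad.

The conventional way to store this in Redis is the standard KV structure:

    {audience package ID}_{device ID_1} => true
    {audience package ID}_{device ID_2} => true

To benefit from ziplist compression, we must keep each Hash object below 512 entries and every key and value below 64 bytes. We can do that by storing the KV data in pre-allocated buckets. Suppose the whole Redis cluster is expected to hold 1 billion entries; the number of buckets is then:

    bucket_count = 1,000,000,000 / 512 ≈ 1.95 million

So we need about 2 million buckets. (Overestimate the bucket count a little, so that buckets do not hit the threshold.) The bucket ID is computed as:

    bucket_id = CRC32(audience package ID + "_" + device ID) % 2,000,000

The data stored in Redis then becomes:

    bucket_id => {
        {audience package ID}_{device ID_1} => true
        {audience package ID}_{device ID_2} => true
    }

This keeps each bucket below 512 entries on average, with every field shorter than 64 bytes. Hashing spreads the keys evenly but does not strictly cap a bucket's size, which is why the bucket count is overestimated.

We tested with 20 million entries. Memory usage for the two approaches:

    Data set size | Storage mode   | Buckets   | Memory used | Fragmentation ratio | Memory occupied by Redis
    20 million    | ziplist        | 2 million | 928M        | 1.38                | 1.25G
    20 million    | ziplist        | 50,000    | 785M        | 1.48                | 1.14G
    20 million    | direct storage | -         | 1.44G       | 1.03                | 1.48G

One more concept is needed here, the memory fragmentation ratio:

    memory fragmentation ratio = memory the operating system allocated to Redis / memory used by the objects Redis stores

Because updating a ziplist entry often requires reallocating memory, ziplists produce a relatively high fragmentation ratio. The fragmentation ratio is an important metric when comparing technical options. There are several ways to reduce it, such as memory alignment, or more drastically rebuilding the whole Redis memory from a snapshot or from a replica.

In this experiment the stored values are fairly large (about 34 bytes per key), so the saving is not dramatic, but it still amounts to 35%-50% of memory. In production, with a compressed storage layout designed around the use case, some of our workloads save as much as 70%.

How much memory does a ziplist save?

We now know that a ziplist saves memory by packing entries tightly together. But which memory exactly does it avoid? To answer that, we first need to know how an ordinary key-value pair is stored in Redis.

    typedef struct redisObject {
        unsigned type:4;        // object type
        unsigned encoding:4;    // object encoding
        unsigned lru:LRU_BITS;  // LRU information
        int refcount;           // reference count
        void *ptr;              // pointer to the underlying data structure
    } robj;

Every Redis object is stored through this structure. If I store the key-value pair Hello=>World, then besides the key and value data themselves, Redis needs an additional 24 bytes for the redisObject. Strings are stored in the SDS structure:

    struct sdshdr8 {
        uint8_t len;          // length of the stored string
        uint8_t alloc;        // amount of memory allocated
        unsigned char flags;  // flag bits identifying the sdshdr type
        char buf[];           // byte array holding the string
    };

If the string length does not fit in an unsigned 8-bit integer, Redis uses sdshdr16, which can express larger lengths. In addition, to reduce reallocation when strings are modified, Redis preallocates memory, so saving five characters may cause Redis to preallocate 10 bytes. Storing the string Hello therefore costs at least 3 extra bytes.

So just to store the ten characters of Hello=>World, Redis needs 30 to 40 bytes of extra information, more than the data itself, and that does not yet include the memory Redis needs to maintain its dictionary.

If we store the same data in a ziplist, we need only 2 extra bytes, for previous_entry_length and encoding. The exact calculation can be found in the Redis source code or in part 1, chapter 7 (compressed lists) of 《Redis设计与实现》.

Summary

The comparison shows that the smaller the stored data, the better the compression achieved with ziplist. The side effects are just as clear: updating a ziplist is far slower than ordinary K-V storage, and it increases memory fragmentation.

When storing large amounts of data in Redis we often use small tricks like this to squeeze out as much storage capacity as possible. A follow-up article on Redis memory compression tricks is planned.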
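The bucketing scheme from the Redis article above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `bucket:` key prefix and the command tuple are hypothetical, and `zlib.crc32` stands in for whichever CRC32 implementation the author used.

```python
import zlib

# The article's estimate: 1 billion entries / 512 entries per hash, rounded up.
BUCKET_COUNT = 2_000_000


def bucket_id(package_id: int, device_id: str, bucket_count: int = BUCKET_COUNT) -> int:
    """Map an (audience package, device) pair to a bucket using CRC32 modulo the bucket count."""
    field = f"{package_id}_{device_id}"
    return zlib.crc32(field.encode("utf-8")) % bucket_count


def hset_command(package_id: int, device_id: str) -> tuple:
    """Build the HSET command that would store the pair in its bucket.

    The "bucket:" key prefix is an assumption made for this sketch.
    """
    field = f"{package_id}_{device_id}"
    key = f"bucket:{bucket_id(package_id, device_id)}"
    return ("HSET", key, field, "true")


# Example with a made-up package ID and a 32-character MD5-style device ID.
example = hset_command(10001, "0123456789abcdef0123456789abcdef")
```

The field is the package ID, an underscore and a 32-character digest, which keeps it below the 64-byte threshold. The value `"true"` is short enough that the ziplist encoding is preserved as long as a bucket stays below 512 entries.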

2021-07-09

Elasticsearch 7.x Study Notes

Reposted from: 阅读原文 (read the original)

Contents

1. Single-node installation
2. Installing the head plugin for ES
3. Elasticsearch REST basics: introduction to REST; creating an index with curl; querying an index with GET; DSL queries; MGET queries; using HEAD; updating; deleting; bulk operations; version control
4. Elasticsearch core concepts: Cluster; Shards; Replicas; Recovery; Gateway; Discovery.zen; Transport; Create Index; Mapping
5. Elasticsearch Java client: Java High Level REST Client; Document APIs (Index API, Get API, Exists API, Update API, Delete API, Bulk API, Multi-Get API); SearchType (Query Then Fetch; Dfs, Query Then Fetch); Query; Aggregations; pagination; multi-index and multi-type queries; fast queries
Elasticsearch index module: components of the index module; integrating the IK Chinese analysis plugin; custom IK dictionaries; hot-updating IK dictionaries
ES cluster installation and deployment: cluster planning; cluster installation; X-Pack installation; Kibana installation
ES tuning: avoiding cluster split-brain; raising the open file limit; sizing JVM memory; locking physical memory; choosing the shard count; choosing the replica count; merging indices; closing indices; purging deleted documents; importing data sensibly; the _all setting; the _source setting; keeping versions consistent

Software versions

    jdk-8u192-linux-x64.tar.gz
    elasticsearch-7.1.0-linux-x86_64.tar.gz

1. Single-node installation

Download:

    wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.1.0-linux-x86_64.tar.gz

Extract:

    # tar -zxvf elasticsearch-7.1.0-linux-x86_64.tar.gz -C /opt/tools/elk/
    # tar -zxvf jdk-8u192-linux-x64.tar.gz -C /opt/tools/

Configure environment variables (as root):

    # vi /etc/profile
    # set jdk path
    export JAVA_HOME=/opt/tools/jdk1.8.0
    # set es path
    export ES_HOME=/opt/tools/elk/elasticsearch-7.1.0
    export PATH=$PATH:$JAVA_HOME/bin:$ES_HOME/bin

Apply the configuration and check it:

    # source /etc/profile
    # echo $JAVA_HOME
    /opt/tools/jdk1.8.0
    # echo $ES_HOME
    /opt/tools/elk/elasticsearch-7.1.0

Edit the configuration file $ES_HOME/config/elasticsearch.yml:

    network.host: 192.168.93.252    # IP of the current host

Start:

    # bin/elasticsearch
    # bin/elasticsearch -d    (start in the background)

Errors

Problem 1

    [o.e.b.ElasticsearchUncaughtExceptionHandler] [bigdatademo] uncaught exception in thread [main]
    org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root

Cause: elasticsearch cannot be started as the root account.
Solution: create an ordinary user.

    # useradd yskang
    # passwd yskang
    Changing password for user yskang.
    New password:
    BAD PASSWORD: it is too simplistic/systematic
    BAD PASSWORD: is too simple
    Retype new password:
    passwd: all authentication tokens updated successfully.

Grant ownership:

    # chown -R yskang:yskang /opt/*

Then switch to the yskang user and start elasticsearch again.

To give the ordinary user root privileges through sudo (use which to find where a command lives), edit the sudoers file as root:

    # visudo

Add the following:

    ## Allow root to run any commands anywhere
    root    ALL=(ALL)    ALL
    yskang  ALL=(ALL)    NOPASSWD: ALL

Field meanings: user name, host (IP or network segment) = (identity to run as; optional, defaults to root), commands that may be run.

Usage:

    $ sudo service iptables status

Problem 2

    java.lang.UnsupportedOperationException: seccomp unavailable: requires kernel 3.5+ with CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER compiled in
        at org.elasticsearch.bootstrap.SystemCallFilter

Cause: this prints a long stack trace, but it is only a warning, caused by the Linux kernel being too old.
Solution: (1) install a newer Linux release, or (2) ignore it; the warning does not affect operation.

Problem 3

    ERROR: bootstrap checks failed
    [1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]

Cause: the maximum number of files the user may open is too small.
Solution: switch to root, edit /etc/security/limits.conf and append:

    # vi /etc/security/limits.conf
    * soft nofile 65536
    * hard nofile 262144
    * soft nproc 32000
    * hard nproc 32000

Problem 4

    [2]: max number of threads [1024] for user [yskang] is too low, increase to at least [4096]

Cause: the maximum number of threads the user may create is too small.
Solution: switch to root and edit /etc/security/limits.d/90-nproc.conf:

    # vi /etc/security/limits.d/90-nproc.conf
    find:      * soft nproc 1024
    change to: * soft nproc 4096

Problem 5

    [3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Cause: the maximum number of virtual memory areas is too small.
Solution: switch to root, edit /etc/sysctl.conf and append:

    vm.max_map_count=655360

After saving, run the following to apply the setting, then restart:

    # sysctl -p

Problem 6

    [4]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

Cause: CentOS 6 does not support SecComp, while ES checks it by default because bootstrap.system_call_filter is true. The check fails, and ES then cannot start.
Solution: add the following to elasticsearch.yml:

    $ vi elasticsearch.yml
    bootstrap.memory_lock: false          # allow the ES node to swap memory
    bootstrap.system_call_filter: false   # disable the system call filter

Problem 7

    [5]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

Cause: the default discovery settings are not suitable for production; at least one of discovery.seed_hosts, discovery.seed_providers and cluster.initial_master_nodes must be configured.
Solution: edit elasticsearch.yml, uncomment

    #cluster.initial_master_nodes: ["node-1", "node-2"]

and change it to

    cluster.initial_master_nodes: ["bigdatademo"]

Here bigdatademo is the host name of the current node. Remember to save.

Once startup completes, verify that the service is running:

    $ curl http://192.168.93.252:9200
    {
      "name" : "bigdatademo",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "_na_",
      "version" : {
        "number" : "7.1.0",
        "build_flavor" : "default",
        "build_type" : "tar",
        "build_hash" : "606a173",
        "build_date" : "2019-05-16T00:43:15.323135Z",
        "build_snapshot" : false,
        "lucene_version" : "8.0.0",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }

2. Installing the head plugin for ES

Install nodejs and npm:

    # yum -y install nodejs npm

Running yum install -y nodejs directly reports that the nodejs package cannot be found. Install nodesource first, then run it again; npm is installed along with nodejs:

    $ curl --silent --location https://rpm.nodesource.com/setup_10.x | sudo bash -
    $ sudo yum -y install nodejs

Check the versions:

    $ node -v
    v10.16.0
    $ npm -v
    6.9.0

Install git:

    $ sudo yum -y install git

Download head:

    $ git clone git://github.com/mobz/elasticsearch-head.git
    $ cd elasticsearch-head
    $ npm install

If npm install fails with Error: CERT_UNTRUSTED, it is an SSL verification problem; disabling SSL verification resolves it:

    npm config set strict-ssl false

Configure the head plugin

Edit Gruntfile.js and add the hostname: '*' setting:

    $ vi Gruntfile.js
    connect: {
        server: {
            options: {
                port: 9100,
                base: '.',
                keepalive: true,
                hostname: '*'
            }
        }
    }

Edit head/_site/app.js and change the address head uses to reach ES (replace localhost with this machine's IP address):

    $ vi app.js
    this.base_uri = this.config.base_uri || this.prefs.get("app-base_uri") || "http://192.168.93.252:9200";

ES configuration

Edit elasticsearch.yml and add the CORS settings (ES must be restarted for them to take effect):

    $ vi config/elasticsearch.yml
    http.cors.enabled: true
    http.cors.allow-origin: "*"

Start the head plugin and check the process:

    $ cd elasticsearch-head/node_modules/grunt/
    $ bin/grunt server &
    $ netstat -ntlp

View the ES cluster status in head: http://192.168.93.252:9100

3. Elasticsearch REST basics

Introduction to REST

Definition. REST (Representational State Transfer) is a software architecture style proposed by Dr. Roy Fielding in his doctoral dissertation in 2000. It is an approach to designing and developing networked applications that reduces development complexity and improves scalability. REST refers to a set of architectural constraints and principles; an application or design that satisfies them is a RESTful web application.

The most important REST principle is that interactions between client and server are stateless between requests. Every request from client to server must contain all the information needed to understand it. If the server restarts at any point between requests, the client is not notified. Stateless requests can also be answered by any available server, which suits environments such as cloud computing. Clients can cache data to improve performance.

On the server side, application state and functionality are divided into resources. Each resource gets a unique address through a URI (Uniform Resource Identifier). All resources share a uniform interface for transferring state between client and server, using the standard HTTP methods such as GET, PUT, POST and DELETE.

REST resources

    Resource | GET | PUT | POST | DELETE
    URL of a collection, e.g. http://example.com/products/ | list the URLs | replace the current collection with the given set of resources | create or append a new resource in the collection | delete the whole collection
    URL of a single resource, e.g. http://example.com/products/1234 | get the details of the resource | replace or create the resource | create or append a new element under the resource | delete the element

Basic REST operations

    Method | Purpose
    GET    | get the current state of an object
    PUT    | change the state of an object
    POST   | create an object
    DELETE | delete an object
    HEAD   | get header information

Commonly used built-in ES REST endpoints

    URL              | Description
    /index/_search   | search the data in the given index
    /_aliases        | get or manage index aliases
    /index/          | view details of the given index
    /index/type/     | create or operate on a type
    /index/_mapping  | create or operate on the mapping
    /index/_settings | create or operate on the settings
    /index/_open     | open the given index
    /index/_close    | close the given index
    /index/_refresh  | refresh the index (makes newly added content visible to search; does not guarantee the data is written to disk)
    /index/_flush    | flush the index (triggers a Lucene commit)

The curl command

curl can be thought of as a tool for accessing URLs from the command line. It is an open source file transfer tool that works with URL syntax, and it makes ordinary GET/POST requests easy.

    -X    the HTTP method to use: GET, POST, PUT, DELETE
    -d    the data to send

Creating an index with curl

Example, with index name test:

    $ curl -XPUT 'http://192.168.93.252:9200/test/'

Use PUT (older ES versions also accepted POST). The following response means the index was created:

    {"acknowledged":true,"shards_acknowledged":true,"index":"test"}

Create a document:

    $ curl -XPOST http://192.168.93.252:9200/test/user/1 -d'{"name":"jack","age":26}'
    {"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}

Recent ES versions require the Content-Type header and report an error without it; older versions do not require it.

    $ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test/user/1 -d'{"name":"jack","age":26}'
    {"_index":"test","_type":"user","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
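The Content-Type requirement shown in the document-creation example above can be reproduced from a script. The sketch below uses only Python's standard library and builds the request without sending it. The helper name `build_index_request` is hypothetical; the host, index, type and document values are taken from the curl example.

```python
import json
import urllib.request


def build_index_request(base_url: str, index: str, doc_type: str, doc_id, document: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that indexes one document.

    The explicit Content-Type header is what recent ES versions insist on;
    without it the server answers with a 406 error.
    """
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_index_request(
    "http://192.168.93.252:9200", "test", "user", 1, {"name": "jack", "age": 26}
)
# To actually send it (requires a reachable ES node):
#     with urllib.request.urlopen(req) as resp:
#         print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL, method and headers before touching a live cluster.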
PUT versus POST

PUT is idempotent and POST is not, so PUT suits updates and POST suits creation. PUT and DELETE are idempotent: however many times the operation is performed, the result is the same. POST is not idempotent, so repeating it causes duplicates: sending the same POST request several times creates several resources.

Creation can use either POST or PUT. The difference is that POST acts on a collection resource (/articles) while PUT acts on a specific resource (/articles/123). Many resources, for example, use a database auto-increment key as their identifier, and only the server can supply the identifier of the resource being created; in that case POST must be used.

Notes on creating an index

- The index name must be all lowercase, must not start with an underscore, and must not contain commas.
- If no document ID is given, ES generates a random ID automatically; this requires POST:

    $ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test/user/ -d'{"name":"john","age":18}'

Two ways to create brand-new content

(1) Use an auto-generated ID (POST):

    $ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test/user/ -d'{"name":"john","age":18}'

(2) Add a parameter to the URL:

    $ curl -H "Content-Type: application/json" -XPUT http://192.168.93.252:9200/test/user/2?op_type=create -d'{"name":"lucy","age":15}'
    $ curl -H "Content-Type: application/json" -XPUT http://192.168.93.252:9200/test/user/3/_create -d'{"name":"alan","age":58}'

Querying an index with GET

(1) Query by ID:

    $ curl -XGET http://192.168.93.252:9200/test/user/1
    {"_index":"test","_type":"user","_id":"1","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{"name":"jack","age":26}}

(2) Adding the pretty parameter to any query string makes ES return easy-to-read JSON.

Retrieve only part of a document, showing only the named fields:

    $ curl -XGET 'http://192.168.93.252:9200/test/user/1?_source=name&pretty'
    {
      "_index" : "test",
      "_type" : "user",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 0,
      "_primary_term" : 1,
      "found" : true,
      "_source" : { "name" : "jack" }
    }

Query all data of a given type in a given index:

    $ curl -XGET http://192.168.93.252:9200/test/user/_search?pretty
    {
      "took" : 80,
      "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 4, "relation" : "eq" },
        "max_score" : 1.0,
        "hits" : [
          { "_index" : "test", "_type" : "user", "_id" : "1", "_score" : 1.0, "_source" : { "name" : "jack", "age" : 26 } },
          { "_index" : "test", "_type" : "user", "_id" : "zeQEIWsBWJbm70w3S4EC", "_score" : 1.0, "_source" : { "name" : "john", "age" : 18 } },
          { "_index" : "test", "_type" : "user", "_id" : "2", "_score" : 1.0, "_source" : { "name" : "lucy", "age" : 15 } },
          { "_index" : "test", "_type" : "user", "_id" : "3", "_score" : 1.0, "_source" : { "name" : "alan", "age" : 58 } }
        ]
      }
    }

(3) Query by condition:

    $ curl -XGET 'http://192.168.93.252:9200/test/user/_search?q=name:john&pretty=true'

or

    $ curl -XGET 'http://192.168.93.252:9200/test/user/_search?q=name:john&pretty'
    {
      "took" : 5,
      "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 1, "relation" : "eq" },
        "max_score" : 1.2039728,
        "hits" : [
          { "_index" : "test", "_type" : "user", "_id" : "zeQEIWsBWJbm70w3S4EC", "_score" : 1.2039728, "_source" : { "name" : "john", "age" : 18 } }
        ]
      }
    }

DSL queries

DSL stands for Domain Specific Language. First add a new document:

    $ curl -H "Content-Type: application/json" -XPUT http://192.168.93.252:9200/test/user/4/_create -d'{"name":"zhangsan","age":18}'
    {"_index":"test","_type":"user","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4,"_primary_term":1}

Then query it:

    $ curl -H "Content-Type: application/json" -XGET http://192.168.93.252:9200/test/user/_search -d'{"query":{"match":{"name":"zhangsan"}}}'
    {"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":1.3862944,"hits":[{"_index":"test","_type":"user","_id":"4","_score":1.3862944,"_source":{"name":"zhangsan","age":18}}]}}

MGET queries

The mget API fetches several documents at once. First create a second index, test2:

    $ curl -XPUT 'http://192.168.93.252:9200/test2/'
    $ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test2/user/1 -d'{"name":"marry","age":16}'
    $ curl -H "Content-Type: application/json" -XGET http://192.168.93.252:9200/_mget?pretty -d'{"docs":[{"_index":"test","_type":"user","_id":2,"_source":"name"},{"_index":"test2","_type":"user","_id":1}]}'
    {
      "docs" : [
        { "_index" : "test", "_type" : "user", "_id" : "2", "_version" : 1, "_seq_no" : 2, "_primary_term" : 1, "found" : true, "_source" : { "name" : "lucy" } },
        { "_index" : "test2", "_type" : "user", "_id" : "1", "_version" : 1, "_seq_no" : 0, "_primary_term" : 1, "found" : true, "_source" : { "name" : "marry", "age" : 16 } }
      ]
    }

If the documents you need are in the same _index or the same _type, you can give a default /_index or /_index/_type in the URL:

    $ curl -H "Content-Type: application/json" -XGET http://192.168.93.252:9200/test/user/_mget?pretty -d'{"docs":[{"_id":1},{"_id":2}]}'
    {
      "docs" : [
        { "_index" : "test", "_type" : "user", "_id" : "2", "_version" : 1, "_seq_no" : 2, "_primary_term" : 1, "found" : true, "_source" : { "name" : "lucy", "age" : 15 } },
        { "_index" : "test", "_type" : "user", "_id" : "1", "_version" : 1, "_seq_no" : 0, "_primary_term" : 1, "found" : true, "_source" : { "name" : "jack", "age" : 26 } }
      ]
    }

If all the documents share the same _index and _type, simply put an ids array in the request:

    $ curl -H "Content-Type: application/json" -XGET http://192.168.93.252:9200/test/user/_mget?pretty -d'{"ids":["1","2"]}'
    {
      "docs" : [
        { "_index" : "test", "_type" : "user", "_id" : "1", "_version" : 1, "_seq_no" : 0, "_primary_term" : 1, "found" : true, "_source" : { "name" : "jack", "age" : 26 } },
        { "_index" : "test", "_type" : "user", "_id" : "2", "_version" : 1, "_seq_no" : 2, "_primary_term" : 1, "found" : true, "_source" : { "name" : "lucy", "age" : 15 } }
      ]
    }

Using HEAD

If you only want to check whether a document exists, use HEAD instead of GET; only the HTTP headers are returned.

    $ curl -i -XHEAD http://192.168.93.252:9200/test/user/1
    HTTP/1.1 200 OK
    Warning: 299 Elasticsearch-7.1.0-606a173 "[types removal] Specifying types in document get requests is deprecated, use the /{index}/_doc/{id} endpoint instead."
    content-type: application/json; charset=UTF-8
    content-length: 133

    $ curl -i -XHEAD http://192.168.93.252:9200/test/user/5
    HTTP/1.1 404 Not Found
    Warning: 299 Elasticsearch-7.1.0-606a173 "[types removal] Specifying types in document get requests is deprecated, use the /{index}/_doc/{id} endpoint instead."
    content-type: application/json; charset=UTF-8
    content-length: 56

Updating in ES

ES can update (fully replace) a document with PUT or POST; if a document with the given ID already exists, an update is performed.

Note: when ES performs an update it first marks the old document as deleted and then adds the new document. The old document does not disappear immediately, but you can no longer access it. ES cleans up documents marked as deleted in the background as you keep adding data.

A partial update can add new fields or change existing ones (POST is required):

    $ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test/user/1/_update -d'{"doc":{"name":"baby","age":18}}'
    {"_index":"test","_type":"user","_id":"1","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":5,"_primary_term":1}

    $ curl -XGET http://192.168.93.252:9200/test/user/1?pretty
    {
      "_index" : "test",
      "_type" : "user",
      "_id" : "1",
      "_version" : 2,
      "_seq_no" : 5,
      "_primary_term" : 1,
      "found" : true,
      "_source" : { "name" : "baby", "age" : 18 }
    }

Deleting in ES

ES deletes documents with DELETE:

    $ curl -XDELETE http://192.168.93.252:9200/test/user/1
    {"_index":"test","_type":"user","_id":"1","_version":3,"result":"deleted","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":6,"_primary_term":1}

If the document exists, result is deleted and _version is incremented by 1.

    $ curl -XDELETE http://192.168.93.252:9200/test/user/1
    {"_index":"test","_type":"user","_id":"1","_version":4,"result":"not_found","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":11,"_primary_term":1}

If the document does not exist, result is not_found, but _version is still incremented by 1. This is part of ES's internal bookkeeping; it ensures that the order of different operations across nodes is recorded correctly.

    $ curl -XGET http://192.168.93.252:9200/test/user/1
    {"_index":"test","_type":"user","_id":"1","found":false}

Note: a delete does not take effect immediately either; the document is only marked as deleted. ES cleans up deleted content in the background later, as you index more data.

Bulk operations in ES

The bulk API lets us execute several requests at once. Format:

    action: index / create / update / delete
    metadata: _index, _type, _id
    request body: _source (not needed for delete)

    {action:{metadata}}
    {request body}
    {action:{metadata}}
    {request body}

Difference between create and index: if the document already exists, create fails with a message that the document already exists, while index succeeds.

Using a file: create a file named requests:

    $ vi requests
    {"index":{"_index":"test","_type":"user","_id":"6"}}
    {"name":"mayun","age":51}
    {"update":{"_index":"test","_type":"user","_id":"6"}}
    {"doc":{"age":52}}

Run the bulk operation:

    $ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/_bulk --data-binary @requests
    {"took":31,"errors":false,"items":[{"index":{"_index":"test","_type":"user","_id":"6","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":12,"_primary_term":1,"status":201}},{"update":{"_index":"test","_type":"user","_id":"6","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":13,"_primary_term":1,"status":200}}]}

    $ curl -XGET http://192.168.93.252:9200/test/user/6?pretty
    {
      "_index" : "test",
      "_type" : "user",
      "_id" : "6",
      "_version" : 2,
      "_seq_no" : 13,
      "_primary_term" : 1,
      "found" : true,
      "_source" : { "name" : "mayun", "age" : 52 }
    }

A bulk request can declare /_index or /_index/_type in the URL.

How much data can one bulk request handle?

(1) bulk loads the data to be processed into memory, so the amount is limited.
(2) There is no single best size; it depends on your hardware, on the size and complexity of your documents, and on your indexing and search load.
(3) The usual recommendation is 1000-5000 documents per request, fewer if your documents are large. The recommended request size is 5-15MB. By default a request cannot exceed 100MB; this value can be changed in the ES configuration file:

    http.max_content_length: 100mb

    http.max_content_length: The max content of an HTTP request. Defaults to 100mb.
(4) Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-http.html

Version control in ES

(1) Ordinary relational databases use pessimistic concurrency control (PCC): before modifying a row we lock it, which ensures that only the thread that read the data can modify that row.

(2) ES uses optimistic concurrency control (OCC): ES does not block access to any data. However, if the underlying data changes between our read and our write, the update fails, and the application decides how to handle the conflict. It can re-read the new data and retry the update, or report the situation directly to the user.

(3) How ES implements version control (using the internal version number). First fetch the document to be modified and read its version number (_version):

    $ curl -XGET http://192.168.93.252:9200/test/user/2
    {"_index":"test","_type":"user","_id":"2","_version":1,"_seq_no":2,"_primary_term":1,"found":true,"_source":{"name":"lucy","age":15}}

Then pass the version number along with the update:

    $ curl -H "Content-Type: application/json" -XPOST http://192.168.93.252:9200/test/user/2/_update?version=1 -d'{"doc":{"age":30}}'
    {"_index":"test","_type":"user","_id":"2","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":14,"_primary_term":1}

    $ curl -H "Content-Type: application/json" -XPUT http://192.168.93.252:9200/test/user/2?version=2 -d'{"name":"joy","age":20}'

If the version number passed differs from the version of the document being updated, the update fails.

4. Elasticsearch core concepts

Cluster

(1) A cluster contains multiple nodes, one of which is the master node. The master is chosen by election, and the master/non-master distinction only matters inside the cluster. One of ES's design ideas is decentralization.
(2) The master node manages the cluster state, including the state of shards and replicas and the discovery and removal of nodes.
(3) Note: the master node does not handle create, read, update or delete requests on data; it only maintains the cluster state information.

View the cluster state: http://192.168.93.252:9200/_cluster/health?pretty

    {
      cluster_name: "elasticsearch",
      status: "yellow",
      timed_out: false,
      number_of_nodes: 1,
      number_of_data_nodes: 1,
      active_primary_shards: 2,
      active_shards: 2,
      relocating_shards: 0,
      initializing_shards: 0,
      unassigned_shards: 2,
      delayed_unassigned_shards: 0,
      number_of_pending_tasks: 0,
      number_of_in_flight_fetch: 0,
      task_max_waiting_in_queue_millis: 0,
      active_shards_percent_as_number: 50
    }

Shards

(1) A shard is a slice of an index. ES can split a complete index into multiple shards, so that a large index is partitioned horizontally and distributed across nodes, forming a distributed search that improves performance and throughput.
(2) The number of shards can only be specified when the index is created and cannot be changed afterwards.

    curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test3/' -d'{"settings":{"number_of_shards":3}}'

Before 7.0 an index had 5 shards by default; in this version, 7.1.0, the default is one shard and one replica. Each shard can hold at most 2,147,483,519 documents.

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-concepts.html

    To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index will have primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards).

    The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may also change the number of replicas dynamically anytime. You can change the number of shards for an existing index using the _shrink and _split APIs, however this is not a trivial task and pre-planning for the correct number of shards is the optimal approach.

    By default, each index in Elasticsearch is allocated one primary shard and one replica which means that if you have at least two nodes in your cluster, your index will have one primary shard and another replica shard (one complete replica) for a total of two shards per index.

    Each Elasticsearch shard is a Lucene index. There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.

Replicas

A replica is a copy of an index shard; ES can assign replicas to each shard. Replicas serve two purposes:

- they improve fault tolerance: when a shard on some node is damaged or lost, it can be recovered from a replica
- they improve query efficiency: ES automatically load-balances search requests across copies

The number of replicas can be changed at any time, and can be specified when the index is created:

    curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test3/' -d'{"settings":{"number_of_replicas":3}}'

By default each shard has 1 replica: index.number_of_replicas: 1

Note: a primary shard and its replica are never placed on the same node.

Recovery

Recovery means data recovery, or data redistribution. When a node joins or leaves, ES reallocates index shards according to machine load; a failed node also recovers its data when it restarts.

Gateway

The gateway is how ES persists indices. By default ES first keeps the index in memory and persists it to disk when memory fills up. When the cluster is shut down and restarted, the index data is read back from the gateway. ES supports several gateway types: the local file system (default), distributed file systems, Hadoop HDFS, and Amazon S3 cloud storage.

Discovery.zen

This is ES's automatic node discovery mechanism. ES is a p2p-based system: it first finds existing nodes by broadcast, then uses multicast for communication between nodes, and it also supports point-to-point interaction.

How do nodes on different network segments form an ES cluster? Disable automatic discovery:

    discovery.zen.ping.multicast.enabled: false

and set the list of master nodes a newly started node can discover:

    discovery.zen.ping.unicast.hosts: ["192.168.93.252","192.168.93.251","192.168.93.250"]

Transport

Transport is how ES nodes interact with each other and how the cluster interacts with clients. Internally TCP is used by default; ES also supports HTTP (JSON format), thrift, servlet, memcached, zeroMQ and other transport protocols, integrated as plugins.

Create Index

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html

The Create Index API is used to create an index in Elasticsearch manually. Every document in Elasticsearch is stored in some index. The most basic command is:

    PUT twitter

This creates an index named twitter with all default settings.

Index naming restrictions

(1) Lowercase only
(2) Cannot include \, /, *, ?, ", <, >, |, space, comma or #
(3) Indices created before 7.0 could contain a colon (:), but this is deprecated and not supported in 7.0+
(4) Cannot start with -, _ or +
(5) Cannot be . or ..
(6) Cannot be longer than 255 bytes (bytes, not characters, so multi-byte characters count toward the 255 limit faster)

Index Settings

Every index created can have specific settings associated with it, for example:

    PUT twitter
    {
        "settings" : {
            "index" : {
                "number_of_shards" : 3,
                "number_of_replicas" : 2
            }
        }
    }

number_of_shards defaults to 1 and number_of_replicas defaults to 1. The request can also be simplified:

    PUT twitter
    {
        "settings" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 2
        }
    }

Note: you do not have to specify the index section explicitly inside settings.

View an index's settings:

    curl -XGET http://192.168.93.252:9200/test/_settings?pretty

Operating on a nonexistent index (create):

    curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test4/' -d'{"settings":{"number_of_shards":3,"number_of_replicas":2}}'

Operating on an existing index (modify):

    curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test4/_settings' -d'{"index":{"number_of_replicas":2}}'

Mapping

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

Mapping is the process of defining how a document and the fields it contains are stored and indexed. For example, a mapping defines:

(1) which string fields should be treated as full-text fields
(2) which fields contain numbers, dates or geolocations
(3) the format of date values
(4) custom rules controlling the mapping of dynamically added fields

Mapping Type

Each index has one mapping type, which determines how documents are indexed. A mapping type has:

(1) Meta-fields. Meta-fields customize how a document's associated metadata is handled. Examples are a document's _index, _type, _id and _source fields.
(2) Fields or properties. A mapping type contains a list of fields or properties relevant to the document.

Each field has a data type, which can be:

(1) a simple type such as text, keyword, date, long, double, boolean or ip
(2) a type that supports the hierarchical nature of JSON, such as object or nested
(3) a specialised type such as geo_point, geo_shape or completion

A mapping can be specified when the index is created:

    PUT my_index
    {
      "mappings": {
        "properties": {
          "title":   { "type": "text" },
          "name":    { "type": "text" },
          "age":     { "type": "integer" },
          "created": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          }
        }
      }
    }

Explanation: this creates an index named my_index and specifies the fields or properties in its mapping. The title field contains text values, the name field contains text values, the age field contains integer values, and the created field contains date values in either of two possible formats.

Query an index's mapping:

    curl -XGET http://192.168.93.252:9200/test/user/_mapping?pretty
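The index naming restrictions listed in the Create Index section above are easy to check before sending a create request. Below is a minimal Python sketch of such a check. The function name and the wording of the messages are hypothetical, and the rules are only those stated in these notes, not a complete copy of the validation Elasticsearch itself performs.

```python
from typing import List

# Characters the notes list as forbidden, plus ':' which 7.0+ no longer supports.
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')


def index_name_problems(name: str) -> List[str]:
    """Return the naming rules (as listed in these notes) that `name` violates."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    if any(ch in FORBIDDEN_CHARS for ch in name):
        problems.append("contains a forbidden character")
    if name.startswith(("-", "_", "+")):
        problems.append("cannot start with -, _ or +")
    if name in (".", ".."):
        problems.append("cannot be . or ..")
    # The limit is in bytes, so multi-byte characters use it up faster.
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems


print(index_name_problems("twitter"))   # no violations
print(index_name_problems("_Test,1"))   # several violations
```

Returning every violated rule, rather than stopping at the first, gives the caller a complete message to show.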
Operating on a nonexistent index (create):

    curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test5/' -d'{"mappings": {"user": {"properties": { "name": { "type": "text"},"age": { "type": "integer" }}}}}'

Operating on an existing index (modify):

    curl -H "Content-Type:application/json" -XPUT 'http://192.168.93.252:9200/test5/user/_mapping' -d'{"properties": { "name": { "type": "text"},"age": { "type": "integer" }}}'

5. Elasticsearch Java client

Java High Level REST Client

Java API 7.1, official documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html

Add the Maven dependencies:

    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-api</artifactId>
        <version>5.4.2</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>7.1.0</version>
    </dependency>

Ports

ES listens on two ports when it starts: 9300 and 9200. What is the difference?

9300 is the TCP port. Nodes in an ES cluster communicate with each other over TCP on 9300, and the Java TransportClient also talks to the cluster over TCP on port 9300.
9200 is the HTTP port. It is mainly used for external communication; external clients access ES through the RESTful interface.

Official introduction: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.1/java-rest-high.html

The Java High Level REST Client is still built on the low-level client internally. It provides more APIs, accepts request objects as arguments and returns response objects, and handles encoding and decoding itself. Each API can be called synchronously or asynchronously. The synchronous methods return a response object. The asynchronous methods have names ending in the async suffix and take a listener argument that is notified (on the thread pool managed by the low-level client) once a response or an error is received.

The Java High Level REST Client depends on the Elasticsearch core project. It accepts the same request arguments as TransportClient and returns the same response objects.

Compatibility

Official introduction: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.1/java-rest-high-compatibility.html

The Java High Level REST Client requires Java 1.8 and depends on the Elasticsearch core project. The client version is the same as the Elasticsearch version the client was developed for.

The high-level client is guaranteed to be able to communicate with any Elasticsearch node running on the same major version and a greater or equal minor version. It does not need to be on the same minor version as the Elasticsearch nodes it communicates with, because it is forward compatible: it supports communicating with later versions of Elasticsearch than the one it was developed for.

Initialization

Official introduction: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.1/java-rest-high-getting-started-initialization.html

Connecting to the ES cluster: a RestHighLevelClient instance is built from a low-level client builder, as follows:

    RestHighLevelClient client = new RestHighLevelClient(
            RestClient.builder(
                    new HttpHost("localhost", 9200, "http"),
                    new HttpHost("localhost", 9201, "http")));

The high-level client internally creates the low-level client used to execute requests, based on the provided builder, and manages its lifecycle. The low-level client maintains a connection pool and starts some threads, so when you are done with the high-level client you should close it; it will in turn close the internal low-level client and release those resources. This is done with the close() method:

    client.close();

Test

    /**
     * Test: connect to the ElasticSearch cluster with RestHighLevelClient
     * @throws IOException
     */
    @Test
    public void test() throws IOException {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("192.168.93.252", 9200, "http")));
        GetRequest getRequest = new GetRequest("test", "5");
        boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
        if (exists) {
            System.out.println("document exists");
        } else {
            System.out.println("document does not exist");
        }
        client.close();
    }

Document APIs

Index API

Official introduction: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.1/java-rest-high-document-index.html

The index API accepts the document source as JSON, a Map, an XContentBuilder, or Object key pairs.

    import java.io.IOException;
    import java.util.Date;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.http.HttpHost;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.action.index.IndexResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.xcontent.XContentBuilder;
    import org.elasticsearch.common.xcontent.XContentFactory;
    import org.elasticsearch.common.xcontent.XContentType;
    import org.junit.jupiter.api.BeforeEach;
    import org.junit.jupiter.api.Test;

    public class ESDocumentAPIsTest {

        public RestHighLevelClient client;

        /**
         * Connect to the ElasticSearch cluster with RestHighLevelClient before each test
         * @throws IOException
         */
        @BeforeEach
        public void test() throws IOException {
            client = new RestHighLevelClient(
                    RestClient.builder(
                            new HttpHost("192.168.93.252", 9200, "http")));
        }

        /**
         * index API: JSON string
         * @throws IOException
         */
        @Test
        public void testIndexforJson() throws IOException {
            IndexRequest request = new IndexRequest("posts"); // index
            request.id("1");                                  // document ID for the request
            String jsonString = "{" +
                    "\"user\":\"kimchy\"," +
                    "\"postDate\":\"2019-06-10\"," +
                    "\"message\":\"trying out Elasticsearch\"" +
                    "}";
            request.source(jsonString, XContentType.JSON);    // document source provided as a string
            // execute
            IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
            System.out.println(indexResponse.getId());
            client.close();
        }

        /**
         * index API: Map
         * @throws IOException
         */
        @Test
        public void testIndexforMap() throws IOException {
            Map<String, Object> jsonMap = new HashMap<>();
            jsonMap.put("user", "kimchy");
            jsonMap.put("postDate", new Date());
            jsonMap.put("message", "trying out Elasticsearch");
            // document source provided as a Map, converted to JSON automatically
            IndexRequest request = new IndexRequest("posts").id("2").source(jsonMap);
            // execute
            IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
            System.out.println(indexResponse.getId());
            client.close();
        }

        /**
         * index API: XContentBuilder
         * @throws IOException
         */
        @Test
        public void testIndexforXContentBuilder() throws IOException {
            XContentBuilder builder = XContentFactory.jsonBuilder();
            builder.startObject();
            {
                builder.field("user", "kimchy");
                builder.timeField("postDate", new Date());
                builder.field("message", "trying out Elasticsearch");
            }
            builder.endObject();
            // document source provided as an XContentBuilder; Elasticsearch's built-in helper generates the JSON
            IndexRequest request = new IndexRequest("posts").id("3").source(builder);
            // execute
            IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
            System.out.println(indexResponse.getId());
            client.close();
        }

        /**
         * index API: Object key pairs
         * @throws IOException
         */
        @Test
        public void testIndexforObject() throws IOException {
            // document source provided as Object key pairs, converted to JSON
            IndexRequest request = new IndexRequest("posts")
                    .id("4")
                    .source("user", "kimchy",
                            "postDate", new Date(),
                            "message", "trying out Elasticsearch");
            // execute
            IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
            System.out.println(indexResponse.getId());
            client.close();
        }
    }

Get API

Official introduction: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.1/java-rest-high-document-get.html

    /**
     * get API
     * @throws IOException
     */
    @Test
    public void testGetforSynchronous() throws IOException {
        GetRequest getRequest = new GetRequest("posts", "1");
        // synchronous execution
        GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
        System.out.println(getResponse.getSource());
        // result: {postDate=2019-06-10, message=trying out Elasticsearch, user=kimchy}
        client.close();
    }

Exists API

Official introduction: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.1/java-rest-high-document-exists.html

    /**
     * exists API
     * @throws IOException
     */
    @Test
    public void testExists() throws IOException {
        GetRequest getRequest = new GetRequest("posts", "5");
        getRequest.fetchSourceContext(new FetchSourceContext(false)); // do not fetch _source
        getRequest.storedFields("_none_");                            // do not fetch stored fields
        // synchronous execution
        boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
        if (exists) {
            System.out.println("exists");
        } else {
            System.out.println("does not exist");
        }
        client.close();
    }

Update API

Official introduction: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.1/java-rest-high-document-update.html

The update API accepts the partial document as JSON, a Map, an XContentBuilder, or Object key pairs.

    /**
     * Update API: JSON string
     * @throws IOException
     */
    @Test
    public void testUpdateforJson() throws IOException {
        UpdateRequest request = new UpdateRequest("posts", "1");
        String jsonString = "{" +
                "\"updated\":\"2019-06-09\"," +
                "\"reason\":\"daily update\"" +
                "}";
        request.doc(jsonString, XContentType.JSON);
        // synchronous execution
        UpdateResponse updateResponse = client.update(request, RequestOptions.DEFAULT);
        System.out.println(updateResponse.getResult());
        client.close();
    }

    /**
     * Update API: Map
     * @throws IOException
     */
    @Test
    public void testUpdateforMap() throws IOException {
        Map<String, Object> jsonMap = new HashMap<>();
        jsonMap.put("updated", new Date());
        jsonMap.put("reason", "daily update");
        UpdateRequest request = new UpdateRequest("posts", "1").doc(jsonMap);
        // synchronous execution
        UpdateResponse updateResponse = client.update(request, RequestOptions.DEFAULT);
        System.out.println(updateResponse.getResult());
        client.close();
    }

    /**
     * Update API: XContentBuilder
     * @throws IOException
     */
    @Test
    public void
testUpdateforXContentBuilder() throws IOException { XContentBuilder builder = XContentFactory.jsonBuilder(); builder.startObject(); { builder.timeField("updated", new Date()); builder.field("reason", "daily update"); } builder.endObject(); UpdateRequest request = new UpdateRequest("posts", "1").doc(builder); //同步执行 UpdateResponse updateResponse = client.update( request, RequestOptions.DEFAULT); System.out.println(updateResponse.getResult()); client.close(); } /** * Update API Object * @throws IOException */ @Test public void testUpdateforObject() throws IOException { UpdateRequest request = new UpdateRequest("posts", "1") .doc("updated", new Date(), "reason", "daily update"); //同步执行 UpdateResponse updateResponse = client.update(request, RequestOptions.DEFAULT); System.out.println(updateResponse.getResult()); client.close(); } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 Delete API 官网介绍：https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.1/java-rest-high-document-delete.html 删除delete API /** * Delete API * @throws IOException */ @Test public void testDelete() throws IOException { DeleteRequest request = new DeleteRequest("posts","1"); //同步执行 DeleteResponse deleteResponse = client.delete( request, RequestOptions.DEFAULT); client.close(); } 1 2 3 4 5 6 7 8 9 10 11 12 13 Bulk API 官网介绍：https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.1/java-rest-high-document-bulk.html 批量操作bulk API /** * Bulk API 批量操作 * @throws IOException */ @Test public void testBulk() throws IOException { BulkRequest request = new BulkRequest(); request.add(new IndexRequest("posts").id("5") .source(XContentType.JSON,"field", "foo")); request.add(new IndexRequest("posts").id("6") .source(XContentType.JSON,"field", "bar")); request.add(new IndexRequest("posts").id("7") 
.source(XContentType.JSON,"field", "baz")); //可以将不同的操作类型添加到同一BulkRequest中 /* request.add(new DeleteRequest("posts", "3")); request.add(new UpdateRequest("posts", "2") .doc(XContentType.JSON,"other", "test")); request.add(new IndexRequest("posts").id("4") .source(XContentType.JSON,"field", "baz")); */ //同步执行 BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT); for (BulkItemResponse bulkItemResponse : bulkResponse) { if (bulkItemResponse.isFailed()) { BulkItemResponse.Failure failure = bulkItemResponse.getFailure(); System.out.println(failure.getMessage()); } } client.close(); } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 Multi-Get API 官网介绍：https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.1/java-rest-high-document-multi-get.html 多条数据查询Multi-Get API multiGet API并行地在单个http请求中执行多个get请求 /** * Multi-Get API 多条数据查询 * @throws IOException */ @Test public void testMultiGet() throws IOException { MultiGetRequest request = new MultiGetRequest(); request.add(new MultiGetRequest.Item("test","2")); request.add(new MultiGetRequest.Item("test", "3")); request.add(new MultiGetRequest.Item("test", "4")); //同步执行 MultiGetResponse responses = client.mget(request, RequestOptions.DEFAULT); for (MultiGetItemResponse itemResponse : responses) { GetResponse response = itemResponse.getResponse(); if (response.isExists()) { String json = response.getSourceAsString(); //System.out.println(json); Map<String, Object> sourceAsMap = response.getSourceAsMap(); System.out.println(sourceAsMap); } } client.close(); } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 SearchType 官网介绍：https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html 
Problems with ES search

(1) Number of results returned
Suppose the data is spread over 5 shards (the default before ES 7.0; since 7.0 an index defaults to 1 primary shard). ES sends the request to all 5 shards at once and each shard returns 10 documents, so 5 * 10 = 50 documents come back in total, far more than the user asked for.
Solution:
Step 1: collect the ids of the matching documents from every shard, rank them, and keep the top 10.
Step 2: use those 10 ids to fetch the documents from the shards that hold them.

(2) Ranking of results
Each shard computes its top 10 documents using only its own data: the term frequency and document frequency that feed the score are local to the shard. ES then produces the overall ranking from these per-shard scores. Because the scoring basis differs from shard to shard, the merged ranking is not accurate.
Solution: use the same scoring basis on every shard.

SearchType 1: query and fetch (fetch: retrieve)
How it works: a query is sent to every shard of the index, and each shard returns the documents together with their computed ranking information.
Advantage: this is the fastest mode, because unlike the modes below it only visits each shard once.
Disadvantage: the total number of results returned by the shards can be n times the size the user requested.

Query Then Fetch

SearchType 2: query then fetch
How it works:
Step 1: send the request to all shards; each shard returns only document ids (not the documents themselves) and ranking information (the score of each document).
Step 2: re-sort and rank by the scores returned from the shards and keep the top size documents.
Step 3: fetch those documents from the shards that hold them.
Advantage: the number of results returned is exact.
Disadvantage: the ranking is not accurate and performance is average.

SearchType 3: dfs, query and fetch
This mode adds a DFS step to type 1, which gives more precise control over scoring and ranking.
How it works:
Step 1: send a request to all shards and gather the term frequencies, document frequencies and other scoring inputs from all of them in one place.
Step 2: carry out the remaining steps as in query and fetch.
Advantage: the ranking is accurate.
Disadvantage: performance is average and the number of results is not exact; up to (N * number of shards) results may be returned.

Dfs, Query Then Fetch

SearchType 4: dfs, query then fetch
This mode adds a DFS step to type 2.
How it works:
Step 1: send a request to all shards and gather the term frequencies, document frequencies and other scoring inputs from all of them in one place.
Step 2: carry out the remaining steps as in query then fetch.
Advantage: both the number of results and the ranking are accurate.
Disadvantage: the worst performance (worst only among these four modes; it is still tolerable, and worth considering when accuracy matters more than query speed).

Summary:
(1) Performance: query_and_fetch is the fastest and dfs_query_then_fetch is the slowest.
(2) Accuracy: the DFS modes are more accurate than the non-DFS modes.
(3) From ES 6.x on, only types 2 and 4 remain (query then fetch and dfs, query then fetch).

SearchType example

Official documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.1/java-rest-high-search.html

    @Test
    public void testSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest();
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.termQuery("user", "kimchy"));
        searchRequest.indices("posts");
        searchRequest.source(sourceBuilder);
        searchRequest.searchType(SearchType.DFS_QUERY_THEN_FETCH);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        SearchHits hits = searchResponse.getHits();
        System.out.println(hits.getTotalHits());
        client.close();
    }

Query

Pagination: from/size
Sorting: sort
Filtering: filter
Score explanation: explain (returns, for each hit, how its relevance score was computed; it does not change the ordering)

Test data

    curl -XPUT 'http://192.168.93.252:9200/test6/'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/1 -d'{"name":"tom","age":18,"info":"tom"}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/2 -d'{"name":"jack","age":29,"info":"jack"}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/3 -d'{"name":"jessica","age":18,"info":"jessica"}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/4 -d'{"name":"dave","age":19,"info":"dave"}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/5 -d'{"name":"lilei","age":18,"info":"lilei"}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/6 -d'{"name":"lili","age":18,"info":"lili"}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/7 -d'{"name":"tom","age":29,"info":"tom"}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/8 -d'{"name":"tom1","age":16,"info":"tom1"}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/9 -d'{"name":"tom2","age":38,"info":"tom2"}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/10 -d'{"name":"tom3","age":28,"info":"tom3"}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/11 -d'{"name":"tom4","age":35,"info":"tom4"}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test6/user/12 -d'{"name":"tom5","age":24,"info":"tom5"}'

Test code

    @Test
    public void testQuery() throws IOException {
        SearchRequest searchRequest = new SearchRequest();
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder
                //.query(QueryBuilders.matchQuery("info", "marry and john"))
                //.query(QueryBuilders.matchAllQuery())
                //.query(QueryBuilders.multiMatchQuery("john", "name", "info")) // match on several fields
                //.query(QueryBuilders.queryStringQuery("name:tom*"))           // wildcard match
                .query(QueryBuilders.termQuery("name", "tom"))                  // exact term match
                .from(0)                                                        // pagination
                .size(2)
                .sort("age", SortOrder.ASC)                                     // sorting
                .postFilter(QueryBuilders.rangeQuery("age").from(20).to(40))    // filter applied after the query
                .explain(true)                                                  // include score explanations in the hits
        ;
        searchRequest.indices("test6");
        searchRequest.source(sourceBuilder);
        searchRequest.searchType(SearchType.DFS_QUERY_THEN_FETCH);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        SearchHits hits = searchResponse.getHits();
        SearchHit[] hitsArray = hits.getHits();
        for (SearchHit searchHit : hitsArray) {
            System.out.println(searchHit.getSourceAsString());
        }
        client.close();
    }

Aggregations

Aggregations group documents by a field and compute statistics, either counting the groups or aggregating the values of other fields within each group.

Aggregations example 1: count the students of each age

    Name      Age
    tom       18
    jack      29
    jessica   18
    dave      19
    lilei     18
    lili      29

Test code

    @Test
    public void test1() throws IOException {
        SearchRequest searchRequest = new SearchRequest();
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // group by age and count
        TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("by_age").field("age");
        sourceBuilder
                .query(QueryBuilders.matchAllQuery())
                .aggregation(aggregationBuilder);
        searchRequest.indices("test6"); // index to search
        searchRequest.source(sourceBuilder);
        searchRequest.searchType(SearchType.DFS_QUERY_THEN_FETCH);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        // read the buckets of the terms aggregation
        Terms terms = searchResponse.getAggregations().get("by_age");
        for (Terms.Bucket tmp : terms.getBuckets()) {
            Object key = tmp.getKey();
            long docCount = tmp.getDocCount();
            System.out.println(key + " @ " + docCount);
        }
        client.close();
    }

Aggregations example 2: compute each student's total score

    Name      Subject   Score
    tom       Chinese   59
    tom       Math      89
    jack      Chinese   78
    jack      Math      85
    jessica   Chinese   97
    jessica   Math      68

Test data

    curl -XPUT 'http://192.168.93.252:9200/test7/'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test7/class/1 -d'{"name":"tom","type":"chinese","score":59}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test7/class/2 -d'{"name":"tom","type":"math","score":89}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test7/class/3 -d'{"name":"tom","type":"english","score":90}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test7/class/4 -d'{"name":"jack","type":"chinese","score":78}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test7/class/5 -d'{"name":"jack","type":"math","score":85}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test7/class/6 -d'{"name":"jack","type":"english","score":80}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test7/class/7 -d'{"name":"jessica","type":"chinese","score":97}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test7/class/8 -d'{"name":"jessica","type":"math","score":68}'
    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test7/class/9 -d'{"name":"jessica","type":"english","score":85}'

Test code

    @Test
    public void test2() throws IOException {
        SearchRequest searchRequest = new SearchRequest();
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // group by name, then sum the scores inside each group
        TermsAggregationBuilder nameAggregation = AggregationBuilders.terms("by_name").field("name.keyword");
        SumAggregationBuilder scoreAggregation = AggregationBuilders.sum("by_score").field("score");
        // nest the sum under the terms aggregation
        nameAggregation.subAggregation(scoreAggregation);
        sourceBuilder
                .query(QueryBuilders.matchAllQuery())
                .aggregation(nameAggregation);
        searchRequest.indices("test7"); // index to search
        searchRequest.source(sourceBuilder);
        searchRequest.searchType(SearchType.DFS_QUERY_THEN_FETCH);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        // read the buckets of the terms aggregation
        Terms terms = searchResponse.getAggregations().get("by_name");
        for (Terms.Bucket tmp : terms.getBuckets()) {
            Object key = tmp.getKey();
            Sum sum = tmp.getAggregations().get("by_score");
            System.out.println(key + " : " + sum.getValue());
        }
        client.close();
    }

Error

    ElasticsearchStatusException[Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.]];

Cause
Fielddata is disabled on text fields by default, because it can consume a lot of heap space, especially when loading high-cardinality text fields. Once fielddata has been loaded into the heap it stays there for the lifetime of the segment. Loading fielddata is also an expensive process that can cause users to experience latency hits. That is why fielddata is disabled by default.
Official reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html

Fix
Set fielddata=true on [name] so that fielddata is loaded into memory by uninverting the inverted index; note that this can use a significant amount of memory. Alternatively, use a keyword field, for example:

    curl -H "Content-Type:application/json" -XPOST 'http://192.168.93.252:9200/test7/class/_mapping' -d'{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword"}}}}}'

or

    curl -H "Content-Type:application/json" -XPOST http://192.168.93.252:9200/test7/class/_mapping -d'{"properties": {"name": { "type": "text","fielddata": true}}}'

Pagination

SQL paginates with limit m,n (m: the result to start from; n: the size, that is, how many results to return).
ES uses the two parameters from and size:
from: the result to start from, default 0
size: how many results to return, default 10
For example, with 5 results per page, the requests for pages 1 to 3 are:

    GET /_search?size=5
    GET /_search?size=5&from=5
    GET /_search?size=5&from=10

Note: do not request too many results at once or a very deep page; this puts heavy load on the server, because results are sorted before they are returned. A request passes through several shards, each shard produces its own sorted results, and these are then merged centrally to make sure the final result is correct.

Multi-index and multi-type query example

    /**
     * Querying several indices (and types) at once.
     */
    @Test
    public void test3() throws IOException {
        SearchRequest searchRequest = new SearchRequest();
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.matchAllQuery())
                .from(0)
                .size(100);
        //searchRequest.indices("test");                   // a single index
        //searchRequest.indices("test", "test2", "posts"); // several indices
        searchRequest.indices("test*");                    // wildcard match
        searchRequest.source(sourceBuilder);
        searchRequest.searchType(SearchType.DFS_QUERY_THEN_FETCH);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        SearchHits hits = searchResponse.getHits();
        SearchHit[] hitsArray = hits.getHits();
        for (SearchHit searchHit : hitsArray) {
            System.out.println(searchHit.getSourceAsString());
        }
        client.close();
    }

Fast queries with routing

ES stores data in different shards and uses an internal algorithm on the document id to decide which shard a document goes to. If a query is sent only to the shard that holds the data, it can be answered very quickly; knowing which shard the data is on is the key.
Implementation: use the routing parameter so that related documents are stored in the same shard, setRouting("") (the code below uses the routing(...) method of the request objects).
See org.elasticsearch.cluster.routing.OperationRouting, method indexShards.

Bulk-inserting the test data

    /**
     * Fast queries: bulk-insert the test data, routed by the first three digits of the phone number.
     */
    @Test
    public void test4() throws IOException {
        Map<String, Object> jsonMap = new HashMap<>();
        jsonMap.put("phone", "13322225555");
        jsonMap.put("name", "zhangsan");
        jsonMap.put("sex", "M");
        jsonMap.put("age", "20");

        Map<String, Object> jsonMap2 = new HashMap<>();
        jsonMap2.put("phone", "13410252356");
        jsonMap2.put("name", "lisi");
        jsonMap2.put("sex", "M");
        jsonMap2.put("age", "28");

        Map<String, Object> jsonMap3 = new HashMap<>();
        jsonMap3.put("phone", "18844662587");
        jsonMap3.put("name", "wangwu");
        jsonMap3.put("sex", "F");
        jsonMap3.put("age", "18");

        Map<String, Object> jsonMap4 = new HashMap<>();
        jsonMap4.put("phone", "16655882345");
        jsonMap4.put("name", "zhaoliu");
        jsonMap4.put("sex", "F");
        jsonMap4.put("age", "24");

        Map<String, Object> jsonMap5 = new HashMap<>();
        jsonMap5.put("phone", "18563248923");
        jsonMap5.put("name", "tianqi");
        jsonMap5.put("sex", "F");
        jsonMap5.put("age", "22");

        Map<String, Object> jsonMap6 = new HashMap<>();
        jsonMap6.put("phone", "18325684532");
        jsonMap6.put("name", "qiba");
        jsonMap6.put("sex", "M");
        jsonMap6.put("age", "26");

        // bulk insert
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.add(new IndexRequest("test8").routing(jsonMap.get("phone").toString().substring(0, 3)).source(jsonMap))
                .add(new IndexRequest("test8").routing(jsonMap2.get("phone").toString().substring(0, 3)).source(jsonMap2))
                .add(new IndexRequest("test8").routing(jsonMap3.get("phone").toString().substring(0, 3)).source(jsonMap3))
                .add(new IndexRequest("test8").routing(jsonMap4.get("phone").toString().substring(0, 3)).source(jsonMap4))
                .add(new IndexRequest("test8").routing(jsonMap5.get("phone").toString().substring(0, 3)).source(jsonMap5))
                .add(new IndexRequest("test8").routing(jsonMap6.get("phone").toString().substring(0, 3)).source(jsonMap6));
        // synchronous execution
        BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
        for (BulkItemResponse bulkItemResponse : bulkResponse) {
            if (bulkItemResponse.isFailed()) {
                BulkItemResponse.Failure failure = bulkItemResponse.getFailure();
                System.out.println(failure.getMessage());
            }
        }
        client.close();
    }

Fast query through routing

    /**
     * Fast query through routing: only the shard selected by the routing value is searched.
     */
    @Test
    public void test5() throws IOException {
        SearchRequest searchRequest = new SearchRequest();
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.matchAllQuery())
                .from(0)
                .size(10);
        searchRequest.indices("test8");
        searchRequest.source(sourceBuilder);
        searchRequest.searchType(SearchType.DFS_QUERY_THEN_FETCH);
        searchRequest.routing("18325684532".substring(0, 3));
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        SearchHits hits = searchResponse.getHits();
        SearchHit[] hitsArray = hits.getHits();
        for (SearchHit searchHit : hitsArray) {
            System.out.println(searchHit.getSourceAsString());
        }
        client.close();
    }

ElasticSearch index module

Components of the index module

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html

(1) Index analysis module: Analysis
An analyzer consists of character filters, a tokenizer, and token filters.
Character Filters: an analyzer may have zero or more character filters, applied in order.
Tokenizers: an analyzer must have exactly one tokenizer.
Token Filters: an analyzer may have zero or more token filters, applied in order.
(2) Index building module: Indexer
During indexing, the analyzed documents are added to the index.

Integrating the IK Chinese analysis plugin

Download the IK plugin for ES: https://github.com/medcl/elasticsearch-analysis-ik
Build it from source with Maven (this can be done on Windows):

    mvn clean package

Upload the file under releases to the ES plugin directory and unzip it:

    cp /opt/tools/elk/es-ik-7.1.0/target/releases/elasticsearch-analysis-ik-7.1.0.zip ES_HOME/plugins/ik
    unzip elasticsearch-analysis-ik-7.1.0.zip

Restart the ES service.

Custom IK dictionary

Testing a new word:

    curl -H "Content-Type: application/json" -XGET http://192.168.93.252:9200/_analyze?pretty -d'{"analyzer":"ik_max_word","text":"蓝瘦香菇"}'

    {
      "tokens" : [
        {
          "token" : "蓝",
          "start_offset" : 0,
          "end_offset" : 1,
          "type" : "CN_CHAR",
          "position" : 0
        },
        {
          "token" : "瘦",
          "start_offset" : 1,
          "end_offset" : 2,
          "type" : "CN_CHAR",
          "position" : 1
        },
        {
          "token" : "香菇",
          "start_offset" : 2,
          "end_offset" : 4,
          "type" : "CN_WORD"

Article contents
1. Single-node installation
2. Installing the ES head plugin
3. Elasticsearch REST basics: Introduction to REST; Creating an index with CURL; Querying an index with GET; DSL queries; MGET queries; Using HEAD; Updating an index; Deleting an index; Bulk operations; Version control
4. Elasticsearch core concepts: Cluster; Shards; Replicas; Recovery; Gateway; Discovery.zen; Transport; Create Index Mapping
5. Elasticsearch Java client: Java High Level REST Client; Java high-level client; Document APIs (Index API, Get API, Exists API, Update API, Delete API, Bulk API, Multi-Get API); SearchType (Query Then Fetch; Dfs, Query Then Fetch); Query; Aggregations; Pagination; Multi-index and multi-type queries; Fast queries with routing
ElasticSearch index module: Components of the index module; Integrating the IK Chinese analysis plugin; Custom IK dictionary; Hot-updating the IK dictionary
ES cluster installation and deployment: Cluster planning; Cluster installation; X-Pack installation; Kibana installation
ES optimization: Split-brain settings; Raising the open-file limit; Sizing JVM memory; Locking physical memory; Choosing the number of shards; Choosing the number of replicas; Merging indices; Closing indices; Purging deleted documents; Sensible data import; Setting _all on the index; Setting _source on the index; Keeping versions consistent
Software versions: jdk-8u192-linux-x64.tar.gz, elasticsearch-7.1.0-linux-x86_64.tar.gz

2021-06-21

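Go Language Study Notes 1

并发编程实战》: a Go command tutorial hosted on GitHub by the book's author, Hao Lin (郝林), covering the detailed usage of the Go commands and tools: https://github.com/hyper-carrot/go_command_tutorial Day one of learning Go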

2021-03-23

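A Brief Look at Putting Deep Learning into Practice

Preface: deep learning is not only about theoretical innovation; more importantly, it has to be applied to real engineering problems.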

2021-02-21

Golang Study Notes: A Summary of Using Slices

Slices wrap arrays and give a more general, powerful and convenient interface to sequences of data. Most array programming in Go is done with slices rather than plain arrays. A slice represents a variable-length sequence whose elements all have the same type. A slice type is written []T, where T is the element type.

1. Creating slices

Directly from an initializer list:

    slice0 := []int{1, 3, 7, 5, 2, 3, 4}
    fmt.Println(slice0, len(slice0), cap(slice0))
    // [1 3 7 5 2 3 4] 7 7

With the built-in function make; the elements are initialized to the zero value:

    slice1 := make([]int, 5, 10)
    fmt.Println(slice1, len(slice1), cap(slice1))
    // [0 0 0 0 0] 5 10

From explicit indices. Elements that are not listed get the zero value, for example 0 for int and the empty string for string. (Written as [...]int{1: 1, 12: 12} the literal would be an array of type [13]int, not a slice.)

    slice2 := []int{1: 1, 12: 12}
    fmt.Println(slice2)
    // [0 1 0 0 0 0 0 0 0 0 0 0 12]
    fmt.Println(slice2[0], len(slice2), cap(slice2))
    // 0 13 13

2. Slice operations

Indexing:

    slice3 := []int{1, 3, 7, 5, 2, 3, 4}
    fmt.Println(slice3[1])
    // 3
    fmt.Println(slice3[10])
    // panics at run time: runtime error: index out of range [10] with length 7

Iterating with for:

    slice3 := []int{1, 3, 7, 5, 2, 3, 4}
    for _, val := range slice3 {
        fmt.Println(val)
    }

Slicing, which works like slicing in Python:

    slice3 := []int{1, 3, 7, 5, 2, 3, 4}
    fmt.Println(slice3[2:4])
    // [7 5]

Two slices cannot be compared with ==. The only legal comparison for a slice is against nil:

    slice4 := []int{1, 3, 7, 5, 2, 3, 4}
    slice5 := slice4[2:4]
    fmt.Println(slice4 == slice5)
    // compile error: Invalid operation: slice4 == slice5 (operator == is not defined on []int)
    fmt.Println(slice4 != nil)
    // true

The built-in functions len and cap return the length and the capacity of a slice.

The built-in function append adds elements:

    slice6 := []int{1, 3, 7, 5, 2, 3, 4}
    slice7 := append(slice6, 0)
    slice8 := append(slice7, 9, 8)
    fmt.Println(slice7)
    // [1 3 7 5 2 3 4 0]
    fmt.Println(slice8)
    // [1 3 7 5 2 3 4 0 9 8]
    fmt.Println(append(slice6, slice7...)) // append one slice to another
    // [1 3 7 5 2 3 4 1 3 7 5 2 3 4 0]

The built-in function copy copies elements:

    slice10 := []int{1, 3, 7, 5, 2, 3, 4}
    var slice9 = make([]int, len(slice10))
    fmt.Println(slice9, len(slice9), cap(slice9))
    // [0 0 0 0 0 0 0] 7 7
    copy(slice9, slice10)
    fmt.Println(slice9, len(slice9), cap(slice9))
    // [1 3 7 5 2 3 4] 7 7

3. Things to watch out for

A slice holds a reference to an underlying array. If one slice is assigned to or sliced from another, both refer to the same array, so modifying an element through one slice also changes the data seen through the source slice:

    slice11 := []int{1, 3, 7, 5, 2, 3, 4}
    slice12 := slice11[2:5]
    slice12[1] = 10
    fmt.Println(slice11)
    // [1 3 7 10 2 3 4]
    fmt.Println(slice12)
    // [7 10 2]

If a function takes a slice parameter, the caller sees the changes the function makes to the slice's elements, much like passing a pointer to the underlying array:

    func reverse(input_slice []int) {
        slice_len := len(input_slice)
        for i := 0; i < slice_len/2; i++ {
            input_slice[i], input_slice[slice_len-i-1] = input_slice[slice_len-i-1], input_slice[i]
        }
    }

    slice13 := []int{1, 3, 7, 5, 2, 3, 4}
    reverse(slice13)
    fmt.Println(slice13)
    // [4 3 2 5 7 3 1]

There is no built-in function for deleting an element from a slice; combine two sub-slices instead. To delete the element at index 3, append everything after it to everything before it. Note that this writes into the backing array of slice13, so slice13 itself should not be used afterwards:

    slice13 := []int{1, 3, 7, 5, 2, 3, 4}
    slice14 := append(slice13[:3], slice13[4:]...)
    fmt.Println(slice14)
    // [1 3 7 2 3 4]

Copy slices and maps at boundaries (uber_go_guide)

Copy when receiving slices. When a map or slice is passed in as a function argument and you store a reference to it, the caller can still modify it:

    func (d *Driver) SetTrips(trips []Trip) {
        d.trips = make([]Trip, len(trips))
        copy(d.trips, trips)
    }

    trips := ...
    d1.SetTrips(trips)

    // Modifying trips[0] here does not affect d1.trips.
    trips[0] = ...

Copy when returning slices. Likewise, watch out for callers modifying a map or slice that exposes internal state:

    type Stats struct {
        mu       sync.Mutex
        counters map[string]int
    }

    func (s *Stats) Snapshot() map[string]int {
        s.mu.Lock()
        defer s.mu.Unlock()

        result := make(map[string]int, len(s.counters))
        for k, v := range s.counters {
            result[k] = v
        }
        return result
    }

    // snapshot is now a copy.
    snapshot := stats.Snapshot()

4. Parts of the slice implementation in the runtime

    // slice.go
    package runtime

    // The underlying structure of a slice.
    type slice struct {
        array unsafe.Pointer // pointer to the backing array
        len   int
        cap   int
    }
    ...
    // Creating a slice.
    func makeslice(et *_type, len, cap int) unsafe.Pointer {
        // Bytes needed: element size * cap, with an overflow check.
        mem, overflow := math.MulUintptr(et.size, uintptr(cap))
        if overflow || mem > maxAlloc || len < 0 || len > cap {
            // Something is wrong; work out whether len or cap is to blame.
            mem, overflow := math.MulUintptr(et.size, uintptr(len))
            if overflow || mem > maxAlloc || len < 0 {
                panicmakeslicelen()
            }
            panicmakeslicecap()
        }
        // Allocate with mallocgc and return an unsafe.Pointer to the array.
        return mallocgc(mem, et, true)
    }

    // Growing the capacity when append exceeds it.
    // cap is the minimum capacity needed after the append.
    func growslice(et *_type, old slice, cap int) slice {
        if raceenabled {
            callerpc := getcallerpc()
            racereadrangepc(old.array, uintptr(old.len*int(et.size)), callerpc, funcPC(growslice))
        }
        if msanenabled {
            msanread(old.array, uintptr(old.len*int(et.size)))
        }

        if cap < old.cap {
            panic(errorString("growslice: cap out of range"))
        }

        if et.size == 0 {
            // append should not create a slice with a nil pointer but non-zero len.
            return slice{unsafe.Pointer(&zerobase), old.len, cap}
        }

        // Compute the new capacity.
        newcap := old.cap
        doublecap := newcap + newcap
        if cap > doublecap {
            // The requested capacity is more than twice the old one: use it directly.
            newcap = cap
        } else {
            if old.len < 1024 {
                // Old length below 1024: double the old capacity.
                newcap = doublecap
            } else {
                // Otherwise grow by 25% at a time until the requested capacity is reached.
                for 0 < newcap && newcap < cap {
                    newcap += newcap / 4
                }
                // newcap <= 0 means the computation overflowed; fall back to the requested capacity.
                if newcap <= 0 {
                    newcap = cap
                }
            }
        }

        var overflow bool
        var lenmem, newlenmem, capmem uintptr
        switch {
        case et.size == 1:
            lenmem = uintptr(old.len)
            newlenmem = uintptr(cap)
            capmem = roundupsize(uintptr(newcap))
            overflow = uintptr(newcap) > maxAlloc
            newcap = int(capmem)
        case et.size == sys.PtrSize:
            lenmem = uintptr(old.len) * sys.PtrSize
            newlenmem = uintptr(cap) * sys.PtrSize
            capmem = roundupsize(uintptr(newcap) * sys.PtrSize)
            overflow = uintptr(newcap) > maxAlloc/sys.PtrSize
            newcap = int(capmem / sys.PtrSize)
        case isPowerOfTwo(et.size):
            var shift uintptr
            if sys.PtrSize == 8 {
                // Mask shift for better code generation.
                shift = uintptr(sys.Ctz64(uint64(et.size))) & 63
            } else {
                shift = uintptr(sys.Ctz32(uint32(et.size))) & 31
            }
            lenmem = uintptr(old.len) << shift
            newlenmem = uintptr(cap) << shift
            capmem = roundupsize(uintptr(newcap) << shift)
            overflow = uintptr(newcap) > (maxAlloc >> shift)
            newcap = int(capmem >> shift)
        default:
            lenmem = uintptr(old.len) * et.size
            newlenmem = uintptr(cap) * et.size
            capmem, overflow = math.MulUintptr(et.size, uintptr(newcap))
            capmem = roundupsize(capmem)
            newcap = int(capmem / et.size)
        }

        if overflow || capmem > maxAlloc {
            panic(errorString("growslice: cap out of range"))
        }

        // Allocate the new backing array.
        var p unsafe.Pointer
        if et.ptrdata == 0 {
            p = mallocgc(capmem, nil, false)
            memclrNoHeapPointers(add(p, newlenmem), capmem-newlenmem)
        } else {
            p = mallocgc(capmem, et, true)
            if lenmem > 0 && writeBarrier.enabled {
                bulkBarrierPreWriteSrcOnly(uintptr(p), uintptr(old.array), lenmem-et.size+et.ptrdata)
            }
        }
        // Copy the old array into the new one.
        memmove(p, old.array, lenmem)

        // Return the new slice structure.
        return slice{p, old.len, newcap}
    }

    // The copy operation on slices.
    func slicecopy(toPtr unsafe.Pointer, toLen int, fmPtr unsafe.Pointer, fmLen int, width uintptr) int {
        if fmLen == 0 || toLen == 0 {
            return 0
        }

        // Copy as many elements as fit: the smaller of the two lengths.
        n := fmLen
        if toLen < n {
            n = toLen
        }

        if width == 0 {
            return n
        }

        if raceenabled {
            callerpc := getcallerpc()
            pc := funcPC(slicecopy)
            racereadrangepc(fmPtr, uintptr(n*int(width)), callerpc, pc)
            racewriterangepc(toPtr, uintptr(n*int(width)), callerpc, pc)
        }
        if msanenabled {
            msanread(fmPtr, uintptr(n*int(width)))
            msanwrite(toPtr, uintptr(n*int(width)))
        }

        size := uintptr(n) * width
        if size == 1 { // common case worth about 2x to do here
            // A total size of 1 means a single byte: assign it directly.
            *(*byte)(toPtr) = *(*byte)(fmPtr)
        } else {
            // Otherwise copy the memory from the source address to the destination array.
            memmove(toPtr, fmPtr, size)
        }
        return n
    }

2021-01-30

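Deep Learning-Based Recommender Systems

In recent years, however, deep learning has achieved great success in many fields, from image recognition to natural language processing. Recommender systems have also benefited from that success.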

2020-11-02

Spring Study Notes (6): MyBatis Integration

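1 Overview

MyBatis is a well-known persistence framework. This article first introduces basic MyBatis usage, then integrates it with Spring, and finally uses the MyBatis Generator to generate code automatically.

2 About MyBatis

MyBatis started out as iBatis, an open-source Apache project. In 2010 it moved from the Apache Software Foundation to Google Code and was renamed MyBatis.

MyBatis is a Java-based persistence framework. What it provides includes SQL Maps and Data Access Objects. Mappings are configured with simple XML or annotations, which map interfaces and POJOs to database records. It is a small, convenient, efficient, simple, direct, semi-automatic persistence framework.

3 How it works

- Read the configuration file: mybatis-config.xml is the global MyBatis configuration file and describes the MyBatis runtime environment.
- Load the mapping files: these are the SQL mapping files, which contain the SQL statements that operate on the database.
- Build the session factory: a SqlSessionFactory is built from the configuration.
- Create the session object: the session factory from the previous step creates a SqlSession.
- Get the MappedStatement: the MappedStatement object is looked up by the statement ID that the caller passes to the API.
- Map the input parameters: the Executor resolves the MappedStatement and converts Java types into the types used in the SQL statement.
- Map the output results: after JDBC executes the SQL, the mapping in the MappedStatement is used to convert the result into Java types, which are returned.

4 MyBatis example

First, a plain MyBatis example without Spring, as a simple Maven project.

4.1 Dependencies

Maven:

    <dependency>
        <groupId>org.mybatis</groupId>
        <artifactId>mybatis</artifactId>
        <version>3.5.5</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.12</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>8.0.21</version>
    </dependency>

Gradle (the original used the `compile` configuration, which was removed in Gradle 7; `implementation` is its replacement):

    implementation group: 'org.mybatis', name: 'mybatis', version: '3.5.5'
    implementation group: 'mysql', name: 'mysql-connector-java', version: '8.0.21'

4.2 Entity class

    package pers.entity;

    import lombok.Builder;
    import lombok.Getter;
    import lombok.Setter;

    @Setter
    @Getter
    @Builder
    public class User {
        private Integer id;
        private String name;

        @Override
        public String toString() {
            return "id:" + id + "\tname:" + name;
        }
    }

4.3 Mapping file

Create a mapping file named UserMapper.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
    <mapper namespace="UserMapper">
        <select id="selectById" parameterType="Integer" resultType="pers.entity.User">
            select * from user where id=#{id}
        </select>
        <select id="selectAll" resultType="pers.entity.User">
            select * from user
        </select>
        <insert id="insert" parameterType="pers.entity.User">
            INSERT INTO `user` (`id`,`name`) VALUES (#{id},#{name})
        </insert>
        <update id="update" parameterType="pers.entity.User">
            UPDATE `user` set `name`=#{name} where id=#{id}
        </update>
        <delete id="delete" parameterType="Integer">
            DELETE FROM `user` WHERE `id` = #{id}
        </delete>
    </mapper>

The mapping file is an XML file whose root element is <mapper>. Pay attention to its namespace attribute, because statements are invoked through that namespace. The child elements are the SQL statements:

- <select>: a query. id is the identifier of the statement, and the statement is invoked as namespace.id, so this select is invoked as UserMapper.selectById. parameterType is the parameter type (an Integer here) and resultType is the return type (the entity class).
- <insert>/<update>/<delete>: the corresponding insert, update, and delete statements.

About placeholders: #{} is a placeholder, equivalent to ? in plain JDBC. #{id} means the placeholder expects a parameter named id.

4.4 Configuration file

The MyBatis configuration file, mybatis-config.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN" "http://mybatis.org/dtd/mybatis-3-config.dtd">
    <configuration>
        <environments default="development">
            <environment id="development">
                <transactionManager type="JDBC" />
                <dataSource type="POOLED">
                    <property name="driver" value="com.mysql.cj.jdbc.Driver"/>
                    <property name="url" value="jdbc:mysql://localhost:3306/test"/>
                    <property name="username" value="test"/>
                    <property name="password" value="test"/>
                </dataSource>
            </environment>
        </environments>
        <mappers>
            <mapper resource="mapper/UserMapper.xml" />
        </mappers>
    </configuration>

It specifies the database connection properties and the location of the mapper.

4.5 Test

    import org.apache.ibatis.io.Resources;
    import org.apache.ibatis.session.SqlSession;
    import org.apache.ibatis.session.SqlSessionFactory;
    import org.apache.ibatis.session.SqlSessionFactoryBuilder;
    import pers.entity.User;

    import java.io.InputStream;
    import java.util.List;

    public class Main {
        public static void main(String[] args) {
            try {
                // Read the global configuration from the classpath
                InputStream inputStream = Resources.getResourceAsStream("config/mybatis-config.xml");
                SqlSessionFactory factory = new SqlSessionFactoryBuilder().build(inputStream);
                // try-with-resources closes the session even if a statement fails
                try (SqlSession session = factory.openSession()) {
                    User user = session.selectOne("UserMapper.selectById", 1);
                    System.out.println(user);
                    User user1 = User.builder().name("test").build();
                    session.insert("UserMapper.insert", user1);
                    user1.setName("222");
                    session.update("UserMapper.update", user1);
                    List<User> list = session.selectList("UserMapper.selectAll");
                    list.forEach(System.out::println);
                    session.delete("UserMapper.delete", 1);
                    // Nothing is persisted until the transaction is committed
                    session.commit();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }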
The main flow is:

- Read the configuration file: org.apache.ibatis.io.Resources reads mybatis-config.xml. Make sure the file is in the right place; here all configuration files live under resources, and mybatis-config.xml is in its config subdirectory.
- Build the session: build a SqlSessionFactory from the configuration, then create a session with openSession.
- Run the business operations: call selectOne/insert/update and so on through the session. These calls take two arguments. The first is a String that identifies the SQL statement in the mapping file in namespace.id form. UserMapper.xml declares the namespace UserMapper and contains a select statement with the id selectById, so the call uses UserMapper.selectById. The second is an Object holding the parameter to pass, that is, the value bound to the placeholders in the mapping file.
- Commit and close: after the business operations, commit the transaction and close the session.

5 Spring integration example

The example above only demonstrates basic MyBatis usage and does not involve Spring. This example brings Spring in; the flow is much the same.

5.1 Dependencies

There are five kinds of JARs:

- JARs required by MyBatis
- JARs required by Spring
- The bridge JAR that integrates MyBatis with Spring
- The database driver JAR
- The data source JAR

The complete Maven dependencies:

    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-context</artifactId>
        <version>5.2.9.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-core</artifactId>
        <version>5.2.9.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-beans</artifactId>
        <version>5.2.9.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-web</artifactId>
        <version>5.2.9.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.12</version>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-jdbc</artifactId>
        <version>5.2.9.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>8.0.21</version>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-tx</artifactId>
        <version>5.2.9.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.mybatis</groupId>
        <artifactId>mybatis</artifactId>
        <version>3.5.5</version>
    </dependency>
    <dependency>
        <groupId>org.mybatis</groupId>
        <artifactId>mybatis-spring</artifactId>
        <version>2.0.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-dbcp2</artifactId>
        <version>2.8.0</version>
    </dependency>

Gradle (the original used the `compile` configuration, which was removed in Gradle 7; `implementation` is its replacement):

    implementation group: 'org.springframework', name: 'spring-beans', version: '5.2.9.RELEASE'
    implementation group: 'org.springframework', name: 'spring-context', version: '5.2.9.RELEASE'
    implementation group: 'org.springframework', name: 'spring-core', version: '5.2.9.RELEASE'
    implementation group: 'org.springframework', name: 'spring-tx', version: '5.2.9.RELEASE'
    implementation group: 'org.springframework', name: 'spring-jdbc', version: '5.2.9.RELEASE'
    implementation group: 'mysql', name: 'mysql-connector-java', version: '8.0.21'
    implementation group: 'org.apache.commons', name: 'commons-dbcp2', version: '2.8.0'
    implementation group: 'org.mybatis', name: 'mybatis', version: '3.5.5'
    implementation group: 'org.mybatis', name: 'mybatis-spring', version: '2.0.5'

5.2 Configuration files

There are three kinds of configuration files:

- MyBatis mapping file: where the mappers are written, that is, the SQL statements the business logic needs.
- MyBatis global configuration file: because Spring is now involved, the data source configuration moves into the Spring configuration file, and only the mapper locations remain here.
- Spring configuration file: configures the data source, transaction management, the MyBatis sqlSessionFactory, and component scanning.

5.2.1 MyBatis mapping file

Much the same as in the previous example, except that the namespace is now in package.ClassName form:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
    <mapper namespace="pers.dao.UserDao">
        <select id="selectById" parameterType="Integer" resultType="pers.entity.User">
            select * from user where id=#{id}
        </select>
        <select id="selectAll" resultType="pers.entity.User">
            select * from user
        </select>
        <insert id="insert" parameterType="pers.entity.User">
            INSERT INTO `user` (`id`,`name`) VALUES (#{id},#{name})
        </insert>
        <update id="update" parameterType="pers.entity.User">
            UPDATE `user` set `name`=#{name} where id=#{id}
        </update>
        <delete id="delete" parameterType="Integer">
            DELETE FROM `user` WHERE `id` = #{id}
        </delete>
    </mapper>

The namespace must match the fully qualified name of the corresponding interface annotated with @Mapper.

5.2.2 MyBatis configuration file

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN" "http://mybatis.org/dtd/mybatis-3-config.dtd">
    <configuration>
        <mappers>
            <mapper resource="mapper/UserMapper.xml" />
        </mappers>
    </configuration>

5.2.3 Spring configuration file

    <?xml version="1.0" encoding="utf-8" ?>
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:context="http://www.springframework.org/schema/context"
           xmlns:tx="http://www.springframework.org/schema/tx"
           xsi:schemaLocation="http://www.springframework.org/schema/beans
               http://www.springframework.org/schema/beans/spring-beans.xsd
               http://www.springframework.org/schema/tx
               http://www.springframework.org/schema/tx/spring-tx.xsd
               http://www.springframework.org/schema/context
               https://www.springframework.org/schema/context/spring-context.xsd">
        <context:component-scan base-package="pers.dao"/>
        <context:component-scan base-package="pers.service"/>

        <bean id="dataSource" class="org.apache.commons.dbcp2.BasicDataSource">
            <property name="driverClassName" value="com.mysql.cj.jdbc.Driver"/>
            <property name="url" value="jdbc:mysql://localhost:3306/test"/>
            <property name="username" value="test"/>
            <property name="password" value="test"/>
            <property name="maxTotal" value="30"/>
            <property name="maxIdle" value="10"/>
            <property name="initialSize" value="5"/>
        </bean>

        <bean id="txManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
            <property name="dataSource" ref="dataSource"/>
        </bean>

        <tx:annotation-driven transaction-manager="txManager" />

        <bean id="sqlSessionFactory" class="org.mybatis.spring.SqlSessionFactoryBean">
            <property name="dataSource" ref="dataSource"/>
            <property name="configLocation" value="classpath:config/mybatis-config.xml"/>
        </bean>

        <bean class="org.mybatis.spring.mapper.MapperScannerConfigurer">
            <property name="basePackage" value="pers.dao"/>
            <property name="sqlSessionFactoryBeanName" value="sqlSessionFactory"/>
        </bean>
    </beans>

5.3 Persistence layer

The DAO interface is annotated with @Mapper, which marks it as a MyBatis mapper interface to be wired automatically. Note that the namespace in the mapping file must correspond to package.ClassName: here the package is pers.dao and the interface is UserDao, so the namespace in the mapping file is pers.dao.UserDao.
Each statement id must also correspond to a method name: the mapping file contains a select statement whose id is selectById, so the method must be named selectById, and the parameter types must match.

    package pers.dao;

    import org.apache.ibatis.annotations.Mapper;
    import org.springframework.stereotype.Repository;
    import pers.entity.User;

    import java.util.List;

    @Repository
    @Mapper
    public interface UserDao {
        User selectById(Integer id);
        List<User> selectAll();
        int insert(User user);
        int update(User user);
        int delete(Integer id);
    }

5.4 Service layer

    package pers.service;

    import lombok.RequiredArgsConstructor;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;
    import pers.dao.UserDao;
    import pers.entity.User;

    @Service
    @Transactional
    @RequiredArgsConstructor(onConstructor = @__(@Autowired))
    public class MyBatisService {
        // Injected through the Lombok-generated constructor
        private final UserDao dao;

        public void test() {
            User user = dao.selectById(13);
            System.out.println(user);
            dao.insert(User.builder().name("333").build());
            dao.update(User.builder().name("88888").id(13).build());
            dao.selectAll().forEach(System.out::println);
            dao.delete(12);
            dao.selectAll().forEach(System.out::println);
        }
    }

Once UserDao is injected, test() runs a few simple operations against it.

6 Generating code automatically

Many programmers dislike writing long, tedious XML configuration files, so MyBatis also provides a generator plugin that produces the entity classes, DAO interfaces, and mapping files directly from the tables, which saves a lot of work. The steps are:

- Add the dependency
- Write the Generator configuration file
- Generate the code

6.1 Dependency

This just means adding a plugin:

    <build>
        <plugins>
            <plugin>
                <groupId>org.mybatis.generator</groupId>
                <artifactId>mybatis-generator-maven-plugin</artifactId>
                <version>1.4.0</version>
                <configuration>
                    <verbose>true</verbose>
                    <overwrite>true</overwrite>
                    <configurationFile>src/main/resources/generatorConfig.xml</configurationFile>
                </configuration>
                <dependencies>
                    <dependency>
                        <groupId>mysql</groupId>
                        <artifactId>mysql-connector-java</artifactId>
                        <version>8.0.21</version>
                    </dependency>
                </dependencies>
            </plugin>
        </plugins>
    </build>

Change the database driver to match your database. For the Gradle version, see the Kotlin source code.

6.2 Configuration file

This is adapted from someone else's configuration file; just change the database connection, table name, and package names:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE generatorConfiguration PUBLIC "-//mybatis.org//DTD MyBatis Generator Configuration 1.0//EN" "http://mybatis.org/dtd/mybatis-generator-config_1_0.dtd">
    <generatorConfiguration>
        <context id="default" targetRuntime="MyBatis3">
            <commentGenerator>
                <property name="suppressDate" value="true"/>
                <property name="suppressAllComments" value="true"/>
            </commentGenerator>
            <jdbcConnection driverClass="com.mysql.cj.jdbc.Driver"
                            connectionURL="jdbc:mysql://localhost:3306/test"
                            userId="test"
                            password="test" />
            <javaTypeResolver>
                <property name="forceBigDecimals" value="false"/>
            </javaTypeResolver>
            <javaModelGenerator targetPackage="pers.entity" targetProject="src/main/java">
                <property name="enableSubPackages" value="false"/>
                <property name="constructorBased" value="true"/>
                <property name="trimStrings" value="true"/>
                <property name="immutable" value="false"/>
            </javaModelGenerator>
            <sqlMapGenerator targetPackage="mapper" targetProject="src/main/resources">
                <property name="enableSubPackages" value="false"/>
            </sqlMapGenerator>
            <javaClientGenerator type="XMLMAPPER" targetPackage="pers.dao" targetProject="src/main/java">
            </javaClientGenerator>
            <table tableName="user" domainObjectName="User"
                   enableCountByExample="false" enableUpdateByExample="false"
                   enableDeleteByExample="false" enableSelectByExample="false"
                   selectByExampleQueryId="false" />
        </context>
    </generatorConfiguration>

6.3 Generating the code

Double-click the plugin's generate goal (mybatis-generator:generate) in the IDE. It generates the entity class, the DAO interface, and the mapper file.

7 Reference source code

Java version: GitHub, Gitee, CODE.CHINA

Kotlin version: GitHub, Gitee, CODE.CHINA

8 Reference links

- Jianshu: using mybatis-generator in IDEA
- GitHub: mybatis-generator-plugin
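Section 4.3 notes that #{} plays the role of ? in plain JDBC. The sketch below makes that correspondence concrete with a toy rewriter. It is not MyBatis's parser: `PlaceholderDemo`, `Parsed`, and `toJdbc` are hypothetical names introduced only for illustration, and the sketch assumes placeholders contain nothing but a simple parameter name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only (hypothetical class, not part of MyBatis):
// rewrites #{name} placeholders into JDBC-style ? markers and records
// the parameter names in the order they must be bound.
class PlaceholderDemo {

    // Holds the rewritten SQL and the ordered parameter names.
    static final class Parsed {
        final String sql;
        final List<String> names;

        Parsed(String sql, List<String> names) {
            this.sql = sql;
            this.names = names;
        }
    }

    // Assumption: a placeholder is #{identifier}, optionally padded with spaces.
    private static final Pattern PLACEHOLDER = Pattern.compile("#\\{\\s*(\\w+)\\s*\\}");

    static Parsed toJdbc(String mapperSql) {
        Matcher m = PLACEHOLDER.matcher(mapperSql);
        List<String> names = new ArrayList<>();
        StringBuffer jdbcSql = new StringBuffer();
        while (m.find()) {
            names.add(m.group(1));            // remember which value goes here
            m.appendReplacement(jdbcSql, "?"); // what a PreparedStatement expects
        }
        m.appendTail(jdbcSql);
        return new Parsed(jdbcSql.toString(), names);
    }

    public static void main(String[] args) {
        Parsed p = toJdbc("INSERT INTO `user` (`id`,`name`) VALUES (#{id},#{name})");
        System.out.println(p.sql);   // INSERT INTO `user` (`id`,`name`) VALUES (?,?)
        System.out.println(p.names); // [id, name]
    }
}
```

The recorded names are what let a framework pull `id` and `name` from the User object and bind them to the first and second ? in order.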

2020-09-29

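JVM Study Notes: The Stack Area

This article covers: What is the stack? What is a stack frame? In the JVM, what happens after the main method calls the say method? The article explains the stack in detail, and hopefully leaves you with a deeper understanding of it.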

2020-09-20

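The Mathematical Ideas Behind Deep Learning

However, although the goal is to learn as much as possible from the data, deep learning models can suffer from overfitting. This happens when the model learns too much from the training data, including its random noise.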

2020-09-14

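Person Tracking Based on Deep Learning

Training computers to learn and behave like humans used to be a distant dream. With advances in neural networks and computing power, that dream is gradually becoming reality. CNN: visual intelligence is what CNNs (convolutional neural networks) give to computers.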

2020-08-26

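Online Learning System WTS 0.2.0 Released

The WLP online learning system publishes courseware to share MP4 videos and PDF materials for students to study online. It currently supports multi-level course categories and multiple chapters and lessons per course (for now, only H264-encoded MP4 video and PDF can be played online). Software architecture: jdk7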

2020-07-16

Java 8 Stream API Study Summary

tempFilePath)).peek(System.out::println).collect(Collectors.toList()); Test code: Study Java 8 Stream API. Study links

2020-05-16

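Python Machine Learning Tips: lambda

A definition of lambda: Lambdas are one line functions. They are also known as anonymous functions in some other languages. You might want to use lambdas when you don’t want to use a function twice in a program. They are just like normal functions and even behave like them.

In short: a lambda function, also called an anonymous function, is a function without a name. It lets you define a one-line function quickly and can be used anywhere a function is required. This distinguishes it from a function defined with def.

The syntax is: lambda [arg1 [, arg2, ..., argn]]: expression

Differences between lambda and def:

1) lambda is an expression, while def is a statement. lambda is generally used for simple functions, while def can define complex ones.
2) A function created with def has a name; one created with lambda does not.
3) lambda returns a function object without binding it to an identifier, while def binds the function object to the function name.
4) Only a single expression may follow the colon (:) of a lambda; a def body may contain many.
5) Statements such as if and for cannot be used inside a lambda, but they can inside a def. (In Python 3, print is a function, so it is allowed in a lambda.)

For example:

    add = lambda x, y: x + y
    print(add(3, 5))

Output: 8

Sorting a list:

    a = [(1, 2), (4, 1), (9, 10), (13, -3)]
    a.sort(key=lambda x: x[1])
    print(a)

Output: [(13, -3), (4, 1), (1, 2), (9, 10)]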

2020-04-15

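Numerical Computation in Deep Learning

Numerical computation: machine learning algorithms require a large amount of numerical computation, much of it iterative fitting. Because of the computer's limitations, values cannot be represented with full precision during these computations, so error is always present; small errors, as the number of iterations grows or as several errors accumulate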

2020-03-29

Flink on Zeppelin (4) - Machine Learning

Today I'll cover how to do machine learning in Zeppelin. I won't dwell on why machine learning matters; let's get straight to the point.

2020-03-11

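Learning and Understanding Redux in Depth

The Redux source code is short, so it is well worth studying early on for developers who want to get better at reading source code.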

2020-03-05

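The Latest Web Front-End Learning Roadmap

Web front-end developers have become highly competitive talent in the internet industry, so more and more people want to learn front-end technology. What exactly do you need to learn for web front-end development? What does the learning roadmap look like?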

2020-03-05

Design Pattern Study: The Singleton Pattern

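The singleton pattern: when the whole system needs only one instance of a class to do its work, we create exactly one and guarantee that it is the only one, which avoids wasting resources.

1. Lazy initialization

The lazy variant creates the instance only when it is first needed.

Pros: saves resources, because the instance is initialized only when it is used.
Cons: not thread-safe. Under concurrent access the instance may be created several times, each one overwriting the previous.

    public class Singleton {
        private static Singleton SINGLETON;

        private Singleton() {}

        public static Singleton getInstance() {
            // The original had a stray ';' after the if, which made every call create a new instance
            if (Singleton.SINGLETON == null) {
                Singleton.SINGLETON = new Singleton();
            }
            return SINGLETON;
        }
    }

2. Eager initialization

The eager variant creates the instance when the class is loaded.

Pros: safe; concurrent calls cannot create multiple instances.
Cons: can waste resources, because the instance is held for the whole lifetime of the class and cannot be reclaimed.

    public class Singleton {
        private static final Singleton SINGLETON = new Singleton();

        private Singleton() {}

        public static Singleton getInstance() {
            return SINGLETON;
        }
    }

3. Lazy initialization (synchronized method)

This variant adds a synchronized lock to the accessor, which prevents multiple instances from being created under concurrency.

Pros: keeps the resource savings of lazy initialization, and the method lock prevents multiple instantiation.
Cons: the method lock is expensive. Each call must finish the whole method before the next call can enter.

    public class Singleton {
        private static Singleton SINGLETON;

        private Singleton() {}

        public static synchronized Singleton getInstance() {
            // Same fix as above: no ';' after the if
            if (Singleton.SINGLETON == null) {
                Singleton.SINGLETON = new Singleton();
            }
            return SINGLETON;
        }
    }

4. Double-checked locking (recommended)

Double-checked locking is an optimization of the method lock that addresses its poor performance under concurrency.

Pros: thread-safe, saves resources, and fast to access.

    public class Singleton {
        private static Singleton SINGLETON;

        private Singleton() {}

        public static Singleton getInstance() {
            if (Singleton.SINGLETON == null) {
                synchronized (Singleton.class) {
                    if (Singleton.SINGLETON == null) {
                        Singleton.SINGLETON = new Singleton();
                    }
                }
            }
            return SINGLETON;
        }
    }

On entering the method, the code first checks whether the instance exists. If it does, it is returned directly; otherwise the code takes the lock and checks once more, instantiating only if the field is still null. The second check is needed because another thread may have created the instance between the first check and acquiring the lock.

This version still has a flaw. synchronized handles mutual exclusion, but the first check runs outside the lock, and `new Singleton()` is not atomic: instruction reordering may publish the reference before the constructor has finished, so another thread could see a non-null but partially constructed object. Declaring SINGLETON as volatile prevents this, which gives the final version:

    public class Singleton {
        private static volatile Singleton SINGLETON;

        private Singleton() {}

        public static Singleton getInstance() {
            if (Singleton.SINGLETON == null) {
                synchronized (Singleton.class) {
                    if (Singleton.SINGLETON == null) {
                        Singleton.SINGLETON = new Singleton();
                    }
                }
            }
            return SINGLETON;
        }
    }

5. Static nested class

A static nested class is loaded only when it is first used, so this is another way to implement lazy initialization.

    public class Singleton {
        private Singleton() {}

        public static Singleton getInstance() {
            return Instance.singleton;
        }

        // Loaded (and the instance created) on the first call to getInstance()
        private static class Instance {
            private static final Singleton singleton = new Singleton();
        }
    }

6. Enum

An enum is the best way to implement a singleton, because it resists instance creation through reflection and also avoids the serialization problem. A simple example first:

    public enum SingletonEnum {
        SINGLETON;

        private String name;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public static SingletonEnum getInstance() {
            return SINGLETON;
        }
    }

Usage:

    public static void main(String[] args) {
        SingletonEnum.SINGLETON.setName("name1");
        System.out.println(SingletonEnum.SINGLETON.getName());
    }

This prints name1, which shows that every use of SingletonEnum.SINGLETON refers to the same instance.

Next, how enums resist reflective instantiation. For comparison, start with the static nested class version above:

    public static void main(String[] args) throws NoSuchMethodException, IllegalAccessException, InvocationTargetException, InstantiationException {
        // Get the constructor through reflection
        Constructor<Singleton> singletonConstructor = Singleton.class.getDeclaredConstructor();
        // The constructor is private, so code outside Singleton must lift the access check first
        singletonConstructor.setAccessible(true);
        // Create an object through the constructor
        Singleton singleton1 = singletonConstructor.newInstance();
        // Get the instance through the singleton's own accessor
        Singleton singleton2 = Singleton.getInstance();
        // Prints false: reflection has created a second, different instance
        System.out.println(singleton1 == singleton2);
    }

Now the enum:

    public static void main(String[] args) throws NoSuchMethodException, IllegalAccessException, InvocationTargetException, InstantiationException {
        // Get the constructor through reflection: throws NoSuchMethodException
        Constructor<SingletonEnum> singletonEnumConstructor = SingletonEnum.class.getDeclaredConstructor();
        // Not reached
        SingletonEnum singletonEnum1 = singletonEnumConstructor.newInstance();
        SingletonEnum singletonEnum2 = SingletonEnum.SINGLETON;
        System.out.println(singletonEnum1 == singletonEnum2);
    }

This fails with java.lang.NoSuchMethodException, because an enum type has no no-argument constructor. Stepping through in the debugger shows that the only constructor takes a String and an int, so the code becomes:

    public static void main(String[] args) throws NoSuchMethodException, IllegalAccessException, InvocationTargetException, InstantiationException {
        // Get the (String, int) constructor through reflection
        Constructor<SingletonEnum> singletonEnumConstructor = SingletonEnum.class.getDeclaredConstructor(String.class, int.class);
        // Try to create an object through the constructor: this call throws
        SingletonEnum singletonEnum1 = singletonEnumConstructor.newInstance("", 1);
        // Not reached
        SingletonEnum singletonEnum2 = SingletonEnum.SINGLETON;
        System.out.println(singletonEnum1 == singletonEnum2);
    }

This version finds the constructor but still cannot create an instance. As written, newInstance throws java.lang.IllegalAccessException, which only reflects that the enum constructor is private. Calling setAccessible(true) first does not help either: newInstance then throws java.lang.IllegalArgumentException with the message "Cannot reflectively create enum objects", because the JDK explicitly refuses to instantiate enum types. In that sense an enum is a singleton by construction. The JDK source of Constructor.newInstance:

    public T newInstance(Object ... initargs) throws InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException {
        if (!override) {
            if (!Reflection.quickCheckMemberAccess(clazz, modifiers)) {
                Class<?> caller = Reflection.getCallerClass();
                checkAccess(caller, clazz, null, modifiers);
            }
        }
        if ((clazz.getModifiers() & Modifier.ENUM) != 0)
            throw new IllegalArgumentException("Cannot reflectively create enum objects");
        ConstructorAccessor ca = constructorAccessor; // read volatile
        if (ca == null) {
            ca = acquireConstructorAccessor();
        }
        @SuppressWarnings("unchecked")
        T inst = (T) ca.newInstance(initargs);
        return inst;
    }

The key check is:

    if ((clazz.getModifiers() & Modifier.ENUM) != 0)
        throw new IllegalArgumentException("Cannot reflectively create enum objects");

Serialization and deserialization: if an object has to be stored on a medium outside the program, the instance obtained when reading it back is not the instance we started with. To be serialized, the class must implement Serializable:

    public class Singleton implements Serializable {}

    public static void main(String[] args) throws FileNotFoundException, IOException, ClassNotFoundException {
        // Get the instance through the accessor
        Singleton singleton1 = Singleton.getInstance();
        // Write the instance to a file; closing the stream flushes it before we read the file back
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("D:\\singleton\\singleton.txt"))) {
            oos.writeObject(singleton1);
        }
        // Read the file and deserialize the instance
        try (ObjectInputStream ios = new ObjectInputStream(new FileInputStream("D:\\singleton\\singleton.txt"))) {
            Singleton singleton2 = (Singleton) ios.readObject();
            // Prints false: deserialization produced a different instance
            System.out.println(singleton1 == singleton2);
        }
    }

Comparing the object before and after serialization shows that they are not the same instance. Now the enum type:

    public static void main(String[] args) throws FileNotFoundException, IOException, ClassNotFoundException {
        SingletonEnum singletonEnum1 = SingletonEnum.SINGLETON;
        // Write the instance to a file
        try (ObjectOutputStream oosE = new ObjectOutputStream(new FileOutputStream("D:\\singleton\\singletonE.txt"))) {
            oosE.writeObject(singletonEnum1);
        }
        // Read the file and deserialize the instance
        try (ObjectInputStream iosE = new ObjectInputStream(new FileInputStream("D:\\singleton\\singletonE.txt"))) {
            SingletonEnum singletonEnum2 = (SingletonEnum) iosE.readObject();
            // Prints true: the enum constant is the same instance after the round trip
            System.out.println(singletonEnum1 == singletonEnum2);
        }
    }

The object is the same before and after serialization, so the singleton is not broken.

The best singleton implementation is therefore the enum. The other approaches, as written, let reflection or serialization break the rule that the system holds only one instance. That said, it also matters to choose the approach that best fits the business requirements and the current development team.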
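The two enum experiments above can be reproduced without touching the file system. The following is a minimal sketch under stated assumptions: `EnumSingletonDemo`, `reflectionBlocked`, and `survivesSerialization` are hypothetical names introduced for illustration, the enum is nested only to keep the example in one file, and in-memory byte streams stand in for the article's D:\ files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.Constructor;

// Hypothetical demo class; the names are illustrative, not from the article's code.
class EnumSingletonDemo {

    enum SingletonEnum { SINGLETON }

    // Returns true if reflection refuses to build a second enum instance.
    static boolean reflectionBlocked() {
        try {
            // Enum constructors implicitly take (String name, int ordinal)
            Constructor<SingletonEnum> ctor =
                    SingletonEnum.class.getDeclaredConstructor(String.class, int.class);
            ctor.setAccessible(true); // lift the private-access check
            ctor.newInstance("OTHER", 1);
            return false; // a second instance was created
        } catch (IllegalArgumentException expected) {
            // "Cannot reflectively create enum objects"
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if a serialize/deserialize round trip yields the same instance.
    static boolean survivesSerialization() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(SingletonEnum.SINGLETON);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject() == SingletonEnum.SINGLETON;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reflection blocked: " + reflectionBlocked());
        System.out.println("same instance after round trip: " + survivesSerialization());
    }
}
```

Both lines should print true. Wrapping the checked exceptions keeps the two checks callable from any context; a real test would more likely assert on the exception type directly.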

2020-02-18

Featured List