Elasticsearch: Configuring IK as the Default Analyzer and Using the Java AnalyzeRequestBuilder
Preface
Version compatibility between Spring Boot, Spring Data Elasticsearch, and Elasticsearch:

Spring Boot Version (x)   Spring Data Elasticsearch Version (y)   Elasticsearch Version (z)
x <= 1.3.5                y <= 1.3.4                               z <= 1.7.2*
x >= 1.4.x                2.0.0 <= y < 5.0.0**                     2.0.0 <= z < 5.0.0

*  - you only need to change the corresponding version number in your pom file
** - the next ES release will bring major changes
1. What is Elasticsearch-analysis-ik
Elasticsearch-analysis-ik is the IK Chinese analysis plugin for Elasticsearch. It contributes two components:

Analyzer: ik_smart or ik_max_word
Tokenizer: ik_smart or ik_max_word

ik_max_word segments text at the finest granularity, emitting overlapping sub-words, while ik_smart performs the coarsest, non-overlapping segmentation.
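To see the difference, you can compare the two against the _analyze API, in the same style as the query shown in section 2 (a sketch; the exact tokens depend on your IK version and dictionaries):

localhost:9200/_analyze?analyzer=ik_smart&pretty=true&text=泥瓦匠的博客
localhost:9200/_analyze?analyzer=ik_max_word&pretty=true&text=泥瓦匠的博客

ik_smart would typically return only the coarse tokens (泥瓦匠, 博客), while ik_max_word also emits the overlapping sub-words (泥, 瓦匠, 匠), as the result set in section 2 shows.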
2. Configuring IK as the Default Analyzer
First pick the IK release that matches your Elasticsearch version:

IK version    ES version
master        5.x -> master
5.3.2         5.3.2
5.2.2         5.2.2
5.1.2         5.1.2
1.10.1        2.4.1
1.9.5         2.3.5
1.8.1         2.2.1
1.7.0         2.1.1
1.5.0         2.0.0
1.2.6         1.0.0
1.2.5         0.90.x
1.1.3         0.20.x
1.0.0         0.16.2 -> 0.19.0
Unpack the IK release and copy it into a new ik directory under the Elasticsearch plugins directory:

cd elasticsearch-2.3.2/plugins
mkdir ik
cp ...
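After copying the files, restart Elasticsearch so the plugin is loaded. As a quick sanity check (assuming curl is available), you can list the installed plugins and look for IK:

curl localhost:9200/_cat/plugins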
Then set IK as the default analyzer in the Elasticsearch configuration (config/elasticsearch.yml on a 2.x node):

index.analysis.analyzer.default.tokenizer: "ik_max_word"
index.analysis.analyzer.default.type: "ik"

With this in place, every field that does not declare its own analyzer is analyzed with IK.
Calling the _analyze API

localhost:9200/_analyze?analyzer=ik&pretty=true&text=泥瓦匠的博客是bysocket.com

yields the following result set:
{
"tokens": [
{
"token": "泥瓦匠",
"start_offset": 0,
"end_offset": 3,
"type": "CN_WORD",
"position": 0
},
{
"token": "泥",
"start_offset": 0,
"end_offset": 1,
"type": "CN_WORD",
"position": 1
},
{
"token": "瓦匠",
"start_offset": 1,
"end_offset": 3,
"type": "CN_WORD",
"position": 2
},
{
"token": "匠",
"start_offset": 2,
"end_offset": 3,
"type": "CN_WORD",
"position": 3
},
{
"token": "博客",
"start_offset": 4,
"end_offset": 6,
"type": "CN_WORD",
"position": 4
},
{
"token": "bysocket.com",
"start_offset": 8,
"end_offset": 20,
"type": "LETTER",
"position": 5
},
{
"token": "bysocket",
"start_offset": 8,
"end_offset": 16,
"type": "ENGLISH",
"position": 6
},
{
"token": "com",
"start_offset": 17,
"end_offset": 20,
"type": "ENGLISH",
"position": 7
}
]
}
3. Using AnalyzeRequestBuilder to Get the Tokenization Result
In a Spring Boot project, first add the Spring Data Elasticsearch starter to your pom.xml:

<!-- Spring Boot Elasticsearch dependency -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
Then point it at the cluster in application.properties:

# ES
spring.data.elasticsearch.repositories.enabled = true
spring.data.elasticsearch.cluster-nodes = 127.0.0.1:9300
Then inject the ElasticsearchTemplate and build the analyze request (imports shown for completeness; this targets the ES 2.x transport-client API):

import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.action.admin.indices.analyze.AnalyzeAction;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeRequestBuilder;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;

@Autowired
private ElasticsearchTemplate elasticsearchTemplate;

/**
 * Call ES to get the IK tokenization result for the given text.
 *
 * @param searchContent the raw search text to analyze
 * @return the terms produced by the IK tokenizer
 */
private List<String> getIkAnalyzeSearchTerms(String searchContent) {
    // Build an analyze request against the index ("indexName" is a placeholder for your index).
    // On IK 5.x the tokenizer is named "ik_max_word" / "ik_smart" rather than "ik".
    AnalyzeRequestBuilder ikRequest = new AnalyzeRequestBuilder(elasticsearchTemplate.getClient(),
            AnalyzeAction.INSTANCE, "indexName", searchContent);
    ikRequest.setTokenizer("ik");
    List<AnalyzeResponse.AnalyzeToken> ikTokenList = ikRequest.execute().actionGet().getTokens();

    // Collect the term text of each token
    List<String> searchTermList = new ArrayList<>();
    ikTokenList.forEach(ikToken -> searchTermList.add(ikToken.getTerm()));
    return searchTermList;
}
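For example, the term list can drive a manual OR-style query, where each IK term becomes an optional match clause. This is only a usage sketch: the Product class and the productName field are made-up names for illustration, not part of the original code.

import java.util.List;

import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SearchQuery;

public List<Product> searchProducts(String searchContent) {
    // Tokenize the user input with IK first
    List<String> terms = getIkAnalyzeSearchTerms(searchContent);

    // Each IK term contributes an optional (should) match clause
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
    terms.forEach(term -> boolQuery.should(QueryBuilders.matchQuery("productName", term)));

    SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(boolQuery).build();
    // Product is a made-up @Document-annotated entity for this sketch
    return elasticsearchTemplate.queryForList(searchQuery, Product.class);
}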
