ES Similarity Algorithm Settings (Continued)
Tuning BM25

One of the nice features of BM25 is that, unlike TF/IDF, it has two parameters that allow it to be tuned:

k1: This parameter controls how quickly an increase in term frequency results in term-frequency saturation. The default value is 1.2. Lower values result in quicker saturation, and higher values in slower saturation.

b: This parameter controls how much effect field-length normalization should have. A value of 0.0 disables normalization completely, and a value of 1.0 normalizes fully. The default is 0.75.

The practicalities of tuning BM25 are another matter. The default values for k1 and b should be suitable for most document collections, but the optimal values really depend on the collection. Finding good values for your collection is a matter of adjusting, checking, and adjusting again.

The similarity algorithm can be set on a per-field basis. It's just a matter of specifying the chosen algorithm in the field's mapping:

    PUT /my_index
    {
      "mappings": {
        "doc": {
          "properties": {
            "title": {
              "type": "string",
              "similarity": "BM25"
            },
            "body": {
              "type": "string",
              "similarity": "default"
            }
          }
        }
      }
    }

The title field uses BM25 similarity. The body field uses the default similarity (see Lucene's Practical Scoring Function).

Currently, it is not possible to change the similarity mapping for an existing field. You would need to reindex your data in order to do that.

Configuring BM25

Configuring a similarity is much like configuring an analyzer. Custom similarities can be specified when creating an index. For instance:

    PUT /my_index
    {
      "settings": {
        "similarity": {
          "my_bm25": {
            "type": "BM25",
            "b": 0
          }
        }
      },
      "mappings": {
        "doc": {
          "properties": {
            "title": {
              "type": "string",
              "similarity": "my_bm25"
            },
            "body": {
              "type": "string",
              "similarity": "BM25"
            }
          }
        }
      }
    }

Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/changing-similarities.html

This article is reposted from 张昺华-sky's cnblogs blog. Original link: http://www.cnblogs.com/bonelee/p/6472828.html. To repost it, please contact the original author.
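Supplementary sketch: to get a feel for how the k1 and b parameters described in "Tuning BM25" behave, the textbook BM25 term-frequency component can be written out in a few lines of Python. This is only an illustration under stated assumptions: the function name bm25_tf and its arguments are hypothetical, the IDF factor is left out, and Lucene's actual implementation differs in details (for example, it stores field lengths in a lossy encoding), so the numbers will not match Elasticsearch scores exactly.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 term-frequency component (IDF omitted).

    tf          -- how often the term occurs in the field
    doc_len     -- length of this field, in terms
    avg_doc_len -- average field length across the collection
    k1          -- saturation speed (lower = saturates sooner)
    b           -- field-length normalization strength (0.0 = off, 1.0 = full)
    """
    # Length normalization: equals 1.0 when b is 0 or the field has average length.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # The result grows with tf but can never exceed k1 + 1.
    return tf * (k1 + 1.0) / (tf + k1 * norm)


if __name__ == "__main__":
    # Compare how quickly the score flattens out for two k1 values.
    for tf in (1, 2, 5, 10, 50):
        low = bm25_tf(tf, 100, 100, k1=0.5)
        high = bm25_tf(tf, 100, 100, k1=2.0)
        print(f"tf={tf:>2}  k1=0.5 -> {low:.3f}   k1=2.0 -> {high:.3f}")
```

With b set to 0, as in the my_bm25 example, doc_len drops out of the formula entirely, which is one way to see why that setting disables field-length normalization.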