elasticsearch: Field data loading is forbidden on [$field]

2018-07-19

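Besides third-party services such as Umeng, we also compute metrics such as daily active users from nginx logs. Recently those statistics have been consistently inaccurate, and aggregating in Kibana also fails with the error Field data loading is forbidden on [uuid]. The details are as follows: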

Visualize: Field data loading is forbidden on [uuid]

Error: Request to Elasticsearch failed: {"error":{"root_cause":[{"type":"illegal_state_exception","reason":"Field data loading is forbidden on [uuid]"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"logstash-nginx-log-2017.09.27","node":"uTHxVRMCQhWwlA5uZzMAVw","reason":{"type":"illegal_state_exception","reason":"Field data loading is forbidden on [uuid]"}}]}}
KbnError@http://kibana.qyvideo.net/bundles/commons.bundle.js:61164:30
RequestFailure@http://kibana.qyvideo.net/bundles/commons.bundle.js:61197:19
http://kibana.qyvideo.net/bundles/kibana.bundle.js:88304:57
http://kibana.qyvideo.net/bundles/commons.bundle.js:63691:28
http://kibana.qyvideo.net/bundles/commons.bundle.js:63660:31
map@[native code]
map@http://kibana.qyvideo.net/bundles/commons.bundle.js:63659:34
callResponseHandlers@http://kibana.qyvideo.net/bundles/kibana.bundle.js:88276:26
http://kibana.qyvideo.net/bundles/kibana.bundle.js:87783:37
processQueue@http://kibana.qyvideo.net/bundles/commons.bundle.js:41809:31
http://kibana.qyvideo.net/bundles/commons.bundle.js:41825:40
$digest@http://kibana.qyvideo.net/bundles/commons.bundle.js:42864:37
$apply@http://kibana.qyvideo.net/bundles/commons.bundle.js:43161:32
done@http://kibana.qyvideo.net/bundles/commons.bundle.js:37610:54
completeRequest@http://kibana.qyvideo.net/bundles/commons.bundle.js:37808:16
requestLoaded@http://kibana.qyvideo.net/bundles/commons.bundle.js:37749:25

Analysis

This problem was discussed on GitHub in [Field data loading is forbidden on [FIELDNAME] #15267](https://github.com/elastic/elasticsearch/issues/15267). Quoting the comment clintongormley posted on 11 Dec 2015:

This is not a bug. It is a safeguard. The logstash template now disables fielddata loading where it makes sense, eg see https://github.com/logstash-plugins/logstash-output-elasticsearch/blob/master/lib/logstash/outputs/elasticsearch/elasticsearch-template.json#L15

You get this message when you try to sort or run aggregations or scripts on analyzed fields. Fulfilling this request would cause massive amounts of memory usage on your cluster, and it almost certainly isn't what you want anyway, eg Field data loading is forbidden on path"... You don't want to aggregate on the analyzed field path, you want to aggregate on the not analyzed field path.raw, which uses doc values not heap memory.

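So the problem occurs because a field used in the aggregation is analyzed. To make search efficient, ES splits a string into tokens according to its analysis logic (for example, splitting on whitespace) and indexes the resulting tokens.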

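Fields that are not analyzed are stored as doc values, an on-disk columnar structure that ES can read efficiently when sorting or aggregating. Analyzed fields have no doc values, so aggregating on them requires loading fielddata for the whole field into heap memory, which can consume a large amount of memory. To keep such requests from degrading or crashing the cluster, a safeguard was introduced: fielddata loading is disabled on analyzed fields, so aggregations on them are rejected. At the same time, for each analyzed string field the Logstash template generates a not_analyzed field.raw sub-field that is stored as doc values, so you can aggregate on field.raw instead of field.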
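As a rough illustration of aggregating on the not-analyzed sub-field, the sketch below builds a search body that counts distinct values with a cardinality aggregation. The helper name cardinality_body, the aggregation name unique_values, and the ES_URL variable are made up for this example; the index pattern logstash-nginx-log-* is inferred from the index name in the error message, and the curl invocation assumes an Elasticsearch 2.x-era cluster like the one in this article.

```shell
# Hypothetical helper: print a search body that counts the distinct values
# of the field passed as $1 using a cardinality aggregation.
cardinality_body() {
  cat <<EOF
{
  "size": 0,
  "aggs": {
    "unique_values": {
      "cardinality": { "field": "$1" }
    }
  }
}
EOF
}

# Assumed endpoint; adjust to your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the body for the not_analyzed sub-field uuid.raw.
cardinality_body "uuid.raw"

# To send it to the cluster (not executed here):
#   cardinality_body "uuid.raw" | \
#     curl -s -XPOST "$ES_URL/logstash-nginx-log-*/_search?pretty" -d @-
```

Passing "uuid" instead of "uuid.raw" would target the analyzed field and is the kind of request that triggers the error described above.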

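We normally use ELK for log analysis, and this problem typically appears when Logstash ships logs into ES and the mapping marks a field as analyzed. You can inspect the mappings with: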

$ curl http://localhost:9200/_mapping
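The request above returns the mappings of every index, which can be a lot of output. A narrower lookup is possible with the get-field-mapping endpoint; the sketch below only builds the URL. The helper name field_mapping_url is hypothetical, and the index and field names are taken from the error message in this article.

```shell
# Hypothetical helper: build the URL of the get-field-mapping endpoint.
#   $1 = base URL, $2 = index name or pattern, $3 = field name
field_mapping_url() {
  printf '%s/%s/_mapping/field/%s?pretty\n' "$1" "$2" "$3"
}

field_mapping_url "http://localhost:9200" "logstash-nginx-log-2017.09.27" "uuid"
# prints http://localhost:9200/logstash-nginx-log-2017.09.27/_mapping/field/uuid?pretty

# To query the cluster (not executed here):
#   curl -s "$(field_mapping_url "http://localhost:9200" "logstash-nginx-log-2017.09.27" "uuid")"
```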

Locate the problematic index and you will find that the mapping of the field in question looks like this:

"uuid":{"type":"string","norms":{"enabled":false},"fielddata":{"format":"disabled"},"fields":{"raw":{"type":"string","index":"not_analyzed","ignore_above":256}}}

Cause

The problem arises because Elasticsearch has a default index template for Logstash, which you can view with:

$ curl http://localhost:9200/_template

Generally, data of type string is analyzed by default.

Solution

You can write a new es-template.json:

{
    "template" : "logstash-*",
    "settings" : {
        "index.refresh_interval" : "5s"
    },
    "mappings": {
        "_default_": {
            "_all": {
                "enabled": true, 
                "omit_norms": true
            }, 
            "dynamic_templates": [
                {
                    "message_field": {
                        "mapping": {
                            "doc_values": true, 
                            "fielddata": {
                                "format": "disabled"
                            }, 
                            "index": "not_analyzed", 
                            "omit_norms": true, 
                            "type": "string"
                        }, 
                        "match": "message", 
                        "match_mapping_type": "string"
                    }
                }, 
                {
                    "string_fields": {
                        "mapping": {
                            "doc_values": true, 
                            "index": "not_analyzed", 
                            "omit_norms": true, 
                            "type": "string"
                        }, 
                        "match": "*", 
                        "match_mapping_type": "string"
                    }
                }, 
                {
                    "float_fields": {
                        "mapping": {
                            "doc_values": true, 
                            "type": "float"
                        }, 
                        "match": "*", 
                        "match_mapping_type": "float"
                    }
                }, 
                {
                    "double_fields": {
                        "mapping": {
                            "doc_values": true, 
                            "type": "double"
                        }, 
                        "match": "*", 
                        "match_mapping_type": "double"
                    }
                }, 
                {
                    "byte_fields": {
                        "mapping": {
                            "doc_values": true, 
                            "type": "byte"
                        }, 
                        "match": "*", 
                        "match_mapping_type": "byte"
                    }
                }, 
                {
                    "short_fields": {
                        "mapping": {
                            "doc_values": true, 
                            "type": "short"
                        }, 
                        "match": "*", 
                        "match_mapping_type": "short"
                    }
                }, 
                {
                    "integer_fields": {
                        "mapping": {
                            "doc_values": true, 
                            "type": "integer"
                        }, 
                        "match": "*", 
                        "match_mapping_type": "integer"
                    }
                }, 
                {
                    "long_fields": {
                        "mapping": {
                            "doc_values": true, 
                            "type": "long"
                        }, 
                        "match": "*", 
                        "match_mapping_type": "long"
                    }
                }, 
                {
                    "date_fields": {
                        "mapping": {
                            "doc_values": true, 
                            "type": "date"
                        }, 
                        "match": "*", 
                        "match_mapping_type": "date"
                    }
                }, 
                {
                    "geo_point_fields": {
                        "mapping": {
                            "doc_values": true, 
                            "type": "geo_point"
                        }, 
                        "match": "*", 
                        "match_mapping_type": "geo_point"
                    }
                }
            ], 
            "properties": {
                "@timestamp": {
                    "format": "strict_date_optional_time||epoch_millis", 
                    "type": "date",
                    "doc_values": true
                }, 
                "@version": {
                    "index": "not_analyzed", 
                    "type": "string",
                    "doc_values" : true
                }, 
                "geoip": {
                    "type": "object",
                    "dynamic": "true",
                    "properties": {
                        "ip": {
                            "type": "ip",
                            "doc_values" : true
                        }, 
                        "latitude": {
                            "type": "float",
                            "doc_values" : true
                        }, 
                        "location": {
                            "type": "geo_point",
                            "doc_values" : true
                        }, 
                        "longitude": {
                            "type": "float",
                            "doc_values" : true
                        }
                    }
                }
            }
        }
    }
}

Then update the Logstash index template in ES:

curl -XPUT 'http://localhost:9200/_template/template-name?pretty' -d @es-template.json

If you query the templates again, you will see that the template has been updated to the new version:

curl http://localhost:9200/_template
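Index templates are applied only when an index is created, so indices that already exist keep their old mapping. A minimal sketch for finding the current day's index, whose mapping can then be checked, assuming the daily naming scheme logstash-nginx-log-YYYY.MM.dd seen in the error message and that the date part follows UTC (Logstash formats %{+YYYY.MM.dd} from the UTC @timestamp); the helper name todays_index is hypothetical:

```shell
# Hypothetical helper: print today's index name for the prefix in $1.
# The date is computed in UTC to match how Logstash formats the index name.
todays_index() {
  date -u +"$1-%Y.%m.%d"
}

idx=$(todays_index "logstash-nginx-log")
echo "$idx"

# To inspect the mapping of that index once it exists (not executed here):
#   curl -s "http://localhost:9200/$idx/_mapping?pretty"
```

If the index for the current day was created before the template update, the new mapping will first show up in the next day's index.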

Solution+

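Although we have updated the default index template, note that template_overwrite must not be set to true in the Logstash configuration file (you can comment it out, since the default is false); otherwise the updated template may still be overwritten.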

output{
    elasticsearch{
        hosts => ["10.19.24.94:9200", "10.19.24.100:9200", "10.19.24.91:9200"]
        #template_overwrite => true
        index => "logstash-%{type}-%{+YYYY.MM.dd}"
    }
    #stdout{codec => rubydebug}
}

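Alternatively, simply reference the template in the configuration file, which guarantees that every newly created index gets the mapping you want: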

output{
    elasticsearch{
        hosts => ["10.19.24.94:9200", "10.19.24.100:9200", "10.19.24.91:9200"]
        template => "/data/deploy/logstash/es-template.json"
        template_overwrite => true
        index => "logstash-%{type}-%{+YYYY.MM.dd}"
    }
    #stdout{codec => rubydebug}
}

references:

[github-issue: Field data loading is forbidden on [FIELDNAME] #15267](https://github.com/elastic/elasticsearch/issues/15267)

ELKstack 中文指南 (ELK Stack Chinese Guide): Saving to Elasticsearch

Little Logstash Lessons: Using Logstash to help create an Elasticsearch mapping template

elasticsearch-analyzer

Original article: https://yq.aliyun.com/articles/616857
