What to do when DolphinScheduler cannot correctly recognize the symbols in complex HiveSQL
When scheduling complex HiveSQL with Apache DolphinScheduler, the HQL may contain a variety of symbols that DolphinScheduler cannot parse correctly. This article walks through a workable approach, for reference.
1. Purpose
After finishing the Hive-side parsing of a complex JSON payload (objects plus arrays, with arrays nested inside arrays), I assumed the hard part was over. Then scheduling the job in DolphinScheduler hit a serious problem that cost a full day. I tried many approaches and burned countless brain cells before finally solving it!
2. The HiveSQL
insert overwrite table hurys_dc_dwd.dwd_json_statistics partition(day)
select
    t1.device_no, source_device_type, sn, model, create_time, cycle,
    get_json_object(coil_list,'$.laneNo')          lane_no,
    get_json_object(coil_list,'$.laneType')        lane_type,
    section_no,
    get_json_object(coil_list,'$.coilNo')          coil_no,
    get_json_object(coil_list,'$.volumeSum')       volume_sum,
    get_json_object(coil_list,'$.volumePerson')    volume_person,
    get_json_object(coil_list,'$.volumeCarNon')    volume_car_non,
    get_json_object(coil_list,'$.volumeCarSmall')  volume_car_small,
    get_json_object(coil_list,'$.volumeCarMiddle') volume_car_middle,
    get_json_object(coil_list,'$.volumeCarBig')    volume_car_big,
    get_json_object(coil_list,'$.speedAvg')        speed_avg,
    get_json_object(coil_list,'$.speed85')         speed_85,
    get_json_object(coil_list,'$.timeOccupancy')   time_occupancy,
    get_json_object(coil_list,'$.averageHeadway')  average_headway,
    get_json_object(coil_list,'$.averageGap')      average_gap,
    substr(create_time,1,10) day
from (
    select
        get_json_object(statistics_json,'$.deviceNo')         device_no,
        get_json_object(statistics_json,'$.sourceDeviceType') source_device_type,
        get_json_object(statistics_json,'$.sn')               sn,
        get_json_object(statistics_json,'$.model')            model,
        get_json_object(statistics_json,'$.createTime')       create_time,
        get_json_object(statistics_json,'$.data.cycle')       cycle,
        get_json_object(replace(replace(section_list,':{',':[{'),'}}','}]}'),'$.sectionNo') section_no,
        section_list
    from hurys_dc_ods.ods_statistics
    lateral view explode(split(replace(replace(replace(get_json_object(statistics_json,'$.data.sectionList'),'[',''),']',''),'},{"sectionNo"','}|{"sectionNo"'),"\\|")) tf as section_list
    where day = '2024-07-18'    -- date_sub(current_date(), 1)
) as t1
lateral view explode(split(replace(replace(replace(get_json_object(replace(replace(section_list,':{',':[{'),'}}','}]}'),'$.coilList'),'[',''),']',''),'},','}|'),"\\|")) tf1 as coil_list
where substr(create_time,1,10) = '2024-07-18'    -- date_sub(current_date(), 1)
-- '2024-07-17'
;
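The core trick above is pure string surgery: strip the array brackets from `$.data.sectionList`, mark each section boundary with a `|`, and let `split` plus `explode` turn one string into one row per section. A minimal shell sketch of that first pass, using a made-up two-section payload (coilList simplified to a single object for brevity):

```shell
#!/bin/sh
# Made-up sample of $.data.sectionList (coilList simplified to one object)
section_list='[{"sectionNo":1,"coilList":{"coilNo":11}},{"sectionNo":2,"coilList":{"coilNo":21}}]'

# Mirror the HQL: drop '[' and ']', then mark section boundaries with '|'
flat=$(printf '%s' "$section_list" | tr -d '[]' | sed 's/},{"sectionNo"/}|{"sectionNo"/g')

# Splitting on '|' yields one JSON object per line, like explode(split(..., "\\|"))
printf '%s\n' "$flat" | tr '|' '\n'
```

Each resulting line is a self-contained `{"sectionNo":...}` object that `get_json_object` can then address directly.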
3. The original DolphinScheduler scheduling approach
The HiveSQL statement was embedded directly in a shell script:
#! /bin/bash
source /etc/profile

nowdate=`date --date='0 days ago' "+%Y%m%d"`
yesdate=`date -d yesterday +%Y-%m-%d`

hive -e "
use hurys_dc_dwd;

set hive.vectorized.execution.enabled=false;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
set hive.exec.max.dynamic.partitions=1500;

with t1 as (
    select
        get_json_object(statistics_json,'$.deviceNo')         device_no,
        get_json_object(statistics_json,'$.sourceDeviceType') source_device_type,
        get_json_object(statistics_json,'$.sn')               sn,
        get_json_object(statistics_json,'$.model')            model,
        get_json_object(statistics_json,'$.createTime')       create_time,
        get_json_object(statistics_json,'$.data.cycle')       cycle,
        get_json_object(replace(replace(section_list,':{',':[{'),'}}','}]}'),'$.sectionNo') section_no,
        section_list
    from hurys_dc_ods.ods_statistics
    lateral view explode(split(replace(replace(replace(get_json_object(statistics_json,'$.data.sectionList'),'[',''),']',''),'},{"sectionNo"','}|{"sectionNo"'),"\\\\|")) tf as section_list
    where day = '$yesdate'
)
insert overwrite table hurys_dc_dwd.dwd_json_statistics partition(day)
select
    t1.device_no, source_device_type, sn, model,
    substr(create_time,1,19) create_time,
    cycle,
    get_json_object(coil_list,'$.laneNo')          lane_no,
    get_json_object(coil_list,'$.laneType')        lane_type,
    section_no,
    get_json_object(coil_list,'$.coilNo')          coil_no,
    get_json_object(coil_list,'$.volumeSum')       volume_sum,
    get_json_object(coil_list,'$.volumePerson')    volume_person,
    get_json_object(coil_list,'$.volumeCarNon')    volume_car_non,
    get_json_object(coil_list,'$.volumeCarSmall')  volume_car_small,
    get_json_object(coil_list,'$.volumeCarMiddle') volume_car_middle,
    get_json_object(coil_list,'$.volumeCarBig')    volume_car_big,
    get_json_object(coil_list,'$.speedAvg')        speed_avg,
    get_json_object(coil_list,'$.speed85')         speed_85,
    get_json_object(coil_list,'$.timeOccupancy')   time_occupancy,
    get_json_object(coil_list,'$.averageHeadway')  average_headway,
    get_json_object(coil_list,'$.averageGap')      average_gap,
    substr(create_time,1,10) day
from t1
lateral view explode(split(replace(replace(replace(get_json_object(replace(replace(section_list,':{',':[{'),'}}','}]}'),'$.coilList'),'[',''),']',''),'},','}|'),"\\\\|")) tf1 as coil_list
where substr(create_time,1,10) = '$yesdate'
"
4. Error log from the original approach
DolphinScheduler could not correctly recognize the various symbols the HiveSQL uses to parse the complex JSON. A likely reason: the HQL sits inside a double-quoted `hive -e "..."` string, so backslashes, pipes, and embedded double quotes all pass through an extra layer of shell escaping (hence `\\\\|` in the script), and a single mis-counted escape is enough to mangle the statement.
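To see the escaping layers concretely, here is a small sketch (plain `sh`, not DolphinScheduler itself) of what survives the shell's double-quoting before Hive ever sees the HQL:

```shell
#!/bin/sh
# In a double-quoted shell string, \\ collapses to \ and \" to ".
# So the script's \\\\| reaches Hive as \\|, which Hive's string literal
# in turn reduces to the regex \| -- a literal pipe for split().
hql="split(section_list, \"\\\\|\")"
printf '%s\n' "$hql"
```

Any extra templating layer (such as a scheduler's parameter substitution) adds one more round of this, which is exactly where hand-counted backslashes and embedded quotes start to break.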
5. The solution
Put the HiveSQL into a standalone SQL file, then have the shell script execute that file through Hive's `source` command.
1 The SQL file
-- use the hurys_dc_dwd database
use hurys_dc_dwd;

-- Hive tuning (run the tuning statements first, otherwise some complex SQL fails)
set hive.vectorized.execution.enabled=false;
-- enable dynamic partitioning (default true)
set hive.exec.dynamic.partition=true;
-- nonstrict mode allows every partition column to be dynamic
set hive.exec.dynamic.partition.mode=nonstrict;
-- max dynamic partitions that may be created on each MR node
set hive.exec.max.dynamic.partitions.pernode=1000;
-- max dynamic partitions across all MR nodes (default 1000)
set hive.exec.max.dynamic.partitions=1500;
insert overwrite table hurys_dc_dwd.dwd_json_statistics partition(day)
select
    t1.device_no, source_device_type, sn, model, create_time, cycle,
    get_json_object(coil_list,'$.laneNo')          lane_no,
    get_json_object(coil_list,'$.laneType')        lane_type,
    section_no,
    get_json_object(coil_list,'$.coilNo')          coil_no,
    get_json_object(coil_list,'$.volumeSum')       volume_sum,
    get_json_object(coil_list,'$.volumePerson')    volume_person,
    get_json_object(coil_list,'$.volumeCarNon')    volume_car_non,
    get_json_object(coil_list,'$.volumeCarSmall')  volume_car_small,
    get_json_object(coil_list,'$.volumeCarMiddle') volume_car_middle,
    get_json_object(coil_list,'$.volumeCarBig')    volume_car_big,
    get_json_object(coil_list,'$.speedAvg')        speed_avg,
    get_json_object(coil_list,'$.speed85')         speed_85,
    get_json_object(coil_list,'$.timeOccupancy')   time_occupancy,
    get_json_object(coil_list,'$.averageHeadway')  average_headway,
    get_json_object(coil_list,'$.averageGap')      average_gap,
    substr(create_time,1,10) day
from (
    select
        get_json_object(statistics_json,'$.deviceNo')         device_no,
        get_json_object(statistics_json,'$.sourceDeviceType') source_device_type,
        get_json_object(statistics_json,'$.sn')               sn,
        get_json_object(statistics_json,'$.model')            model,
        get_json_object(statistics_json,'$.createTime')       create_time,
        get_json_object(statistics_json,'$.data.cycle')       cycle,
        get_json_object(replace(replace(section_list,':{',':[{'),'}}','}]}'),'$.sectionNo') section_no,
        section_list
    from hurys_dc_ods.ods_statistics
    lateral view explode(split(replace(replace(replace(get_json_object(statistics_json,'$.data.sectionList'),'[',''),']',''),'},{"sectionNo"','}|{"sectionNo"'),"\\|")) tf as section_list
    where day = date_sub(current_date(), 1)
) as t1
lateral view explode(split(replace(replace(replace(get_json_object(replace(replace(section_list,':{',':[{'),'}}','}]}'),'$.coilList'),'[',''),']',''),'},','}|'),"\\|")) tf1 as coil_list
where substr(create_time,1,10) = date_sub(current_date(), 1)
;
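The second pass depends on repairing the coilList fragment that the first pass mutilated: once the outer brackets were stripped, `"coilList":[{...},{...}]` degraded to `"coilList":{...},{...}`, and the `:{` → `:[{` and `}}` → `}]}` replacements rebuild it into a valid JSON array. A minimal shell sketch with made-up values:

```shell
#!/bin/sh
# One section string after the first pass stripped the coilList array brackets
section='{"sectionNo":1,"coilList":{"coilNo":11},{"coilNo":12}}'

# Re-wrap coilList into a valid JSON array, as the HQL does with replace()
fixed=$(printf '%s' "$section" | sed -e 's/:{/:[{/g' -e 's/}}/}]}/g')
printf '%s\n' "$fixed"
```

`get_json_object(fixed, '$.coilList')` can then return the coil array, which the same strip-and-split trick explodes into one row per coil.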
2 The DolphinScheduler task script
#! /bin/bash
source /etc/profile

nowdate=`date --date='0 days ago' "+%Y-%m-%d"`
yesdate=`date -d yesterday +%Y-%m-%d`

hive -e "
source dwd_json_statistics.sql
"
3 Run the task and verify the result
Finally solved! Whenever a scheduler cannot recognize symbols inside SQL, the same method applies: put the SQL into a SQL file and have the script execute that file, which sidesteps this whole class of escaping problems.
Reposted from 天地风雷水火山泽. Original link: https://blog.csdn.net/tiantang2renjian/article/details/140605840
Publication of this article is supported by 白鲸开源科技.
