
Atom 1.57.0 Released: GitHub's Official Text Editor

Atom is a free and open-source text and code editor developed by GitHub. It runs on macOS, Windows, and Linux, supports plugins written in Node.js, and ships with built-in Git version control provided by GitHub. Most of its extension packages are released under open-source licenses and are built and maintained by the community. Atom is based on Electron (originally named Atom Shell), a cross-platform application framework built on Chromium and Node.js, and is written in CoffeeScript and Less. Atom can also be used as an IDE; its developers describe it as a "hackable text editor for the 21st Century." Since May 6, 2014, Atom's core, its package manager, and its Chromium-based desktop framework have all been released under the MIT license.

Atom 1.57.0 has been officially released. This update includes the following changes:

Notable changes:

- #21847 - Improve detection of incompatible native modules

Atom Core:

- atom/atom#22019 - Fix the dependency bump script failure
- atom/atom#21927 - Fix: require .node files directly to detect incompatible native modules
- atom/atom#22046 - Fix context menu not working
- atom/atom#22050 - Fix right-click context menu not working
- atom/atom#22106 - Bump y18n from 3.2.1 to 3.2.2
- atom/atom#22061 - Bump focus-trap to 6.3.0
- atom/atom#22060 - Bump chai to 4.3.4
- atom/atom#22063 - Bump git-utils to 5.7.1
- atom/atom#22068 - Bump normalize-package-data to 3.0.2
- atom/atom#22152 - Bump settings-view to 0.261.8
- atom/atom#22159 - Bump tree-view to 0.228.3

settings-view:

- atom/settings-view#1176 - Fix: update async dependencies
- atom/settings-view#1182 - Catch the error when a README file is not found

tree-view:

- atom/tree-view#1377 - Wrap fs.realpathSync in a try-catch

For more details, see: https://github.com/atom/atom/releases/tag/v1.57.0


Atom 1.56.0 Released: GitHub's Official Text Editor

Atom is a free and open-source text and code editor developed by GitHub. It runs on macOS, Windows, and Linux, supports plugins written in Node.js, and ships with built-in Git version control provided by GitHub. Most of its extension packages are released under open-source licenses and are built and maintained by the community. Atom is based on Electron (originally named Atom Shell), a cross-platform application framework built on Chromium and Node.js, and is written in CoffeeScript and Less. Atom can also be used as an IDE; its developers describe it as a "hackable text editor for the 21st Century." Since May 6, 2014, Atom's core, its package manager, and its Chromium-based desktop framework have all been released under the MIT license.

Atom 1.56.0 has been officially released. This update includes the following changes:

Notable changes:

- #21847 - Fix incorrect quit behavior on macOS after all windows are closed
- #21852 - Improve Java syntax highlighting
- #21847 - Add a setting to disable middle-mouse-button paste
- #21777 - Electron upgrade

Atom Core:

- atom/atom#21753 - Fix how the "empty" language is handled for tree-sitter injections
- atom/atom#21715 - Check whether testRunner is an ES module
- atom/atom#21848 - Add license
- atom/atom#21928 - Fix the dependency bump script

GitHub:

- atom/github#2459 - Use action-setup-atom
- atom/github#2621 - package.json: pin @babel/core below 7.12.10
- atom/github#2625 - Update shell.openExternal to the Promise form, following the Electron update in Atom
- atom/github#2626 - Update to the Promise-based APIs of some Electron methods
- atom/github#2631 - Fix GitHub package tests failing on the Atom Electron 9.4.1 upgrade

spell-check:

- atom/spell-check#357 - Add enableDebug to the configuration
- atom/spell-check#359 - Fix the package failing to load

For more details, see: https://github.com/atom/atom/releases/tag/v1.56.0


Atom 1.55.0 Released: GitHub's Official Text Editor

Atom is a free and open-source text and code editor developed by GitHub. It runs on macOS, Windows, and Linux, supports plugins written in Node.js, and ships with built-in Git version control provided by GitHub. Most of its extension packages are released under open-source licenses and are built and maintained by the community. Atom is based on Electron (originally named Atom Shell), a cross-platform application framework built on Chromium and Node.js, and is written in CoffeeScript and Less. Atom can also be used as an IDE; its developers describe it as a "hackable text editor for the 21st Century." Since May 6, 2014, Atom's core, its package manager, and its Chromium-based desktop framework have all been released under the MIT license.

Atom 1.55.0 has been officially released. This update includes the following changes:

Notable changes:

- https://github.com/atom/github/pull/2564 - Read and write git config without a repository

Atom Core:

- atom/atom#21665 - Bump postcss from 8.1.4 to 8.1.6
- atom/atom#21762 - GitHub package update
- atom/atom#21787 - Fix an issue with async confirm

GitHub:

- atom/github#2559 - GraphQL schema update
- atom/github#2564 - Read and write git config without a repository
- atom/github#2572 - Raise the priority of the sign-in prompt on the GitHub tab
- atom/github#2573 - Fix a flaky test
- atom/github#2574 - Trim the issue and pull request templates
- atom/github#2583 - Bump superstring from 2.4.2 to 2.4.3
- atom/github#2584 - Bump whats-my-line to 0.1.14
- atom/github#2587 - Accept and Cancel buttons on the Git identity panel
- atom/github#2592 - Bump ini from 1.3.5 to 1.3.7
- atom/github#2598 - Bump dompurify from 2.0.7 to 2.0.17
- atom/github#2617 - Tests: disable failing file-patch tests for Atom CI

For more details, see: https://atom.io/releases


一起来读官方文档-----SpringIOC(08)

1.9。基于注解的容器配置 注解在配置Spring方面比XML更好吗? 基于注解的配置的引入提出了一个问题,即这种方法是否比XML“更好”。 简短的答案是“取决于情况”。 长话短说,每种方法都有其优缺点,通常,由开发人员决定哪种策略更适合他们。 由于定义方式的不同,注解在声明中提供了很多上下文,从而使配置更短,更简洁。 但是,XML擅长连接组件而不接触其源代码或重新编译它们。 一些开发人员更喜欢将布线放置在靠近源的位置,而另一些开发人员则认为带注解的类不再是POJO, 而且,该配置变得分散且难以控制。 无论选择如何,Spring都可以容纳两种样式,甚至可以将它们混合在一起。 值得指出的是,通过其JavaConfig选项,Spring允许以非侵入方式使用注解, 而无需接触目标组件的源代码。 注解是XML配置的替代方法,该配置依赖字节码元数据来连接组件,而不是尖括号声明。 通过使用相关的 类,方法或字段 声明上的注解,开发人员无需使用XML来描述bean的连接,而是将配置移入组件类本身。 如示例中所述:将RequiredAnnotationBeanPostProcessor,通过BeanPostProcessor的方式与注解结合使用是扩展Spring IoC容器的常用方法。 例如,Spring 2.0引入了使用@Required注解强制执行必需属性的可能性。 Spring 2.5引入@Autowired注解,提供的功能与自动装配协作器中所述的功能相同,但具有更细粒度的控制和更广泛的适用性。 Spring 2.5还添加了对JSR-250批注(例如 @PostConstruct和@PreDestroy)的支持。 Spring 3.0增加了对javax.inject包中包含的JSR-330(Java依赖性注入)注解的支持,例如@Inject 和@Named。 注解注入在XML注入之前执行。因此,XML配置将覆盖通过注解注入的属性 与往常一样,您可以根据类名将它们注册为单独的bean定义,但也可以通过在基于XML的Spring配置中包含以下标记来隐式注册它们: <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context" xsi:schemaLocation="http://www.springframework.org/schema/beans https://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context https://www.springframework.org/schema/context/spring-context.xsd"> <context:annotation-config/> </beans> <context:annotation-config/> 隐式注册后置处理器包括 : AutowiredAnnotationBeanPostProcessor CommonAnnotationBeanPostProcessor PersistenceAnnotationBeanPostProcessor RequiredAnnotationBeanPostProcessor 并且当使用<context:component-scan/>后,即可将<context:annotation-config/>省去 context:annotation-config/只在定义它的相同应用程序上下文中查找关于bean的注解。 这意味着,如果你把context:annotation-config/定义在WebApplicationContext的DispatcherServlet中,它只是检查controllers中的@Autowired注解,而不是你的services。 上边的这段话意思不是很明确,需要解释一下以前用web.xml配置时的Spring启动流程 拿出几段配置 <!--配置开始 --> <servlet> <servlet-name>dispatcher</servlet-name> <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class> <init-param> <param-name>contextConfigLocation</param-name> <param-value>classpath:spring-mvc.xml</param-value> </init-param> <load-on-startup>1</load-on-startup> </servlet> <servlet-mapping> <servlet-name>dispatcher</servlet-name> <url-pattern>/service/*</url-pattern> </servlet-mapping> <listener> <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class> </listener> <context-param> <param-name>contextConfigLocation</param-name> <param-value> classpath*:spring/spring-base.xml </param-value> </context-param> <!--配置结束 --> 上边的配置应该是多年前webi应用的基础配置,理一下tomcat启动后如何调用Spring的大概流程 1. tomcat读取web.xml文件(此处不管tomcat如何找到xml),解析内容并分组, 分成ServletContainerInitializer,servlet,listener,context-param等多个数组 2.逐个进行解析,先解析ServletContainerInitializer //这个就相当典型了 这个东西就是之前的文章讲过的ServletContainerInitializer //Tomcat启动会查找ServletContainerInitializer实现类并执行其中的onStartup方法。 //Spring-web模块存在ServletContainerInitializer实现类,所以Tomcat启动会调用Spring-web的代码。 //但是我们用Spring框架的话不需要实现这个接口,实现一个Spring的接口WebApplicationInitializer。 //就可以由Tomcat调用Spring-web的ServletContainerInitializer实现类 Iterator i$ = this.initializers.entrySet().iterator(); while(i$.hasNext()) { Entry entry = (Entry)i$.next(); try { ((ServletContainerInitializer)entry.getKey()).onStartup((Set)entry.getValue(), this.getServletContext()); } catch (ServletException var22) { log.error(sm.getString("standardContext.sciFail"), var22); ok = false; break; } } 但是这里我们并没有用这种方式而是用了listener的方式继续往下看 3. 
解析listener,这里this.listenerStart()会解析我们配置的ContextLoaderListener if (ok && !this.listenerStart()) { log.error(sm.getString("standardContext.listenerFail")); ok = false; } 就在这里tomcat关联上了Spring的ApplicationContext,会实例化XmlWebApplicationContext, 实例化时取出context-param中的name为contextConfigLocation的配置文件,来进行解析注册 4.解析servlet,this.loadOnStartup(this.findChildren())来解析servlet, if (ok && !this.loadOnStartup(this.findChildren())) { log.error(sm.getString("standardContext.servletFail")); ok = false; } 这里就会进入DispatcherServlet的init方法, init方法中会根据当前的ServletContext查找在此之前有没有初始化过Spring的ApplicationContext, 然后再判断当前DispatcherServlet有没有ApplicationContext, 如果没有就初始化一个并把之前初始化ApplicationContext的设置为父节点 总结一下,也就是说用了上边的配置的话,tomcat在启动过程中,会初始化两遍并生成两个ApplicationContext对象, 第一遍解析context-param中param-name 为contextConfigLocation的配置文件, 并以此配置文件生成一个ApplicationContext ROOT 第二遍是解析DispatcherServlet servlet的spring-mvc.xml配置文件, 再以此配置文件生成一个ApplicationContext,并将ROOT设置为父节点 因此就产生了一个问题,当你在两个ApplicationContext都可以扫描到同一个Bean的时候, 那么这个bean在连个ApplicationContext中都各存在一个实例,并且实例不一样 举一个之前遇到的问题: 之前想给某个controller加一个AOP,拦截某些方法进行特殊处理,但是我把 <aop:aspectj-autoproxy/>这个注解 放到了下面这个层次的配置文件中了 <context-param> <param-name>contextConfigLocation</param-name> <param-value> classpath*:spring/spring-base.xml </param-value> </context-param> 最后我的AOP并没有生效,后来又把注解挪到了spring-mvc.xml中,才生效。 之前百度搜到说:spring-mvc 的配置扫描优先于spring的配置文件 通过调试才理解这句话: 我的controller在spring的ApplicationContext中有一份被AOP代理的对象 在spring-mvc的ApplicationContext中还有一份没被代理的普通对象 因为spring-mvc配置加载的晚,所以用到的都是没有被代理的对象 1.9.1。@Required 该@Required注解适用于bean属性setter方法,如下面的例子: public class SimpleMovieLister { private MovieFinder movieFinder; @Required public void setMovieFinder(MovieFinder movieFinder) { this.movieFinder = movieFinder; } } 这个注解要求,必须在配置时通过bean定义中的显式属性值或自动装配来填充bean属性。 如果未填充bean属性,容器将抛出异常。 这样显式的失败,避免了实例在应用的时候出现NullPointerException的情况。 @Required注解在Spring Framework 5.1时正式被弃用, Spring更赞同使用构造函数注入来进行必需参数的设置 (或者使用InitializingBean.afterPropertiesSet()的自定义实现来进行bean属性的设置)。 1.9.2。@Autowired 在本节包含的示例中,JSR330的@Inject注释可以用来替代Spring的@Autowired注释。 您可以将@Autowired注解应用于构造函数,如以下示例所示: public class MovieRecommender { private final CustomerPreferenceDao customerPreferenceDao; @Autowired public MovieRecommender(CustomerPreferenceDao customerPreferenceDao) { this.customerPreferenceDao = customerPreferenceDao; } // ... } 从Spring Framework 4.3开始,@Autowired如果目标bean仅定义一个构造函数作为开始,则不再需要在此类构造函数上进行注解。 但是,如果有多个构造函数可用,并且没有主/默认构造函数,则必须至少注解一个构造函数,@Autowired以指示容器使用哪个构造函数。 您还可以将@Autowired注解应用于传统的setter方法,如以下示例所示: public class SimpleMovieLister { private MovieFinder movieFinder; @Autowired public void setMovieFinder(MovieFinder movieFinder) { this.movieFinder = movieFinder; } } 您还可以将注解应用于具有任意名称和多个参数的方法,如以下示例所示: public class MovieRecommender { private MovieCatalog movieCatalog; private CustomerPreferenceDao customerPreferenceDao; @Autowired public void prepare(MovieCatalog movieCatalog, CustomerPreferenceDao customerPreferenceDao) { this.movieCatalog = movieCatalog; this.customerPreferenceDao = customerPreferenceDao; } } 您也可以将其应用于@Autowired字段,甚至可以将其与构造函数混合使用,如以下示例所示: public class MovieRecommender { private final CustomerPreferenceDao customerPreferenceDao; @Autowired private MovieCatalog movieCatalog; @Autowired public MovieRecommender(CustomerPreferenceDao customerPreferenceDao) { this.customerPreferenceDao = customerPreferenceDao; } // ... 
} 确保目标组件(例如MovieCatalog或CustomerPreferenceDao)与带@Autowired注解的注入点的类型一致地声明。 否则,注入可能会由于运行时出现“no type match found”错误而失败。 对于通过类路径扫描找到的xml定义的bean或组件类,容器通常预先知道具体的类型。 但是,对于@Bean工厂方法,您需要确保声明的返回类型具有足够的表达能力。 对于实现多个接口的组件,或者对于可能由其实现类型引用的组件, 考虑在您的工厂方法上声明最特定的返回类型(至少与引用您的bean的注入点所要求的那样特定)。 您还可以将@Autowired注解添加到需要该类型数组的字段或方法中,指示Spring提供特定类型的所有bean ,如以下示例所示: public class MovieRecommender { @Autowired private MovieCatalog[] movieCatalogs; // ... } 如以下示例所示,这同样适用于类型化集合: public class MovieRecommender { private Set<MovieCatalog> movieCatalogs; @Autowired public void setMovieCatalogs(Set<MovieCatalog> movieCatalogs) { this.movieCatalogs = movieCatalogs; } // ... } 如果希望数组或列表中的项目以特定顺序排序, 则目标bean可以实现该org.springframework.core.Ordered接口或使用@Order或标准@Priority注解。 否则,它们的顺序将遵循容器中相应目标bean定义的注册顺序。 您可以使用@Order在目标类级别和@Bean方法上声明注解。 @Order值可能会影响注入点的优先级,但请注意它们不会影响单例启动顺序, 这是由依赖关系和@DependsOn声明确定的正交关系。(举例:a,b,c三个bean设置的order分别为1,2,3, 但是a依赖c,所以a在初始化的时候会初始化c,导致c比b提前初始化) 请注意,标准javax.annotation.Priority注解在该@Bean级别不可用 ,因为无法在方法上声明它。 可以通过将@Order值与@Primary每个类型的单个bean结合使用来对其语义进行建模。 即使是类型化的Map实例也可以自动注入,键包含相应的bean名称是String类型,值是对应的bean实例,如下面的示例所示: public class MovieRecommender { private Map<String, MovieCatalog> movieCatalogs; @Autowired public void setMovieCatalogs(Map<String, MovieCatalog> movieCatalogs) { this.movieCatalogs = movieCatalogs; } } 默认情况下,当给定注入点没有匹配的候选bean可用时,自动装配将失败。对于声明的数组,集合或映射,至少应有一个匹配元素。 默认是将带注解的方法和字段视为必须要注入的依赖项。 你可以通过标记为非必需注入来改变这个行为(例如,通过在@Autowired中设置required属性为false): public class SimpleMovieLister { private MovieFinder movieFinder; @Autowired(required = false) public void setMovieFinder(MovieFinder movieFinder) { this.movieFinder = movieFinder; } // ... } @Autowired(required = false)用在方法上时 当存在任何一个参数不可注入,则根本不会调用该方法。 在这种情况下,完全不需要填充非必需字段,而保留其默认值。 当方法有多个参数时,可以使用该注解标识其中的某个参数可以不被注入 public class ServiceController { private ServiceTwo serviceTwo; private CusService serviceOne; public ServiceController(CusService cusService, @Autowired(required = false) ServiceTwo serviceTwo){ this.serviceOne = cusService; this.serviceTwo = serviceTwo; } } 在任何给定bean类中,只有一个构造函数可以声明@Autowired,并将required属性设置为true,以指示当作为Spring bean使用时要自动装配的构造函数。 因此,如果required属性的默认值为true,那么只有一个构造函数可以使用@Autowired注解。 如果有多个构造函数声明注解,那么它们都必须声明required=false,才能被认为是自动装配的候选者(类似于XML中的autowire=constructor)。 通过在Spring容器中匹配bean可以满足的依赖关系最多的构造函数将被选择。 如果没有一个候选函数可以满足,那么将使用主/默认构造函数(如果存在)。 类似地,如果一个类声明了多个构造函数,但是没有一个是用@Autowired注解的,那么一个主/默认构造函数(如果有的话)将会被使用。 如果一个类只声明了一个构造函数,那么它将始终被使用,即使没有@Autowired注解。 请注意,带注解的构造函数不必是public的。 建议在setter方法上的已弃用的@Required注释上使用@Autowired属性。 将required属性设置为false表示该属性对于自动装配目的是不需要的,并且如果该属性不能自动装配,则忽略它。 另一方面,@Required更强制,因为它强制用容器支持的任何方法设置属性,如果没有定义值,则会引发相应的异常。 另外,您可以通过Java8来表达特定依赖项的非必需性质java.util.Optional,如以下示例所示: public class SimpleMovieLister { @Autowired public void setMovieFinder(Optional<MovieFinder> movieFinder) { ... } } 从Spring Framework 5.0开始,您还可以使用@Nullable注解(任何包中的Nullable注解,例如,javax.annotation.Nullable来自JSR-305的注解)。 使用此注解标识该参数不一定会被注入,有可能会是空值 public class SimpleMovieLister { @Autowired public void setMovieFinder(@Nullable MovieFinder movieFinder) { ... } } 您还可以对这些接口(BeanFactory,ApplicationContext,Environment,ResourceLoader, ApplicationEventPublisher,和MessageSource)使用@Autowired。 这些接口及其扩展接口(例如ConfigurableApplicationContext或ResourcePatternResolver)将自动解析,而无需进行特殊设置。 以下示例自动装配ApplicationContext对象: public class MovieRecommender { @Autowired private ApplicationContext context; public MovieRecommender() { } // ... 
} 在@Autowired,@Inject,@Value,和@Resource注解由Spring注册的BeanPostProcessor实现。 这意味着您不能在自己的类型BeanPostProcessor或BeanFactoryPostProcessor类型(如果有)中应用这些注解。 必须通过使用XML或Spring@Bean方法显式地“连接”这些类型。 不仅相当上一章的内容: 您应该看到一条参考性日志消息: Bean someBean is not eligible for getting processed by all BeanPostProcessor interfaces (for example: not eligible for auto-proxying)。 这条消息的意思大概就是说这个bean没有得到所有BeanPostProcessor的处理 如果您自定义的BeanPostProcessor或BeanFactoryPostProcessor在自动注入的BeanPostProcessor之前实例化那么就无法为您注入您想要的参数。 1.9.3。@Primary 由于按类型自动布线可能会导致多个候选对象,因此通常有必要对选择过程进行更多控制。 一种实现此目的的方法是使用Spring的 @Primary注解。 @Primary指示当多个bean是要自动装配到单值依赖项的候选对象时,应给予特定bean优先权。 如果候选中恰好存在一个主bean,则它将成为自动装配的值。 考虑以下定义firstMovieCatalog为主要配置的配置MovieCatalog: @Configuration public class MovieConfiguration { @Bean @Primary public MovieCatalog firstMovieCatalog() { ... } @Bean public MovieCatalog secondMovieCatalog() { ... } // ... } 使用前面的配置,以下内容MovieRecommender将自动注入到 firstMovieCatalog: public class MovieRecommender { @Autowired private MovieCatalog movieCatalog; // ... } 1.9.4。@Qualifier @Primary当可以确定一个主要候选对象时,它是在几种情况下按类型使用自动装配的有效方法。 当需要对选择过程进行更多控制时,可以使用Spring的@Qualifier注解。 您可以将限定符值与特定的参数相关联,从而缩小类型匹配的范围,以便为每个参数选择特定的bean。 在最简单的情况下,这可以是简单的描述性值,如以下示例所示: public class MovieRecommender { @Autowired @Qualifier("main") private MovieCatalog movieCatalog; // ... } 您还可以@Qualifier在各个构造函数参数或方法参数上指定注解,如以下示例所示: public class MovieRecommender { private MovieCatalog movieCatalog; private CustomerPreferenceDao customerPreferenceDao; @Autowired public void prepare(@Qualifier("main") MovieCatalog movieCatalog, CustomerPreferenceDao customerPreferenceDao) { this.movieCatalog = movieCatalog; this.customerPreferenceDao = customerPreferenceDao; } // ... } 下面的示例显示了相应的bean定义。 <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context" xsi:schemaLocation="http://www.springframework.org/schema/beans https://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context https://www.springframework.org/schema/context/spring-context.xsd"> <context:annotation-config/> <bean class="example.SimpleMovieCatalog"> <qualifier value="main"/> <!-- 指定qualifier属性 --> </bean> </beans> bean名称被认为是默认的qualifier值。 也可以不使用qualifier而是将该bean的id定义为main,达到相同的匹配效果。 然而,尽管您可以使用这种约定来按名称引用特定的bean,但@Autowired基本上是关于类型驱动的注入,qualifier只是在类型之上的可选选项,这意味着,即使使用了bean名称来进行qualifier的限定,qualifier 值也总是在类型匹配集中选择相同名称的bean。 qualifier 也适用于collections, 如前所述—例如 Set<MovieCatalog>,在这种情况下,根据声明的qualifier值,所有匹配的bean都作为一个集合注入。 这意味着qualifier不必是惟一的。相反,它们构成了过滤标准。例如,您可以定义具有相同qualifier值“action”的多个MovieCatalog bean, 所有这些bean都被注入到带有@Qualifier(“action”)注释的集合中。 public class ServiceController { @Autowired @Qualifier("main") private List<MovieCatalog> serviceList; } <bean class="example.SimpleMovieCatalogOne"> <qualifier value="main"/> <!-- 指定相同的qualifier属性 --> </bean> <bean class="example.SimpleMovieCatalogTwo"> <qualifier value="main"/> <!-- 指定相同的qualifier属性 --> </bean> <bean class="example.SimpleMovieCatalogThree"> <qualifier value="action"/> <!-- 指定相同的qualifier属性 --> </bean> 如果没有其他注解(例如qualifier或primary ), 对于非唯一依赖情况,Spring将注入点名称(即字段名称或参数名称)与目标bean名称或者bean id匹配, 并选择同名的候选对象(如果有同名的的话,没有同名的话则依然抛出异常)。 如果您打算通过名称进行依赖注入,请不要主要使用@Autowired,即使它能够通过bean名称在类型匹配的候选者中进行选择。 使用JSR-250 @Resource注解: 1. 如果同时指定了name和type,按照bean Name 和 bean 类型同时匹配 2. 如果指定了name,就按照bean Name 匹配 3. 如果指定了type,就按照类型匹配 4. 
如果既没有指定name,又没有指定type,就先按照beanName匹配; 如果没有匹配,再按照类型进行匹配; 测试 @Resource的时候还发现一个有意思的东西, 当被注入的是个List的时候,不管是什么类型的List, 只要@Resource加了name条件,都能被注入进去, 比如 List<String> 会被注入到List<MovieCatalog> 大家可以试一下 @Autowired注解: 在按类型选择候选bean之后,再在候选者bean中选择相同名称的。 @Autowired适用于 字段,构造方法,和多参数方法,允许通过qualifier注解在参数级别上缩小范围。 相比之下,@Resource只支持具有单个参数的字段和bean属性设置器方法。 因此,如果注入目标是构造函数或多参数方法,则应该坚持使用qualifier。 您可以创建自己的自定义限定符注解。为此,请定义一个注解并在您的定义中提供该注解,如以下示例所示: @Target({ElementType.FIELD, ElementType.PARAMETER}) @Retention(RetentionPolicy.RUNTIME) @Qualifier public @interface Genre { String value(); } 然后,您可以在自动连接的字段和参数上提供自定义限定符,如以下示例所示: public class MovieRecommender { @Autowired @Genre("Action") private MovieCatalog actionCatalog; private MovieCatalog comedyCatalog; @Autowired public void setComedyCatalog(@Genre("Comedy") MovieCatalog comedyCatalog) { this.comedyCatalog = comedyCatalog; } // ... } 接下来,您可以为候选bean定义提供信息。您可以添加<qualifier></qualifier>标记作为<bean></bean>标记的子元素,然后指定类型和值来匹配您的自定义qualifier注解。该类型与注释的全限定类名匹配。 另外,为了方便起见,如果不存在名称冲突的风险,您可以使用简短的类名。 下面的例子演示了这两种方法: <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context" xsi:schemaLocation="http://www.springframework.org/schema/beans https://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context https://www.springframework.org/schema/context/spring-context.xsd"> <context:annotation-config/> <bean class="example.SimpleMovieCatalog"> <qualifier type="Genre" value="Action"/> <!-- inject any dependencies required by this bean --> </bean> <bean class="example.SimpleMovieCatalog"> <qualifier type="example.Genre" value="Comedy"/> <!-- inject any dependencies required by this bean --> </bean> <bean id="movieRecommender" class="example.MovieRecommender"/> </beans> 在某些情况下,使用没有值的注解就足够了。当注解用于更通用的目的,并且可以跨几种不同类型的依赖项应用时,这一点非常有用。例如,您可以提供一个脱机目录,当没有可用的Internet连接时可以搜索该目录。首先,定义简单注释,如下例所示: @Target({ElementType.FIELD, ElementType.PARAMETER}) @Retention(RetentionPolicy.RUNTIME) @Qualifier public @interface Offline { } 然后将注解添加到要自动装配的字段或属性,如以下示例所示: public class MovieRecommender { @Autowired @Offline private MovieCatalog offlineCatalog; // ... } 现在,bean定义仅需要一个限定符type,如以下示例所示: <bean class="example.SimpleMovieCatalog"> <qualifier type="Offline"/> <!-- inject any dependencies required by this bean --> </bean> 您还可以定义自定义qualifier注解,自定义的注解可以定义除了简单value属性之外的属性。 如果随后在要自动装配的字段或参数上指定了多个属性值,则bean定义必须与所有此类属性值匹配才能被视为自动装配候选。 例如,请考虑以下注解定义: @Target({ElementType.FIELD, ElementType.PARAMETER}) @Retention(RetentionPolicy.RUNTIME) @Qualifier public @interface MovieQualifier { String genre(); Format format(); } 在这种情况下Format是一个枚举,定义如下: public enum Format { VHS, DVD, BLURAY } 要自动装配的字段将用自定义qualifier进行注解,并包括这两个属性的值:genre和format,如以下示例所示: public class MovieRecommender { @Autowired @MovieQualifier(format=Format.VHS, genre="Action") private MovieCatalog actionVhsCatalog; @Autowired @MovieQualifier(format=Format.VHS, genre="Comedy") private MovieCatalog comedyVhsCatalog; @Autowired @MovieQualifier(format=Format.DVD, genre="Action") private MovieCatalog actionDvdCatalog; @Autowired @MovieQualifier(format=Format.BLURAY, genre="Comedy") private MovieCatalog comedyBluRayCatalog; // ... 
} 最后,bean定义应该包含匹配的限定符值。这个例子还演示了您可以使用bean元属性来代替<qualifier></qualifier>元素。 如果可用,<qualifier></qualifier>元素及其属性优先,但是如果没有这样的限定符,自动装配机制就会回到<meta>标签中提供的值上,就像下面例子中的最后两个bean定义一样: <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context" xsi:schemaLocation="http://www.springframework.org/schema/beans https://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context https://www.springframework.org/schema/context/spring-context.xsd"> <context:annotation-config/> <bean class="example.SimpleMovieCatalog"> <qualifier type="MovieQualifier"> <attribute key="format" value="VHS"/> <attribute key="genre" value="Action"/> </qualifier> <!-- inject any dependencies required by this bean --> </bean> <bean class="example.SimpleMovieCatalog"> <qualifier type="MovieQualifier"> <attribute key="format" value="VHS"/> <attribute key="genre" value="Comedy"/> </qualifier> <!-- inject any dependencies required by this bean --> </bean> <bean class="example.SimpleMovieCatalog"> <meta key="format" value="DVD"/> <meta key="genre" value="Action"/> <!-- inject any dependencies required by this bean --> </bean> <bean class="example.SimpleMovieCatalog"> <meta key="format" value="BLURAY"/> <meta key="genre" value="Comedy"/> <!-- inject any dependencies required by this bean --> </bean> </beans> 1.9.5。将泛型用作自动装配限定符 除了@Qualifier注解之外,您还可以将Java泛型类型用作资格的隐式形式。例如,假设您具有以下配置: @Configuration public class MyConfiguration { @Bean public StringStore stringStore() { return new StringStore(); } @Bean public IntegerStore integerStore() { return new IntegerStore(); } } 假设前面的bean实现了一个通用接口(即Store<String>和 Store<Integer>) class StringStore implements Store<String>{ } class IntegerStore implements Store<Integer>{ } 则可以@Autowire将该Store接口和通用用作限定符,如以下示例所示: @Autowired private Store<String> s1; // <String> qualifier, 注入 stringStore bean @Autowired private Store<Integer> s2; // <Integer> qualifier, 注入 the integerStore bean 在自动装配列表,Map实例和数组时,通用限定符也适用。下面的示例自动连接泛型List: // 只注入 Store<Integer> 类型 // Store<String> 不会被注入 @Autowired private List<Store<Integer>> s; 1.9.6。使用CustomAutowireConfigurer CustomAutowireConfigurer 是一个BeanFactoryPostProcessor 您可以注册自己的自定义限定符注解类型,即使它们未使用Spring的@Qualifier注解进行注解。 像之前我们定义的注解 @Target({ElementType.FIELD, ElementType.PARAMETER}) @Retention(RetentionPolicy.RUNTIME) @Qualifier public @interface MovieQualifier { String value(); } 这种写法主要就是托@Qualifier的福气 但我们也可以不依赖它 以下示例显示如何使用CustomAutowireConfigurer: <bean id="customAutowireConfigurer" class="org.springframework.beans.factory.annotation.CustomAutowireConfigurer"> <property name="customQualifierTypes"> <set> <value>example.CustomQualifier</value> </set> </property> </bean> example.CustomQualifier Spring会根据这个类路径加载这个类, 并将这个类作为和@Qualifier同作用来对待 自动注入是如何处理候选对象的? bean definition 的 autowire-candidate 值,值为false表示该bean不参于候选 <beans/>元素default-autowire-candidates上可用的任何模式,值为false表示该组的bean不参与候选 @Qualifier注解 和 任何在customautowiresfigurer注册的自定义注解的存在会被优先使用 当多个bean符合自动装配候选条件时, 确定“primary”的步骤如下:如果候选中恰好有一个bean定义将primary属性设置为true,则将其选中。 1.9.7。@Resource Spring还通过在字段或bean属性设置器方法上使用JSR-250@Resource批注(javax.annotation.Resource)支持注入。 1. 如果同时指定了name和type,按照bean Name 和 bean 类型同时匹配 2. 如果指定了name,就按照bean Name 匹配 3. 如果指定了type,就按照类型匹配 4. 
如果既没有指定name,又没有指定type,就先按照beanName匹配; 如果没有匹配,再按照类型进行匹配; @Resource具有name属性。默认情况下,Spring将该值解释为要注入的Bean名称。 换句话说,它遵循名称语义,如以下示例所示: public class SimpleMovieLister { private MovieFinder movieFinder; @Resource(name="myMovieFinder") public void setMovieFinder(MovieFinder movieFinder) { this.movieFinder = movieFinder; } } 如果未明确指定名称,则默认名称是从字段名称或setter方法派生的。 如果是字段,则采用字段名称。 在使用setter方法的情况下,它采用bean属性名称。 以下示例将名为 movieFinder的bean注入到setter方法: public class SimpleMovieLister { private MovieFinder movieFinder; @Resource public void setMovieFinder(MovieFinder movieFinder) { this.movieFinder = movieFinder; } } 因此,在下例中,customerPreferenceDao字段首先查找名为“customerPreferenceDao”的bean,找不到的话然后返回到与类型customerPreferenceDao匹配的bean: public class MovieRecommender { @Resource private CustomerPreferenceDao customerPreferenceDao; @Resource private ApplicationContext context; public MovieRecommender() { } } 1.9.8。使用@Value @Value 通常用于注入外部属性: @Component public class MovieRecommender { private final String catalog; public MovieRecommender(@Value("${catalog.name}") String catalog) { this.catalog = catalog; } } 使用以下配置: @Configuration @PropertySource("classpath:application.properties") public class AppConfig { } 和以下application.properties文件: catalog.name=MovieCatalog 在这种情况下,catalog参数和字段将等于MovieCatalog值。 Spring提供了一个默认的值解析器。 它将尝试解析属性值,如果无法解析, ${catalog.name} 则将被当做值注入到属性中。 例如:catalog="${catalog.name}" 如果要严格控制不存在的值,则应声明一个PropertySourcesPlaceholderConfigurerbean,如以下示例所示: @Configuration public class AppConfig { @Bean public static PropertySourcesPlaceholderConfigurer propertyPlaceholderConfigurer() { return new PropertySourcesPlaceholderConfigurer(); } } 当配置PropertySourcesPlaceholderConfigurer使用JavaConfig,该@Bean方法必须是static。 如果${} 无法解析任何占位符,则使用上述配置可确保Spring初始化失败。 默认情况下,Spring Boot配置一个PropertySourcesPlaceholderConfigurer 从application.properties和application.yml获取bean的属性。 Spring提供的内置转换器支持允许自动处理简单的类型转换(例如转换为Integer 或转换为简单的类型int)。 多个逗号分隔的值可以自动转换为String数组,而无需付出额外的努力。 可以像下边一样提供默认值: @Component public class MovieRecommender { private final String catalog; public MovieRecommender(@Value("${catalog.name:defaultCatalog}") String catalog) { this.catalog = catalog; } } Spring BeanPostProcessor在后台使用ConversionService来处理将@Value中的字符串值转换为目标类型的过程。如果你想为自己的自定义类型提供转换支持,你可以提供自己的ConversionService bean实例,如下面的例子所示: @Configuration public class AppConfig { @Bean public ConversionService conversionService() { DefaultFormattingConversionService conversionService = new DefaultFormattingConversionService(); conversionService.addConverter(new MyCustomConverter()); return conversionService; } } 当@Value包含SpEL表达式时,该值将在运行时动态计算,如以下示例所示: @Component public class MovieRecommender { private final String catalog; public MovieRecommender(@Value("#{systemProperties['user.catalog'] + 'Catalog' }") String catalog) { this.catalog = catalog; } } SpEL还支持使用更复杂的数据结构: @Component public class MovieRecommender { private final Map<String, Integer> countOfMoviesPerCatalog; public MovieRecommender( @Value("#{{'Thriller': 100, 'Comedy': 300}}") Map<String, Integer> countOfMoviesPerCatalog) { this.countOfMoviesPerCatalog = countOfMoviesPerCatalog; } } 1.9.9。使用@PostConstruct和@PreDestroy CommonAnnotationBeanPostProcessor不仅处理@Resource注解 也处理javax.annotation.PostConstruct和 javax.annotation.PreDestroy。 在Spring 2.5中引入了对这些注解的支持,为初始化回调和销毁回调中描述的生命周期回调机制提供了一种替代方法。 如果在Spring ApplicationContext中注册了CommonAnnotationBeanPostProcessor,带有这两个注解的方法将会被回调执行。 在下面的例子中,缓存在初始化时被预填充,在销毁时被清除: public class CachingMovieLister { @PostConstruct public void populateMovieCache() { // populates the movie cache upon initialization... 
} @PreDestroy public void clearMovieCache() { // clears the movie cache upon destruction... } } 像@Resource一样,@PostConstruct和@PreDestroy注解是JDK6到8的标准Java库的一部分。 但是,整个javax.annotation 程序包都与JDK 9中的核心Java模块分开,并最终在JDK 11中删除了。 如果需要,需要对javax.annotation-api工件进行处理。 现在可以通过Maven Central获取,只需像其他任何库一样将其添加到应用程序的类路径中即可。
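The closing note above points out that the javax.annotation package was split from the core Java modules in JDK 9 and removed in JDK 11, and that the javax.annotation-api artifact from Maven Central restores @PostConstruct and @PreDestroy by simply being added to the classpath. A minimal sketch of the dependency declaration, assuming a Maven build and version 1.3.2 (adjust the version to whatever your project actually standardizes on):

<!-- Hypothetical pom.xml fragment: restores javax.annotation (@PostConstruct, @PreDestroy) on JDK 9+ -->
<dependency>
    <groupId>javax.annotation</groupId>
    <artifactId>javax.annotation-api</artifactId>
    <version>1.3.2</version>
</dependency>

With this artifact on the classpath, CommonAnnotationBeanPostProcessor can resolve the two lifecycle annotations the same way it did on JDK 8.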


一起来读官方文档-----SpringIOC(07)

1.8。容器扩展点 通常,应用程序开发人员不需要对ApplicationContext 实现类进行子类化。相反,可以通过插入特殊集成接口的实现来扩展Spring IoC容器。接下来的几节描述了这些集成接口。 1.8.1。自定义bean实现BeanBeanPostProcessor接口 BeanPostProcessor接口定义了回调方法,您可以实现这些回调方法来修改默认的bean实例化的逻辑,依赖关系解析逻辑等。 如果您想在Spring容器完成实例化,配置和初始化bean之后实现一些自定义逻辑,则可以插入一个或多个自定义BeanPostProcessor。 您可以配置多个BeanPostProcessor实例,并且可以BeanPostProcessor通过实现Ordered 接口设置order属性来控制这些实例的运行顺序。 @Component public class MyBeanPostProcessor implements BeanPostProcessor, Ordered { @Override public Object postProcessAfterInitialization(Object bean, String beanName) throws BeansException { return bean; } @Override public int getOrder() { return 0; } } BeanPostProcessor实例操作的是bean的实例。 也就是说,Spring IoC容器实例化一个bean实例, 然后使用BeanPostProcessor对这些实例进行处理加工。 BeanPostProcessor实例是按容器划分作用域的。 仅在使用容器层次结构时,这才有意义。 如果BeanPostProcessor在一个容器中定义一个,它将仅对该容器中的bean进行后处理。 换句话说,一个容器中定义的bean不会被BeanPostProcessor另一个容器中的定义进行后处理, 即使这两个容器是同一层次结构的一部分也是如此。 BeanPostProcessor修改的是bean实例化之后的内容, 如果要更改实际的bean定义(即bean definition) 您需要使用 BeanFactoryPostProcessor接口. org.springframework.beans.factory.config.BeanPostProcessor接口恰好由两个回调方法组成。 当此类被注册为容器的post-processor时,对于容器创建的每个bean实例,post-processor都会在任何bean实例化之后并且在容器初始化方法(例如InitializingBean.afterPropertiesSet()或任何声明的init方法)被使用之前调用。 post-processor可以对bean实例执行任何操作,也可以完全忽略回调。 post-processor通常检查回调接口,或者可以用代理包装Bean。 一些Spring AOP基础结构类被实现为post-processor,以提供代理包装逻辑。 ApplicationContext自动检测实现BeanPostProcessor接口所有bean,注意是要注册成bean,仅仅实现接口是不可以的。 请注意,通过使用@Bean工厂方法声明BeanPostProcessor时,工厂方法的返回类型应该是实现类本身或至少是org.springframework.beans.factory.config.BeanPostProcessor 接口,以清楚地表明该bean的post-processor性质。 否则,ApplicationContext无法在完全创建之前按类型自动检测它。 由于BeanPostProcessor需要提前实例化以便应用于上下文中其他bean的初始化,因此这种早期类型检测至关重要。 @Bean public BeanPostProcessor myBeanPostProcessor(){ return new MyBeanPostProcessor(); } 以编程方式注册BeanPostProcessor实例 虽然推荐的BeanPostProcessor注册方法是通过ApplicationContext自动检测, 但是您可以ConfigurableBeanFactory使用addBeanPostProcessor方法通过编程方式对它们进行注册。 当您需要在注册之前评估条件逻辑(比如应用场景是xxx条件才注册,xxx条件不注册时), 甚至需要跨层次结构的上下文复制Bean post-processor时,这将非常有用。 但是请注意,以BeanPostProcessor编程方式添加的实例不遵守该Ordered接口。 在这里,注册的顺序决定了执行的顺序。 还要注意,以BeanPostProcessor编程方式注册的实例总是在通过自动检测注册的实例之前进行处理, 而不考虑任何明确的顺序。 BeanPostProcessor 实例和AOP自动代理 实现BeanPostProcessor接口的类是特殊的,并且容器对它们的处理方式有所不同。 BeanPostProcessor它们直接引用的所有实例和bean在启动时都会实例化, 作为ApplicationContext的特殊启动阶段的一部分。 接下来,BeanPostProcessor以排序方式注册所有实例,并将其应用于容器中的所有其他bean。 但是因为AOP自动代理的实现是通过BeanPostProcessor接口, 所以在AOP的BeanPostProcessor接口实例化之前的 BeanPostProcessor实例或BeanPostProcessor实例直接引用的bean都没有资格进行自动代理。 并且对于任何此类bean都没有任何处理切面的BeanPostProcessor指向他们。 您应该看到一条参考性日志消息: Bean someBean is not eligible for getting processed by all BeanPostProcessor interfaces (for example: not eligible for auto-proxying)。 这条消息的意思大概就是说这个bean没有得到所有BeanPostProcessor的处理 下面分析一下这条日志的逻辑:我们不用AOP的BeanPostProcessor用AutowiredAnnotationBeanPostProcessor来看这个情况 首先这条日志是在BeanPostProcessorChecker类中打印的, 这个类本身就实现了BeanPostProcessor, Spring容器增加这个processor的代码如下: //获取所有的BeanPostProcessor类型的bean //第一个true表示包括非单例的bean //第二个false表示仅查找已经实例化完成的bean,如果是factory-bean则不算入内 String[] postProcessorNames = beanFactory.getBeanNamesForType(BeanPostProcessor.class, true, false); //当前beanFactory内的所有post-processor数 + 1 + postBeanNames的数量 //这个数量在后续有个判断 //beanFactory.getBeanPostProcessorCount() 系统内置processor //1 就是BeanPostProcessorChecker //postProcessorNames.length 就是能扫描到的processor //这个数量之和就是目前系统能看到的所有processor //还有的就可能是解析完了某些bean又新增了processor那个不算在内 int beanProcessorTargetCount = beanFactory.getBeanPostProcessorCount() + 1 + postProcessorNames.length; //add BeanPostProcessorChecker 进入beanPostProcessor链 beanFactory.addBeanPostProcessor(new 
BeanPostProcessorChecker(beanFactory, beanProcessorTargetCount)); BeanPostProcessorChecker中判断并打印上边那条日志的方法如下: @Override public Object postProcessAfterInitialization(Object bean, String beanName) { //如果当前bean不是postProcessor的实例 //并且不是内部使用的bean //并且this.beanFactory.getBeanPostProcessorCount()小于刚才相加的值 //三个都满足才会打印那行日志 if (!(bean instanceof BeanPostProcessor) && !isInfrastructureBean(beanName) && this.beanFactory.getBeanPostProcessorCount() < this.beanPostProcessorTargetCount) { if (logger.isInfoEnabled()) { logger.info("Bean '" + beanName + "' of type [" + bean.getClass().getName() + "] is not eligible for getting processed by all BeanPostProcessors " + "(for example: not eligible for auto-proxying)"); } } return bean; } //当前beanName不为空,并且对应的bean是容器内部使用的bean则返回true private boolean isInfrastructureBean(@Nullable String beanName) { if (beanName != null && this.beanFactory.containsBeanDefinition(beanName)) { BeanDefinition bd = this.beanFactory.getBeanDefinition(beanName); return (bd.getRole() == RootBeanDefinition.ROLE_INFRASTRUCTURE); } return false; } 在看Spring createBean时遍历postProcessor的代码 @Override public Object applyBeanPostProcessorsAfterInitialization(Object existingBean, String beanName) throws BeansException { Object result = existingBean; for (BeanPostProcessor processor : getBeanPostProcessors()) { Object current = processor.postProcessAfterInitialization(result, beanName); if (current == null) { return result; } result = current; } return result; } 就是通过这么一个循环来执行后置方法applyBeanPostProcessorsAfterInitialization,前置方法也是这样的 现在假设我们有一个自定义的beanPostProcessor里面需要注入一个我们自定义的beanA, 那么在beanPostProcessor被实例化的时候肯定会要求注入我们自定义的beanA, 那么现在就有多种情况了: 1.我们用的set或者构造器注入那beanA会被实例化并注入 2.如果我们用的@Autowired,当我们自定义的beanPostProcessor实例化 在AutowiredAnnotationBeanPostProcessor实例化之前,那么beanA都无法被注入值 如果在之后,则还是可以被注入值 但是这两种情况都会打印这行日志 Bean 'beanA' of type [org.springframework.beanA] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying) 以下示例显示了如何在ApplicationContext中编写,注册和使用BeanPostProcessor实例。 示例:Hello World,BeanPostProcessor-style 第一个示例演示了基本用法。示例展示了一个自定义BeanPostProcessor实现,它在容器创建每个bean时调用该bean的toString()方法,并将结果字符串打印到系统控制台。 下面的清单显示了自定义的BeanPostProcessor实现类定义: package scripting; import org.springframework.beans.factory.config.BeanPostProcessor; public class InstantiationTracingBeanPostProcessor implements BeanPostProcessor { // 只需按原样返回实例化的bean public Object postProcessBeforeInitialization(Object bean, String beanName) { return bean; // 我们可以返回任何对象引用 } public Object postProcessAfterInitialization(Object bean, String beanName) { System.out.println("Bean '" + beanName + "' created : " + bean.toString()); return bean; } } 以下beans元素使用InstantiationTracingBeanPostProcessor: <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:lang="http://www.springframework.org/schema/lang" xsi:schemaLocation="http://www.springframework.org/schema/beans https://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/lang https://www.springframework.org/schema/lang/spring-lang.xsd"> <lang:groovy id="messenger" script-source="classpath:org/springframework/scripting/groovy/Messenger.groovy"> <lang:property name="message" value="Fiona Apple Is Just So Dreamy."/> </lang:groovy> <!-- 当上述bean (messenger)被实例化时,这个自定义的BeanPostProcessor实现将事实输出到系统控制台 --> <bean class="scripting.InstantiationTracingBeanPostProcessor"/> </beans> 
请注意实例化tracingbeanpostprocessor是如何定义的。它甚至没有名称,而且,因为它是一个bean,所以可以像其他bean一样进行依赖注入。 下面的Java应用程序运行前面的代码和配置: import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; import org.springframework.scripting.Messenger; public final class Boot { public static void main(final String[] args) throws Exception { ApplicationContext ctx = new ClassPathXmlApplicationContext("scripting/beans.xml"); Messenger messenger = ctx.getBean("messenger", Messenger.class); System.out.println(messenger); } } 前面的应用程序的输出类似于以下内容: Bean 'messenger' created : org.springframework.scripting.groovy.GroovyMessenger@272961 org.springframework.scripting.groovy.GroovyMessenger@272961 示例: RequiredAnnotationBeanPostProcessor 将回调接口或注解与自定义BeanPostProcessor实现结合使用是扩展Spring IoC容器的一种常见方法。 一个例子是Spring的AutowiredAnnotationBeanPostProcessor——一个随Spring发行版附带的BeanPostProcessor实现,它确保被注解(@Autowired,@Value, @Inject等注解)注释的属性会被注入一个bean实例。 1.8.2。自定义配置元数据BeanFactoryPostProcessor 我们要看的下一个扩展点是 org.springframework.beans.factory.config.BeanFactoryPostProcessor。 该接口与BeanPostProcessor主要区别在于:BeanFactoryPostProcessor对Bean配置元数据进行操作。 也就是说,Spring IoC容器允许BeanFactoryPostProcessor读取配置元数据,并有可能在容器实例化实例任何bean之前更改元数据。 您可以配置多个BeanFactoryPostProcessor实例,并且可以BeanFactoryPostProcessor通过设置order属性来控制这些实例的运行顺序。但是,仅当BeanFactoryPostProcessor实现 Ordered接口时才能设置此属性。 如果希望更改实际bean实例(从配置元数据创建的对象),则需要使用BeanPostProcessor。 尽管在BeanFactoryPostProcessor中使用bean实例在技术上是可行的(例如,通过使用BeanFactory.getBean()), 但是这样做会导致过早的bean实例化,违反标准的容器生命周期。 这可能会导致负面的副作用,比如绕过bean的后处理。 另外,BeanFactoryPostProcessor实例的作用域为每个容器。 这只有在使用容器层次结构时才有用。 如果您在一个容器中定义了BeanFactoryPostProcessor,那么它只应用于该容器中的bean定义。 一个容器中的Bean定义不会被另一个容器中的BeanFactoryPostProcessor实例进行后处理,即使这两个容器属于同一层次结构。 当BeanFactoryPostProcessor在ApplicationContext中声明时,它将自动运行,以便对定义容器的配置元数据应用更改。 Spring包括许多预定义的bean工厂后处理器,如PropertyOverrideConfigurer和PropertySourcesPlaceholderConfigurer。 您还可以使用自定义BeanFactoryPostProcessor例如,用于注册自定义属性编辑器。 ApplicationContext自动检测部署其中实现BeanFactoryPostProcessor接口的任何bean。在适当的时候,这些bean会被bean factory post-processors来使用。 你也可以像部署任何其他bean一样部署这些自定义的bean factory post-processors。 示例:PropertySourcesPlaceholderConfigurer 您可以使用PropertySourcesPlaceholderConfigurer使用标准的Java属性格式将bean定义中的属性值外部化到单独的文件中。这样,部署应用程序的人员就可以自定义特定于环境的属性,比如数据库url和密码,而无需修改主XML定义文件或容器文件的复杂性或风险。 考虑以下基于xml的配置元数据片段,其中定义了具有占位符值的数据源: <bean class="org.springframework.context.support.PropertySourcesPlaceholderConfigurer"> <property name="locations" value="classpath:com/something/jdbc.properties"/> </bean> <bean id="dataSource" destroy-method="close" class="org.apache.commons.dbcp.BasicDataSource"> <property name="driverClassName" value="${jdbc.driverClassName}"/> <property name="url" value="${jdbc.url}"/> <property name="username" value="${jdbc.username}"/> <property name="password" value="${jdbc.password}"/> </bean> 该示例显示了从外部Properties文件配置的属性。 在运行时,将 PropertySourcesPlaceholderConfigurer应用于替换数据源的某些属性的元数据。将要替换的值指定为形式的占位符,该形式${property-name}遵循Ant和log4j和JSP EL样式。 实际值来自标准Java Properties格式的另一个文件: jdbc.driverClassName = org.hsqldb.jdbcDriver jdbc.url = jdbc:hsqldb:hsql://production:9002 jdbc.username = sa jdbc.password = root 因此,${jdbc.username}在运行时将字符串替换为值“sa”,并且其他与属性文件中的键匹配的占位符值也适用。 在PropertySourcesPlaceholderConfigurer为大多数属性和bean定义的属性占位符检查。此外,您可以自定义占位符前缀和后缀。 <bean class="org.springframework.context.support.PropertySourcesPlaceholderConfigurer"> <property name="locations" value="classpath:jdbc.properties"/> //自定义前缀后缀 <property name="placeholderPrefix" value="${"/> <property name="placeholderSuffix" value="}"/> </bean> 
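Section 1.8.2 above only demonstrates the built-in PropertySourcesPlaceholderConfigurer. To make the point concrete that a BeanFactoryPostProcessor operates on bean definitions rather than bean instances, here is a minimal, hypothetical sketch; the bean name "dataSource" and the class name are invented for illustration:

import org.springframework.beans.BeansException;
import org.springframework.beans.factory.config.BeanDefinition;
import org.springframework.beans.factory.config.BeanFactoryPostProcessor;
import org.springframework.beans.factory.config.ConfigurableListableBeanFactory;
import org.springframework.stereotype.Component;

// 假设的示例:在任何 bean 实例化之前,把名为 "dataSource" 的 bean definition 改为懒加载。
// Hypothetical example: before any bean is instantiated, mark the bean definition named "dataSource" as lazy-init.
@Component
public class LazyDataSourcePostProcessor implements BeanFactoryPostProcessor {

    @Override
    public void postProcessBeanFactory(ConfigurableListableBeanFactory beanFactory) throws BeansException {
        // 这里操作的是配置元数据(BeanDefinition),而不是 bean 实例
        // We are touching configuration metadata (the BeanDefinition), not a bean instance.
        if (beanFactory.containsBeanDefinition("dataSource")) {
            BeanDefinition bd = beanFactory.getBeanDefinition("dataSource");
            bd.setLazyInit(true);
        }
    }
}

Because the post-processor runs before any regular bean is created, the change to the definition takes effect for the dataSource instance that is eventually built.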
1.8.3。自定义实例化逻辑FactoryBean 您可以org.springframework.beans.factory.FactoryBean为本身就是工厂的对象实现接口。 该FactoryBean接口是可插入Spring IoC容器的实例化逻辑的一点。 如果您有复杂的初始化代码,而不是(可能)冗长的XML,可以用Java更好地表达,则以创建自己的代码 FactoryBean, 在该类中编写复杂的初始化,然后将自定义FactoryBean插入容器。 该FactoryBean界面提供了三种方法: Object getObject():返回此工厂创建的对象的实例。实例可以共享,具体取决于该工厂是否返回单例或原型。 boolean isSingleton():true如果FactoryBean返回单例或false其他则返回 。 Class getObjectType():返回getObject()方法返回的对象类型,或者null如果类型未知,则返回该对象类型。 FactoryBeanSpring框架中的许多地方都使用了该概念和接口。Spring附带了50多种FactoryBean接口实现。Spring中的了解的少,但是Mybatis的MybatisSqlSessionFactoryBean很出名。 当您需要向容器询问FactoryBean本身而不是由它产生的bean的实际实例时,请在调用的方法时在该bean的id前面加上“&”符号(&)。 因此,对于给定id为myBean的一个FactoryBean ,调用getBean("myBean")返回的是FactoryBean生成的实例,getBean("&myBean")返回的是FactoryBean本身。 public class MyFactoryBean implements FactoryBean<MyBean> { @Override public MyBean getObject() throws Exception { return new MyBean(); } @Override public Class<?> getObjectType() { return MyBean.class; } } <bean id="myFactoryBean" class="org.springframework.example.factoryBean.MyFactoryBean"/> getBean("myFactoryBean") 返回的是MyBean实例 getBean("&myFactoryBean") 返回的是MyFactoryBean实例
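As a short usage sketch for the MyFactoryBean definition above (the XML file name is assumed), the following program shows the difference between getBean("myFactoryBean") and getBean("&myFactoryBean"):

import org.springframework.context.support.ClassPathXmlApplicationContext;

public class FactoryBeanDemo {

    public static void main(String[] args) {
        // 假设 factory-bean.xml 中包含上文的 myFactoryBean 定义
        // Assumes factory-bean.xml contains the myFactoryBean definition shown above.
        ClassPathXmlApplicationContext ctx = new ClassPathXmlApplicationContext("factory-bean.xml");

        Object product = ctx.getBean("myFactoryBean");  // the MyBean instance produced by the factory
        Object factory = ctx.getBean("&myFactoryBean"); // the MyFactoryBean itself

        System.out.println(product.getClass().getSimpleName()); // MyBean
        System.out.println(factory.getClass().getSimpleName()); // MyFactoryBean

        ctx.close();
    }
}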


HBase2.0官方文档翻译-RegionServer Sizing Rules of Thumb

36. On the number of column families HBase currently does not do well with anything above two or three column families so keep the number of column families in your schema low. Currently, flushing and compactions are done on a per Region basis so if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed even though the amount of data they carry is small. When many column families exist the flushing and compaction interaction can make for a bunch of needless i/o (To be addressed by changing flushing and compaction to work on a per column family basis). For more information on compactions, see Compaction. HBase现在还不能很好的处理超过2、3个列族的情况,所以尽可能保持较少的列族数量。目前,flush和compact是基于region的,所以如果其中一个列族由于数据过多触发flush,其它列族即使数据较少,也会一起被flush。当许多列族同时进行flush和compact,会造成大量不必要的i/o(待通过修改为基于列族进行flush和compact来解决)。关于compact的更多信息,请查看Compaction章节。 Try to make do with one column family if you can in your schemas. Only introduce a second and third column family in the case where data access is usually column scoped; i.e. you query one column family or the other but usually not both at the one time. 可能的话,尝试只使用一个列族。只有当数据的访问总是涉及一定范围的列时可以考虑引入第二个或第三个列族;比如,你会查询这个列或另一个列,而不会同时查询。 36.1. 列族基数(Cardinality of ColumnFamilies) Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows). If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA’s data will likely be spread across many, many regions (and RegionServers). This makes mass scans for ColumnFamilyA less efficient. 如果表包含多个列族,需要注意基数问题(比如,行数)。如果ColumnFamilyA包含100万行而ColumnFamilyB包含10亿行,那么ColumnFamilyA的数据会被分散到很多很多region(以及RegionServer)中。这会使对ColumnFamilyA的大规模scan比较低效。 37. Rowkey Design 37.1. 热点(Hotspotting) Rows in HBase are sorted lexicographically by row key. This design optimizes for scans, allowing you to store related rows, or rows that will be read together, near each other. However, poorly designed row keys are a common source of hotspotting. Hotspotting occurs when a large amount of client traffic is directed at one node, or only a few nodes, of a cluster. This traffic may represent reads, writes, or other operations. The traffic overwhelms the single machine responsible for hosting that region, causing performance degradation and potentially leading to region unavailability. This can also have adverse effects on other regions hosted by the same region server as that host is unable to service the requested load. It is important to design data access patterns such that the cluster is fully and evenly utilized. HBase中的行按照rowkey的字典序存储。这种设计优化了scan,允许你把有关联的,或者会被一起读取的行放在临近的地方。然而,不良的行键设计是热点的常见来源。当大量的客户端流量被导向集群中的一个或者少数几个节点时,就会出现热点。流量可能是读取、写入,或者其它操作。流量会压垮托管这些region的单个机器,导致性能下降甚至region不可用。由于主机不能够再提供服务,所以这同样会对这些regionServer上的其它region带来负面影响。对数据访问模式进行设计,使集群得到充分和均匀的使用,是很重要的。 To prevent hotspotting on writes, design your row keys such that rows that truly do need to be in the same region are, but in the bigger picture, data is being written to multiple regions across the cluster, rather than one at a time. Some common techniques for avoiding hotspotting are described below, along with some of their advantages and drawbacks. 要避免写热点,需将rowkey设计为,确实需要临近的行才存在于同一个region,总体上看,数据写到集群中的多个region比一个要好。下面是一些常见的避免热点的技术手段,以及它们的优点和缺点。 加盐(Salting) Salting in this sense has nothing to do with cryptography, but refers to adding random data to the start of a row key. 
In this case, salting refers to adding a randomly-assigned prefix to the row key to cause it to sort differently than it otherwise would. The number of possible prefixes correspond to the number of regions you want to spread the data across. Salting can be helpful if you have a few "hot" row key patterns which come up over and over amongst other more evenly-distributed rows. Consider the following example, which shows that salting can spread write load across multiple RegionServers, and illustrates some of the negative implications for reads. 这里的加盐与密码学无关,而是关于在rowkey的开头添加随机数据。在本例中,加盐是指通过给rowkey增加随机分配的前缀,来使其排序不同于其它方式。可能的前缀数量与你希望将数据分散到的region数量一致。如果存在一些行,相对于其它分布均匀的行来说,总是反复出现,那么加盐就会有很用。考虑后面这个例子,其展示了加盐能够将写入压力分散到多个RegionServer,同时对读取的一些负面影响。 Example 11. Salting Example Suppose you have the following list of row keys, and your table is split such that there is one region for each letter of the alphabet. Prefix 'a' is one region, prefix 'b' is another. In this table, all rows starting with 'f' are in the same region. This example focuses on rows with keys like the following: 假设你有下面这个rowkey列表,并且表按照每个首字母对应一个region的方式split。前缀a为一个region,前缀b为另一个region。在这个表中,以f开头的行存在于同一个reigon。这个例子主要关注具有如下键的行: foo0001 foo0002 foo0003 foo0004 Now, imagine that you would like to spread these across four different regions. You decide to use four different salts: a, b, c, and d. In this scenario, each of these letter prefixes will be on a different region. After applying the salts, you have the following rowkeys instead. Since you can now write to four separate regions, you theoretically have four times the throughput when writing that you would have if all the writes were going to the same region. 现在,想象下你需要将他们分散到不同的region去。你决定使用四种不同的盐:a, b, c, and d。在这个场景里,每个字母前缀会位于不同的region。使用这些盐之后,取而代之的是以下行键。由于你现在可以写入到四个独立的region,理论上与全部写入到同一个region相比,你获取了四倍的吞吐。 a-foo0003 b-foo0001 c-foo0004 d-foo0002 Then, if you add another row, it will randomly be assigned one of the four possible salt values and end up near one of the existing rows. 然后,如果你增加其它行,它会被随机的分配到四种盐值之一,并且放在现有的行附近。 a-foo0003 b-foo0001 c-foo0003 c-foo0004 d-foo0002 Since this assignment will be random, you will need to do more work if you want to retrieve the rows in lexicographic order. In this way, salting attempts to increase throughput on writes, but has a cost during reads. 由于分配是随机的,你需要做一些额外的工作来恢复行的字典顺序。在这个方法中,加盐尝试增加写入的吞吐能力,但是增加了读取时的代价。 哈希(Hashing) Instead of a random assignment, you could use a one-way hash that would cause a given row to always be "salted" with the same prefix, in a way that would spread the load across the RegionServers, but allow for predictability during reads. Using a deterministic hash allows the client to reconstruct the complete rowkey and use a Get operation to retrieve that row as normal. 你可以用单向哈希使给定的行总是以相同的前缀加盐,来取代随机分配,这个方法可以将压力分散到各个regionServer,同时在读取的时候能够预知前缀。使用一个确定的哈希,客户端能够重新构造完整的rowkey,然后使用一个普通的get操作去获取行。 Example 12. Hashing Example Given the same situation in the salting example above, you could instead apply a one-way hash that would cause the row with key foo0003 to always, and predictably, receive the a prefix. Then, to retrieve that row, you would already know the key. You could also optimize things so that certain pairs of keys were always in the same region, for instance. 
上面加盐的例子中,你可以换用一个单向哈希来使foo0003总是能够得到a这个前缀。这样的话,你已经知道了用什么key去获取行。你还可以做一些优化,例如,使特定的一些key总是位于同样的region。 反转键(Reversing the Key) A third common trick for preventing hotspotting is to reverse a fixed-width or numeric row key so that the part that changes the most often (the least significant digit) is first. This effectively randomizes row keys, but sacrifices row ordering properties. 第三种常见的避免热点的方法是将固定长度或数字类型的rowkey进行反转,这样变化频繁的部分就会到前面。这使得rowkey变得随机,不过会失去顺序性。 See https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables, and article on Salted Tables from the Phoenix project, and the discussion in the comments of HBASE-11682 for more information about avoiding hotspotting. 查看https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables, 和Phoenix项目中其它加盐表相关的文章,以及HBASE-11682中评论的讨论,以了解更多关于避免热点的信息。 37.2. 递增rowkey/时序数据(Monotonically Increasing Row Keys/Timeseries Data) In the HBase chapter of Tom White’s book Hadoop: The Definitive Guide (O’Reilly) there is a an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table’s regions (and thus, a single node), then moving onto the next region, etc. With monotonically increasing row-keys (i.e., using a timestamp), this will happen. See this comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores: monotonically increasing values are bad. The pile-up on a single region brought on by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it’s best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key. 在Tom White关于Hadoop的书中的HBase章节中:在权威指南里面有一个优化说明,其中指出要注意这样一种现象,所有客户端的写入操作全部集中在表的某一个region(也即,单个节点),然后转换到下一个region,一直这样。 使用单向递增的rowkey时(例如,使用时间戳),这就会发生。参考IKai Lan的连载,关于为什么在BigTable类的数据库中单向递增的rowkey会是问题:monotonically increasing values are bad。 可以通过将输入记录随机化而变得无序来缓解单向递增key带来的单region压力,不过通常更好的做法是避免使用时间戳或是一个序列(比如:1,2,3)作为rowkey。 If you do need to upload time series data into HBase, you should study OpenTSDB as a successful example. It has a page describing the schema it uses in HBase. The key format in OpenTSDB is effectively metric_type, which would appear at first glance to contradict the previous advice about not using a timestamp as the key. However, the difference is that the timestamp is not in the lead position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types. Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table. 如果你需要将时序数据存入HBase,你应该将OpenTSDB作为一个成功案例去学习。它有个页面描述其在HBase中使用的模式。OpenTSDB使用metric_type作为key的格式,乍一看与之前建议的避免使用时间戳作为key相矛盾。不过,区别在于时间戳没有处于key的前导位,并且该设计假设会有几十或几百个不同的指标类型。因此,即使有连续的混杂不同指标类型的输入数据,写入也会分布到表的不同region中去。 See schema.casestudies for some rowkey design examples. 更多关于rowkey设计的示例可查看schema.casestudies。 37.3. 尽可能最小化row和column大小(Try to minimize row and column sizes) In HBase, values are always freighted with their coordinates; as a cell value passes through the system, it’ll be accompanied by its row, column name, and timestamp - always. If your rows and column names are large, especially compared to the size of the cell value, then you may run up against some interesting scenarios. One such is the case described by Marc Limotte at the tail of HBASE-3551 (recommended!). 
Therein, the indices that are kept on HBase storefiles (StoreFile (HFile)) to facilitate random access may end up occupying large chunks of the HBase allotted RAM because the cell value coordinates are large. Mark in the above cited comment suggests upping the block size so entries in the store file index happen at a larger interval or modify the table schema so it makes for smaller rows and column names. Compression will also make for larger indices. See the thread a question storefileIndexSize up on the user mailing list. 在HBase中,value总是带有其坐标;cell的value在系统中处理时总是携带着row,column名称,以及时间戳。如果你的row和column名称很大,尤其是相对于value来说,那么你可能会碰到一些有意思的情景。在HBASE-3551的末尾Marc Limotte描述了这样的一个案例。其中,由于cell的value坐标过大,storefiles中存储的用来加速随机访问的索引数据占用了大量的HBase可用内存。在之前的回复中,Mark建议增加block的大小,使得store file中能以更大的间隔产生index,或者修改表设计,使用更小的row和column名称。压缩也能够带来较大的索引。查看用户邮件列表中的这个主题:a question storefileIndexSize。 Most of the time small inefficiencies don’t matter all that much. Unfortunately, this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated several billion times in your data. 多数时候,细微的低效并不重要。不幸的是,该案例中正是由此导致的。无论选择怎样的列族、属性、和行键,它们总是会在你的数据中重复数十亿次。 See keyvalue for more information on HBase stores data internally to see why this is important. 查看keyvalue章节,了解关于HBase内部数据存储的更多信息,来理解为什么这很重要。 37.3.1. 列族(Column Families) Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default). See KeyValue for more information on HBase stores data internally to see why this is important. 尝试让列族名称尽可能短,最好是一个字符。查看keyvalue章节,了解关于HBase内部数据存储的更多信息,来理解为什么这很重要。 37.3.2. 属性(Attributes) Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via") to store in HBase. See keyvalue for more information on HBase stores data internally to see why this is important. 虽然详细的属性名称容易阅读,但是短一些更有利于存储到HBase中。查看keyvalue章节,了解关于HBase内部数据存储的更多信息,来理解为什么这很重要。 37.3.3. 行键长度(Rowkey Length) Keep them as short as is reasonable such that they can still be useful for required data access (e.g. Get vs. Scan). A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs when designing rowkeys. 使其合理的简短而不丧失数据访问时的可用性。一个简短但对数据访问来说无用的键,并不比一个长一些的键更好。在设计行键的时候需要进行权衡。 37.3.4. 字节模式(Byte Patterns) A long is 8 bytes. You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes. If you stored this number as a String — presuming a byte per character — you need nearly 3x the bytes. long类型占8个字节。你可以存储一个小于18,446,744,073,709,551,615的数字。如果你将该数字存储为字符串-假设每个字符一个字节-你需要三倍的字节数。 Not convinced? Below is some sample code that you can run on your own. 不相信吗?下面是一些示例代码,你可以自己运行看看。 // long // long l = 1234567890L; byte[] lb = Bytes.toBytes(l); System.out.println("long bytes length: " + lb.length); // returns 8 String s = String.valueOf(l); byte[] sb = Bytes.toBytes(s); System.out.println("long as string length: " + sb.length); // returns 10 // hash // MessageDigest md = MessageDigest.getInstance("MD5"); byte[] digest = md.digest(Bytes.toBytes(s)); System.out.println("md5 digest bytes length: " + digest.length); // returns 16 String sDigest = new String(digest); byte[] sbDigest = Bytes.toBytes(sDigest); System.out.println("md5 digest as string length: " + sbDigest.length); // returns 26 Unfortunately, using a binary representation of a type will make your data harder to read outside of your code. 
For example, this is what you will see in the shell when you increment a value: 不幸的是,用二进制类型会导致你的数据在代码之外难以理解。例如,当你incr一个值时你会在shell中看到这些东西。 hbase(main):001:0> incr 't', 'r', 'f:q', 1 COUNTER VALUE = 1 hbase(main):002:0> get 't', 'r' COLUMN CELL f:q timestamp=1369163040570, value=\x00\x00\x00\x00\x00\x00\x00\x01 1 row(s) in 0.0310 seconds The shell makes a best effort to print a string, and it this case it decided to just print the hex. The same will happen to your row keys inside the region names. It can be okay if you know what’s being stored, but it might also be unreadable if arbitrary data can be put in the same cells. This is the main trade-off. shell会尽可能的打印出字符串,但在该示例中它决定只是打印十六进制。这同样会发生在你的region名称中的行键。如果你知道所存储的东西,这可以接受,但在cell中放入任意数据可能会失去可读性。这是主要的权衡点。 37.4.反转时间戳(Reverse Timestamps) 反向scan接口(Reverse Scan API) HBASE-4811 implements an API to scan a table or a range within a table in reverse, reducing the need to optimize your schema for forward or reverse scanning. This feature is available in HBase 0.98 and later. See Scan.setReversed() for more information. HBASE-4811实现了一个可以反向scan表或其中一个范围的接口,减少你因为正向或反向扫描而进行模式优化的需要。该功能在HBase 0.98或更高版本中可用。更多信息可查看Scan.setReversed()。 A common problem in database processing is quickly finding the most recent version of a value. A technique using reverse timestamps as a part of the key can help greatly with a special case of this problem. Also found in the HBase chapter of Tom White’s book Hadoop: The Definitive Guide (O’Reilly), the technique involves appending (Long.MAX_VALUE - timestamp) to the end of any key, e.g. key. The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record. Since HBase keys are in sorted order, this key sorts before any older row-keys for [key] and thus is first. This technique would be used instead of using Number of Versions where the intent is to hold onto all versions "forever" (or a very long time) and at the same time quickly obtain access to any other version by using the same Scan technique. 数据库处理中有这样一个常见的问题,快速找到最新版本的一个值。在特定的与此有关的案例中,把反转时间戳作为key的一部分,会有很大的帮助。Tom White的hadoop书籍的HBase章节:权威指南,关于在任意key的末尾添加(Long.MAX_VALUE - timestamp)的技巧。 一个表中键的最新值可通过执行一个对该键的scan并获取第一个记录得到。由于HBase中的键是有序的,该键会排在更老的行键之前,因此是第一个。 这个技巧可被用来替代意图永久保留所有版本(或一个较长的时期)的多版本技术,并且同时可使用同样的扫描方式来快速获取任意其它版本数据。 37.5. 行键和列族(Rowkeys and ColumnFamilies) Rowkeys are scoped to ColumnFamilies. Thus, the same rowkey could exist in each ColumnFamily that exists in a table without collision. 行键的作用域是列族。因此,相同的行键可以存在于表的每个列族中而不会冲突。 37.6. 行键的不变性(Immutability of Rowkeys) Rowkeys cannot be changed. The only way they can be "changed" in a table is if the row is deleted and then re-inserted. This is a fairly common question on the HBase dist-list so it pays to get the rowkeys right the first time (and/or before you’ve inserted a lot of data). 行键是不可变的。唯一使它们“改变”的的方法使先删除再重新插入。这是一个很常见的问题,因此有必要一开始就使用正确的行键(在你插入很多数据之前)。 37.7. 行键和region分片的关系(Relationship Between RowKeys and Region Splits) If you pre-split your table, it is critical to understand how your rowkey will be distributed across the region boundaries. As an example of why this is important, consider the example of using displayable hex characters as the lead position of the key (e.g., "0000000000000000" to "ffffffffffffffff"). 
Running those key ranges through Bytes.split (which is the split strategy used when creating regions in Admin.createTable(byte[] startKey, byte[] endKey, numRegions) for 10 regions will generate the following splits…​ 如果你预拆分你的表,理解你的行键在region边界如何分布非常重要。考虑这个使用可见十六进制字符作为先导位的行键(比如,"0000000000000000" to "ffffffffffffffff")的例子,用来说明为什么这很重要。运行Bytes.split(使用Admin.createTable(byte[] startKey, byte[] endKey, numRegions )创建region时使用的分片策略)将该范围的行键分为10个region,会得到下面这些分片。 (note: the lead byte is listed to the right as a comment.) Given that the first split is a '0' and the last split is an 'f', everything is great, right? Not so fast. (注:首字节作为注释在右边列出)。假设第一个分片是'0',最后一个分片是'f',一切都挺好,是吗?先别急。 The problem is that all the data is going to pile up in the first 2 regions and the last region thus creating a "lumpy" (and possibly "hot") region problem. To understand why, refer to an ASCII Table. '0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will never appear in this keyspace because the only values are [0-9] and [a-f]. Thus, the middle regions will never be used. To make pre-splitting work with this example keyspace, a custom definition of splits (i.e., and not relying on the built-in split method) is required. 问题在于,所有数据会集中在前2个region以及最后1个region,因而带来了热点region问题。参考ASCII表来理解为什么。'0' 对应字节的值为48,'f'对应字节的值为102,但由于可能的值只有[0-9]和[a-f],其中有很大一部分字节值(58-96)不会出现在行键区间中,此时需要一个自定义分片策略(比如,不依赖内置的分片方法)。 Lesson #1: Pre-splitting tables is generally a best practice, but you need to pre-split them in such a way that all the regions are accessible in the keyspace. While this example demonstrated the problem with a hex-key keyspace, the same problem can happen with any keyspace. Know your data. 经验1:预拆分表通常是一个最佳实践,但你需要以一种所有region都会被访问的方式去拆分。这个例子只是演示了使用十六进制行键时的问题,使用其它任意行键都可能有类似问题。理解你的数据。 Lesson #2: While generally not advisable, using hex-keys (and more generally, displayable data) can still work with pre-split tables as long as all the created regions are accessible in the keyspace. 经验2:虽然通常不建议使用十六进制行键(通常采用可见字符),但只要能够使所有region都能被访问,就可以进行预拆分。 To conclude this example, the following is an example of how appropriate splits can be pre-created for hex-keys:. 作为总结,下面是一个如何为十六进制行键进行适当的预拆分的示例: public static boolean createTable(Admin admin, HTableDescriptor table, byte[][] splits) throws IOException { try { admin.createTable( table, splits ); return true; } catch (TableExistsException e) { logger.info("table " + table.getNameAsString() + " already exists"); // the table already exists... return false; } } public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) { byte[][] splits = new byte[numRegions-1][]; BigInteger lowestKey = new BigInteger(startKey, 16); BigInteger highestKey = new BigInteger(endKey, 16); BigInteger range = highestKey.subtract(lowestKey); BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions)); lowestKey = lowestKey.add(regionIncrement); for(int i=0; i < numRegions-1;i++) { BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i))); byte[] b = String.format("%016x", key).getBytes(); splits[i] = b; } return splits; } 38. Number of Versions 38.1. 最大版本数(Maximum Number of Versions) The maximum number of row versions to store is configured per column family via HColumnDescriptor. The default for max versions is 1. This is an important parameter because as described in Data Model section HBase does not overwrite row values, but rather stores different values per row by time (and qualifier). 
Excess versions are removed during major compactions. The number of max versions may need to be increased or decreased depending on application needs. 行的最大保存版本数通过HColumnDescriptor为每个列族配置。默认最大版本数为1.这是个很重要的参数,正如数据模型章节所述,HBase不会覆盖数据,而是按时间(和限定符)为每行保存不同的值。多余的版本会在major合并时删除。最大版本数可根据应用需要增大或减小。 It is not recommended setting the number of max versions to an exceedingly high level (e.g., hundreds or more) unless those old values are very dear to you because this will greatly increase StoreFile size. 不建议将最大版本数设置的过大(比如,几百或更多),因为这会大幅增加StoreFile的大小,除非那些旧数据对你来说很有价值。 38.2. 最小版本数(Minimum Number of Versions) Like maximum number of row versions, the minimum number of row versions to keep is configured per column family via HColumnDescriptor. The default for min versions is 0, which means the feature is disabled. The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the number of row versions parameter to allow configurations such as "keep the last T minutes worth of data, at most N versions, but keep at least M versions around" (where M is the value for minimum number of row versions, M 与最大版本数一样,最小版本数也通过HColumnDescriptor为每个列族配置。默认最小版本数为0,意味着该功能未启用。最小版本数可以与存活时间以及最大版本数一起使用,来进行"保留最近T分钟内的数据,最多N个版本,但最少要保留M个版本"(M代表最小版本数,M 39. Supported Datatypes HBase supports a "bytes-in/bytes-out" interface via Put and Result, so anything that can be converted to an array of bytes can be stored as a value. Input could be strings, numbers, complex objects, or even images as long as they can rendered as bytes. HBase通过Put和Result支持"字节输入/字节输出"接口,所以可被转换为字节数组的任意东西都能够被作为值存储。输入可以是字符串、数字、组合对象或者甚至图片也可以只要它们可以被表示为字节。 There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask); search the mailing list for conversations on this topic. All rows in HBase conform to the Data Model, and that includes versioning. Take that into consideration when making your design, as well as block size for the ColumnFamily. 实际上对于值大小有一些限制(比如,在HBase中存储10-50MB的对象可能要求太高);搜索邮件列表来查看与此话题相关的讨论。HBase中的所有行都需要遵循数据模型,包括版本控制。与列族的块大小一样,你需要在设计时考虑这些。 39.1. Counters One supported datatype that deserves special mention are "counters" (i.e., the ability to do atomic increments of numbers). See Increment in Table. 特别值得一提的一种数据类型是"计数器"(用于实现数值原子递增)。See Increment in Table. Synchronization on counters are done on the RegionServer, not in the client. 对计数器的同步是在RegionServer完成,而不是客户端。 40. Joins If you have multiple tables, don’t forget to factor in the potential for Joins into the schema design. 如果你有多个表,不要忘记将连接的潜力考虑到模式设计中。 41. Time To Live (TTL) ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached. This applies to all versions of a row - even the current one. The TTL time encoded in the HBase for the row is specified in UTC. 列族可以设置以秒为单位的存活时间,HBase会在过期时自动删除这些行。这将应用到行的所有版本-甚至当前那个。存活时间在HBase中采用UTC进行编码。 Store files which contains only expired rows are deleted on minor compaction. Setting hbase.store.delete.expired.storefile to false disables this feature. Setting minimum number of versions to other than 0 also disables this. See HColumnDescriptor for more information. 只包含已过期数据的Store files会在minor合并的时候 被删除。可将hbase.store.delete.expired.storefile设置为false来禁用此功能。也可以将最小版本数设置为大于0的值来禁用。更多信息可查看HColumnDescriptor. Recent versions of HBase also support setting time to live on a per cell basis. See HBASE-10560 for more information. 
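Tying sections 38 and 41 together, a hedged Java sketch of configuring maximum versions, minimum versions, and a column-family TTL through HColumnDescriptor; the table name, family name, and concrete values are illustrative assumptions only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class VersionsAndTtlExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      HColumnDescriptor family = new HColumnDescriptor("f");     // hypothetical family name
      family.setMaxVersions(5);                 // keep at most 5 versions per cell
      family.setMinVersions(1);                 // but always keep at least 1, even past the TTL
      family.setTimeToLive(7 * 24 * 60 * 60);   // family-level TTL, in seconds
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("sensor_data")); // hypothetical table
      desc.addFamily(family);
      admin.createTable(desc);
    }
  }
}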
Cell TTLs are submitted as an attribute on mutation requests (Appends, Increments, Puts, etc.) using Mutation#setTTL. If the TTL attribute is set, it will be applied to all cells updated on the server by the operation. There are two notable differences between cell TTL handling and ColumnFamily TTLs: Cell TTLs are expressed in units of milliseconds instead of seconds. A cell TTLs cannot extend the effective lifetime of a cell beyond a ColumnFamily level TTL setting. 最近版本的HBase支持基于每个cell设置存活时间。更新信息查看HBASE-10560。cell的存活时间,通过Mutation#setTTL方法,将其作为mutation请求的一个属性进行提交。如果设置了存活时间属性,则会应用到被此操作更新的所有cell。cell的存活时间和列族的存活时间有2个明显的不同: cell的存活时间单位是毫秒而不是秒。 cell的存活时间不能超过列族的存活时间而延长cell的有效寿命。 42. Keeping Deleted Cells By default, delete markers extend back to the beginning of time. Therefore, Get or Scan operations will not see a deleted cell (row or column), even when the Get or Scan operation indicates a time range before the delete marker was placed. 默认情况下,删除标记会作用至最开始的时间。因此,Get或Scan操作将不会看到已删除的cell(行或列),即使其指定了早于删除标记的时间范围。 ColumnFamilies can optionally keep deleted cells. In this case, deleted cells can still be retrieved, as long as these operations specify a time range that ends before the timestamp of any delete that would affect the cells. This allows for point-in-time queries even in the presence of deletes. 列族可以选择保留已删除cell。这种情况下,已删除的cell可以被获取,只要操作所指定的时间范围,早于这些cell的删除操作的时间点。这允许在存在删除的情况下,进行任意时间点的查询。 Deleted cells are still subject to TTL and there will never be more than "maximum number of versions" deleted cells. A new "raw" scan options returns all deleted rows and the delete markers. 已删除的cell依然受存活时间和最大版本数的约束。一个新的"raw"scan选项可返回所有已删除的行和删除标记。 通过shell修改KEEP_DELETED_CELLS的值 hbase> hbase> alter ‘t1′, NAME => ‘f1′, KEEP_DELETED_CELLS => true 通过api修改KEEP_DELETED_CELLS的值 ... HColumnDescriptor.setKeepDeletedCells(true); ... Let us illustrate the basic effect of setting the KEEP_DELETED_CELLS attribute on a table.First, without:举例说明一下给表设置KEEP_DELETED_CELLS属性后的基本影响。 首先,未设置: create 'test', {NAME=>'e', VERSIONS=>2147483647} put 'test', 'r1', 'e:c1', 'value', 10 put 'test', 'r1', 'e:c1', 'value', 12 put 'test', 'r1', 'e:c1', 'value', 14 delete 'test', 'r1', 'e:c1', 11 hbase(main):017:0> scan 'test', {RAW=>true, VERSIONS=>1000} ROW COLUMN+CELL r1 column=e:c1, timestamp=14, value=value r1 column=e:c1, timestamp=12, value=value r1 column=e:c1, timestamp=11, type=DeleteColumn r1 column=e:c1, timestamp=10, value=value 1 row(s) in 0.0120 seconds hbase(main):018:0> flush 'test' 0 row(s) in 0.0350 seconds hbase(main):019:0> scan 'test', {RAW=>true, VERSIONS=>1000} ROW COLUMN+CELL r1 column=e:c1, timestamp=14, value=value r1 column=e:c1, timestamp=12, value=value r1 column=e:c1, timestamp=11, type=DeleteColumn 1 row(s) in 0.0120 seconds hbase(main):020:0> major_compact 'test' 0 row(s) in 0.0260 seconds hbase(main):021:0> scan 'test', {RAW=>true, VERSIONS=>1000} ROW COLUMN+CELL r1 column=e:c1, timestamp=14, value=value r1 column=e:c1, timestamp=12, value=value 1 row(s) in 0.0120 seconds Notice how delete cells are let go. 
注意被删除的cell是如何消失的。 Now let’s run the same test only with KEEP_DELETED_CELLS set on the table (you can do table or per-column-family): 现在只给表增加KEEP_DELETED_CELLS设置(可以在表上或者列族上),并重新运行同样的测试: hbase(main):005:0> create 'test', {NAME=>'e', VERSIONS=>2147483647, KEEP_DELETED_CELLS => true} 0 row(s) in 0.2160 seconds => Hbase::Table - test hbase(main):006:0> put 'test', 'r1', 'e:c1', 'value', 10 0 row(s) in 0.1070 seconds hbase(main):007:0> put 'test', 'r1', 'e:c1', 'value', 12 0 row(s) in 0.0140 seconds hbase(main):008:0> put 'test', 'r1', 'e:c1', 'value', 14 0 row(s) in 0.0160 seconds hbase(main):009:0> delete 'test', 'r1', 'e:c1', 11 0 row(s) in 0.0290 seconds hbase(main):010:0> scan 'test', {RAW=>true, VERSIONS=>1000} ROW COLUMN+CELL r1 column=e:c1, timestamp=14, value=value r1 column=e:c1, timestamp=12, value=value r1 column=e:c1, timestamp=11, type=DeleteColumn r1 column=e:c1, timestamp=10, value=value 1 row(s) in 0.0550 seconds hbase(main):011:0> flush 'test' 0 row(s) in 0.2780 seconds hbase(main):012:0> scan 'test', {RAW=>true, VERSIONS=>1000} ROW COLUMN+CELL r1 column=e:c1, timestamp=14, value=value r1 column=e:c1, timestamp=12, value=value r1 column=e:c1, timestamp=11, type=DeleteColumn r1 column=e:c1, timestamp=10, value=value 1 row(s) in 0.0620 seconds hbase(main):013:0> major_compact 'test' 0 row(s) in 0.0530 seconds hbase(main):014:0> scan 'test', {RAW=>true, VERSIONS=>1000} ROW COLUMN+CELL r1 column=e:c1, timestamp=14, value=value r1 column=e:c1, timestamp=12, value=value r1 column=e:c1, timestamp=11, type=DeleteColumn r1 column=e:c1, timestamp=10, value=value 1 row(s) in 0.0650 seconds KEEP_DELETED_CELLS is to avoid removing Cells from HBase when the only reason to remove them is the delete marker. So with KEEP_DELETED_CELLS enabled deleted cells would get removed if either you write more versions than the configured max, or you have a TTL and Cells are in excess of the configured timeout, etc. KEEP_DELETED_CELLS用来避免删除那些只是被删除标记所删除的cell。因此KEEP_DELETED_CELLS启用时,如果超出最大版本数,或者超出了配置的存活时间,被delete的cell还是会被真正删除掉。 43. Secondary Indexes and Alternate Query Paths This section could also be titled "what if my table rowkey looks like this but I also want to query my table like that." A common example on the dist-list is where a row-key is of the format "user-timestamp" but there are reporting requirements on activity across users for certain time ranges. Thus, selecting by user is easy because it is in the lead position of the key, but time is not. 这个章节也可以使用"如果我的表行键是这样但是希望以那样的方式去查询"的标题。问题列表中常见的一个例子是行键的格式是"用户-时间戳",但存在按照特定时间范围查询用户活动的报表需求。此时,按用户查询很容易,因为它位于行键的先导位,但按时间查询就比较难。 There is no single answer on the best way to handle this because it depends on…​ 对于如何以最好的方式去解决该问题并没有单一的答案,因为这取决于... Number of users Data size and data arrival rate Flexibility of reporting requirements (e.g., completely ad-hoc date selection vs. pre-configured ranges) Desired execution speed of query (e.g., 90 seconds may be reasonable to some for an ad-hoc report, whereas it may be too long for others) 用户数量 数据大小和到达速率 报表需求的复杂度(比如,完全自由的日期选择 vs 预先配置范围) 查询所需的执行速度(比如,对于一个ad-hoc报表,90秒可能是合理的,但是对于其它情况就太久了) and solutions are also influenced by the size of the cluster and how much processing power you have to throw at the solution. Common techniques are in sub-sections below. This is a comprehensive, but not exhaustive, list of approaches. 而且解决方案也受集群大小和能够投入的处理器多少的影响。后面的子章节列出了常用的技术手段。这是一份全面而并不详尽的方法列表。 It should not be a surprise that secondary indexes require additional cluster space and processing. 
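Before the secondary-index discussion continues below, a brief hedged Java sketch revisiting sections 41 and 42 above: a per-cell TTL set through Mutation#setTTL and KEEP_DELETED_CELLS enabled through HColumnDescriptor (table, family, and values are assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CellTtlExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Column family that keeps deleted cells around (section 42).
      HColumnDescriptor family = new HColumnDescriptor("e");
      family.setKeepDeletedCells(true);
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("ttl_demo"));  // hypothetical table
      desc.addFamily(family);
      admin.createTable(desc);

      try (Table table = conn.getTable(TableName.valueOf("ttl_demo"))) {
        Put put = new Put(Bytes.toBytes("r1"));
        put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("c1"), Bytes.toBytes("value"));
        // Cell-level TTL is in milliseconds (unlike the family-level TTL, which is in seconds)
        // and cannot extend a cell's life beyond the family-level TTL.
        put.setTTL(60_000L);
        table.put(put);
      }
    }
  }
}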
This is precisely what happens in an RDBMS because the act of creating an alternate index requires both space and processing cycles to update. RDBMS products are more advanced in this regard to handle alternative index management out of the box. However, HBase scales better at larger data volumes, so this is a feature trade-off. 毫无疑问二级索引需要额外的集群空间和处理.这就是关系型数据库中所发生的,因为创建额外索引既需要空间也需要花时间去更新。在开箱即用的索引管理方面,关系型数据库更为先进。然而,HBase在更大数据量是具备更好的扩展性,因此这是一个功能上的权衡。 Pay attention to Apache HBase Performance Tuning when implementing any of these approaches. 在实现那些方法时,请注意"性能调优"。 Additionally, see the David Butler response in this dist-list thread HBase, mail # user - Stargate+hbase 此外,可查看David Butler在问题列表中的回复,HBase, mail # user - Stargate+hbase。 43.1. (过滤器查询)Filter Query Depending on the case, it may be appropriate to use Client Request Filters. In this case, no secondary index is created. However, don’t try a full-scan on a large table like this from an application (i.e., single-threaded client). 根据具体情况,使用客户端过滤器进行请求可能时合适的。但是,不要尝试从应用程序中对一个大表进行全扫描(比如,单线程客户端)。 43.2. (周期性更新二级索引)Periodic-Update Secondary Index A secondary index could be created in another table which is periodically updated via a MapReduce job. The job could be executed intra-day, but depending on load-strategy it could still potentially be out of sync with the main data table. See mapreduce.example.readwrite for more information. 二级索引可通过另外一张表创建,通过MapReduce作业周期性更新。该作业可以当天运行,不过取决于负载策略,它仍然可能与主表不同步。 更多信息查看mapreduce.example.readwrite。 43.3. (多写二级索引)Dual-Write Secondary Index Another strategy is to build the secondary index while publishing data to the cluster (e.g., write to data table, write to index table). If this is approach is taken after a data table already exists, then bootstrapping will be needed for the secondary index with a MapReduce job (see secondary.indexes.periodic). 另一个策略是在写入数据到集群的时候构建二级索引(比如,写入数据表,然后写入索引表)。如果是对已存在的表采用该方法,则需要先执行一个MapReduce作业来进行初始化(查看secondary.indexes.periodic)。 43.4. (汇总表)Summary Tables Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach. These would be generated with MapReduce jobs into another table. See mapreduce.example.summary for more information. 在时间范围很长且数据量很大时,汇总表是常用的方法。可通过MapReduce作业将其生成为另外一个表。 更多信息查看mapreduce.example.summary。 43.5. (协处理器二级索引)Coprocessor Secondary Index Coprocessors act like RDBMS triggers. These were added in 0.92. For more information, see coprocessors 协处理器类似关系型数据库中的触发器。在0.92版本中加入。更多信息,查看coprocessors。 44. Constraints HBase currently supports 'constraints' in traditional (SQL) database parlance. The advised usage for Constraints is in enforcing business rules for attributes in the table (e.g. make sure values are in the range 1-10). Constraints could also be used to enforce referential integrity, but this is strongly discouraged as it will dramatically decrease the write throughput of the tables where integrity checking is enabled. Extensive documentation on using Constraints can be found at Constraint since version 0.94. HBase现在支持传统数据库所说的"约束"。约束用来强制表中的属性遵守业务规则(比如,确保值在1-10之间)。约束也可以用来强制参照完整性,但是由于它会显著降低写吞吐,因此强烈不赞成使用。在0.94版本之后,关于如何使用约束,可查看扩展文档Constraint。 45. Schema Design Case Studies The following will describe some typical data ingestion use-cases with HBase, and how the rowkey design and construction can be approached. Note: this is just an illustration of potential approaches, not an exhaustive list. Know your data, and know your processing requirements. 
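Looking back at the filter-query option in section 43.1, the following is a minimal hedged sketch of a server-side filtered scan with no secondary index; the table, family, qualifier, and value are assumptions (the case studies continue below):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterQueryExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("user_activity"))) {   // hypothetical table
      SingleColumnValueFilter filter = new SingleColumnValueFilter(
          Bytes.toBytes("d"), Bytes.toBytes("event-type"),                    // hypothetical column
          CompareFilter.CompareOp.EQUAL, Bytes.toBytes("login"));
      filter.setFilterIfMissing(true);   // skip rows that do not carry the column at all
      Scan scan = new Scan();
      scan.setFilter(filter);
      scan.setCaching(100);              // keep RPC round trips reasonable
      // As warned above, this is still a full-table scan unless bounded with start/stop rows.
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          System.out.println(Bytes.toStringBinary(r.getRow()));
        }
      }
    }
  }
}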
以下会描述一些使用HBase进行数据获取的用户案例,以及如何进行行键设计和构造的方法。注:这里只是对可能的方法的说明,并非一个详尽的列表。理解你的数据,以及你的处理需求。 It is highly recommended that you read the rest of the HBase and Schema Design first, before reading these case studies. 强烈推荐你在阅读这些学习案例之前,先读一读HBase and Schema Design的剩余内容。 The following case studies are described: Log Data / Timeseries Data Log Data / Timeseries on Steroids Customer/Order Tall/Wide/Middle Schema Design List Data 以下描述的是这些案例: 日志数据 / 时序数据 日志数据 / 聚合时序数据 客户/订单 高/宽/中等 模式设计 列表数据 45.1. 案例学习-日志和时序数据(Case Study - Log Data and Timeseries Data) Assume that the following data elements are being collected. Hostname Timestamp Log event Value/message 假设收集到的是以下数据元素 主机名 时间戳 日志事件 值/消息 We can store them in an HBase table called LOG_DATA, but what will the rowkey be? From these attributes the rowkey will be some combination of hostname, timestamp, and log-event - but what specifically? 我们可以将它们存储在一个叫做LOG_DATA的表中,但是行键是什么呢?由这些属性可知,应该是主机、时间戳和日志事件的一些组合,但具体是什么? 45.1.1. 时间戳位于前导位(Timestamp In The Rowkey Lead Position) The rowkey timestamp[log-event] suffers from the monotonically increasing rowkey problem described in Monotonically Increasing Row Keys/Timeseries Data. timestamp[log-event]组成的行键会遇到Monotonically Increasing Row Keys/Timeseries Data中所描述的单调递增行键问题。 There is another pattern frequently mentioned in the dist-lists about "bucketing" timestamps, by performing a mod operation on the timestamp. If time-oriented scans are important, this could be a useful approach. Attention must be paid to the number of buckets, because this will require the same number of scans to return results. 还有另一种dist-lists中经常提到的,对时间戳取模进行分桶的模式。如果基于时间的扫描比较重要,这会是一个有用的方法。注意桶的数量,因为这会带来同样数量的scan,以返回结果。 long bucket = timestamp % numBuckets; to construct: [bucket][timestamp][hostname][log-event] As stated above, to select data for a particular timerange, a Scan will need to be performed for each bucket. 100 buckets, for example, will provide a wide distribution in the keyspace but it will require 100 Scans to obtain data for a single timestamp, so there are trade-offs. 如上所述,要获取一个特定时间范围的数据,需要对每个桶执行一个scan。比如100个桶,能够对键空间提供一个广泛的分布,但在获取某个时间戳范围的数据时需要100个scan,因此需要做权衡。 45.1.2. 主机名位于前导位(Host In The Rowkey Lead Position) The rowkey hostname[timestamp] is a candidate if there is a large-ish number of hosts to spread the writes and reads across the keyspace. This approach would be useful if scanning by hostname was a priority. 如果有很多的节点来分散对键空间的写入和读取,hostname[timestamp]也是个可选项。这个方法在主要以主机进行扫描时会比较有效。 45.1.3. 时间戳,或反转时间戳(Timestamp, or Reverse Timestamp?) If the most important access path is to pull most recent events, then storing the timestamps as reverse-timestamps (e.g., timestamp = Long.MAX_VALUE – timestamp) will create the property of being able to do a Scan on hostname to obtain the most recently captured events. 如果最重要的访问方式是得到最新的事件,那么以反转时间戳的方式存储的话(e.g., timestamp = Long.MAX_VALUE – timestamp),将产生这样的特性:在对hostname进行scan时可以获取最近得到的事件。 Neither approach is wrong, it just depends on what is most appropriate for the situation. 方法无所谓对错,只取决于对具体情况是否最为适合。 Reverse Scan API HBASE-4811 implements an API to scan a table or a range within a table in reverse, reducing the need to optimize your schema for forward or reverse scanning. This feature is available in HBase 0.98 and later. See Scan.setReversed() for more information. 反转scan接口 HBASE-4811实现了一个接口,用来反向扫描一个表或其中一个范围,以减少为能够反向扫描而所需的设计优化。在HBase 0.98及其后版本可用。See Scan.setReversed() for more information。 45.1.4. 变长 或 定长行键(Variable Length or Fixed Length Rowkeys?) 
It is critical to remember that rowkeys are stamped on every column in HBase. If the hostname is a and the event type is e1 then the resulting rowkey would be quite small. However, what if the ingested hostname is myserver1.mycompany.com and the event type is com.package1.subpackage2.subsubpackage3.ImportantService? 务必要记得,在HBase中行键会重复存在于每个列。如果主机名和事件类型分别是a和e1,行键就会非常小,但如果主机名和事件类型是myserver1.mycompany.com和com.package1.subpackage2.subsubpackage3.ImportantService呢? It might make sense to use some substitution in the rowkey. There are at least two approaches: hashed and numeric. In the Hostname In The Rowkey Lead Position example, it might look like this: 对行键进行一些替换也许是有意义的。至少有2种方法:哈希和数字。在主机名作为行键前导位的例子中,看起来是这样: Composite Rowkey With Hashes: [MD5 hash of hostname] = 16 bytes [MD5 hash of event-type] = 16 bytes [timestamp] = 8 bytes 使用哈希的组合行键 [MD5 hash of hostname] = 16 bytes [MD5 hash of event-type] = 16 bytes [timestamp] = 8 bytes Composite Rowkey With Numeric Substitution: For this approach another lookup table would be needed in addition to LOG_DATA, called LOG_TYPES. The rowkey of LOG_TYPES would be: type [bytes] variable length bytes for raw hostname or event-type. A column for this rowkey could be a long with an assigned number, which could be obtained by using an HBase counter So the resulting composite rowkey would be: [substituted long for hostname] = 8 bytes [substituted long for event type] = 8 bytes [timestamp] = 8 bytes 使用数字的组合行键 这个方法在LOG_DATA之外,还需要另一张叫做LOG_TYPE的查找表。LOG_TYPE表的行键是: type [bytes] 代表原始主机名和事件的定长字节数组 该行键的列可以是通过计数器获取到的一个数值。 因此最终的组合行键是这样: [substituted long for hostname] = 8 bytes [substituted long for event type] = 8 bytes [timestamp] = 8 bytes In either the Hash or Numeric substitution approach, the raw values for hostname and event-type can be stored as columns. 无论是用哈希还是数字的替换方法,主机名和事件类型的原始值都可以作为列进行存储。 45.2. 案例学习 - 日志数据和聚合时序数据(Case Study - Log Data and Timeseries Data on Steroids) This effectively is the OpenTSDB approach. What OpenTSDB does is re-write data and pack rows into columns for certain time-periods. For a detailed explanation, see: http://opentsdb.net/schema.html, and Lessons Learned from OpenTSDB from HBaseCon2012. 这实际上就是OpenTSDB采用的方法。它把数据进行重写并按照一定的时间周期将行打包成列。对其细节的解释, see: http://opentsdb.net/schema.html, and Lessons Learned from OpenTSDB from HBaseCon2012. But this is how the general concept works: data is ingested, for example, in this manner…​ hostname[timestamp1] hostname[timestamp2] hostname[timestamp3] with separate rowkeys for each detailed event, but is re-written like this…​ hostname[timerange] and each of the above events are converted into columns stored with a time-offset relative to the beginning timerange (e.g., every 5 minutes). This is obviously a very advanced processing technique, but HBase makes this possible. 不过这里展示了大概的工作原理:比如,数据以下面的方式被获取: [hostname][log-event][timestamp1] [hostname][log-event][timestamp2] [hostname][log-event][timestamp3] 每一个明细事件作为一个行键,但会被重写成这样: [hostname][log-event][timerange] 并且以上的每个事件,都会转换为一个列,存储着相对于起始时间范围的一个时间偏移(比如,每5分钟)。这显然是一个非常先进的处理技术,但是HBase使之成为可能。 45.3. (案例学习 - 客户/订单)Case Study - Customer/Order Assume that HBase is used to store customer and order information. There are two core record-types being ingested: a Customer record type, and Order record type. The Customer record type would include all the things that you’d typically expect: 假设使用HBase存储客户和订单信息。会获取到两种主要的记录类型:客户记录,和订单记录。 客户记录会包含如下内容: Customer number Customer name Address (e.g., city, state, zip) Phone numbers, etc. 
订单记录会包含如下内容: Customer number Order number Sales date A series of nested objects for shipping locations and line-items (see Order Object Design for details) Assuming that the combination of customer number and sales order uniquely identify an order, these two attributes will compose the rowkey, and specifically a composite key such as: 假设客户号和订单号的组合唯一标识一个订单,对于订单表,将会由这2个属性组成行键,如下: [customer number][order number] for an ORDER table. However, there are more design decisions to make: are the raw values the best choices for rowkeys? 当然,还有更多设计决策需要去做:原始值对行键来说是不是最好的选择? The same design questions in the Log Data use-case confront us here. What is the keyspace of the customer number, and what is the format (e.g., numeric? alphanumeric?) As it is advantageous to use fixed-length keys in HBase, as well as keys that can support a reasonable spread in the keyspace, similar options appear: 日志数据案例中遇到的设计问题,这里一样存在。客户号的键空间是怎样的,格式如何(比如,数字?字符串?)在HBase中使用定长以及能够合理分布的行键是有益的,类似这样: Composite Rowkey With Hashes: [MD5 of customer number] = 16 bytes [MD5 of order number] = 16 bytes Composite Numeric/Hash Combo Rowkey: [substituted long for customer number] = 8 bytes [MD5 of order number] = 16 bytes 哈希方式组合行键: [MD5 of customer number] = 16 bytes [MD5 of order number] = 16 bytes 混合数字和哈希的方式组合行键: [substituted long for customer number] = 8 bytes [MD5 of order number] = 16 bytes 45.3.1. (单个表?多个表?)Single Table? Multiple Tables? A traditional design approach would have separate tables for CUSTOMER and SALES. Another option is to pack multiple record types into a single table (e.g., CUSTOMER++). 一个典型的设计方法是将客户和销售分为独立的表。另一个选项是将多种记录类型放到一个表中(比如,CUSTOMER++)。 Customer Record Type Rowkey: [customer-id] [type] = type indicating `1' for customer record type Order Record Type Rowkey: [customer-id] [type] = type indicating `2' for order record type [order] The advantage of this particular CUSTOMER++ approach is that organizes many different record-types by customer-id (e.g., a single scan could get you everything about that customer). The disadvantage is that it’s not as easy to scan for a particular record-type. 这种独特的CUSTOMER++方法的优势是将多种不同的记录类型通过客户id进行组织(比如,单个scan就可以获取该客户的所有数据)。劣势是对于特定的记录类型进行扫描不太容易。 45.3.2. (订单对象设计)Order Object Design Now we need to address how to model the Order object. Assume that the class structure is as follows: Order(an Order can have multiple ShippingLocations LineItem(a ShippingLocation can have multiple LineItems there are multiple options on storing this data. 选择我们需要解决如何对订单对象建模。假设类结构如下:订单(一个订单可以包含多个物流地址)明细项(一个物流地址可以含有多个明细项)对此类数据的存储有多种选择。 完全标准化(Completely Normalized) With this approach, there would be separate tables for ORDER, SHIPPING_LOCATION, and LINE_ITEM. 在这个方法中,将会分为ORDER, SHIPPING_LOCATION, and LINE_ITEM等独立的表。 The ORDER table’s rowkey was described above: schema.casestudies.custorder The SHIPPING_LOCATION’s composite rowkey would be something like this: [order-rowkey] shipping location number The LINE_ITEM table’s composite rowkey would be something like this: [order-rowkey] shipping location number line item number ORDER表的行键如上所述:schema.casestudies.custorder SHIPPING_LOCATION表的组合主键是: [order-rowkey] [shipping location number](e.g., 1st location, 2nd, etc.) LINE_ITEM表的组合主键是: [order-rowkey] [shipping location number](e.g., 1st location, 2nd, etc.) [line item number](e.g., 1st lineitem, 2nd, etc.) Such a normalized model is likely to be the approach with an RDBMS, but that’s not your only option with HBase. 
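Before the trade-offs of the normalized model are discussed below, here is a hedged sketch of building the hashed composite ORDER rowkey from section 45.3, i.e. [MD5 of customer number][MD5 of order number]; the raw identifiers are made up:

import java.security.MessageDigest;
import org.apache.hadoop.hbase.util.Bytes;

public class OrderRowkeyExample {
  // [MD5 of customer number] = 16 bytes, [MD5 of order number] = 16 bytes => fixed 32-byte rowkey
  static byte[] orderRowkey(String customerNumber, String orderNumber) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    byte[] customerHash = md5.digest(Bytes.toBytes(customerNumber));   // digest() also resets the instance
    byte[] orderHash = md5.digest(Bytes.toBytes(orderNumber));
    return Bytes.add(customerHash, orderHash);
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical identifiers; the raw values would still be stored as columns,
    // since the hash is one-way and cannot be reversed at read time.
    byte[] rowkey = orderRowkey("customer-12345", "order-67890");
    System.out.println("rowkey length = " + rowkey.length + " bytes");
  }
}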
The cons of such an approach is that to retrieve information about any Order, you will need: Get on the ORDER table for the Order Scan on the SHIPPING_LOCATION table for that order to get the ShippingLocation instances Scan on the LINE_ITEM for each ShippingLocation granted, this is what an RDBMS would do under the covers anyway, but since there are no joins in HBase you’re just more aware of this fact. RDBMS中常会采用这样的一个标准模型,但在HBase中却非唯一选择。这种方法的缺点是,要检索任意订单的信息,你需要: 从ORDER表中获取订单信息 扫描SHIPPING_LOCATION表获取该订单的物流地址信息 扫描LINE_ITEM表获取每个物流地址的物品项 当然,这就是RDBMS底层实际所做的,但由于HBase不支持join,所以你更理解了这个事实。 带有记录类型的单个表(Single Table With Record Types) With this approach, there would exist a single table ORDER that would contain 在这个方法中,将会存在单个表ORDER,包含 The Order rowkey was described above: schema.casestudies.custorder [order-rowkey] [ORDER record type] The ShippingLocation composite rowkey would be something like this: [order-rowkey] [SHIPPING record type] shipping location number The LineItem composite rowkey would be something like this: [order-rowkey] [LINE record type] shipping location number line item number ORDER表的行键如上所述:schema.casestudies.custorder [order-rowkey] [ORDER record type] ShippingLocation表的组合行键是: [order-rowkey] [SHIPPING record type] [shipping location number](e.g., 1st location, 2nd, etc.) LineItem表的组合行键是: [order-rowkey] [LINE record type] [shipping location number](e.g., 1st location, 2nd, etc.) [line item number](e.g., 1st lineitem, 2nd, etc.) 非规范化(Denormalized) A variant of the Single Table With Record Types approach is to denormalize and flatten some of the object hierarchy, such as collapsing the ShippingLocation attributes onto each LineItem instance. 对带记录类型的单个表的一个变化,是将对象结构扁平化,比如将ShippingLocation属性放到每个明细项去。 LineItem表的组合行键是: [order-rowkey] [LINE record type] [line item number](e.g., 1st lineitem, 2nd, etc., care must be taken that there are unique across the entire order) LineItem表的列是: itemNumber quantity price shipToLine1 (denormalized from ShippingLocation) shipToLine2 (denormalized from ShippingLocation) shipToCity (denormalized from ShippingLocation) shipToState (denormalized from ShippingLocation) shipToZip (denormalized from ShippingLocation) The pros of this approach include a less complex object hierarchy, but one of the cons is that updating gets more complicated in case any of this information changes. 这个方法的优点是可以包含一些复杂对象结构,缺点是一旦信息有变将难以更新。 Object BLOB With this approach, the entire Order object graph is treated, in one way or another, as a BLOB. For example, the ORDER table’s rowkey was described above: schema.casestudies.custorder, and a single column called "order" would contain an object that could be deserialized that contained a container Order, ShippingLocations, and LineItems. 这个方法中,整个订单对象图,以这样或那样的方式,处理为BLOB。例如,订单表的行键如上所述:schema.casestudies.custorder,然后单个的称为order的列会包含一个可被反序列化的对象,包含Order, ShippingLocations, and LineItems. There are many options here: JSON, XML, Java Serialization, Avro, Hadoop Writables, etc. All of them are variants of the same approach: encode the object graph to a byte-array. Care should be taken with this approach to ensure backward compatibility in case the object model changes such that older persisted structures can still be read back out of HBase. 
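As an illustration of the Object BLOB option just described, a hedged sketch using plain Java serialization, one of the encodings listed; the Order class, table, and column names are all assumptions:

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class OrderBlobExample {
  // Hypothetical object graph; any serializable representation (JSON, Avro, ...) would do.
  static class Order implements Serializable {
    String customerNumber;
    String orderNumber;
    List<String> shippingLocations;   // simplified stand-in for the nested objects
  }

  public static void main(String[] args) throws Exception {
    Order order = new Order();
    order.customerNumber = "customer-12345";
    order.orderNumber = "order-67890";

    // Encode the whole object graph into a single byte[] (the "BLOB").
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(order);
    }

    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("ORDER"))) {
      Put put = new Put(Bytes.toBytes(order.customerNumber + order.orderNumber)); // simplified rowkey
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("order"), bytes.toByteArray());
      table.put(put);
    }
  }
}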
有多种选项:JSON, XML, Java Serialization, Avro, Hadoop Writables, 等等。它们都可以做到:将对象图编码为字节数组。对于该方法,需要注意的是,确保向后兼容,旧的数据结构在对象模型变化之后仍然能够从HBase中读取。 Pros are being able to manage complex object graphs with minimal I/O (e.g., a single HBase Get per Order in this example), but the cons include the aforementioned warning about backward compatibility of serialization, language dependencies of serialization (e.g., Java Serialization only works with Java clients), the fact that you have to deserialize the entire object to get any piece of information inside the BLOB, and the difficulty in getting frameworks like Hive to work with custom objects like this. 优点是可以通过很小的IO管理复杂的对象图(比如, 在该例中单个get请求就可以获取整个订单信息 ),但缺点如前所述,需要小心序列化方面的向后兼容,序列化的语言依赖(比如,java的序列化只能通过java的客户端),获取一点点数据也需要反序列化整个对象,以及类似Hive这样的框架难以处理此类自定义对象。 45.4. Case Study - "Tall/Wide/Middle" Schema Design Smackdown This section will describe additional schema design questions that appear on the dist-list, specifically about tall and wide tables. These are general guidelines and not laws - each application must consider its own needs. 这个章节将描述出现在dist-list中的另外一些设计问题,特别是关于高表和宽表。这些是一般性的指南而不是法律 - 每个应用必须考虑其自身所需。 45.4.1. 行 vs 版本(Rows vs. Versions) A common question is whether one should prefer rows or HBase’s built-in-versioning. The context is typically where there are "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 1 max versions). The rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwrite with each successive update. Preference: Rows (generally speaking). 一个常见的问题是使用行还是内置的版本。典型的情况是那些一个行有很多版本需要保存(比如,明显需要超过默认的最大一个版本)。行的方式需要在行键的某个部分存储一个时间戳,从而不会覆盖每次更新。 优先:行(通常来说) 45.4.2. 行 vs 列(Rows vs. Columns) Another common question is whether one should prefer rows or columns. The context is typically in extreme cases of wide tables, such as having 1 row with 1 million attributes, or 1 million rows with 1 columns apiece. Preference: Rows (generally speaking). To be clear, this guideline is in the context is in extremely wide cases, not in the standard use-case where one needs to store a few dozen or hundred columns. But there is also a middle path between these two options, and that is "Rows as Columns." 另一个常见的问题是使用行还是列。典型的情况是较为极端的宽表,比如一行含有一百万列,或者一百万行各自含一个列。 优先:行(通常来说)。澄清一下,该准则针对极端宽表的情况,而不是常规的只需要存储几十或几百个列的使用场景。但在这两个选项之间还有一个中间选则,即"行作为列"。 45.4.3. 行作为列(Rows as Columns) The middle path between Rows vs. Columns is packing data that would be a separate row into columns, for certain rows. OpenTSDB is the best example of this case where a single row represents a defined time-range, and then discrete events are treated as columns. This approach is often more complex, and may require the additional complexity of re-writing your data, but has the advantage of being I/O efficient. For an overview of this approach, see schema.casestudies.log-steroids. 行vs列的中间选择是针对一些特定的行,将其数据打包作为列。 OpenTSDB 就是一个最好的例子,单个行表示一个既定的时间范围,而离散的事件作为列。这种方法通常会更复杂,并且需要额外的复杂度去重写你的数据,但在I/O性能上有优势。对方法的概要说明,查看schema.casestudies.log-steroids。 45.5. 案例学习 - 列表数据(Case Study - List Data) The following is an exchange from the user dist-list regarding a fairly common question: how to handle per-user list data in Apache HBase. 以下是来自用户dist-list的关于一个常见问题的交流:如何用HBase处理用户列表数据。 QUESTION * We’re looking at how to store a large amount of (per-user) list data in HBase, and we were trying to figure out what kind of access pattern made the most sense. 
One option is store the majority of the data in a key, so we could have something like: 我们在研究如何在HBase中存储大量列表数据,并尝试找出最有意义的访问模式。一个选项是将数据的主要部分存为一个键,看起来是这样: <FixedWidthUserName><FixedWidthValueId1>:"" (no value) <FixedWidthUserName><FixedWidthValueId2>:"" (no value) <FixedWidthUserName><FixedWidthValueId3>:"" (no value) The other option we had was to do this entirely using: 另一个选项是完全使用: <FixedWidthUserName><FixedWidthPageNum0>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>... <FixedWidthUserName><FixedWidthPageNum1>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>... where each row would contain multiple values. So in one case reading the first thirty values would be: 每行会包含多个值。因此读取前三十个值的话前者可以这样: scan { STARTROW => 'FixedWidthUsername' LIMIT => 30} And in the second case it would be 而后者是这样: get 'FixedWidthUserName\x00\x00\x00\x00' The general usage pattern would be to read only the first 30 values of these lists, with infrequent access reading deeper into the lists. Some users would have ⇐ 30 total values in these lists, and some users would have millions (i.e. power-law distribution) 常见的使用方式是只从列表中读取前30行,较少去读取更多。有些用户的列表共有30行,而有些用户则有百万行。 The single-value format seems like it would take up more space on HBase, but would offer some improved retrieval / pagination flexibility. Would there be any significant performance advantages to be able to paginate via gets vs paginating with scans? 单个值的格式在HBase中看起来会占用更多空间,但能够提供更优的检索/分页灵活性。通过gets分页是否比scans分页有明显的性能优势? My initial understanding was that doing a scan should be faster if our paging size is unknown (and caching is set appropriately), but that gets should be faster if we’ll always need the same page size. I’ve ended up hearing different people tell me opposite things about performance. I assume the page sizes would be relatively consistent, so for most use cases we could guarantee that we only wanted one page of data in the fixed-page-length case. I would also assume that we would have infrequent updates, but may have inserts into the middle of these lists (meaning we’d need to update all subsequent rows). Thanks for help / suggestions / follow-up questions. 我最初的理解是,如果分页大小未知的话,执行一个scan会比较快,但如果总是需要同样的分页大小,那么gets会更快。我听到其他人对于性能有不同看法。我假设分页大小会相对一致,因此对于大多用例,我们可以保证我们只获取固定大小的一页数据。我也假设我们很少更新,但在列表的中间插入数据(意味着我们需要更新所有后续的行)。 ANSWER * If I understand you correctly, you’re ultimately trying to store triples in the form "user, valueid, value", right? E.g., something like: 如果我理解的没错,你本质上是想存储"user, valueid, value"的元组?类似这样: "user123, firstname, Paul", "user234, lastname, Smith" (But the usernames are fixed width, and the valueids are fixed width). (不过usernames为定长,并且valueids也是定长)。 And, your access pattern is along the lines of: "for user X, list the next 30 values, starting with valueid Y". Is that right? And these values should be returned sorted by valueid? The tl;dr version is that you should probably go with one row per user+value, and not build a complicated intra-row pagination scheme on your own unless you’re really sure it is needed. 并且,你的访问模式是"对于用户x,列出从Y开始的30个值"。是这样吗?另外,这些值需要以valueid顺序返回? tl;dr版本是,你或许应该每个user+value作为一行,而不是去亲自构建一个行内分页模式,除非你确定这是需要的。 Your two options mirror a common question people have when designing HBase schemas: should I go "tall" or "wide"? Your first schema is "tall": each row represents one value for one user, and so there are many rows in the table for each user; the row key is user + valueid, and there would be (presumably) a single column qualifier that means "the value". 
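The tall layout described in the answer can be paginated with an ordinary bounded scan; a hedged sketch, assuming fixed-width usernames and value ids as in the question (all names are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class TallListScanExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("user_lists"))) {  // hypothetical table
      byte[] user = Bytes.toBytes("user0000000123");                      // fixed-width username (assumption)
      Scan scan = new Scan();
      scan.setStartRow(user);              // rows for one user sort together: [user][valueid]
      scan.setFilter(new PageFilter(30));  // server-side hint; the limit is still enforced below
      scan.setCaching(30);
      int count = 0;
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          byte[] row = r.getRow();
          if (!Bytes.startsWith(row, user)) break;   // ran past this user's rows
          System.out.println(Bytes.toStringBinary(row));
          if (++count == 30) break;                  // "first 30 values" use case
        }
      }
    }
  }
}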
This is great if you want to scan over rows in sorted order by row key (thus my question above, about whether these ids are sorted correctly). You can start a scan at any user+valueid, read the next 30, and be done. What you’re giving up is the ability to have transactional guarantees around all the rows for one user, but it doesn’t sound like you need that. Doing it this way is generally recommended (see here https://hbase.apache.org/book.html#schema.smackdown). 你的两个选项反映了人们在设计HBase模式时的一个常见问题:应该用高表还是宽表?你第一个模式时高表:每一行代表一个用户的一个值;行键是user + valueid,且只有一个列限定符叫做"the value"。如果你想基于有序行键进行扫描的话,这很不错。你可以从任意的user+valueid开始一个scan,读取接下来的30行,就可以了。你所放弃的是对于某个用户所有行的事务保证方面的能力,但貌似你并不需要这个。这是通常所推荐的方式(看这里:https://hbase.apache.org/book.html#schema.smackdown)。 Your second option is "wide": you store a bunch of values in one row, using different qualifiers (where the qualifier is the valueid). The simple way to do that would be to just store ALL values for one user in a single row. I’m guessing you jumped to the "paginated" version because you’re assuming that storing millions of columns in a single row would be bad for performance, which may or may not be true; as long as you’re not trying to do too much in a single request, or do things like scanning over and returning all of the cells in the row, it shouldn’t be fundamentally worse. The client has methods that allow you to get specific slices of columns. 你的第二个选项是宽表:你在一行中存储一批值,用不同的限定符(这里使用valueid)。要做到这样只需要简单的将单个用户的数据存为一行。我猜你想到了分页版本,因为你假定在一行中存储百万列性能会比较差,但未必是这样;只要你没有试图在单个请求中获取过多数据,或扫描并返回行的所有cell,实际上就不会更差。客户端有一些方法,允许你指定部分列。 Note that neither case fundamentally uses more disk space than the other; you’re just "shifting" part of the identifying information for a value either to the left (into the row key, in option one) or to the right (into the column qualifiers in option 2). Under the covers, every key/value still stores the whole row key, and column family name. (If this is a bit confusing, take an hour and watch Lars George’s excellent video about understanding HBase schema design: http://www.youtube.com/watch?v=_HLoH_PgrLk). 注意,没有哪个选项会占用更多的空间;你只是将值的标识信息放在左边(行键中)或右边(列限定符)。在底层,每个键值对仍然会存储整个行键和列名称。(如果有一些困惑,花一个小时看下Lars George关于理解HBase模式设计的视频:http://www.youtube.com/watch?v=_HLoH_PgrLk) A manually paginated version has lots more complexities, as you note, like having to keep track of how many things are in each page, re-shuffling if new values are inserted, etc. That seems significantly more complex. It might have some slight speed advantages (or disadvantages!) at extremely high throughput, and the only way to really know that would be to try it out. If you don’t have time to build it both ways and compare, my advice would be to start with the simplest option (one row per user+value). Start simple and iterate! ) 手工分页的版本更为复杂,如你所知,比如需要跟踪每页有多少内容,有新的数据插入时需要重新调整,等等。这看起来明显更为复杂。也许在极端高吞吐情况下,它会有微小的速度优势(或劣势),但只能通过测试来知道真实情况。如果你没时间去构建它们并比较,我的建议是从最简单的选项开始(每个user+value作为一行)。从简单开始然后迭代!) 46. Operational and Performance Configuration Options 46.1. 优化HBase 服务端RPC处理(Tune HBase Server RPC Handling) Set hbase.regionserver.handler.count (in hbase-site.xml) to cores x spindles for concurrency. Optionally, split the call queues into separate read and write queues for differentiated service. The parameter hbase.ipc.server.callqueue.handler.factor specifies the number of call queues: 0 means a single shared queue 1 means one queue for each handler. A value between 0 and 1 allocates the number of queues proportionally to the number of handlers. 
For instance, a value of .5 shares one queue between each two handlers. Use hbase.ipc.server.callqueue.read.ratio (hbase.ipc.server.callqueue.read.share in 0.98) to split the call queues into read and write queues: 0.5 means there will be the same number of read and write queues < 0.5 for more read than write > 0.5 for more write than read Set hbase.ipc.server.callqueue.scan.ratio (HBase 1.0+) to split read call queues into small-read and long-read queues: 0.5 means that there will be the same number of short-read and long-read queues < 0.5 for more short-read > 0.5 for more long-read 将hbase.regionserver.handler.count设置为cpu数量的倍数. 可选的,针对不同服务将请求队列进行隔离,hbase.ipc.server.callqueue.handler.factor参数定义了请求队列的数量: 0 代表共用1个队列。 1 代表每个handler对应1个队列。 0-1中间的值,代表根据handler的数量,按比例分配队列。比如,0.5意味着2个handler共用1个队列。 使用hbase.ipc.server.callqueue.read.ratio将请求队列拆分为读和写队列: 0.5 代表读队列和写队列数量一样 < 0.5 代表读队列更多 > 0.5 代表写队列更多 配置hbase.ipc.server.callqueue.scan.ratio (HBase 1.0+) 将读队列拆分为short-read和long-read队列: 0.5 代表short-read和long-read队列数量一样 < 0.5 代表short-read队列更多 > 0.5 代表long-read队列更多 46.2. 对RPC禁用Nagle(Disable Nagle for RPC) Disable Nagle’s algorithm. Delayed ACKs can add up to ~200ms to RPC round trip time. Set the following parameters: In Hadoop’s core-site.xml: ipc.server.tcpnodelay = true ipc.client.tcpnodelay = true In HBase’s hbase-site.xml: hbase.ipc.client.tcpnodelay = true hbase.ipc.server.tcpnodelay = true 禁用Nagle算法. 延迟的ACKs会将RPC往返时间最多增加到200ms。 Set the following parameters: In Hadoop’s core-site.xml: ipc.server.tcpnodelay = true ipc.client.tcpnodelay = true In HBase’s hbase-site.xml: hbase.ipc.client.tcpnodelay = true hbase.ipc.server.tcpnodelay = true 46.3. 限制服务端错误影响(Limit Server Failure Impact) Detect regionserver failure as fast as reasonable. Set the following parameters: In hbase-site.xml, set zookeeper.session.timeout to 30 seconds or less to bound failure detection (20-30 seconds is a good start). Notice: the sessionTimeout of zookeeper is limited between 2 times and 20 times the tickTime(the basic time unit in milliseconds used by ZooKeeper.the default value is 2000ms.It is used to do heartbeats and the minimum session timeout will be twice the tickTime). Detect and avoid unhealthy or failed HDFS DataNodes: in hdfs-site.xml and hbase-site.xml, set the following parameters: dfs.namenode.avoid.read.stale.datanode = true dfs.namenode.avoid.write.stale.datanode = true 在合理范围内尽快发现regionserver的错误. 配置以下参数: 在hbase-site.xml中, 将zookeeper.session.timeout设置为30秒或更少 (20-30秒是个不错的开始)。 注意: zookeeper的会话超时时间被限制为tickTime的2倍到20倍之间(ZooKeeper使用的一个基本时间单位.默认值是2000ms.它被用来发送心跳,且最小的会话过期时间应2倍于此值)。 发现和避免非健康或失败的HDFS节点: in hdfs-site.xml and hbase-site.xml, set the following parameters: dfs.namenode.avoid.read.stale.datanode = true dfs.namenode.avoid.write.stale.datanode = true 46.4. Optimize on the Server Side for Low Latency Skip the network for local blocks when the RegionServer goes to read from HDFS by exploiting HDFS’s Short-Circuit Local Reads facility. Note how setup must be done both at the datanode and on the dfsclient ends of the conneciton — i.e. at the RegionServer and how both ends need to have loaded the hadoop native .so library. After configuring your hadoop setting dfs.client.read.shortcircuit to true and configuring the dfs.domain.socket.path path for the datanode and dfsclient to share and restarting, next configure the regionserver/dfsclient side. 
当RegionServer从HDFS读取时,利用HDFS的短路读特性,针对本地块可以跳过网络。注意需要在datanode和dfsclient中同时配置,并且都需要加载Hadoop的本地.so库。将hadoop的dfs.client.read.shortcircuit设置为true,并且配置dfs.domain.socket.path用来共享,然后重启,接下来配置regionserver端。 In hbase-site.xml, set the following parameters: dfs.client.read.shortcircuit = true dfs.client.read.shortcircuit.skip.checksum = true so we don’t double checksum (HBase does its own checksumming to save on i/os. See hbase.regionserver.checksum.verify for more on this. dfs.domain.socket.path to match what was set for the datanodes. dfs.client.read.shortcircuit.buffer.size = 131072 Important to avoid OOME — hbase has a default it uses if unset, see hbase.dfs.client.read.shortcircuit.buffer.size; its default is 131072. Ensure data locality. In hbase-site.xml, set hbase.hstore.min.locality.to.skip.major.compact = 0.7 (Meaning that 0.7 <= n <= 1) Make sure DataNodes have enough handlers for block transfers. In hdfs-site.xml, set the following parameters: dfs.datanode.max.xcievers >= 8192 dfs.datanode.handler.count = number of spindles Check the RegionServer logs after restart. You should only see complaint if misconfiguration. Otherwise, shortcircuit read operates quietly in background. It does not provide metrics so no optics on how effective it is but read latencies should show a marked improvement, especially if good data locality, lots of random reads, and dataset is larger than available cache. 重启之后检查RegionServer的日志。如果配置错误会看到异常日志。否则,短路读并不会有显式的输出。它并未提供监控指标,所以效果如何不好看出,但是读取延迟应该会有显著提升,尤其是如果数据有较好的本地性,大量的随机读取,且数据集远大于可用缓存。 Other advanced configurations that you might play with, especially if shortcircuit functionality is complaining in the logs, include dfs.client.read.shortcircuit.streams.cache.size and dfs.client.socketcache.capacity. Documentation is sparse on these options. You’ll have to read source code. 另一个你可能需要处理的高级配置,尤其是当日志里出现关于短路功能异常时,包含dfs.client.read.shortcircuit.streams.cache.size 和 dfs.client.socketcache.capacity。它们的配置文档比较分散。你可能需要阅读源码。 For more on short-circuit reads, see Colin’s old blog on rollout, How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop. The HDFS-347 issue also makes for an interesting read showing the HDFS community at its best (caveat a few comments). 更多关于短路读的信息,可以查看Colin的旧博客,How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop。有兴趣的话可以阅读HDFS-347,其展示了HDFS社区在这上面的努力(一些评论值得关注)。 46.5. JVM Tuning 46.5.1. Tune JVM GC for low collection latencies Use the CMS collector: -XX:+UseConcMarkSweepGC Keep eden space as small as possible to minimize average collection time. Example: -XX:CMSInitiatingOccupancyFraction=70Optimize for low collection latency rather than throughput: -Xmn512m Collect eden in parallel: -XX:+UseParNewGC Avoid collection under pressure: -XX:+UseCMSInitiatingOccupancyOnly Limit per request scanner result sizing so everything fits into survivor space but doesn’t tenure. In hbase-site.xml, set hbase.client.scanner.max.result.size to 1/8th of eden space (with -Xmn512m this is ~51MB ) Set max.result.size x handler.count less than survivor space 使用CMS收集器:-XX:+UseConcMarkSweepGC 使eden区尽可能的小,来最小化平均收集时间。例如:-XX:CMSInitiatingOccupancyFraction=70 为低延迟而不是吞吐进行优化:-Xmn512m eden区使用并行收集:-XX:+UseParNewGC 避免在压力大时收集:-XX:+UseCMSInitiatingOccupancyOnly 限制单个请求的结果大小,从而都可以放到survivor区而不是tenure区。在hbase-site.xml中,配置hbase.client.scanner.max.result.size为eden区的八分之一(with -Xmn512m this is ~51MB) 使max.result.size x handler.count小于survivor区。 46.5.2. 
OS-Level Tuning Turn transparent huge pages (THP) off: echo never > /sys/kernel/mm/transparent_hugepage/enabledecho never > /sys/kernel/mm/transparent_hugepage/defragSet vm.swappiness = 0 Set vm.min_free_kbytes to at least 1GB (8GB on larger memory systems) Disable NUMA zone reclaim with vm.zone_reclaim_mode = 0x 47. Special Cases 47.1. 对于那些希望快速失败而非等待的应用(For applications where failing quickly is better than waiting) In hbase-site.xml on the client side, set the following parameters: Set hbase.client.pause = 1000 Set hbase.client.retries.number = 3 If you want to ride over splits and region moves, increase hbase.client.retries.number substantially (>= 20) Set the RecoverableZookeeper retry count: zookeeper.recovery.retry = 1 (no retry) In hbase-site.xml on the server side, set the Zookeeper session timeout for detecting server failures: zookeeper.session.timeout ⇐ 30 seconds (20-30 is good). 47.2. 对于那些能够容忍稍微过时信息的应用(For applications that can tolerate slightly out of date information) HBase timeline consistency (HBASE-10070) With read replicas enabled, read-only copies of regions (replicas) are distributed over the cluster. One RegionServer services the default or primary replica, which is the only replica that can service writes. Other RegionServers serve the secondary replicas, follow the primary RegionServer, and only see committed updates. The secondary replicas are read-only, but can serve reads immediately while the primary is failing over, cutting read availability blips from seconds to milliseconds. Phoenix supports timeline consistency as of 4.4.0 Tips: Deploy HBase 1.0.0 or later. Enable timeline consistent replicas on the server side. Use one of the following methods to set timeline consistency: Use ALTER SESSION SET CONSISTENCY = 'TIMELINE’ Set the connection property Consistency to timeline in the JDBC connect string HBase时间线一致性(HBase -10070)在启用读副本的情况下,region的只读副本分布在集群中。 一个RegionServer提供默认的或主副本服务, 写服务只能由该副本提供. 其它RegionServers提供从副本服务, 跟进主RegionServer, 只对已提交的更新可见.从副本是只读的,但当主副本挂掉时,能够立即提供读服务,将读不可用的时间从秒级减少到毫秒级。 Phoenix从4.4.0开始支持时间线一致性: 部署HBase 1.0.0之后的版本。 在服务端启用时间线一致性. 使用下述的方法之一来配置时间线一致性: Use ALTER SESSION SET CONSISTENCY = 'TIMELINE’ Set the connection property Consistency to timeline in the JDBC connect string
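To make the timeline-consistency option above concrete, a minimal hedged Java sketch of a TIMELINE read against a table that has region replicas enabled; the table, row, and column names are assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("t1"))) {   // hypothetical table with region replicas
      Get get = new Get(Bytes.toBytes("r1"));
      // Allow the read to be served by a secondary replica if the primary is slow or failing over.
      get.setConsistency(Consistency.TIMELINE);
      Result result = table.get(get);
      byte[] value = result.getValue(Bytes.toBytes("f"), Bytes.toBytes("q"));
      // A stale result came from a secondary replica and may lag the primary slightly.
      System.out.println("value = " + (value == null ? "<none>" : Bytes.toStringBinary(value))
          + ", stale = " + result.isStale());
    }
  }
}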
