Table of Contents
- Preface
- Elements of an Ozone Upgrade and How They Relate
- The Ozone Upgrade Executor Model
- Related Links
Preface
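The Ozone community recently released Ozone 1.1, the second release since Ozone reached GA. As more Ozone versions ship, upgrading between versions becomes a real concern. You may wonder: does Ozone currently support a version upgrade feature? According to the community's current progress, the first phase of this feature is essentially complete and is expected to ship in Ozone 1.2. The first phase puts a basic upgrade framework in place, but for now it supports only non-rolling upgrades, so the cluster needs downtime while it is upgraded. This post walks through the overall design of Ozone Upgrade, in more detail than my earlier post on the topic.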
Elements of an Ozone Upgrade and How They Relate
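First, we need to be clear about which elements a complete Ozone upgrade involves.
In terms of upgrade state, an upgrade passes through different phases with different states, such as before upgrade, finalize upgrade, and finalized.
In terms of services, OM, SCM, and Datanode are obviously the first to be affected. This is the first layer that comes to mind.
Going one layer deeper, what else is there? Here we need to bring in the concept of a version.
Within a version there are two closely related elements: 1) the features available in that version; 2) the layout of that version, where a layout can be understood as a way of organizing data.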
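To make the version, feature, and layout relationship concrete, here is a minimal Java sketch. It assumes that every feature is introduced by exactly one layout version and that a feature becomes usable once the layout version recorded on disk has reached it. The names LayoutGateSketch, Feature, FEATURE_A, FEATURE_B, and metadataLayoutVersion are hypothetical and chosen for illustration; this is not Ozone's actual API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: names are illustrative, not Ozone's real API. */
final class LayoutGateSketch {

  /** Assumption: each feature is introduced by exactly one layout version. */
  enum Feature {
    INITIAL_VERSION(0),
    FEATURE_A(1),
    FEATURE_B(2);

    final int layoutVersion;

    Feature(int layoutVersion) {
      this.layoutVersion = layoutVersion;
    }
  }

  /** Layout version currently recorded on disk (hypothetical field). */
  private final int metadataLayoutVersion;

  LayoutGateSketch(int metadataLayoutVersion) {
    this.metadataLayoutVersion = metadataLayoutVersion;
  }

  /** A feature is usable only once the on-disk layout has reached its version. */
  boolean isAllowed(Feature feature) {
    return feature.layoutVersion <= metadataLayoutVersion;
  }

  /** Features the software knows about but the on-disk layout does not allow yet. */
  List<Feature> unfinalizedFeatures() {
    List<Feature> result = new ArrayList<>();
    for (Feature feature : Feature.values()) {
      if (!isAllowed(feature)) {
        result.add(feature);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // New software running on a cluster whose on-disk layout is still at version 1.
    LayoutGateSketch gate = new LayoutGateSketch(1);
    System.out.println("FEATURE_A allowed: " + gate.isAllowed(Feature.FEATURE_A));
    System.out.println("Unfinalized: " + gate.unfinalizedFeatures());
  }
}
```

Under these assumptions, upgraded software can run against old data while refusing to use anything the old layout cannot represent, which is the kind of gap a finalize step would then close.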
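Next, the concept of a feature brings up feature compatibility. Compatibility between new and old features is something the upgrade process must pay particular attention to.
Putting these elements together, we arrive roughly at the following relationship diagram for Ozone Upgrade (concepts in the diagram that have not been mentioned yet are explained later in this post).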
The Ozone Upgrade Executor Model
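With the elements of an Ozone upgrade covered, let's look at how the upgrade is actually executed.
Upgrading a complex cluster is not a matter of running a single upgrade command. It usually involves operations that must run before and after the upgrade itself. Ozone splits the whole upgrade into the following steps: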
...
  public void execute(T component, BasicUpgradeFinalizer finalizer)
      throws IOException {
    try {
      finalizer.emitStartingMsg();
      finalizer.getVersionManager()
          .setUpgradeState(FINALIZATION_IN_PROGRESS);

      // Steps that must complete before finalization
      finalizer.preFinalizeUpgrade(component);

      // The finalization work itself
      finalizer.finalizeUpgrade(component);

      // Operations that run after finalization
      finalizer.postFinalizeUpgrade(component);

      finalizer.emitFinishedMsg();
    } catch (Exception e) {
      LOG.warn("Upgrade Finalization failed with following Exception. ", e);
      if (finalizer.getVersionManager().needsFinalization()) {
        finalizer.getVersionManager()
            .setUpgradeState(FINALIZATION_REQUIRED);
        throw (e);
      }
    } finally {
      finalizer.markFinalizationDone();
    }
  }
}
...
Here BasicUpgradeFinalizer plays the role of the upgrade executor, and each component has its own corresponding finalizer.
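The executor pattern above can be reduced to a small, self-contained Java sketch: a fixed three-phase template, with the state rolled back when any phase fails so that finalization can be retried. FinalizerSketch, its State enum, the Finalizer interface, and the recording helper are hypothetical names used only for illustration, and returning a boolean instead of rethrowing is a simplification of mine.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a three-phase upgrade executor; not Ozone's real classes. */
final class FinalizerSketch {

  enum State { FINALIZATION_REQUIRED, FINALIZATION_IN_PROGRESS, FINALIZATION_DONE }

  /** Per-component finalizer exposing the three hooks. */
  interface Finalizer<T> {
    void preFinalizeUpgrade(T component) throws Exception;
    void finalizeUpgrade(T component) throws Exception;
    void postFinalizeUpgrade(T component) throws Exception;
  }

  private State state = State.FINALIZATION_REQUIRED;

  State getState() {
    return state;
  }

  /** Runs the three phases in order and returns true if all of them succeeded. */
  <T> boolean execute(T component, Finalizer<T> finalizer) {
    state = State.FINALIZATION_IN_PROGRESS;
    try {
      finalizer.preFinalizeUpgrade(component);
      finalizer.finalizeUpgrade(component);
      finalizer.postFinalizeUpgrade(component);
      state = State.FINALIZATION_DONE;
      return true;
    } catch (Exception e) {
      // Roll the state back so that finalization can be attempted again.
      state = State.FINALIZATION_REQUIRED;
      return false;
    }
  }

  /** Builds a finalizer that records each phase and can fail in the middle phase. */
  static Finalizer<String> recording(List<String> log, boolean failInFinalize) {
    return new Finalizer<String>() {
      @Override
      public void preFinalizeUpgrade(String component) {
        log.add("pre:" + component);
      }

      @Override
      public void finalizeUpgrade(String component) throws Exception {
        if (failInFinalize) {
          throw new Exception("simulated failure");
        }
        log.add("finalize:" + component);
      }

      @Override
      public void postFinalizeUpgrade(String component) {
        log.add("post:" + component);
      }
    };
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    FinalizerSketch executor = new FinalizerSketch();
    boolean ok = executor.execute("scm", recording(log, false));
    System.out.println(ok + " " + executor.getState() + " " + log);
  }
}
```

Keeping the phase order in one place means a component only has to supply its three hooks, which is what lets the SCM, OM, and Datanode finalizers differ without duplicating the control flow.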
Taking the SCM's UpgradeFinalizer as an example, it performs the following preFinalizeUpgrade operation to close the existing pipelines, so that writes are no longer allowed.
/**
 * UpgradeFinalizer for the Storage Container Manager service.
 */
public class SCMUpgradeFinalizer extends
    BasicUpgradeFinalizer<StorageContainerManager, HDDSLayoutVersionManager> {

  public SCMUpgradeFinalizer(HDDSLayoutVersionManager versionManager) {
    super(versionManager);
  }

  // This should be called in the context of a separate finalize upgrade thread.
  // This function can block indefinitely till the conditions are met to safely
  // finalize Upgrade.
  @Override
  public void preFinalizeUpgrade(StorageContainerManager scm)
      throws IOException {
    /*
     * Before we can call finalize the feature, we need to make sure that
     * all existing pipelines are closed and pipeline Manger would freeze
     * all new pipeline creation.
     */
    String msg = " Existing pipelines and containers will be closed " +
        "during Upgrade.";
    msg += "\n New pipelines creation will remain frozen until Upgrade " +
        "is finalized.";

    PipelineManager pipelineManager = scm.getPipelineManager();

    // Pipeline creation will remain frozen until postFinalizeUpgrade()
    pipelineManager.freezePipelineCreation();

    waitForAllPipelinesToDestroy(pipelineManager);

    // We can not yet move all the existing data nodes to HEALTHY-READONLY
    // state since the next heartbeat will move them back to HEALTHY state.
    // This has to wait till postFinalizeUpgrade, when SCM MLV version is
    // already upgraded as part of finalize processing.
    // While in this state, it should be safe to do finalize processing for
    // all new features. This will also update ondisk mlv version. Any
    // disrupting upgrade can add a hook here to make sure that SCM is in a
    // consistent state while finalizing the upgrade.

    logAndEmit(msg);
  }
  ...
Then, once finalization completes, the subsequent postFinalizeUpgrade operation resumes pipeline creation.
  public void postFinalizeUpgrade(StorageContainerManager scm)
      throws IOException {
    // Don't wait for next heartbeat from datanodes in order to move them to
    // Healthy - Readonly state. Force them to Healthy ReadOnly state so that
    // we can resume pipeline creation right away.
    scm.getScmNodeManager().forceNodesToHealthyReadOnly();

    PipelineManager pipelineManager = scm.getPipelineManager();
    pipelineManager.resumePipelineCreation();

    // Wait for at least one pipeline to be created before finishing
    // finalization, so clients can write.
    boolean hasPipeline = false;
    while (!hasPipeline) {
      int pipelineCount = pipelineManager.getPipelines(
          HddsProtos.ReplicationType.RATIS,
          HddsProtos.ReplicationFactor.THREE,
          Pipeline.PipelineState.OPEN).size();
      hasPipeline = (pipelineCount >= 1);
      if (!hasPipeline) {
        LOG.info("Waiting for at least one pipeline after SCM finalization.");
        try {
          Thread.sleep(5000);
        } catch (InterruptedException e) {
          // Try again on next loop iteration.
        }
      } else {
        LOG.info("Pipeline found after SCM finalization");
      }
    }
    emitFinishedMsg();
  }
Now let's go back to the operation that actually performs finalization, the finalizeUpgrade method. What it really does is finalize each feature in turn.
  protected void finalizeUpgrade(Supplier<Storage> storageSuppplier)
      throws UpgradeException {
    // Fetch the features that are not finalized yet
    // (the new features introduced by the upgrade)
    for (Object obj : versionManager.unfinalizedFeatures()) {
      LayoutFeature lf = (LayoutFeature) obj;
      Storage layoutStorage = storageSuppplier.get();
      // Look up the action this new feature must run in the finalize phase
      Optional<? extends UpgradeAction> action = lf.action(ON_FINALIZE);
      // Run that action
      finalizeFeature(lf, layoutStorage, action);
      updateLayoutVersionInVersionFile(lf, layoutStorage);
      // Mark this feature as finalized, meaning it has been upgraded successfully
      versionManager.finalized(lf);
    }
    versionManager.completeFinalization();
  }
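The per-feature loop in finalizeUpgrade can be illustrated with a simplified, runnable sketch. It assumes features are processed in ascending layout-version order and that the persisted version is advanced after each feature, so an interrupted run can pick up where it stopped. FinalizeLoopSketch, Feature, and persistedLayoutVersion are hypothetical stand-ins, and an int field replaces the on-disk version file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of a per-feature finalize loop; not Ozone's real classes. */
final class FinalizeLoopSketch {

  /** A feature: a layout version plus an optional action to run on finalize. */
  static final class Feature {
    final int layoutVersion;
    final Optional<Runnable> onFinalize;

    Feature(int layoutVersion, Runnable onFinalize) {
      this.layoutVersion = layoutVersion;
      this.onFinalize = Optional.ofNullable(onFinalize);
    }
  }

  /** Stands in for the layout version persisted on disk. */
  private int persistedLayoutVersion;

  FinalizeLoopSketch(int persistedLayoutVersion) {
    this.persistedLayoutVersion = persistedLayoutVersion;
  }

  int getPersistedLayoutVersion() {
    return persistedLayoutVersion;
  }

  /** Finalizes features one at a time; the list is assumed to be in ascending order. */
  void finalizeUpgrade(List<Feature> featuresInOrder) {
    for (Feature feature : featuresInOrder) {
      if (feature.layoutVersion <= persistedLayoutVersion) {
        continue; // already finalized, for example by an earlier interrupted run
      }
      // Run the feature's finalize action first, then record the new version.
      feature.onFinalize.ifPresent(Runnable::run);
      persistedLayoutVersion = feature.layoutVersion;
    }
  }

  public static void main(String[] args) {
    List<String> ran = new ArrayList<>();
    List<Feature> features = Arrays.asList(
        new Feature(1, () -> ran.add("v1")),
        new Feature(2, () -> ran.add("v2")),
        new Feature(3, null));
    FinalizeLoopSketch loop = new FinalizeLoopSketch(1);
    loop.finalizeUpgrade(features);
    System.out.println(loop.getPersistedLayoutVersion() + " " + ran);
  }
}
```

Advancing the version only after the action has run means a failure inside an action leaves that feature unfinalized, so the next attempt retries it rather than skipping it.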
As the finalizeUpgrade code shows, an Ozone feature carries the actions to run at different phases, as in the following class:
/**
 * List of OM Layout features / versions.
 */
public enum OMLayoutFeature implements LayoutFeature {
  //////////////////////////////
  INITIAL_VERSION(0, "Initial Layout Version");

  //////////////////////////////
  //    Example OM Layout Feature with Actions
  //      CREATE_EC(1, "",
  //          new ImmutablePair<>(ON_FINALIZE, new OnFinalizeECAction()),
  //          new ImmutablePair<>(FIRST_RUN_ON_UPGRADE,
  //              new OnFirstUpgradeStartECAction());
  //
  //////////////////////////////

  private int layoutVersion;
  private String description;
  private EnumMap<UpgradeActionType, OmUpgradeAction> actions =
      new EnumMap<>(UpgradeActionType.class);
...
  /**
   * The phase in which an upgrade action runs.
   */
  enum UpgradeActionType {

    // Run every time an un-finalized component is started up.
    VALIDATE_IN_PREFINALIZE,

    // Run exactly once when an upgraded cluster is detected with this new
    // layout version.
    // NOTE 1 : This will not be run in a NEW cluster!
    // NOTE 2 : This needs to be a backward compatible action until a DOWNGRADE
    // hook is provided!
    // NOTE 3 : These actions are not submitted through RATIS (TODO)
    ON_FIRST_UPGRADE_START,

    // Run exactly once during finalization of layout feature.
    ON_FINALIZE
  }
...
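As we can see, besides finalization, UpgradeActionType also defines a startup validation that runs while the component is still un-finalized, and a first-run action that runs once when the upgraded cluster starts. These actions are a good place to put the compatibility behavior a new feature needs toward existing old features.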
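The idea of attaching actions to a feature by phase can be sketched with a plain EnumMap. The ActionType constants mirror the names shown in UpgradeActionType, while ActionPhaseSketch, Action, Feature, with, and runPhase are hypothetical helpers introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of per-phase actions on a feature; not Ozone's real classes. */
final class ActionPhaseSketch {

  enum ActionType { VALIDATE_IN_PREFINALIZE, ON_FIRST_UPGRADE_START, ON_FINALIZE }

  /** An action appends to a log here so that its execution is observable. */
  interface Action {
    void execute(List<String> log);
  }

  static final class Feature {
    final String name;
    private final EnumMap<ActionType, Action> actions =
        new EnumMap<>(ActionType.class);

    Feature(String name) {
      this.name = name;
    }

    /** Registers the action for one phase and returns this feature for chaining. */
    Feature with(ActionType type, Action action) {
      actions.put(type, action);
      return this;
    }

    /** A feature may have no action for a given phase. */
    Optional<Action> action(ActionType type) {
      return Optional.ofNullable(actions.get(type));
    }
  }

  /** Runs whatever action each feature registered for the given phase. */
  static void runPhase(List<Feature> features, ActionType type, List<String> log) {
    for (Feature feature : features) {
      feature.action(type).ifPresent(action -> action.execute(log));
    }
  }

  public static void main(String[] args) {
    List<Feature> features = Arrays.asList(
        new Feature("FEATURE_A")
            .with(ActionType.VALIDATE_IN_PREFINALIZE, log -> log.add("A:validate"))
            .with(ActionType.ON_FINALIZE, log -> log.add("A:finalize")),
        new Feature("FEATURE_B"));
    List<String> log = new ArrayList<>();
    runPhase(features, ActionType.VALIDATE_IN_PREFINALIZE, log);
    runPhase(features, ActionType.ON_FINALIZE, log);
    System.out.println(log);
  }
}
```

Because the lookup returns an Optional, a feature that needs no work in some phase simply registers nothing for it, and the caller does not need a special case.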
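Finally, let's look at the definition of UpgradeAction. An UpgradeAction represents what a specific feature does in a specific upgrade phase. For example, the action below is the check that the SCM HA feature must run before the upgrade is finalized.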
@UpgradeActionHdds(feature = SCM_HA, component = SCM,
    type = VALIDATE_IN_PREFINALIZE)
public class ScmHAUnfinalizedStateValidationAction
    implements HDDSUpgradeAction<StorageContainerManager> {

  @Override
  public void execute(StorageContainerManager scm) throws Exception {
    boolean isHAEnabled =
        scm.getConfiguration().getBoolean(
            ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY,
            ScmConfigKeys.OZONE_SCM_HA_ENABLE_DEFAULT);

    if (isHAEnabled) {
      throw new UpgradeException(String.format("Configuration %s cannot be " +
          "used until SCM upgrade has been finalized",
          ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY),
          UpgradeException.ResultCodes.PREFINALIZE_ACTION_VALIDATION_FAILED);
    }
  }
}
The metadata of an UpgradeAction (its feature, component, and phase) is declared with an annotation.
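As a rough illustration of how annotation metadata like this can be consumed, the sketch below defines a runtime-retained annotation and reads it back through reflection to select the actions for one phase. UpgradeActionInfo, Phase, and the sample action classes are hypothetical, and the candidate classes are passed in explicitly. How Ozone itself discovers annotated classes is not shown in this post, so that part is left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of annotation-described actions; not Ozone's real classes. */
final class AnnotatedActionSketch {

  enum Phase { VALIDATE_IN_PREFINALIZE, ON_FINALIZE }

  /** Must be retained at runtime, otherwise reflection cannot see it. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface UpgradeActionInfo {
    String feature();
    Phase type();
  }

  @UpgradeActionInfo(feature = "SCM_HA", type = Phase.VALIDATE_IN_PREFINALIZE)
  static final class HaValidation implements Runnable {
    @Override
    public void run() {
      // A real action would validate configuration here.
    }
  }

  @UpgradeActionInfo(feature = "SCM_HA", type = Phase.ON_FINALIZE)
  static final class HaFinalize implements Runnable {
    @Override
    public void run() {
      // A real action would migrate state here.
    }
  }

  /** Carries no annotation, so it is never selected. */
  static final class Plain implements Runnable {
    @Override
    public void run() {
    }
  }

  /** Returns "feature:ClassName" for every candidate annotated with the phase. */
  static List<String> actionsFor(Phase phase, List<Class<?>> candidates) {
    List<String> names = new ArrayList<>();
    for (Class<?> candidate : candidates) {
      UpgradeActionInfo info = candidate.getAnnotation(UpgradeActionInfo.class);
      if (info != null && info.type() == phase) {
        names.add(info.feature() + ":" + candidate.getSimpleName());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    List<Class<?>> candidates =
        Arrays.<Class<?>>asList(HaValidation.class, HaFinalize.class, Plain.class);
    System.out.println(actionsFor(Phase.VALIDATE_IN_PREFINALIZE, candidates));
  }
}
```

Declaring the feature and phase on the class keeps each action self-describing, so adding a new action does not require touching a central registry.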
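In summary, Ozone's UpgradeAction and Finalizer design makes the upgrade execution more self-contained and flexible, and keeps it from being overly intrusive to the existing code.
That covers what this post set out to explain about Ozone Upgrade. Readers interested in the topic can also read my earlier post, "Guaranteeing Consistent Request Handling During an Ozone OM Upgrade" (Ozone OM Upgrade期间请求一致性处理的保证). The code discussed in this post can be found through the links at the end.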
Related Links
[1].https://github.com/apache/ozone/blob/HDDS-3698-nonrolling-upgrade/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/upgrade/BasicUpgradeFinalizer.java
[2].https://github.com/apache/ozone/blob/HDDS-3698-nonrolling-upgrade/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/upgrade/ScmHAUnfinalizedStateValidationAction.java