Coping with failure

Date: 2014-05-15 16:39:50

We said earlier that ES can cope with node failure, so let's try it. We start by killing the first node, as shown in the figure below:

[Figure: the cluster after killing one node]

The node we killed was the master node. A cluster must have a master node in order to work properly, so the first thing ES does once the master is gone is elect a new master node: node 2.

Primary shards 1 and 2 were lost when the master node was killed, and ES cannot work properly while primary shards are missing. If we checked cluster health at this moment, the status would be red: not all primary shards are active.
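For reference, cluster health can be read from the `_cluster/health` endpoint. The following is only a minimal sketch using the Python standard library. The address `http://localhost:9200` and the helper names `parse_health`, `fetch_health` and `is_red` are assumptions for illustration, not something given in the text above.

```python
import json
from urllib.request import urlopen


def parse_health(body: str) -> dict:
    """Pick the fields of interest out of a _cluster/health JSON response."""
    data = json.loads(body)
    return {
        "status": data["status"],
        "active_primary_shards": data.get("active_primary_shards"),
        "unassigned_shards": data.get("unassigned_shards"),
    }


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """Query a running cluster; base_url is an assumed local address."""
    with urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return parse_health(resp.read().decode("utf-8"))


def is_red(health: dict) -> bool:
    """True when at least one primary shard is not active."""
    return health["status"] == "red"
```

Calling `fetch_health()` right after the node is killed, and before any replica has been promoted, would be expected to report a status of red.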

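Fortunately, complete copies of the two lost primary shards exist on the other two nodes, so the new master node promotes those replicas to primary shards, which brings cluster health back to yellow. The promotion is instantaneous, like flipping a switch.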
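The promotion step can be modelled with a small simulation. This is a toy sketch, not how ES actually allocates shards: the shard layout is an assumption chosen to match the description (node 1 holds primary shards 1 and 2, and every node holds a copy of every shard), and `kill_node` is a hypothetical helper.

```python
from typing import Dict

# node name -> {shard id: "primary" or "replica"}
Cluster = Dict[str, Dict[int, str]]


def kill_node(cluster: Cluster, node: str) -> Cluster:
    """Return a new cluster without `node`, promoting replicas of lost primaries."""
    survivors = {n: dict(shards) for n, shards in cluster.items() if n != node}
    lost_primaries = [s for s, role in cluster[node].items() if role == "primary"]
    for shard in lost_primaries:
        # Promote the first surviving replica found. Real allocation logic
        # is more involved; this only models the outcome.
        for shards in survivors.values():
            if shards.get(shard) == "replica":
                shards[shard] = "primary"
                break
    return survivors


# Assumed layout: 3 primary shards, 2 replicas each, spread over 3 nodes.
cluster = {
    "node1": {0: "replica", 1: "primary", 2: "primary"},
    "node2": {0: "replica", 1: "replica", 2: "replica"},
    "node3": {0: "primary", 1: "replica", 2: "replica"},
}
after = kill_node(cluster, "node1")
```

In this model all three primary shards are active again once `node1` is gone, while each shard is left with a single replica.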

So why is cluster health yellow rather than green? All 3 primary shards are present, but we asked for two replica shards per primary shard, and only one replica of each is currently assigned.
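The three statuses can be summarised as a simple rule. This is a sketch of the rule as described here, with a hypothetical function name and parameters, not ES's own implementation.

```python
def cluster_health(primaries_active: int, primaries_total: int,
                   replicas_active: int, replicas_total: int) -> str:
    """Classify health from shard counts, following the rule described above."""
    if primaries_active < primaries_total:
        return "red"      # at least one primary shard is not active
    if replicas_active < replicas_total:
        return "yellow"   # all primaries active, some replicas unassigned
    return "green"        # every primary and every replica is active


# 3 primaries with 2 replicas each means 6 replicas are expected;
# after the failure only 1 replica per primary (3 in total) is assigned.
status = cluster_health(3, 3, 3, 6)
```

With the counts from this scenario, `status` evaluates to yellow.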

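That is why the status is not green, but there is no need to worry: even if we also killed node 2, ES would keep working without losing any data, because node 3 holds a copy of every shard.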

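By now you should have a reasonable idea of how ES scales horizontally while keeping your data safe. Later we will look at the life cycle of a shard in more detail.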

Original: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_coping_with_failure.html#_coping_with_failure