Redis Sentinel - High Availability
We can achieve high availability for a Redis replication setup using the Redis Sentinel service.
What is the Sentinel process?
The Sentinel process is a Redis instance started with the --sentinel option; it only needs a configuration file that tells Sentinel which Redis master it should monitor.
Let's look at the key benefits of using Sentinel:
Automatic failover: if a master is not working as expected, Sentinel can start a failover process where a replica is promoted to master, the other replicas are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
Monitoring: Sentinel constantly checks if your master and replica instances are working as expected.
Notification: Sentinel can notify the system administrator, or other computer programs, via an API, that something is wrong with one of the monitored Redis instances.
Configuration provider: Sentinel acts as a source of authority for client service discovery: clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address.
In this blog, we will see how to configure the Sentinel process and how to achieve high availability using it.
Configuration
Here, I have configured three Sentinel nodes and one replication setup (a master and two slaves).
For each Sentinel, we need to add the entries below:
sentinel monitor mymaster 127.0.0.1 6379 2
- It tells Sentinel to monitor a master called mymaster, at address 127.0.0.1 and port 6379, with a quorum of 2.
- The quorum is the number of Sentinels that need to agree about the fact the master is not reachable, in order to really mark the master as failing, and eventually start a failover procedure if possible.
sentinel down-after-milliseconds mymaster 60000
- down-after-milliseconds is the time in milliseconds an instance should be unreachable (either not replying to our PINGs or replying with an error) before a Sentinel starts to consider it down.
sentinel parallel-syncs mymaster 2
- parallel-syncs sets the number of replicas that can be reconfigured to use the new master at the same time after a failover.
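Putting these together, a minimal sentinel.conf for the first node might look like the sketch below. The port and the file name sentinel1.conf are assumptions for illustration; the other two Sentinels use the same sentinel directives with ports 26380 and 26381.

# sentinel1.conf - minimal illustrative config for the first Sentinel
port 26379
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel parallel-syncs mymaster 2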
Replication setup configuration:
For each Redis replica, we need to add the directive below along with the normal replication configuration options (a minimal per-node sketch is shown after the replica list):
replicaof 127.0.0.1 6379
Replica setup:
127.0.0.1:6379 - Master
127.0.0.1:6380 - Slave1
127.0.0.1:6381 - Slave2
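For illustration, a minimal sketch of the per-node settings, assuming each instance runs from its own config file (the file names are assumptions; any other options you normally use stay as they are):

# redis-6379.conf (Master)
port 6379

# redis-6380.conf (Slave1)
port 6380
replicaof 127.0.0.1 6379

# redis-6381.conf (Slave2)
port 6381
replicaof 127.0.0.1 6379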
To verify the replication health, we can query each instance with INFO replication.
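For example, against the master (port assumed from the setup above):

redis-cli -p 6379 info replication

On a healthy setup this should report role:master and connected_slaves:2 on the master, and role:slave with master_link_status:up on each slave.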
Sentinel setup:
127.0.0.1:26379 - Sentinel1
127.0.0.1:26380 - Sentinel2
127.0.0.1:26381 - Sentinel3
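Each Sentinel is started with its own configuration file. A sketch, assuming the files are named sentinel1.conf to sentinel3.conf (with the port directives 26379 to 26381) and live under /etc/redis:

redis-sentinel /etc/redis/sentinel1.conf
redis-sentinel /etc/redis/sentinel2.conf
redis-sentinel /etc/redis/sentinel3.conf

Running redis-server /etc/redis/sentinel1.conf --sentinel is equivalent.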
Once the Sentinel processes are started, each Sentinel discovers the other Sentinels and the replicas attached to the monitored master, and persists these details into its own sentinel.conf file.
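For illustration, the entries appended to sentinel.conf look roughly like the lines below; the exact directive names depend on the Redis version, and the run IDs here are placeholders:

sentinel known-replica mymaster 127.0.0.1 6380
sentinel known-replica mymaster 127.0.0.1 6381
sentinel known-sentinel mymaster 127.0.0.1 26380 <run-id-of-sentinel2>
sentinel known-sentinel mymaster 127.0.0.1 26381 <run-id-of-sentinel3>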
We also get the following line in the Sentinel log:
4686:X 23 Apr 2020 21:48:22.857 # +monitor master mymaster 127.0.0.1 6379 quorum 2
We can get the details of all the Sentinels monitoring this master using the command below:
sentinel sentinels mymaster
We can get the current master and slave details using the following commands on the Sentinel server:
sentinel get-master-addr-by-name mymaster
sentinel slaves mymaster
sentinel master mymaster
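These commands are run against a Sentinel, not against the Redis data nodes. For example, assuming Sentinel1 on port 26379, a reply along these lines is expected:

redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
1) "127.0.0.1"
2) "6379"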
Let's see what happens when the master becomes unresponsive.
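To simulate the failure, the master can simply be stopped; one way (assuming the master is still reachable on port 6379) is:

redis-cli -p 6379 shutdown nosave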
I shut down the Redis master 127.0.0.1:6379, after which the details below were logged by Sentinel. Let me explain them line by line.
4686:X 23 Apr 2020 21:52:22.599 # +sdown master mymaster 127.0.0.1 6379
4686:X 23 Apr 2020 21:52:22.661 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
4686:X 23 Apr 2020 21:52:22.661 # +new-epoch 1
The specified instance is now in the Subjectively Down state (+sdown). Enough Sentinels agreed (quorum 2/2), so it was also marked Objectively Down (+odown), and the current epoch was updated (+new-epoch).
4686:X 23 Apr 2020 21:52:22.771 # +elected-leader master mymaster 127.0.0.1 6379
4686:X 23 Apr 2020 21:52:22.771 # +failover-state-select-slave master mymaster 127.0.0.1 6379
4686:X 23 Apr 2020 21:52:22.854 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
This Sentinel won the leader election for the specified epoch and can perform the failover (+elected-leader). The new failover state is select-slave: we are trying to find a suitable replica for promotion (+failover-state-select-slave), and a good replica, 127.0.0.1:6381, was selected (+selected-slave).
4686:X 23 Apr 2020 21:52:22.854 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
4686:X 23 Apr 2020 21:52:22.913 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
4686:X 23 Apr 2020 21:52:22.954 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
Sentinel sends SLAVEOF NO ONE to the selected replica to turn it into a master (+failover-state-send-slaveof-noone) and waits for it to switch role (+failover-state-wait-promotion). The replica 127.0.0.1:6381 was then successfully promoted to master (+promoted-slave).
4686:X 23 Apr 2020 21:52:23.004 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
4686:X 23 Apr 2020 21:52:23.970 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
4686:X 23 Apr 2020 21:52:23.970 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
4686:X 23 Apr 2020 21:52:24.060 # +failover-end master mymaster 127.0.0.1 6379
The reconfiguration command was sent to the remaining replica (+slave-reconf-sent). The replica being reconfigured showed up as a replica of the new master ip:port pair, but the synchronization process was not yet complete (+slave-reconf-inprog); it is now fully synchronized with the new master (+slave-reconf-done).
The failover terminated with success (+failover-end): all the replicas appear to be reconfigured to replicate from the new master.
After that, I started the old master again, and it joined the replication setup as a slave of the new master.
4686:X 23 Apr 2020 21:52:24.060 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
The line above tells us the old master was successfully added as a slave of the new master.
Finally, verify the replication health of the new master.
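The same INFO replication check can be pointed at the newly promoted master (port assumed from the setup above):

redis-cli -p 6381 info replication

It should now report role:master with connected_slaves:2 (the old master on 6379 and the slave on 6380).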
Now the slave has been successfully promoted to master, and the other slaves are in sync with it.
This setup works fine with static IP addresses. However, with a hostname-based setup, the old master that comes back up after a shutdown does not rejoin the master-slave setup unless one of the Sentinels is restarted manually. If you have any knowledge about this behaviour, please share it.