c_baker
Employee
(...continuing from Part 2)

Dropping the DR node


Dropping the DR node can be done in 3 steps from the primary RMA:

  1. Drop database replication to DR:
    1> sap_disable_replication Toronto, Offsite, tpcc
    2> go
    TASKNAME TYPE VALUE
    ------------------- ----------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------
    Disable Replication Start Time Thu Dec 08 21:54:47 UTC 2022
    Disable Replication Elapsed Time 00:00:45
    DisableReplication Task Name Disable Replication
    DisableReplication Task State Completed
    DisableReplication Short Description Disable the flow of Replication
    DisableReplication Long Description Successfully disabled Replication for database 'tpcc'. Please execute 'sap_enable_replication Toronto, Offsite, tpcc' to enable replication for database.
    DisableReplication Task Start Thu Dec 08 21:54:47 UTC 2022
    DisableReplication Task End Thu Dec 08 21:55:32 UTC 2022
    DisableReplication Hostname primarynode.openstack.na-ca-1.cloud.sap

    (9 rows affected)
    1>


  2. Remove DR node from the HADR system:
    1> sap_update_replication remove Offsite
    2> go
    TASKNAME TYPE VALUE
    ------------------ ----------------- ---------------------------------------------------------------------------
    Update Replication Start Time Thu Dec 08 21:57:52 UTC 2022
    Update Replication Elapsed Time 00:01:54
    UpdateReplication Task Name Update Replication
    UpdateReplication Task State Completed
    UpdateReplication Short Description Update configuration for a currently replicating site.
    UpdateReplication Long Description Update replication request to remove host 'Offsite' completed successfully.
    UpdateReplication Task Start Thu Dec 08 21:57:52 UTC 2022
    UpdateReplication Task End Thu Dec 08 21:59:46 UTC 2022
    UpdateReplication Hostname primarynode.openstack.na-ca-1.cloud.sap

    (9 rows affected)


  3. Clean up replication definitions to the DR host:
    1> sap_drop_host Offsite  
    2> go
    TASKNAME TYPE VALUE
    ----------- ----------------- --------------------------------------------------------------------
    Drop Host Start Time Thu Dec 08 22:03:12 UTC 2022
    Drop Host Elapsed Time 00:00:01
    DropHostApi Task Name Drop Host
    DropHostApi Task State Completed
    DropHostApi Short Description Drop the logical host from the environment.
    DropHostApi Long Description Submission of the design change for a model property was successful.
    DropHostApi Task Start Thu Dec 08 22:03:12 UTC 2022
    DropHostApi Task End Thu Dec 08 22:03:13 UTC 2022
    DropHostApi Hostname primarynode.openstack.na-ca-1.cloud.sap

    (9 rows affected)



The DR host is now removed from the HADR environment:
1> sap_status path
2> go
PATH NAME VALUE INFO
--------------------- ------------------------- ----------------------- ------------------------------------------------------------------------------------
Start Time 2022-12-08 22:03:50.498 Time command started executing.
Elapsed Time 00:00:00 Command execution time.
London Hostname companionnode Logical host name.
London HADR Status Standby : Inactive Identify the primary and standby sites.
London Synchronization Mode Synchronous The configured Synchronization Mode value.
London Synchronization State Inactive Synchronization Mode in which replication is currently operating.
London Distribution Mode Remote Configured value for the distribution_mode replication model property.
London Replication Server Status Active The status of Replication Server.
Toronto Hostname primarynode Logical host name.
Toronto HADR Status Primary : Active Identify the primary and standby sites.
Toronto Synchronization Mode Synchronous The configured Synchronization Mode value.
Toronto Synchronization State Synchronous Synchronization Mode in which replication is currently operating.
Toronto Distribution Mode Remote Configured value for the distribution_mode replication model property.
Toronto Replication Server Status Active The status of Replication Server.
London.Toronto.DEM State Suspended Path is suspended (Replication Agent Thread). Transactions are not being replicated.
London.Toronto.DEM Latency Time Unknown No latency information for database 'DEM'.
London.Toronto.DEM Latency Unknown No latency information for database 'DEM'.
London.Toronto.DEM Commit Time Unknown No last commit time for the database 'DEM'.
London.Toronto.DEM Distribution Path Toronto The path of Replication Server through which transactions travel.
London.Toronto.DEM Drain Status Unknown The drain status of the transaction logs of the primary database server.
London.Toronto.master State Suspended Path is suspended (Replication Agent Thread). Transactions are not being replicated.
London.Toronto.master Latency Time Unknown No latency information for database 'master'.
London.Toronto.master Latency Unknown No latency information for database 'master'.
London.Toronto.master Commit Time Unknown No last commit time for the database 'master'.
London.Toronto.master Distribution Path Toronto The path of Replication Server through which transactions travel.
London.Toronto.master Drain Status Unknown The drain status of the transaction logs of the primary database server.
London.Toronto.tpcc State Suspended Path is suspended (Replication Agent Thread). Transactions are not being replicated.
London.Toronto.tpcc Latency Time Unknown No latency information for database 'tpcc'.
London.Toronto.tpcc Latency Unknown No latency information for database 'tpcc'.
London.Toronto.tpcc Commit Time Unknown No last commit time for the database 'tpcc'.
London.Toronto.tpcc Distribution Path Toronto The path of Replication Server through which transactions travel.
London.Toronto.tpcc Drain Status Unknown The drain status of the transaction logs of the primary database server.
Toronto.London.DEM State Active Path is active and replication can occur.
Toronto.London.DEM Latency Time 2022-12-06 18:47:31.278 Time latency last calculated
Toronto.London.DEM Latency 379 Latency (ms)
Toronto.London.DEM Commit Time 2022-12-06 18:47:31.284 Time last commit replicated
Toronto.London.DEM Distribution Path London The path of Replication Server through which transactions travel.
Toronto.London.DEM Drain Status Not Applicable The drain status of the transaction logs of the primary database server.
Toronto.London.master State Active Path is active and replication can occur.
Toronto.London.master Latency Time 2022-12-06 18:47:31.286 Time latency last calculated
Toronto.London.master Latency 383 Latency (ms)
Toronto.London.master Commit Time 2022-12-06 18:47:31.286 Time last commit replicated
Toronto.London.master Distribution Path London The path of Replication Server through which transactions travel.
Toronto.London.master Drain Status Not Applicable The drain status of the transaction logs of the primary database server.
Toronto.London.tpcc State Active Path is active and replication can occur.
Toronto.London.tpcc Latency Time 2022-12-06 18:47:31.286 Time latency last calculated
Toronto.London.tpcc Latency 383 Latency (ms)
Toronto.London.tpcc Commit Time 2022-12-06 19:33:53.846 Time last commit replicated
Toronto.London.tpcc Distribution Path London The path of Replication Server through which transactions travel.
Toronto.London.tpcc Drain Status Not Applicable The drain status of the transaction logs of the primary database server.

(50 rows affected)

which can also be confirmed by connecting to the DR node with isql:
1> sp_configure 'HADR mode'
2> go
Parameter Name Default Memory Used Config Value Run Value Unit Type
---------------- ----------- ------------- -------------- ------------ -------------- -------
HADR mode -1 0 -1 -1 not applicable dynamic

(1 row affected)

'-1' as a run value indicates that this instance is no longer participating in any HADR environment.

There is still an RMA instance running. The SRS instance should already be shut down and removed.

If the database is to be used for other purposes, it can be cleaned up further; the steps are documented at Removing the DR Node from the HADR System. Otherwise, for testing it may be desirable to clean up the node, recreate the DR instance, and add the DR node back into the HADR cluster.

The following steps help clean up the DR node if you are resetting it to add it back again, rather than attempting to reuse the existing ASE instance:

  • Stop the RMA instance (connect to the RMA with isql and issue the 'shutdown' command), as sketched below.

  • Remove the entries in the second interfaces file located in $SYBASE/DM.
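
A minimal sketch of these two steps on the DR host follows. The RMA login, password, and port are placeholders for whatever was used when the DR node was installed (Part 1), not values captured from this environment:

isql -U<rma_admin_user> -P<password> -S <dr_hostname>:<rma_port>
1> shutdown
2> go

After the RMA is down, edit $SYBASE/DM/interfaces and remove the entries as noted above.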


A new DR instance can now be added back using 'Adding the DR node' from Part 1.

Checking the HADR Cluster

Running the test application results in the same number of records in both primary (active) and companion (standby):
1> use tpcc
2> go
1> select count(*) from ORDER_LINE
2> go

-----------
900695

(1 row affected)

Since the DR node is no longer part of the cluster, however, the count there obviously remains the same as previously reported.

Before proceeding, we will perform a shutdown and startup of the HADR system. This is documented at Starting and Stopping the HADR System, but it consists of the following steps and assumes that the active ASE is on the primary node.

Shutdown sequence:

  1. fault manager

  2. primary and companion backup servers

  3. deactivate and shut down the primary ASE (sketched after this list)

  4. primary SRS (by default this is actually on the companion node)

  5. primary and companion RMAs

  6. companion ASE

  7. companion SRS (by default this is actually on the primary node)
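
As a rough sketch, steps 2 and 3 could look like the following isql session on the primary ASE (the companion backup server is stopped the same way from the companion ASE). The sp_hadr_admin deactivate call and its timeout are my assumption of the deactivation step; confirm the exact syntax in the Starting and Stopping the HADR System documentation for your ASE version:

1> shutdown SYB_BACKUP
2> go
1> sp_hadr_admin deactivate, '120' -- deactivation call and timeout are assumed; check the documented syntax
2> go
1> shutdown
2> go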


Startup sequence:

  1. companion/standby ASE

  2. primary SRS (by default this is on the companion node)

  3. primary ASE

  4. primary and companion backup servers

  5. companion SRS (by default this is on the primary node)

  6. primary and companion RMAs

  7. fault manager


Failover now needs only two commands:

  • sap_failover <active>, <standby>, <timeout>

  • sap_host_available <new standby/previous active>


1> sap_failover Toronto, London, 120
2> go
TASKNAME TYPE VALUE
-------------- --------------------- ----------------------------------------------------------------------------------------------------------
Failover Start Time Fri Dec 09 17:24:05 UTC 2022
Failover Elapsed Time 00:00:02
DRExecutorImpl Task Name Failover
DRExecutorImpl Task State Running
DRExecutorImpl Short Description Failover makes the current standby ASE as the primary server.
DRExecutorImpl Long Description Started task 'Failover' asynchronously.
DRExecutorImpl Additional Info Please execute command 'sap_status task' to determine when task 'Failover' is complete.
Failover Task Name Failover
Failover Task State Running
Failover Short Description Failover makes the current standby ASE as the primary server.
Failover Long Description Waiting for markers that verify all in-flight data has been sent from source 'Toronto' to target 'London'.
Failover Current Task Number 6
Failover Total Number of Tasks 18
Failover Task Start Fri Dec 09 17:24:05 UTC 2022
Failover Hostname primarynode.openstack.na-ca-1.cloud.sap

(15 rows affected)
1> sap_status task
2> go
TASKNAME TYPE VALUE
---------- --------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------
Status Start Time Fri Dec 09 17:24:05 UTC 2022
Status Elapsed Time 00:00:04
Failover Task Name Failover
Failover Task State Completed
Failover Short Description Failover makes the current standby ASE as the primary server.
Failover Long Description Failover from source 'Toronto' to target 'London' is complete. The target may be unquiesced.
Failover Additional Info Please run command 'sap_host_available Toronto' to complete disabling replication from the old source, now that the target 'London' is the new primary.
Failover Current Task Number 14
Failover Total Number of Tasks 14
Failover Task Start Fri Dec 09 17:24:05 UTC 2022
Failover Task End Fri Dec 09 17:24:09 UTC 2022
Failover Hostname primarynode.openstack.na-ca-1.cloud.sap

(12 rows affected)
1> sap_host_available Toronto
2> go
TASKNAME TYPE VALUE
------------- --------------------- -------------------------------------------------------------------------------------------------------
HostAvailable Start Time Fri Dec 09 17:24:47 UTC 2022
HostAvailable Elapsed Time 00:01:44
HostAvailable Task Name HostAvailable
HostAvailable Task State Completed
HostAvailable Short Description Resets the original source logical host when it is available after failover.
HostAvailable Long Description Completed the reset process of logical host 'Toronto' receiving replication from logical host 'London'.
HostAvailable Current Task Number 11
HostAvailable Total Number of Tasks 11
HostAvailable Task Start Fri Dec 09 17:24:47 UTC 2022
HostAvailable Task End Fri Dec 09 17:26:31 UTC 2022
HostAvailable Hostname primarynode.openstack.na-ca-1.cloud.sap

(11 rows affected)

At this point, we are back to a primary-and-companion-only HADR cluster.
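
To confirm the role swap, sap_status path can be run again from either RMA. Based on the failover semantics above (an expectation, not output captured from this run), London should now report an HADR Status of 'Primary : Active' and Toronto 'Standby : Active':

1> sap_status path
2> go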

In a future blog, we will examine using the Fault Manager, unplanned failover, and making the application HA-aware.

My next blog addresses how to enable HADR for an existing ASE instance.

Please let me know in the comments if you have any issues or corrections, and I will address them.

Chris Baker
7 Comments
hganga
Explorer
Dear Chris, great blog. I struggled with the HA + DR configuration a few months ago. Now it is working correctly. I have only one doubt that I could not find in the documentation:

Failover to the DR site is not supported, so how do you activate the DR node in case of a disaster? I think it may be possible with "sp_hadr_admin primary, force", but the documentation is not clear about that.

Your help will be appreciated.

 

Thanks and best regards.
c_baker
Employee
In the case of a disaster that takes both the primary and companion nodes offline, the DR node can be used to preserve the data, but it must be activated using a manual or other process.  Automatic client connection failover is only supported between the primary and companion nodes of the cluster.

Recovering the HADR cluster or a database in the HADR cluster from the DR node is documented under Recovering the HADR Cluster from the DR Node.

Chris
hganga
Explorer
Thanks for your answer, Chris. That information (activating the DR node by a manual process) is not clear in the documentation. I think my manual approach can work, but I cannot test it for now.

Your advice will be appreciated.

From your answer, I understand that the DR node is only for data preservation and to reconstruct the primary site? It is not intended for use with applications (even by manually redirecting all the connected apps to this newly activated DR server)?

 

Thanks and best regards.
c_baker
Employee
Only the active node should have its data altered.  When configured, replication from the active (primary) to the companion (standby) is one-way only.  The direction is reversed during failover, when the companion becomes the active node and the primary becomes the standby.

Applications cannot connect to the non-active node of the cluster - they will be redirected to the active node (unless the login has the 'allow hadr login' privilege) - hence an application does not need to be 'HA-aware' if both nodes are still live, as in this blog (my next blog will cover what application changes are needed for an 'unplanned' failover).

The DR node is replicated from the HADR cluster, but does not participate in failover/failback operations.  Connections to the DR are not recommended unless the primary and companion are both offline.  However, once data is altered on the DR, restoring the primary and companion nodes of the cluster from the DR would be a planned operation as documented in the link previously provided.

Chris

 

 
hganga
Explorer

Thanks Chris. I clearly understand what you said. My original question is how to activate the DR node in case of a disaster, that is, if the HA site is lost completely (both nodes offline). That procedure (a manual procedure to activate the DR node, since the "sap_failover" command is not supported for the third node) is not clear in the documentation, and the business needs to be able to keep operating in this scenario.  I know that we will later need to restore the HA site with a backup from the DR site, because the business was operating on the DR node (in the case of lost HA nodes).

So, I think that running "sp_hadr_admin primary, force" on the DR node will be sufficient, because in the documentation you mention there is no mention of activating the DR database, just the backup, the restore on the HA side, and enabling replication as in the original flow.

I really appreciate your help and advice. Sadly, I cannot test this on my DR node, but I need to document the procedure in case of a disaster and loss of the HA site.

 

Thanks and best regards.

hganga
Explorer
Dear Chris, the Azure team has cloned and isolated the DR node to test the manual activation of the DR node with the command "sp_hadr_admin primary, force", but it is not working.

Do you know how the DR node can be activated?

 

Thanks and best regards.

c_baker
Employee

Per the documentation (Overview), the DR node only backs up the databases.  It does not participate in the failover or failback of the active and standby nodes, so cannot be activated using RMA or ASE commands.

It is not an HADR cluster standby node, only a live backup of the primary/companion HADR cluster utilizing the HADR replication capabilities, instead of requiring additional replication licensing.

If the original cluster is not available (isolated, per your last comment), simply start using the DR node as a standalone ASE instance.  You can remove the 'cluster' parts by following the documentation at Manually Removing the Replication in HADR System with DR Node Environment on the DR node to clear the information retained there about the original HADR cluster.

Obviously, once any data in the DR node has been changed, you will need to follow steps in the previously linked documentation (Recovering the HADR Cluster from the DR Node) to re-establish the HADR cluster.

Chris