Thursday, September 3, 2020

Potential solution to dgmgrl error ORA-16665: time out waiting for the result from a member

After adding a terminal standby database to an existing configuration, the Data Guard Broker had trouble communicating with the new member. The output from "show configuration" was:
DGMGRL> show configuration;

Configuration - DB01

  Protection Mode: MaxPerformance
  Members:
  DB01      - Primary database
    DB01_STB  - Physical standby database
      DB01_TSTB - Physical standby database (receiving current redo)
        Error: ORA-16665: time out waiting for the result from a member

    DB01_RO   - Physical standby database
When looking at the details by using
show database verbose "DB01_TSTB"
the entire operation took a very long time, and at the end the following message was displayed:
Database Status:
DGM-17016: failed to retrieve status for database "DB01_TSTB"
ORA-16665: time out waiting for the result from a member
The broker log file showed:
09/02/2020 15:08:52
Data Guard Broker Status Summary:
  Type                        Name                            Severity  Status
  Configuration               DB01                            Warning  ORA-16607
  Primary Database            DB01                            Success  ORA-0
  Physical Standby Database   DB01_STB                        Success  ORA-0
  Physical Standby Database   DB01_RO                         Success  ORA-0
  Physical Standby Database   DB01_TSTB                       Error  ORA-16665
The root cause was a firewall. The terminal standby database could not reach the primary database. Although the terminal standby database isn't set up to receive redo data from the primary database directly, in a broker configuration all members must be able to communicate with each other. A good tool for troubleshooting issues with ports and firewalls is nmap. I installed it on the terminal standby server and issued:
[root@db04_server ~]# nmap -n -p 1511 db01_sever.oric.no

Starting Nmap 6.40 ( http://nmap.org ) at 2020-09-02 14:23 CEST
Nmap scan report for db01_sever.oric.no (xxx.xxx.xxx.xxx)
Host is up (0.016s latency).
PORT     STATE    SERVICE
1511/tcp filtered 3l-l1

Nmap done: 1 IP address (1 host up) scanned in 0.49 seconds
A filtered port means that nmap cannot determine whether the port is open or closed, most often because a firewall along the way is dropping the probes. Further checks in the firewall log files showed:
action=Drop service=1511 dst=xxx.xxx.xxx.xxx scr=yyy.yyy.yyy.yyy
where xxx.xxx.xxx.xxx matched the IP address of the terminal standby server, while yyy.yyy.yyy.yyy matched the IP address of the primary server. The network admin opened the port, and the ORA-16665 immediately disappeared from the dgmgrl output.
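Since every member must be able to reach every other member, it can be handy to repeat the port check from each server against all the others. The following is only a rough sketch of how that could be done when nmap is not installed, using bash's built-in /dev/tcp redirection. The function name port_state and the script name used further down are my own inventions, the host names are whatever you pass in, and the default port 1511 is simply the listener port from this case. It assumes bash and the coreutils timeout command are available.

```shell
#!/usr/bin/env bash
# Sketch only: port_state is a hypothetical helper, not part of any Oracle tooling.
# Listener port to test; 1511 is the port from this case, override with PORT=...
port=${PORT:-1511}

port_state() {
  local host=$1 p=$2 rc
  # /dev/tcp/<host>/<port> is a bash feature that opens a TCP connection.
  # timeout caps the wait, since a firewall that drops packets would
  # otherwise leave the connection attempt hanging.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$p" 2>/dev/null
  rc=$?
  case $rc in
    0)   echo "open" ;;
    124) echo "timeout (possibly filtered)" ;;   # timeout killed the attempt
    *)   echo "refused or unresolvable" ;;       # immediate failure
  esac
}

# Check every host given on the command line; does nothing without arguments.
for host in "$@"; do
  printf '%s:%s %s\n' "$host" "$port" "$(port_state "$host" "$port")"
done
```

Saved as, for example, check_members.sh, it would be run on each server in turn with the other members' host names as arguments. A timeout roughly corresponds to what nmap reports as "filtered", while an immediate refusal usually means the packets arrive but nothing is listening on that port.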