Monday, February 4, 2013

DPMRA not starting on the DPM Server

I encountered the following issue in our own DPM environment.

When checking the DPM console we found that all jobs were stuck. During the investigation we found that the DPMRA service on the DPM server kept failing to start. The event log showed a large number of the following errors.

Log Name:      System
Source:        Service Control Manager
Event ID:      7009
Task Category: None
Level:         Error
Keywords:      Classic
Description: A timeout was reached (30000 milliseconds) while waiting for the DPMRA service to connect.

Log Name: System
Source: Application Popup
Event ID: 26
Task Category: None
Level: Information
Keywords: Classic
Description: Application popup: DPMRA.exe - Application Error : The exception unknown software exception (0x8007000e) occurred in the application at location 0xfd74bccd

There are no other events in the event log that indicate why the DPMRA service keeps failing. The DPMRA1.errlog file gives a better indication of the root cause.

2430 28D4 02/04 09:31:11.131 03 agentcfg.cpp(475) [0000000000A5CC20] NORMAL Cound not find configuration for DPMRA

2430 28D4 02/04 09:31:11.131 29 dpmra.cpp(125) [0000000000A594D0] NORMAL CDPMRA: destructor [0000000000A594D0]
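
In a large errlog these lines are easy to miss. As a rough sketch, a few lines of Python can filter them out. The column layout assumed here (message text starting at the ninth whitespace-separated field) is inferred only from the two sample lines above, and the function name is made up for illustration.

```python
def find_missing_config(log_text, needle="find configuration for DPMRA"):
    """Return errlog entries whose message contains `needle` (case-insensitive)."""
    hits = []
    for line in log_text.splitlines():
        # Assumed layout: 8 leading columns, then the free-text message.
        parts = line.split(None, 8)
        if len(parts) < 9:
            continue  # blank or unexpected line, skip it
        message = parts[8]
        if needle.lower() in message.lower():
            hits.append({
                "date": parts[2],
                "time": parts[3],
                "source": parts[5],
                "message": message,
            })
    return hits


# The two lines quoted above, used as sample input.
sample_log = (
    "2430 28D4 02/04 09:31:11.131 03 agentcfg.cpp(475) [0000000000A5CC20] "
    "NORMAL Cound not find configuration for DPMRA\n"
    "\n"
    "2430 28D4 02/04 09:31:11.131 29 dpmra.cpp(125) [0000000000A594D0] "
    "NORMAL CDPMRA: destructor [0000000000A594D0]\n"
)

for hit in find_missing_config(sample_log):
    print(hit["date"], hit["time"], hit["source"], hit["message"])
```

Reading the actual errlog file is left out on purpose, since its location and text encoding may differ per installation.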

With the help of this TechNet forum post we found that this error indicates that one of the DPM registry keys is missing.

Did a procmon trace, and found that the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Agent\2.0\Configuration is missing! Which would coincide with the agent saying "Element not found".

We checked the registry and the key is indeed missing, as shown in the image below.


This is a complex key with a lot of unique information, so the best way to fix this issue is to restore the key. In our case we had a System State backup of the server available (created at a time when the server was working fine).

The steps below show how to restore the registry key from a System State backup (VHD). This procedure uses Windows Server 2008 R2. Other OS versions may involve different steps.

Restore registry key

First we need to locate the System State VHD and mount it. We use Disk Management to mount the VHD.
Mount the VHD (not as read-only).
Assign a drive letter to the volume, in our case Z:.
Browse to the directory z:\windows\system32\config and copy the file named software to a temporary directory.
I prefer to perform the next steps on a test system or my laptop instead of the DPM server, just in case.
Open the Registry Editor (regedit.exe).

Browse to the HKEY_LOCAL_MACHINE folder and select it.
In the File menu, choose Load Hive.
Browse to the temporary location where you saved the software file and click OK.
As the key name, enter a temporary name (temp_restore).
Now you can browse to the key we need.
We see here that the Configuration key is available.
The next step is to export the key.
We need to edit the exported file so that it points to the correct hive.

Make sure the key path refers to HKEY_LOCAL_MACHINE\SOFTWARE instead of the temporary key name (temp_restore).
Now back on the DPM server we can import the key.


Now we check that the key is present on the DPM server.
Then we check whether the DPMRA service starts and whether the DPM jobs are running again.
The last step is a little clean-up.
We dismount the VHD.
We unload the temporary hive.
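
The step of editing the exported file so that it points to the correct hive can also be scripted. The sketch below assumes the hive was loaded under the name temp_restore, as in the steps above, and that the export is a standard .reg text file. The function name retarget_reg_export is hypothetical. It rewrites only the key header lines and leaves all value lines untouched.

```python
def retarget_reg_export(text, temp_name="temp_restore", real_name="SOFTWARE"):
    """Point key headers under HKLM\\<temp_name> at HKLM\\<real_name>."""
    old = "[HKEY_LOCAL_MACHINE\\" + temp_name
    new = "[HKEY_LOCAL_MACHINE\\" + real_name
    out = []
    # keepends=True preserves the original line endings exactly.
    for line in text.splitlines(keepends=True):
        head, rest = line[:len(old)], line[len(old):]
        # Match the temporary key itself or its subkeys, but not a
        # different key that merely starts with the same characters.
        if head.lower() == old.lower() and rest[:1] in ("\\", "]"):
            line = new + rest
        out.append(line)
    return "".join(out)


# Header-only example; a real export also contains the value lines.
exported = (
    "Windows Registry Editor Version 5.00\r\n"
    "\r\n"
    r"[HKEY_LOCAL_MACHINE\temp_restore\Microsoft\Microsoft Data Protection Manager\Agent\2.0\Configuration]"
    "\r\n"
)

print(retarget_reg_export(exported))
```

Exports written by the Registry Editor are typically UTF-16 text, so when working with a real file it is probably safest to read and write it with encoding="utf-16" and newline="" and to review the result in a text editor before importing it on the DPM server.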

The steps above solved the issue for us. Unfortunately, we are not 100% sure why the registry key disappeared in the first place. We found that the server was under heavy memory pressure at the moment the issues started (low on virtual memory). This might have caused the issue, but we cannot say for sure.
