Today I found a small problem in our production environment.

We use Icinga to monitor our production LAN, its guests and services. For our Citrix XenApp servers we check the uptime so that we get notified if they have been running too long without a reboot.
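To make the kind of check concrete, here is a minimal sketch of an uptime check in the usual Nagios/Icinga plugin style. It is not our actual check: the function name `check_uptime` and the thresholds of 7 and 14 days are made up for illustration. Only the exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) follow the standard plugin convention.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_uptime(uptime_seconds, warn_days=7, crit_days=14):
    """Return (exit_code, message) for a given uptime.

    warn_days and crit_days are hypothetical thresholds, not the
    values used in the environment described in this post.
    """
    days = uptime_seconds / 86400  # seconds per day
    if days >= crit_days:
        return CRITICAL, f"CRITICAL - uptime {days:.1f} days"
    if days >= warn_days:
        return WARNING, f"WARNING - uptime {days:.1f} days"
    return OK, f"OK - uptime {days:.1f} days"
```

A check like this only looks at the uptime value the guest reports, so it alerts on a freshly rebooted machine whenever that value is wrong.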

After we upgraded several servers to vmx-10, everything worked fine at first. Icinga reported all services and performance charts correctly. Two days later, the first uptime alerts came up, although the systems had booted only a few hours earlier, not two days ago…


A cold restart (power off, then power on) does reset the uptime.
A warm reboot alone does not reset the counter.

After searching the VMware KB I found article KB2082042. Windows Server 2008 / 2008 R2 and Windows 7 / 8 use the TSC CPU register to get their uptime information. Third-party products such as Icinga, check_mk, … use this value as well. Our Windows Server 2012 R2 machines don’t show this behavior.
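The effect can be illustrated with a toy model. This is only a sketch under my own assumptions, not how Windows or ESXi actually implement anything: the class `SimulatedGuest`, its methods and the 2 GHz tick rate are invented for the example. The uptime is derived from a tick counter standing in for the TSC, and a flag decides whether a warm reboot clears that counter.

```python
class SimulatedGuest:
    """Toy model: uptime is derived from a tick counter (stand-in for the TSC)."""

    def __init__(self, tsc_hz, clear_tsc_on_soft_reset):
        self.tsc_hz = tsc_hz
        self.clear_tsc_on_soft_reset = clear_tsc_on_soft_reset
        self.tsc = 0

    def run(self, seconds):
        # The counter advances at a fixed rate while the VM is running.
        self.tsc += seconds * self.tsc_hz

    def warm_reboot(self):
        # A soft reset only clears the counter if the flag is set.
        if self.clear_tsc_on_soft_reset:
            self.tsc = 0

    def cold_boot(self):
        # Power off / power on always starts the counter from zero.
        self.tsc = 0

    def uptime_seconds(self):
        return self.tsc / self.tsc_hz


# Two days of runtime, a warm reboot, then three more hours.
guest = SimulatedGuest(tsc_hz=2_000_000_000, clear_tsc_on_soft_reset=False)
guest.run(2 * 86400)
guest.warm_reboot()
guest.run(3 * 3600)
print(guest.uptime_seconds() / 3600)  # hours of uptime as seen by a check
```

With the flag set to `False` the model reports 51 hours instead of 3, which matches what we saw: a system booted a few hours ago with an uptime of more than two days.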

VMware provides a fix for this failure: manually add a row to the virtual machine’s advanced configuration.

My test lab runs ESXi 6. After upgrading a machine to HW version 11 (vmx-11), I could not reproduce the failure. Maybe vmx-11 clears the TSC CPU register by default on a soft reset.
Tested on:

  • Windows Server 2008 R2 with SP1 (no further updates)
  • Windows Server 2008 R2 with SP1 and all available updates installed

Web Client:

  • Power off the machine
  • Edit settings
  • VM-Options > Edit configuration
  • Add row
    Name: monitor_control.enable_softResetClearTSC
    Value: TRUE

vSphere Client:

  • Power off the machine
  • Right-Click the machine > Edit settings
  • Advanced > General > Configuration
  • Add row
    Name: monitor_control.enable_softResetClearTSC
    Value: TRUE
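Both clients end up adding the same name/value pair to the virtual machine’s configuration. If you would rather script this for several machines, the following is a minimal sketch that works on the text of a .vmx file. Editing the .vmx file directly is my assumption here, since I only used the clients as described above. The helper `set_vmx_option` is a made-up name, and it assumes the usual `name = "value"` line format of .vmx files. As with the manual steps, the machine should be powered off first.

```python
def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with the line `key = "value"` set.

    An existing entry for the key is replaced (duplicates are dropped),
    otherwise the line is appended at the end.
    """
    new_line = f'{key} = "{value}"'
    out = []
    found = False
    for line in vmx_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name == key:
            if not found:
                out.append(new_line)
            found = True
        else:
            out.append(line)
    if not found:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Example with a one-line stand-in for a real .vmx file.
example = 'virtualHW.version = "10"\n'
print(set_vmx_option(example, "monitor_control.enable_softResetClearTSC", "TRUE"))
```

The function only transforms a string. Reading the file from the datastore and writing it back is left out on purpose, because that part depends on how you access your hosts.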

After powering on the machine the uptime starts at 0, which already worked before adding the row. Now wait several minutes and do a warm reboot of the system. The uptime counter is reset to 0 again, and everything behaves as it did before the HW upgrade to vmx-10.


 

Related Documents:

VMware KB2082042

Microsoft Operating System time sources and virtualHW 10 (vmx-10)