Recently, in my home lab environment, I noticed that one of my ESXi hosts was frequently getting disconnected from vCenter Server.
Before troubleshooting further, we need to understand that an ESXi host sends heartbeats (UDP port 902) to vCenter Server to tell it that the host is reachable over the management network. The symptom therefore points to heartbeat packets being dropped, blocked, or lost between the ESXi host and vCenter Server.
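One way to see whether UDP datagrams actually arrive on a given port is a small listener. The following is only a minimal sketch, not anything VMware ships: the helper names `open_listener` and `count_datagrams` are made up for illustration, and it assumes you run it on a test machine (or on a spare port), because vCenter Server itself already holds UDP 902 while its service is running.

```python
import socket
import time


def open_listener(host: str, port: int) -> socket.socket:
    """Bind a UDP socket on host:port (hypothetical helper for this sketch)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def count_datagrams(sock: socket.socket, duration: float, source_ip=None) -> int:
    """Count datagrams received within `duration` seconds.

    If source_ip is given, only datagrams from that address are counted.
    """
    deadline = time.monotonic() + duration
    count = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        sock.settimeout(remaining)
        try:
            _, (addr, _src_port) = sock.recvfrom(65535)
        except socket.timeout:
            break  # nothing more arrived before the deadline
        if source_ip is None or addr == source_ip:
            count += 1
    return count


# Example use on a test box (binding to ports below 1024 usually needs
# elevated privileges; the source address is a placeholder):
#   sock = open_listener("0.0.0.0", 902)
#   print(count_datagrams(sock, 60, source_ip="192.0.2.10"))
```

A count of zero over a full minute from the host's management address would suggest the datagrams are being dropped or sent somewhere else, which is the kind of evidence the rest of the troubleshooting looks for.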
As part of the troubleshooting process we can come up with several possible causes, because three different layers are involved: vCenter Server, ESXi, and hardware.
Starting with the first layer, vCenter Server: one possible cause is that Windows Firewall is enabled on the vCenter Server system and is blocking UDP port 902.
At the ESXi layer, we may suspect that the host is not using port 902 for the heartbeats and a different port is configured instead, or that the internal ESXi firewall is blocking that port.
At the bottom layer, hardware, we suspect that the network between my ESXi hosts and vCenter Server is congested.
In my home lab I am using a Windows-based vCenter Server, so I checked its firewall settings first. There were no port rules configured, so to be on the safe side I disabled Windows Firewall.
   
On the ESXi side, a custom firewall rule for the heartbeat goes in /etc/vmware/firewall/heartbeat.xml. Running vi /etc/vmware/firewall/heartbeat.xml opens that file for editing, and creates it on save if it does not already exist.
Next, it was time to check whether the ESXi host was using the default port 902 or some other port; I was not sure whether I had changed it as part of some testing.
*Note: The installation, configuration, and specification methods used here have been tested in my nested home lab environment.
As suspected, it was not using the default port 902. That left me two ways to fix the problem: either add a rule to the ESXi firewall to allow the port in use, or change the port back to 902.
Changing the port back to the default 902 can be done by editing the vpxa.cfg file, which resides in /etc/vmware/vpxa/.
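To check the configured port without opening the file by hand, something like the sketch below could be used on a copy of vpxa.cfg. It assumes the port is stored in a `<serverPort>` element under `<vpxa>`, which is how the file looked in my understanding of ESXi 5.x; verify this against your own file. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULT_HEARTBEAT_PORT = 902


def read_server_port(cfg_text: str):
    """Return the port found in <vpxa><serverPort>, or None if it is absent.

    Assumption: the element path vpxa/serverPort below the root element.
    """
    root = ET.fromstring(cfg_text)
    node = root.find("vpxa/serverPort")
    if node is None or not (node.text or "").strip():
        return None
    return int(node.text.strip())


def uses_default_port(cfg_text: str) -> bool:
    """True when the configured port equals the default 902."""
    return read_server_port(cfg_text) == DEFAULT_HEARTBEAT_PORT


# Illustrative fragment only; a real vpxa.cfg contains many more settings.
sample = """<config>
  <vpxa>
    <serverIp>192.0.2.20</serverIp>
    <serverPort>9021</serverPort>
  </vpxa>
</config>"""

print(read_server_port(sample), uses_default_port(sample))
```

If the value is not 902, setting it back in the file is the second of the two fixes described above.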
To create an ESXi firewall rule that allows a different port, I referred to KB article 2020100, which describes the details to enter when creating the heartbeat.xml file and adding the rules to it.
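For illustration, the content of such a ruleset file can be generated with a short script. This is a sketch under assumptions: the layout follows the general ESXi 5.x custom firewall ruleset format as I know it, the rule is written as outbound UDP to a destination port because the host sends the heartbeat, and the service name `heartbeat` and the function name are my own choices. Check the exact values against the KB article before using it.

```python
RULESET_TEMPLATE = """<ConfigRoot>
  <service>
    <id>{service_id}</id>
    <rule id='0000'>
      <direction>outbound</direction>
      <protocol>udp</protocol>
      <porttype>dst</porttype>
      <port>{port}</port>
    </rule>
    <enabled>true</enabled>
    <required>false</required>
  </service>
</ConfigRoot>
"""


def build_heartbeat_ruleset(port: int, service_id: str = "heartbeat") -> str:
    """Render an assumed custom ruleset allowing outbound UDP to `port`."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return RULESET_TEMPLATE.format(service_id=service_id, port=port)


# Print the XML for a non-default port; paste it into heartbeat.xml.
print(build_heartbeat_ruleset(9021))
```

After saving the file on the host, the firewall has to reload its rulesets; `esxcli network firewall refresh` is the command I know for that.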
With the problem found and fixed (it was an issue on one of my ESXi hosts), what if that had not been the cause? Then the last resort would have been to check the hardware layer, that is, whether network congestion exists between vCenter Server and the ESXi host.
We can either use a third-party tool such as Wireshark to analyze the live traffic, or use the esxtop command-line utility to analyze the traffic, which gives insight into any congestion in our network.


 
Not sure if the change to Port 9021 was intended, but it could have easily been just a quick tap of the 1 key during the wrong part of the install.
Yeah, not sure what exactly happened, but glad I found the KB article to fix it :-)
I once faced a similar issue in my lab where the vpxa port was set to 922 instead of 902. I did not change anything during install, so I am not sure how it got changed. Later I found that it was being pushed from a registry setting on my Windows-based vCenter.
I wrote a blog on that.
https://alexhunt86.wordpress.com/2015/03/22/troubleshooting-esxi-host-disconnection-from-vcenter-issue/
It was an interesting issue that I faced.