The VxRail Manager Server Is Unreachable in vCenter (Solution)

Are you seeing the error “The VxRail Manager server is unreachable” in vCenter? You're not alone. This is a common issue in VxRail deployments, especially when the cluster has been offline for a while.


In this guide, we’ll walk through one of the most common root causes of this error:
👉 The vmware-marvin service fails to start and kubectl stops working because the rke2 certificates have expired.


🔎 Troubleshooting the Issue

Step 1: SSH into VxRail Manager

Connect to the VxRail Manager using SSH as the root user.


Step 2: Check vmware-marvin Service Status

systemctl status vmware-marvin{codeBox}

  • If the status shows inactive (dead) or failed, attempt a restart:
    systemctl restart vmware-marvin{codeBox}
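
If you repeat this check often, the check-and-restart logic can be scripted. The following is only a minimal sketch, not an official Dell procedure: the helper names `needs_restart` and `check_and_restart` are made up for illustration, and the sketch relies on `systemctl is-active`, which prints the unit state (for example `active`, `inactive`, or `failed`).

```shell
# Hypothetical helper: succeed (exit 0) when the given unit state means
# the service is down and a restart is worth trying.
needs_restart() {
  case "$1" in
    inactive|failed) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the state of the unit named in $1 and restart it only when it is down.
check_and_restart() {
  state=$(systemctl is-active "$1" 2>/dev/null)
  echo "$1 is ${state:-unknown}"
  if needs_restart "$state"; then
    systemctl restart "$1"
  fi
}
```

After sourcing the functions, run `check_and_restart vmware-marvin`. A service that is already active is left alone.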

Step 3: Monitor System Logs

If the restart hangs or fails again, follow the system log while the service tries to start:

tail -f /var/log/messages{codeBox}


🔥 Log Example of Expired Certificate

rke2[2935]: CA cert validation failed: Get "https://127.0.0.1:9345/cacerts": 
x509: certificate has expired or is not yet valid: current time 2025-04-10T13:02:11Z 
is after 2024-09-30T14:17:09Z{codeBox}
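
The timestamp after "is after" is the moment the certificate expired, and you will need it later when you set the clock back. As a small sketch (the function name `extract_cert_expiry` is hypothetical, and it assumes the log message has the same wording as the example above), it can be pulled out of the log with `sed`:

```shell
# Read log lines on stdin and print the expiry timestamp from every
# "certificate has expired or is not yet valid" message.
extract_cert_expiry() {
  sed -n 's/.*certificate has expired or is not yet valid: current time [^ ]* is after \([0-9T:Z-]*\).*/\1/p'
}
```

For example: `grep 'certificate has expired' /var/log/messages | extract_cert_expiry | sort -u`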


🧠 Root Cause: Expired rke2 Certificates

When VxRail Manager stays powered off for an extended period, its rke2 certificates can expire. rke2 supports certificate rotation, but the rotation fails if the certificates have already expired before it is triggered.
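
To confirm the diagnosis, you can check the expiry dates of the certificate files directly. The sketch below is an illustration, not part of the original procedure. It assumes the certificates are `.crt` files under `/var/lib/rancher/rke2/server/tls` (the directory used later in this guide) and that `openssl` is available on the appliance. The function name `report_cert_expiry` is hypothetical.

```shell
# Print the notAfter date of every .crt file in a directory and flag expired ones.
# Assumption: rke2 keeps its certificates in /var/lib/rancher/rke2/server/tls.
report_cert_expiry() {
  dir=${1:-/var/lib/rancher/rke2/server/tls}
  for crt in "$dir"/*.crt; do
    [ -e "$crt" ] || continue   # the glob matched nothing
    end=$(openssl x509 -noout -enddate -in "$crt" | cut -d= -f2)
    # -checkend 0 exits non-zero when the certificate has already expired
    if openssl x509 -noout -checkend 0 -in "$crt" >/dev/null; then
      echo "ok       $crt  (notAfter: $end)"
    else
      echo "EXPIRED  $crt  (notAfter: $end)"
    fi
  done
}
```

Run `report_cert_expiry` with no arguments to scan the default directory. Any line starting with EXPIRED points to a certificate that needs to be renewed.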


✅ Full Solution: Renew Expired rke2 Certificates

Take a snapshot of the VxRail Manager and vCenter VMs before making any changes.{alertWarning}

Step 1: Generate New rke2 Certs

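# Back up the current kubeconfig and point kubectl at the rke2 admin config
mv /root/.kube/config /root/.kube/config.bak
cp /etc/rancher/rke2/rke2.yaml /root/.kube/config
kubectl config set-context --current --namespace=helium
# Remove the cached dynamic listener certificate so rke2 regenerates it on restart
rm -f /var/lib/rancher/rke2/server/tls/dynamic-cert.json
systemctl restart rke2-server
# Delete the stored serving-certificate secret, then restart once more
kubectl delete secret -n kube-system rke2-serving
systemctl restart rke2-server{codeBox}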
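
After a restart of rke2-server, the Kubernetes API may need some time before kubectl responds, so a kubectl command issued right away can fail even when nothing is wrong. A small retry helper can cover that gap. This is a generic sketch: the name `wait_for` and the attempt and delay values in the usage example are arbitrary choices, not values from Dell or rke2 documentation.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "gave up after $attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_for 30 10 kubectl get nodes` polls the API every 10 seconds for up to 30 attempts. If it gives up, check /var/log/messages again for certificate errors.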


Step 2: Temporarily Set System Date Before Certificate Expiry

Disable time sync:

sudo systemctl stop chronyd
sudo systemctl disable chronyd
sudo systemctl stop ntp
sudo systemctl disable ntp
sudo systemctl stop systemd-timesyncd
sudo systemctl disable systemd-timesyncd{codeBox}
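
Not every system has all three of these services, so some of the commands above may report that a unit was not found. As an alternative, here is a hedged sketch that touches only the units that actually exist. The function name `time_sync` is hypothetical, the unit names are simply the ones listed in this guide, and the sketch assumes a systemd version that supports `systemctl cat` and the `--now` flag.

```shell
# Unit names taken from this guide; not every system has all three.
TIME_SYNC_UNITS="chronyd ntp systemd-timesyncd"

# Usage: time_sync off   (disable and stop every installed unit)
#        time_sync on    (enable and start every installed unit)
time_sync() {
  case "$1" in
    off) action="disable" ;;
    on)  action="enable" ;;
    *)   echo "usage: time_sync on|off" >&2; return 2 ;;
  esac
  for unit in $TIME_SYNC_UNITS; do
    # 'systemctl cat' fails when no unit file with this name exists
    if systemctl cat "$unit" >/dev/null 2>&1; then
      systemctl "$action" --now "$unit"
    else
      echo "skipping $unit (not installed)"
    fi
  done
}
```

Run `time_sync off` here and `time_sync on` in Step 4. Like the plain commands in this guide, it enables every installed unit on the way back, so note beforehand which service was actually active if that matters in your environment.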


Set the clock manually to a date shortly before the expiration shown in the log (e.g., September 29, 2024, for a certificate that expired on September 30, 2024):

sudo date -s "2024-09-29 12:00:00"{codeBox}
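
Instead of picking the date by hand, you can derive it from the expiry timestamp in the log. The sketch below subtracts 24 hours from an ISO 8601 UTC timestamp. The function name `day_before` and the one-day margin are assumptions made for illustration, and the sketch relies on `date -d`, which is not available in every `date` implementation.

```shell
# Print the time exactly 24 hours before an ISO 8601 UTC timestamp such as
# 2024-09-30T14:17:09Z, in the "YYYY-MM-DD HH:MM:SS" form that 'date -s' accepts.
day_before() {
  # Turn "2024-09-30T14:17:09Z" into "2024-09-30 14:17:09" for date -d
  ts=$(printf '%s' "$1" | tr 'T' ' ' | tr -d 'Z')
  epoch=$(date -u -d "$ts" +%s) || return 1
  date -u -d "@$((epoch - 86400))" '+%Y-%m-%d %H:%M:%S'
}
```

The result is in UTC, so pass `-u` when setting the clock: `sudo date -u -s "$(day_before 2024-09-30T14:17:09Z)"`.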


Step 3: Run cert_util.py Script

Download the certificate regeneration script as described in the article “How to Import vCenter SSL Certificate on VxRail Manager Using cert_util.py Script”, then run it:

python cert_util.py -r{codeBox}

Wait until the certificate is regenerated successfully.


Step 4: Re-enable Time Synchronization

sudo systemctl enable chronyd
sudo systemctl start chronyd
sudo systemctl enable ntp
sudo systemctl start ntp
sudo systemctl enable systemd-timesyncd
sudo systemctl start systemd-timesyncd{codeBox}
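
Before moving on, it is worth confirming that the clock has really returned to the present. One simple test is that the system time must now be later than the old certificate expiry from the log. A minimal sketch, with a hypothetical function name `clock_is_after` and again assuming a `date` that supports `-d`:

```shell
# Succeed when the current system time is later than the given
# "YYYY-MM-DD HH:MM:SS" timestamp, interpreted as UTC.
clock_is_after() {
  ref=$(date -u -d "$1" +%s) || return 2
  now=$(date -u +%s)
  [ "$now" -gt "$ref" ]
}
```

For the example log in this guide, `clock_is_after "2024-09-30 14:17:09" && echo "clock is back to the present"` should print the message once time synchronization has corrected the clock. If it does not and chronyd is your time service, `chronyc makestep` asks chrony to step the clock immediately instead of adjusting it gradually.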


Step 5: Verify in vCenter

Head back to the vCenter UI. The VxRail plugin should now be reachable and functioning normally.


📌 Pro Tips for Admins

Avoid keeping VxRail Manager offline for long durations.


🧩 Conclusion

If your VxRail Manager is unreachable in vCenter, and the issue ties back to expired certificates or failed services like vmware-marvin, this step-by-step guide should help you restore functionality quickly.
