Azure subscription monitoring max retries error
Hi all,
We are in the process of setting up SL1 to monitor a mix of Azure and on-prem VMware devices.
The Azure subscription monitors repeatedly fail with this error:
"Azure: Network Failure on App: ARMVMPerformance, Message: HTTPSConnectionPool(host='management.azure.com', port=443): Max retries exceeded with url:
Failed to establish a new connection: [Errno -2] Name or service not known',)),".
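For anyone reading the error: `[Errno -2] Name or service not known` is the message glibc's `getaddrinfo()` gives for `EAI_NONAME`. Python raises it as `socket.gaierror` before any TCP connection to port 443 is attempted. On Linux, a resolver timeout more often shows up as `EAI_AGAIN` (`[Errno -3] Temporary failure in name resolution`), so the -2 code hints, though it does not prove, that the lookup came back as "not found" rather than timing out. The snippet below is a minimal sketch that performs the same kind of lookup and labels the outcome. It assumes a Linux/glibc collector, and `describe_gai_errno` and `resolve_once` are hypothetical helper names.

```python
import socket
from typing import Optional


def describe_gai_errno(code: int) -> str:
    """Map a getaddrinfo() error code to a short, human-readable label."""
    if code == socket.EAI_NONAME:
        # On glibc this is -2, "Name or service not known" (the code in the SL1 error).
        return "EAI_NONAME: resolver reported the name as unknown"
    if code == socket.EAI_AGAIN:
        # On glibc this is -3, "Temporary failure in name resolution" (often a timeout or server failure).
        return "EAI_AGAIN: temporary resolver failure"
    return f"other getaddrinfo error ({code})"


def resolve_once(host: str, port: int = 443) -> Optional[str]:
    """Resolve host:port via the system resolver; return None on success, else an error label."""
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        return None
    except socket.gaierror as exc:
        return describe_gai_errno(exc.errno)
```

Run on a collector, `resolve_once("management.azure.com")` returns `None` when the lookup succeeds and one of the labels above when it fails.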
We have 14 subscriptions set up. Some never show the error, some show it weekly, and one shows it every 15 minutes, generating thousands of critical alerts a month.
SL1 Support/PS say it is down to a DNS issue: "We have checked with the ENG team, and it looks like a random, transient DNS issue where the collector can't resolve 'management.azure.com'. The error message points to a failure to resolve the hostname when the config DAs are run."
However, we use the same DNS server for all collectors, and some subscriptions never see the error. I can also run nslookup, ping, and nmap against management.azure.com from the collectors successfully.
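A one-off lookup only shows that the name resolved at that moment. Also, `nslookup` queries the DNS servers listed in `/etc/resolv.conf` directly instead of going through `getaddrinfo()` and `/etc/nsswitch.conf`, which is the path the Python-based collector code uses. A repeated probe through that same path may therefore catch intermittent failures that manual checks miss. This is a minimal sketch, assuming Python 3 is available on the collector. `probe` and its parameters are hypothetical names, and the resolver argument is injectable only so the loop can be tested without network access.

```python
import socket
import time
from datetime import datetime, timezone
from typing import Callable, List, Tuple


def probe(host: str, attempts: int, interval_s: float,
          resolver: Callable[[str, int], object] = socket.getaddrinfo) -> List[Tuple[str, str]]:
    """Resolve `host` repeatedly and return (UTC timestamp, error) for every failed lookup."""
    failures: List[Tuple[str, str]] = []
    for i in range(attempts):
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        try:
            # Same system-resolver lookup an HTTPS client makes before connecting to port 443.
            resolver(host, 443)
        except socket.gaierror as exc:
            msg = f"[Errno {exc.errno}] {exc.strerror}"
            print(stamp, host, msg, flush=True)  # report each failure as it happens
            failures.append((stamp, msg))
        if i < attempts - 1:
            time.sleep(interval_s)  # no sleep after the final attempt
    return failures
```

For example, `probe("management.azure.com", attempts=1440, interval_s=60)` samples once a minute for about a day. Lining the failure timestamps up against the alert times for the noisy subscription would show whether that collector's lookups fail at the same moments the alerts fire.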
Has anyone seen something like this before?
Thanks,
Mark