Our application system, running on Windows Server 2012 R2, experienced a gradual increase in system-wide application and database errors. Rebooting the servers eliminated those errors, but only for a few hours.

Findings

  1. System-wide application and database errors
  2. One event stood out: TCP/IP port exhaustion. ‘netstat -ano’ confirmed that many ports stayed in the “CLOSE_WAIT” state.
  3. Microsoft Remote Connectivity Analyzer showed the following Autodiscover steps:

a) Attempting to test potential Autodiscover URL https://mycompany.com:443/Autodiscover/Autodiscover.xml. Testing of the potential Autodiscover URL failed
b) Attempting to test potential Autodiscover URL https://autodiscover.mycompany.com:443/Autodiscover/Autodiscover.xml. Testing of the potential Autodiscover URL was successful
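To make the order of steps a) and b) concrete, here is a minimal C# sketch that derives the two HTTPS candidate URLs from the domain part of an e-mail address, matching what the analyzer reported. It is an illustration only, not the EWS Managed API's own code: the class name AutodiscoverCandidates and the address someone@mycompany.com are hypothetical, and any further fallbacks the real Autodiscover process may try are not modeled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration: builds the two HTTPS Autodiscover candidates
// seen in the analyzer output, in the order they were tried.
public static class AutodiscoverCandidates
{
    public static List<string> ForEmail(string emailAddress)
    {
        int at = emailAddress.LastIndexOf('@');
        if (at < 0 || at == emailAddress.Length - 1)
            throw new ArgumentException("Not an e-mail address: " + emailAddress);

        // Host names are case-insensitive; normalize for readability.
        string domain = emailAddress.Substring(at + 1).ToLowerInvariant();
        return new List<string>
        {
            // Step a): the bare domain (the candidate that failed for us).
            $"https://{domain}:443/Autodiscover/Autodiscover.xml",
            // Step b): the autodiscover.<domain> host (the one that succeeded).
            $"https://autodiscover.{domain}:443/Autodiscover/Autodiscover.xml",
        };
    }

    public static void Main()
    {
        foreach (string url in ForEmail("someone@mycompany.com"))
            Console.WriteLine(url);
    }
}
```

Every address in the same domain produces the same step a) candidate, so when that host misbehaves, every lookup pays for the failed attempt before reaching step b).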

Cause

After step a) in Finding 3 failed, the EWS Managed API 2.0 left the ports it had used in the “CLOSE_WAIT” state. On a heavily loaded system, this quickly led to port exhaustion, after which the system began to throw system-wide application and database errors.
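A socket in CLOSE_WAIT is one whose remote peer has already closed the connection while the local application has not yet closed its own end; until it does, the socket keeps holding a local port. The self-contained C# sketch below (hypothetical class CloseWaitDemo, loopback only, no Exchange involved) reproduces that state with a TcpListener and a TcpClient and checks it through IPGlobalProperties.GetActiveTcpConnections().

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;
using System.Net.Sockets;
using System.Threading;

public static class CloseWaitDemo
{
    // True if the OS reports a TCP connection on this local port in CLOSE_WAIT.
    static bool InCloseWait(int localPort) =>
        IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .Any(c => c.LocalEndPoint.Port == localPort && c.State == TcpState.CloseWait);

    public static bool Run()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // any free port
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        var client = new TcpClient(AddressFamily.InterNetwork);
        client.Connect(IPAddress.Loopback, port);
        TcpClient server = listener.AcceptTcpClient();
        int clientPort = ((IPEndPoint)client.Client.LocalEndPoint).Port;

        // The peer closes first and sends a FIN. Until the client calls
        // Close(), its socket sits in CLOSE_WAIT and keeps its local port.
        server.Close();

        bool seen = false;
        for (int i = 0; i < 50 && !seen; i++)
        {
            Thread.Sleep(20); // give the FIN a moment to arrive
            seen = InCloseWait(clientPort);
        }

        // Closing our own end is what moves the socket out of CLOSE_WAIT.
        client.Close();
        listener.Stop();
        return seen;
    }

    public static void Main() =>
        Console.WriteLine($"Client socket seen in CLOSE_WAIT: {Run()}");
}
```

If an application keeps opening connections like this and never closes them, the stuck sockets accumulate until the ephemeral port range is exhausted.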

Resolution

We stopped using Autodiscover and used the mail server URL directly instead.

  1. The unexpected application and database errors disappeared immediately.
  2. ‘netstat -ano’ confirmed that no more ports were abnormally stuck in “CLOSE_WAIT”.

How to Repeat

  1. Environment: Windows Server 2012 R2, Exchange 2010, and EWS Managed API 2.0.
  2. Set up the environment so that testing the potential Autodiscover URL https://mycompany.com:443/Autodiscover/Autodiscover.xml fails.
  3. Use a ‘for’ loop to run the following code for a few hundred different email addresses. It doesn’t matter whether there is a try/catch block or not.

ExchangeService service = new ExchangeService(ExchangeVersion.Exchange2010);
service.AutodiscoverUrl(differentEmailAddress);

  4. Verify that many ports stay in the “CLOSE_WAIT” state for a very long time.
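Step 4 can also be checked from .NET code. This sketch (hypothetical class CloseWaitReport) groups the current CLOSE_WAIT connections by remote endpoint, which should make it easy to see whether they point at the host behind the failing Autodiscover URL. Note that GetActiveTcpConnections() does not report the owning process ID; netstat -ano does.

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

public static class CloseWaitReport
{
    // Groups current CLOSE_WAIT connections by remote endpoint, busiest first.
    public static (string Remote, int Count)[] ByRemoteEndPoint() =>
        IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .Where(c => c.State == TcpState.CloseWait)
            .GroupBy(c => c.RemoteEndPoint.ToString())
            .Select(g => (Remote: g.Key, Count: g.Count()))
            .OrderByDescending(t => t.Count)
            .ToArray();

    public static void Main()
    {
        // Run this before and after the repro loop and compare the counts.
        foreach (var (remote, count) in ByRemoteEndPoint())
            Console.WriteLine($"{count,6}  {remote}");
    }
}
```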

Discussion

After system-wide application and database errors gradually increased, we began to troubleshoot them. We eventually isolated the issue to the EWS Autodiscover feature [1] used in our applications, and we believe that a bug in this feature leaves ports in the “CLOSE_WAIT” state when the Autodiscover step fails to reach https://mycompany.com:443/Autodiscover/Autodiscover.xml, as shown in step a) of Finding 3. To make things worse, our system was heavily loaded, which accelerated the problem. When we stopped using Autodiscover, the application and database errors went away, which confirmed for us that the bug lies in the EWS Autodiscover feature.

I downloaded the source code from GitHub and tried to find where the bug is. However, I don’t have enough knowledge of Win32 APIs to debug this issue, so I am reporting it as a bug on GitHub.

I don’t have the time or the tools to repeat the tests on EWS Managed API 2.1 and 2.2. Judging only from the release notes of versions 2.1 and 2.2, the bug might still be there.

Recommendation

As of today (May 2016), I strongly recommend that you not use EWS Autodiscover frequently, and that you use it only for debugging purposes until this bug is fixed.
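One way to follow this advice in code, assuming every mailbox in a domain is served by the same EWS endpoint, is to resolve the URL at most once per domain and reuse it. The sketch below (hypothetical class EwsUrlCache) takes the lookup as a delegate, so it could wrap a single Autodiscover call or simply return a configured URL; mail.example.com is a placeholder. It reduces how often Autodiscover runs; it does not fix the underlying port leak.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: performs the expensive EWS URL lookup at most once
// per e-mail domain and hands out the cached result afterwards.
// Assumption: all mailboxes in one domain share the same EWS endpoint.
public sealed class EwsUrlCache
{
    private readonly Func<string, Uri> _resolve;
    private readonly ConcurrentDictionary<string, Lazy<Uri>> _byDomain =
        new ConcurrentDictionary<string, Lazy<Uri>>(StringComparer.OrdinalIgnoreCase);

    // resolve receives the first e-mail address seen for a domain.
    public EwsUrlCache(Func<string, Uri> resolve) => _resolve = resolve;

    public Uri GetUrl(string emailAddress)
    {
        string domain = emailAddress.Substring(emailAddress.LastIndexOf('@') + 1);
        // Lazy<T> is thread-safe by default, so concurrent callers for the
        // same domain trigger only one lookup. It also caches exceptions.
        return _byDomain
            .GetOrAdd(domain, _ => new Lazy<Uri>(() => _resolve(emailAddress)))
            .Value;
    }

    public static void Main()
    {
        int calls = 0;
        var cache = new EwsUrlCache(address =>
        {
            calls++;
            return new Uri("https://mail.example.com/EWS/Exchange.asmx"); // placeholder
        });

        for (int i = 0; i < 300; i++)
            cache.GetUrl($"user{i}@mycompany.com");

        Console.WriteLine($"Lookups performed: {calls}"); // prints "Lookups performed: 1"
    }
}
```

Because Lazy<T> caches exceptions, a failed lookup would also stay cached for the life of the process; a production version would need a retry or expiry policy.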

What happened in our environment might not happen in yours. However, given our incident, if you use EWS Autodiscover I strongly recommend taking the time to examine your environment with these two tools to find its weak spots:

  1. Microsoft Remote Connectivity Analyzer
  2. netstat -ano
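To see which process is holding the CLOSE_WAIT sockets, the output of netstat -ano can be summarized per PID. This is a minimal sketch (hypothetical class NetstatCloseWait) that assumes English-language netstat output with the usual five TCP columns (Proto, Local Address, Foreign Address, State, PID); its Main runs netstat, so it only works on Windows.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class NetstatCloseWait
{
    // Counts TCP rows in the CLOSE_WAIT state per PID in `netstat -ano` output.
    public static Dictionary<int, int> CountByPid(string netstatOutput)
    {
        var counts = new Dictionary<int, int>();
        foreach (string line in netstatOutput.Split('\n'))
        {
            // Split on any whitespace (also strips the trailing '\r').
            string[] cols = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (cols.Length != 5 || cols[0] != "TCP" || cols[3] != "CLOSE_WAIT")
                continue; // header, UDP rows, and other states are skipped
            if (int.TryParse(cols[4], out int pid))
                counts[pid] = counts.TryGetValue(pid, out int n) ? n + 1 : 1;
        }
        return counts;
    }

    public static void Main()
    {
        var psi = new ProcessStartInfo("netstat", "-ano")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        using (Process p = Process.Start(psi))
        {
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            foreach (var kv in CountByPid(output))
                Console.WriteLine($"PID {kv.Key}: {kv.Value} sockets in CLOSE_WAIT");
        }
    }
}
```

A PID with a large and growing count can then be matched to a process name with `tasklist /FI "PID eq <pid>"`.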

There are many articles discussing “CLOSE_WAIT”, port exhaustion, and system errors, so I skip the explanation here and leave you to read those articles.

References:

  1. Autodiscover for Exchange, https://msdn.microsoft.com/en-us/library/office/jj900169(v=exchg.150).aspx