Tuesday, June 24, 2014

Office 365 Outages in a Row!!!

Yesterday and today have been bad days for Microsoft and its Office 365 customers. The Microsoft team recently changed the way it delivers updates to Office 365 customers with the new Office 365 for Business Public Roadmap, a great innovation that everyone in the community welcomed. But while great things are happening on one side, and Office 365 and the Microsoft cloud are at the top everywhere we look, it is unfortunate that we had two major, unexpected outages in the last two days. Both hit major customers during production hours and caused severe business impact at a large scale: yesterday it was Lync Online, and today it is Exchange Online. As an Office 365 administrator for a global client, managing one of the major tenants affected by this issue, I felt the impact on my environment and on my organization's business firsthand.


Outages are inevitable with any service. We take this into consideration and usually call them "service interruptions", but it is unacceptable when they occur during business hours on consecutive days and cause drastic impact to customers. It leaves customers thinking that, although the Microsoft team is making great enhancements on one side, it also needs to improve its service and support on the other. New enhancements are always secondary for any customer when the primary goal, a highly available service for critical workloads like e-mail and instant messaging, is at stake and needs urgent attention. All this shows that Microsoft still needs improvement in this key area.

There are many more customers in the market who are planning to adopt Office 365 (my recent post "Office 365 Evaluation with Gartner" is one key example), and because of these major outages they are now reconsidering whether to move at all. Nothing is hidden in today's IT world, and everyone knows that during critical issues like these all we can do is wait: wait for the dashboard to refresh with the latest update from Microsoft, raise support cases to report our issue and receive updates, and keep our internal users and management informed of the status, with no real action possible from our end until the issue is fixed. By then our users are thoroughly frustrated with the downtime and end up blaming us for moving to the cloud. Only we know that we were handcuffed at this stage, unable to do anything from our end to restore the service. On premises, which our users relied on for many years before we adopted Office 365, we could have acted ourselves, and we had promised them the same highly available service as before, along with many add-on features.

Below is the ZDNet post that outlines these issues with the current status; it is updated whenever there is a new update from Microsoft. If you are one of the affected customers like me and you run a hybrid environment, keep track of your mail delivery queue from on-premises to Exchange Online, and check your users' web mail and Outlook performance periodically. Refresh your dashboard often to review the current information from the Microsoft team and act accordingly. If you still face issues after the portal marks the incident as resolved, report them to the Microsoft team for further action.

Review here: Some Exchange Online users reporting email problems
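To make the queue-tracking advice above a little more concrete, here is a minimal Python sketch of an alert rule. It assumes you already collect the length of the hybrid outbound queue to Exchange Online at regular intervals (for example with the Exchange Get-Queue cmdlet in a separate scheduled job). The class name QueueMonitor, the 500-message threshold, and the three-sample window are hypothetical values for illustration only, not Microsoft guidance.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueueMonitor:
    """Hypothetical alert rule: flag the hybrid queue to Exchange Online when it
    stays above `threshold` messages for `window` consecutive samples."""
    threshold: int = 500  # assumed value; tune it to your normal mail volume
    window: int = 3       # consecutive high samples required before alerting
    _recent: deque = field(default_factory=deque, init=False, repr=False)

    def add_sample(self, queued_messages: int) -> bool:
        """Record one queue-length sample and return True if an alert should fire."""
        self._recent.append(queued_messages)
        # Keep only the most recent `window` samples.
        if len(self._recent) > self.window:
            self._recent.popleft()
        return len(self._recent) == self.window and all(
            count > self.threshold for count in self._recent
        )


if __name__ == "__main__":
    monitor = QueueMonitor(threshold=500, window=3)
    for sample in [120, 640, 710, 905]:  # made-up samples for illustration
        if monitor.add_sample(sample):
            print(f"ALERT: queue to Exchange Online at {sample} messages")
```

Requiring several consecutive high samples, rather than reacting to a single spike, keeps a brief burst of outbound mail from raising a false alarm, while a queue that keeps growing during an outage still triggers an alert within a few polling intervals.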
 
If you search online, you will find that technical experts, community members, and customers affected by these issues have already started to share their views in the global community. A few key ones are listed below for your reference.


Microsoft's Exchange Online becomes Exchange Offline as service goes dark - Computer World Magazine

Microsoft Restores Exchange Online Service after Tuesday Outage - Redmond Magazine

Hello, Microsoft. Welcome to IT - WindowsITPro


I wrote this post today to share how the world reacted to these unexpected back-to-back Office 365 events, which I never saw coming. I always recommend Office 365 to my clients and to people I know, I keep learning more about it, I am amazed by its new enhancements, and I share my good experience with the product line through my blog and other means, so these are bad events that I wanted to keep track of and share my thoughts on. I hope the Microsoft team acknowledges these customer pain points, improves its service, and ensures that outages like these do not happen again; and if they do occur unexpectedly, that it keeps customers and partners efficiently updated so they can handle the situation in a controlled manner in the near future.

*Update:

Rajesh Jha, Corporate Vice President of Microsoft's Office 365 Engineering team, apologized to the affected customers in the North America region for the Lync Online and Exchange Online downtime. In a post today he expressed his apology and explained the root cause of the issues and the actions taken to restore service. A post-incident review (PIR) will be released soon, providing the complete root cause analysis (RCA) and the next steps for service improvement.

Visit the Office 365 Community page below to learn more.

Recent Office 365 service issues 

Additionally, review the ZDNet post from Mary Jo Foley and the Redmond Magazine post from Kurt Mackie, which cover some key aspects of this announcement.

Microsoft explains roots of this week's Office 365 downtime

Microsoft Offers Explanations for Lync and Exchange Service Outages 

Also review the WindowsITPro blog post below from MVP Tony Redmond, who shares his views on the issue along with his recommendations.

Directory flaw led to Exchange Online outage

Final thoughts...

Though these issues were on Microsoft's side, they remind us that no service is stable all the time. As a precaution, monitor your environment proactively and keep an eye on components like DirSync and ADFS (if you use SSO), as well as network and firewall connectivity changes, with a reporting mechanism in place. This helps you avoid issues originating on your own end, which would have a similar impact. The key difference is that we do have direct control over these areas when something goes wrong, but effective monitoring is what lets us act in time. If you don't have monitoring in place already, start planning for it in your roadmap today.
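As one possible starting point for the reporting mechanism described above, the Python sketch below runs a simple TCP reachability check against a list of endpoints and returns the ones that failed. It assumes that a plain TCP connect (for example to port 443 on your ADFS proxy) is a useful first signal; it does not validate ADFS token issuance or DirSync synchronization health, and the function names check_endpoints and failing_endpoints are hypothetical.

```python
import socket
from typing import Dict, List, Tuple


def check_endpoints(endpoints: List[Tuple[str, int]], timeout: float = 3.0) -> Dict[str, bool]:
    """Attempt a TCP connection to each (host, port) and record whether it succeeded."""
    results: Dict[str, bool] = {}
    for host, port in endpoints:
        name = f"{host}:{port}"
        try:
            # create_connection resolves the name and connects; the socket closes on exit.
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            # DNS failures, refused connections, and timeouts all derive from OSError.
            results[name] = False
    return results


def failing_endpoints(results: Dict[str, bool]) -> List[str]:
    """Return the endpoints that could not be reached, for an alert or a daily report."""
    return [name for name, ok in results.items() if not ok]
```

In practice you would call check_endpoints from a scheduled task with your own host list, for example a placeholder such as ("sts.contoso.com", 443) for your ADFS endpoint, and feed the failing_endpoints result into whatever alerting or reporting channel you already use.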

* Check out my new blog post "Office 365 Availability and Performance Monitoring" for additional information on monitoring Office 365 for high availability and better performance.
