Last Friday’s outage was a firsthand reminder of how dependent businesses are on technology. The impact spread across multiple industries and around the globe. Businesses running Windows machines with CrowdStrike’s Falcon sensor were met with the “blue screen of death,” severely disrupting their operations.
Although CrowdStrike issued a fix and a statement quickly, many businesses still faced significant interruptions with cascading effects, including thousands of flight delays and cancellations (still ongoing as of Tuesday, July 23rd).
The day after the outage, as flight cancellations and the number of affected businesses continued to climb, Microsoft cybersecurity executive David Weston shared, “We currently estimate that CrowdStrike’s update affected 8.5 million Windows devices or less than one percent of all Windows machines.”
As the far-reaching impact of this outage continues to be explored, it’s important to understand what took place, what it means for businesses, what was done to resolve the situation, and why it’s essential to have a dedicated technology team on standby.
For anyone managing a business, it’s essential to understand that outages like this will happen and to know what you can do to prepare for future challenges like these.
Understanding the Impact of the Outage
A Widespread Disruption
According to USA Today, “CrowdStrike blamed a botched update to its ‘Falcon Sensor software’ in the 1:30am ET alert, saying it was causing Microsoft Windows to crash.”
By approximately 3am ET, CrowdStrike had issued a fix, and by 5:45am ET, CrowdStrike CEO George Kurtz had posted a statement on X saying that the incident was not a cyberattack and that the issue had been “identified, isolated and a fix has been deployed.”
Although the time from error to fix was relatively short, the impact lasted much longer.
Not every business discovered the issue and received or deployed the fix right away, and machines that had already crashed often could not pick up the fix on their own; many had to be recovered manually, one device at a time. This led to far more significant interruptions for some businesses than for others. As businesses opened on Friday morning, more and more of them were greeted by “the blue screen of death.”
I happened to be at the Tampa airport Friday morning. News reporters were interviewing affected passengers, travelers were making disappointed and worried phone calls as they tried to rebook on unaffected airlines, and Starbucks employees were shouting to a snaking line of caffeine-deprived people, “Our systems are down, we can’t take payments.”
It wasn’t just airlines that were affected. Many healthcare organizations, some 911 call centers, and other emergency services, including the US Coast Guard, were also severely disrupted.
This incident not only raised concerns among business owners about the reliability of their technology but also highlighted the need for expert technology support to minimize the impact of such an outage.
The ripple effect of such outages can be devastating, as businesses lose revenue and suffer damage to their reputations.
Financial Ramifications
The financial implications of outages can be staggering. Every minute of downtime can cost a business revenue.
Additionally, the loss of customer trust can be hard to quantify but is just as critical. Customers expect seamless service, and any disruption can turn a happy customer into a dissatisfied one. This loss can have long-term consequences for businesses, underscoring the importance of minimizing the impact of an outage as quickly as possible.
What Was Done to Resolve the Outage?
Immediate Response and Communication
In response to the outage, both Microsoft and CrowdStrike quickly mobilized their teams to diagnose the root cause. Both companies issued public statements identifying the cause, a faulty update to CrowdStrike’s Falcon sensor, and provided updates on their progress in resolving the situation.
Technical Fixes and Solutions
Once CrowdStrike had identified the problem and made a fix available, it was up to internal IT teams, MSPs (managed service providers), and other technical teams to implement the solution so service could be restored as quickly as possible.
Cheers to the IT experts, serving their internal teams and clients, who sprang into action to rapidly deploy the fix and work with end users to confirm that all systems were back in operational order!
Are Outages Normal? Should We Worry or Expect Them?
Understanding Typical Outage Frequency
Outages are, unfortunately, a part of our world. While they can be alarming, it is important to be prepared and remain calm. Hardware failures, software bugs, or even unexpected spikes in user activity can all lead to an outage.
Balancing Concern and Preparedness
While it is natural to be concerned about potential outages, it’s essential to focus on preparedness rather than panic.
To prepare, businesses need to evaluate their existing technology strategies and identify areas for improvement. Developing contingency plans and partnering with outsourced or co-managed IT can provide peace of mind and minimize disruptions.
A team that proactively monitors for technical issues and outage alerts like this one can make the difference between being down for only a matter of hours and being down for days.
Knowing When to Worry
While occasional outages are expected, certain situations should raise red flags. For example, if a service provider experiences repeated outages over a short period, it may indicate deeper issues within their infrastructure. In those cases, it can be best to seek alternatives.
Also, if an outage lasts an extended time without a clear resolution, it may signal a lack of accountability or responsiveness from the provider. In these instances, evaluating other options is best for your long-term success.
Why Is It Important to Have a 24/7 Technology Team?
Continuous Monitoring and Support
Having a dedicated technology team available around the clock can significantly enhance your ability to respond to outages and other technical issues. Continuous monitoring ensures potential problems can be identified and addressed with speed and expertise, minimizing the impact.
For instance, a 24/7 team can track system performance metrics, detect anomalies, and implement fixes quickly. This can prevent or shorten downtime and keep operations running smoothly, enabling your business to maintain productivity even during challenging times.
Quick Response Times
In the event of an outage, every second counts. A dedicated technology team can provide rapid response times that minimize the impact of disruptions. Their expertise allows them to troubleshoot and resolve issues quickly, getting you back online faster, which is vital for maintaining customer trust and satisfaction.
Also, having a team that understands your specific technology setup can lead to more effective problem-solving. With knowledge of your systems, they can implement fixes tailored to the unique needs of your business.
Strategic Planning for Future Growth
A 24/7 technology team also plays an instrumental role in strategic planning. They can work with leadership to align technology initiatives with business goals, ensuring that IT investments support overall growth. By leveraging their insights, you can stay ahead of trends and innovations that may impact your operations.
Additionally, having a reliable technology team fosters a culture of innovation. With the right support, you can rely on your IT service provider to explore new technologies and approaches that fit your business today and where you want it to grow.
What Have We Learned from the Windows and CrowdStrike Outage?
The recent Windows OS and CrowdStrike outage serves as a crucial reminder of the vulnerabilities present in our high-tech world.
It’s important to remember outages like this can happen to anyone.
Understanding the impact of these types of outages, the steps taken to resolve them, and how to prepare for future incidents is vital for business owners and IT professionals.
Investing in a dedicated technology team won’t give you a silver bullet against every outage, but it will strengthen your organization’s resilience, reduce downtime, and keep you up to date with the latest advancements and proactive measures.
By being proactive and prepared, you can greatly reduce your business risk, ensuring that you not only survive but thrive.
If your organization has experienced difficulties recovering from outages like this, we’d love to talk to you about your current technology needs and help you explore a path forward to reduce risk and increase resilience.
Nathan Caldwell
Marketing expert, thought leader, speaker, and security awareness solution creator.