Today was a truly terrible day for people around the world. From passengers in airports to patients in hospitals and drivers at the DMV, huge numbers of individuals were affected. However, my colleagues in IT and Security bore the brunt of the storm, grappling with servers and endpoints across the globe.

But I'm writing about something else. This outage must serve as a lesson; otherwise, it's pointless. The lesson? We must rigorously test, review, assess, and audit our systems. We need to verify everything firsthand.

Every company should prioritize a thorough technical assessment of its infrastructure. It doesn't need to be extensive, but it must evaluate several critical areas:

How mature is your third-party management? Do you know which external components you have in your environment and how they can impact you? Every company relies on many such systems, and each must be factored into risk calculations and business continuity planning.

How about a dependency map? Can you identify the main relationships of any particular system in your environment and how these dependencies can impact you? Network devices, servers, security software, network configuration, authentication, encryption, and more should be included.
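To make the dependency question concrete, here is a minimal sketch of how a dependency map can answer "what breaks if this component fails?". The system names and the DEPENDS_ON structure are hypothetical placeholders of mine, not a real inventory; in practice the data would come from your own CMDB or asset records.

```python
from collections import deque

# Hypothetical dependency map: each system lists what it directly depends on.
DEPENDS_ON = {
    "payroll-app": ["app-server", "sso"],
    "app-server": ["endpoint-agent", "core-switch"],
    "sso": ["directory-service"],
    "directory-service": ["core-switch"],
    "endpoint-agent": [],
    "core-switch": [],
}


def impacted_by(failed, depends_on):
    """Return every system that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from a component to its dependents.
    dependents = {}
    for system, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(system)

    # Breadth-first walk over the inverted map.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


if __name__ == "__main__":
    print(sorted(impacted_by("endpoint-agent", DEPENDS_ON)))
```

Even a small map like this shows how a single piece of security software can sit underneath business-critical applications.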

How are automatic updates configured and implemented? Protection is essential, but sweeping changes across your entire fleet at once are risky. Basic change management principles apply.
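One common way to apply change management to updates is a staged (ring-based) rollout. The sketch below is only an illustration under my own assumptions: the ring names, the percentages, and the functions ring_for and may_update are hypothetical and not taken from any vendor's product. Most endpoint and patch management tools offer their own mechanism for this.

```python
import hashlib

# Hypothetical rings: (name, cumulative share of the fleet).
RINGS = [("canary", 0.01), ("early", 0.10), ("broad", 1.00)]


def ring_for(device_id):
    """Assign a device to a ring deterministically from a hash of its ID."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    for name, upper_bound in RINGS:
        if bucket < upper_bound:
            return name
    return RINGS[-1][0]


def may_update(device_id, released_rings):
    """A device may take the update only once its ring has been released."""
    return ring_for(device_id) in released_rings


if __name__ == "__main__":
    # Release to the canary ring first; widen only after it proves healthy.
    released = {"canary"}
    fleet = [f"device-{n}" for n in range(1000)]
    print(sum(may_update(d, released) for d in fleet), "devices update now")
```

The point of the design is that a faulty update reaches only a small slice of the fleet before anyone decides to widen the release.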

How are recovery keys and credentials managed and stored? Ensure they are stored securely yet remain accessible when a device is faulty. And what if the recovery key store itself fails? Do you have a tested backup plan?
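A simple check along these lines is to compare your device inventory against both the primary key store and its backup. The sketch below assumes you can export the device IDs held in each store; the function key_coverage and the sample data are hypothetical.

```python
def key_coverage(devices, primary_store, backup_store):
    """Group devices by how many independent copies of their recovery key exist."""
    report = {"missing": [], "single_copy": [], "ok": []}
    for device in sorted(devices):
        copies = (device in primary_store) + (device in backup_store)
        if copies == 0:
            report["missing"].append(device)
        elif copies == 1:
            report["single_copy"].append(device)
        else:
            report["ok"].append(device)
    return report


if __name__ == "__main__":
    # Hypothetical exports: device IDs known to inventory and to each key store.
    inventory = {"laptop-01", "laptop-02", "kiosk-07"}
    primary = {"laptop-01", "laptop-02"}
    backup = {"laptop-01"}
    print(key_coverage(inventory, primary, backup))
```

Devices listed under "missing" or "single_copy" are the ones that would be stuck if the primary store were unavailable during an incident.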

Do you have a procedure for emergency device access? This includes remote access for unreachable devices, especially those in hard-to-reach locations. Test these procedures thoroughly. Consider potential single points of failure.

For remote work-from-home endpoints:

Provide basic user training on device recovery. While specific steps vary, common principles like obtaining recovery keys from IT and accessing Windows safe mode or Apple recovery are essential. Don't leave this training until a crisis is already under way.

Consider a local administrator account for authentication failures. While risky, it can be less risky than not having one in a critical moment. Additional controls can reduce this risk while still giving you emergency access.

Implement endpoint backups to protect against losing data on an encrypted drive that can no longer be unlocked. Cloud storage is recommended, but consider additional backups for specific users.

One more thing: policies and actions are mostly handled by different teams, so they must be aligned and validated.

Remember, it is never too early to check and validate your environment, but sometimes it is too late. It is always better to be proactive than reactive. Learn from today's incident and strengthen your security posture.

For optimal results, combine internal expertise with external specialists. A fresh perspective can complement your team's knowledge and overcome organizational blind spots. I'm available to assist with such a project.