Security Operations
This is the ninth article in my series, Five Principles for Success in Azure.
Security for your IT systems and resources must ensure confidentiality, integrity, and availability for authorized users. Security operations are designed to detect security breaches, respond to them, and recover from them, so that the objectives of confidentiality, integrity, and availability are consistently and continually achieved.
- Detect - Whatever the security incident or breach, security operations must detect it as rapidly as possible. Preferably, security operations will even find such problems before they occur, by proactively looking for anomalous events in your enterprise activity logs. Security operations must also unearth attackers who try to hide in enterprise systems to reach their targets unhindered.
- Respond - After discovering an anomaly, security operations must quickly determine whether a real attacker is at work or a significant security breach has occurred (true positive), or whether the anomaly does not represent a threat (false positive). In the case of a real attack or breach, the extent and the target of the attack must be identified.
- Recover - Security operations must bring any affected IT systems and resources back to the levels of confidentiality, integrity, and availability required by your organization, during and after any incident or breach.
The security operations team or security operations center (SOC) must minimize attacker time and access to your enterprise systems and data. Automated attacks can move fast, and sometimes (for example, WannaCrypt and NotPetya) faster than signature-based and machine-learning-based defenses can respond. In other cases, these defense mechanisms significantly mitigate the risk of such automated and repeated attacks.
The greater security risk usually comes from human attackers. Whatever their individual skill levels, human attackers adapt in ways that automated attacks cannot, which poses an additional challenge. However, they typically operate at the same speed as human defenders, which helps swing the balance back in favor of your SOC.
Goals and metrics
To understand how effective your security operations are in restoring systems to normal security status, you need suitable metrics. In turn, by focusing on the right measurements, you can drive the right improvements and reduce the most important risks. Examples of such metrics are:
- Time to acknowledge an alert: reducing this time for all alerts helps make sure that real attacks are not left unattended while analysts are busy with false alarms.
- Time to remediate: shorter times here mean faster action to disable attacks and faster closure of the window of opportunity for attackers to penetrate critical systems.
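As a minimal sketch of how these two metrics might be computed, assume each alert is exported as a record with three timestamps. The field names (created, acknowledged, remediated) and the sample values are hypothetical and are not taken from any particular SIEM schema.

```python
from datetime import datetime
from statistics import mean

# Hypothetical alert records; field names and values are illustrative only.
alerts = [
    {"id": "A1", "created": "2024-01-01T10:00:00",
     "acknowledged": "2024-01-01T10:05:00", "remediated": "2024-01-01T11:00:00"},
    {"id": "A2", "created": "2024-01-01T12:00:00",
     "acknowledged": "2024-01-01T12:15:00", "remediated": "2024-01-01T14:00:00"},
]


def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def mean_time_to_acknowledge(records) -> float:
    """Average minutes from alert creation to first acknowledgement."""
    return mean(minutes_between(r["created"], r["acknowledged"]) for r in records)


def mean_time_to_remediate(records) -> float:
    """Average minutes from alert creation to remediation."""
    return mean(minutes_between(r["created"], r["remediated"]) for r in records)


print(mean_time_to_acknowledge(alerts))  # 10.0
print(mean_time_to_remediate(alerts))    # 90.0
```

Tracking these averages over time, rather than as one-off figures, is what shows whether your SOC is improving.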
You should also develop a focus on:
- High priority systems: these are systems that are likely targets and/or are critical to your business and/or facilitate access to other such systems, and for which you should prioritize your security investments.
- Proactive discovery: by “shifting left”, so that your security investigations begin earlier in attack lifecycles, you can detect attacks before they become reactive events. This may also help you detect and remediate more highly skilled attackers who know how to avoid triggering reactive alerts.
Enterprise-wide on-premises and cloud security
Attackers may try to access any or all of your IT assets, no matter where they are located. If you use cloud services such as Azure and AWS in conjunction with on-premises installations, your security must extend across this entire hybrid environment. This includes security tools, processes, and skillsets. Not only must you maintain security and eliminate or mitigate risks for cloud and on-premises attacks, but you must also do this for attacks that pivot between these two domains. This requires enterprise-wide security visibility and management.
Use built-in detections and controls
Cloud platforms like Azure offer high-quality native detection tools and controls that facilitate security operations by keeping false positive rates low. Their providers also evolve these security tools to keep pace with new cloud service features. In the first instance, it therefore makes sense for your organization to use security tools that have been built into the cloud platform.
If you use several cloud platforms, a centralized security information and event management (SIEM) system lets you federate input from the native detection tools and controls of these platforms. Examples of such SIEM tools for broad visibility of hybrid environments include Azure Sentinel, Splunk, and QRadar. On the other hand, using a generalized log analysis tool instead of native detections and controls is not recommended. Achieving high-quality alerts with such a tool requires an investment of time and know-how that may be better applied to proactive hunting and other security activities.
In conjunction with a centralized SIEM, use native detection and controls like:
- Azure Security Center for generating alerts on the Azure platform
- Native logging capabilities like Azure Monitor and AWS CloudTrail for ingestion into the SIEM for a unified view
- Network Security Group (NSG) capabilities to see network activities on the Azure platform
- Native tools like an Endpoint Detection and Response (EDR) solution, identity tools, and Azure Sentinel for investigation with deep knowledge of given asset types.
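To illustrate what federating native alerts into a unified view can involve, here is a rough sketch that maps alerts from two sources onto one common record. The source names (platform_a, platform_b), the input field names, and the severity thresholds are all invented for illustration. They do not reflect the real schemas of Azure Security Center, AWS CloudTrail, or any SIEM.

```python
from dataclasses import dataclass


@dataclass
class UnifiedAlert:
    """Common shape for alerts from any source (hypothetical schema)."""
    source: str
    severity: str  # normalized to "low", "medium", or "high"
    resource: str
    title: str


def from_platform_a(raw: dict) -> UnifiedAlert:
    # Assumption: platform A reports severity as a word such as "High".
    return UnifiedAlert(
        source="platform_a",
        severity=raw["Severity"].lower(),
        resource=raw["ResourceId"],
        title=raw["AlertName"],
    )


def from_platform_b(raw: dict) -> UnifiedAlert:
    # Assumption: platform B reports severity as a 0-10 score.
    score = raw["score"]
    severity = "high" if score >= 7 else "medium" if score >= 4 else "low"
    return UnifiedAlert(
        source="platform_b",
        severity=severity,
        resource=raw["target"],
        title=raw["summary"],
    )


unified = [
    from_platform_a({"Severity": "High", "ResourceId": "vm-01",
                     "AlertName": "Suspicious login"}),
    from_platform_b({"score": 5, "target": "bucket-logs",
                     "summary": "Unusual API call"}),
]
for alert in unified:
    print(alert.source, alert.severity, alert.resource, alert.title)
```

In practice, most SIEM products ship connectors that perform this kind of mapping for you. The sketch only shows why a common schema makes cross-platform investigation easier.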
Prioritize high value information for your SIEM
Prioritize data collection for your SIEM around the data that supports:
- Alerts, either as detections from existing tools or as inputs for triggering custom alerts if appropriate
- Incident investigation
- Proactive detection of attacker activity and potential breaches.
Low value data can cause your SIEM costs, noise, and false positives to rise, while lowering performance. Make the integration of critical security alerts and logs into your SIEM your priority and avoid large-scale collection of low-value data. Remember that collection is not the same as detection, even if higher volumes of data may allow you to give additional context to alerts for faster response and remediation.
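One way to picture this prioritization is as a filter applied before events reach the SIEM. The sketch below is an assumption-laden example: the event categories, system names, and the should_forward helper are hypothetical, and real ingestion rules would be configured in your SIEM or log pipeline instead of hand-written like this.

```python
# Hypothetical categories and systems treated as high value; adjust to your environment.
HIGH_VALUE_CATEGORIES = {"alert", "authentication", "privilege_change"}
HIGH_PRIORITY_SYSTEMS = {"domain-controller-01", "payments-db"}


def should_forward(event: dict) -> bool:
    """Forward an event if its category is high value or it comes from a high priority system."""
    return (
        event.get("category") in HIGH_VALUE_CATEGORIES
        or event.get("system") in HIGH_PRIORITY_SYSTEMS
    )


events = [
    {"category": "alert", "system": "web-07"},
    {"category": "debug", "system": "web-07"},
    {"category": "debug", "system": "payments-db"},
]

forwarded = [e for e in events if should_forward(e)]
print(len(forwarded))  # 2
```

Events that are filtered out can still be retained in cheaper storage, so they remain available as context during an investigation without adding to SIEM cost and noise.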