CEF is not normalized
While searching for an image to accompany this article, I found that many of the images associated with the Common Event Format, or CEF for short, are diagrams I created on the use of CEF in Microsoft Sentinel. This suggests that Microsoft Sentinel may now be the top consumer of CEF events.
This leads to an important question: if Microsoft Sentinel natively uses CEF, why did we need to develop a new normalization framework, the Advanced Security Information Model, or ASIM?
The answer is that CEF, arguably the world's first and most pervasive attempt at security event normalization, does not provide much normalization.
How come?
There are two fundamental reasons for that:
Heavy use of custom fields
CEF does not natively support many values provided by sources, which leads to extensive use of custom fields. If you have ever used the CEF field DeviceCustomString1, you know what I am talking about. For example, while CEF supports HTTP fields, it does not support DNS fields, requiring any source reporting DNS queries to use non-normalized custom fields.
There are two reasons why CEF lacks coverage. First, created 20 years ago and updated little since, CEF shows its age. Second, CEF is a single schema that needs to cover every imaginable event and store it in a single physical table, so there is an inherent limit to how wide it can stretch.
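To make the custom-field problem concrete, here is a minimal Python sketch. The two vendors, their products, and the labels they use are made up for illustration, and the parser is deliberately simplified (it does not handle escaped pipes or equals signs). The only thing taken from CEF itself is the general shape: a pipe-delimited header followed by key=value extension pairs, with custom strings sent as pairs such as cs1 and cs1Label.

```python
import re

HEADER_FIELDS = ["version", "vendor", "product", "device_version",
                 "event_class_id", "name", "severity"]


def parse_cef(line: str) -> dict:
    """Split a CEF record into its seven header fields and its extension pairs.

    Simplified sketch: escaped pipes and escaped equals signs are not handled.
    """
    parts = line.split("|", 7)
    if len(parts) < 8 or not parts[0].startswith("CEF:"):
        raise ValueError("not a CEF record")
    record = dict(zip(HEADER_FIELDS, parts[:7]))
    record["version"] = record["version"][len("CEF:"):]
    # Extension values may contain spaces, so a value runs until the
    # next " key=" boundary or the end of the line.
    record["extension"] = {
        m.group(1): m.group(2)
        for m in re.finditer(r"(\w+)=(.*?)(?=\s+\w+=|$)", parts[7])
    }
    return record


def custom_fields(extension: dict) -> dict:
    """Pair each custom field value with its label, e.g. cs1 with cs1Label."""
    paired = {}
    for key, label in extension.items():
        value_key = key[:-len("Label")]
        if key.endswith("Label") and value_key in extension:
            paired[label] = extension[value_key]
    return paired


# Two hypothetical sources reporting the same DNS query in different custom fields.
VENDOR_A = ("CEF:0|VendorA|DnsProxy|1.0|100|DNS query|3|"
            "src=10.0.0.5 cs1Label=DNS Query cs1=example.com")
VENDOR_B = ("CEF:0|VendorB|Resolver|2.3|dns-q|Lookup|Low|"
            "src=10.0.0.5 cs3Label=queryName cs3=example.com")

for raw in (VENDOR_A, VENDOR_B):
    print(custom_fields(parse_cef(raw)["extension"]))
```

Both records parse cleanly, yet the query name sits in cs1 under the label "DNS Query" for one source and in cs3 under "queryName" for the other. An analytic rule looking for a suspicious domain has to know each source's convention, which is exactly what normalization is supposed to avoid.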
Categorization is not part of CEF, just ArcSight
Categorization is an integral part of normalization. It defines for each event, in a consistent manner, what it is about. Categorization commonly includes a standard set of labels to describe, among other things, what activity the event represents, what the outcome is, and what action the reporting device took. In ASIM, we use fields such as EventType, EventResult, and DvcAction to describe those categories. Each such field has an allowed set of values or labels.
ArcSight has a strong concept of categorization, as described in the ArcSight categorization whitepaper. However, the categories are applied by the ArcSight connectors and cannot be sent by the source device as part of the CEF standard. While CEF has fields such as outcome, act, cat, and reason, they don't require any set of values and are used very liberally by source systems. As a result, their usability in security analytics is limited.
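As a rough illustration of the difference, the sketch below maps free-form CEF outcome values to the allowed values of the ASIM EventResult field (Success, Partial, Failure, NA). The raw values in the mapping table are hypothetical examples of what sources might send, not a list taken from any vendor, and the function name is my own.

```python
# Allowed values of the ASIM EventResult field.
ASIM_EVENT_RESULTS = {"Success", "Partial", "Failure", "NA"}

# Hypothetical free-form values that sources might place in the CEF
# `outcome` field. A real mapping has to be built per source.
OUTCOME_TO_EVENT_RESULT = {
    "success": "Success",
    "succeeded": "Success",
    "ok": "Success",
    "failure": "Failure",
    "failed": "Failure",
    "error": "Failure",
    "partial": "Partial",
}


def normalize_outcome(raw):
    """Map a free-form outcome string to an ASIM EventResult value.

    Anything missing or unrecognized becomes "NA" rather than a guess.
    """
    if raw is None:
        return "NA"
    return OUTCOME_TO_EVENT_RESULT.get(raw.strip().lower(), "NA")


for value in ("Succeeded", "FAILED", "ok", "0x0", None):
    result = normalize_outcome(value)
    assert result in ASIM_EVENT_RESULTS
    print(repr(value), "->", result)
```

With a closed set of values, a query can filter on EventResult == "Failure" across every source. With free-form outcome values, each analytic has to carry its own version of the mapping table above, and keep it up to date as sources change.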
Why does it matter?
So, while CEF is handy since so many sources support it, it is time to move forward and select a modern schema for your events.
More importantly, CEF's lax normalization lets source devices get away with providing insufficiently normalized data. As an industry, we need to come up with a replacement that will ensure source devices provide data that better supports security analytics.
Ofer, CEF has been gold and pain for me in my day job. There is not much Common in CEF, as everybody thinks they understand the format and can do things like customString7... As Raffael Marty pointed out in his article https://raffy.ch/blog/2017/12/15/5-security-big-data-challenges/ in section 2, a common data model is the most important thing for big data collection. Ofer Shezaf, what are your thoughts on Elastic Common Schema as the next one-size-fits-all common data schema? Cheers A.
And back in 2000, pre-ArcSight e-Security SIEM, we called it EDB (extended db - a macro core schema/mapping with hooks / extensions to data specific to point products - a touch like Zeek does with conn logs and then other stuff that connects). There’s always going to be something new (or a need / want to tap into data that’s long existed, just not ingested - user-level sources are an easy example of a wrench in default models or even cheesy stuff like a virus name field for av ingest when nothing else has it). Def fun to watch both the evolution, but also the ebb and flow (Splunk CIM will fix everything! Oh… wait…) ;)
I believe CEF is perfect for legacy/stateful firewalls. But since everyone wants the Nextgen Firewalls, we need Nextgen normalization. The best thing would be if the vendors agreed on the fields, which would make all our security a bit easier.
Obligatory XKCD. Jokes aside, format standardization is a common problem that shows up way too much. Can we not just have standard naming for fields? Microsoft's own AuditLogs has ClientIP and ClientIp for different MS products.