A Primer in Data Classification
International Atomic Energy Agency


Introduction

We all classify data. In the world of radioactive materials, for instance, the International Atomic Energy Agency introduced the International Nuclear and Radiological Event Scale (INES) in 1990 to enable prompt communication of safety-significant information in the event of a nuclear accident. Although not necessarily a "data classification" method in the traditional sense of what counts as data, the INES is a method of classification nonetheless. In business, information technology and data management, a Data Classification Standard is the framework for assessing data sensitivity, measured by the adverse business impact of a data breach. The goal of such a standard is to provide the foundation for a protection profile built around the requirements of each class of data.

When we perform a Risk Assessment of our hardware, software, and business practices, the goal is to identify what is at risk and then accept the risk, mitigate it, or remove it from the environment. As part of this assessment, a framework for classifying your data lets you understand the cause and effect of data loss and build a platform that takes Data Loss Prevention (DLP) into account. Over the years, the practices I have used for Data Management and Risk Management have focused on a Data Classification Framework for storing, handling and sharing data based on its sensitivity, privacy and security. In this article I will share a framework, with examples, to aid in your data classification. The list is not exhaustive; any item not listed should be classified according to the definitions and impacts outlined in a Data Classification Policy.

Definitions

In general terms, the practice of data classification is based on a tier system. Typically the system has three tiers, although four and even five tiers have also been suggested. Anything beyond three is only a variation on the three-tier system, and the extra tiers complicate how data is to be handled, so this discussion keeps to three. In a three-tier data classification framework the levels of strictness are restricted, controlled and public domain, defined as follows:

Tier 1 – Restricted. Here we have highly sensitive information that is critical to a company's ongoing operations, regulated, or privacy-impacting for an individual; this information is meant for a limited group of users with a specific need to know.

Tier 2 – Controlled. This tier contains sensitive information that either defines the way the organization operates or is privacy-impacting for an individual; this information is meant for a limited group of users, typically based on job function, with a need to know.

Tier 3 – Public Domain. This tier contains information meant for both internal and external use that is accessible by anyone.

The Three Tiers of Data Classification

Looking at each tier in more detail, classification places the data, whether electronic, paper, or another physical format, in a specific tier. Data cannot reside in more than one tier, and as a general practice, data that is not fully understood should reside in the restricted tier until its classification is settled. This approach manages expectations: it is better to keep data restricted than to release it and later find that you have to restrict its use. After all, once data is in the wild, getting it back into the cage is often impossible.
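The two rules above (one tier per item, and restricted until understood) can be sketched in a few lines of Python. This is only an illustration: the names `Tier` and `effective_tier` are my own and are not part of any standard or product.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """The three tiers used in this article; a lower number is more restrictive."""
    RESTRICTED = 1
    CONTROLLED = 2
    PUBLIC_DOMAIN = 3


def effective_tier(assigned: Optional[Tier]) -> Tier:
    """Return the single tier an item belongs to.

    An item that has not been classified yet (None) is treated as
    Restricted until its classification is settled.
    """
    return Tier.RESTRICTED if assigned is None else assigned
```

Because the function returns exactly one `Tier`, an item can never sit in two tiers at once, and forgetting to classify something fails safe rather than open.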

TIER 1 - RESTRICTED

Data residing in Tier 1 is typically well protected, and there is no ambiguity about how it is to be handled. Examples include government-issued information about a person or a corporation and data falling under HIPAA, PCI, PIPA and now the GDPR. Transmission of this type of data inside and outside of the network, including by email, needs to be encrypted. Hard copies may only be printed through secure print practices.

Physical data under this classification also follows the strictest procedures: each page or piece of hardware is identified as Tier 1 or restricted content and is always kept locked in a secure container, file system or room. Transporting this type of information off premises must follow approved off-site shipping procedures. It is reasonable to expect a tracking procedure that confirms the information is allowed to leave the premises. Furthermore, information leaving the premises would be in a concealed, tamper-evident package and transported by approved couriers. Faxing this type of information is forbidden, as you never know who will see it at the other end. Some may argue that a fax delivered to a secure email address is valid, but if the email is not encrypted, it is not to go out.

When restricted materials reach the end of their life, printed materials are shredded with a cross-cut shredder, using a secure shredding container. Data on hard drives is wiped to the DoD 5220.22-M ECE standard before the drives are reused. Drives that are not to be reclaimed are shredded. Note that drives reclaimed by the organization are to be reused only for Tier 1 data management. It is poor practice to allow media from a more restricted tier to move to a lower data classification level.

It is important to recognize that standards such as the Department of Defense wipe or cross-cut shredding reflect current technologies and should be reviewed regularly, so that secure data management practices such as encryption are kept in line with current good practice.

TIER 2 - CONTROLLED

Tier 2 is about controlling your information. From a personal perspective, examples include salary grade, home address and home phone number. This information is controlled rather than restricted, in the sense that an employee may give another employee their own phone number or address, for instance. From a business perspective, business strategies, information protected by attorney-client privilege, vendor contracts, service contracts, pricing contracts, organization charts, and employee handbooks are just a few examples.

For transmission of this type of data inside and outside of the network, including by email, encryption is recommended but in general not required. Secure printing practices are not required for hard copies, but printed material should be picked up as quickly as possible. Faxing this type of data is permitted provided that receipt of the fax can be verified and a company-approved cover sheet is used.

As with restricted hard copies, the recommendation is to keep controlled data in a locked cabinet, file storage area or room. Once the data has outlived its usefulness, hard copies are shredded, though not necessarily cross-cut or in a secured shredding box. On the electronic side, hard drives need to be wiped before redeployment. Wiping drives is time-consuming, and the DoD 5220.22-M ECE standard is not necessary here given how long it takes and how thorough a wipe it performs. Unlike the restricted tier, this data in general is not subject to government regulation and does not pose a significant liability or business risk to the company, so the recommended approach is Air Force System Security Instruction 5020. Originally defined by the United States Air Force, this 2-pass overwrite finishes by verifying the write and is very quick.

Keep in mind that if these drives are to be reclaimed, good practice is to keep them at the controlled tier (their original tier) or move them to the restricted tier. It is poor practice to move hardware from a more restricted tier to a less restricted one. Hardware should either remain at its current tier or move towards a more restricted one, never the other way around.
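The redeployment rule is simple enough to express as a one-line check. The sketch below assumes the tiers are numbered as in this article, with 1 as the most restricted; `Tier` and `may_redeploy` are illustrative names only.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Tier numbers as used in this article; a lower number is more restrictive."""
    RESTRICTED = 1
    CONTROLLED = 2
    PUBLIC_DOMAIN = 3


def may_redeploy(current: Tier, target: Tier) -> bool:
    """Hardware may stay at its tier or move to a more restricted one, never down."""
    return target <= current
```

A check like this could sit in an asset-management workflow so that a drive tagged Tier 1 is rejected when someone tries to reassign it to a Tier 2 or Tier 3 system.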

TIER 3 - PUBLIC DOMAIN

Classifying data as Tier 3, public domain, removes many of the restrictions, such as encryption. Data encryption is not a requirement, and for the most part information such as annual or financial reports, marketing material and social media content does not need the level of scrutiny applied to the other tiers. Securing printed material in a lock box is not required, shredding is not required, and faxing documents with an approved cover sheet is permitted. Publishing this information on a web site does not require any special considerations.
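With all three tiers described, the handling rules can be collected into a single lookup table, which is often how a policy like this ends up in a script or a configuration file. The following is a minimal sketch: the structure and the names `Handling`, `HANDLING` and `fax_allowed` are my own, and the values simply restate the rules given in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handling:
    encryption: str       # encryption of data in transit, including email
    secure_print: bool    # whether secure print practices are required
    fax: str              # faxing rule
    paper_disposal: str   # disposal of hard copies
    drive_wipe: str       # wipe applied before a drive is redeployed


HANDLING = {
    "restricted": Handling(
        "required", True, "forbidden",
        "cross-cut shred in a secure container", "DoD 5220.22-M ECE"),
    "controlled": Handling(
        "recommended", False,
        "permitted with verified receipt and approved cover sheet",
        "shred", "AFSSI 5020"),
    "public domain": Handling(
        "not required", False, "permitted with approved cover sheet",
        "shredding not required",
        "not specified in this article"),
}


def fax_allowed(tier: str) -> bool:
    """Return True when the tier's rule permits faxing at all."""
    return HANDLING[tier].fax != "forbidden"
```

Keeping the rules in one table makes it easier to review them when standards change, rather than hunting for them across procedures.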

Tier 3 may seem to be the wild west of data classification, but there are still precautions to take. The message you send to your customers and to the public needs to come from trusted sources and be tamper-proof. In fact, this is a common thread across all tiers: the information we convey and use must have a level of integrity in place. After all, who would want their annual reports tampered with?

A Brief Comment on Data Integrity

There are many ways to ensure the integrity of your data. Serving websites over SSL is a starting point, and publishing a hash of any official document lets the reader confirm that the information presented has not been tampered with. There are many different types of hashes available, and although MD5 has its critics, and for good reason, I will use MD5 as an example to show how you could protect the integrity of a statement or an article. For those unaware of what a hash is, consider it a fingerprint of a file, a sentence, a photo or any other input. A hash is not encryption: it is a one-way function, so the original content cannot practically be recovered from the hash alone.

A hash can be used to detect tampering with a report either by an application or directly by the user. When publishing a report, it is common practice to publish its hash value as well. The person downloading the document can then generate the hash themselves and determine whether the document has been tampered with. Alternatively, with the document held in a database or simply a file system, a programmer can generate a hash value for the document and compare it against the hash value generated when the document was created. By comparing the two values, the application can determine whether the document was tampered with. If it was, the document would not be presented to the requestor and the issue would be reported to the Data Integrity Manager for analysis.
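The application-side check described above might look something like the following Python sketch. It uses only the standard `hashlib` and `hmac` modules. Note that I have defaulted to SHA-256 here rather than MD5, given MD5's known weaknesses; the function names `hex_digest` and `is_unmodified` are illustrative.

```python
import hashlib
import hmac


def hex_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal hash of the document's bytes."""
    return hashlib.new(algorithm, data).hexdigest()


def is_unmodified(data: bytes, published_hex: str,
                  algorithm: str = "sha256") -> bool:
    """Compare the document's current hash with the one recorded at creation."""
    current = hex_digest(data, algorithm)
    # Normalise the stored value so case or stray whitespace does not matter.
    return hmac.compare_digest(current, published_hex.strip().lower())
```

An application would read the stored document into `data` (for example with `pathlib.Path(...).read_bytes()`), call `is_unmodified` with the hash recorded when the document was created, and refuse to serve the document when the result is `False`.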

In the following example I will write the same sentence twice. You will see that the value of the hash will be completely different. Can you see why?

The purpose of a hash is to ensure data integrity.

MD5 HASH: e4e28dfb12f7eb05e23d524a57320987

The purpose of a hash is to ensure data integrity,

MD5 HASH: c7a25944f2cb8e0f52a8289240557ada

As you can see, if your public domain data is altered even slightly, the hash value can be used to determine that the contents were tampered with. This is a powerful way of ensuring that what you present to the public is not tampered with. By the way, did you find the difference between the two sentences?
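If you would like to try this yourself, the comparison can be reproduced with Python's standard `hashlib` module. This is a small sketch; the helper name `md5_hex` is my own. The digests it prints depend on the exact bytes that are hashed (text encoding, trailing spaces or line breaks), so they may not match the values quoted above character for character.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of a string, encoded as UTF-8, in hexadecimal."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


first = "The purpose of a hash is to ensure data integrity."
second = "The purpose of a hash is to ensure data integrity,"

print(md5_hex(first))
print(md5_hex(second))
print(md5_hex(first) == md5_hex(second))  # False: one character differs
```

For anything beyond a demonstration, a stronger algorithm such as SHA-256 (`hashlib.sha256`) is the usual choice.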

Conclusion

Data classification is a powerful tool for performing proper risk assessments and for determining what you should be concerned about in the first place. As with any framework or policy, understanding its role is vital to a successful implementation. Following through on the policy, with the intention of measuring the results and the level of compliance, allows the business to function at its highest level of effectiveness by giving the workforce the direction it needs to be productive while protecting the assets of the business at all levels.


More articles by Todd Lohvinenko
