Do You Know Your Data?
Data-driven decision making is one of the most important factors in having a successful online presence, whether you focus on digital marketing, run an ecommerce site, or publish an informational website. Key to this is, of course, data. Data comes from many sources and is stored in many forms. It may be complete, it may be incomplete, or it may be dirty: inconsistent, duplicated, or otherwise unusable until it has been transformed (for example, by removing duplicates).
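As a small illustration of the kind of transformation dirty data needs before use, here is a minimal sketch of removing duplicates from a list of email subscribers. The record layout, the `email` and `source` fields, and the `deduplicate` helper are all hypothetical, and it assumes that addresses differing only in letter case or surrounding whitespace belong to the same person.

```python
def deduplicate(records, key="email"):
    """Keep the first record for each normalized key.

    Assumption: values that differ only in case or surrounding whitespace
    refer to the same entity (a hypothetical rule for this sketch).
    """
    seen = set()
    unique = []
    for record in records:
        normalized = record[key].strip().lower()
        if normalized in seen:
            continue  # duplicate of a record we already kept
        seen.add(normalized)
        unique.append({**record, key: normalized})  # store the cleaned value
    return unique


# Hypothetical subscriber records: the first two are the same person.
subscribers = [
    {"email": "Ann@Example.com", "source": "newsletter"},
    {"email": " ann@example.com ", "source": "checkout"},
    {"email": "bob@example.com", "source": "newsletter"},
]
print(deduplicate(subscribers))
```

Keeping the first occurrence is just one policy; in practice you might prefer the most recent record or merge fields from both.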
Types of Data
Generally, in digital analysis, I think of data as falling into one of three types:
- Quantitative
- Qualitative
- Descriptive
Quantitative data describes what happened: how many visitors came to your website, how many subscribers opened your email, and so on. Qualitative data describes why it happened: why visitors came to your website (product information, shopping, etc.) or why a subscriber opened your email (a good sale, a reminder that they needed a new pair of shoes, etc.). Descriptive data describes the individual, such as a visitor or an email subscriber: who came to your website (profession, marital status, whether they own or rent a home, etc.), who opened your email, and so on.
Types of Data Elements
The data elements themselves typically fall into one of four data element types:
- Continuous
- Nominal
- Ordinal
- Binary
Continuous data elements are the ones most often aggregated (measures, in data-warehousing speak), such as income or sales. Nominal, ordinal, and binary data elements are all categorical: they take on a limited set of values and are most often used as dimensions in graphs, tables, and charts. Nominal data elements take values with no particular order, like marital status or profession. Ordinal data elements are like nominal ones, except that their values have an order, such as a credit rating or an engagement score expressed as high, medium, or low. Lastly, binary data elements can take on only one of two values, such as gender or employment status (if you accept only employed or unemployed; otherwise it would be a nominal data element).
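To make the four element types concrete, the following sketch models a website visitor with one field of each type and shows how each is typically used: continuous values are aggregated, nominal values are counted per category, ordinal values can also be compared by rank, and binary values reduce to a share. The `Visitor` record, its field names, and the category values are hypothetical, chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum, IntEnum
from statistics import mean


class MaritalStatus(Enum):
    """Nominal: a limited set of values with no inherent order."""
    SINGLE = "single"
    MARRIED = "married"
    DIVORCED = "divorced"


class Engagement(IntEnum):
    """Ordinal: a limited set of values that DO have an order."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Visitor:  # hypothetical record layout
    income: float                  # continuous: aggregated as a measure
    marital_status: MaritalStatus  # nominal: used as a dimension
    engagement: Engagement         # ordinal: a dimension that can be ranked
    employed: bool                 # binary: only two possible values


def summarize(visitors):
    return {
        # Continuous elements are aggregated (averaged, summed, ...).
        "mean_income": mean(v.income for v in visitors),
        # Nominal elements are counted per category.
        "by_marital_status": Counter(v.marital_status for v in visitors),
        # Ordinal elements additionally support ordered comparisons.
        "medium_or_higher": sum(v.engagement >= Engagement.MEDIUM for v in visitors),
        # Binary elements reduce to the share of True values.
        "employed_share": sum(v.employed for v in visitors) / len(visitors),
    }


visitors = [
    Visitor(52_000.0, MaritalStatus.MARRIED, Engagement.HIGH, True),
    Visitor(38_000.0, MaritalStatus.SINGLE, Engagement.LOW, True),
    Visitor(24_000.0, MaritalStatus.SINGLE, Engagement.MEDIUM, False),
    Visitor(91_000.0, MaritalStatus.DIVORCED, Engagement.MEDIUM, True),
]
print(summarize(visitors))
```

Using `IntEnum` for the ordinal field is one way to encode order so that comparisons like `>= Engagement.MEDIUM` work; a plain `Enum` is enough for nominal data because order is meaningless there.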
Outlier Data
Data is rarely perfect, and beyond dirty and incomplete data there is the issue of outliers, which can skew any meaningful analysis. The problem can be as simple as unidimensional data, such as income, where all values are below $200K except one of $10 million. Or it can be compounded by multivariate outliers, such as a record whose income and age are both far outside the norm, skewing the analysis along both dimensions.
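A quick way to see how much a single outlier can skew an analysis is to compare the mean and the median with and without it. This sketch uses made-up incomes (19 values under $200K plus one $10 million value) and only the standard library's `statistics` module.

```python
from statistics import mean, median

# Hypothetical annual incomes: 19 values under $200K plus one $10M outlier.
incomes = [42_000, 55_000, 61_000, 48_000, 75_000, 88_000, 39_000, 120_000,
           95_000, 67_000, 52_000, 150_000, 71_000, 83_000, 46_000, 160_000,
           58_000, 64_000, 99_000, 10_000_000]

# Drop the single value above the $200K range for comparison.
without_outlier = [x for x in incomes if x < 200_000]

print(f"mean   with / without outlier: {mean(incomes):,.0f} / {mean(without_outlier):,.0f}")
print(f"median with / without outlier: {median(incomes):,.0f} / {median(without_outlier):,.0f}")
```

With these numbers, the one extreme value drags the mean from about $77.5K to $573,650, while the median only moves from $67,000 to $69,000, which is why the median is often the safer summary for skewed data like income.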
There are a number of ways to deal with outliers. A simple histogram or box plot, both easy visualizations, can reveal unidimensional outliers. You can also apply a little statistics: calculate z-scores and choose a cutoff beyond which data will be excluded from the analysis. For multivariate outliers, a regression line can help, since points that fall far from the line stand out. I have rarely had to concern myself with multivariate outliers in digital analysis, but I often have to deal with unidimensional outliers, so it is important to know how to identify and handle them.
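Both unidimensional techniques can be sketched in a few lines of standard-library Python. `zscore_outliers` flags values whose z-score exceeds a cutoff (3 is a common choice, but it is an assumption here), and `iqr_outliers` applies Tukey's 1.5 × IQR rule, the convention most box plots use to draw their whiskers. The function names and the sample incomes are hypothetical.

```python
from statistics import mean, pstdev, quantiles


def zscore_outliers(values, threshold=3.0):
    """Return values whose |z-score| exceeds threshold (population std dev)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def iqr_outliers(values, k=1.5):
    """Return values outside the box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical annual incomes: 19 values under $200K plus one $10M outlier.
incomes = [42_000, 55_000, 61_000, 48_000, 75_000, 88_000, 39_000, 120_000,
           95_000, 67_000, 52_000, 150_000, 71_000, 83_000, 46_000, 160_000,
           58_000, 64_000, 99_000, 10_000_000]

print("z-score outliers:", zscore_outliers(incomes))
print("IQR outliers:    ", iqr_outliers(incomes))
```

One caveat with z-scores: an extreme value inflates the very standard deviation it is measured against, so with 10 or fewer data points (using the population standard deviation) no value can exceed |z| = 3 at all. The quartile-based IQR rule is not distorted this way, which makes it a useful cross-check on small samples.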
Data Standardization & Classification
One issue I run into with social media data is comparing my data with competitors' data. To compare apples to apples, I need to standardize it, because ratios (a post's number of comments divided by the number of fans, for example) can become very small. A technique such as decimal scaling (dividing each value by a power of 10, which for very small values is a negative power) brings the values to a similar scale for comparison, without falling off the graph.
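Decimal scaling is usually defined as dividing every value by 10^j, where j is the smallest integer that brings the largest absolute value below 1. The sketch below applies one common j to all brands so that the scaled values stay directly comparable. The brand names and the comment and fan counts are made up for illustration.

```python
import math


def decimal_scale(values):
    """Divide every value by 10**j, where j is the smallest integer such that
    max(|v|) / 10**j < 1. For very small values j is negative, which scales
    them up. Returns the scaled values and j."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values), 0
    j = math.floor(math.log10(max_abs)) + 1
    return [v / 10 ** j for v in values], j


# Hypothetical competitors: comments on a post divided by the page's fans.
engagement = {
    "Brand A": 12 / 48_000,
    "Brand B": 30 / 150_000,
    "Brand C": 5 / 8_000,
}

scaled, j = decimal_scale(list(engagement.values()))
for brand, value in zip(engagement, scaled):
    print(f"{brand}: {value:.3f} (x 10^{j})")
```

Scaling all competitors with the same j matters: if each brand were scaled by its own power of 10, the resulting numbers would no longer be comparable.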
Lastly, it is sometimes necessary to categorize (or classify) data to reduce the number of distinct values. This transforms a continuous data element into a nominal or ordinal one, or reduces a categorical data element to fewer categories/classes. For example, age is often grouped into age ranges, which makes analysis easier when you want to use age as a dimension instead of a measure.
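Grouping ages into ranges can be sketched with the standard library's `bisect` module, which finds the bracket for each age from a sorted list of boundaries. The brackets below are an assumption chosen for illustration; pick whatever groups fit your audience.

```python
from bisect import bisect_right
from collections import Counter

# Assumed age brackets: each boundary is the first age of the next group.
AGE_BOUNDARIES = [18, 25, 35, 45, 55, 65]
AGE_LABELS = ["Under 18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def age_group(age):
    """Map a continuous age to an ordinal age-group label."""
    return AGE_LABELS[bisect_right(AGE_BOUNDARIES, age)]


# Hypothetical visitor ages.
ages = [17, 23, 24, 31, 38, 44, 52, 67, 29, 35]

# The age group can now be used as a dimension, e.g. visitors per group.
groups = Counter(age_group(a) for a in ages)
for label in AGE_LABELS:
    print(f"{label:>8}: {groups[label]}")
```

`bisect_right` places an age equal to a boundary into the group that starts at that boundary, so 35 falls into "35-44" and 65 into "65+".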
Conclusion
I hope this gets you thinking not only about your data, but also about the types of data you have and some ideas for how to deal with them. Data is becoming increasingly important for making intelligent business decisions and can be very valuable in optimizing your online presence.