Innovation and data science
The world seems to be obsessed with data. Particularly, Big Data. In the innovation community, you can’t walk a step without hearing someone talk about how much data they’ve got: being a "data scientist" is basically like being a unicorn. Data science is poised to be the source of most major upcoming innovations.
It's promising, but in order to do something truly useful, we need to apply more design thinking to data capture, management, analysis, and visualization. A couple of companies are doing this, like Datascope and Walmart, as are some research groups like The Human-Centered Data Research Lab at the University of Washington, but not nearly enough people are using this approach.
From a designer's perspective, approaching data science with design thinking is important for several reasons:
- Most data is assumed to be quantitative, but behind every quantitative measure and every metric is a host of qualitative judgements about what we can and should measure. These are still human choices.
- Numbers can dazzle and intimidate. Invoking them is a powerful rhetorical device, but one that is easily abused to trick people into accepting invalid conclusions.
- Data science can seem more like an arcane art than an actual science (or engineering discipline). Although numbers, notation, and code look precise, there is just as much room to fudge data as anywhere else. Data scientists should not be data whisperers.
- It's imperative to ask how the data were collected. Just because it's big doesn't mean it's representative. Garbage in, garbage out.
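That last point can be made concrete with a small sketch (the population and numbers here are hypothetical, invented for illustration): a large sample drawn in a biased way misses the true picture, while a far smaller random sample gets close.

```python
import random

random.seed(0)

# Hypothetical population of 100,000 users: roughly 10% are "power
# users" with much higher activity than everyone else.
population = [random.gauss(100, 10) if random.random() < 0.1
              else random.gauss(10, 2)
              for _ in range(100_000)]

true_mean = sum(population) / len(population)

# A "big" sample collected only from the most active users
# (e.g. an opt-in survey of engaged customers): large n, but biased.
biased_sample = sorted(population, reverse=True)[:20_000]
biased_mean = sum(biased_sample) / len(biased_sample)

# A much smaller, but random, sample.
random_sample = random.sample(population, 500)
random_mean = sum(random_sample) / len(random_sample)

print(f"true mean:   {true_mean:.1f}")
print(f"biased mean: {biased_mean:.1f}")  # far off, despite n = 20,000
print(f"random mean: {random_mean:.1f}")  # close, despite n = 500
```

The biased estimate is off by a wide margin even though it uses forty times as much data, which is exactly the "big but not representative" trap.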
That means learning to address what quantitative data are captured and how, how they are sampled or aggregated, how they are analyzed, what questions are asked of them, and what processes are enacted as results are generated. Designers need to learn to be part of the conversation so that humans are not forgotten in the data science process.
I fully agree with your point 4. It is important to know the context in which the data was originally collected and what quality corners were cut during collection. There is reason in the data madness; the trick is to understand it, so you can evaluate the usefulness of the data for your own purposes.
Humans with scientific reasoning are always needed to resolve ambiguity in data. A machine could do that only if it were as smart as a human brain, with the ability to reason and learn. In that case, we would have other things to worry about.
Interesting! I agree with you that it's not very hard to consume data (or big data); what you are trying to say or prove with it is the more difficult part.