5 Metaphors for Big Data and Why They Matter

Ever since the advent of written language, we have had proof that humans love to tell stories. Ancient myths were one way people explained the world around them in terms they could understand. And while we may have moved past explaining natural phenomena like rainbows and earthquakes with stories, we still use this method of understanding and explaining the world for difficult concepts like quantum mechanics, economics, and big data.

As a field, big data is rife with metaphors that help the initiated explain it to the layperson. But just as ancient myths got some things wrong about the world, we have to be careful about which metaphors we use to explain big data and about what those terms actually imply. As the following examples show, some big data metaphors hit the mark, and some don’t.

Machine Learning and Artificial Intelligence

The words “learning” and “intelligence” seem to imply that computers think and learn the way humans do, but nothing could be further from the truth. Computers compute (hence the name), and they do so far faster and more accurately than humans can. They can perform highly complex computations, but they still rely on an initial program – an algorithm or set of instructions – to tell them how. These terms give the false impression that machines think. In reality, they literally cannot think outside the box.

Data Tsunami, Flood, or Deluge

Deborah Lupton wrote in a piece on liquid metaphors for big data: “These rather vivid descriptions of data as a fluid, uncontrollable entity possessing great physical power emphasize the sheer volume and fast nature of digital data movements, as well as their unpredictability and the difficulty of control and containment. They suggest an economy of digital data and surveillance in which data are collected constantly and move from site to site in ways that cannot easily themselves be monitored, measured, or regulated.”

The metaphor extends beyond volume, too. Liquid is directed through pipes, channels, and streams, so the idea of “data as liquid” carries over to the infrastructure needed to handle all that fast-moving data: we speak of data pipelines and data streams. Data can also be “leaked” or “spilled,” much as liquid can.

Data as Oil, Gold, or Other Valuable Resources

The idea that data is “the new oil,” or that there is a sort of “data rush” (like a gold rush), highlights the idea that data is a valuable resource for companies. But it can also highlight some of the pitfalls of data. Humans have long gone to war over resources, and because they view data as a valuable resource, companies are realizing that they must protect it as they would protect a mining claim. People also talk about data as though they were fishermen, “casting a wide net” or using a “data dragnet” to describe collecting and hoarding data, whether it’s useful or chum.

Data as Food

We have “raw” data “feeds” that we “clean” before we “ingest” them into our systems and “digest” them for their nuggets of wisdom. But “raw” data and “clean” data are relative terms. What counts as raw depends on your position in the data process: one person’s raw is someone else’s well done, and one person’s clean data is, to another, still quite polluted. And data can never be truly “raw” or “clean,” because human choices and biases shaped both how it was collected and how it was cleaned. In short, these terms are never as absolute as they seem.

‘Big’ Data

It’s true: even the term “big data” itself can be misleading. When we hear it, we automatically think of expansive size, relevance, and importance. But it doesn’t necessarily mean any of those things. Big data can be collected for small businesses on a small scale just as easily as for big corporations. And the size of a dataset, by itself, says nothing about its relevance, utility, or meaning.

It would be practically impossible to avoid metaphors altogether when discussing a subject as complex as big data, but it’s important that those of us using them – and passing on the myth, so to speak – understand the truth behind the stories we tell.