Nov 22, 2012

Earlier this week I was in Singapore, attending the MRSS Asia Research Conference, which this year focused on the theme of Big Data. There was an interesting range of papers, including ones linking neuroscience, Behavioural Economics, and ethnography to Big Data.

One reference repeated by several of the speakers, including me, was IBM’s four Vs, i.e. Volume, Velocity, Variety, and Veracity. Volume is a given; big data is big. Velocity relates to the speed at which people want to access the information. Variety reminds us that Big Data includes a mass of unstructured information, including photos, videos, and open-ended comments. Veracity relates to whether the information is correct and reliable.

However, as I listened to the presentations, and heard at least three references to the French mathematician and philosopher René Descartes, my mind turned to another French mathematician, Pierre-Simon Laplace. In 1814, Laplace put forward the view that if someone were (theoretically) to know the precise position and movement of every atom, it would be possible to calculate their future positions – a philosophical position known as determinism. Laplace was shown to be wrong, first by the laws of thermodynamics, and secondly, and more thoroughly, by quantum mechanics.

The assumption underlying much of Big Data seems to echo Laplace’s deterministic views, i.e. that if we have enough data we can predict what will happen next. A corollary to this proposition is a further assumption that if we have more data, then the predictions will be even better. However, neither of these is necessarily true.

There are several key factors that limit the potential usefulness of big data:

  1. Big Data only measures what has happened in a particular context. Mathematics can often use interpolation to produce a reliable view of the detail of what happened within that context. However, extrapolation, i.e. predicting what will happen in a different context (e.g. the future), is often problematic.
  2. If you add random or irrelevant data to a meaningful signal, the signal becomes less clear. The only way to recover the signal is to remove the random or irrelevant data. If we try to measure shopping behaviour and we collect everything we can, then we can only make sense of it by removing the elements irrelevant to the behaviour we are trying to measure – bigger isn’t always better.
  3. If the data we collect are correlated with each other (i.e. they exhibit multicollinearity), then most mathematical techniques will not apportion the contributions of the individual factors correctly – rendering predictions unstable.
  4. Some patterns of behaviour are chaotic. Changes in the inputs cause changes in the outputs, but not in ways that are predictable.
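Point 1 can be made concrete with a toy sketch (the numbers are invented for illustration, not taken from any real dataset): a straight line fitted to data that actually follow a curve predicts tolerably well inside the observed range, and very badly outside it.

```python
# Toy illustration of interpolation vs extrapolation: fit a straight
# line to data that actually follow a curve (y = x squared), then
# compare in-range and out-of-range predictions.

xs = list(range(11))          # the observed context: x = 0..10
ys = [x * x for x in xs]      # the true (curved) behaviour

# Ordinary least-squares line fit, y ~ slope * x + intercept.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def predict(x):
    return slope * x + intercept

# Interpolation (x = 5, inside the observed range) is only modestly
# wrong; extrapolation (x = 20, outside it) is wildly wrong.
print(abs(predict(5) - 5 * 5))     # modest in-range error
print(abs(predict(20) - 20 * 20))  # much larger out-of-range error
```

The fitted line describes the observed range reasonably, yet the further the prediction strays from that range, the worse it gets – which is exactly the problem with treating the past as a template for a different future.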
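Point 3 can also be sketched with invented numbers: when two predictors are almost identical, a tiny nudge to one observation sends the fitted coefficients swinging by whole units in opposite directions, even though the data barely changed.

```python
# Multicollinearity sketch: x2 is x1 plus a tiny wobble, so the two
# predictors are almost perfectly correlated. All numbers are invented
# purely to illustrate the instability.

def fit_two_predictors(x1, x2, y):
    """Least-squares fit y ~ b1*x1 + b2*x2 (no intercept), solving
    the 2x2 normal equations by Cramer's rule."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return ((t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det)

x1 = [1, 2, 3, 4, 5]
x2 = [1.001, 2.0, 2.999, 4.0, 5.001]   # almost identical to x1

y_a = [2 * v for v in x1]              # original data
y_b = list(y_a)
y_b[2] += 0.01                         # one observation nudged by 0.01

b_a = fit_two_predictors(x1, x2, y_a)
b_b = fit_two_predictors(x1, x2, y_b)

# A 0.01 change in a single data point shifts each coefficient by
# several whole units, in opposite directions.
print(b_a, b_b)
```

The overall predictions barely move, but the apparent "contribution" of each factor is meaningless – which is why interpreting such coefficients, or extrapolating with them, is unreliable.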
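Point 4 is what chaos theory formalises. A minimal sketch using the logistic map (a standard textbook example of chaos, not anything presented at the conference) shows two all-but-identical starting points diverging completely within a few dozen steps.

```python
# Logistic map: x_next = r * x * (1 - x); at r = 4 it is chaotic.

def trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map from x0, returning the whole path."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = trajectory(0.2)
b = trajectory(0.2 + 1e-9)   # start differs by one part in a billion

# The first few steps track each other closely, but the tiny initial
# difference is amplified at every iteration until the two paths bear
# no resemblance to each other.
print(max(abs(x - y) for x, y in zip(a, b)))
```

The rule generating the data is perfectly known, and still no feasible amount of measurement makes the long-run outcome predictable – a direct counterexample to the "more data means better predictions" assumption.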

One of the most successful organisations in using Big Data has been Tesco. For almost 20 years, the retailer has been giving competitors and suppliers a hard time by utilising the data from its Clubcard loyalty scheme. Scoring Points (the book about Tesco written by Clive Humby and Terry Hunt) shows that one key to Tesco’s success was that they took the four points above into account.

Tesco simplified the data, removed noise, and categorised the shoppers, the baskets, and the times of day. Their techniques are based on interpolation, not extrapolation, and they are able to extend the area of knowledge by trial and error.

Big Data is going to be increasingly important to marketers and market researchers. But its usefulness will be greater if people do not over-hype it. More data is not necessarily better. Knowing what people did will not necessarily tell you what they will do. And knowing what people did will often not tell you why they did it, or what they might do if the choice is repeated or varied.

Marketers and market researchers seduced by the promise of Big Data should remember Laplace’s demon – and realise that the world is not deterministic.