Thursday, September 25, 2014

There is no unicorn in your BigData

Recently, companies have started to invest heavily in data science solutions. These analytic solutions go by many names (BigData, Machine Learning, Deep Learning, Business Intelligence, etc.), and there are many misconceptions about them out there at present. This series of blog postings will explore the various issues and pitfalls surrounding these technologies and provide cautionary advice on how best to avoid them. In this first posting, I will look at why it is pointless to hope to find the "next big idea" in your data, and why one should instead leverage the extracted information towards operational excellence and market dominance.




Why is there no Unicorn in your BigData?

You won't be able to find the next big thing by mining your own data, because what you are really doing is building a highly efficient, expanding system to extract all the value from an infinite continuum of data within a finite domain [1].

OK, so what does this really mean? Let's break down the key elements of this statement: (i) Finite Domain, (ii) Infinite Continuum, and (iii) Expanding System.
  • (i) "Finite Domain": as a company, you are exploring the data generated within a finite market bounded by physical and economic limitations. Simply put, there is a maximum value that can be extracted from the industry or marketplace ecosystem in which you operate, and you are extracting information from that same domain.
  • (ii) "Infinite Continuum": within this finite ecosystem exists an infinite continuum; in practice, this translates into an infinite number of product variations needed to completely cover the potential consumer domains. This is a corollary of Cantor's diagonal argument.
  • (iii) "Expanding System": to deliver the optimal product implementation and capture the market completely, a company needs to adjust an infinity of small details. Perfection is not achievable, but you can keep moving towards it: you cannot capture this infinite complexity with finite software code, since doing so would require running it indefinitely, as Turing's halting problem demonstrates.
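The diagonal argument cited in point (ii) can be sketched concretely. In this illustrative toy (my framing, not the original post's), each product variant is an infinite sequence of yes/no feature choices; given any enumeration of variants, diagonalization constructs a variant the enumeration misses:

```python
# A minimal sketch of Cantor's diagonal argument, the result cited in
# point (ii). Each "variant" is modeled as a function index -> 0/1
# (an infinite sequence of feature choices). This framing is a
# hypothetical illustration, not taken from the original post.

def diagonal_counterexample(enumeration, n):
    """Return the first n digits of a sequence that differs from the
    i-th enumerated sequence at position i, for every i < n."""
    return [1 - enumeration(i)(i) for i in range(n)]

# Toy enumeration: sequence i is all zeros except a 1 at position i.
toy = lambda i: (lambda j: 1 if i == j else 0)

missing = diagonal_counterexample(toy, 5)
# `missing` disagrees with sequence i at digit i, so it cannot appear
# anywhere in the enumeration -- and this holds for any n you pick.
print(missing)  # [0, 0, 0, 0, 0]
```

However long you let the enumeration run, the constructed sequence stays outside it, which is the sense in which no finite (or even countable) catalogue of variants covers the continuum.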
Startups in their initial phases tend to excel at product discovery and refinement by iterating extremely fast. They are nimble and less rigid than their structured older siblings. Established companies can emulate such an approach by leveraging "BigData" and other machine learning techniques to move towards this ideal, albeit at a different pace, but with less risk, effectively compensating for their lack of agility with larger data sets to tap into.

However, leveraging this data won't magically enable them to break away from pre-existing market limitations, as they are trapped by the very premise they started from. Either they already have the "unicorn" idea, and leveraging analytics, operational excellence, and sheer luck will enable them to iterate as fast as possible towards explosive growth; or they are already in an established ecosystem, and to develop novel business models or innovative products they will need to venture far into the unknown. Data analytics will only help once they make this jump: unfortunately, it won't help them recognize, much less validate, true but unprovable novel ideas when they encounter them.

Enterprises looking to leverage data science solutions should be careful to understand the true benefits they can extract from such solutions, and avoid the false hope that these will miraculously bring them to the unicorn's pasture.

In the following post, we will look at why such technologies must still be embraced sooner rather than later.



[1] Inspired by the excellent post: "Gödel Incompleteness For Startups", Max Skibinsky, January 2013