Demystifying Big Data

Article by Govind Desikan

What the hell is big data? This is the question that arises in every IT stakeholder's mind. This article provides a primer on what big data, the new buzzword, actually is.

Big data, I believe, will change the world. It will be even bigger than the Internet. What's certain is that big data will impact everyone's life. Having said that, I also think that the term 'big data' is not very well defined and is, in fact, not well chosen. Let me use this article to explain what's behind the massive 'big data' buzz and demystify some of the hype.

Basically, big data refers to our ability to collect and analyze the vast amounts of data we are now generating in the world. The ability to harness these ever-expanding amounts of data is transforming how we understand the world and everything within it. Advances in analyzing big data allow us to, for example, decode human DNA in minutes, find cures for cancer, accurately predict human behavior, foil terrorist attacks, pinpoint marketing efforts and prevent diseases. Take this business example: a retail company can take data from your past buying patterns, its internal stock information, your mobile phone location data, social media and external weather information, and analyze all of this in seconds so that it can send a voucher for a home-cleaner to your phone while you are within a specific radius of a store that has the home-cleaner in stock. That's scary stuff, but one step at a time: let's first look at why we have so much more data than ever before. Simply put, big data is all about 'datafication'.

This datafication is caused by a number of things including the adoption of social media, the digitalization of books, music and videos, the increasing use of the Internet as well as cheaper and better sensors that allow us to measure and track everything. Just think about it for a minute:

  • When you read a book in the past, no external data was generated. If you now use a tablet device, the book publisher can track what you are reading, when you are reading it, how often you read it, how quickly you read it, and so on.
  • When you listened to CDs in the past, no data was generated. Now we listen to music on an iPhone or digital music player, and these devices record data on what we are listening to, when and how often, in what order, etc.
  • Today, most of us carry smart phones and they are constantly collecting and generating data by logging our location, tracking our speed, monitoring what apps we are using as well as who we are ringing or texting.
  • Finally, combine all this now with the billions of internet searches performed daily, the billions of status updates, wall posts, comments and likes generated on Facebook each day, the 400+ million tweets sent on Twitter per day and the 72 hours of video uploaded to YouTube every minute.

Google's executive chairman Eric Schmidt sums it up: "From the dawn of civilization until 2003, humankind generated five exabytes of data. Now we produce five exabytes every two days…and the pace is accelerating."

Not only do we have a lot of data, we also have a lot of different and new types of data: text, video, web search logs, sensor data, financial transactions and credit card payments etc. In the world of 'Big Data' we talk about the 4 Vs that characterize big data:

  • Volume - the vast amounts of data generated every second
  • Velocity - the speed at which new data is generated and moves around (credit card fraud detection is a good example where millions of transactions are checked for unusual patterns in almost real time)
  • Variety - the increasingly different types of data (from financial data to social media feeds, from photos to sensor data, from video capture to voice recordings)
  • Veracity - the messiness of the data (just think of Twitter posts with hash tags, abbreviations, typos and colloquial speech)
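
To make the velocity example a little more concrete, here is a minimal sketch of how a stream of card transactions might be checked for unusual amounts as they arrive. It is only an illustration, not how any real fraud system works: the class name `UnusualAmountDetector`, the window size, the threshold and the simple "distance from the recent average" rule are all assumptions chosen for brevity.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class UnusualAmountDetector:
    """Flags a transaction whose amount is far from a card's recent average.

    Hypothetical illustration: real fraud detection uses many more signals
    (location, merchant, timing) and far more sophisticated models.
    """

    def __init__(self, window=20, threshold=3.0, min_history=5):
        self.threshold = threshold      # how many standard deviations counts as "unusual"
        self.min_history = min_history  # do not judge a card until we have seen this many amounts
        # Keep only the most recent `window` amounts for each card.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def check(self, card_id, amount):
        """Return True if `amount` looks unusual for this card, then record it."""
        past = self.history[card_id]
        unusual = False
        if len(past) >= self.min_history:
            mu = mean(past)
            sigma = pstdev(past)
            # Skip the test when all past amounts are identical (sigma == 0).
            if sigma > 0 and abs(amount - mu) / sigma > self.threshold:
                unusual = True
        past.append(amount)  # the new amount becomes part of the history either way
        return unusual


detector = UnusualAmountDetector()
for amount in [10, 12, 11, 9, 10, 11]:
    detector.check("card-A", amount)        # ordinary purchases build up the history
suspicious = detector.check("card-A", 500)  # a sudden, much larger purchase
```

Each transaction is judged the moment it arrives, using only a small rolling window per card, which is what makes this kind of check feasible at high velocity.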

So, we have a lot of data, in different formats, that is often fast moving and of varying quality - why would that change the world? The reason the world will change is that we now have the technology to bring all of this data together and analyze it.

In the past we had traditional databases and analytics tools that couldn't deal with extremely large, messy, unstructured and fast moving data. Without going into too much detail, we now have software such as Hadoop that enables us to analyze large, messy and fast moving volumes of structured and unstructured data. It does this by splitting the task across many different computers (which is a bit like how Google breaks up the computation of its search function) and then combining their partial results. As a consequence, companies can now bring together different and previously inaccessible data sources to generate impressive results. Let's look at some real examples of how big data is used today to make a difference:

  • Investigative agencies are combining data from social media, CCTV cameras, phone calls and texts to track down criminals and predict the next terrorist attack.
  • Facebook is using face recognition tools to compare the photos you have uploaded with those of others to find potential friends of yours.
  • Politicians are using social media analytics to determine where they have to campaign the hardest to win the next election.
  • Video analytics and sensor data from cricket games are used to improve the performance of players and teams. For example, you can now buy a cricket bat with over 200 sensors in it that will give you detailed feedback on how to improve your game.
  • Artists are using data of our listening preferences and sequences to determine the most popular playlist for their live performances.
  • Google's self-driving car is analyzing a gigantic amount of data from sensors and cameras in real time to stay on the road safely.
  • The GPS information on where our phone is and how fast it is moving is now used to provide live traffic updates.
  • Companies are using sentiment analysis of Facebook and Twitter posts to determine and predict sales volume and brand equity.
  • Supermarkets are combining their loyalty card data with social media information to detect and leverage changing buying patterns. For example, it is easy for retailers to predict that a woman is pregnant simply based on the changing buying patterns. This allows them to target pregnant women with promotions for baby related goods.
  • A hospital unit that looks after premature and sick babies is generating a live stream of every heartbeat. It then analyzes the data to identify patterns. Based on the analysis, the system can now detect infections 24 hours before the baby would show any visible symptoms, which allows early intervention and treatment.

And these examples are just the beginning. Companies are barely starting to get to grips with the new world of big data. In conclusion, then, big data will change the world. In terms of language, I prefer to talk about 'datafication' in relation to the ever-growing amounts of data, and about 'large-scale analytics' (or simply 'analytics', because what is large now will be normal tomorrow) in relation to our ability to analyze and harness big data.
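
For readers who want a feel for what 'large-scale analytics' means in practice, the idea mentioned earlier of breaking a task up between many computers can be sketched in a few lines. This is not Hadoop's API; it is a single-machine illustration in which worker threads stand in for separate computers, and all function names (`split_into_chunks`, `count_hashtags`, `merge_counts`, `parallel_hashtag_count`) are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Deal the records out into n_chunks roughly equal piles."""
    n_chunks = max(1, n_chunks)
    return [records[i::n_chunks] for i in range(n_chunks)]


def count_hashtags(chunk):
    """'Map' step: each worker counts hashtags in its own pile only."""
    counts = Counter()
    for post in chunk:
        for word in post.split():
            if word.startswith("#") and len(word) > 1:
                counts[word.lower()] += 1
    return counts


def merge_counts(partial_counts):
    """'Reduce' step: add the workers' partial counts together."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def parallel_hashtag_count(posts, workers=4):
    """Split the posts, count each pile independently, then merge the results."""
    chunks = split_into_chunks(posts, workers)
    # Threads stand in here for the separate machines of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_hashtags, chunks))
    return merge_counts(partials)


posts = [
    "Love #BigData",
    "#bigdata and #Hadoop",
    "no tags here",
    "#hadoop rocks #BigData",
]
totals = parallel_hashtag_count(posts)
```

The point of the design is that no worker ever needs to see the whole data set: each one handles its own pile, and only the small partial counts are brought together at the end.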

At the moment, I work with executive teams of companies spanning all sectors and sizes to help them develop strategies to harness big data. I find each of these discussions and projects fascinating because they all open up new opportunities.