Big Data, Small Data, and Everything In Between

I’m definitely not the first person to say this, but the term “Big Data” is significantly overused.  In fact, Big Data has become a catch-all phrase for anything related to data analysis, regardless of whether it actually involves the data sources commonly associated with Big Data.  As a result, a lot of people probably think they are working with Big Data when in fact they are not.

So, what exactly is Big Data, how is it different from more conventional data sources, what is the benefit of using it, and how does it relate to organizational culture?

Over the past 7-10 years, our ability to collect and store digital information has increased exponentially.  In fact, it is estimated that the world’s per capita capacity to store data has doubled roughly every 40 months since the 1980s.  We currently generate about 2.5 exabytes of data every day (that’s about 500 million hard drives or 1 billion digital movie downloads).
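For readers who like to check the math, here is a rough back-of-the-envelope sketch of what the figures quoted above imply.  It assumes decimal units (1 exabyte = 10^18 bytes) and steady doubling; the function names are my own, chosen for illustration only.

```python
# Back-of-the-envelope check of the quoted figures (decimal units assumed).
EXABYTE = 10**18
GIGABYTE = 10**9

daily_bytes = 2.5 * EXABYTE  # "about 2.5 exabytes of data every day"


def implied_size_gb(total_bytes, count):
    """Average size in GB of each item if `count` items add up to `total_bytes`."""
    return total_bytes / count / GIGABYTE


def growth_factor(years, doubling_months=40):
    """How many times larger capacity becomes after `years` of steady doubling."""
    return 2 ** (years * 12 / doubling_months)


movie_gb = implied_size_gb(daily_bytes, 1_000_000_000)  # 1 billion movie downloads
drive_gb = implied_size_gb(daily_bytes, 500_000_000)    # 500 million hard drives
decade_growth = growth_factor(10)

print(f"Implied movie size: {movie_gb:.1f} GB")
print(f"Implied hard drive size: {drive_gb:.1f} GB")
print(f"Growth over 10 years: about {decade_growth:.0f}x")
```

Under these assumptions, the comparison works out to roughly 2.5 GB per movie and 5 GB per hard drive, and a 40-month doubling time means capacity grows about eightfold in a decade.  The hard drive figure is small by current standards, so that comparison is best read as a loose illustration rather than a precise conversion.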

This data comes from social media, GPS sensors, digital pictures, video, and purchase transactions, among other sources.  The term “Big Data” comes from our attempt to collect, compile, and analyze this information in meaningful ways.

More simply put, Big Data is information that is too large and complex to be handled with traditional data management practices.  Since Big Data comes in a variety of formats, it needs to be cleaned and structured before it can be analyzed.  Due to its size, this takes time and can be a very involved process.  In terms of ease of use, it is definitely on the more challenging side of things.  If you think of data management as a continuum, datasets (e.g., surveys) are on one end, databases (e.g., sales records, employee information) are in the middle, and Big Data is on the other end.  The larger the data source, the harder it is to manage.

The hype surrounding Big Data is usually focused on two things: (1) its sheer size, which potentially gives us a more representative sample of the population, and (2) its content, which allows us to look at areas that would not have been feasible to study 10 years ago.  Because of this, Big Data is touted as the key to unlocking our ability to understand large social, medical, biological, and emergent trends.  The ultimate goal is for businesses, nonprofits, consumers, and policy makers to leverage this data to make more informed choices and better understand the potential causes and effects of those choices.

While I’ll be the first to push for using data to better understand our clients’ needs, bigger isn’t always better.  Ultimately, the type of data we use depends on the scope of the questions we want to answer.  For instance, if you are trying to figure out which genes are linked to diabetes, then leveraging a large amount of complex data (i.e., Big Data) will probably help you reach your goal.  However, if you want to understand what contributes to employee satisfaction and performance, then a traditional dataset or database is more appropriate.  Most analyses of organizational culture leverage smaller data sources such as surveys, interviews, and metrics reports (see my post on using text mining to understand culture).  As of now, Big Data is too complex and resource intensive for most organizations to utilize; however, companies such as Google, Apple, and Facebook (which have both a sizable workforce and processing power) are likely to become pioneers in using Big Data to understand the inner workings of their organizations and fine-tune the customer-business relationship.

One last remark on Big Data: it is still in its infancy.  People are doing a lot of interesting things with complex data sources, but there are a number of kinks that still need to be worked out.  Researchers have a very good grasp of using Big Data to predict trends, but they are far behind in their ability to make inferences.  For instance, Amazon can recommend products based on your search and purchase history (prediction), but I am not sure whether they are able to determine how likely you are to actually purchase the product.  In other words, the models are robust, but our confidence in the models still needs some improvement.

At the end of the day, regardless of its format, data is becoming an integral part of the way we do business and make choices as customers.  While there will likely be only a handful of people doing the actual analysis, it will be essential for people to be critical consumers of data so that they are able to assess the quality of the analysis, understand its strengths and weaknesses, and discuss their thoughts with others.

Now that we’ve touched on the basics, you may be wondering how businesses acquire and use this data.  I’ll discuss the ethics of data in my next post (stay tuned).