Jinfo: Defining Big Data: The Four Vs

Jinfo Blog

26th April 2013

Abstract

Security and speed of data transmission, accuracy of information stored and where to store vast amounts of data are just some of the issues that need to be tackled when considering big data projects. In his second article on the topic, Victor Camlek looks in detail at the four Vs (Volume, Velocity, Vaulting and Validation) and how they define the essence of big data.

My first article in this series of three on "big data" introduced the notion of the "Four Vs". Here I look at each V in more detail.

Based on a review of numerous sources it has become clear that there is a user-friendly and basic way to describe the essence of big data using a series of terms that happen to start with a “V”. As always there are enough V candidates to have some variation among the ones that are used. Here are my choices, simple terms that when viewed together should provide a working definition that covers the various dimensions of big data.

Volume

It has become commonplace for analysts to agree that V-1 stands for Volume. I still recall a period when terabyte networks were considered to be the major coming attraction. Well, fast forward to 2013 and any meeting devoted to the amount of data that must be dealt with will describe volumes of data that are much greater, such as petabytes, exabytes, zettabytes, and even yottabytes. Without getting technical, let me say those terms represent a lot of data. For example, in everyday living you can now easily buy an external USB terabyte drive for around $150. This leads me to recall a statement by Google chief Eric Schmidt, who said, “There were 5 exabytes of information created between the dawn of civilization through 2003,” … “but that much information is now created every 2 days, and the pace is increasing.” Given the increasing amount of complex data that we are now collecting, it becomes very difficult even to estimate the amount of data that is now ripe for collection. So in a generally accepted formula, V-1 means that we are dealing with an enormous volume of data.
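For readers who want a feel for the scale of these units, here is a minimal sketch in Python. It assumes decimal (SI) units, where each step up is a factor of 1,000, and a 1-terabyte drive like the one mentioned above; the names `BYTES_PER` and `drives_needed` are made up for this illustration.

```python
# Decimal (SI) byte units; each step up is a factor of 1,000.
UNITS = ["terabyte", "petabyte", "exabyte", "zettabyte", "yottabyte"]
BYTES_PER = {name: 10 ** (12 + 3 * i) for i, name in enumerate(UNITS)}


def drives_needed(total_bytes, drive_bytes=BYTES_PER["terabyte"]):
    """Number of drives of the given size needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // drive_bytes)  # ceiling division on integers


# The 5 exabytes from the Schmidt quote, expressed in 1-terabyte drives.
five_exabytes = 5 * BYTES_PER["exabyte"]
print(drives_needed(five_exabytes))  # 5000000
```

In other words, the 5 exabytes in the quote would fill roughly five million of those consumer terabyte drives.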

Velocity

Nor is there much debate about V-2, which is all about the dramatic speed at which data can now be transmitted. Data is simply moving to and fro faster than ever before. Whereas leased lines used by corporations for voice and data applications used to seem fast at around T-1 speeds (1.544 Mbps), there are now much faster options available, along with routers capable of working at enormous speeds and handling a staggering number of concurrent connections. I’m by no means a networking expert, but I can say with certainty that the velocity at which all forms of data may travel, including video, is definitely the second element of big data.
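To put the T-1 figure in perspective, the following rough Python sketch estimates how long it would take to move a single terabyte over such a line. It is an idealised calculation that ignores protocol overhead, and the 1 Gbps comparison link is an assumption added for illustration only; `transfer_days` is a hypothetical helper name.

```python
T1_BITS_PER_SECOND = 1_544_000           # T-1 line rate, 1.544 Mbps
GIGABIT_BITS_PER_SECOND = 1_000_000_000  # assumed comparison link, 1 Gbps

SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes, bits_per_second):
    """Idealised transfer time in days (no protocol overhead, full line rate)."""
    seconds = num_bytes * 8 / bits_per_second
    return seconds / SECONDS_PER_DAY


one_terabyte = 10 ** 12  # decimal terabyte, in bytes

print(transfer_days(one_terabyte, T1_BITS_PER_SECOND))       # roughly 60 days
print(transfer_days(one_terabyte, GIGABIT_BITS_PER_SECOND))  # well under one day
```

Under these assumptions a terabyte would occupy a T-1 line for about two months, which is why Volume and Velocity have to be considered together.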

Vaulting

I saw an internet piece by Steve Baunach, founder of Stratview, where he used three Vs (Volume, Velocity, and Variety) to clarify big data. His take was that there is a great variety of data - such as video. This is true, but whilst almost everyone agrees with volume and velocity, I’d argue that formats are pretty well covered within the first two Vs. To me the third significant V revolves around another critical aspect of the problem: Vaulting, or simply the Data Vault, as shorthand for secure storage. I recall a meeting during my days at Telcordia Technologies quite a while ago, where I was a fly on the wall and observed some brilliant people engaged in a raging debate about the future of data speeds, protocols and bandwidth. I vividly recall the chief scientist calling the crew of passionate techies to order and proclaiming in a most emphatic manner, “Guys, please do not forget about the storage, we need to put this data somewhere!” This proved to be a prophetic moment, and for today’s purpose it supports my notion of the importance of the Information or Data Vault.

The vault concept appears to come out of the banking industry and refers to the need for secure storage of valuable currency, which big data has certainly become. Very often the Data Vault is housed off-campus. Also, given the growing volume of data, this leads to a build vs. outsource business problem that is very visible in today’s environment every time someone speaks about “the cloud”. The cloud represents the ultimate storage solution for large enterprises that simply can no longer afford their own data storage infrastructure. More and more enterprise decision makers are choosing to offload the storage of data to firms that can guarantee reliable, secure and easy access by virtue of their ability to create giant server farms for business users. Data can be a wonderful thing, but like any other valuable possession it needs to be housed somewhere safe and it must be available when needed. When the cost of managing this prized resource becomes too great there is a need to use a bigger and safer vault, known today as the cloud.

Data Vault Modelling refers to the process of gathering knowledge about collected data so that the history of the way it was gathered may be documented. However, Data Vault Modelling does not address the accuracy of the data. If data is to support critical decisions or actions, a further aspect is needed. This leads to the fourth "V", for Validation.

Validation

In my view, the fourth V, and arguably the most critical aspect of big data, is the need to make sure that the trillions and trillions of bytes we collect are accurate. The old saying of “garbage in, garbage out” was never more true. Big data is not simply about collection, storage and speed. It is truly about finding ways to make the data actionable and even predictive. Big data is now essential to support both routine and mission-critical business decisions. Hence my proposition that Validation, as the fourth V, is vital (yet another V-word) to an understanding of big data. It is a no-brainer to say that business leaders must be able to trust the validity of the data used to make critical business decisions.

Hopefully this non-technical view was helpful. I would also like to address one more question, “What should big data mean to the information professional?” Stay tuned for my next article, which will attempt to address the role of the corporate information professional in the world of big data.

Editor's Note: Big Data in Action

This article is part of the FreePint Topic Series: Big Data in Action, which includes articles, reports, webinars and resources published between April and June 2013. Learn more about the series here.

Articles in series: