As a business information category, “Big Data” (BD) is typically defined as “one or more extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.” The definition is easy enough to read, but what does it mean in practice, and more importantly, how does an enterprise develop and employ BD so that it delivers value on a recurring basis?
Big Data’s effectiveness does not come from shoehorning multiple databases into a common framework and hoping that the resulting mass of data will yield meaningful information at the end of the day. Granted, record density is a central element, but density alone is a long way from establishing final product value.
The BD theory:
Thankfully, we have had an elemental roadmap to work from since the early part of the millennium, when Gartner analyst Doug Laney argued that the value of Big Data was characterized by three equal parts, which in turn led to the creation of this newly emergent information category:
Volume: Aggregated data elements delivered by multiple sources, including current and historical business transactions, active and passive social media, and the recognition and delivery of sensor-based or machine-to-machine record output.
Velocity: High-rate, real-time data mechanisms coupled with equally capable application management frameworks, including RFID tagging, Near-Field Communication (NFC) sensor arrays, smart load management and metering, and similar sources, supported by DevOps software delivery methodologies.
Variety: Ready access to records regardless of type and format, from structured and alphanumeric data to legacy repositories of structured and unstructured text, email, video, audio, and commercial/financial metrics.
Nevertheless, while Mr. Laney gave us a reasonable intellectual baseline, there was much more to consider, such as how the central BD value proposition would be applied in the real world. In practice, various legacy programming languages and development doctrines were pressed into service at first.
However, the market soon began to innovate, demanding faster data identification and manipulation along with global access to raw data arriving through previously unknown channels such as mobile devices. These legacy approaches came to be seen as weak links in the overall production chain, since they could not keep pace with the trends set by BD’s much faster, multi-nodal model of data identification and exchange.
Hadoop’s genesis and what it does:
The resolution of these concerns tended to start from the bottom and work up. Since BD’s original theoretical value was grounded in the ability to rapidly search and identify data at global scale, then bring it home in a useful form, it is no surprise that Hadoop emerged from the search world. In 2002, Doug Cutting and Mike Cafarella began working on the nugget of an idea: a “better” open-source search engine.
They were making progress when they read Google’s 2003 paper on the Google File System (GFS); Google’s 2004 MapReduce paper, describing the applied use of parallel processing, reinforced the new thinking. For Cutting and Cafarella it was an intellectual epiphany, and the work that followed laid the initial underpinnings of what we know today as Hadoop.
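To make the MapReduce idea concrete, here is a minimal sketch of the classic word-count job written against Hadoop’s Java MapReduce API. It is offered purely as an illustration of the map and reduce stages described in those papers; the class names and the input/output paths are placeholders rather than anything drawn from the sources cited above.

// Minimal sketch: the canonical word-count job on Hadoop's MapReduce Java API.
// Illustrative only; paths and class names are placeholders.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: each node scans its slice of the input and emits (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce step: the framework groups pairs by word, and each reducer sums the counts.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. an HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The map step runs in parallel wherever the input blocks happen to live, and the framework handles the shuffle and grouping between the two stages, which is precisely the property that made the model so attractive for search-scale workloads.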
Through a series of professional and technical confluences, including its adoption as an Apache project, the framework was taken up and extended by Yahoo as a large-scale commercial research tool able to handle “…42,000 nodes and hundreds of petabytes of storage.” The construct was then extended further by other parties, including Christophe Bisciglia, who, working with the University of Washington, IBM, and the National Science Foundation, went on to help launch the first Hadoop-based enterprise, Cloudera, in 2008.
Where the Big Data + Hadoop + DevOps model comes together:
Now that we’ve established what BD is, where Hadoop came from, what it does, and how wide its data universe extends, the intrinsic utility delivered by big data consulting grounded in the DevOps methodology should be a bit clearer. By applying that management process and leveraging Hadoop services and consultants, an enterprise can stand up extended reach-and-fetch infrastructures quickly and at minimal cost.
These elemental values are extended further by Hadoop and big data development capabilities, applied in concert with the right human expertise. Because the combination achieves the core BD goal of delivering highly active, large-scale information products on demand while mitigating or eliminating production slowdowns, BD, Hadoop, and DevOps together represent an optimal enterprise information opportunity.