From “Principles of Big Data” (J.J. Berman, 2013: p. xxiii, Morgan Kaufmann)
“Generally, Big Data come into existence through any of several different mechanisms.
1. An entity has collected a lot of data, in the course of its normal activities, and seeks to organize the data so that materials can be retrieved, as needed. The Big Data effort is intended to streamline the regular activities of the entity. In this case, the data is just waiting to be used. The entity is not looking to discover anything or to do anything new. It simply wants to use the data to do what it has always been doing—only better. The typical medical center is a good example of an “accidental” Big Data resource. The day-to-day activities of caring for patients and recording data into hospital information systems results in terabytes of collected data in forms such as laboratory reports, pharmacy orders, clinical encounters, and billing data. Most of this information is generated for a one-time specific use (e.g., supporting a clinical decision, collecting payment for a procedure). It occurs to the administrative staff that the collected data can be used, in its totality, to achieve mandated goals: improving quality of service, increasing staff efficiency, and reducing operational costs.
2. An entity has collected a lot of data in the course of its normal activities and decides that there are many new activities that could be supported by their data. Consider modern corporations—these entities do not restrict themselves to one manufacturing process or one target audience. They are constantly looking for new opportunities. Their collected data may enable them to develop new products based on the preferences of their loyal customers, to reach new markets, or to market and distribute items via the Web. These entities will become hybrid Big Data/manufacturing enterprises.
3. An entity plans a business model based on a Big Data resource. Unlike the previous entities, this entity starts with Big Data and adds a physical component secondarily. Amazon and FedEx may fall into this category, as they began with a plan for providing a data-intense service (e.g., the Amazon Web catalog and the FedEx package-tracking system). The traditional tasks of warehousing, inventory, pickup, and delivery had been available all along, but lacked the novelty and efficiency afforded by Big Data.
4. An entity is part of a group of entities that have large data resources, all of whom understand that it would be to their mutual advantage to federate their data resources. An example of a federated Big Data resource would be hospital databases that share electronic medical health records.
5. An entity with skills and vision develops a project wherein large amounts of data are collected and organized to the benefit of themselves and their user-clients. Google, and its many services, is an example (see Glossary items, Page rank, Object rank).
6. An entity has no data and has no particular expertise in Big Data technologies, but it has money and vision. The entity seeks to fund and coordinate a group of data creators and data holders who will build a Big Data resource that can be used by others. Government agencies have been the major benefactors. These Big Data projects are justified if they lead to important discoveries that could not be attained at a lesser cost, with smaller data resources.” (J.J. Berman)