The term Big Data highlights the information explosion we are experiencing today: many databases, each of considerable volume, holding both structured and unstructured information. Databases referred to as Big Data have the following properties:
Volume: the amount of data accumulated in organizations is increasing rapidly (for example, data on executed deals are stored over many years in order to analyze patterns and trends in stock market trading). The volume also grows because additional sources of information are added to the database (for example, unstructured data received from social networks). All of these increase the amount of data to be analyzed.
Variety: data today comes in various formats: figures stored in traditional databases, as well as supplementary information retrieved from documents, emails, video clips, websites, social networks, etc.
Complexity: as noted above, the data are received from numerous sources. The information must therefore be cleaned and adapted, mainly by identifying patterns and relations that a person could not foresee unaided. Algorithms are used to identify these patterns, and they have been extended and adapted to work at the required scale and pace.
Velocity: customers demand complex answers in real time (i.e. as quickly as possible), so data flows quickly into the database and should be handled in the shortest time possible. Coping with the velocity at which data is received and processed is a challenge for most organizations, since it requires advanced storage that can hold large amounts of information and support rapid writing and retrieval.
For example, when attempting to obtain early-warning information, data from several information sources is cross-referenced (e.g. information from social networks such as Facebook and Twitter, information from systems such as border police records, cellular phone activity, etc.). The purpose of this lengthy process is to identify patterns and relations that might indicate possible future terrorist activity.
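The cross-referencing step described above can be sketched in Python. This is only a minimal illustration under stated assumptions: the source names, the record fields, the sample data, and the rule of flagging any identifier that appears in at least two sources are all hypothetical and not taken from any real system.

```python
from collections import defaultdict


def cross_reference(sources, min_sources=2):
    """Group records from several sources by a shared identifier.

    `sources` maps a source name to a list of (identifier, detail) pairs.
    Returns the identifiers found in at least `min_sources` distinct
    sources, each mapped to the details seen per source.
    """
    seen = defaultdict(dict)
    for source_name, records in sources.items():
        for identifier, detail in records:
            seen[identifier].setdefault(source_name, []).append(detail)
    return {
        identifier: by_source
        for identifier, by_source in seen.items()
        if len(by_source) >= min_sources
    }


# Hypothetical sample data; every name and value here is invented.
sources = {
    "social_network": [("id-17", "post at 09:12"), ("id-42", "post at 10:03")],
    "border_records": [("id-42", "entry on 2 May")],
    "phone_activity": [("id-42", "call at 10:05"), ("id-99", "call at 11:30")],
}

matches = cross_reference(sources)
for identifier, by_source in matches.items():
    print(identifier, sorted(by_source))
```

A real system would also have to reconcile identifiers that differ between sources, which is part of the cleaning and adapting work mentioned under Complexity.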
The difference between the familiar classic BI and Big Data:
Big Data is a sub-field of BI. Big Data systems offer great potential for data mining: the growth in available data makes it possible to identify interesting patterns and learn from them about the domain in question.
In classic BI, structured data related to the organization's activities is processed; the analysis includes cross-referencing and slicing information about past activities. When dealing with Big Data, however, structured information is combined with unstructured information (collected by advanced search and monitoring tools, such as those that analyze web surfing patterns).
In many cases when dealing with Big Data, the focus shifts from analyzing past events to predicting possible future ones. Questions of this kind can produce substantial insights for the organization concerning its future activity.
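The shift from describing the past to predicting the future can be illustrated with a small Python sketch. It assumes a hypothetical series of monthly sales figures and uses a plain least-squares trend line as a stand-in for whatever predictive model an organization would actually use; neither the data nor the choice of model comes from a real system.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ...

    Requires at least two values. Returns (slope, intercept).
    """
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales figures (invented for illustration).
monthly_sales = [100, 110, 120, 130]

# Classic BI question: what happened so far?
total_so_far = sum(monthly_sales)

# Predictive question: what is likely to happen next month?
slope, intercept = fit_line(monthly_sales)
forecast_next = slope * len(monthly_sales) + intercept

print(total_so_far, forecast_next)
```

The first figure summarizes past activity, while the second extrapolates the trend one period ahead, which is the kind of question the paragraph above describes.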
The difference between Big Data and Knowledge Management:
This is actually a subject that deserves an article of its own.