Monday, March 28, 2016

Big Data Defined

With all the discussion about big data, there is a persistent problem: there is no general agreement on a definition. For some, it means data available on the internet generally; for others, it is data coming from social media or the internet of things. Some use the term only for unstructured data, while others also include structured data such as that held in relational databases.

Sometimes big data is defined according to the tools used to analyze it, such as Hadoop or Spark. For others it relates to data from enterprise systems, like ERP and CRM.

There are lots of definitions around. Wikipedia, for example, says "big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate." Most people would say this definition is too narrow.

Webopedia defines it as follows: "Big data is a buzzword, or catch-phrase, meaning a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques."

This definition is better because it focusses on structured and unstructured data, which encompasses both data from traditional business systems and internet data such as that from social media. It also refers to massive volume, which is one of the defining characteristics.

A more analytical approach to defining big data is through the use of the words Volume, Variety and Velocity, perhaps with the word Variability. On their own, these words do not clearly define big data, but they do provide a framework for a definition. Volume means very large amounts of data. Variety means data coming from very different sources, from business systems to the Internet of Things. Velocity is important because big data typically arrives fast and continuously, and there is now a trend toward streaming analytics in recognition of this. Variability means the data changes in volume, format and source.

Wrap these together and we can approach a definition. Big data is structured and unstructured data coming from a variety of sources, such as business systems, social media and the internet of things, moving at a high velocity and with frequently changing sources, formats and subject matter.

This definition may not be perfect or elegant, but at least it is broad yet specific, and it encompasses the generally understood characteristics of big data.

For further reading, check out these references from TechTarget, Wikipedia and Webopedia.



Thursday, March 24, 2016

Bank of Russia Implements XBRL for SMEs


Online media portal Russia Today is reporting that the Bank of Russia is planning to simplify procedures for the issuance of securities by SMEs, including the introduction of XBRL. The intention is to improve the bond market by cutting costs and improving the flow of information to investors. The bank also plans legislative changes to improve the overall handling of debt, with an eye towards reducing reliance on banks as a source of funding for SMEs.

More regulators are seizing on the opportunity presented by structured data and the resulting transparency to meet strategic aims like promoting economic growth, transforming capital markets or improving government performance. We expect to see more of this trend, and expect it to make its way down into the enterprise as well.

(source: XBRL International Newsletter)

Monday, March 21, 2016

Business Goes for Streaming Analytics

In a move that is likely to prove a watershed, business has been moving into Streaming Analytics. A recent report by Forrester Research pinpointed this trend in a study of the adoption of data analytics. The report pointed out that three years ago, businesses were struggling with ways to apply analytics to their existing data stores and some big data.

Forrester defines streaming analytics software as technology that "can filter, aggregate, enrich, and analyze a high throughput of data from multiple disparate live data sources and in any data format to identify simple and complex patterns to visualize business in real-time, detect urgent situations, and automate immediate actions." (source)
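To make the filter, aggregate and detect steps in that definition concrete, here is a minimal sketch in plain Python. It is not how any of the commercial products work internally; the `Reading` type, the `rolling_alerts` function, the 60-second window and the threshold of 100 are all assumptions made up for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class Reading:
    """One event from a live source (hypothetical shape, for illustration)."""
    timestamp: float  # seconds since the start of the stream
    source: str       # name of the device or system that sent it
    value: float      # the measurement being monitored


def rolling_alerts(
    readings: Iterable[Reading],
    window_seconds: float = 60.0,
    threshold: float = 100.0,
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, rolling_mean) each time the mean of the readings
    in the most recent time window rises above the threshold."""
    window = deque()
    total = 0.0
    for reading in readings:
        # Filter: ignore readings that are obviously invalid.
        if reading.value < 0:
            continue
        # Aggregate: add the new reading to the window...
        window.append(reading)
        total += reading.value
        # ...and evict readings that have fallen out of the time window.
        while window[0].timestamp <= reading.timestamp - window_seconds:
            total -= window.popleft().value
        mean = total / len(window)
        # Detect: flag the "urgent situation" as soon as it appears.
        if mean > threshold:
            yield (reading.timestamp, mean)
```

Because the function consumes readings one at a time and yields alerts as it goes, it can sit on top of any iterable source, whether a list used for testing or a generator reading from a live feed. Readings are assumed to arrive in timestamp order.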

A number of prominent software providers and technologies are available, including "Apache Spark Streaming, Apache Storm, Data Torrent, IBM, Informatica, SAP, Software AG, SQLstream, Strim (WebAction), TIBCO, and Vitria," says Forrester.

The advent of streaming analytics could herald a new era in business decision making. In the past, decisions have largely been based on historical information, with attempts to extrapolate into the future using whatever current information and intuition are available. Streaming analytics will reduce the uncertainty of this approach and add some real science to the decision-making process.




Thursday, March 17, 2016

Where Big Data Analytics is Headed

We hear much about big data and big data analytics. And we are told that it is the next big thing. But how much of this is hype? And where is it really going?

Forbes has published a summary of predictions that sheds some light on the whole matter. It highlights how the amount of data will grow exponentially and how analytics tools will move past SQL to Spark and others. More importantly, it talks of how new user-friendly tools that do not require programming expertise are being released, such as those from Microsoft and Salesforce. Even more importantly, it predicts that machine learning will play a big role in the future of data analytics, with perhaps even tools that operate free of people. It also stresses the importance of prescriptive analytics, which takes descriptive and predictive analytics to the next stage by indicating not only what will happen in the future but why it will happen, paving the way for serious support of decision making.

Overall the development of data analytics is heading for uncharted waters and may take directions we don't see yet.

For this thought-stimulating article in Forbes Magazine, follow this link.

Tuesday, March 15, 2016

Predictive Analytics can Improve Business Decisions

Predictive Analytics is a means of studying large amounts of data and drawing from it inferences about the future behaviour of customers, employees, stakeholders, and others. While other kinds of analysis can indicate problem areas in, say, sales, and tell management what isn't working, predictive analytics can indicate what policies are likely to work before they are implemented.

The availability of big data is particularly useful for predictive analytics because of the sheer volume of data and the coverage of behaviour it encompasses.

Companies are therefore using predictive analytics with increasing success in a variety of circumstances, including analyzing individual customer traits to determine how best to serve them and determining the most effective procurement strategies in advance. Specific information on the elasticity of demand can also be used to determine the best price/production strategies. There is a myriad of possible scenarios where predictive analytics can be used, which accounts for its popularity.
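As a rough illustration of the elasticity point, the sketch below computes the price elasticity of demand using the standard midpoint (arc) formula, along with the resulting change in revenue. The function names and the sample prices and quantities are invented for this example and do not come from any of the companies mentioned.

```python
def arc_elasticity(p1: float, q1: float, p2: float, q2: float) -> float:
    """Price elasticity of demand between two observations, using the
    midpoint (arc) formula: % change in quantity / % change in price."""
    if p1 == p2:
        raise ValueError("the two prices must differ")
    pct_change_quantity = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_price = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_quantity / pct_change_price


def revenue_change(p1: float, q1: float, p2: float, q2: float) -> float:
    """Change in revenue (price times quantity) between two observations."""
    return p2 * q2 - p1 * q1


# Hypothetical data: raising the price from 10 to 12 cut sales from 1000 to 800.
elasticity = arc_elasticity(10, 1000, 12, 800)
change = revenue_change(10, 1000, 12, 800)
print(f"elasticity: {elasticity:.2f}, revenue change: {change}")
```

With these made-up numbers the elasticity is below -1, meaning demand is elastic, so the price increase lowers revenue. An elasticity between -1 and 0 would suggest the opposite.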

For some specific examples, check out this site.

Monday, March 14, 2016

Data Analytics for the Internet of Things

As growing numbers of buildings, cars, appliances and other things get connected to the internet, the data generated by this connectivity grows in volume. And yet the variety and sheer scope of the data available almost defies interpretation and analysis. The task is not only one of analyzing data from different platforms, but the more difficult one of analyzing data coming from widely disparate devices.

Data showing driving conditions in a particular area, for example, can be distorted by a myriad of irrelevant events. The same goes for liveability conditions in a particular type of building. The volume of IoT data is so great that it can't even be analyzed on the cloud.

Data analytics is attempting to address this data, but it has been recognized that some sort of order needs to be brought into the data and the analytical approach taken.

A team of researchers supported by the National Science Foundation in the US is looking into this issue and is charged with developing a framework for conducting data analytics across a variety of IoT devices. The framework will consist of an organization of software that will facilitate communications and research into the data. The research is led by Stacy Patterson, the Clare Boothe Luce Assistant Professor of Computer Science at Rensselaer Polytechnic Institute (RPI). Read more at: http://phys.org/news/2016-03-internet-thingsa-framework-analytics-digital.html#jCp


Friday, March 11, 2016

Using Data Analytics for Developing Emerging Markets

Data Analytics, including that involving big data, is increasingly being used for decision making in a variety of organizations. Some of the data is embedded in new systems and some is gathered from the internet, for example from social media and through the internet of things.

The essential task of big data analytics is to transform raw and structured data from a variety of disparate sources into actionable knowledge.

This process involves using new tools, often based on Hadoop, such as Google Cloud Dataproc, alongside cloud data warehouses such as Google BigQuery. It makes use of contemporary computer capabilities such as high-speed communications, massive storage capability and super powerful processors.

A good example of big data analytics in action is that of emerging energy markets, such as the one in the Mexican part of the Gulf of Mexico.

"Data analytics is being used through most of the lifecycle of offshore activities. During seismic and reservoir characterization studies, data sources with 3D seismic data, well logs and faults, are integrated and analyzed to support decisions related to achieving key targets in flow assurance, field optimization, drilling performance, well categorization and so forth. Benefits range from attaining optimal reservoir exploitation rate to forecasting the decline of new wells.

"For fixed, floating and subsea assets, data analytics starts with collecting data at the asset level, including operating parameters, equipment status, structural stresses and environmental data. For moving assets such as offshore support vessels and dynamic positioning floaters/vessels, data collection can also include location, direction and speed."

In this way big data analytics facilitates decision making in a difficult market. For more on this particular application, check out this link.

Monday, March 07, 2016

Lloyds and Google Team Up

Lloyds Group has teamed up with Google to analyze big data relating to its insurance customers' non-personal behaviour. In the initial project they analyzed a year's worth of data in under one minute using Google BigQuery. One of the outcomes is that they were able to reduce certain response times from 96 hours to 30 minutes - a remarkable achievement.

The work will continue with a wider array of data and more Google tools for big data analytics, including Dataflow and Bigtable.

Big Data Analytics is starting to show real results, and while there is still a novelty factor to it, within the year it will be a competitive necessity in many industries.


Friday, March 04, 2016

Big Data Analytics Begins to Mature

The ability to make effective use of big data has been hampered by the lack of big data skills along with the lack of useful tools for analysis. Both of these areas are being addressed by interested organizations.

The lack of big data scientists has been bemoaned since the advent of big data and the realization of its potential. Rutgers is taking an important step forward by offering a new program that focuses on big data skills. On March 29, the Center for Innovation Education at Rutgers University will begin its 44-week skills-based technology career certificate program for professionals who want to gain skills in big data disciplines. Initially the course is only open to recent grads in the US. But it marks the beginning of much-needed data-oriented education that hopefully will spread.

There are also important innovations going on in the availability of analytical tools. Google features strongly in this field with its announcement that Google Dataproc, its managed Apache Hadoop and Apache Spark service, is now available to the public. Who other than the world's most prominent exploiter of data would step up to the major challenge of offering powerful user-oriented tools for big data analysis using the Hadoop system, which has been the core of much big data analytical activity?

We can expect a rash of new product announcements as big data gains in importance for business policy. Check out this article for more.