THE INFLUENCE OF BIG DATA ON OPERATIONS.
Our applications and systems are generating more and more data. This is not just user-generated data but also data generated by the applications and systems themselves: logging from applications, systems, and networks, as well as metrics from those same sources. Even a small application can quickly generate 15 MB of data a day, and that is a conservative estimate. (For comparison, the Bible is about 3.2 MB.) As humans we can’t read or comprehend this amount of data, so when something goes wrong, most people search through the data they have to find out what is wrong.
Most people learn some of the errors by heart and search only for those. As a result, 90% of all data that is created is never used, generally because we always look for the same errors or metrics and ignore the rest; there is simply too much data.
When we try to change the way we look at data by using dashboards that show counters, averages, and percentages, we are really just reducing the amount of data we have to look at. This has side effects: the result is a one-dimensional view of the data, and it shows only the perspective of the person who created the dashboard. Dashboards also cluster data, which always loses the fine granularity of the data. This can happen along the time axis or along any other element of the data, such as log lines or response times.
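To illustrate how a dashboard figure can hide detail, here is a minimal sketch in Python. The response times are invented for the example, and `summarize` is a hypothetical helper, not part of any dashboard product.

```python
from statistics import mean

# Hypothetical response times in milliseconds for one minute of traffic:
# 59 requests are fast, one request takes five seconds.
response_times_ms = [100] * 59 + [5000]


def summarize(samples):
    """Reduce raw samples to the kind of figures a dashboard typically shows."""
    return {
        "count": len(samples),
        "average_ms": mean(samples),
    }


summary = summarize(response_times_ms)

# The raw data still contains the one request that a user actually noticed.
slow_requests = [t for t in response_times_ms if t > 1000]

print(summary)
print(slow_requests)
```

The average comes out well under 200 ms, which looks healthy on a dashboard, while the raw data shows that one request took five seconds. Which figures to keep is a design choice, and every choice hides something.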
All this comes at a cost: not just missed problems or longer search times, but a real monetary cost, because we still need to process and store all the data somewhere, regardless of whether it is used or not.
HOW TO IMPROVE THINGS.
We need to look very carefully at all the data that is generated to see what is really useful, so that we can reduce the amount of data we create.
In most cases this has already happened. If not, you will need to go through your logging and metrics with determination and eliminate everything that neither helps nor indicates a problem. Even then, most applications will still produce a large amount of data, and you will need to start looking into automating your operations.
- The speed of response to issues.
- The influence of developers on their environment, without accountability.
- Correlation between events from different sources.
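The clean-up step described above, going through the logging to see what is really useful, could start with something like the following sketch. The log lines and their format (date, time, level, message) are assumptions made for the example, and `template_of` is a hypothetical helper.

```python
import re
from collections import Counter

# Hypothetical log lines; a real audit would read them from your log store.
log_lines = [
    "2024-01-01 10:00:01 DEBUG cache hit for key 1234",
    "2024-01-01 10:00:02 DEBUG cache hit for key 5678",
    "2024-01-01 10:00:03 DEBUG cache hit for key 9012",
    "2024-01-01 10:00:04 ERROR payment failed for order 42",
    "2024-01-01 10:00:05 INFO heartbeat ok",
]


def template_of(line):
    """Drop the timestamp and mask numbers so similar lines group together."""
    _date, _time, rest = line.split(" ", 2)
    return re.sub(r"\d+", "<n>", rest)


# Count how often each kind of line occurs, noisiest first.
counts = Counter(template_of(line) for line in log_lines)
for template, count in counts.most_common():
    print(count, template)
```

The lines at the top of such a list are the first candidates to question: if nobody ever searches for them and they do not indicate a problem, they may not need to be written at all.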
AUTOMATION OF OPERATIONS
The easiest way to start automating your operations is to put applications in place that evaluate the current data against simple rules and send out notifications when problematic conditions arise. Some applications that may already be in use in your organization, such as Elasticsearch for logging or Grafana for metrics, have features built in to do this. These applications are very generic and very easy to set up.
Or you can go one step further and set up specialized applications that are built for this purpose, such as Prometheus, which specializes in alerting on metrics.
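The rule-based approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Elasticsearch, Grafana, or Prometheus implement alerting; the `Rule` class, the metric names, and the thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    metric: str
    condition: Callable[[float], bool]
    message: str


def evaluate(rules, current_metrics, notify):
    """Check each rule against the current values and notify on a match."""
    fired = []
    for rule in rules:
        value = current_metrics.get(rule.metric)
        if value is not None and rule.condition(value):
            notify(f"[{rule.name}] {rule.message} (current value: {value})")
            fired.append(rule.name)
    return fired


# Hypothetical rules and a hypothetical snapshot of current metrics.
rules = [
    Rule("high-cpu", "cpu_percent", lambda v: v > 90, "CPU usage above 90%"),
    Rule("disk-full", "disk_free_gb", lambda v: v < 5, "Less than 5 GB of disk free"),
]
current_metrics = {"cpu_percent": 97.0, "disk_free_gb": 120.0}

# Notifications are collected in a list here; a real setup would send
# an e-mail, a chat message, or a page instead.
sent = []
fired = evaluate(rules, current_metrics, notify=sent.append)
print(sent)
```

Note that the rules only see the snapshot they are given. They know nothing about what happened an hour ago or what another tool is reporting.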
The above solutions have a few things in common.
- They will only work on current conditions.
- They work best for one type of data. (Elasticsearch works best for logging; Prometheus works only with time series.)
- They do not correlate data between sources.
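To make the last point concrete, the sketch below shows the simplest possible form of correlation between two sources: pairing events that happen close together in time. The events, the 30-second window, and the `correlate` function are assumptions made for the example; real correlation in an operations environment is considerably more involved.

```python
from datetime import datetime, timedelta

# Hypothetical events from two separate tools: a log store and a metrics system.
log_errors = [
    (datetime(2024, 1, 1, 10, 0, 5), "ERROR database connection timeout"),
    (datetime(2024, 1, 1, 11, 30, 0), "ERROR invalid user input"),
]
metric_alerts = [
    (datetime(2024, 1, 1, 10, 0, 0), "db_latency_ms above threshold"),
    (datetime(2024, 1, 1, 14, 0, 0), "cpu_percent above threshold"),
]


def correlate(events_a, events_b, window=timedelta(seconds=30)):
    """Pair events from two sources that occur within `window` of each other."""
    pairs = []
    for time_a, text_a in events_a:
        for time_b, text_b in events_b:
            if abs(time_a - time_b) <= window:
                pairs.append((text_a, text_b))
    return pairs


related = correlate(log_errors, metric_alerts)
print(related)
```

Here the database timeout in the logs is paired with the latency alert from the metrics, while the unrelated events are left alone. A tool that only looks at one type of data cannot make this connection.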
What if that is not enough and you want more?
AIOPS TO THE RESCUE.
Artificial intelligence and machine learning applications help to make sense of your data. These new monitoring technologies provide a unified view of all components of a service, from the application code to the infrastructure. When such systems are integrated into the operations environment, they are usually called AIOps applications.
The definition of AIOps:
A multi-layered technology platform that automates and enhances IT operations by:
- Using analytics and machine learning to analyze big data collected from various IT operations tools and devices.
- Automatically spotting and reacting to issues in real time.
AIOps applications bring together several different disciplines of your operations:
- Service Management
- Performance Management
AIOps applications can help you find solutions in the following areas:
- Manual management of applications and infrastructure.
- The amount of data generated by services, from applications to infrastructure.
- The speed of response to issues.
What should a good AIOps application be able to do?
- Read data from many different sources. (You should be able to keep your current data storage solutions and simply import the data automatically from them.)
- Real-time processing of streaming data. (There is a lot of data that you will want to process directly from your event buses or applications. This will become more important in the long run because it is the quickest way to get results.)
- Rule application. (Being able to write rules that the AIOps application applies to the data.)
- Pattern recognition. (Finding patterns in the data.)
- Custom algorithms for any IT business. (The ability to fine-tune the application to your needs.)
- Machine learning. (The ML part of the system should be able to automatically alter existing algorithms or create new ones, based on the output of algorithmic analysis and any new data introduced into the system.)
- Artificial intelligence. (Adapting to unknown environments.)
- Automation. (Based on the outcome of the rules and ML, it should automatically create and apply a response or improvement for identified issues and situations.)
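A few of the capabilities listed above (processing a stream, recognizing a deviation from a pattern, and triggering an automated response) can be sketched together in Python. This is a deliberately simple stand-in: a rolling statistical baseline is used in place of real machine learning, and `RollingDetector`, `restart_service`, the window size, and the threshold are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev


class RollingDetector:
    """Flag values that deviate strongly from a rolling baseline."""

    def __init__(self, window=20, threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, value):
        is_anomaly = False
        if len(self.history) >= self.min_samples:
            baseline = mean(self.history)
            spread = stdev(self.history)
            # Guard against a perfectly flat baseline, where the spread is zero.
            if spread > 0 and abs(value - baseline) > self.threshold * spread:
                is_anomaly = True
        # Only learn from normal values, so an outage does not become the baseline.
        if not is_anomaly:
            self.history.append(value)
        return is_anomaly


def restart_service(value):
    """Placeholder for an automated response; it only describes the action."""
    return f"restart requested (latency {value} ms)"


# Hypothetical stream of latency measurements arriving one at a time.
stream = [100, 102, 98, 101, 99, 103, 97, 100, 450, 101]

detector = RollingDetector()
actions = [restart_service(v) for v in stream if detector.observe(v)]
print(actions)
```

Unlike a fixed threshold rule, the baseline here adapts to what the data normally looks like, so nobody has to decide in advance what value counts as "too high". A real AIOps application would go much further, but the shape is the same: observe, compare against learned behavior, and act.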
AI VERSUS ML
A small side note: the term AIOps is somewhat misleading. Most of the applications that are now available are based on rules and machine learning. Real artificial intelligence is not implemented anywhere (with good reason)! Machine learning has been used in the IT world for some time: by large social media firms, in applications like Google Maps, Yelp, and Waze, and extensively by online marketplaces. It appears anywhere there is a need for reliable real-time responses, dynamically changing conditions, and user customization. There is built-up knowledge of how these systems work, and confidence in them. This is not the case for real AI systems.
IT operations personnel are, in general, conservative in their use of new technologies. But we need to be able to adapt to fast-changing environments. We need to keep pace with new changes and deal with them directly. We need to handle big data and get useful information out of it. We need a good overview of the complex environments we manage, and we need to be able to manage application systems with less manpower. To meet these challenges I think we need these complex AIOps tools, or we will not be able to see the forest for the trees.
Marcel Koert. B.S.E.E.