AIOps

Artificial Intelligence for IT Operations (AIOps) is a term coined by Gartner in 2016 as an industry category for machine learning analytics technology that enhances IT operations analytics. AIOps is short for "Artificial Intelligence Operations". The operations tasks involved include automation, performance monitoring and event correlation, among others. An AIOps platform has two main aspects: machine learning and big data. Observational data and engagement data are collected inside a big data platform, which requires a shift away from IT data segregated by section; a holistic machine learning and analytics strategy is then applied to the combined IT data. The goal is to enable IT transformation and to receive continuous insights that drive continuous fixes and improvements via automation. This is why AIOps can be viewed as CI/CD for core IT functions. Because IT operations are closely tied to cloud deployment and the management of distributed applications, AIOps has increasingly led to the coalescence of machine learning and cloud research.


Process

The normalized data is suitable to be processed through machine learning algorithms to automatically reduce noise and identify the probable root cause of incidents. The main output of this stage is the detection of any abnormal behavior from users, devices or applications. Noise reduction can be done by various methods, but most of the research in the field points to the following actions:

1. Analysis of all incoming alerts;
2. Removal of duplicates;
3. Identification of false positives;
4. Early anomaly, fault and failure (AFF) detection and analysis.

Anomaly detection, another step in any AIOps process, is based on the analysis of the past behavior of users, equipment and applications. Anything that strays from that behavior baseline is considered unusual and flagged as abnormal. Root cause determination is usually done by passing incoming alerts through algorithms that take into consideration correlated events as well as topology dependencies. The algorithms underlying the AI can be influenced directly, essentially by "training" them.
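
The steps described in this section can be illustrated with a minimal Python sketch. It is not taken from any particular AIOps product. The function names, the alert fields `source` and `message`, the z-score threshold of 3.0 and the "no alerting dependency" rule for root causes are all assumptions chosen for illustration. Real platforms typically use far richer models.

```python
from statistics import mean, stdev


def deduplicate(alerts):
    """Noise reduction: keep the first alert per (source, message) pair.

    The alert fields "source" and "message" are hypothetical.
    """
    seen = set()
    unique = []
    for alert in alerts:
        key = (alert["source"], alert["message"])
        if key not in seen:
            seen.add(key)
            unique.append(alert)
    return unique


def is_anomalous(history, value, threshold=3.0):
    """Anomaly detection: compare a value with its behavior baseline.

    The value is flagged when it lies more than `threshold` sample
    standard deviations from the mean of `history` (a simple z-score
    test, assumed here as one possible baseline model).
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


def probable_root_causes(alerting, depends_on):
    """Root cause determination using topology dependencies.

    `depends_on` maps a component to the components it depends on.
    An alerting component is reported as a probable root cause when
    none of its direct dependencies is alerting as well.
    """
    alerting = set(alerting)
    return sorted(
        component
        for component in alerting
        if not any(dep in alerting for dep in depends_on.get(component, ()))
    )


if __name__ == "__main__":
    alerts = [
        {"source": "db", "message": "high latency"},
        {"source": "db", "message": "high latency"},  # duplicate
        {"source": "web", "message": "5xx errors"},
    ]
    print(deduplicate(alerts))
    print(is_anomalous([100, 102, 98, 101, 99], 130))
    print(probable_root_causes(
        {"web", "api", "db"},
        {"web": ["api"], "api": ["db"], "db": []},
    ))
```

In this sketch the topology rule treats an alert on `web` as a likely symptom when `api`, which it depends on, is alerting too, so only the deepest alerting component is reported.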


Use

An important use of AIOps platforms is the analysis of large, unconnected datasets, such as the Johns Hopkins COVID-19 data published through GitHub. The data in this example is pulled from a large number of un-normalized databases: aggregated data (10 sources), US regional data (113 sources) and non-US data (37 sources). Traditional analysis models cannot process such data within the response time an emergency requires. Generally, the main areas of use for AIOps platforms and principles are (UPC.edu, "Top 10 Artificial Intelligence Trends in 2019"):

* Automation of tasks (DevOps)
* Machine learning platforms
* Augmented reality
* Agent-based simulations
* Internet of things (IoT)
* AI Optimized Hardware
* Natural language generation
* Streaming data platforms
* Conversational BI and analytics
* Deployment and integration testing
* System configuration
* Service quality monitoring and anomaly detection
* Resource scheduling and optimization
* Capacity/workload management and prediction
* Hardware/software failure prediction
* Auto-diagnosis and problem localization
* Incident management
* Auto service healing
* Data center management
* Customer support
* Security
* Privacy

