LogMine: Fast Pattern Recognition for Log Analytics

ABSTRACT

Modern engineering incorporates smart technologies in all aspects of our lives. Smart technologies are generating terabytes of log messages every day to report their status.

It is crucial to analyze these log messages and present usable information (e.g. patterns) to administrators, so that they can manage and monitor these technologies. Patterns minimally represent large groups of log messages and enable the administrators to do further analysis, such as anomaly detection and event prediction.

Although patterns commonly exist in automated log messages, recognizing them in massive sets of log messages from heterogeneous sources without any prior information is a significant undertaking. We propose a method, named LogMine, that extracts high-quality patterns from a given set of log messages. Our method is fast, memory efficient, accurate, and scalable.

LogMine is implemented in the map-reduce framework for distributed platforms to process millions of log messages in seconds. LogMine is a robust method that works for heterogeneous log messages generated in a wide variety of systems. Our method exploits algorithmic techniques to minimize the computational overhead, based on the fact that log messages are always automatically generated. We evaluate the performance of LogMine on massive sets of log messages generated in industrial applications. LogMine has successfully generated patterns that are as good as the patterns generated by an exact but unscalable method, while achieving a 500× speedup.

Finally, we describe three applications of the patterns generated by LogMine in monitoring large-scale industrial systems.

1. INTRODUCTION

The Internet of Things (IoT) enables advanced connectivity of computing and embedded devices through internet infrastructure. Although computers and smartphones are the most common devices in IoT, the number of “things” is expected to grow to 50 billion by 2020.

IoT involves machine-to-machine communications (M2M), where it is important to continuously monitor connected machines to detect any anomaly or bug, and resolve them quickly to minimize the downtime. Logging is a commonly used mechanism to record machines’ behaviors and various states for maintenance and troubleshooting. An acceptable logging standard is yet to be developed for IoT, most commonly due to the enormous varieties of “things” and their fast evolution over time. Thus, it is extremely challenging to parse and analyze log messages from systems like IoT.

An automated log analyzer must have one component to recognize patterns from log messages, and another component to match these patterns with the inflow of log messages to identify events and anomalies. Such a log message analyzer must have the following desirable properties:

No-supervision:

The pattern recognizer needs to work from scratch without any prior knowledge or human supervision. For a new log message format, the pattern recognizer should not require any input from the administrator.

Heterogeneity:

There can be log messages generated from different applications and systems. Each system may generate log messages in multiple formats. An automated recognizer must find all formats of the log messages irrespective of their origins.

Efficiency:

IoT-like systems generate millions of log messages every day. The log processing should be done so efficiently that the processing rate is always faster than the log generation rate.

Scalability:

The pattern recognizer must be able to process massive batches of log messages to maintain a current set of patterns without incurring CPU and memory bottlenecks.
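
The two-component structure described above (a recognizer that derives patterns and a matcher that checks incoming messages against them) can be sketched as follows. This is only an illustration under stated assumptions: the recognizer here is a deliberately naive stand-in that replaces every digit-bearing token with a wildcard, and it is not LogMine's algorithm. The function names, the `*` wildcard, and the sample messages are all hypothetical.

```python
WILDCARD = "*"  # hypothetical placeholder for a variable field


def to_pattern(message):
    """Naive stand-in recognizer: mask every token that contains a digit."""
    tokens = message.split()
    return " ".join(
        WILDCARD if any(ch.isdigit() for ch in tok) else tok for tok in tokens
    )


def recognize_patterns(messages):
    """Component 1: derive patterns from raw messages with no supervision.

    Returns a dict mapping each pattern to the number of messages it covers.
    """
    counts = {}
    for msg in messages:
        pattern = to_pattern(msg)
        counts[pattern] = counts.get(pattern, 0) + 1
    return counts


def matches(pattern, message):
    """True if the message has the same tokens as the pattern, wildcards aside."""
    p_tokens = pattern.split()
    m_tokens = message.split()
    return len(p_tokens) == len(m_tokens) and all(
        p == WILDCARD or p == m for p, m in zip(p_tokens, m_tokens)
    )


def find_anomalies(patterns, stream):
    """Component 2: report incoming messages that match no known pattern."""
    return [msg for msg in stream if not any(matches(p, msg) for p in patterns)]


history = [
    "user 17 logged in from 10.0.0.5",
    "user 42 logged in from 10.0.0.9",
    "disk sda1 usage 81%",
]
patterns = recognize_patterns(history)
incoming = ["user 7 logged in from 10.0.0.1", "kernel panic on cpu 3"]
anomalies = find_anomalies(patterns, incoming)
```

In this toy run the three historical messages collapse into two patterns, and only the message with an unseen format is reported. A real recognizer must also handle variable fields that contain no digits, which this stand-in does not.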

Many companies such as Splunk, Sumo Logic, Loggly, LogEntries, etc. offer log analysis tools. Open source packages such as ElasticSearch, Graylog and OSSIM have also been developed to analyze logs. Most of these tools and packages use regular expressions (regex) to match with log messages. These tools assume that the administrators know how to work with regex, and there are plenty of tools and libraries that support regex. However, these tools do not have the desirable properties mentioned earlier.

By definition, these tools support only supervised matching. Human involvement is clearly non-scalable for heterogeneous and continuously evolving log message formats in systems such as IoT, and it is humanly impossible to parse the sheer number of log entries generated in an hour, let alone days and weeks.

On top of that, writing regex rules is slow, frustrating, and error-prone, and the rules may conflict with each other, especially for IoT-like systems. Even if a set of regex rules is written, the rate of processing log messages can be slow due to overgeneralized regexes.
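
To make the supervised, regex-based approach concrete, here is a minimal sketch using Python's standard `re` module. The rule names, the expressions, and the sample messages are hypothetical and are not taken from any of the tools named above. The sketch shows that every log format needs a hand-written rule, and that a message in a new format simply goes unrecognized until someone writes one.

```python
import re

# Hypothetical hand-written rules: one regex per known log format.
RULES = {
    "login": re.compile(
        r"^user (?P<uid>\d+) logged in from (?P<ip>\d+\.\d+\.\d+\.\d+)$"
    ),
    "disk": re.compile(r"^disk (?P<dev>\w+) usage (?P<pct>\d+)%$"),
}


def classify(message):
    """Try each rule in order; return (rule name, captured fields) or None."""
    for name, regex in RULES.items():
        found = regex.match(message)
        if found:
            return name, found.groupdict()
    return None


known = classify("user 42 logged in from 10.0.0.9")
unknown = classify("fan 2 speed 4100 rpm")  # no rule was written for this format
```

Because each message is tried against the rules one after another, the cost per message in this sketch grows with the number of rules, and every new format requires an administrator to add another entry to `RULES`.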
