CN115729777A

CN115729777A - Monitoring and early warning method and device, electronic equipment and computer readable storage medium

Info

Publication number: CN115729777A
Application number: CN202211320447.8A
Authority: CN
Inventors: 卫鹏; 张宁; 杨春磊; 官艳青
Original assignee: Shanghai Qianzhen Information Technology Co ltd
Current assignee: Shanghai Qianzhen Information Technology Co ltd
Priority date: 2022-10-26
Filing date: 2022-10-26
Publication date: 2023-03-03

Abstract

The application provides a monitoring and early warning method, a monitoring and early warning device, electronic equipment, and a computer-readable storage medium, which are used to monitor monitored objects in a target system and issue early warnings about them. The method comprises the following steps: acquiring an early warning object identifier corresponding to the target system; acquiring an early warning condition corresponding to each monitored object in the target system, wherein the monitored objects in the target system comprise at least one of an interface, a database, and middleware; for each monitored object, when the monitored object meets its corresponding early warning condition, generating early warning information and sending it to the early warning queue corresponding to that monitored object; and consuming the early warning queue corresponding to the monitored object so as to push the early warning information to the early warning object corresponding to the early warning object identifier. By configuring early warning conditions, each monitored object in the system can be monitored, and early warnings issued, without modifying code.

Description

Monitoring and early warning method, device, electronic equipment, and computer-readable storage medium

Technical field

The present application relates to the technical fields of system monitoring and early warning and of logistics transportation, and in particular to a monitoring and early warning method and related devices.

Background

For various systems, including outlet systems, when a business function behaves abnormally, the users of that function must report the problem to the developers, so developers learn of abnormalities only passively. This passive discovery of problems has the following main drawbacks: ① abnormal functions degrade the user experience; ② developers cannot obtain the system's running status and resource usage in real time, so problems are not found promptly and their resolution lags; ③ developers have difficulty investigating and locating problems and resolve them inefficiently, which affects business operations.

For example, a user reports a problem to the developers only after noticing that a function is abnormal, and the developers then investigate, locate, and fix the problem based on the user's description. Inaccurate or late feedback also affects how well the problem is fixed, and the user experience suffers. In addition, problems in scheduled scripts that run in the background and in interfaces called across systems are difficult to troubleshoot once they occur. Furthermore, developers cannot track in real time the usage of the server resources that functions rely on (for example Redis, MySQL, and MQ); when resource usage is too high, functions may become unavailable, which has a significant impact on the business.

In view of this, the present application provides a monitoring and early warning method and related devices to remedy the above deficiencies of the prior art.

Summary of the invention

The purpose of the present application is to provide a monitoring and early warning method and related devices that, through the configuration of early warning conditions, monitor each monitored object in a system and issue early warnings without modifying code.

The purpose of the present application is achieved by the following technical solutions:

In a first aspect, the present application provides a monitoring and early warning method for monitoring monitored objects in a target system and issuing early warnings, the method comprising:

Obtaining an early warning object identifier corresponding to the target system, where the early warning object identifier includes at least one of a digital account, an email address, and a telephone number of an early warning object;

Obtaining an early warning condition corresponding to each monitored object in the target system, where the monitored objects in the target system include at least one of an interface, a database, and middleware;

For each monitored object, when the monitored object satisfies its corresponding early warning condition, generating early warning information and sending it to the early warning queue corresponding to the monitored object;

Consuming the early warning queue corresponding to the monitored object, so as to push the early warning information to the early warning object corresponding to the early warning object identifier.

The beneficial effect of this technical solution is that, by configuring early warning conditions, each monitored object in the system can be monitored and early warnings issued without modifying code.

First, the early warning object identifier of the early warning object that receives early warning information for the target system is obtained. The early warning object may be, for example, an individual user or a group user of the developers, and the early warning object identifier may be a digital account (a WeChat account, QQ account, DingTalk account, Weibo account, Alipay account, etc.), an email address, a telephone number, and so on. Second, the early warning condition corresponding to each monitored object in the target system is obtained. The target system may contain one or more monitored objects, which may be, for example, interfaces, databases, or middleware. Each monitored object has its own early warning condition, and in general different monitored objects have different early warning conditions. Next, for each monitored object, it is determined whether the monitored object satisfies its corresponding early warning condition; if so, early warning information is generated and sent to the early warning queue corresponding to that monitored object. In other words, each monitored object is given its own early warning queue, and different monitored objects have different early warning queues. Finally, the early warning queue of each monitored object is consumed, so that the early warning information in the queue is pushed to the early warning object (the one corresponding to the early warning object identifier).

The advantage of this is that, in the field of early warning notification, system monitoring and the pushing of early warning information are achieved by configuring early warning rules (that is, early warning conditions) for monitored objects such as interfaces, databases, and middleware. Developers can track in real time the state of the interfaces and server resources (for example Redis, MySQL, and MQ) that functions use. When resource usage is too high, a warning is issued promptly so that the problem can be investigated and handled, which reduces the business impact of unavailable functions and safeguards normal business operation. First, cases in which abnormal functions degrade the user experience are reduced. Second, developers can obtain the system's running status and resource usage in real time and find and resolve problems promptly. Third, developers can investigate and locate problems more easily and resolve them more efficiently, which reduces the impact of system failures on business operations.

In some optional implementations, the early warning condition corresponding to the interface includes at least one of the following:

The number of failed requests to the interface within a target time period is greater than a preset count threshold;

The ratio of the number of failed requests to the total number of requests to the interface within the target time period is greater than a preset ratio threshold.

The beneficial effect of this technical solution is as follows. The early warning condition corresponding to the interface may be that the number of failed requests within the target time period is greater than the preset count threshold (that is, there are too many failed requests), and/or that the ratio of failed requests to total requests within the target time period is greater than the preset ratio threshold (that is, the proportion of failed requests is too high). On the one hand, the target time period may use different time granularities, for example 1 minute, 10 minutes, 30 minutes, or 1 hour, so the granularity can be set according to the actual situation of the interface. On the other hand, either the number of failed requests itself, the proportion of failed requests, or both can serve as the early warning condition depending on the actual situation of the interface, so the approach is widely applicable.

In some optional implementations, the process of obtaining the number of failed requests and the total number of requests to the interface within the target time period includes:

In response to a configuration operation for a request success condition, determining the request success condition corresponding to the response parameters returned by the interface;

Using a log monitoring script, obtaining interface log data within the target time period from the log table corresponding to the target system, where the interface log data indicates the response parameters returned by the interface for each request within the target time period;

For each request, checking whether the response parameters returned by the interface for that request satisfy the request success condition; if so, incrementing by one the number of successful requests to the interface within the target time period; if not, incrementing by one the number of failed requests to the interface within the target time period;

Summing the number of successful requests and the number of failed requests to the interface within the target time period to obtain the total number of requests to the interface within the target time period.

The beneficial effect of this technical solution is as follows. First, the request success condition corresponding to the response parameters returned by the interface is configured. Second, the log monitoring script obtains the interface log data within the target time period from the log table. Then, for each request in the interface log data, the response parameters returned by the interface are checked against the configured request success condition: if they satisfy it, the number of successful requests is incremented by one; otherwise, the number of failed requests is incremented by one. Summing the number of successful requests and the number of failed requests gives the total number of requests, which yields both the number of failed requests and the total number of requests to the interface within the target time period. One advantage is that the log monitoring script obtains the target system's interface log data from the corresponding log table without any code changes; the operation is simple and easy to implement, the interface log data is obtained efficiently, and monitoring and early warning become more efficient overall. Another advantage is that successful requests, failed requests, and total requests are counted separately, so the statistics are accurate and the interface early warning is well founded.
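As a non-limiting illustration, the counting procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation of the present application: the row shape `{"code": ...}`, the success condition `code == 200`, and the names `RequestCounts`, `count_requests`, and `code_is_200` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

# Hypothetical shape of one interface log row: the response parameters
# returned by the interface for a single request.
Response = Mapping[str, object]


@dataclass
class RequestCounts:
    succeeded: int = 0
    failed: int = 0

    @property
    def total(self) -> int:
        # Total requests = successful requests + failed requests.
        return self.succeeded + self.failed


def count_requests(
    responses: Iterable[Response],
    is_success: Callable[[Response], bool],
) -> RequestCounts:
    """Classify each logged response with the configured success condition."""
    counts = RequestCounts()
    for response in responses:
        if is_success(response):
            counts.succeeded += 1
        else:
            counts.failed += 1
    return counts


def code_is_200(response: Response) -> bool:
    # Assumed request success condition: the response carries code == 200.
    return response.get("code") == 200


# Made-up log rows for one target time period.
logs = [{"code": 200}, {"code": 500}, {"code": 200}, {"code": 404}]
counts = count_requests(logs, code_is_200)
print(counts.succeeded, counts.failed, counts.total)
```

Passing the success condition in as a function mirrors the idea that the condition is configured rather than hard-coded: a different interface can supply a different predicate without changes to the counting routine.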

In some optional implementations, using the log monitoring script to obtain the interface log data within the target time period from the log table corresponding to the target system includes:

Using the log monitoring script, in a multi-process manner, to obtain from the log tables corresponding to multiple systems the interface log data within the target time period corresponding to each system;

Where the target system is one of the multiple systems, and the target time periods corresponding to different systems have the same or different durations.

The beneficial effect of this technical solution is as follows. When multiple systems all need monitoring and early warning, the same log monitoring script can be run in a multi-process manner to obtain, from the log table of each system, the interface log data within that system's target time period, so the interface log data is obtained efficiently. In addition, the same or different target time periods can be set for different systems: the durations of the target time periods of different systems may be the same or different, and so may their start and end times. The advantage is that, based on the actual situation of each system, different systems are monitored at the same or different time granularities, which meets the performance and cost requirements of practical applications. The multiple systems may include, for example, an outlet system, an order system, and a supply chain system in the field of logistics transportation.
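As a non-limiting illustration, the multi-process collection described above can be sketched with Python's standard `multiprocessing.Pool`. This is a sketch under stated assumptions: the in-memory `LOG_TABLES` dictionary stands in for the real per-system log tables, and the names `SystemConfig`, `fetch_window`, and `collect_all`, the system names, and all timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from multiprocessing import Pool


@dataclass(frozen=True)
class SystemConfig:
    name: str            # hypothetical system name, e.g. "outlet" or "order"
    window_seconds: int  # duration of this system's target time period


# Stand-in for the per-system log tables; a real script would query a database.
# Each row is (timestamp, response code). All values are made up.
LOG_TABLES = {
    "outlet": [
        (datetime(2022, 10, 26, 12, 0, 30), 200),
        (datetime(2022, 10, 26, 11, 50, 0), 500),
    ],
    "order": [
        (datetime(2022, 10, 26, 11, 58, 0), 200),
    ],
}


def fetch_window(task):
    """Return (system name, rows whose timestamp lies in the target period)."""
    config, now = task
    start = now - timedelta(seconds=config.window_seconds)
    rows = [row for row in LOG_TABLES.get(config.name, []) if start <= row[0] < now]
    return config.name, rows


def collect_all(configs, now):
    """Run the same fetch routine for every system, one worker process each."""
    tasks = [(config, now) for config in configs]
    with Pool(processes=max(1, len(tasks))) as pool:
        return dict(pool.map(fetch_window, tasks))
```

Each system carries its own `window_seconds`, so a 1-minute period for one system and a 5-minute period for another can be collected in the same run. On platforms that start worker processes with the spawn method, `collect_all` should be called from under an `if __name__ == "__main__":` guard.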

In some optional implementations, the early warning condition corresponding to the database includes at least one of the following:

The query duration of a slow query in the database is greater than a preset duration threshold;

The number of threads in the database is greater than a preset number threshold.

The beneficial effect of this technical solution is as follows. For a database, a slow query that runs for too long consumes a large share of database resources, and so does an excessive number of threads. The early warning condition configured for the database therefore needs to take into account at least one of the slow query duration and the number of threads. The advantage is that, when a slow query runs too long and/or there are too many threads, early warning information is generated promptly and pushed to the responsible developers, so that they can check the actual state of the database in time.
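As a non-limiting illustration, the two database early warning conditions can be sketched in Python. The threshold values (2.0 seconds and 200 threads) and the names `DatabaseRule` and `database_alerts` are assumptions made for the example; the text does not specify concrete values or how the metrics are collected.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class DatabaseRule:
    max_slow_query_seconds: float  # preset duration threshold (assumed value below)
    max_threads: int               # preset number threshold (assumed value below)


def database_alerts(
    rule: DatabaseRule,
    slow_query_seconds: Sequence[float],
    thread_count: int,
) -> List[str]:
    """Return one message per early warning condition that is met."""
    alerts = []
    for seconds in slow_query_seconds:
        # Condition 1: a slow query ran longer than the duration threshold.
        if seconds > rule.max_slow_query_seconds:
            alerts.append(f"slow query took {seconds:.1f}s")
    # Condition 2: the thread count exceeds the number threshold.
    if thread_count > rule.max_threads:
        alerts.append(f"thread count is {thread_count}")
    return alerts


rule = DatabaseRule(max_slow_query_seconds=2.0, max_threads=200)
alerts = database_alerts(rule, slow_query_seconds=[0.4, 3.5], thread_count=250)
for alert in alerts:
    print(alert)
```

Both comparisons are strict, matching the wording "greater than" in the conditions, so a value exactly equal to its threshold does not trigger a warning.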

In some optional implementations, the early warning condition corresponding to the middleware includes at least one of the following:

The queue length of the middleware is greater than the preset length threshold corresponding to the middleware.

The beneficial effect of this technical solution is as follows. A preset length threshold is configured for each middleware. When the queue length of a middleware is greater than its preset length threshold, the queue is too long; early warning information can then be generated and pushed to the developers, so that they can check the actual state of the middleware and intervene in time, which prevents the middleware from remaining in a state of slow or even stalled consumption for a long period. In addition, the preset length thresholds of different middleware may be the same or different.

In some optional implementations, the middleware includes RabbitMQ middleware and/or Redis middleware.

The beneficial effect of this technical solution is that the middleware to which the monitoring and early warning method of the present application applies includes RabbitMQ middleware and/or Redis middleware. The preset length thresholds for RabbitMQ middleware and Redis middleware may be the same or different; for example, the preset length threshold for RabbitMQ middleware may be 3000, 5000, or 8000, and the preset length threshold for Redis middleware may be 90, 100, or 110.
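As a non-limiting illustration, a per-middleware length check can be sketched in Python. The thresholds 5000 and 100 are taken from the example values listed above; the dictionary keys and the names `LENGTH_THRESHOLDS` and `queue_too_long` are hypothetical, and how the queue length itself is measured is left open because the text does not specify it.

```python
from typing import Mapping

# Preset length threshold per middleware, using example values from the text.
LENGTH_THRESHOLDS = {"rabbitmq": 5000, "redis": 100}


def queue_too_long(
    middleware: str,
    queue_length: int,
    thresholds: Mapping[str, int] = LENGTH_THRESHOLDS,
) -> bool:
    """True when the queue length is strictly greater than the preset threshold."""
    return queue_length > thresholds[middleware]


print(queue_too_long("redis", 101))      # exceeds the Redis threshold of 100
print(queue_too_long("rabbitmq", 3000))  # below the RabbitMQ threshold of 5000
```

Keeping the thresholds in a mapping rather than in the comparison itself lets each middleware carry its own value, or share one, without changes to the check.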

In a second aspect, the present application provides a monitoring and early warning device for monitoring monitored objects in a target system and issuing early warnings, the device comprising:

An identifier acquisition module, configured to obtain an early warning object identifier corresponding to the target system, where the early warning object identifier includes at least one of a digital account, an email address, and a telephone number of an early warning object;

A condition acquisition module, configured to obtain an early warning condition corresponding to each monitored object in the target system, where the monitored objects in the target system include at least one of an interface, a database, and middleware;

A monitoring module, configured to, for each monitored object, generate early warning information and send it to the early warning queue corresponding to the monitored object when the monitored object satisfies its corresponding early warning condition;

An early warning module, configured to consume the early warning queue corresponding to the monitored object, so as to push the early warning information to the early warning object corresponding to the early warning object identifier.

In some optional implementations, the process of obtaining interface log data within the target time period includes:

In a third aspect, the present application provides an electronic device including a memory and a processor, where the memory stores a computer program and the processor implements the steps of any one of the above methods when executing the computer program.

In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program, where the steps of any one of the above methods are implemented when the computer program is executed by a processor.

Description of drawings

The present application is further described below with reference to the accompanying drawings and embodiments.

Fig. 1 shows a schematic flowchart of a monitoring and early warning method provided by an embodiment of the present application.

Fig. 2 shows a schematic flowchart of obtaining the number of failed requests and the total number of requests provided by an embodiment of the present application.

Fig. 3 shows a schematic flowchart of another monitoring and early warning method provided by an embodiment of the present application.

Fig. 4 shows a schematic structural diagram of a monitoring and early warning device provided by an embodiment of the present application.

Fig. 5 shows a structural block diagram of an electronic device provided by an embodiment of the present application.

Fig. 6 shows a schematic structural diagram of a program product provided by an embodiment of the present application.

Detailed description

The technical solutions of the present application are described below with reference to the accompanying drawings and specific embodiments. It should be noted that, provided they do not conflict, the embodiments or technical features described below may be combined arbitrarily to form new embodiments.

In the present application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, that A and B both exist, or that B exists alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c can mean: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be single or multiple. Note that "at least one" can also be read as "one or more".

It should also be noted that, in the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described in the present application as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present the relevant concepts in a concrete manner.

Method embodiment

Referring to Fig. 1, Fig. 1 shows a schematic flowchart of a monitoring and early warning method provided by an embodiment of the present application.

An embodiment of the present application provides a monitoring and early warning method for monitoring monitored objects in a target system and issuing early warnings, the method comprising:

Step S101: Obtain an early warning object identifier corresponding to the target system, where the early warning object identifier includes at least one of a digital account, an email address, and a telephone number of an early warning object;

Step S102: Obtain an early warning condition corresponding to each monitored object in the target system, where the monitored objects in the target system include at least one of an interface, a database, and middleware;

Step S103: For each monitored object, when the monitored object satisfies its corresponding early warning condition, generate early warning information and send it to the early warning queue corresponding to the monitored object;

Step S104: Consume the early warning queue corresponding to the monitored object, so as to push the early warning information to the early warning object corresponding to the early warning object identifier.

The embodiments of the present application do not limit the early warning object, which may be, for example, an individual user or a group user of the developers.

The embodiments of the present application do not limit the digital account, which may include, for example, one or more of a WeChat account, a QQ account, a DingTalk account, a Weibo account, and an Alipay account.

The embodiments of the present application do not limit the target system, which may be, for example, an outlet system, an order system, or a supply chain system in the field of logistics transportation.

As one example, the target system is an outlet system, and the early warning object identifier corresponding to the outlet system is the DingTalk group account (that is, the DingTalk group number) of the developers responsible for the outlet system.

As another example, the target system is an order system, and the early warning object identifier corresponding to the order system is the personal email address of the developer responsible for the order system.

The embodiments of the present application do not limit the types of monitored objects corresponding to the target system or the number of monitored objects of each type. There may be one or more types of monitored objects, and one or more monitored objects of each type.

In the embodiments of the present application, the early warning condition corresponding to each monitored object in the target system may be obtained through manual configuration, intelligent configuration, or the like. With manual configuration, for example, the target system and early warning condition corresponding to each monitored object can be entered by hand into the options of the early warning configuration menu for that monitored object. With intelligent configuration, the system's default values can be read to fill in the options of the early warning configuration menu.
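As a non-limiting illustration, the relationship between default values and manually entered options can be sketched in Python. The option names and default values below, and the name `build_rule`, are hypothetical and chosen only for the example; the text does not define the contents of the early warning configuration menu.

```python
from typing import Dict, Optional

# Hypothetical system defaults for an interface early warning rule.
DEFAULT_RULE = {
    "window_seconds": 60,       # duration of the target time period
    "max_failures": 10,         # preset count threshold
    "max_failure_ratio": 0.10,  # preset ratio threshold
}


def build_rule(manual: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Fill every option from the defaults, then apply manually entered values."""
    manual = manual or {}
    unknown = set(manual) - set(DEFAULT_RULE)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {**DEFAULT_RULE, **manual}


print(build_rule())                      # all options taken from the defaults
print(build_rule({"max_failures": 20}))  # one option entered manually
```

Because the rule is plain data, a threshold can be changed by editing the configuration alone, which is the sense in which no code modification is needed.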

As one example, the target system is an outlet system, and the monitored objects in the outlet system include an interface, a database, RabbitMQ middleware, and Redis middleware. There are then 4 monitored objects, each with its own early warning queue, so the outlet system also has 4 early warning queues: an interface early warning queue, a database early warning queue, a RabbitMQ middleware early warning queue, and a Redis middleware early warning queue. For example, when the Redis middleware satisfies its corresponding early warning condition, early warning information is generated and sent to the Redis middleware early warning queue; the Redis middleware early warning queue is then consumed, and the early warning information is pushed to the DingTalk group of the developers responsible for the outlet system.
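As a non-limiting illustration, the four per-object early warning queues in this example can be sketched with Python's standard `queue` module. The in-process queues stand in for whatever message queue a real deployment would use, the list `sent` stands in for a push to the DingTalk group, and the names `warning_queues`, `raise_warning`, and `consume` are hypothetical.

```python
import queue
from typing import Callable, Dict, List

MONITORED_OBJECTS = ["interface", "database", "rabbitmq", "redis"]

# One early warning queue per monitored object.
warning_queues: Dict[str, "queue.Queue[str]"] = {
    name: queue.Queue() for name in MONITORED_OBJECTS
}


def raise_warning(obj: str, message: str) -> None:
    """Producer side: put early warning information on the object's own queue."""
    warning_queues[obj].put(f"[{obj}] {message}")


def consume(obj: str, push: Callable[[str], None]) -> int:
    """Consumer side: drain one queue, push each message, return the count."""
    delivered = 0
    while True:
        try:
            message = warning_queues[obj].get_nowait()
        except queue.Empty:
            return delivered
        push(message)
        delivered += 1


sent: List[str] = []  # stand-in for the push to the developers' DingTalk group
raise_warning("redis", "queue length 120 exceeds threshold 100")
delivered = consume("redis", sent.append)
print(delivered, sent)
```

Separating the producer (`raise_warning`) from the consumer (`consume`) means the check that detects a condition does not need to know how or to whom the warning is delivered.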

Here, middleware refers to software that different applications use to communicate with one another, and that provides common services and functions to applications. Data management, application services, messaging, authentication, and API management are usually handled through middleware.

In the embodiments of the present application, early warning information may be pushed by, for example, SMS, email, in-app push, or telephone notification, and the application may be, for example, the DingTalk app, the Enterprise WeChat app, or a mini program.

Thus, by configuring early warning conditions, each monitored object in the system can be monitored and early warnings issued without modifying code.

MySQL is a widely used relational database management system.

MQ (Message Queue) is a first-in, first-out data structure among the basic data structures. It is generally used to address application decoupling, asynchronous messaging, and traffic peak shaving, in order to achieve a high-performance, highly available, scalable, and eventually consistent architecture.

由此，接口对应的预警条件可以是：在目标时间段内请求失败次数大于预设次数阈值(即请求失败次数过多)，和/或，在目标时间段内请求失败次数和总请求次数的比值大于预设比值阈值(即请求失败次数占比过大)。一方面，目标时间段例如可以采用1分钟、10分钟、30分钟、1小时等不同的时间粒度，即可以根据接口的实际情况设置不同的时间粒度；另一方面，可以针对接口的实际情况，将请求失败次数本身和/或请求失败次数占比作为预警条件，适用范围广。Therefore, the early warning condition corresponding to the interface may be: the number of request failures within the target time period is greater than the preset number of thresholds (that is, the number of request failures is too many), and/or, the number of request failures and the total number of requests within the target time period The ratio is greater than the preset ratio threshold (that is, the proportion of request failures is too large). On the one hand, different time granularities such as 1 minute, 10 minutes, 30 minutes, and 1 hour can be used for the target time period, that is, different time granularities can be set according to the actual situation of the interface; on the other hand, according to the actual situation of the interface, Taking the number of request failures itself and/or the proportion of request failures as an early warning condition has a wide range of applications.

Here, time granularity can be regarded as a length of time expressed in a given time unit. A time unit is a basic unit used to measure time. From largest to smallest, time units include the millennium, century, decade, year, quarter, month, ten-day period (xun), week, day, shichen (a traditional two-hour period), hour, ke (a traditional quarter-hour), zi (a five-minute unit used in Fujian and Guangdong), minute, second, millisecond (ms), microsecond (μs), nanosecond (ns), picosecond (ps), femtosecond (fs), attosecond (as), and zeptosecond (zs). In practical applications, the time granularity may be every 30 seconds, every 1 minute, every 5 minutes, every 1 hour, every week, every month, every quarter, every year, and so on.

The embodiments of the present application do not limit the duration of the target time period, which may be, for example, 10 seconds, 15 seconds, 30 seconds, 1 minute, 5 minutes, 10 minutes, 30 minutes, 1 hour, 1 week, 1 month, 1 quarter, or 1 year.

As an example, the time granularity of the target time period is 1 minute and the preset count threshold is 10. If the interface has 11 failed requests within 1 minute, the interface is determined to satisfy its corresponding early warning condition.

As another example, the time granularity of the target time period is 5 minutes and the preset ratio threshold is 10%. If the interface has 11 failed requests and 89 successful requests within 5 minutes, the total number of requests is 100 and the ratio of failed requests to total requests is 11%, so the interface is determined to satisfy its corresponding early warning condition.
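The two examples above can be sketched as a single check. This is a minimal illustration, not the patented implementation: the function name and parameters are hypothetical, and the sketch assumes that either configured threshold being exceeded is enough to trigger a warning.

```python
from typing import Optional


def interface_alert_triggered(
    failed: int,
    total: int,
    count_threshold: Optional[int] = None,
    ratio_threshold: Optional[float] = None,
) -> bool:
    """Return True if any configured threshold is exceeded (hypothetical helper)."""
    # Condition 1: too many failed requests in the target time period.
    if count_threshold is not None and failed > count_threshold:
        return True
    # Condition 2: failed requests make up too large a share of all requests.
    if ratio_threshold is not None and total > 0:
        return failed / total > ratio_threshold
    return False


# First example: 11 failures in 1 minute against a count threshold of 10.
print(interface_alert_triggered(failed=11, total=11, count_threshold=10))
# Second example: 11 failures out of 100 requests against a 10% ratio threshold.
print(interface_alert_triggered(failed=11, total=100, ratio_threshold=0.10))
```

Leaving a threshold as `None` disables that condition, which mirrors the "and/or" wording of the early warning condition.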

Referring to FIG. 2, FIG. 2 shows a schematic flowchart of obtaining the number of failed requests and the total number of requests according to an embodiment of the present application.

Step S201: in response to a configuration operation for a request success condition, determine the request success condition corresponding to the response parameters returned by the interface;

Step S202: using a log monitoring script, obtain the interface log data within the target time period from the log table corresponding to the target system, where the interface log data indicates the response parameters returned by the interface for each request within the target time period;

Step S203: for each request, check whether the response parameters returned by the interface for that request satisfy the request success condition; if so, increment the interface's count of successful requests within the target time period by one; if not, increment the interface's count of failed requests within the target time period by one;

Step S204: sum the interface's counts of successful and failed requests within the target time period to obtain the interface's total number of requests within the target time period.
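Steps S203 and S204 amount to a single pass over the log entries. The sketch below assumes each log entry has already been reduced to a dictionary of response parameters and that the success condition is supplied as a callable; both the function name and that representation are illustrative assumptions.

```python
from typing import Callable, Iterable, Mapping, Tuple


def count_requests(
    responses: Iterable[Mapping[str, str]],
    is_success: Callable[[Mapping[str, str]], bool],
) -> Tuple[int, int, int]:
    """Return (succeeded, failed, total) for one interface and one time period."""
    succeeded = 0
    failed = 0
    for response in responses:
        # S203: classify each request by the configured success condition.
        if is_success(response):
            succeeded += 1
        else:
            failed += 1
    # S204: the total is the sum of the two counters.
    return succeeded, failed, succeeded + failed


# Hypothetical log entries for one target time period.
log_entries = [
    {"is_success": "1"},
    {"is_success": "0"},
    {"is_success": "1"},
]
print(count_requests(log_entries, lambda r: r.get("is_success") == "1"))
```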

In the embodiments of the present application, the interface may return one or more response parameters.

As an example, the response parameters returned by the interface are "is_success" and "response_code", and the request success condition is that "is_success" is "1" and "response_code" is "0000". The request success condition may be expressed, for example, as {"is_success":"1","condition":"and","response_code":"0000"}. When the response parameters returned by the interface for a request are {"is_success":"1","response_code":"0000"}, they are determined to satisfy the request success condition; when they are {"is_success":"0","response_code":"0000"}, they are determined not to satisfy it.

As another example, the response parameter returned by the interface is "is_success", and the request success condition is that "is_success" is "1". The request success condition may be expressed, for example, as {"is_success":"1"}. When the response parameter returned by the interface for a request is {"is_success":"1"}, it is determined to satisfy the request success condition.

As yet another example, the response parameters returned by the interface are "is_success" and "response_code", and the request success condition is that "is_success" is "1" or "response_code" is "0000". The request success condition may be expressed, for example, as {"is_success":"1","condition":"or","response_code":"0000"}. When the response parameters returned by the interface for a request are {"is_success":"0","response_code":"0000"}, they are determined to satisfy the request success condition.
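The three examples above can be reproduced with a small evaluator for the condition format. This is a sketch under stated assumptions: the key "condition" is treated as a reserved connector field, a missing connector is treated as "and", and the function name is hypothetical.

```python
from typing import Mapping


def meets_success_condition(
    response: Mapping[str, str], condition: Mapping[str, str]
) -> bool:
    """Evaluate a request success condition against returned response parameters."""
    # Assumption: "condition" holds the connector; every other key is an expected value.
    connector = condition.get("condition", "and").lower()
    matches = [
        response.get(key) == expected
        for key, expected in condition.items()
        if key != "condition"
    ]
    if connector == "and":
        return all(matches)
    if connector == "or":
        return any(matches)
    raise ValueError(f"unsupported connector: {connector!r}")


and_rule = {"is_success": "1", "condition": "and", "response_code": "0000"}
or_rule = {"is_success": "1", "condition": "or", "response_code": "0000"}

print(meets_success_condition({"is_success": "1", "response_code": "0000"}, and_rule))
print(meets_success_condition({"is_success": "0", "response_code": "0000"}, and_rule))
print(meets_success_condition({"is_success": "0", "response_code": "0000"}, or_rule))
print(meets_success_condition({"is_success": "1"}, {"is_success": "1"}))
```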

Thus, the request success condition corresponding to the response parameters returned by the interface is configured first. Next, the log monitoring script obtains the interface log data within the target time period from the log table. Then, for each request in the interface log data, the response parameters returned by the interface are checked against the configured request success condition: if they satisfy it, the count of successful requests is incremented by one; otherwise, the count of failed requests is incremented by one. Summing the two counts gives the total number of requests, which yields both the number of failed requests and the total number of requests for the interface within the target time period. One advantage is that the log monitoring script can obtain the target system's interface log data from the corresponding log table without any code modification; the operation is simple and easy to implement, the interface log data is obtained efficiently, and monitoring and early warning efficiency improve overall. Another advantage is that the numbers of successful requests, failed requests, and total requests are counted separately, so the statistics are accurate and the interface early warning approach is sound.

In some optional implementations, using the log monitoring script to obtain the interface log data within the target time period from the log table corresponding to the target system (that is, step S202) includes:

In the embodiments of the present application, the multiple systems may include, for example, an outlet system, an order system, and a supply chain system in the field of logistics and transportation. As an example, the duration of the target time period is 1 minute for the outlet system, 30 seconds for the order system, and 5 minutes for the supply chain system.

Thus, when multiple systems need monitoring and early warning, the same log monitoring script can run as multiple processes to obtain, from the log table of each system, the interface log data within that system's target time period, so the interface log data is obtained efficiently. In addition, the same or different target time periods can be set for different systems. That is, the durations of the target time periods of different systems may be the same or different, and so may their start and end times. The advantage is that different systems can be monitored at the same or different time granularities according to their actual situation, which meets the performance and cost requirements of practical applications.

Here, a process is the encapsulation of a running program and the basic unit by which the system schedules and allocates resources; processes provide concurrency at the operating system level. A thread is a subtask of a process and the basic unit of CPU scheduling and dispatching; threads keep a program responsive and provide concurrency within a process. A thread is the smallest unit of execution and scheduling that the operating system recognizes. Each thread has its own virtual processor: its own register set, program counter, and processor state. The central processing unit (CPU) is the computing and control core of a computer system and the final execution unit for information processing and program execution.

In some optional implementations, the process of obtaining the duration of the target time period corresponding to each system includes:

in response to a configuration operation for the duration of the target time period of the system, determining the duration of the target time period corresponding to the system.

In other optional implementations, the process of obtaining the duration of the target time period corresponding to each system includes:

inputting the fault data of the system into a duration configuration model to obtain the duration of the target time period corresponding to the system, where the fault data of the system includes a fault sequence number, a fault type, and a fault start time.

The fault sequence number may be represented by, for example, one or more of Chinese characters, letters, digits, and symbols. The fault type may likewise be represented by one or more of Chinese characters, letters, digits, and symbols. The fault start time may be represented by, for example, Beijing time, Greenwich Mean Time, or a timestamp.

Fault types may include, for example, interface response faults, database slow-query faults, database process faults, RabbitMQ middleware faults, and Redis middleware faults.

The training process of the duration configuration model may include, for example:

obtaining a training set, where the training set includes multiple pieces of training data, each piece of training data includes the fault data of one sample system and labeled data for the duration of the target time period corresponding to that sample system, and the fault data of the sample system includes a fault sequence number, a fault type, and a fault start time;

for each piece of training data in the training set, performing the following processing:

inputting the fault data of the sample system in the training data into a preset deep learning model to obtain predicted data for the duration of the target time period corresponding to the sample system;

updating the model parameters of the deep learning model based on the predicted data and the labeled data for the duration of the target time period corresponding to the sample system;

checking whether a preset training end condition is satisfied; if so, using the trained deep learning model as the duration configuration model; if not, continuing to train the deep learning model with the next piece of training data.
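The predict, update, and check cycle above can be sketched in a few lines. The source does not specify the network architecture, the feature encoding of fault data, or the loss function, so everything concrete here is an assumption made for illustration: a single linear unit stands in for the deep learning model, each sample is a hypothetical numeric feature vector with a labeled duration, and the end condition is a step limit or a mean squared error at or below a preset value.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (encoded fault data, labeled duration)


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Linear stand-in for the model's forward pass."""
    return bias + sum(w * x for w, x in zip(weights, features))


def mean_squared_error(weights: Sequence[float], bias: float, samples: Sequence[Sample]) -> float:
    return sum((predict(weights, bias, f) - y) ** 2 for f, y in samples) / len(samples)


def train(
    samples: Sequence[Sample],
    learning_rate: float = 0.1,
    max_steps: int = 10_000,
    loss_limit: float = 1e-4,
) -> Tuple[List[float], float]:
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for step in range(max_steps):
        # Take the next piece of training data, cycling through the set.
        features, label = samples[step % len(samples)]
        # Predict, then update the parameters from the prediction error.
        error = predict(weights, bias, features) - label
        weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
        bias -= learning_rate * error
        # End condition: total loss no greater than the preset loss value.
        if mean_squared_error(weights, bias, samples) <= loss_limit:
            break
    return weights, bias


# Hypothetical training set: one encoded feature, labeled duration in minutes.
training_set = [([0.0], 1.0), ([0.5], 2.0), ([1.0], 3.0)]
weights, bias = train(training_set)
print(round(predict(weights, bias, [0.5]), 2))
```

A real duration configuration model would replace the linear unit with a multi-layer network and a framework optimizer; only the loop structure is meant to carry over.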

Thus, by designing a suitable number of neuron computing nodes and a multi-layer computing hierarchy, and selecting appropriate input and output layers, a preset deep learning model can be obtained. Through learning and tuning, the deep learning model establishes a functional relationship from input to output. Although this relationship cannot be recovered exactly, it can approximate the real relationship as closely as possible. The duration configuration model trained in this way can produce the corresponding output data from input data, has a wide range of application, and yields accurate and reliable results.

Training the deep learning model on the fault data of sample systems allows fast modeling from only a small number of samples. The training error gradually decreases as training continues, and the model can save and reload the best weights. Recording the accuracy on the training set and validation set facilitates tuning (adjusting model parameters). Updating the model parameters lets the model fit the data better, generalize effectively, and improve in robustness and fitting accuracy.

In some optional implementations, the embodiments of the present application may train the duration configuration model; in other optional implementations, a pre-trained duration configuration model may be used.

In some optional implementations, data mining may be performed on historical data to obtain the fault data of the sample systems in the training set. That is, the fault data of these sample systems may be collected from real systems. Alternatively, the fault data of sample systems may be generated automatically by the generator network of a GAN model.

Here, a GAN model is a generative adversarial network, which consists of a generator network and a discriminator network. The generator takes random samples from a latent space as input, and its output should imitate the real samples in the training set as closely as possible. The discriminator takes either real samples or the generator's output as input, and its goal is to distinguish the generator's output from real samples as well as possible, while the generator tries to fool the discriminator as well as possible. The two networks compete and continually adjust their parameters, with the ultimate goal that the discriminator can no longer tell whether the generator's output is real. A GAN model can generate the fault data of multiple sample systems for training the duration configuration model, which effectively reduces the amount of raw data that must be collected and greatly lowers the cost of data collection and labeling.

The embodiments of the present application do not limit how the labeled data is obtained; for example, manual labeling, automatic labeling, or semi-automatic labeling may be used. When the fault data of a sample system is collected from a real system, the real data may be obtained from historical data by keyword extraction and used as the labeled data.

The embodiments of the present application do not limit the training process of the duration configuration model. For example, the supervised learning approach described above may be used, or a semi-supervised or unsupervised learning approach may be used.

The embodiments of the present application do not limit the preset training end condition. For example, it may be that the number of training iterations reaches a preset number (such as 1, 3, 10, 100, 1000, or 10000), that all training data in the training set has been used for one or more rounds of training, or that the total loss obtained in the current round of training is not greater than a preset loss value.

Slow queries are SQL statements whose execution time in the database exceeds a specified duration threshold (which may be set to, for example, 100 milliseconds, 1 second, or 10 seconds). Slow queries are a performance killer for databases and an important lever for optimizing how the business accesses the database. With the rapid growth of the logistics and transportation business, the daily number of slow queries can reach the order of hundreds of thousands, millions, or even hundreds of millions. Faults caused by slow queries account for more than 10% of all database faults, and high-severity faults are on the rise. Optimizing slow queries has therefore become urgent. SQL is a special-purpose programming language used to manage relational database management systems or to perform stream processing in relational data stream management systems. SQL is based on relational algebra and tuple relational calculus and includes a data definition language and a data manipulation language. Its scope covers data insertion, query, update, and deletion, database schema creation and modification, and data access control.

The embodiments of the present application do not limit the preset duration threshold, which may be, for example, 100 milliseconds, 1 second, 10 seconds, 15 seconds, 20 seconds, or 30 seconds.

The embodiments of the present application do not limit the preset quantity threshold, which may be, for example, 10, 20, 30, 50, 80, 100, 110, 120, 150, 200, 500, or 1000.

Thus, for a database, a slow query that runs too long consumes a large share of database resources, and so does an excessive number of threads. The early warning condition configured for a database therefore needs to take into account at least one of the slow-query duration and the number of threads. The advantage is that early warning information can be generated promptly when a slow query runs too long and/or there are too many threads, and pushed to the responsible developers so that they can check the actual state of the database in time.
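A database check of this kind can be sketched over rows shaped like the output of MySQL's `SHOW FULL PROCESSLIST`, whose columns include `Command`, `Time` (seconds), and `Info` (the statement text). The function name, the message wording, and the choice to count one thread per row are assumptions of this sketch; fetching the rows from a live server is left out.

```python
from typing import Any, List, Mapping, Sequence


def database_alerts(
    processlist: Sequence[Mapping[str, Any]],
    max_query_seconds: int,
    max_threads: int,
) -> List[str]:
    """Return early warning messages for slow queries and excessive threads."""
    alerts = []
    for row in processlist:
        # A running statement whose elapsed time exceeds the preset duration threshold.
        if row["Command"] == "Query" and row["Time"] > max_query_seconds:
            alerts.append(f"slow query ({row['Time']}s): {row['Info']}")
    # Assumption: each processlist row corresponds to one server thread.
    if len(processlist) > max_threads:
        alerts.append(f"thread count {len(processlist)} exceeds {max_threads}")
    return alerts


# Hypothetical processlist rows.
rows = [
    {"Id": 1, "Command": "Query", "Time": 12, "Info": "SELECT * FROM orders"},
    {"Id": 2, "Command": "Sleep", "Time": 300, "Info": None},
    {"Id": 3, "Command": "Query", "Time": 0, "Info": "SHOW FULL PROCESSLIST"},
]
for message in database_alerts(rows, max_query_seconds=10, max_threads=2):
    print(message)
```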

In the embodiments of the present application, the queue length of a middleware is the number of messages in its queue.

Thus, a preset length threshold is configured for each middleware. When the queue length of a middleware is greater than its preset length threshold, the queue is too long, and early warning information can be generated and pushed to developers so that they can check the actual state of the middleware and intervene in time, preventing the middleware from consuming slowly or stalling for an extended period. The preset length thresholds of different middleware may be the same or different.

The embodiments of the present application do not limit the preset length threshold, which may be, for example, 10, 20, 30, 50, 80, 90, 100, 110, 120, 150, 200, 300, 500, 800, 1000, 2000, 3000, 5000, 8000, 10000, 30000, 50000, 100000, or 1000000.

Thus, the middleware to which the monitoring and early warning method of the present application applies includes RabbitMQ middleware and/or Redis middleware.

The preset length thresholds for RabbitMQ middleware and Redis middleware may be the same or different. For example, the preset length threshold for RabbitMQ middleware may be 3000, 5000, or 8000, and the preset length threshold for Redis middleware may be 90, 100, or 110.
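The per-middleware threshold check can be sketched as follows. The queue names and the way queue lengths are obtained are hypothetical; the thresholds are taken from the example values above.

```python
from typing import Dict, List


def overlong_queues(queue_lengths: Dict[str, int], thresholds: Dict[str, int]) -> List[str]:
    """Return the names of queues whose length exceeds their preset length threshold."""
    return [
        name
        for name, length in queue_lengths.items()
        if name in thresholds and length > thresholds[name]
    ]


# Hypothetical queue names with thresholds from the example values.
thresholds = {"rabbitmq:order_events": 5000, "redis:notify_list": 100}
current = {"rabbitmq:order_events": 4200, "redis:notify_list": 135}
print(overlong_queues(current, thresholds))
```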

RabbitMQ is open-source message broker software (also called message-oriented middleware) that implements the Advanced Message Queuing Protocol (AMQP). The RabbitMQ server is written in Erlang, its clustering and failover are built on the Open Telecom Platform framework, and client libraries for communicating with the broker are available for all major programming languages.

Redis (Remote Dictionary Server) is an open-source, in-memory data structure store that can be used as a database, cache, and message middleware. It is written in ANSI C, supports network access, can run purely in memory or with persistence, stores key-value data, and provides APIs for many languages. Redis periodically writes updated data to disk or appends modification operations to a log file.

The embodiments of the present application do not limit how the various manual operations (that is, user operations) are received. Classified by input method, operations may include, for example, text input, audio input, video input, key operations, button operations, knob operations, mouse operations, keyboard operations, smart stylus operations, and smart touchpad operations. These operations include, but are not limited to, the configuration operation for the request success condition and the configuration operation for the duration of a system's target time period.

Here, the configuration operation may be, for example, clicking a "Configure" control in the monitoring and early warning software. In computer programming, a control (also called a widget) is a graphical user interface element, such as a window or a text box, whose displayed arrangement of information can be changed by the user. The defining characteristic of a control is that it provides a single interaction point for the direct manipulation of a given kind of data. A control is a basic visual building block which, contained in an application, controls all the data the program processes and the interactions with that data.

Referring to FIG. 3, FIG. 3 shows a schematic flowchart of another monitoring and early warning method according to an embodiment of the present application.

In a specific application scenario, an embodiment of the present application provides a monitoring and early warning method including the following Steps 1 to 6.

Step 1: Create a DingTalk group and associate it with the target system.

Step 2: Select the target system and configure its database, its middleware, and the name of its corresponding log table.

Step 3: Configure early warning rules.

① Configure interface early warning rules. Specifically, in the interface log warning configuration menu, select the target system, add the interfaces corresponding to the target system, and configure the interface name, interface value, warning time threshold (minutes), warning threshold (count), request success condition, responsible person, and other information. By default, the request success condition is that is_success=1 and response_code=0000 in the response parameters. If the success condition of an interface is different, it can be configured manually in a format such as {"is_success":"1","condition":"or","response_code":"0000"}, where the connector is either "and" or "or". Unconfigured fields may be left empty or filled with default values.

② Configure database early warning rules. Specifically, in the database warning configuration menu, configure the database connection information (that is, the target system to which the database belongs) and parameters such as the maximum slow-query duration in seconds, the maximum number of threads, and the responsible person.

③ Configure RabbitMQ middleware early warning rules. Specifically, in the RabbitMQ warning configuration menu, configure the MQ connection information (that is, the target system to which the RabbitMQ middleware belongs), and on the newly created MQ connection configure the queue name, queue, exchange, and warning threshold for the number of messages in the queue.

④ Configure Redis middleware early warning rules. Specifically, in the Redis warning configuration menu, configure the Redis connection information (that is, the target system to which the Redis middleware belongs), and on the newly created connection configure the queue and the warning threshold for the number of messages in the queue.
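The four kinds of rules configured in Step 3 can be pictured as plain records. The sketch below uses Python dataclasses; all class names, field names, and sample values are hypothetical and only mirror the fields listed in items ① to ④. The default success condition follows the default stated in item ①.

```python
from dataclasses import dataclass, field
from typing import Dict


def default_success_condition() -> Dict[str, str]:
    # Default from item ①: is_success=1 and response_code=0000.
    return {"is_success": "1", "condition": "and", "response_code": "0000"}


@dataclass
class InterfaceRule:
    system: str
    interface_name: str
    interface_value: str
    window_minutes: int        # warning time threshold (minutes)
    failure_threshold: int     # warning threshold (count)
    owner: str                 # responsible person
    success_condition: Dict[str, str] = field(default_factory=default_success_condition)


@dataclass
class DatabaseRule:
    system: str
    connection: str
    max_slow_query_seconds: int
    max_threads: int
    owner: str


@dataclass
class RabbitMQRule:
    system: str
    connection: str
    queue_name: str
    queue: str
    exchange: str
    max_messages: int          # queue message count warning threshold


@dataclass
class RedisRule:
    system: str
    connection: str
    queue: str
    max_messages: int


rule = InterfaceRule("order-system", "Create order", "/order/create", 1, 10, "owner-a")
print(rule.success_condition)
```

Using a default factory keeps the default success condition from being shared between rule instances, so overriding it for one interface does not affect the others.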

Step 4: Monitoring.

① Interface log monitoring. Specifically, the log monitoring script runs as multiple processes to obtain interface log data from the log tables of multiple systems, and counts each interface's total requests, successful requests, and failed requests at different time granularities such as 1 minute, 10 minutes, 30 minutes, and 1 hour. Then, for each interface whose monitoring switch is on, within a system whose monitoring switch is on, the script evaluates the statistics against the interface's early warning condition and pushes the early warning information generated when a warning threshold is triggered to the interface's early warning queue, which may be, for example, a Redis queue. Note that the interfaces of multiple systems may share one early warning queue or use different early warning queues.

② Database monitoring. Specifically, for each database (service) whose monitoring switch is on, within a system whose monitoring switch is on, the show full processlist command is used to query the slow-query SQL and the number of threads in the database. Corresponding early warning information is generated for connection failures, for slow-query SQL whose query time reaches the warning threshold, and for thread counts that reach the warning threshold, and is pushed to the database's early warning queue, which may be, for example, a Redis queue. Note that the databases of multiple systems may share one early warning queue or use different early warning queues.

③ RabbitMQ middleware monitoring. Specifically, for each RabbitMQ middleware (service) whose monitoring switch is on, in a system whose monitoring switch is on, the number of messages in the queue is queried, corresponding early warning information is generated when the early warning threshold is reached, and the early warning information is pushed to the early warning queue corresponding to the RabbitMQ middleware. The early warning queue may be, for example, a Redis queue. Note that the RabbitMQ middleware of multiple systems may share the same early warning queue or use different early warning queues.

④ Redis middleware monitoring. Specifically, for each Redis middleware (service) whose monitoring switch is on, in a system whose monitoring switch is on, the number of messages in the queue is queried, corresponding early warning information is generated when the early warning threshold is reached, and the early warning information is pushed to the early warning queue corresponding to the Redis middleware. The early warning queue may be, for example, a Redis queue. Note that the Redis middleware of multiple systems may share the same early warning queue or use different early warning queues.
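The queue-length check shared by the RabbitMQ and Redis monitors can be sketched as below. The function name and the `get_length` callable are assumptions made for illustration; in practice the callable might wrap, for example, the LLEN command for a Redis list or a passive queue declaration for RabbitMQ, but no specific client library is implied by the source.

```python
def check_queue_lengths(thresholds, get_length):
    """Return an early warning message for each queue that reaches its threshold.

    thresholds maps queue name -> preset length threshold.
    get_length is a callable(queue_name) -> int supplied by the caller, so the
    same check works for any middleware that can report a queue length.
    """
    alerts = []
    for name, threshold in thresholds.items():
        length = get_length(name)
        if length >= threshold:  # "reaches the threshold" taken as >=
            alerts.append(f"queue {name}: {length} messages, threshold {threshold}")
    return alerts
```

Injecting the length lookup keeps the threshold logic independent of the middleware, which matches the way the description applies the same rule to both kinds of queue.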

Step 5: Early warning.

Consume the data in the early warning queues corresponding to the interfaces, databases, and middleware, and asynchronously send the early warning messages to the group users of the corresponding developers.
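The consume-and-send step can be illustrated with the sketch below. It uses Python's in-process `queue.Queue` and a background thread as a stand-in for the Redis early warning queue and its consumer; the function name, the `send` callable, and the use of `None` as a stop marker are all assumptions for the example.

```python
import queue
import threading


def start_alert_consumer(alert_queue, send):
    """Consume early warning messages on a background thread.

    alert_queue is a queue.Queue standing in for the early warning queue.
    send is a caller-supplied callable(message) that delivers the message,
    for example to a developer group. Putting None on the queue stops the
    consumer (a convention chosen for this sketch).
    """
    def worker():
        while True:
            message = alert_queue.get()  # blocks until a message arrives
            if message is None:
                alert_queue.task_done()
                break
            send(message)
            alert_queue.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Because the monitors only put messages on the queue and the consumer delivers them on its own thread, a slow delivery does not hold up monitoring, which is the point of sending asynchronously.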

Step 6: Auxiliary query.

In the interface log menu, query the logs of abnormal interface requests (that is, the error logs). For databases and middleware, the corresponding server resource information can be viewed, so that developers can analyze each problem on its own terms.

As an example, the monitoring and early warning method is implemented by monitoring and early warning software, which provides a DingTalk early warning configuration menu, an interface log early warning configuration menu, a database early warning configuration menu, a RabbitMQ early warning configuration menu, a Redis early warning configuration menu, and an interface log menu. The monitoring and early warning method includes the following Step1 to Step6.

Step1: Create the DingTalk early warning group corresponding to the outlet system, and configure the outlet system in the DingTalk early warning configuration menu.

Step2: In the database early warning configuration menu, configure the connection information of the outlet system's log database (the database name is, for example, test); in the interface log early warning configuration menu, configure the connection information of the log table t_log corresponding to the outlet system.

Step3: ① Interface early warning rule configuration: in the interface log early warning configuration menu, select the outlet system and add an interface whose interface name is "test interface" and whose interface value is TEST_INTERFACE. The request success condition is {"is_success":"1","condition":"or","response_code":"0000"}, the early warning threshold is 10, and the early warning time threshold is 1 minute (that is, 10 failed requests within 1 minute trigger an early warning).
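One possible reading of the request success condition is sketched below. The source does not spell out the semantics of the JSON, so this is an assumption: every key other than "condition" names a response parameter and its expected value, and "condition" says whether any ("or") or all ("and") of them must match. The function name is hypothetical.

```python
import json


def request_succeeded(response, condition_json):
    """Decide whether a response satisfies a request success condition.

    response is a dict of response parameters returned by the interface.
    condition_json is the configured condition, e.g.
    '{"is_success":"1","condition":"or","response_code":"0000"}'.
    The interpretation of the JSON is an assumption (see the lead-in).
    """
    rule = json.loads(condition_json)
    combine = rule.pop("condition", "and")
    # Compare as strings because the configured values are strings.
    matches = [str(response.get(key)) == expected for key, expected in rule.items()]
    return any(matches) if combine == "or" else all(matches)
```

Under this reading, a response with is_success equal to "1" counts as a success even if its response_code is not "0000", because the example condition combines the two checks with "or"; requests that do not satisfy the condition are counted as failures.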

② Database early warning rule configuration: in the database early warning configuration menu, set the maximum slow-query duration of the outlet system's database to 10 seconds and the maximum number of threads to 110.

③ RabbitMQ middleware early warning rule configuration: in the RabbitMQ early warning configuration menu, configure the MQ connection configuration information of the outlet system and add a logistics data queue for the outlet system. The EXCHANGE of this logistics data queue is test_exchange, the QUEUE is test_queue (this queue stores the logistics data of the outlet system), and the preset length threshold that triggers an early warning is 5000.

④ Redis middleware early warning rule configuration: in the Redis early warning configuration menu, configure the Redis connection configuration information of the outlet system and add a Redis queue for the outlet system (named, for example, test_list). The preset length threshold that triggers an early warning is 100.

Step4: ① Interface monitoring: manually change the interface name in the log monitoring script to TEST_INTERFACE and set the request success condition to {"is_success":"1","condition":"or","response_code":"0000"}. If the number of failed requests within 1 minute reaches 10, an early warning is triggered.

② Database monitoring: monitor the outlet system's database. If the number of threads reaches the maximum of 110, an early warning is triggered; if the execution time of an SQL statement (that is, the query duration) reaches 10 seconds, an early warning is triggered.

③ MQ middleware monitoring: monitor the outlet system's logistics data queue test_queue, and trigger an early warning when the amount of data accumulated in the queue reaches 5000.

④ Redis middleware monitoring: monitor the outlet system's test_list queue, and trigger the early warning rule when the queue length reaches 100.

Step5: Consume the early warning information in the early warning queues of the interfaces, databases, middleware, and so on, and asynchronously send the early warning information to the DingTalk early warning group corresponding to the outlet system.

The early warning information is as follows:

【Outlet system early warning】

Request interface address: http://10.172.242.66/index.php/test/interface

Interface name: test interface

Interface: TEST_INTERFACE

Statistical period: 2022-09-20 11:37:00 to 2022-09-20 11:37:59

Statistical duration: 1 minute

Cumulative errors: 34

Responsible person: Please handle this promptly!!!
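A message of this shape could be assembled as in the sketch below. The function name and parameters are hypothetical, and the statistical period is derived from an assumed window start and length rather than taken from the source's implementation, which is not described.

```python
from datetime import datetime, timedelta


def format_interface_alert(system, url, interface_name, interface_value,
                           window_start, window_minutes, error_count):
    """Build the text of an interface early warning message.

    The window covers window_minutes starting at window_start and ends one
    second before the next window begins (an assumption that reproduces the
    11:37:00 to 11:37:59 period in the example).
    """
    window_end = window_start + timedelta(minutes=window_minutes) - timedelta(seconds=1)
    fmt = "%Y-%m-%d %H:%M:%S"
    unit = "minute" if window_minutes == 1 else "minutes"
    return "\n".join([
        f"【{system} early warning】",
        f"Request interface address: {url}",
        f"Interface name: {interface_name}",
        f"Interface: {interface_value}",
        f"Statistical period: {window_start.strftime(fmt)} to {window_end.strftime(fmt)}",
        f"Statistical duration: {window_minutes} {unit}",
        f"Cumulative errors: {error_count}",
        "Responsible person: Please handle this promptly!!!",
    ])
```

Keeping the formatting in one function means the monitors only need to supply the counts and the window, and the same text can be placed on the early warning queue unchanged.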

Step6: In the interface log menu, query the interface log data of abnormal interface requests. For example, to find the interface log data corresponding to the above early warning message, the query may proceed as follows: select the outlet system, set the query start time to 2022-09-20 11:37:00 and the end time to 2022-09-20 11:37:59, set the interface name to TEST_INTERFACE, and set the request status to failed.

Device embodiment

Referring to FIG. 4, FIG. 4 shows a schematic structural diagram of a monitoring and early warning device provided by an embodiment of the present application.

The embodiment of the present application provides a monitoring and early warning device. Its specific implementation and the technical effects achieved are consistent with those described in the above method embodiments, and some of the content is not repeated here.

The monitoring and early warning device is used to monitor, and issue early warnings for, the monitored objects in a target system, and the device includes:

an identifier acquisition module 101, configured to acquire an early warning object identifier corresponding to the target system, where the early warning object identifier includes at least one of a digital account number, an email address, and a telephone number of the early warning object;

a condition acquisition module 102, configured to acquire an early warning condition corresponding to each monitored object in the target system, where the monitored objects in the target system include at least one of an interface, a database, and middleware;

a monitoring module 103, configured to, for each monitored object, generate early warning information and send it to the early warning queue corresponding to the monitored object when the monitored object satisfies its corresponding early warning condition;

an early warning module 104, configured to consume the early warning queue corresponding to the monitored object, so as to push the early warning information to the early warning object corresponding to the early warning object identifier.
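How the four modules fit together can be sketched as a single small class. This is only an illustration under stated assumptions: the class and method names are hypothetical, early warning conditions are modeled as predicates over a metrics dictionary, in-memory deques stand in for the early warning queues, and `push` is a caller-supplied delivery function.

```python
from collections import deque


class MonitoringEarlyWarningDevice:
    """Toy wiring of modules 101 to 104; all names are illustrative."""

    def __init__(self, identifiers, conditions):
        self.identifiers = identifiers  # target system -> early warning object identifiers
        self.conditions = conditions    # monitored object -> predicate(metrics) -> bool
        self.queues = {name: deque() for name in conditions}

    def get_identifiers(self, system):
        # Role of the identifier acquisition module 101.
        return self.identifiers[system]

    def get_condition(self, monitored_object):
        # Role of the condition acquisition module 102.
        return self.conditions[monitored_object]

    def monitor(self, monitored_object, metrics):
        # Role of the monitoring module 103: enqueue information when the condition is met.
        if self.get_condition(monitored_object)(metrics):
            self.queues[monitored_object].append(
                f"{monitored_object} met its early warning condition: {metrics}"
            )

    def warn(self, system, monitored_object, push):
        # Role of the early warning module 104: consume the queue and push to each identifier.
        pending = self.queues[monitored_object]
        while pending:
            information = pending.popleft()
            for identifier in self.get_identifiers(system):
                push(identifier, information)
```

Because the conditions are data passed in at construction time, a new monitored object or threshold can be added without changing the class, which mirrors the configuration-driven approach the application describes.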

Electronic device embodiment

The embodiment of the present application also provides an electronic device. The electronic device includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of any one of the above methods when executing the computer program. Its specific implementation and the technical effects achieved are consistent with those described in the above method embodiments, and some of the content is not repeated here.

Referring to FIG. 5, FIG. 5 shows a structural block diagram of an electronic device 200 provided by an embodiment of the present application.

The electronic device 200 may include, for example, at least one memory 210, at least one processor 220, and a bus 230 connecting different platform systems.

The memory 210 may include readable media in the form of volatile memory, such as a random access memory (RAM) 211 and/or a cache memory 212, and may further include a read-only memory (ROM) 213.

The memory 210 also stores a computer program that can be executed by the processor 220, so that the processor 220 implements the steps of any one of the above methods.

The memory 210 may also include a utility 214 having at least one program module 215. Such program modules 215 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

Correspondingly, the processor 220 can execute the above computer program and can execute the utility 214.

The processor 220 may be implemented with one or more application-specific integrated circuits (ASIC), DSPs, programmable logic devices (PLD), complex programmable logic devices (CPLD), field-programmable gate arrays (FPGA), or other electronic components.

The bus 230 may represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures.

The electronic device 200 may also communicate with one or more external devices 240 such as a keyboard, a pointing device, or a Bluetooth device, with one or more devices capable of interacting with the electronic device 200, and/or with any device (such as a router or a modem) that enables the electronic device 200 to communicate with one or more other computing devices. Such communication may take place through the input/output interface 250. In addition, the electronic device 200 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 260. The network adapter 260 may communicate with the other modules of the electronic device 200 through the bus 230. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 200, including but not limited to microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms.

Medium embodiment

The embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of any one of the above methods are implemented. Its specific implementation and the technical effects achieved are consistent with those described in the above method embodiments, and some of the content is not repeated here.

Referring to FIG. 6, FIG. 6 shows a schematic structural diagram of a program product provided by an embodiment of the present application.

The program product is used to implement any one of the above methods. The program product may take the form of a portable compact disc read-only memory (CD-ROM) that includes program code, and may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited to this. In the embodiments of the present application, the readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

The computer-readable storage medium may include a data signal that is propagated in baseband or as part of a carrier wave and that carries readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The readable storage medium may also be any readable medium that can send, propagate, or transport a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the readable storage medium may be transmitted over any appropriate medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above. The program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C or similar languages. The program code may execute entirely on the user computing device, partly on the user device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).

This application has been described from the perspectives of purpose of use, efficacy, advancement, and novelty, and satisfies the functional improvement and use requirements emphasized by the Patent Law. The above description and drawings are merely preferred embodiments of this application and do not limit it. Accordingly, anything similar or identical to the structures, devices, and features of this application, that is, any equivalent replacement or modification made within the scope of this patent application, shall fall within the scope of protection of this patent application.

Claims

1. A monitoring and early warning method, characterized in that it is used for monitoring, and issuing early warnings for, monitored objects in a target system, the method comprising:

acquiring an early warning object identifier corresponding to the target system, where the early warning object identifier includes at least one of a digital account number, an email address, and a telephone number of the early warning object;

acquiring an early warning condition corresponding to each monitored object in the target system, where the monitored objects in the target system include at least one of an interface, a database, and middleware;

for each monitored object, when the monitored object satisfies its corresponding early warning condition, generating early warning information and sending it to the early warning queue corresponding to the monitored object;

consuming the early warning queue corresponding to the monitored object, so as to push the early warning information to the early warning object corresponding to the early warning object identifier.

2. The monitoring and early warning method according to claim 1, wherein the early warning conditions corresponding to the interface include at least one of the following:

The number of request failures of the interface within the target time period is greater than a preset number of times threshold;

The ratio of the number of failed requests of the interface to the total number of requests within the target time period is greater than a preset ratio threshold.

3. The monitoring and early warning method according to claim 2, wherein the process of obtaining the number of failed requests and the total number of requests of the interface within the target time period comprises:

in response to a configuration operation for the request success condition, determining the request success condition corresponding to the response parameters returned by the interface;

using a log monitoring script to obtain the interface log data within the target time period from the log table corresponding to the target system, where the interface log data indicates the response parameters returned by the interface for each request within the target time period;

for each request, detecting whether the response parameters returned by the interface for the request satisfy the request success condition; if so, adding one to the number of successful requests of the interface within the target time period; if not, adding one to the number of failed requests of the interface within the target time period;

summing the number of successful requests and the number of failed requests of the interface within the target time period to obtain the total number of requests of the interface within the target time period.

4. The monitoring and early warning method according to claim 3, wherein using the log monitoring script to obtain the interface log data within the target time period from the log table corresponding to the target system comprises:

using the log monitoring script, in a multi-process manner, to obtain from the log tables corresponding to multiple systems the interface log data of each system within its target time period;

wherein the target system is one of the multiple systems, and the durations of the target time periods corresponding to different systems are the same or different.

5. The monitoring and early warning method according to claim 1, wherein the early warning conditions corresponding to the database include at least one of the following:

The query duration of the slow query in the database is greater than a preset duration threshold;

The number of threads in the database is greater than a preset number threshold.

6. The monitoring and early warning method according to claim 1, wherein the early warning conditions corresponding to the middleware include at least one of the following:

The queue length of the middleware is greater than the preset length threshold corresponding to the middleware.

7. The monitoring and early warning method according to claim 6, wherein said middleware comprises RabbitMQ middleware and/or Redis middleware.

8. A monitoring and early warning device, characterized in that it is used for monitoring, and issuing early warnings for, monitored objects in a target system, the device comprising:

an identifier acquisition module, configured to acquire an early warning object identifier corresponding to the target system, where the early warning object identifier includes at least one of a digital account number, an email address, and a telephone number of the early warning object;

a condition acquisition module, configured to acquire an early warning condition corresponding to each monitored object in the target system, where the monitored objects in the target system include at least one of an interface, a database, and middleware;

a monitoring module, configured to, for each monitored object, generate early warning information and send it to the early warning queue corresponding to the monitored object when the monitored object satisfies its corresponding early warning condition;

an early warning module, configured to consume the early warning queue corresponding to the monitored object, so as to push the early warning information to the early warning object corresponding to the early warning object identifier.

9. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory stores a computer program, and the processor implements the steps of the method according to any one of claims 1-7 when executing the computer program.

10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the method according to any one of claims 1-7 are implemented.