
CN107995287A - A method for remotely monitoring the health status of data center nodes through IPMI - Google Patents

Publication number
CN107995287A
Authority
CN
China
Prior art keywords
alarm
node
monitor terminal
information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711240748.9A
Other languages
Chinese (zh)
Inventor
张希伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201711240748.9A
Publication of CN107995287A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L67/125 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks involving control of end-device applications over a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a method for remotely monitoring the health status of data center nodes through IPMI. Using the IPMI protocol, the method monitors the running state of each server, including the CPU, hard disks, fans, memory, power supply, and motherboard. From a single monitoring terminal, the state of all servers can be monitored remotely and in real time, including server load, CPU and memory usage, and network fluctuation. Readings are judged against the safety thresholds set by the administrator; when a threshold is exceeded, the event is handled automatically according to the policy set by the administrator, and a notification is pushed to the administrators. The method saves considerable manpower in routine server maintenance and helps avoid downtime and other anomalies caused by latent operational faults in the data center. It thereby reduces the risk of node downtime and helps ensure the reliability of the data center.

Description

A method for remotely monitoring the health status of data center nodes through IPMI

Technical field

The invention relates to the field of servers, and in particular to a method for remotely monitoring the health status of data center nodes through IPMI.

Background

In the current era of cloud computing and big data, the stability of data center server nodes is decisive for data security. In a large data center, servers typically run around the clock, 365 days a year, without shutdown. During operation, component failures, sustained high load, and overheating are unavoidable. When such problems occur they disrupt normal use of the server and, in serious cases, affect the whole system, causing downtime or data loss.

Summary of the invention

To overcome the above deficiencies of the prior art, the invention provides a method for remotely monitoring the health status of data center nodes through IPMI. The method uses a monitoring terminal and includes the following steps:

The monitoring terminal is remotely connected to the same local area network as the data center, so that it can communicate with each node of the data center;

The monitoring terminal accesses each node of the data center through the IPMI protocol;

The monitoring terminal obtains the temperature information, fan information, voltage information, network card status, and operating system status of each node;

The monitoring terminal compares the obtained information with the corresponding thresholds; when any obtained value exceeds its threshold, an alarm is issued in the data center to alert the maintenance personnel.
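
The comparison step can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Reading` type, the sensor names, and the threshold values are all hypothetical, and it assumes every threshold is an upper bound on a numeric reading (network card and operating system status are left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    node: str      # hypothetical node identifier
    sensor: str    # hypothetical sensor name, e.g. "cpu_temp"
    value: float


def find_violations(readings, thresholds):
    """Return the readings whose value exceeds the configured upper threshold."""
    violations = []
    for reading in readings:
        limit = thresholds.get(reading.sensor)
        # Sensors without a configured threshold are ignored.
        if limit is not None and reading.value > limit:
            violations.append(reading)
    return violations


# Illustrative thresholds and readings (made-up values).
thresholds = {"cpu_temp": 85.0, "cpu_voltage": 1.4}
readings = [
    Reading("node-1", "cpu_temp", 91.5),
    Reading("node-2", "cpu_temp", 60.0),
    Reading("node-1", "fan_rpm", 4200.0),
]
alarms = find_violations(readings, thresholds)
for alarm in alarms:
    print(f"ALARM {alarm.node} {alarm.sensor}={alarm.value}")
```

A real deployment would probably also need lower bounds (for example, a fan that spins too slowly), which this sketch does not model.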

Preferably, the method includes:

At each preset time interval, the monitoring terminal monitors the temperature information, fan information, voltage information, network card status, and operating system status of each node.

Preferably, the monitoring terminal parses the received data, classifies the parsed data according to the corresponding alarm parameters, generates a corresponding data record for each value that exceeds its threshold, issues an alarm in the data center to alert the maintenance personnel, and writes the alarm to an alarm log;

The monitoring terminal presents the collected data and alarms to the maintenance personnel and accepts their control commands for the monitoring terminal;

The monitoring terminal provides a unified CUI interface through which maintenance personnel can query and browse historical logs and alarms and set alarm parameters.

Preferably, the temperature information includes the CPU temperature, motherboard temperature, backplane temperature, and hot-swap module temperature;

The voltage information includes the CPU voltage, motherboard voltage, SCSI backplane voltage, and hot-swap module voltage;

The fan information includes the CPU fan, motherboard fan, backplane fan, and hot-swap module fan.

Preferably, the monitoring terminal has an I2C device interface, and each node is provided with an I2C Slave interface;

The monitoring terminal generates the clock through its I2C device interface and initiates communication with the I2C Slave interface of each node, and each node responds through its I2C Slave interface;

The I2C device interface of the monitoring terminal and the I2C Slave interface of each node exchange data using the IPMI protocol.

Preferably, the step in which the monitoring terminal accesses each node of the data center through the IPMI protocol further includes:

The monitoring terminal configures the alarm information, including: node name, node device name, node alarm event name, node alarm description, and node alarm trigger value;

The monitoring terminal configures the alarm types, including: power supply and distribution alarms, environment alarms, and security alarms;

The monitoring terminal configures alarm masking control, covering node alarm masking, node device alarm masking, and node alarm event masking; node alarms are masked by configuring a masking mode and a masking time period, and unmasking control is also configured;

The monitoring terminal configures alarm levels and node levels;

Alarms for a node's CPU, hard disk, and motherboard are configured as high-level alarms, node temperature alarms as medium-level alarms, and node software alarms as low-level alarms;

Node power supply and distribution alarms are configured as high-level alarms, node environment alarms as medium-level alarms, and node security alarms as low-level alarms;

According to the alarm rules, when multiple alarms are raised, the monitoring terminal issues the alarms with the higher configured level first;

Likewise, when multiple alarms are received within the same time period, the alarms with the higher configured level are issued first;

When multiple alarms are raised, low-level alarms are buffered for a preset time and then issued;

When a node raises alarms repeatedly within a preset time period, this is treated as a high-frequency alarm, and a high-frequency alarm notice is issued once the preset time period has elapsed.
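
The level-based ordering and the high-frequency rule above can be sketched as follows. This is only an illustration under stated assumptions: the level names, the tuple layout `(node, level, event)`, and the `min_count` cutoff are hypothetical, and the timed buffering of low-level alarms is not modeled.

```python
import heapq
from collections import Counter

# Hypothetical ranking: a smaller number is issued earlier.
LEVEL_RANK = {"high": 0, "medium": 1, "low": 2}


def order_alarms(alarms):
    """Return (node, event) pairs, highest level first; ties keep arrival order."""
    heap = [
        (LEVEL_RANK[level], seq, node, event)
        for seq, (node, level, event) in enumerate(alarms)
    ]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _, _, node, event = heapq.heappop(heap)
        ordered.append((node, event))
    return ordered


def high_frequency_nodes(alarms, min_count=3):
    """Return nodes that raised at least min_count alarms in the given window."""
    counts = Counter(node for node, _, _ in alarms)
    return sorted(node for node, count in counts.items() if count >= min_count)


# Alarms collected during one (assumed) preset time period.
window = [
    ("node-3", "low", "software"),
    ("node-1", "high", "cpu"),
    ("node-3", "medium", "temperature"),
    ("node-3", "low", "software"),
]
ordered = order_alarms(window)
frequent = high_frequency_nodes(window)
print(ordered)
print(frequent)
```

The sequence number in each heap entry is what keeps equal-level alarms in arrival order; without it, ties would be broken by comparing node names.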

Preferably, the monitoring terminal filters alarms by keyword, merges alarms that share the same keyword within the same time interval, counts how often alarms occur in each time interval, and presents the result to the maintenance personnel.
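
One way to picture the keyword merging is to bucket alarms by time interval and count per (interval, keyword) pair. The sketch below assumes alarms arrive as `(timestamp_in_seconds, keyword)` tuples; that layout and the keyword strings are hypothetical.

```python
from collections import Counter


def alarm_frequency(events, interval_seconds):
    """Count alarms per (interval index, keyword) pair."""
    counts = Counter()
    for timestamp, keyword in events:
        bucket = int(timestamp // interval_seconds)  # index of the time interval
        counts[(bucket, keyword)] += 1
    return counts


# Illustrative events: seconds since the start of monitoring, plus a keyword.
events = [(5, "overtemp"), (42, "overtemp"), (59, "fan"), (61, "overtemp")]
freq = alarm_frequency(events, 60)
for (bucket, keyword), count in sorted(freq.items()):
    print(f"interval {bucket}: {keyword} x{count}")
```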

Preferably, the step in which the monitoring terminal accesses each node of the data center through the IPMI protocol further includes:

The monitoring terminal encapsulates the configured alarm information, alarm types, and alarm masking control information, and sends the encapsulated information to each node of the data center;

The encapsulated information comprises: an alarm-configuration MAC address layer, an alarm-configuration data processing terminal IP layer, and an alarm-configuration data frame segment;

In the alarm-configuration data frame segment, the lowest bit of the address byte selects its meaning: when the lowest bit is 0, the byte carries the address code of the component making the request or reply; when the lowest bit is 1, it carries the software ID making the request or reply. The upper 7 bits of the byte hold the actual address code or software ID;

The alarm-configuration data frame segment also contains a parity-coded field: an even value indicates that the message is a request, and an odd value indicates that it is a response. For a basic control request and status response, this byte is 00h and 01h respectively;

The alarm-configuration data frame segment further contains a sequence number generated by the requester, used to distinguish requests when the monitoring terminal needs to issue several of them; the sequence number is the subcomponent number or sub-address number of the component that responds to or receives the message.
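
The bit layout of the address byte and the even/odd request rule can be sketched with a few bit operations. The function names are hypothetical, and the sketch assumes the 7-bit value sits in bits 7 to 1 with the type flag in bit 0, as the text describes; it says nothing about where these bytes sit within the frame.

```python
def encode_address_byte(value, is_software_id):
    """Pack a 7-bit address code or software ID with the type flag in bit 0."""
    if not 0 <= value < 128:
        raise ValueError("value must fit in 7 bits")
    return (value << 1) | (1 if is_software_id else 0)


def decode_address_byte(byte):
    """Return (7-bit value, is_software_id) from an address byte."""
    return byte >> 1, bool(byte & 1)


def is_request(code):
    """Even codes mark requests, odd codes mark responses."""
    return code % 2 == 0


component = encode_address_byte(0x10, is_software_id=False)
software = encode_address_byte(0x40, is_software_id=True)
print(hex(component), decode_address_byte(component))
print(hex(software), decode_address_byte(software))
print(is_request(0x00), is_request(0x01))
```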

As can be seen from the above technical solution, the invention has the following advantages:

The method monitors the running state of each server through the IPMI protocol, including the CPU, hard disks, fans, memory, power supply, and motherboard. From the monitoring terminal, the state of all servers can be monitored remotely and in real time, including server load, CPU and memory usage, and network fluctuation. Readings are judged against the safety thresholds set by the administrator; when a threshold is exceeded, the event is handled automatically according to the policy set by the administrator, and a notification is pushed to the administrators. The method saves considerable manpower in routine server maintenance and helps avoid downtime and other anomalies caused by latent operational faults in the data center. It thereby reduces the risk of node downtime and helps ensure the reliability of the data center.

Description of the drawings

To explain the technical solution of the invention more clearly, the drawings used in the description are briefly introduced below. The drawings described below illustrate only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of the method for remotely monitoring the health status of data center nodes through IPMI;

Fig. 2 is a schematic diagram of an embodiment of the invention.

Detailed description

To make the purpose, features, and advantages of the invention clearer and easier to understand, the technical solutions protected by the invention are described clearly and completely below with reference to specific embodiments and the drawings. The embodiments described below are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this patent without creative effort fall within the scope of protection of this patent.

This embodiment provides a method for remotely monitoring the health status of the nodes 3 of a data center 2 through IPMI. As shown in Fig. 1 and Fig. 2, the method uses a monitoring terminal 1 and includes:

S1: the monitoring terminal 1 is remotely connected to the same local area network as the data center 2, so that the monitoring terminal 1 can communicate with each node 3 of the data center 2;

S2: the monitoring terminal 1 accesses each node 3 of the data center 2 through the IPMI protocol;

S3: the monitoring terminal 1 obtains the temperature information, fan information, voltage information, network card status, and operating system status of each node 3;

S4: the monitoring terminal 1 compares the obtained information with the corresponding thresholds; when any obtained value exceeds its threshold, an alarm is issued in the data center 2 to alert the maintenance personnel.

In this embodiment, the method includes: at each preset time interval, the monitoring terminal 1 monitors the temperature information, fan information, voltage information, network card status, and operating system status of each node 3.
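
The periodic monitoring can be pictured as a simple polling loop. In the sketch below, `read_node` is a stand-in for whatever IPMI query the terminal actually performs, and all names, the interval, and the sample values are hypothetical. The sleep function is injectable so the demonstration runs without waiting.

```python
import time


def poll_nodes(nodes, read_node, on_sample, interval_seconds, rounds, sleep=time.sleep):
    """Read every node once per round, pausing interval_seconds between rounds."""
    for round_index in range(rounds):
        for node in nodes:
            on_sample(node, read_node(node))
        if round_index < rounds - 1:
            sleep(interval_seconds)


def fake_read(node):
    """Stand-in for a real IPMI sensor query; returns made-up values."""
    return {"cpu_temp": 55.0, "fan_rpm": 4200.0}


samples = []
poll_nodes(
    nodes=["node-1", "node-2"],
    read_node=fake_read,
    on_sample=lambda node, data: samples.append((node, data["cpu_temp"])),
    interval_seconds=30,
    rounds=2,
    sleep=lambda seconds: None,  # no-op so the demo does not block
)
print(samples)
```

A long-running terminal would loop indefinitely rather than for a fixed number of rounds; the bound here only keeps the example finite.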

In this embodiment, the monitoring terminal 1 parses the received data, classifies the parsed data according to the corresponding alarm parameters, generates a corresponding data record for each value that exceeds its threshold, issues an alarm in the data center 2 to alert the maintenance personnel, and writes the alarm to an alarm log;

The monitoring terminal 1 presents the collected data and alarms to the maintenance personnel and accepts their control commands for the monitoring terminal 1;

The monitoring terminal 1 provides a unified CUI interface through which maintenance personnel can query and browse historical logs and alarms and set alarm parameters.

In this embodiment, the temperature information includes the CPU temperature, motherboard temperature, backplane temperature, and hot-swap module temperature;

The voltage information includes the CPU voltage, motherboard voltage, SCSI backplane voltage, and hot-swap module voltage;

The fan information includes the CPU fan, motherboard fan, backplane fan, and hot-swap module fan.

In this embodiment, the monitoring terminal 1 has an I2C device interface, and each node 3 is provided with an I2C Slave interface. The monitoring terminal 1 generates the clock through its I2C device interface and initiates communication with the I2C Slave interface of each node 3, and each node 3 responds through its I2C Slave interface. The I2C device interface of the monitoring terminal 1 and the I2C Slave interface of each node 3 exchange data using the IPMI protocol.

In this embodiment, the step in which the monitoring terminal 1 accesses each node 3 of the data center 2 through the IPMI protocol further includes:

The monitoring terminal 1 configures the alarm information, including: node 3 name, node 3 device name, node 3 alarm event name, node 3 alarm description, and node 3 alarm trigger value;

The monitoring terminal 1 configures the alarm types, including: power supply and distribution alarms, environment alarms, and security alarms;

The monitoring terminal 1 configures alarm masking control, covering node 3 alarm masking, node 3 device alarm masking, and node 3 alarm event masking; node 3 alarms are masked by configuring a masking mode and a masking time period, and unmasking control is also configured;

The monitoring terminal 1 configures alarm levels and node 3 levels;

Alarms for the CPU, hard disk, and motherboard of a node 3 are configured as high-level alarms, node 3 temperature alarms as medium-level alarms, and node 3 software alarms as low-level alarms; the grading can be chosen to suit actual use and is not limited to this scheme.

Node 3 power supply and distribution alarms are configured as high-level alarms, node 3 environment alarms as medium-level alarms, and node 3 security alarms as low-level alarms; the grading can be chosen to suit actual use and is not limited to this scheme.

According to the alarm rules, when multiple alarms are raised, the monitoring terminal 1 issues the alarms with the higher configured level first;

Likewise, when multiple alarms are received within the same time period, the alarms with the higher configured level are issued first;

When multiple alarms are raised, low-level alarms are buffered for a preset time and then issued;

When a node 3 raises alarms repeatedly within a preset time period, this is treated as a high-frequency alarm, and a high-frequency alarm notice is issued once the preset time period has elapsed.
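
The three masking scopes in the configuration above (node, node device, node alarm event) combined with a masking time period can be sketched as a rule lookup. The `MaskRule` type and its fields are hypothetical; a `None` device or event is assumed to mean "any", so a rule with both set to `None` masks the whole node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MaskRule:
    node: str
    device: Optional[str]  # None masks every device on the node
    event: Optional[str]   # None masks every event of the device
    start: float           # start of the masking period, in seconds
    end: float             # end of the masking period (exclusive)


def is_masked(rules, node, device, event, now):
    """Return True if any rule masks this alarm at time now."""
    for rule in rules:
        if rule.node != node:
            continue
        if rule.device is not None and rule.device != device:
            continue
        if rule.event is not None and rule.event != event:
            continue
        if rule.start <= now < rule.end:
            return True
    return False


rules = [
    MaskRule("node-3", None, None, start=100, end=200),        # whole node
    MaskRule("node-4", "fan", "fan_stall", start=0, end=50),   # one event
]
print(is_masked(rules, "node-3", "cpu", "overtemp", now=150))
print(is_masked(rules, "node-4", "cpu", "overtemp", now=10))
```

Unmasking then amounts to removing a rule or letting its time period expire.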

In this embodiment, the monitoring terminal 1 filters alarms by keyword, merges alarms that share the same keyword within the same time interval, counts how often alarms occur in each time interval, and presents the result to the maintenance personnel.

In this embodiment, the step in which the monitoring terminal 1 accesses each node 3 of the data center 2 through the IPMI protocol further includes:

The monitoring terminal 1 encapsulates the configured alarm information, alarm types, and alarm masking control information, and sends the encapsulated information to each node 3 of the data center 2;

The encapsulated information comprises: an alarm-configuration MAC address layer, an alarm-configuration data processing terminal IP layer, and an alarm-configuration data frame segment;

In the alarm-configuration data frame segment, the lowest bit of the address byte selects its meaning: when the lowest bit is 0, the byte carries the address code of the component making the request or reply; when the lowest bit is 1, it carries the software ID making the request or reply. The upper 7 bits of the byte hold the actual address code or software ID;

The alarm-configuration data frame segment also contains a parity-coded field: an even value indicates that the message is a request, and an odd value indicates that it is a response. For a basic control request and status response, this byte is 00h and 01h respectively;

The alarm-configuration data frame segment further contains a sequence number generated by the requester, used to distinguish requests when the monitoring terminal 1 needs to issue several of them; the sequence number is the subcomponent number or sub-address number of the component that responds to or receives the message.

The above description of the disclosed embodiments enables a person skilled in the art to make or use the invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined here may be implemented in other embodiments without departing from the spirit or scope of the invention. The invention is therefore not limited to the embodiments shown here, but is to be accorded the widest scope consistent with the principles and novel features disclosed here.

Claims (8)

1. a kind of method by IPMI remote monitoring data Centroid health status, including:Monitor terminal, its feature exist In method includes:
Monitor terminal is remotely accessed in same local area network with data center, makes monitor terminal node progress each with data center Communication connection;
Monitor terminal accesses each node of data center by IPMI protocol;
Monitor terminal obtains temperature information, fan information, information of voltage, network interface card working condition and the operating system of each node Working condition;
Monitor terminal by the information of acquisition compared with corresponding threshold value, when in the data message of acquisition, having beyond threshold value, Alert in the data center, prompting maintenance personnel.
2. the method according to claim 1 by IPMI remote monitoring data Centroid health status, its feature exists In,
Method includes:
Monitor terminal often passes through a prefixed time interval, to the temperature information of each node, fan information, information of voltage, network interface card Working condition and operating system working condition are monitored.
3. the method according to claim 1 by IPMI remote monitoring data Centroid health status, its feature exists In,
Monitor terminal parses the data message of reception, by and parse after data message by corresponding alarm parameter into Data more than threshold value are formed corresponding data message, in the data center alert, prompting maintenance people by row classification Member, while warning message is formed into alarm log;
The data message got and warning message are presented to maintenance personnel by monitor terminal, while receive maintenance personnel to prison The control instruction of control terminal;
Monitor terminal provides unified CUI interfaces, for inquiry of the maintenance personnel to history log, warning message and browse and Alarm parameters are configured.
4. the method according to claim 1 by IPMI remote monitoring data Centroid health status, its feature exists In,
Temperature information includes cpu temperature, mainboard temperature, backboard temperature, hot plug module temperature;
Information of voltage includes CPU voltages, mainboard voltage, SCSI backboard voltages, hot plug module voltage
Fan information includes cpu fan, mainboard fan, backboard fan, hot plug module fan.
5. the method according to claim 1 by IPMI remote monitoring data Centroid health status, its feature exists In,
Monitor terminal has I2C equipment interfaces, and each node is provided with I2C Slave interfaces;
The I2C Slave interfaces of each node of clockwise initiate communication, each node when monitor terminal is produced by I2C equipment interfaces Responded by I2C Slave interfaces
The I2C equipment interfaces of monitor terminal transmit data with the I2C Slave interfaces of each node by IPMI protocol.
6. the method according to claim 1 by IPMI remote monitoring data Centroid health status, its feature exists In,
Each node that step monitor terminal accesses data center by IPMI protocol further includes:
Monitor terminal configures warning message, including:Nodename, node device title, node alert event title, section Point alarm description, node alarm trigger value;
Monitor terminal configures warning message type, including:Power supply and distribution of alarming class, alarm environmental classes, alarm security type;
Monitor terminal configuration alarm shielding control, node alarm shielding, node device alarm shielding and the shielding of node alert event, Alarmed by configuring shielding mode and shielding period come masked nodes, configuration releases shielding control;
Monitor terminal configures alert levels, configuration node rank;
Configuration node CPU, hard disk, the alarm of mainboard is advanced alarm, and configuration node temperature alarming is alarmed for middle rank, configuration node Software class alarm is rudimentary alarm;
Configuration node power supply and distribution class is advanced alarm, and the alarm of configuration node environmental classes is middle rank alarm, and configuration node security type is Rudimentary alarm;
Monitor terminal, when multiple alarms produce, according to the alert levels of configuration, preferentially sends alert levels according to alarm rule High alarm;
Or multiple alarms are obtained in the same period, according to the alert levels of configuration, preferentially send the high alarm of alert levels;
When multiple alarms produce, after the low alarm caching preset time of rank, send;
When certain node is repeatedly alarmed in preset time period, then it is assumed that be high frequency time alarm, sent after preset time period High frequency time alarm.
7. The method for remotely monitoring the health status of data center nodes through IPMI according to claim 1, characterized in that
The monitor terminal screens alarm keywords, merges alarms that fall within the same time interval and share the same alarm keyword, counts how often each alarm occurs per time interval, and displays the result to maintenance personnel.
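The keyword merging of claim 7 amounts to bucketing alarms by time interval and keyword and counting each bucket. The following Python sketch shows one way to do that; the function name `count_alarms` and the representation of an alarm as a `(timestamp, keyword)` pair are assumptions made for illustration.

```python
from collections import Counter


def count_alarms(alarms, interval):
    """Count alarms per (interval index, keyword).

    `alarms` is an iterable of (timestamp_seconds, keyword) pairs and
    `interval` is the bucket length in seconds.
    """
    counts = Counter()
    for timestamp, keyword in alarms:
        bucket = int(timestamp // interval)  # alarms in the same bucket merge
        counts[(bucket, keyword)] += 1
    return counts
```

For example, with a 60-second interval, two "fan" alarms at 5 s and 20 s fall into the same bucket and are reported as one entry with a count of 2.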
8. The method for remotely monitoring the health status of data center nodes through IPMI according to claim 6, characterized in that
The step in which the monitor terminal accesses each node of the data center through the IPMI protocol further comprises:
The monitor terminal packages the configured alarm information, alarm information types, and alarm shielding control information into encapsulated information and sends it to each node of the data center;
The encapsulated information includes: a configured alarm information MAC address layer, a configured alarm information data processing terminal IP layer, and a configured alarm information data frame section;
In the configured alarm information data frame section, when the lowest bit of the byte is 0, the byte represents the address code of the component issuing the request or response; when the lowest bit is 1, it represents the software ID issuing the request or response; the upper 7 bits of the byte carry the specific address code or software ID;
A parity information code is provided in the configured alarm information data frame section; when the parity information code is even, the message is a request, and when it is odd, the message is a response; when the message is a basic control request or a status response, the byte is 00h and 01h, respectively;
A sequence number generated by the requester is also provided in the configured alarm information data frame section, and is used to distinguish different requests when the monitor terminal needs to send multiple requests; the sequence number is the sub-component number or sub-address number of the component that responds to or receives the message.
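The byte layout described in claim 8 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's code: the function names are hypothetical, and the sketch covers only the address byte (selector in bit 0, value in the upper 7 bits) and the even/odd request/response rule.

```python
def encode_address_byte(value, is_software_id):
    """Pack a 7-bit address code or software ID with the selector in bit 0."""
    if not 0 <= value < 128:
        raise ValueError("value must fit in 7 bits")
    return (value << 1) | (1 if is_software_id else 0)


def decode_address_byte(byte):
    """Return (value, is_software_id) from an address byte."""
    return byte >> 1, bool(byte & 1)


def is_request(code):
    """Even codes mark a request, odd codes mark a response."""
    return code % 2 == 0
```

Under this reading, the 00h/01h pair named in the claim is simply the even (request) and odd (response) form of the same code.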
CN201711240748.9A 2017-11-30 2017-11-30 A method for remotely monitoring the health status of data center nodes through IPMI Pending CN107995287A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711240748.9A CN107995287A (en) 2017-11-30 2017-11-30 A method for remotely monitoring the health status of data center nodes through IPMI

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711240748.9A CN107995287A (en) 2017-11-30 2017-11-30 A method for remotely monitoring the health status of data center nodes through IPMI

Publications (1)

Publication Number Publication Date
CN107995287A true CN107995287A (en) 2018-05-04

Family

ID=62034738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711240748.9A Pending CN107995287A (en) 2017-11-30 2017-11-30 A method for remotely monitoring the health status of data center nodes through IPMI

Country Status (1)

Country Link
CN (1) CN107995287A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647130A (en) * 2018-05-28 2018-10-12 比特大陆科技有限公司 A kind of localization method, alarm method and the relevant device and system of failure mine machine
CN108920315A (en) * 2018-06-29 2018-11-30 郑州云海信息技术有限公司 A kind of querying method of network interface card information, device, system and readable storage medium storing program for executing
CN108984466A (en) * 2018-06-29 2018-12-11 深圳市同泰怡信息技术有限公司 The exchange method of BMC and server OS, system
CN109634397A (en) * 2018-12-07 2019-04-16 郑州云海信息技术有限公司 A kind of system and method for realizing intelligent network adapter or more Electricity Functional
CN114072770A (en) * 2019-07-23 2022-02-18 核心科学公司 Automatic repair of computing equipment in data centers
CN114371751A (en) * 2022-02-15 2022-04-19 联泰集群(北京)科技有限责任公司 Circuit board card and monitoring system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140195669A1 (en) * 2013-01-08 2014-07-10 American Megatrends, Inc. Emulated communication between master management instance and assisting management instances on baseboard management controller
CN104079434A (en) * 2014-07-07 2014-10-01 用友软件股份有限公司 Device and method for managing physical devices in cloud computing system
CN105791033A (en) * 2016-05-09 2016-07-20 浪潮电子信息产业股份有限公司 Method, device and system for regulating operating state of server
CN106603343A (en) * 2017-01-11 2017-04-26 郑州云海信息技术有限公司 A method for testing stability of servers in batch
CN106874162A (en) * 2017-02-23 2017-06-20 郑州云海信息技术有限公司 A kind of monitoring management pressure test integration method based on IPMI services

Similar Documents

Publication Publication Date Title
CN107995287A (en) A method for remotely monitoring the health status of data center nodes through IPMI
CN106657387A (en) Intelligent centralized air-traffic-control automation monitoring system
CN105282772B (en) Wireless network datacom device monitoring system and apparatus monitoring method
CN103152352B (en) A kind of perfect information security forensics monitor method based on cloud computing environment and system
CN103716173B (en) A kind of method for storing monitoring system and monitoring alarm issue
US20150127814A1 (en) Monitoring Server Method
CN107612748B (en) Multi-node server power consumption management system
CN107124315B (en) Multi-server monitoring system and monitoring method based on SNMP and IPMI protocol
CN107104840A (en) A kind of daily record monitoring method, apparatus and system
CN105335271A (en) State monitoring apparatus and comprehensive monitoring system and method
CN106357469B (en) A kind of dynamic adjusting method and device of monitoring resource mode
CN112631866A (en) Server hardware state monitoring method and device, electronic equipment and medium
CN111488258A (en) A system for software and hardware running state analysis and early warning
US11652831B2 (en) Process health information to determine whether an anomaly occurred
CN114362994B (en) Safety risk identification method for operation behavior of multi-layer heterogeneous granularity intelligent aggregation railway system
CN106506248B (en) A server intelligent monitoring system
CN108170702A (en) A kind of power communication alarm association model based on statistical analysis
CN107632907A (en) A kind of BMC chip mandatory system and its control method
CN106713281A (en) Monitoring system
CN103647662A (en) Fault monitoring alarm method and apparatus
CN116436821A (en) Operation and maintenance management software system based on artificial intelligent computing platform
CN119449433A (en) POE-driven multi-dimensional security monitoring and protection system for IoT devices
CN104618461A (en) Mobile code cloud mobile phone-based server monitoring method
CN114172693A (en) An industrial control network server illegal external connection detection device and detection method
CN108924095A (en) A kind of government website security monitoring alarm platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180504
