WO2018131100A1 - Management system for managing a computer system
- Publication number
- WO2018131100A1 (PCT/JP2017/000683)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- resource
- resources
- information
- performance change
- change information
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
Definitions
- the present invention generally relates to management of a computer system having a plurality of resources.
- the “performance history” is typically a time-series change of a metric value (performance value).
- a “problem resource” is a resource in which a problem such as an abnormality (for example, a metric value exceeds a predetermined threshold) has occurred.
- a bottleneck resource is a resource that is a bottleneck (for example, root cause) of a problem that has occurred.
- a computer system usually includes two or more resources having different resource attributes.
- “resource attribute” means at least one of a resource type and a metric (performance type). For this reason, “resource attributes are different” means that the resource type is different and the metric is different.
- Patent Document 1 discloses an operation management apparatus.
- The operation management device treats a performance item or a managed device as an element, and derives a correlation function between at least first performance series information, which indicates a time-series change of the performance information related to a first element, and second performance series information, which indicates a time-series change of the performance information related to a second element.
- Performance change information, that is, time-series changes of measured metric values (actually measured values), is essential for deriving a correlation function. For this reason, if a problem resource arises shortly after operation of the computer system starts, not enough performance change information has accumulated to derive an appropriate correlation function, and it is difficult to find the bottleneck resource corresponding to the problem resource. This difficulty is particularly acute when the correlation function is derived using performance change information that includes problematic metric values, because a computer system is generally expected to be normal at all times, so such metric values are unlikely to be obtained during operation.
- The configuration of the computer system is changed as needed.
- A configuration change of the computer system includes, for example, addition, change or deletion of a logical resource (for example, of a virtual machine or a volume), addition, change or deletion of a physical resource (for example, of a physical server), and a path change between resources (for example, addition, change or deletion of resources intervening between resources).
- As described above, Patent Document 1 requires resource performance change information to derive a correlation function. A configuration change of the computer system is therefore equivalent, in Patent Document 1, to the start of a new operation of the computer system, and the above-described problem also arises with configuration changes.
- In Patent Document 1, the correlation function between resources must be re-derived whenever the configuration of the computer system is changed. This is because the correlation function is derived between individual resources: when the configuration changes, a relationship between new resources arises, or the relationship between the same resources may change, and as a result a previously appropriate correlation function may become inappropriate.
- Such a problem may be present in at least one of other cases in which the relationship between resources is specified and cases where a conversion method other than the correlation function is adopted.
- For each of one or more first resources of a computer system having a plurality of resources, the management system converts actual performance change information according to the resource attribute of the first resource into estimated performance change information according to the resource attribute of a second resource of the computer system.
- the plurality of resources include two or more resources having different resource attributes.
- the resource attribute is at least one of a resource type and a metric.
- the performance change information is information representing a time-series change in the metric value of the resource.
- the actual performance change information of the first resource is information representing a time-series change of the metric value measured for the first resource.
- The estimated performance change information of the second resource is performance change information obtained by converting the actual performance change information of the first resource using one or more conversion methods that, among the plurality of conversion methods represented by management information prepared in advance, correspond to the resource attribute of the first resource and the resource attribute of the second resource.
- Each of the plurality of conversion methods is a method for converting performance change information according to a resource attribute into performance change information according to another resource attribute.
- Each of the one or more first resources is a resource related to the second resource in the topology of the computer system.
- The management system calculates the relationship between the first resource and the second resource according to the difference between the estimated performance change information of the second resource and the actual performance change information of the second resource.
- the actual performance change information of the second resource is information representing a time-series change of the metric value measured for the second resource.
- the management system displays processing result information that is information based on the relationship calculated for each of the one or more first resources.
- the processing result information includes information on at least one of the one or more first resources.
- The estimated performance change information of the second resource is obtained from the actual performance change information of the first resource, and the relationship between the resources can be expected to be specified based on that estimated performance change information and the actual performance change information of the second resource.
- FIG. 1 shows configurations of a computer system and a management system according to an embodiment.
- FIG. 2 shows an example of a part of the topology (resource topology) of a computer system.
- FIG. 3 shows an example of a resource table.
- FIG. 4 shows an example of a related resource table.
- FIG. 5 shows an example of a correlation function related table.
- FIG. 6 shows an example of a correlation function definition table.
- FIG. 7 shows an example of a performance table.
- FIG. 8 shows the flow and outline of a bottleneck candidate display process.
- FIG. 9 shows the flow and outline of a route list generation process (S802 in FIG. 8).
- FIG. 10 shows the flow and outline of a bottleneck determination process (S805 in FIG. 8).
- FIG. 11 shows the flow and outline of a resource type list generation process (S1001 in FIG. 10).
- FIG. 12 shows the flow and outline of problem resource performance history estimation processing (S1002 in FIG. 10).
- FIG. 13 shows an outline of one embodiment.
- each table may be described using the expression “abc table”, but the information may be expressed using a data configuration other than the table.
- At least one of the “abc tables” can be referred to as “abc information” to indicate that it does not depend on the data configuration.
- The configuration of each table is an example; one table may be divided into two or more tables, or all or part of two or more tables may be combined into a single table.
- the “interface unit” includes one or more interfaces.
- The one or more interfaces may be one or more interface devices of the same kind (for example, one or more NICs (Network Interface Cards)) or two or more interface devices of different kinds (for example, a NIC and an HBA (Host Bus Adapter)).
- the “storage unit” includes one or more memories.
- the at least one memory may be a volatile memory or a non-volatile memory.
- the storage unit is mainly used during processing by the processor unit.
- the “processor unit” includes one or more processors.
- the at least one processor is typically a microprocessor such as a CPU (Central Processing Unit).
- Each of the one or more processors may be a single core or a multi-core.
- the processor may include a hardware circuit that performs part or all of the processing.
- Processing may be described with a “program” as the subject, but because a program is executed by a processor (for example, a CPU) and performs predetermined processing while using a storage unit (for example, a memory) and/or a communication interface device (for example, a communication port) as appropriate, the subject of the processing may be regarded as the processor.
- Processing described with the program as the subject may be processing performed by a processor or by an apparatus having the processor.
- the program may be installed in a computer-like device from a program source.
- The program source may be, for example, a program distribution server or a computer-readable recording medium (for example, a non-transitory recording medium).
- two or more programs may be realized as one program, or one program may be realized as two or more programs.
- the management system may be composed of one or more computers.
- When the management computer displays information, it may either display the information on its own display device or transmit display information to a remote display computer.
- the management computer is the management system.
- the plurality of computers may include a display computer when the display computer performs display
- Input of information to the computer and output of information from the computer may be performed by an input / output device included in the computer. Examples of the input / output device include a display device, a keyboard, and a pointing device, but another device may be employed instead of or in addition to at least one of them.
- Alternatively, a serial interface device or an Ethernet interface device (Ethernet is a registered trademark) may be adopted as the input / output device, and a display computer having a display device, a keyboard, and a pointing device may be connected to that interface device.
- the information may be output (for example, displayed) and input by the computer transmitting the display information to the display computer or the computer receiving the input information from the display computer.
- the management server 557 is a management computer
- the management client 555 is a display computer.
- A “resource” is a component of a computer system; specifically, it is a generic name for each of the devices constituting the computer system and each of the components included in each device.
- Devices include physical devices (for example, a network switch) and logical devices (for example, a virtual machine).
- Components include physical components (for example, a microprocessor) and logical components (for example, an LDEV (logical volume)). That is, resources include physical resources and logical resources.
- the physical resource is, for example, a physical CPU and a physical memory.
- the logical resource is a resource corresponding to at least one of a resource to which at least a part of one or more physical resources is allocated and a resource using at least a part of the one or more physical resources.
- the logical resource is, for example, APP, logical volume, VM (Virtual Machine), or the like.
- a “related resource” (resource related to a resource) of a resource is a resource linked directly or indirectly to the resource.
- When a related resource is “directly” linked to a resource, no other resource is interposed between the resource and the related resource.
- When a related resource is “indirectly” linked to a resource, one or more other resources are interposed between the resource and the related resource.
- a related resource higher than the resource can be referred to as an “upper related resource”, and a lower related resource than the resource can be referred to as a “lower related resource”.
- a related resource directly linked to a resource among upper related resources can be referred to as a “parent resource”, and a related resource directly linked to the resource among lower related resources can be referred to as a “child resource”.
- “FC” stands for Fibre Channel.
- a “node” is a resource as an element in a resource topology (tree structure).
- an “edge” is a link between nodes. Thus, for example, if three resources are in series, there are three nodes and two edges.
- a name is used as resource identification information, but other types of identification information may be used instead of or in addition to the name.
- FIG. 13 shows an outline of one embodiment.
- the correlation function is an example of a conversion method for converting a performance history according to a resource attribute of a resource into a performance history according to the resource attribute of another resource.
- “Performance history” is an example of performance change information, and is a metric value history representing a time-series change of metric values.
- the “resource attribute” is a resource type and a metric.
- Correlation functions are prepared not for individual resources but for resource attributes. This limits the number of correlation functions to be prepared in advance (the amount of information in the correlation function related table described later) and makes them highly versatile. For example, suppose the resources include VMs (virtual machines) and ports, that all VMs have the same resource attribute, and that all ports have the same resource attribute. The more VMs and ports there are, the more resource pairs there are, but the number of resource attribute pairs is always 1. Furthermore, when a VM or a port is added, changed or deleted, a resource is added, changed or deleted but no resource attribute changes, so it may be unnecessary to add, change or delete a correlation function.
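The counting argument above can be illustrated with a minimal sketch. The inventory, the resource names, and the helper `pair_counts` are hypothetical and are not part of the patent; the sketch also assumes one resource attribute per resource type, as in the VM/port example.

```python
def pair_counts(resources, src_type, dst_type):
    """Count resource pairs and resource-attribute pairs between two resource types.

    Assumption: every resource of a given type shares one resource attribute,
    so at most one attribute pair exists for (src_type, dst_type).
    """
    src = [name for name, rtype in resources if rtype == src_type]
    dst = [name for name, rtype in resources if rtype == dst_type]
    resource_pairs = len(src) * len(dst)
    attribute_pairs = 1 if src and dst else 0
    return resource_pairs, attribute_pairs


# Hypothetical inventory of (resource name, resource type).
inventory = [
    ("VM1", "VM"), ("VM2", "VM"), ("VM3", "VM"),
    ("Port1", "Port"), ("Port2", "Port"),
]

before = pair_counts(inventory, "VM", "Port")
# Adding a VM increases the number of resource pairs but not attribute pairs.
after = pair_counts(inventory + [("VM4", "VM")], "VM", "Port")
print(before, after)
```

With per-resource correlation functions the table would grow with the first number of each tuple; with per-attribute functions it stays at the second.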
- The correlation function can be defined in advance, based on know-how (for example, design information and operation knowledge), according to the resource attribute of the resource. Even if little time has elapsed since the start of operation or a configuration change of the computer system (even if the amount of actual performance history of each resource is small), an appropriate correlation function exists, so the actual performance history of the related resource can be converted into the estimated performance history of the problem resource using one or more correlation functions corresponding to the resource attribute of the related resource and the resource attribute of the problem resource.
- the relationship (correlation coefficient) between the related resource and the problem resource can be calculated using the estimated performance history of the problem resource and the actual performance history of the problem resource.
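The conversion and the relationship calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the linear function `port_iops_to_vm_response`, the metric values, and the choice of the Pearson coefficient as the correlation coefficient are all hypothetical, since the text does not specify the formula.

```python
from math import sqrt


def estimate_history(actual_history, correlation_functions):
    """Apply one or more correlation functions, in order, to every metric value."""
    estimated = list(actual_history)
    for func in correlation_functions:
        estimated = [func(value) for value in estimated]
    return estimated


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series (assumed measure)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def port_iops_to_vm_response(iops):
    """Hypothetical correlation function: port IOPS -> VM response time (ms)."""
    return 0.01 * iops + 1.0


# Made-up actual histories for a related resource (port) and a problem resource (VM).
related_actual = [100, 200, 400, 300]
problem_actual = [2.1, 2.9, 5.2, 4.1]

problem_estimated = estimate_history(related_actual, [port_iops_to_vm_response])
coefficient = pearson(problem_estimated, problem_actual)
print(round(coefficient, 3))
```

A coefficient close to 1 would suggest that the related resource is a plausible bottleneck candidate for the problem resource; how candidates are ranked or thresholded is left open here.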
- FIG. 1 shows a configuration of a computer system and a management system according to an embodiment.
- the computer system 100 includes one or more hosts 553 and one or more storage systems 551 connected to the one or more hosts 553.
- a host 553 is connected to the storage system 551 via a communication network 521 (for example, a SAN (Storage Area Network) or a LAN (Local Area Network)).
- the storage system 551 has a physical storage device group 563 and a controller 561 connected to the physical storage device group 563.
- the physical storage device group 563 has one or more PG (Parity Group).
- the PG may be called a RAID (Redundant Array of Independent (or Inexpensive) Disks) group.
- the PG is composed of a plurality of physical storage devices, and stores data according to a predetermined RAID level.
- the physical storage device is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
- the storage system 551 has a plurality of logical volumes. As the logical volume, there is a substantive logical volume (real volume) 565 based on PG and a virtual logical volume (virtual volume) 567 according to thin provisioning or storage virtualization technology.
- One storage system 551 does not necessarily have a plurality of types of logical volumes. For example, the storage system 551 may have only the real volume 565 as a logical volume.
- a storage area is allocated from the pool to the virtual volume according to thin provisioning.
- the pool is a storage area group based on one or more physical storage devices (for example, PG), and may be a set of one or more logical volumes, for example.
- the pool may be a pool in which a difference between the original logical volume and its snapshot is stored instead of a pool having a storage area allocated to a virtual volume according to thin provisioning.
- the controller 561 includes a plurality of devices, for example, a port, an MPB (a blade (circuit board) having one or a plurality of microprocessors (MP)), and a cache memory.
- the port receives an I / O (Input / Output) command (write command or read command) from the host 553, and the MP included in the MPB controls I / O of data according to the I / O command.
- the MP specifies an I / O destination logical volume from the received I / O command, and performs data I / O on the specified logical volume. Data that is I / O to the logical volume is temporarily stored in the cache memory.
- the host 553 may be a physical machine (physical computer) or a virtual machine (VM).
- One or more application programs (APP) 552 are executed on the host 553.
- an I / O command specifying a logical volume is transmitted from the host 553 to the storage system 551.
- the computer system 100 has a plurality of hierarchical resources.
- the plurality of resources include resources of two or more resource types such as APP 552, host 553, storage system 551, controller 561, port, MPB, cache memory, logical volume, and PG.
- a plurality of resources in the same layer may be grouped to define a higher layer resource than that layer resource.
- A “resource” may be a substantial resource (either a logical resource or a physical resource) such as an APP or a logical volume, or a virtual resource that is a group of a plurality of substantial resources.
- the management system includes a management server 557 and one or more management clients 555 connected to the management server 557.
- A management client 555 is connected to the management server 557 via a communication network 521 (for example, a LAN, a WAN (Wide Area Network) or the Internet).
- The management client 555 includes an input device 501, a display device 502, a storage unit (for example, a memory) 505, a communication interface device (hereinafter referred to as I / F) 507, and a processor (for example, a CPU (Central Processing Unit)) 503 connected to them.
- the input device 501 is, for example, a pointing device and a keyboard.
- the display device 502 is a device having a physical screen on which information is displayed, for example. A touch screen in which the input device 501 and the display device 502 are integrated may be employed.
- the I / F 507 is connected to the communication network 521, and the management client 555 can communicate with the management server 557 via the I / F 507. Note that some or all of the communication network 521 and the network connecting the host 553 and the storage system 551 may be common.
- The storage unit 505 includes at least the main storage device (typically a memory) out of a main storage device and an auxiliary storage device.
- the storage unit 505 can store a computer program executed by the processor 503 and information used by the processor 503.
- the storage unit 505 stores a Web browser 511 and a management client program 513.
- the management client program 513 may be RIA (Rich Internet Application).
- the management client program is a program file, which may be downloaded from the management server 557 (or another computer) and stored in the storage unit 505.
- the management server 557 includes a storage unit 535, an I / F 537, and a processor (for example, a CPU (Central Processing Unit)) 533 connected thereto.
- the processor 533 is an example of a processor unit.
- the I / F 537 is connected to the communication network 521, and the management server 557 can communicate with the management client 555 via the I / F 537.
- the management server 557 can receive an instruction in accordance with a user operation via the I / F 537 and can draw a display object in the layout area. For this reason, the I / F 537 is an example of an I / O interface device.
- the “layout area” here is an area where a display object can be drawn.
- the entire or partial range of the layout area is a display range in a frame (for example, a window) displayed by the Web browser 511 (or the management client program 513).
- a display image (including a display object) in the frame of the layout area in which the display object is drawn can be referred to as a display screen or a GUI screen.
- drawing an object in the layout area is substantially an example of displaying the object.
- The storage unit 535 includes at least the main storage device (typically a memory) out of a main storage device and an auxiliary storage device.
- the storage unit 535 can store a computer program executed by the processor 533 and information used by the processor 533.
- the storage unit 535 stores a management server program 541 and a management table group 543.
- the management table group 543 includes a hierarchical relationship (configuration information) of a plurality of resources included in the computer system, failure information of each resource, and the like. Information of at least a part of the management table group 543 may be collected by the management server program 541 or may be acquired by accessing another management system that holds the information.
- the management server program 541 receives an instruction according to a user operation from the management client 555, and transmits information drawn in the layout area to the management client 555.
- the GUI screen display corresponding to the user operation is realized by the cooperation processing of the management server program 541, the Web browser 511 (or the client RIA execution environment), and the management client program 513.
- the management server program 541 creates a screen, provides the display information for the created screen to the management client program 513, and the management client program 513 may display the screen based on the display information.
- Part of the creation processing (for example, drawing processing) may be offloaded from the management server program 541 to the management client program 513. Examples of cooperation include the following. For simplicity, this embodiment is described for the case where cooperation example 2 is adopted, but cooperation example 1 can of course be applied as well.
- (Cooperation example 1) The management server program 541 transmits at least part of the information included in the management table group 543 to the Web browser 511 (or the management client program 513), which stores it in the storage unit 505 as temporary information.
- The Web browser 511 (or the management client program 513) draws a display object in the layout area based on an instruction according to the user operation and the temporary information (for example, it newly draws, enlarges or reduces the display object).
- (Cooperation example 2) The Web browser 511 (or the management client program 513) transmits an instruction according to the user operation to the management server program 541.
- The management server program 541 receives the instruction according to the user operation on the display screen from the Web browser 511 (or the management client program 513), and transmits display information for the display object based on the instruction and the management table group 543.
- The Web browser 511 (or the management client program 513) receives the display information and draws the display object in the layout area according to the display information. In short, the management server program 541 draws a display object in the layout area.
- FIG. 2 shows an example of a part of the topology (resource topology) of the computer system 100.
- VM is a virtual machine.
- HV is a hypervisor that controls one or more virtual machines and is executed on the host.
- CPU is a physical processor (central processing unit).
- DS is a data store recognized as a storage device by the hypervisor.
- the resource belonging to the layer “SAN” is “FC-Switch” (FC (Fibre Channel) switch in SAN).
- the resource belonging to the layer “Storage” is “Storage” (storage system).
- “Port” is a communication port that is connected to the FC switch and receives an I / O command from the VM.
- LDEV is a logical volume (real volume or virtual volume).
- MP is a microprocessor.
- “Pool” is a storage area including a real area allocated to a virtual volume according to thin provisioning.
- “PG” is a parity group.
- “Cache” is a cache memory in which data input / output to / from a logical volume is temporarily stored.
- the topology configuration as shown in FIG. 2 is a configuration specified from the configuration information represented by the management table group 543.
- One or more resource types may belong to one layer.
- One group may be composed of two or more resources of the same resource type. In this case, a plurality of different groups may exist for one resource type, and one or more resources of that resource type may exist in each group. That is, the “layer” is an aggregation of different resource types, and the “group” is an aggregation of different resources with the same resource type. At least one of the layer and the group may be defined by the user.
- FIG. 3 shows an example of a resource table.
- the resource table 400 has information on resources.
- the resource table 400 has a record for each resource, for example.
- Each record holds information such as a resource name (resource name) and a resource type name (resource type name).
- FIG. 4 shows an example of a related resource table.
- the related resource table 500 represents the relationship between resources.
- The related resource table 500 has a record for each resource, and each record holds information such as a resource name and a child resource name (the name of a child resource of that resource).
- the management server program 541 can specify the related resource of the selected resource from the related resource table 500 using the resource name of the selected resource.
- The management server program 541 can specify a lower related resource from the record of the related resource table 500 that has the resource name of the selected resource as its resource name.
- The management server program 541 can specify an upper related resource from the record of the related resource table 500 that has the resource name of the selected resource as its child resource name.
- Each record of the related resource table 500 may hold a parent resource name instead of or in addition to the child resource name.
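The lookups described above can be sketched as a walk over (resource name, child resource name) records. The table contents, the field names `resource` and `child`, and the helper names are hypothetical; only the idea of following records downward for lower related resources and upward for upper related resources comes from the text.

```python
# Hypothetical related resource table: one record per parent/child link.
related_resource_table = [
    {"resource": "VM1", "child": "HV1"},
    {"resource": "VM2", "child": "HV1"},
    {"resource": "HV1", "child": "Port1"},
    {"resource": "Port1", "child": "LDEV1"},
]


def _walk(table, start, match_key, follow_key):
    """Collect every resource reachable from `start`, directly or indirectly."""
    found, seen, stack = [], set(), [start]
    while stack:
        current = stack.pop()
        for record in table:
            nxt = record[follow_key]
            if record[match_key] == current and nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                stack.append(nxt)
    return found


def lower_related(table, name):
    """Records whose resource name matches lead to lower related resources."""
    return _walk(table, name, "resource", "child")


def upper_related(table, name):
    """Records whose child resource name matches lead to upper related resources."""
    return _walk(table, name, "child", "resource")


print(lower_related(related_resource_table, "VM1"))
print(sorted(upper_related(related_resource_table, "Port1")))
```

The first element found in each direction with a single hop corresponds to a child or parent resource; the remaining elements are indirectly linked related resources.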
- FIG. 5 shows an example of the correlation function relation table.
- the correlation function relation table 550 holds information regarding the resource attribute of the conversion source (before conversion) and the resource attribute of the conversion destination (after conversion) for each correlation function.
- The correlation function relation table 550 has a record for each correlation function, and each record holds information such as a function name (correlation function name), a conversion source attribute (information representing the resource attribute of the conversion source), and a conversion destination attribute (information representing the resource attribute of the conversion destination).
- the conversion source attribute is a conversion source resource type name (name of the conversion source resource type) and a conversion source metric name (name of the conversion source metric).
- the conversion destination attributes are a conversion destination resource type name (the name of the conversion destination resource type) and a conversion destination metric name (the name of the conversion destination metric).
- FIG. 6 shows an example of a correlation function definition table.
- the correlation function definition table 600 has information on the definition of the correlation function for each correlation function.
- the correlation function definition table 600 has a record for each correlation function, for example. Each record holds information such as the function name and function details (eg, the correlation function itself and variables used in the correlation function).
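How the two tables could work together is sketched below, under stated assumptions: the function names, resource type names, metric names, and coefficients are invented for illustration, and representing a function definition as a Python callable is only one possible encoding of the "function details".

```python
# Hypothetical stand-in for the correlation function relation table 550.
# Each record: function name, conversion source attribute, conversion
# destination attribute, where an attribute is (resource type name, metric name).
RELATION_TABLE = [
    {"function": "port_to_switch",
     "source": ("Port", "transfer_rate"),
     "destination": ("FC Switch", "transfer_rate")},
    {"function": "switch_to_storage",
     "source": ("FC Switch", "transfer_rate"),
     "destination": ("Storage", "iops")},
]

# Hypothetical stand-in for the correlation function definition table 600:
# function name -> the correlation function itself (coefficients are made up).
DEFINITION_TABLE = {
    "port_to_switch": lambda x: 2.0 * x + 5.0,
    "switch_to_storage": lambda x: 0.5 * x,
}


def find_function_name(source_attr, destination_attr):
    """Look up the function name by source and destination attributes."""
    for record in RELATION_TABLE:
        if record["source"] == source_attr and record["destination"] == destination_attr:
            return record["function"]
    return None  # no correlation function registered for this attribute pair


def get_correlation_function(function_name):
    """Fetch the function definition by name."""
    return DEFINITION_TABLE[function_name]
```

Keeping the attribute mapping separate from the definitions lets several attribute pairs share one function definition, should that be desired.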
- FIG. 7 shows an example of the performance table.
- the performance table is a table having metric data collected for resources.
- Each record of the performance table has information included in one metric data.
- each record holds information such as a resource name (the name of the resource for which the metric value was collected), a metric name (the name of the metric of the collected value), a time (the time at which the metric value was collected), and a metric value.
- the time is expressed in year / month / day / hour / minute / second, but the expression is not limited thereto.
- the management server program 541 may perform processing for collecting metric data of each of a plurality of resources in the computer system 100, and processing for adding a record including at least part of the information of the collected metric data to the performance table 700.
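A record of the performance table and the retrieval of a performance history by resource name, metric name, and period might look like the following sketch. The `MetricRecord` type, the `get_history` helper, and the metric names used in any example data are assumptions made for illustration.

```python
from datetime import datetime
from typing import NamedTuple


class MetricRecord(NamedTuple):
    """One record of the performance table (one piece of metric data)."""
    resource_name: str
    metric_name: str
    time: datetime
    value: float


def get_history(table, resource_name, metric_name, start, end):
    """Return (time, value) pairs, oldest first, for one resource and metric
    whose collection time falls within [start, end]."""
    rows = [
        r for r in table
        if r.resource_name == resource_name
        and r.metric_name == metric_name
        and start <= r.time <= end
    ]
    return [(r.time, r.value) for r in sorted(rows, key=lambda r: r.time)]
```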
- the metric and the resource type correspond 1:1 or many:1. That is, one or more metrics exist for one resource type, but one metric does not correspond to a plurality of resource types.
- the present invention is not limited thereto.
- one metric type may correspond to a plurality of resource types.
- the correlation coefficient (an example of the relationship) between the problem resource and its related resources can be calculated and displayed.
- FIG. 8 shows the flow and outline of the bottleneck candidate display process. This process is started, for example, when a problem resource is detected by the management server program 541.
- the management server program 541 executes a route list generation process (FIG. 9).
- the route list is a list of resource pairs (a combination of a resource name and a child resource name) related to the problem resource.
- the resource name included in the route list is as illustrated in FIG. 8 (that is, “HV4”, “DS3” etc.).
- in the illustration, the resource names of the related resources lower than each of the resources “LDEV15” to “LDEV18” are omitted from the route list, but such lower related resource names (for example, “Pool31”, “PG58”, and “MP4”) are also included in the route list.
- the management server program 541 acquires all child resource names from the route list generated in S801, and eliminates any duplicate child resource names from the acquired child resource names.
- the list of remaining child resource names is the related resource list.
- a child resource name constituting the related resource list is referred to as “related resource name”, and a resource represented by “related resource name” is referred to as “related resource”.
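Deriving the related resource list from the route list can be sketched as below. The representation of the route list as `(resource name, child resource name)` tuples and the function name `related_resource_list` are assumptions.

```python
def related_resource_list(route_list):
    """Collect every child resource name from the route list and drop
    duplicates while keeping first-seen order."""
    # dict.fromkeys preserves insertion order, so it deduplicates stably.
    return list(dict.fromkeys(child for _resource, child in route_list))
```

A duplicate arises whenever a child resource is reachable through more than one parent, which is why the elimination step is needed.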
- the management server program 541 sets “bottleneck candidate list” as an internal variable. Thereby, a bottleneck candidate list is generated in the bottleneck candidate display process.
- S804 and S805 are executed for all related resources corresponding to the related resource list.
- one related resource is taken as an example (referred to as “target related resource” in the description of FIG. 8).
- the management server program 541 refers to the route list and identifies a route having the problem resource as the start point and the related resource as the end point.
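One way to identify such a route from the route list is a breadth-first search over the parent-to-child pairs, sketched below. The search strategy is an assumption (the description does not state how the route is identified), as are the function name `find_route` and the tuple representation of the route list.

```python
from collections import deque


def find_route(route_list, start, end):
    """Return the resource names from `start` (problem resource) down to
    `end` (related resource), or None if no such route exists."""
    children = {}
    for resource, child in route_list:
        children.setdefault(resource, []).append(child)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for child in children.get(path[-1], []):
            if child not in visited:
                visited.add(child)
                queue.append(path + [child])
    return None
```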
- the management server program 541 executes bottleneck determination processing (FIG. 10) for the route specified in S804. In this process, the bottleneck candidate list is updated.
- the management server program 541 displays a completed bottleneck candidate list.
- the bottleneck candidate list is a list of information (for example, resource names and correlation coefficients) related to related resources for which the calculated correlation coefficient (an example of the relationship) is greater than or equal to a predetermined value among the related resources of the problem resource “VM21”.
- FIG. 9 shows the flow and outline of the route list generation process (S801 in FIG. 8).
- the management server program 541 sets “route list” as an internal variable. Thereby, a route list is generated in the route list generation process.
- the management server program 541 sets “resource name” as an internal variable.
- the management server program 541 substitutes the resource name “VM21” of the problem resource as an initial value for “resource name”.
- S904 and S905 are executed for all child resource names acquired directly or indirectly from the related resource table using the resource name of the problem resource as a key.
- one child resource name is taken as an example (referred to as “target child resource name” in the description of FIG. 9).
- the management server program 541 registers a pair of the target child resource name and the corresponding resource name in the route list.
- the management server program 541 assigns the target child resource name to the internal variable “resource name”.
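The route list generation can be sketched as a traversal that starts from the problem resource and follows child resource names. The sketch assumes the related resource table is available as a mapping from a resource name to its child resource names; the breadth-first order and the name `build_route_list` are illustrative choices, not taken from the description.

```python
from collections import deque


def build_route_list(children_of, problem_resource):
    """Return (resource name, child resource name) pairs for every child
    reachable directly or indirectly from the problem resource."""
    route_list = []
    visited = {problem_resource}
    queue = deque([problem_resource])      # initial value of "resource name"
    while queue:
        resource_name = queue.popleft()
        for child in children_of.get(resource_name, []):
            route_list.append((resource_name, child))   # register the pair
            if child not in visited:                    # expand each child once
                visited.add(child)
                queue.append(child)
    return route_list
```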
- FIG. 10 shows the flow and outline of the bottleneck determination process (S805 in FIG. 8).
- a single route (a route having the problem resource “VM21” as one end and one related resource “Port3” as the other end) is taken as an example.
- the one route is referred to as “target route”, and the related resource at one end of the target route is referred to as “target related resource”.
- the management server program 541 executes a resource type list generation process (FIG. 11).
- the resource type list includes a resource name and a resource type name for each resource on the target route.
- S1002 to S1006 are executed for all metrics of the target related resource.
- one metric is taken as an example (referred to as “target metric” in the description of FIGS. 10 and 12).
- the management server program 541 executes problem resource performance history estimation processing (FIG. 12). That is, the management server program 541 converts the actual performance history for the target metric of the target related resource into the estimated performance history of the problem resource.
- the management server program 541 acquires the actual performance history of the problem resource from the performance table using the resource name, metric name, and period of the problem resource as keys.
- the management server program 541 calculates a correlation coefficient between the estimated performance history of the problem resource (performance history acquired in S1002) and the actual performance history of the problem resource (performance history acquired in S1003).
- the management server program 541 determines whether or not the correlation coefficient calculated in S1004 is greater than or equal to a threshold value.
- the management server program 541 registers the resource name of the target related resource and the correlation coefficient calculated in S1004 in the bottleneck candidate list.
- the bottleneck candidate list is displayed in S806 of FIG. 8, but the related resource names (and correlation coefficients) included in the bottleneck candidate list are narrowed down based on the correlation coefficient. Thereby, the visibility of the displayed bottleneck candidate list can be improved.
- the bottleneck candidate list has been updated for the target related resource.
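The comparison and registration steps can be sketched as follows. The description only says "correlation coefficient"; the use of the Pearson coefficient here is an assumption, as are the default threshold value and the names `pearson` and `update_candidates`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def update_candidates(candidates, resource_name, estimated, actual, threshold=0.8):
    """Register the related resource only if the estimated and actual
    histories of the problem resource correlate at or above the threshold."""
    coefficient = pearson(estimated, actual)
    if coefficient >= threshold:
        candidates.append((resource_name, coefficient))
    return coefficient
```

Filtering at registration time is what keeps the displayed list short: resources whose estimated history does not track the actual history never enter it.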
- FIG. 11 shows the flow and outline of the resource type list generation process (S1001 in FIG. 10).
- the management server program 541 sets “resource type list” as an internal variable. Thereby, a resource type list is generated in the resource type list generation process.
- S1102 and S1103 are executed for all resources on the target route.
- one resource is taken as an example in the description of S1102 and S1103 (referred to as “target resource” in the description of FIG. 11).
- the management server program 541 acquires the resource type name from the resource table using the resource name of the target resource as a key.
- the management server program 541 sets the resource name of the target resource and the resource type name acquired in S1102 in the resource type list.
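A minimal sketch of this loop is given below. It assumes the resource table is a mapping from resource name to resource type name; the type names in the sample table are invented for illustration.

```python
# Hypothetical resource table: resource name -> resource type name.
RESOURCE_TABLE = {
    "VM21": "VM",
    "HV4": "Hypervisor",
    "DS3": "Storage",
    "FC Switch 4": "FC Switch",
    "Port3": "Port",
}


def build_resource_type_list(route, resource_table):
    """Pair every resource on the target route with its resource type name."""
    resource_type_list = []
    for resource_name in route:
        type_name = resource_table[resource_name]               # S1102
        resource_type_list.append((resource_name, type_name))   # S1103
    return resource_type_list
```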
- FIG. 12 shows a flow and an outline of the problem resource performance history estimation process (S1002 in FIG. 10).
- the management server program 541 sets “estimation history” as an internal variable. Thereby, in the problem resource performance history estimation process, an estimated performance history of the problem resource is generated.
- the management server program 541 acquires the actual performance history of the target related resource from the performance table using the metric name and period of the target related resource as a key.
- the management server program 541 substitutes the actual performance history of the target related resource as an initial value for the internal variable “estimation history”.
- S1204 to S1207 are executed for all edges in the target route that is a route from the target related resource to the problem resource.
- the edges are sequentially selected from the target related resource side to the problem resource side.
- one edge is taken as an example (referred to as “target edge” in the description of FIG. 12).
- the management server program 541 acquires the resource type name from the resource type list for each of the conversion source resource and the conversion destination resource using the resource name as a key.
- the “conversion source resource” is a resource on the target related resource side among the resources at both ends of the target edge.
- the “conversion destination resource” is a resource on the problem resource side among the resources at both ends of the target edge.
- the management server program 541 acquires the function name of the correlation function used for estimation from the correlation function relation table 550, using as keys the conversion source resource type name (the resource type name of the conversion source resource), the conversion source metric name (the metric name of the conversion source resource), the conversion destination resource type name (the resource type name of the conversion destination resource), and the conversion destination metric name (the metric name of the conversion destination resource).
- the management server program 541 acquires the correlation function from the correlation function definition table 600 using the acquired function name as a key, and uses the acquired correlation function to convert the performance history assigned to the internal variable “estimation history” into another performance history.
- the management server program 541 substitutes the converted estimation history (another performance history in S1206) into the internal variable “estimation history”.
- edges are sequentially selected from the target related resource side to the problem resource side, and S1204 to S1207 are executed for the selected edge.
- the target route includes four edges, so four performance history conversions are executed: (1) conversion from the actual performance history of the target related resource “Port3” to the estimated performance history of its parent resource “FC Switch 4”, (2) conversion from the estimated performance history of the resource “FC Switch 4” to the estimated performance history of its parent resource “DS3”, (3) conversion from the estimated performance history of the resource “DS3” to the estimated performance history of its parent resource “HV4”, and (4) conversion from the estimated performance history of the resource “HV4” to the estimated performance history of its parent resource (the problem resource) “VM21”.
- the performance history set in the internal variable “estimation history” when S1204 to S1207 are performed for all the edges is the estimated performance history of the problem resource.
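The edge-by-edge conversion of S1204 to S1207 can be sketched as follows. Several simplifications are assumptions: each resource has a single (resource type, metric) attribute, the relation table is a dictionary keyed by the four names, each correlation function maps one metric value to one metric value, and all type names, metric names, and functions are invented.

```python
def estimate_problem_history(route, actual_history, attribute_of,
                             relation_table, definition_table):
    """`route` lists resource names from the target related resource to the
    problem resource; `actual_history` is the related resource's values."""
    estimation_history = list(actual_history)
    for source, destination in zip(route, route[1:]):       # one edge at a time
        source_type, source_metric = attribute_of[source]   # S1204
        dest_type, dest_metric = attribute_of[destination]
        function_name = relation_table[                     # S1205
            (source_type, source_metric, dest_type, dest_metric)]
        correlation_function = definition_table[function_name]
        converted = [correlation_function(v) for v in estimation_history]  # S1206
        estimation_history = converted                      # S1207
    return estimation_history


ROUTE = ["Port3", "FC Switch 4", "DS3", "HV4", "VM21"]
ATTRIBUTE_OF = {
    "Port3": ("Port", "m_port"),
    "FC Switch 4": ("FC Switch", "m_switch"),
    "DS3": ("Storage", "m_storage"),
    "HV4": ("Hypervisor", "m_hv"),
    "VM21": ("VM", "m_vm"),
}
RELATION = {
    ("Port", "m_port", "FC Switch", "m_switch"): "f1",
    ("FC Switch", "m_switch", "Storage", "m_storage"): "f2",
    ("Storage", "m_storage", "Hypervisor", "m_hv"): "f3",
    ("Hypervisor", "m_hv", "VM", "m_vm"): "f4",
}
DEFINITION = {
    "f1": lambda v: v * 2.0,
    "f2": lambda v: v + 1.0,
    "f3": lambda v: v * 3.0,
    "f4": lambda v: v - 4.0,
}
```

With these sample tables, only four functions for adjacent attribute pairs are registered, yet a history at “Port3” can still be carried all the way to “VM21”.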
- the performance history conversion may be performed once, that is, the actual performance history of the target related resource may be directly converted to the estimated performance history of the problem resource.
- the following effects can be expected by sequentially performing edge selection and performance history conversion for the selected edge from the target related resource side to the problem resource side. That is, an improvement in the accuracy of the estimated performance history of the problem resource can be expected without preparing a correlation function for every combination of resource attributes (resource type and metric), that is, without considering each of the one or more resource attributes interposed between two resource attributes; in other words, while suppressing the amount of information in the correlation function relation table 550.
- the present invention can be implemented in various other forms.
- the present invention is not limited to the case of specifying the bottleneck candidate resource of the problem resource, but can be applied to other cases of specifying the relationship between resources. Further, for example, the present invention can be applied to a case where a conversion method other than the correlation function is employed.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
According to the invention, the management system converts actual performance change information, which depends on the resource attributes of a first resource, into estimated performance change information, which depends on the resource attributes of a second resource. The resource attributes consist of at least one of a resource type and a metric. The actual performance change information represents a time-series change of a measured metric value. The estimated performance change information is performance change information converted from the actual performance change information of the first resource using one or more conversion methods that correspond to the resource attributes of the first resource and the resource attributes of the second resource, among a plurality of conversion methods represented by management information prepared in advance. The first resource is related to the second resource in a topology of a computer system. The management system calculates a relationship between the first resource and the second resource based on a difference between the estimated performance change information of the second resource and the actual performance change information of the second resource. The management system displays processing result information according to the relationship calculated with respect to the first resource.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018537551A JP6591689B2 (ja) | 2017-01-11 | 2017-01-11 | Management system for managing a computer system |
PCT/JP2017/000683 WO2018131100A1 (fr) | 2017-01-11 | 2017-01-11 | Management system for managing a computer system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2017/000683 WO2018131100A1 (fr) | 2017-01-11 | 2017-01-11 | Management system for managing a computer system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018131100A1 (fr) | 2018-07-19 |
Family
ID=62839638
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2017/000683 WO2018131100A1 (fr) | Management system for managing a computer system | 2017-01-11 | 2017-01-11 |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6591689B2 (fr) |
WO (1) | WO2018131100A1 (fr) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
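WO2013001609A1 (fr) * | 2011-06-28 | 2013-01-03 | 株式会社日立製作所 | Monitoring system and monitoring method |
JP2013206321A (ja) * | 2012-03-29 | 2013-10-07 | Fujitsu Ltd | Management device, resource management method, resource management program, and information processing system |
JP2016197450A (ja) * | 2016-07-25 | 2016-11-24 | 日本電気株式会社 | Operation management device, operation management system, information processing method, and operation management program |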
-
2017
- 2017-01-11 JP JP2018537551A patent/JP6591689B2/ja active Active
- 2017-01-11 WO PCT/JP2017/000683 patent/WO2018131100A1/fr active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JPWO2018131100A1 (ja) | 2019-01-17 |
JP6591689B2 (ja) | 2019-10-16 |
Legal Events
Code | Title | Description |
---|---|---|
ENP | Entry into the national phase | Ref document number: 2018537551; Country of ref document: JP; Kind code of ref document: A |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17891711; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 17891711; Country of ref document: EP; Kind code of ref document: A1 |