WO2019241199A1 - System and method for predictive maintenance of networked devices - Google Patents
- Publication number
- WO2019241199A1 (PCT/US2019/036478)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- predictive
- network
- alert
- devices
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
- G06F11/0754—Error or fault detection not based on redundancy by exceeding limits
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0769—Readable error formats, e.g. cross-platform generic formats, human understandable formats
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/327—Alarm or error message display
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
- G06N5/025—Extracting rules from data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Definitions
- the disclosed technology pertains to a system for predicting and preventing or mitigating errors within a network of devices.
- Edge computing concepts can be implemented within a network to decentralize processing for some tasks by moving those tasks to one or more computers that serve as entry points for data to the network, rather than having them simply receive the data and provide it to a central node to be processed.
- a grocery store or other retail environment may have several servers within a brick and mortar location that gather data related to transactions occurring at that brick and mortar location.
- this data might be gathered and stored throughout the day on local servers with minimal manipulation, and then transported to a central node such as a cloud computing environment or a data room based upon some schedule during the night.
- Processing tasks at the central node may include more demanding and sophisticated tasks such as analytics, compression, encryption, and machine learning.
- An example of an inefficiency in this process is that the local servers may only use 10-20% of their processing potential on basic data transport and storage tasks throughout the day, with unused processing potential being wasted.
- Another example is a heightened and intermittent bandwidth requirement between the central node and the brick and mortar location.
- Another example is a delay in sophisticated processing tasks such as supply chain forecasting and analytics, which could result in slow reaction to increasing demand and resulting shortages of products or services.
- While there are numerous advantages provided by edge computing, it further amplifies the problem of preventing and addressing infrastructure errors.
- a technician located at a central node may now be responsible for and reliant on the proper operation of devices at that node, cloud computing devices scattered across the globe, and hundreds or thousands of edge computing devices scattered across their company’s geographical footprint. Detecting problems within these complex networks is often a reactive process, and so problems are frequently not identified until a nightly batch of data fails to arrive. Further compounding this initial delay is the difficulty of a troubleshooting process that may now include phone calls to cloud computing vendor support lines, and phone calls to non-technical personnel at brick and mortar locations, in order to describe, identify, and address problems.
- FIG. 1 is a schematic diagram of an exemplary system configured to perform predictive analysis and preventative maintenance on a network;
- FIG. 2 is a flowchart of an exemplary set of high-level steps that could be performed to provide predictive analysis and preventative maintenance on the network;
- FIG. 3 is a flowchart of an exemplary set of steps that could be performed to configure a network component to monitor one or more devices on the network;
- FIG. 4 is a flowchart of an exemplary set of steps that could be performed to monitor the network for predictive data;
- FIG. 5 is a flowchart of an exemplary set of steps that could be performed to address alerts produced during predictive detection;
- FIG. 6 is a flowchart of an exemplary set of steps that could be performed to provide notifications based on alerts produced during predictive detection;
- FIG. 7 is a simulated screenshot of an exemplary monitor view of an exemplary predictive engine interface;
- FIG. 8 is a simulated screenshot of an exemplary device view of the exemplary predictive engine interface; and
- FIG. 9 is a simulated screenshot of an exemplary site view of the exemplary predictive engine interface.
- the inventors have conceived of novel technology that, for the purpose of illustration, is disclosed herein as applied in the context of proactive and predictive maintenance of a network of devices. While the disclosed applications of the inventors’ technology satisfy a long-felt but unmet need in the art of predictive maintenance of a network of devices, it should be understood that the inventors’ technology is not limited to being implemented in the precise manners set forth herein, but could be implemented in other manners without undue experimentation by those of ordinary skill in the art in light of this disclosure. Accordingly, the examples set forth herein should be understood as being illustrative only, and should not be treated as limiting.
- FIG. 1 shows a schematic diagram of an exemplary system configured to perform predictive analysis and preventative maintenance on a network (100) of devices. While the devices present in a particular network will vary greatly, FIG. 1 shows the network (100) comprising an edge device (106), an edge device (108), a core device (110), and a predictive engine (102).
- the core device (110) may be, for example, one or more cloud hosted virtual machines, a centralized data and processing room with one or more servers, or another computing environment that may offer high performance or scalability for processing and storage.
- the edge device (106) and the edge device (108) may be, for example, a tower server, rack server, point of sale server, or other computing device that may be located remotely from the core device (110), for example, at the edge of the network (100).
- the predictive engine (102) may be a device configured to perform tasks associated with predicting and preventing or mitigating issues present in the network (100).
- the predictive engine (102) may be configured and installed on a host device (101) connected to the network (100).
- the host device (101) may be a dedicated physical or virtual device that is placed on the network (100) to provide the predictive engine (102), or may be a device already present on the network for another purpose. This could include, for example, configuring any one or more of the core device (110), the edge devices (106, 108) or the network optimization device (103) to act as the host device (101).
- Some networks may also have additional devices that are appropriate for configuration as the predictive engine (102).
- some networks may also comprise one or more specialized devices that provide network management and performance enhancing features, such as a network optimization device (103).
- the network optimization device (103) could be configured to manage some or all of a network's available resources for processing, storage, and data traffic across the entire network, including both centralized locations and remote office/branch office (“ROBO”) locations, and to cause those resources to process and transfer data in a way that improves efficiency, reliability, and redundancy.
- the network optimization device (103) may be centrally located and configured to manage network performance across the network (100), or one or more network optimization devices (103) may be placed around data traffic paths within the network (100). For example, in some implementations a network optimization device (103) may be physically or virtually deployed to each remote location or remote branch of a network. In this manner, a network optimization device (103) may monitor and improve the efficiency of a section of the network between the core device (110) and a particular edge device or set of edge devices at that remote location. Examples of a network optimization device (103) could include various proprietary devices configured to optimize one or more aspects of network performance, and include devices offered by Riverbed Technology, Inc. such as the Riverbed SteelHead.
- the predictive engine (102) provides a predictive engine interface (104) that may be accessed and used by end users to configure various aspects of the predictive engine (102), view data relating to the predictive monitoring and maintenance of the network (100) devices, and view information relating to other characteristics of the network (100) devices.
- the predictive engine interface (104) may be provided as a website accessible by various web browsers and computing devices, as a software application configured to be installed on a mobile computing device, desktop computing device, or other computing device, or as both.
- One example of an implementation of the network (100) could include a brick and mortar retail enterprise that uses a mixture of a physical data room and cloud environments as the core device (110), and the edge device (106) is a store-level server at a location in California, and the edge device (108) is a store-level server at a location in Ohio.
- the predictive engine (102) may be installed on a network optimization device (103) in the physical data room, and may be accessed by technicians responsible for the predictive engine’s (102) performance via laptops or mobile phones configured with the predictive engine interface (104). In this implementation, the predictive engine (102) would predict and identify potential problems in the edge devices (106, 108) relating to the function of the network optimization device (103).
- Another example of an implementation of the network (100) could include a state government that primarily uses a cloud environment as the core device (110), and the edge device (106) is a server located in the state capitol building, and the edge device (108) is a server located in a detention facility.
- the predictive engine (102) may be installed as a virtual machine with access to the cloud environment, and may be accessed by technicians responsible for the predictive engine’s (102) performance via laptops or mobile phones configured with the predictive engine interface (104) via a web browser or other software application.
- other implementations and variations on the network (100), its components, and the configuration and location of predictive engine (102) and predictive engine interface (104) will be apparent to one of ordinary skill in the art in light of this disclosure.
- FIG. 2 shows a flowchart of an exemplary set of high-level steps (200) that could be performed to provide predictive analysis and preventative maintenance on a network (100).
- the steps of FIG. 2, as well as FIGS. 3-6, which describe one or more of the steps of FIG. 2 in more detail, could be performed by or with the predictive engine (102) and predictive engine interface (104), or by or with other computing devices and interfaces in various implementations. These steps include configuring (202) the predictive features and predictive maintenance features on the network (100).
- a component such as the predictive engine (102) may provide (204) predictive detection features associated with one or more devices present on the network (100). This could include gathering information on the core device (110) and the edge devices (106, 108) from the devices themselves, as well as other devices on the network that interact with those devices, and analyzing that data to predict and identify potential problems.
- a component such as the predictive engine (102) may execute (206) preventative maintenance associated with those problems. This could include automatically changing configurations of one or more devices of the network, causing devices to shut down, reboot, or update software, causing power supplies to cycle and force connected devices to hard-reboot, or other functions that may be configured to address particular problems.
- a component such as the predictive engine (102) may provide (208) notifications associated with those problems. This could include updating and displaying information via an interface such as the predictive engine interface (104), providing visual or audible alerts, or providing electronic communications on various platforms such as email, text, and phone.
- FIG. 3 is a flowchart of an exemplary set of steps (210) that could be performed to configure (202) a network component such as the predictive engine (102) to monitor and provide predictive detection for one or more devices on the network.
- the predictive engine (102) may be configured (212) on a device that is already present on the network (100) such as the core device (110), an edge device (106, 108), or a network optimization device (103) or other specialized device or component of the network (100) that is capable of providing sufficient processing, storage, and network traffic as required by the predictive engine (102). In practice, this could include downloading or otherwise receiving an application or other software and installing the software to the pre-existing device.
- the predictive engine (102) may also be configured (214) on a cloud device or environment that is newly added to the network (100) to provide scalable capabilities as required by the predictive engine (102). In practice, this could include downloading or otherwise receiving an application or other software and deploying or installing the software to the cloud device or environment. With the predictive engine (102) configured (212, 214), the predictive engine interface (104) may also be configured (216). This could include initially enabling access to the predictive engine interface (104) and providing information such as user account details, licensing details, and general network details in order to allow one or more users to access the predictive engine interface (104) via one or more user devices.
- While configuration (212, 214) of the predictive engine (102) may be performed in varying ways depending upon a particular network and implementation, it should be noted that in some implementations it may be as simple as installing a software application on a single device of the network (100) such as the host device (101), without requiring any major configuration changes or software to be installed on other devices of the network (100) such as the edge devices (106, 108) or core device (110). This aspect of the predictive engine (102) may be advantageous for networks where physical or remote access to devices to make significant configuration changes or install new software is costly, difficult, or otherwise undesirable.
- the predictive engine interface (104) may be used to configure (218) one or more devices on the network (100) that the predictive engine (102) will monitor. Configuration of devices could be performed semi-automatically by, for example, installing a beacon or other self-reporting application on those devices to cause them to broadcast their details to the predictive engine (102) or predictive engine interface (104) once they join the network (100), or by querying another device or database in order to identify and automatically populate data. Configuration of devices could also be performed manually by providing the required information via the predictive engine interface (104).
- Information provided when configuring a device may include, for example: a name or descriptor for the device; a physical address or location associated with the device; a serial number or device model for the device; network information such as IP address, MAC address, in-path IP address, in-path MAC address, hypervisor IP address, baseband management controller (“BMC”) IP address, subnet mask, and default gateway; personnel information such as contact name, phone, and email; and authentication information such as a username and password for remotely accessing the device.
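The configuration fields listed above could be grouped into a single device record. The following is a minimal sketch under stated assumptions: the class names, field names, the `credential_ref` indirection, and the `ping_targets` helper are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NetworkInfo:
    ip_address: str
    mac_address: str
    subnet_mask: str
    default_gateway: str
    # The remaining addresses only apply to some devices, so they are optional.
    in_path_ip: Optional[str] = None
    in_path_mac: Optional[str] = None
    hypervisor_ip: Optional[str] = None
    bmc_ip: Optional[str] = None


@dataclass
class Contact:
    name: str
    phone: str = ""
    email: str = ""


@dataclass
class MonitoredDevice:
    name: str
    location: str
    serial_number: str
    model: str
    network: NetworkInfo
    contacts: List[Contact] = field(default_factory=list)
    # Assumption of this sketch: credentials are looked up by reference
    # rather than stored inline with the record.
    credential_ref: Optional[str] = None

    def ping_targets(self) -> Dict[str, str]:
        """Addresses that could be probed, keyed by role; unset ones are omitted."""
        candidates = {
            "primary": self.network.ip_address,
            "in_path": self.network.in_path_ip,
            "gateway": self.network.default_gateway,
        }
        return {role: addr for role, addr in candidates.items() if addr}
```

A record like this could be populated either manually through the interface or from a self-reporting beacon, since both paths supply the same fields.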
- FIG. 4 shows a flowchart of an exemplary set of steps
- the network optimization device (103) may also be monitoring (400) the network (100) resources and managing (402) the network (100) resources in order to improve utilization of processing time and reduce the size and/or frequency of data transfers across the network (100), while the predictive engine (102) is performing predictive tasks associated with many of those same devices and data points in parallel.
- the predictive engine (102) is configured to regularly receive (228) device data from the configured devices of the network (100), and perform (230) predictive analysis of that data.
- Receiving (228) data from the devices may occur by the predictive engine (102) requesting such data from one or more devices or by one or more devices pushing such data to the predictive engine (102), either of which may occur based upon one or more of a set schedule or randomized schedule, user request via the predictive engine interface (104), utilization or availability of the predictive engine (102), or in response to environmental factors (e.g., detection of problems with devices or networks located in the same building or geographical area, major weather events or other emergencies in the area, prioritization based upon a customer request).
- Performing (230) predictive analysis on the data may include comparing the data to a set of predictive rules configured on the predictive engine (102) both as it arrives, in order to predict problems based upon that snapshot of data, as well as over time, in order to predict problems that may only be apparent based upon data collected from a device over hours, days, or even weeks.
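The two modes of analysis described above (checking each snapshot as it arrives, and checking data accumulated over time) can be sketched as follows. This is an illustrative sketch only: the `PredictiveRule` and `DeviceHistory` types, the `disk_used_pct` metric, and both example rules are hypothetical and do not come from the disclosure.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional

Sample = Dict[str, float]


@dataclass
class PredictiveRule:
    name: str
    # Evaluated against the newest sample only.
    snapshot_check: Optional[Callable[[Sample], bool]] = None
    # Evaluated against the retained history, oldest sample first.
    history_check: Optional[Callable[[List[Sample]], bool]] = None


@dataclass
class DeviceHistory:
    max_samples: int = 1000
    samples: Deque[Sample] = field(default_factory=deque)

    def add(self, sample: Sample) -> None:
        self.samples.append(sample)
        while len(self.samples) > self.max_samples:
            self.samples.popleft()  # drop the oldest sample


def evaluate(rules: List[PredictiveRule], history: DeviceHistory,
             sample: Sample) -> List[str]:
    """Record the sample, then return the names of all triggered rules."""
    history.add(sample)
    past = list(history.samples)
    triggered = []
    for rule in rules:
        if rule.snapshot_check and rule.snapshot_check(sample):
            triggered.append(rule.name)
        elif rule.history_check and rule.history_check(past):
            triggered.append(rule.name)
    return triggered


# Hypothetical rules: one fires on a single reading, one on a trend.
RULES = [
    PredictiveRule("disk_nearly_full",
                   snapshot_check=lambda s: s["disk_used_pct"] >= 95),
    PredictiveRule("disk_filling",
                   history_check=lambda h: len(h) >= 3
                   and h[-1]["disk_used_pct"] - h[-3]["disk_used_pct"] >= 10),
]
```

In this sketch the trend rule can fire well before the snapshot rule does, which is the point of evaluating data over hours or days rather than only on arrival.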
- predictive analysis is performed (230)
- one or more configured predictive rules may be triggered by a particular data set.
- Particular predictive rules are more appropriately discussed in the context of FIG.
- the predictive engine (102) may first determine (234) whether the scenario causing the alert is of a high priority. If the alert is not determined (234) to be of a high priority, the predictive engine (102) may only execute (240) a notification task related to the alert, which could include generating a notice of the alert via the predictive engine interface (104) or providing some other low priority or non-emergency electronic communication to one or more users.
- the predictive engine (102) may determine (236) if there are any preventative tasks associated with that type of alert, and then perform (238) the preventative task if there are. In either case, the predictive engine (102) may then execute (240) a notification task related to the high priority alert, which may include one or more urgent notifications beyond simply updating the predictive engine interface (104), such as directly calling, texting, or otherwise communicating with one or more support staff and providing a description of the high priority alert, and the results of an executed (238) preventative task, if any. While the types, classifications, preventative tasks, and notification tasks associated with a set of predictive rules in a particular implementation will vary greatly based upon the scope and characteristics of that implementation, some examples are provided below.
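The alert-handling flow described above (steps 234 through 240) can be sketched as a small dispatcher. This is a hedged illustration: the `Alert` and `AlertHandler` names, the message format, and the use of an in-memory log in place of real notification channels are all assumptions of the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Alert:
    rule: str
    device: str
    high_priority: bool


@dataclass
class AlertHandler:
    # Maps a rule name to its preventative task; the task returns a result summary.
    preventative_tasks: Dict[str, Callable[[Alert], str]] = field(default_factory=dict)
    # Stand-in for real channels (interface update, call, text, email).
    log: List[str] = field(default_factory=list)

    def notify(self, alert: Alert, urgent: bool, result: Optional[str] = None) -> None:
        channel = "urgent" if urgent else "interface"
        message = f"{channel}: {alert.rule} on {alert.device}"
        if result is not None:
            message += f" ({result})"
        self.log.append(message)

    def handle(self, alert: Alert) -> None:
        if not alert.high_priority:                      # determine (234)
            self.notify(alert, urgent=False)             # execute (240)
            return
        task = self.preventative_tasks.get(alert.rule)   # determine (236)
        result = task(alert) if task else None           # perform (238)
        self.notify(alert, urgent=True, result=result)   # execute (240)
```

Note that a high priority alert is always notified urgently, whether or not a preventative task exists for it; the task result is only appended when a task actually ran.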
- predictive rules may be associated with four categories of alerts: critical, major, minor, and informational.
- Each predictive rule may be associated with one or more preventative tasks, regardless of its priority (e.g., even informational alerts may be associated with preventative tasks), as well as one or more notification tasks, to be performed or executed when that predictive rule is triggered.
- a predictive rule may be triggered based upon recently received data, or aggregated data received over a period of time. In such an implementation, the predictive rules could be configured to provide varying levels of automated task performance depending upon the priority of the alert.
- critical alerts may be configured with preventative tasks intended to proactively address a network problem even at a significant resulting cost in network performance (e.g., automatically rebooting a critical device that results in a significant loss in network optimization), while major alerts may be configured with preventative tasks that allow for a minor loss in network performance (e.g., automatically reconfiguring a device or refreshing a connection that results in a short or minor loss in network optimization), and minor or informational alerts may only be configured with preventative tasks that can address a problem without any impact on the network optimization (e.g., disabling a single storage drive from a multi-drive mirrored storage array).
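One way to express this tiering is to give each alert category a ceiling on the network impact its preventative tasks may cause. The sketch below is illustrative only: the enum names, the task names, and the impact assigned to each task are assumptions drawn loosely from the examples in the text.

```python
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    INFORMATIONAL = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


class Impact(IntEnum):
    NONE = 0         # no effect on network optimization
    MINOR = 1        # short or minor loss
    SIGNIFICANT = 2  # significant loss, e.g. a reboot


# Highest impact a preventative task may have for each alert category.
MAX_IMPACT = {
    Severity.INFORMATIONAL: Impact.NONE,
    Severity.MINOR: Impact.NONE,
    Severity.MAJOR: Impact.MINOR,
    Severity.CRITICAL: Impact.SIGNIFICANT,
}

# Hypothetical task catalogue with an assumed impact per task.
TASK_IMPACT = {
    "disable_mirrored_drive": Impact.NONE,
    "refresh_connection": Impact.MINOR,
    "reboot_device": Impact.SIGNIFICANT,
}


def allowed_tasks(severity: Severity) -> List[str]:
    """Names of tasks whose impact does not exceed the ceiling for this severity."""
    ceiling = MAX_IMPACT[severity]
    return sorted(name for name, impact in TASK_IMPACT.items() if impact <= ceiling)
```

Under these assumptions, a critical alert may use any task in the catalogue, while a minor or informational alert is limited to tasks with no impact.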
- critical and major alerts may be more likely to be associated with preventative tasks that may have some cost in network functionality or optimization, as well as intrusive notification tasks such as direct electronic communication with personnel, while minor and informational alerts may be more likely to be associated with non-intrusive preventative tasks such as reconfiguring a device in a way that does not interrupt its normal performance, or notification tasks such as updating the predictive engine interface (104) or providing summarized results by email every 24 hours.
- an example of a critical alert may include a software licensing issue, whereby the predictive engine (102) may receive information from a device indicating that a software license or key on the device has expired, is missing, or is invalid, which may cause certain critical software functions of the device to fail. This could include, for example, the device failing to perform a software function for about 30-50% of a duration of time between 1 and 6 hours.
- a preventative task associated with this alert may include identifying and reporting license information, and querying a license management system and requesting a new license or reverting to a backup data set where the license may still be present or uncorrupted.
- Notification tasks associated with this alert may include notifying personnel responsible for enterprise license management, and, as with many tasks, notifying personnel responsible for the particular device in question.
- One critical alert may include the predictive engine (102) receiving information from a network optimization device (103) indicating that the network optimization device (103) is in a bypass mode, which causes it to halt some or all of its network optimization tasks, thereby reducing the overall efficiency and reliability of the network (100).
- This alert may trigger when the network optimization device (103) is in bypass mode about 30-50% of a duration of time between 1 and 6 hours.
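A trigger of this form can be read as a threshold on the share of a look-back window during which the condition was reported. The sketch below makes several assumptions: samples arrive at roughly even intervals (so the share of samples approximates the share of time), the threshold is a single configured value somewhere in the 30-50% range, and the window is a single configured value between 1 and 6 hours. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (timestamp, True if the device reported the condition at that time)
Sample = Tuple[datetime, bool]


def fraction_in_state(samples: List[Sample], now: datetime,
                      window: timedelta) -> float:
    """Share of samples inside the look-back window that reported the condition."""
    recent = [flag for ts, flag in samples if now - window <= ts <= now]
    if not recent:
        return 0.0  # no data in the window, so nothing to trigger on
    return sum(recent) / len(recent)


def should_trigger(samples: List[Sample], now: datetime,
                   window: timedelta = timedelta(hours=1),
                   threshold: float = 0.3) -> bool:
    """True when the condition held for at least `threshold` of the window."""
    return fraction_in_state(samples, now, window) >= threshold
```

The same check could serve the other alerts in this section that use the same style of trigger, with only the reported condition changing.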
- a preventative task associated with this alert may include de-activating the bypass mode, soft-rebooting the device, or hard-rebooting the device.
- a notification task associated with this alert may include checking the device status to determine if it is healthy, and indicating that the condition will resolve itself if the device is healthy, or indicating that one or more devices should be restarted.
- Another critical alert may include critically high uncommitted data.
- Uncommitted data is data that is stored on a device, such as the edge device (106, 108), that needs to be transmitted to another device, such as the core device (110), another edge device (106, 108) or a storage repository.
- Uncommitted data may be pre-processing (e.g., unmodified data that has been received by the edge device (106) and which the network optimization device (103) determines needs to be processed by the core device (110) or another edge device (108)), or post processing data (e.g., data that has been received by the edge device (106) and processed to compress it, encrypt it, reduce its overall size through paring irrelevant data or aggregating into a different format).
- edge devices (106) are generally not as powerful or scalable as core devices (110), and so relying on them for processing and storage tasks increases the risk that their modest capabilities will be overwhelmed, especially where uncommitted data remains on the device longer than necessary.
- a critical alert may be appropriate.
- An appropriate preventative task may be to allocate or connect to additional storage, to clear the storage of unnecessary caches and stored files, to prioritize committing the data, or to soft-reboot or hard-reboot the device to see if data begins to commit normally.
- Other preventative and notification tasks for this alert may include reporting whether the device is optimizing traffic, reporting whether the device is in bypass mode, reporting whether the bypass mode was intentionally set or occurred unintentionally, reporting whether a SF core is connected, automatically restarting an optimization service, and reporting whether restarting the optimization service addressed the issue.
- Another example of a critical alert is a Device Down alert, which indicates that a network optimization device (103) or other device is down or offline.
- This alert may trigger when a device on the network (100) does not respond to a ping on any interface, but a gateway associated with the device does respond.
- Preventative and notification tasks associated with this alert may include pinging the device on a primary interface, pinging a default gateway, pinging an in-path IP address, and reporting whether the device is not responding, or whether the gateway is not responding.
- Another example of a critical alert is an Interface Issue alert, which indicates that the primary interface of the device is unresponsive. This alert may trigger when the primary interface does not respond to a ping, but a response is received from an in-path ping and a gateway ping, and the primary interface continues to be unresponsive after being reset by the predictive engine (102). Preventative and notification tasks associated with this alert may include logging the alert, resetting the primary interface one or more times, and reporting the results of resetting the primary interface.
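The Device Down and Interface Issue triggers both depend on which ping targets respond, so they can be distinguished with a small decision function. The following is a sketch under stated assumptions: it models only three targets (primary interface, in-path address, and gateway), takes ping outcomes as booleans rather than sending real pings, and uses hypothetical result labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PingResults:
    primary: bool   # primary interface responded
    in_path: bool   # in-path IP address responded
    gateway: bool   # default gateway responded


def classify(pings: PingResults, primary_ok_after_reset: bool = False) -> str:
    """Map ping outcomes to a (hypothetical) alert label."""
    if not pings.gateway:
        # The gateway itself is silent, so the device cannot be blamed yet.
        return "gateway_unreachable"
    if pings.primary:
        return "ok"
    if not pings.in_path:
        # No interface responds, but the gateway does.
        return "device_down"
    # Only the primary interface is silent; the alert applies only if a
    # reset did not bring it back.
    return "recovered_after_reset" if primary_ok_after_reset else "interface_issue"
```

Checking the gateway first keeps a site-wide connectivity problem from being reported as a failure of every device behind that gateway.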
- Another example of a critical alert is a Secure Vault Locked alert, which indicates that a device’s secure vault is locked or has an error.
- This alert may be triggered when the device data indicates that a device has been unable to access the secure vault for between about 30 and 50% of a duration of time between about 1 and 6 hours.
- Preventative and notification tasks associated with this alert may include reporting that the secure vault is locked and providing instructions to unlock the vault.
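Many of the predictive rules in this section share the same shape: a condition must hold for some fraction (about 30 to 50%) of a look-back window (about 1 to 6 hours). The following is a minimal sketch of such a rule, assuming the device data arrives as evenly spaced `(timestamp, flag)` samples. The function names, the sample format, and the default values are assumptions chosen from within the stated ranges, not part of the disclosure.

```python
from datetime import datetime, timedelta


def condition_fraction(samples, now, window):
    """Fraction of samples inside the window whose flag is True."""
    start = now - window
    recent = [flag for ts, flag in samples if start <= ts <= now]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)


def rule_triggers(samples, now, window=timedelta(hours=1), min_fraction=0.30):
    """True when the condition held for at least min_fraction of the window."""
    return condition_fraction(samples, now, window) >= min_fraction


now = datetime(2019, 6, 11, 12, 0)
# One poll every 10 minutes over the last hour; the vault was locked in 3 of 7.
locked = [False, False, True, False, True, True, False]
samples = [(now - timedelta(minutes=10 * i), flag) for i, flag in enumerate(locked)]
print(rule_triggers(samples, now))
```

Counting samples only approximates elapsed time. With irregular polling, weighting each sample by the interval it covers would be more faithful to "a percentage of a duration of time".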
- Error alert which indicates that an error in an optimization service is causing it to stop running.
- This alert may trigger when an optimization service has failed to run for between about 30 and 50% of a duration of time between about 1 and 6 hours.
- Preventative and notification tasks associated with the alert may include automatically restarting the service, and reporting the status of the service before and after an automatic restart attempt.
- Other critical alerts and associated exemplary preventative tasks may include a deactivated logical unit number (“LUN”) alert (i.e., where one or more storage capabilities of the device are deactivated or unavailable) resulting in activation or reboot, or SSH authentication failure alert (i.e., credentials are being rejected when an SSH connection is attempted to the device) resulting in a configuration change to the device or the network optimization device (103).
- An example of a major alert is a Degraded Mode alert, which indicates that a device has one or more alarms activated causing it to be in a degraded state. This alert may trigger when the device is in a Degraded Mode for between about 30 and 50% of a period of time between about 1 and 6 hours. Preventative and notification tasks performed for this alert may include reporting the number and description of alarms on the device.
- Another example of a major alert is a Disk Error alert, which indicates that a device has a disk error. This alert may be triggered when the device reports disk errors for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting a disk error, reporting a disk state, and reporting any alarms from the device.
- Another example of a major alert is a High Uncommitted Data alert, which indicates that a higher than usual amount of uncommitted data is stored on the datastore of the device.
- This alert may be similar to the Critical Uncommitted Data Alert, but may trigger at between about 10 and 20% of storage being utilized by uncommitted data, and may have similar preventative and notification tasks.
- A Service Halt Error alert indicates that a service such as an optimization service has stopped running on the device. This alert may trigger when the device data indicates that a service on the device has not functioned for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include restarting the service, and determining if the service crashed but restarted itself.
- A Site Down alert indicates that a site where the device is located is unreachable. This alert may trigger when a gateway device associated with the device does not respond to ping attempts. Preventative and notification tasks associated with this alert may include reporting that the gateway or site is unreachable, and attempting to automatically power cycle the gateway.
- An Expiring Peering Certificate alert indicates that a device has an expiring Peering Certificate. This alert may be triggered when the device data indicates a Peering Certificate may expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
- Some examples of major alerts and associated exemplary preventative tasks include a no data reduction alert (i.e., device shows no data reduction due to no peers to connect to) resulting in a device or network switch reboot, SSH connection failure alert (i.e., system cannot reach the device to attempt SSH connection, indicating a possible firewall or IP address misconfiguration) resulting in a configuration change to the device or network, or an unused device alert (i.e., device is showing unusual data traffic over a period of time) resulting in a device reboot or a configuration change on the network optimization device (103).
- An example of a minor alert is a Datastore Corruption alert, which indicates that the Device has a corrupt datastore. This alert may trigger when the device reports a corrupt Datastore between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting that the Datastore is corrupt and providing instruction on clearing the Datastore, and reporting the current state of the Datastore.
- Another example of a minor alert is an Expiring CA Certificate alert, which indicates that a device has a certificate or authentication that will soon expire. This alert may be triggered when a device has a certificate or authentication that will expire within about thirty days or less. Preventative and notification tasks associated with this alert may include checking for expiring certificates, reporting expiring certificates and providing a copy of the certificate, and deleting the certificate so that it can be refreshed.
- Another example of a minor alert is an Expiring Mobile Trusts Certificate alert, which indicates that a device has an expiring Mobile Trust certificate. This alert may be triggered when the device data indicates that a device has a mobile trust certificate that will expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
- Another example of a minor alert is an Expiring Peering CA Certificate alert, which indicates that a device has an expiring Peering CA certificate. This alert may be triggered when the device data indicates that a Peering CA Certificate will expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
- Another example of a minor alert is an Expiring Peering Whitelist Certificate alert, which indicates that a device has an expiring Peering Whitelist Certificate. This alert may be triggered when the device data indicates a Peering Whitelist Certificate may expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
- Another example of a minor alert is an Expiring Server Certificate alert, which indicates that a device has an expiring Server Certificate. This alert may be triggered when the device data indicates a Server Certificate may expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
- Another example of a minor alert is an Expiring Server Chain Certificate alert, which indicates that a device has an expiring Server Chain Certificate. This alert may be triggered when the device data indicates a Server Chain Certificate may expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
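The certificate alerts above all apply the same test: a certificate is flagged when it will expire in about thirty days or less. This can be sketched as follows, assuming the device data has already been parsed into a mapping from certificate name to expiry time. The function name, the mapping format, and the certificate names are hypothetical.

```python
from datetime import datetime, timedelta


def expiring_certificates(certs, now, horizon=timedelta(days=30)):
    """Return the names of certificates that expire within the horizon.

    Certificates that have already expired are included as well, since
    they also satisfy "thirty days or less".
    """
    return sorted(name for name, not_after in certs.items()
                  if not_after - now <= horizon)


now = datetime(2019, 6, 11)
certs = {
    "peering": datetime(2019, 6, 30),        # 19 days away
    "server": datetime(2020, 1, 1),          # well outside the horizon
    "mobile-trust": datetime(2019, 7, 11),   # exactly 30 days away
}
print(expiring_certificates(certs, now))
```

The returned names could then feed the notification tasks described above, such as reporting each certificate and attaching a copy or description of it.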
- A Flash Error alert indicates that a device flash drive has become unresponsive. This alert may be triggered when the device data indicates a flash error for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include providing a notification of the error and instructions on resetting the device and flash drive.
- Another example of a minor alert is a Half Duplex Interface alert, which indicates that a device has an interface that is communicating at half duplex speed. This alert may trigger when the device data indicates that an interface is communicating at a reduced speed. Preventative and notification tasks associated with this alert may include reporting and identifying the interface and its status.
- Another example of a minor alert is a High Average System Load alert, which indicates that the device has exceeded a recommended CPU utilization.
- This alert may trigger when the device data for a device indicates that the CPU utilization for the device is above about 70 to 95% for a period of time between about 1 and 6 hours.
- Preventative and notification tasks associated with this alert may include reporting the CPU utilization, reporting processes that are using the CPU, identifying processes that may be closed or restarted, reporting any virtual machines utilizing CPU, reporting compression level settings and indicating whether those settings may impact CPU utilization.
- Another example of a minor alert is a High Paging Activity alert, which indicates that the device has run out of memory and is using a swap partition. This alert may trigger when the device data indicates a device has been using a swap partition for about 30 to 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting memory swap statistics, reporting the number of pages swapped, and providing an indicator of whether that is a higher than normal number.
- Another example of a minor alert is a High System Temperature alert, which indicates a device has exceeded a recommended temperature. This alert may trigger when the device data indicates a device has been above a recommended temperature for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting the current temperature of the device, and reporting the status of any fans or other cooling devices.
- Another example of a minor alert is a High CPU Utilization alert, which indicates that the device has exceeded a recommended CPU utilization.
- This alert may trigger when the device data for a device indicates that the CPU utilization for the device is above about 70 to 95% for a period of time between about 1 and 6 hours.
- Preventative and notification tasks associated with this alert may include reporting the CPU utilization, reporting processes that are using the CPU, identifying processes that may be closed or restarted, reporting any virtual machines utilizing CPU, reporting compression level settings and indicating whether those settings may impact CPU utilization.
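Unlike the fraction-of-window rules, the CPU rules require utilization to stay above a threshold (about 70 to 95%) for the whole period (about 1 to 6 hours). Below is a minimal sketch of that sustained-threshold check, assuming `(timestamp, percent)` samples. The function name, the sample format, and the default values are assumptions picked from within the stated ranges.

```python
from datetime import datetime, timedelta


def sustained_high_cpu(samples, now, window=timedelta(hours=1), threshold=70.0):
    """True when every sample in the window exceeds the threshold.

    Returns False when the history does not reach back to the start of
    the window, so a freshly monitored device cannot trigger the rule.
    """
    start = now - window
    if not samples or min(ts for ts, _ in samples) > start:
        return False
    recent = [pct for ts, pct in samples if start <= ts <= now]
    return bool(recent) and all(pct > threshold for pct in recent)


now = datetime(2019, 6, 11, 12, 0)
# One reading every 15 minutes, reaching back exactly one hour.
busy = [(now - timedelta(minutes=15 * i), 80.0) for i in range(5)]
print(sustained_high_cpu(busy, now))
```

A single reading at or below the threshold resets the rule in this sketch. An implementation that tolerates brief dips could combine this check with a fraction-of-window test.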
- Another example of a minor alert is an Interface Error alert, which indicates that a device has errors on a physical interface. This alert may trigger when the device data indicates a device has interface errors between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include checking the interface speed and duplex settings and reporting their status, providing instructions on configuring the interfaces and duplex settings properly.
- Another example of a minor alert is an Interface Link Lost alert, which indicates that a device has lost a link to a physical interface.
- This alert may trigger when the device data indicates a device has not communicated via a physical interface for between about 30 and 50% of a duration of time between about 1 and 6 hours.
- Preventative and notification tasks associated with this alert may include reporting instructions on physically verifying the interface link, reporting the identity and status of the interface, and reporting any recent hardware log entries associated with the device or interface.
- Another example of a minor alert is an IPMI Event alert, which indicates that a device has triggered an IPMI event. This alert may be triggered when the device data indicates that a device has been in an IPMI event state for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting any error log entries associated with the device, and reporting any alarms associated with the device.
- Another example of a minor alert is a Memory Error alert, which indicates that a device has errors in a memory module. This alert may be triggered when the device data indicates that a device memory module has been producing errors for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include identifying the memory and reporting its status.
- Another example of a minor alert is a No Data Reduction - No Peers alert, which indicates that a device is not achieving any data reduction, and has no peers. This alert may be triggered when the device data indicates that a device is not reducing data, and there are no connected peers. Preventative and notification tasks associated with this alert may include reporting that there are no peers connected to the device.
- Another example of a minor alert is a No Data Reduction - Peers Disconnected alert, which indicates that a device is not achieving any data reduction, and has peers configured but they are disconnected. This alert may be triggered when the device data indicates that a device is not reducing data, and there are configured peers that are all offline. Preventative and notification tasks associated with this alert may include reporting that there are no peers connected to the device, automatically restarting the optimization service, and reporting whether the restart was successful.
- Another example of a minor alert is a No Data Reduction - Peers Connected alert, which indicates that a device has no data reduction and has connected peers. This alert may trigger when the device data indicates that the data reduction for a device is zero, but that there are configured and connected peers that are currently online. Preventative and notification tasks associated with this alert may include restarting the optimization service, and if the issue is not corrected, reporting that data is not being reduced despite peers being connected and that other configurations may need to be manually changed.
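The three No Data Reduction alerts differ only in the state of the device's peers, so they can be sketched as one classification. The function name and inputs are hypothetical, and the sketch assumes that "no peers" means no peers are configured at all, which is one reading of the description above.

```python
def classify_no_data_reduction(reduction_pct, configured_peers, connected_peers):
    """Pick the No Data Reduction variant, or None if data is being reduced."""
    if reduction_pct > 0:
        return None
    if configured_peers == 0:
        # Assumption: "no peers" means none are configured on the device.
        return "No Data Reduction - No Peers"
    if connected_peers == 0:
        return "No Data Reduction - Peers Disconnected"
    return "No Data Reduction - Peers Connected"


print(classify_no_data_reduction(0, configured_peers=4, connected_peers=0))
```

Separating the variants matters because the follow-up differs: restarting the optimization service is described for the two variants with configured peers, while the No Peers variant is only reported.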
- Inaccessible alert which indicates that a device has experienced an error while creating a process dump.
- This alert may trigger when the device data indicates that a device has experienced errors while creating a process dump for between about 30 and 50% of a duration of time between about 1 and 6 hours.
- Preventative tasks and notifications associated with this alert may include providing instructions for clearing space on the device, reporting a list of dump files on the device, and reporting disk information for the device.
- Another example of a minor alert is a RAID Error alert, which indicates that a device has encountered RAID errors. This alert may trigger when the device data indicates that a device has experienced RAID errors for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting current RAID alarms, RAID failures, a RAID diagram, RAID array info, and RAID physical drive info.
- Another example of a minor alert is a Service Error alert, which indicates that a service such as an optimization service has stopped running on a device.
- This alert may trigger when the device data indicates that a device has had a service inactive for between about 30 and 50% of a duration of time between about 1 and 6 hours.
- Preventative and notification tasks associated with this alert may include automatically restarting the service, and checking if the service is currently running and indicating that it crashed but restarted automatically.
- Another example of a minor alert is an SMB Authentication Error alert, which indicates that a device is unable to authenticate a domain, and is unable to optimize CIF and SMB connections.
- This alert may trigger when the device data indicates that a device has been unable to authenticate a domain for between about 30 and 50% of a duration of time between about 1 and 6 hours.
- Preventative and notification tasks associated with this alert may include reporting on a domain status, providing instructions to rejoin a domain, and providing instruction to initially join a domain.
- Paused alert which indicates that an optimization service is paused. This alert may trigger when an optimization service has failed to run for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with the alert may include automatically restarting the service, and reporting the status of the service before and after an automatic restart attempt.
- A System Disk Full alert indicates that a device’s system disk partitions are full and cannot be written to. This alert may trigger when the device data indicates that a device system disk has been full for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting that the disk is full and providing instructions on clearing space, reporting a list of dump files on the device, and reporting disk information of the device.
- Another example of a minor alert is Admission Control, which indicates that a device has reached its license limitations for a number of optimized connections. New connections will not be optimized while the device is in Admission Control mode.
- This predictive rule may trigger when device data indicates that a device is in Admission Control mode for more than about 30-50% of a period of time between about 1 and 6 hours.
- Preventative and notification tasks associated with an Admission Control alert may include identifying protocols and reporting protocol percentages that are less than 6%, reporting protocols that are getting negative optimization and the extent of negative optimization, reporting a number of connections opened by a particular protocol, reporting that no protocols were identified that would benefit from passthrough, checking the current license on the device to determine if an upgraded license is available that would address the Admission Control mode, and reporting the possible upgrade.
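The protocol report for an Admission Control alert can be sketched as a filter over per-protocol statistics. The sketch below assumes that the "protocol percentages that are less than 6%" are per-protocol data reduction percentages, so that low-reduction protocols become candidates for passthrough. The disclosure does not state this explicitly, and the function name and input format are hypothetical.

```python
def passthrough_report(reduction_by_protocol, min_reduction_pct=6.0):
    """Split protocols into low-reduction and negative-reduction groups.

    reduction_by_protocol maps a protocol name to its data reduction
    percentage (an assumed input format). Returns a pair of dicts:
    (protocols below the cutoff, the subset with negative optimization).
    """
    low = {name: pct for name, pct in reduction_by_protocol.items()
           if pct < min_reduction_pct}
    negative = {name: pct for name, pct in low.items() if pct < 0}
    return low, negative


low, negative = passthrough_report({"HTTP": 42.0, "SSL": 2.5, "Video": -3.0})
if not low:
    print("No protocols were identified that would benefit from passthrough")
else:
    print("Passthrough candidates:", sorted(low))
    print("Negative optimization:", sorted(negative))
```

Passing such protocols through without optimization would free optimized connections for traffic that benefits more, which is one way to leave Admission Control mode without a license upgrade.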
- Some additional examples of minor alerts that may only trigger notification tasks or updates to the predictive engine interface (104), but which may also be associated with preventative tasks if desired, include Under-Optimized Port Traffic (i.e., Device is performing below thresholds on a particular port), and Unknown SSH Error (i.e., An unknown error occurred when attempting to create a SSH connection to the device).
- An example of an informational alert is Critical Mode, which indicates that an optimization service on the device has ceased functioning or is in bypass mode. This alert may trigger when a device is in critical mode between about 30 and 50% of a duration of time between 1 and 6 hours.
- Preventative and notification tasks associated with this alert may include determining whether the optimization service is currently running and reporting that status. Where the optimization service is not running, tasks may include indicating that the service is being restarted, automatically restarting the service, and then reporting whether the service restart was successful or not.
- An example of an informational alert is an Inactive Monitored VM alert, which indicates that a device has a virtual machine that is currently offline or powered off. This alert may trigger when a virtual machine on the device is marked as monitored and powered on by default, but is currently powered off. Preventative and notification tasks may include reporting the VM that is currently powered off.
- An example of an informational alert is an Internal Fan Error alert, which indicates a device has an error with one or more fans. This alert may be triggered when the device data indicates a fan on a device has failed to function normally for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include identifying and reporting the fan that is not functioning normally, providing instructions for checking and cleaning the fan, and reporting any new error log entries for the device.
- Some additional examples of informational alerts that may only trigger notification tasks or updates to the predictive engine interface (104), but which may also be associated with preventative tasks if desired, include Collection Protocol Syslog (i.e., agent can’t fetch the Syslogs from the device), and WAN0_0 Interface Out Errors (i.e., WAN0_0 interface on the device shows errors). Additional predictive rules, alert types, categorizations, and preventative tasks exist and will be apparent to one of ordinary skill in the art in light of this disclosure.
- FIG. 6 shows a flowchart of an exemplary set of steps that could be performed by the predictive engine (102) to provide notifications based on alerts produced during predictive detection.
- The specifics of the notification task will be determined (244) and the predictive engine interface (104) will be updated (246) to reflect the occurrence of the alert, so that a user of the predictive engine interface (104) may view information on the details and devices associated with the alert.
- It will also be determined (248) if the notification task is an active notification that requires some action beyond updating the predictive engine interface (246). This may be the case where a critical or major alert is configured to provide direct notifications as electronic communications to one or more support personnel.
- The predictive engine (102) will provide (25) the active notification based upon the configuration for that notification and predictive rule.
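The notification steps of FIG. 6 can be sketched as a short handler: determine the notification details (244), update the interface (246), decide whether the notification is active (248), and send it if so. The alert dictionary, the list standing in for the predictive engine interface, the callback, and the choice of which severities are active are all assumptions made for illustration.

```python
# Assumption: critical and major alerts are configured as active notifications.
ACTIVE_SEVERITIES = {"critical", "major"}


def handle_alert(alert, interface_alerts, send_notification):
    """Process one alert; return True when an active notification was sent."""
    # (244) Determine the specifics of the notification task.
    details = {
        "rule": alert["rule"],
        "device": alert["device"],
        "severity": alert["severity"],
    }
    # (246) Update the interface so users can view the alert.
    interface_alerts.append(details)
    # (248) Decide whether action beyond the interface update is needed.
    if alert["severity"] in ACTIVE_SEVERITIES:
        send_notification(details)
        return True
    return False


shown, sent = [], []
handle_alert({"rule": "Device Down", "device": "edge-106", "severity": "critical"},
             shown, sent.append)
handle_alert({"rule": "Half Duplex Interface", "device": "edge-108", "severity": "minor"},
             shown, sent.append)
print(len(shown), len(sent))
```

Every alert reaches the interface, while only the configured severities produce a direct communication to support personnel.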
- FIG. 7 shows a simulated screenshot of an exemplary monitor view (300) of an exemplary predictive engine interface such as the predictive engine interface (104).
- The monitor view (300) may be a view that is displayed to users by default when they access the predictive engine interface (104).
- The monitor view comprises a map pane (302), a data pane (304), a high priority alert pane (306), a low priority alert pane (308), an offline device indicator (310), and a new alert indicator (312).
- The map pane (302) uses information that is associated with the network (100) and the configured devices (e.g., core device (110), edge device (106), edge device (108)) to display a map (314) that is associated with the geographical area covered by the network (100).
- The map may be marked with indicators showing the location and status of devices that are associated with a normal operational status (316) that are being monitored by the predictive engine (102), and may also be marked to show the location and status of devices associated with a low priority alert (317) and devices associated with a high priority alert (318), by changing one or more of the color, texture, or graphical image of the devices to be differentiated.
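Choosing how to draw each device on the map reduces to finding the most urgent open alert for that device. The sketch below assumes each open alert has already been labelled "high" or "low" priority. The function name, the labels, and the color table are hypothetical, since the disclosure only says that color, texture, or graphical image may differ.

```python
# Hypothetical styling; the disclosure does not specify particular colors.
MARKER_COLORS = {"normal": "green", "low": "yellow", "high": "red"}


def marker_status(alert_priorities):
    """Return the map status for a device given its open alert priorities."""
    if "high" in alert_priorities:
        return "high"      # drawn like marker (318)
    if "low" in alert_priorities:
        return "low"       # drawn like marker (317)
    return "normal"        # drawn like marker (316)


status = marker_status(["low", "high", "low"])
print(status, MARKER_COLORS[status])
```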
- The data pane (304) may show data associated with the monitored devices of the network (100) in a variety of ways, including as a chart, graph, table, or other desirable data visualization.
- The data pane (304) shown in FIG. 7 displays a graph of the data reduction achieved over the network (100) during a period of time.
- Data reduction is one of the advantages provided by the efficient management of the network (100) by a network optimization device (103), and the data reduction percentage indicates the extent to which data was pre-processed at an edge device (106, 108) before transport across the network (100) at various times.
- The percentage of data reduction (324) was 50%. Supposing that all edge devices on the network (100) received or produced a total of 1TB of data during that period of time which ultimately needs to be committed to the core device (110), a 50% data reduction indicates that edge computing allowed the data to be reduced to 500GB before being committed and transported across the network (100).
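The arithmetic in this example can be checked with two small helpers. This is only a sketch: the function names are hypothetical, and 1TB is taken as 1000GB for simplicity.

```python
def data_reduction_pct(original, transported):
    """Percentage by which data shrank before crossing the network."""
    return 100.0 * (original - transported) / original


def transported_size(original, reduction_pct):
    """Amount of data left to transport after the given reduction."""
    return original * (1 - reduction_pct / 100.0)


# Sizes in GB: 1TB produced at the edge, 500GB actually transported.
print(data_reduction_pct(1000, 500))   # 50.0
print(transported_size(1000, 50))      # 500.0
```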
- The data pane (304) also shows a list (320) of the devices achieving the highest efficiency in data reduction, as well as a list (322) of the devices achieving the lowest efficiency in data reduction.
- The data pane (304) may show one or more types of data relating to the efficiency, performance, or health of the network (100). For example, one data pane may show a percentage of processor utilization of devices in the network (100), and yet another data pane may show a percentage of device storage that is being used on uncommitted data.
- The offline device indicator (310) and the new alert indicator (312) each show important information and are positioned to be visible to a user.
- The offline device indicator (310) indicates any devices in the network (100) that are normally monitored by the predictive engine (102), but which have been excepted from monitoring for some reason. Halting monitoring on a device may be useful where it is being manually maintained, upgraded, or replaced, in order to avoid generating false critical alerts for the device throughout that time period.
- The offline device indicator (310) serves as a reminder that some devices may be offline, and that monitoring of those devices should be re-enabled at a future time.
- The new alert indicator (312) shows alerts that have been generated and not yet viewed or addressed, and may be clicked on by a user to view additional information on each alert in a variety of summary views or detailed views.
- The predictive engine interface (104) may also allow users to review and search alert histories over varying periods of time, and could allow, for example, all alerts that occurred across the network (100) during a particular time period to be viewed in a list, or all alerts that occurred for a particular device during a particular time period, or other subsets of alerts as may be desired.
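Searching the alert history by time period, optionally narrowed to one device, can be sketched as a simple filter. The record type, its field names, and the device names are hypothetical. A real implementation would likely query a database rather than scan a list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AlertRecord:
    device: str
    rule: str
    raised_at: datetime


def search_alerts(history, start, end, device=None):
    """Alerts raised between start and end, optionally for one device."""
    return [a for a in history
            if start <= a.raised_at <= end
            and (device is None or a.device == device)]


history = [
    AlertRecord("edge-106", "Disk Error", datetime(2019, 6, 1, 9, 0)),
    AlertRecord("edge-108", "Device Down", datetime(2019, 6, 2, 14, 30)),
    AlertRecord("edge-106", "RAID Error", datetime(2019, 6, 20, 8, 15)),
]
first_week = search_alerts(history, datetime(2019, 6, 1), datetime(2019, 6, 7))
print([a.rule for a in first_week])
```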
- The predictive engine interface (104) may also offer additional tools and views that may be used to interact with and enhance the predictive maintenance and troubleshooting functionality provided by the system.
- FIG. 8 shows a simulated screenshot of an exemplary device view (328) of the predictive engine interface (104).
- The device view (328) comprises a set of interface features for selecting to view device images (330), selecting to view site install images (332), and selecting to view and select other command utilities (334).
- The device view (328) also displays a set of device images (336) for one or more of the devices that are monitored by the predictive engine (102), and may advantageously include simulated schematic diagrams of the devices.
- Schematic diagrams for devices may be desirable for providing remote support for devices, and may in some implementations be viewable via the predictive engine interface (104) by both centralized support personnel and personnel located proximate to the displayed device, to provide a common reference during troubleshooting.
- Providing the device image (336) as a schematic diagram avoids the need for exchanging photographs or verbal descriptions of the device, which can be an inefficient and error prone process.
- The device images (336) may be manually added at the time a device is configured (218), or may be automatically retrieved and added based upon a device serial number, model type, or other characteristic at the time the device is configured (218), in order to ensure the accuracy and availability of the schematic.
- The device images (336) provide information on the appearance, features, connections, and interfaces of the device, and may assist users of the predictive engine (102) in effectively providing troubleshooting or support of a remotely located device that they are not directly viewing.
- FIG. 9 shows a simulated screenshot of an exemplary site view (342).
- The site view (342) displays one or more images of the devices or the site at which they are installed, which may be captured by a technician at the site and provided to the predictive engine interface (104). While the device images (336) are a useful tool for providing remote troubleshooting, actual images of the site and device may also be useful.
- Site images may advantageously include a serial number image (338) showing the serial number of the device, a connection image (340) showing the cables and cords that are currently connected to the device, an install location image showing the rack, room, closet, or other area where the device is installed, and other images.
- The predictive engine interface (104) may require that one or more site images such as those described above be uploaded when the device is configured, or may produce notifications via the predictive engine interface (104) until such images are uploaded, in order to increase the likelihood that configured devices will be accurately and fully represented in the site view (342).
- The predictive engine (102) may also perform image analysis on uploaded pictures to extract and verify serial numbers or model numbers, verify installation of cables and connections, or verify that an uploaded image more likely than not actually displays the required image (i.e., and is not a blank image uploaded simply to bypass the requirement).
- Other functionality available through the predictive engine interface (104) may include various features and functions available by selecting a command utilities (334) button.
- Such functionality may include tools for pinging site locations and devices to determine their status manually, testing interfaces to determine if primary and in-path interfaces are operating, restarting the command line interface if that interface becomes non-responsive, restarting a client web server if a device UI becomes non-responsive, and entering a command mode where commands can be typed, transmitted, and executed on remote monitored devices.
- “configured” should be understood to mean that the thing “configured” is adapted, designed or modified for a specific purpose.
- An example of “configuring” in the context of computers is to provide a computer with specific data (which may include instructions) which can be used in performing the specific acts the computer is being “configured” to do. For example, installing Microsoft® WORD on a computer “configures” that computer to function as a word processor, which it does by using the instructions for Microsoft WORD in combination with other inputs, such as an operating system, and various peripherals (e.g., a keyboard, monitor, etc.).
- “determining” should be understood to refer to generating, selecting, defining, calculating or otherwise specifying something. For example, to obtain an output as the result of analysis would be an example of “determining” that output. As a second example, to choose a response from a list of possible responses would be a method of “determining” a response. As a third example, to identify data received from an external source (e.g., a microphone) as being a thing would be an example of “determining” the thing.
- a “set” should be understood to refer to a collection containing zero or more objects of the type that it refers to. So, for example, a “set of integers” describes an object configured to contain an integer value, which includes an object that contains multiple integer values, an object that contains only a single integer value, and an object that contains no integer value whatsoever.
- a “means for determining the existence of an anomaly caused by network device behavior” should be understood as a means plus function limitation as provided for in 35 U.S.C. § 112(f) where the function is “determining the existence of an anomaly caused by network device behavior” and the corresponding structure is a computer configured as described in FIGS. 4-5 and their associated discussion, as well as paragraphs [0035]-[0089].
Abstract
A predictive engine and interface may be configured on a newly added device, pre-existing device, or cloud device within a complex enterprise network. The predictive engine gathers data from devices on the network and applies predictive rules and analytics in order to identify early and current problems with those devices and generate alerts that may be used to predict potential later issues. Based on alerts, the predictive engine may perform automated preventative maintenance and notification tasks. The predictive engine interface may be accessed by users in order to configure devices and rules, view alerts, view efficiency metrics for the network such as data reduction, CPU usage, and available storage, and view information, schematics, and images of devices and installation sites in order to aid with remote troubleshooting.
Description
SYSTEM AND METHOD FOR PREDICTIVE MAINTENANCE OF NETWORKED DEVICES
FIELD
[0001] The disclosed technology pertains to a system for predicting and preventing or mitigating errors within a network of devices.
BACKGROUND
[0002] As the concept and technology for networking devices and creating networked infrastructures has evolved and expanded, so too has the necessity for sophisticated solutions for maintenance and monitoring of those networks. In the past, it would not have been uncommon for a large company or organization to have every computer, server, data storage device, or other infrastructure component housed in the same building, and communicating through communication hubs and bridges located in a single room of that building.
[0003] Since then, modern networks have grown in complexity and scale such that even very small companies maintain cloud computing and cloud storage systems that may be distributed around the globe, and it would be a routine occurrence for sophisticated and successful technology companies to not have physical possession of a single component of the infrastructure that provides their software and services to customers.
[0004] As cloud environments have been improved and optimized for efficiency and access, the desire for further gains in performance, reliability, and efficiency has led to interest and development in additional networking techniques. With overall gains in network bandwidth and reliability, and continuing improvements in the performance-to-cost ratio of so-called “edge” devices, there has been a growing emphasis on distributing tasks in order to take advantage of unutilized processing power across and at the edge of networks, rather than relying primarily or solely on core computing within the cloud.
[0005] Edge computing concepts can be implemented within a network to decentralize processing for some tasks by moving those tasks to one or more computers that serve as entry points for data to the network, rather than having them simply receive the data and provide it to a central node to be processed. For example, a grocery store or other retail environment may have several servers within a brick and mortar location that gather data related to transactions occurring at that brick and mortar location. In many conventional networks, this data might be gathered and stored throughout the day on local servers with minimal manipulation, and then transported to a central node such as a cloud computing environment or a data room based upon some schedule during the night. Processing tasks at the central node may include more demanding and sophisticated tasks such as analytics, compression, encryption, and machine learning.
[0006] An example of an inefficiency in this process is that the local servers may only use 10-20% of their processing potential on basic data transport and storage tasks throughout the day, with unused processing potential being wasted. Another example is a heightened and intermittent bandwidth requirement between the central node and the brick and mortar location. Another example is a delay in sophisticated processing tasks such as supply chain forecasting and analytics, which could result in slow reaction to increasing demand and resulting shortages of products or services. These inefficiencies may seem manageable or even trivial at a very small scale, but when multiplied across large and even moderately sized networks, whether a grocery chain with several thousand brick and mortar locations, or a state or local government having ten to twenty separate locations providing different services, the loss in efficiency can become critical.
[0007] While there are numerous advantages provided by edge computing, it further amplifies the problem of preventing and addressing infrastructure errors. A technician located at a central node may now be responsible for and reliant on the proper operation of devices at that node, cloud computing devices scattered across the globe, and hundreds or thousands of edge computing devices scattered across their company’s geographical footprint. Detecting problems within these complex networks is often a reactive process, and so problems are frequently not identified until a nightly batch of data fails to arrive. Further compounding this initial delay is the difficulty of a troubleshooting process that may now include phone calls to cloud computing vendor support lines, and phone calls to non-technical personnel at brick and mortar locations, in order to describe, identify, and address problems.
[0008] What is needed, therefore, is an improved system for providing predictive maintenance of complex networks.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The drawings and detailed description that follow are intended to be merely illustrative and are not intended to limit the scope of the invention as contemplated by the inventors.
[0010] FIG. 1 is a schematic diagram of an exemplary system configured to perform predictive analysis and preventative maintenance on a network;
[0011] FIG. 2 is a flowchart of an exemplary set of high-level steps that could be performed to provide predictive analysis and preventative maintenance on the network;
[0012] FIG. 3 is a flowchart of an exemplary set of steps that could be performed to configure a network component to monitor one or more devices on the network;
[0013] FIG. 4 is a flowchart of an exemplary set of steps that could be performed to monitor the network for predictive data;
[0014] FIG. 5 is a flowchart of an exemplary set of steps that could be performed to address alerts produced during predictive detection;
[0015] FIG. 6 is a flowchart of an exemplary set of steps that could be performed to provide notifications based on alerts produced during predictive detection;
[0016] FIG. 7 is a simulated screenshot of an exemplary monitor view of an exemplary predictive engine interface;
[0017] FIG. 8 is a simulated screenshot of an exemplary device view of the exemplary predictive engine interface; and
[0018] FIG. 9 is a simulated screenshot of an exemplary site view of the exemplary predictive engine interface.
DETAILED DESCRIPTION
[0019] The inventors have conceived of novel technology that, for the purpose of illustration, is disclosed herein as applied in the context of proactive and predictive maintenance of a network of devices. While the disclosed applications of the inventors’ technology satisfy a long-felt but unmet need in the art of predictive maintenance of a network of devices, it should be understood that the inventors’ technology is not limited to being implemented in the precise manners set forth herein, but could be implemented in other manners without undue experimentation by those of ordinary skill in the art in light of this disclosure. Accordingly, the examples set forth herein should be understood as being illustrative only, and should not be treated as limiting.
[0020] Turning now to the figures, FIG. 1 shows a schematic diagram of an exemplary system configured to perform predictive analysis and preventative maintenance on a network (100) of devices. While the devices present in a particular network will vary greatly, FIG. 1 shows the network (100) comprising an edge device (106), an edge device (108), a core device (110), and a predictive engine (102). The core device (110) may be, for example, one or more cloud hosted virtual machines, a centralized data and processing room with one or more servers, or another computing environment that may offer high performance or scalability for processing and storage. The edge device (106) and the edge device (108) may be, for example, a tower server, rack server, point of sale server, or other computing device that may be located remotely from the core device (110), for example, at the edge of the network (100).
[0021] The predictive engine (102) may be a device configured to perform tasks associated with predicting and preventing or mitigating issues present in the network (100). The predictive engine (102) may be configured and installed on a host device (101) connected to the network (100). The host device (101) may be a dedicated physical or virtual device that is placed on the network (100) to provide the predictive engine (102), or may be a device already present on the network for another purpose. This could include, for example, configuring any one or more of the core device (110), the edge devices (106, 108) or the network optimization device (103) to act as the host device (101).
[0022] Some networks may also have additional devices that are appropriate for configuration as the predictive engine (102). For example, some networks may also comprise one or more specialized devices that provide network management and performance enhancing features, such as a network optimization device (103). The network optimization device (103) could be configured to manage some or all of a network’s available resources for processing, storage, and data traffic, across the entire network including both centralized locations and remote office/branch office (“ROBO”) locations, and to cause those resources to process and transfer data in a way that improves efficiency, reliability, and redundancy.
[0023] The network optimization device (103) may be centrally located and configured to manage network performance across the network (100), or one or more network optimization devices (103) may be placed around data traffic paths within the network (100). For example, in some implementations a network optimization device (103) may be physically or virtually deployed to each remote location or remote branch of a network. In this manner, a network optimization device (103) may monitor and improve the efficiency of a section of the network between the core device (110) and a particular edge device or set of edge devices at that remote location. Examples of a network optimization device (103) could include various proprietary devices configured to optimize one or more aspects of network performance, and include devices offered by Riverbed Technology, Inc. such as the Riverbed SteelHead.
[0024] In addition to the predictive maintenance tasks and functions, the predictive engine (102) provides a predictive engine interface (104) that may be accessed and used by end users to configure various aspects of the predictive engine (102), view data relating to the predictive monitoring and maintenance of the network (100) devices, and view information relating to other characteristics of the network (100) devices. In various implementations the predictive engine interface (104) may be provided as one or more of a website accessible by various web browsers and computing devices, or a software application configured to be installed on a mobile computing device, desktop computing device, or other computing device.
[0025] One example of an implementation of the network (100) could include a brick and mortar retail enterprise that uses a mixture of a physical data room and cloud environments as the core device (110), and the edge device (106) is a store-level server at a location in California, and the edge device (108) is a store-level server at a location in Ohio. The predictive engine (102) may be installed on a network optimization device (103) in the physical data room, and may be accessed by technicians responsible for the predictive engine’s (102) performance via laptops or mobile phones configured with the predictive engine interface (104). In this implementation, the predictive engine (102) would predict and identify potential problems in the edge devices (106, 108) relating to the function of the network optimization device (103).
[0026] Another example of an implementation of the network (100) could include a state government that primarily uses a cloud environment as the core device (110), and the edge device (106) is a server located in the state capitol building, and the edge device (108) is a server located in a detention facility. The predictive engine (102) may be installed as a virtual machine with access to the cloud environment, and may be accessed by technicians responsible for the predictive engine’s (102) performance via laptops or mobile phones configured with the predictive engine interface (104) via a web browser or other software application. In addition to the above examples, other implementations and variations on the network (100), its components, and the configuration and location of predictive engine (102) and predictive engine interface (104) will be apparent to one of ordinary skill in the art in light of this disclosure.
[0027] Turning now to FIG. 2, that figure shows a flowchart of an exemplary set of high-level steps (200) that could be performed to provide predictive analysis and preventative maintenance on a network (100). The steps of FIG. 2, as well as FIGS. 3-6, which describe one or more of the steps of FIG. 2 in more detail, could be performed by or with the predictive engine (102) and predictive engine interface (104), or by or with other computing devices and interfaces in various implementations. These steps include configuring (202) the predictive features and predictive maintenance features on the network (100). This could include configuring the predictive engine (102) itself to join the network (100) and execute tasks on the network (100), configuring the predictive engine (102) to recognize and communicate with other devices on the network including the core device (110) and the edge devices (106, 108), and configuring the predictive engine interface (104) to be accessible to one or more users.
[0028] Once the network (100) is configured (202) with the predictive features (e.g., by configuration on one or more devices of the network (100)), a component such as the predictive engine (102) may provide (204) predictive detection features associated with one or more devices present on the network (100). This could include gathering information on the core device (110) and the edge devices (106, 108) from the devices themselves, as well as other devices on the network that interact with those devices, and analyzing that data to predict and identify potential problems.
[0029] As problems are identified by the provided (204) predictive detection, a component such as the predictive engine (102) may execute (206) preventative maintenance associated with those problems. This could include automatically changing configurations of one or more devices of the network, causing devices to shut down, reboot, or update software, causing power supplies to cycle and force connected devices to hard-reboot, or other functions that may be configured to address particular problems.
[0030] As problems are identified by the provided (204) predictive detection, and preventative maintenance is executed (206) for those problems, a component such as the predictive engine (102) may provide (208) notifications associated with those problems. This could include updating and displaying information via an interface such as the predictive engine interface (104), providing visual or audible alerts, or providing electronic communications on various platforms such as email, text, and phone.
[0031] FIG. 3 is a flowchart of an exemplary set of steps (210) that could be performed to configure (202) a network component such as the predictive engine (102) to monitor and provide predictive detection for one or more devices on the network. The predictive engine (102) may be configured (212) on a device that is already present on the network (100) such as the core device (110), an edge device (106, 108), or a network optimization device (103) or other specialized device or component of the network (100) that is capable of providing sufficient processing, storage, and network traffic as required by the predictive engine (102). In practice, this could include downloading or otherwise receiving an application or other software and installing the software to the pre-existing device.
[0032] The predictive engine (102) may also be configured (214) on a cloud device or environment that is newly added to the network (100) to provide scalable capabilities as required by the predictive engine (102). In practice, this could include downloading or otherwise receiving an application or other software and deploying or installing the software to the cloud device or environment. With the predictive engine (102) configured (212, 214), the predictive engine interface (104) may also be configured (216). This could include initially enabling access to the predictive engine interface (104) and providing information such as user account details, licensing details, and general network details in order to allow one or more users to access the predictive engine interface (104) via one or more user devices.
[0033] While configuration (212, 214) of the predictive engine (102) may be performed in varying ways depending upon a particular network and implementation, it should be noted that in some implementations it may be as simple as installing a software application on a single device of the network (100) such as the host device (101), without requiring any major configuration changes or software to be installed on other devices of the network (100) such as the edge devices (106, 108) or core device (110). This aspect of the predictive engine (102) may be advantageous for networks where physical or remote access to devices to make significant configuration changes or install new software is costly, difficult, or otherwise undesirable.
[0034] Once configured (216) and accessible, the predictive engine interface (104) may be used to configure (218) one or more devices on the network (100) that the predictive engine (102) will monitor. Configuration of devices could be performed semi-automatically by, for example, installing a beacon or other self-reporting application on those devices to cause them to broadcast their details to the predictive engine (102) or predictive engine interface (104) once they join the network (100), or by querying another device or database in order to identify and automatically populate data. Configuration of devices could also be performed manually by providing the required information via the predictive engine interface (104). Information provided when configuring a device may include, for example, a name or descriptor for the device, a physical address or location associated with the device, a serial number or device model for the device, network information such as IP address, MAC address, in-path IP address, in-path MAC address, hypervisor IP address, baseboard management controller (“BMC”) IP address, subnet mask, and default gateway, personnel information such as contact name, phone, and email, and authentication information such as a username and password for remotely accessing the device.
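By way of illustration only, the device information listed above could be held in a simple record such as the following Python sketch. The class name `MonitoredDevice`, its field names, and the `invalid_address_fields` helper are hypothetical and do not appear in this disclosure; authentication information is deliberately left out of the sketch, and the in-path MAC address is omitted for brevity.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MonitoredDevice:
    """Hypothetical record for one device configured (218) for monitoring."""
    name: str
    location: str
    serial_number: str
    model: str
    ip_address: str
    mac_address: str
    subnet_mask: str
    default_gateway: str
    in_path_ip_address: Optional[str] = None
    hypervisor_ip_address: Optional[str] = None
    bmc_ip_address: Optional[str] = None
    contact_name: str = ""
    contact_phone: str = ""
    contact_email: str = ""

    def invalid_address_fields(self) -> List[str]:
        """Return the names of populated address fields that do not parse as IP addresses."""
        fields = ("ip_address", "default_gateway", "in_path_ip_address",
                  "hypervisor_ip_address", "bmc_ip_address")
        bad = []
        for field_name in fields:
            value = getattr(self, field_name)
            if value is None:
                continue  # optional field left unset
            try:
                ipaddress.ip_address(value)
            except ValueError:
                bad.append(field_name)
        return bad
```

A check of this kind could be run when a device is entered manually, so that a mistyped address is caught before the device is monitored.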
[0035] Turning now to FIG. 4, that figure shows a flowchart of an exemplary set of steps (220) that could be performed by the predictive engine (102) to monitor the network (100) for predictive data and provide (204) one or more predictive detection features. Portions shown in dashed lines may be performed in parallel by a separate device or component of the network (100), such as a network optimization device (103) that is configured to control the flow of data and processing tasks across the network to improve efficiency and reliability of data processing. While not required for the predictive engine (102), the presence of such a network optimization device (103) on the network (100) may provide additional sources of data that may be used by the predictive engine (102).
[0036] It should be understood that in some implementations the network optimization device (103) may also be monitoring (400) the network (100) resources and managing (402) the network (100) resources in order to improve utilization of processing time and reduce the size and/or frequency of data transfers across the network (100), while the predictive engine (102) is performing predictive tasks associated with many of those same devices and data points in parallel. In either case, the predictive engine (102) is configured to regularly receive (228) device data from the configured devices of the network (100), and perform (230) predictive analysis of that data. Receiving (228) data from the devices may occur by the predictive engine (102) requesting such data from one or more devices or by one or more devices pushing such data to the predictive engine (102), either of which may occur based upon one or more of a set schedule or randomized schedule, user request via the predictive engine interface (104), utilization or availability of the predictive engine (102), or in response to environmental factors (e.g., detection of problems with devices or networks located in the same building or geographical area, major weather events or other emergencies in the area, prioritization based upon a customer request).
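As one possible sketch of the set or randomized schedule mentioned above, the delay before the next request for device data could be computed as a base interval plus a bounded random offset. The function name `next_poll_delay` and the `jitter_fraction` parameter are assumptions made for illustration and are not part of this disclosure; a jitter of zero yields a fixed schedule.

```python
import random


def next_poll_delay(base_seconds: float, jitter_fraction: float,
                    rng: random.Random) -> float:
    """Return the delay before the next poll, in seconds.

    The result lies within base_seconds * (1 +/- jitter_fraction).
    """
    if base_seconds <= 0:
        raise ValueError("base_seconds must be positive")
    if not 0.0 <= jitter_fraction < 1.0:
        raise ValueError("jitter_fraction must be in [0, 1)")
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)
```

Randomizing the interval in this way is one means of keeping many monitored devices from being polled at the same instant.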
[0037] Performing (230) predictive analysis on the data may include comparing the data to a set of predictive rules configured on the predictive engine (102) both as it arrives, in order to predict problems based upon that snapshot of data, as well as over time, in order to predict problems that may only be apparent based upon data collected from a device over hours, days, or even weeks. As predictive analysis is performed (230), one or more configured predictive rules may be triggered by a particular data set.
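The distinction between analysis of a snapshot and analysis over time could be sketched as follows, assuming each predictive rule carries one check applied to the newest sample and one check applied to the accumulated history. The names `PredictiveRule`, `evaluate`, and `strictly_rising`, and the metric keys used in any rule, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Sample = Dict[str, float]


@dataclass
class PredictiveRule:
    name: str
    # Applied to the newest sample only (the "snapshot").
    snapshot_check: Callable[[Sample], bool]
    # Applied to the full history, oldest first, including the newest sample.
    history_check: Callable[[List[Sample]], bool]


def strictly_rising(history: List[Sample], key: str, count: int) -> bool:
    """True if the last `count` samples of `key` are strictly increasing."""
    recent = [sample[key] for sample in history[-count:]]
    return len(recent) == count and all(a < b for a, b in zip(recent, recent[1:]))


def evaluate(rules: List[PredictiveRule], history: List[Sample],
             new_sample: Sample) -> List[str]:
    """Append the new sample to the history and return names of triggered rules."""
    history.append(new_sample)
    return [rule.name for rule in rules
            if rule.snapshot_check(new_sample) or rule.history_check(history)]
```

In this sketch a rule that only examines trends simply supplies a snapshot check that always returns false, and vice versa.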
[0038] Particular predictive rules are more appropriately discussed in the context of FIG. 5, which shows a flowchart of an exemplary set of steps that could be performed to identify and address alerts that are triggered when predictive detection is performed (230). As predictive rules are satisfied or otherwise triggered by the contents of a particular data set, the predictive engine (102) may first determine (234) whether the scenario causing the alert is of a high priority. If the alert is not determined (234) to be of a high priority, the predictive engine (102) may only execute (240) a notification task related to the alert, which could include generating a notice of the alert via the predictive engine interface (104) or providing some other low priority or non-emergency electronic communication to one or more users.
[0039] If the alert is determined (234) to be high priority, the predictive engine (102) may determine (236) if there are any preventative tasks associated with that type of alert, and then perform (238) the preventative task if there are. In either case, the predictive engine (102) may then execute (240) a notification task related to the high priority alert, which may include one or more urgent notifications beyond simply updating the predictive engine interface (104), such as directly calling, texting, or otherwise communicating with one or more support staff and providing a description of the high priority alert, and the results of an executed (238) preventative task, if any. While the types, classifications, preventative tasks, and notification tasks associated with a set of predictive rules in a particular implementation will vary greatly based upon the scope and characteristics of that implementation, some examples are provided below.
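The branching of FIG. 5 described above could be sketched, under stated assumptions, as a single function. The name `handle_alert`, the dictionary of preventative tasks keyed by alert type, and the list used to collect notifications are all hypothetical stand-ins; comments map each branch to the corresponding step numeral.

```python
from typing import Callable, Dict, List, Optional


def handle_alert(alert_type: str, high_priority: bool,
                 preventative_tasks: Dict[str, Callable[[], str]],
                 notifications: List[str]) -> Optional[str]:
    """Address one alert and return the preventative task result, if any."""
    result = None
    if high_priority:                                  # determine (234)
        task = preventative_tasks.get(alert_type)      # determine (236)
        if task is not None:
            result = task()                            # perform (238)
        # Urgent notification including the task result, execute (240).
        notifications.append(
            f"URGENT {alert_type}: preventative result={result}")
    else:
        # Low priority: notification only, execute (240).
        notifications.append(f"notice {alert_type}")
    return result
```

Note that in this sketch a low priority alert never runs a preventative task, which matches the flow of FIG. 5 but not necessarily other implementations described herein.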
[0040] It should be understood that the steps of FIG. 5 represent one example of a method for addressing alerts from predictive rules, but other examples exist. In some exemplary implementations, predictive rules may be associated with four categories of alerts: critical, major, minor, and informational. Each predictive rule may be associated with one or more preventative tasks, regardless of its priority (e.g., even informational alerts may be associated with preventative tasks), as well as one or more notification tasks, to be performed or executed when that predictive rule is triggered. A predictive rule may be triggered based upon recently received data, or aggregated data received over a period of time. In such an implementation, the predictive rules could be configured to provide varying levels of automated task performance depending upon the priority of the alert. As an example, critical alerts may be configured with preventative tasks intended to proactively address a network problem even at a significant resulting cost in network performance (e.g., automatically rebooting a critical device that results in a significant loss in network optimization), while major alerts may be configured with preventative tasks that allow for a minor loss in network performance (e.g., automatically reconfiguring a device or refreshing a connection that results in a short or minor loss in network optimization), and minor or informational alerts may only be configured with preventative tasks that can address a problem without any impact on the network optimization (e.g., disabling a single storage drive from a multi-drive mirrored storage array).
[0041] As can be seen in the above example, critical and major alerts may be more likely to be associated with preventative tasks that may have some cost in network functionality or optimization, as well as intrusive notification tasks such as direct electronic communication with personnel, while minor and informational alerts may be more likely to be associated with non-intrusive preventative tasks such as reconfiguring a device in a way that does not interrupt its normal performance, or notification tasks such as updating the predictive engine interface (104) or providing summarized results by email every 24 hours. Other variations exist on categorizing and configuring predictive rules and alerts, and such variations will be apparent to one of ordinary skill in the art in light of this disclosure.
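One way to express the relationship between alert category and tolerated cost in network performance is a lookup from severity to a maximum permitted impact. The enumerations `Severity` and `Impact`, the `MAX_IMPACT` table, and the `allowed_tasks` function below are illustrative assumptions that follow the example categories described above.

```python
from enum import IntEnum
from typing import Dict, List


class Severity(IntEnum):
    INFORMATIONAL = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


class Impact(IntEnum):
    NONE = 0          # no impact on network optimization
    MINOR = 1         # short or minor loss in network optimization
    SIGNIFICANT = 2   # significant loss in network optimization


# Highest impact a preventative task may have for each alert category.
MAX_IMPACT = {
    Severity.INFORMATIONAL: Impact.NONE,
    Severity.MINOR: Impact.NONE,
    Severity.MAJOR: Impact.MINOR,
    Severity.CRITICAL: Impact.SIGNIFICANT,
}


def allowed_tasks(severity: Severity, candidates: Dict[str, Impact]) -> List[str]:
    """Return, in alphabetical order, the candidate tasks permitted for a severity."""
    limit = MAX_IMPACT[severity]
    return sorted(name for name, impact in candidates.items() if impact <= limit)
```

Keeping the limits in a table rather than in branching logic would allow a particular implementation to adjust them without changing the selection code.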
[0042] In this implementation, an example of a critical alert may include a software licensing issue, whereby the predictive engine (102) may receive information from a device indicating that a software license or key on the device has expired, is missing, or is invalid, which may cause certain critical software functions of the device to fail. This could include, for example, the device failing to perform a software function for about 30-50% of a duration of time between 1 and 6 hours. A preventative task associated with this alert may include identifying and reporting license information, and querying a license management system and requesting a new license or reverting to a backup data set where the license may still be present or uncorrupted. Notification tasks associated with this alert may include notifying personnel responsible for enterprise license management, and, as with many tasks, notifying personnel responsible for the particular device in question.
[0043] One critical alert may include the predictive engine (102) receiving information from a network optimization device (103) indicating that the network optimization device (103) is in a bypass mode, which causes it to halt some or all of its network optimization tasks, thereby reducing the overall efficiency and reliability of the network (100). This alert may trigger when the network optimization device (103) is in bypass mode about 30-50% of a duration of time between 1 and 6 hours. A preventative task associated with this alert may include de-activating the bypass mode, soft-rebooting the device, or hard-rebooting the device. A notification task associated with this alert may include checking the device status to determine if it is healthy, and indicating that the condition will resolve itself if the device is healthy, or indicating that one or more devices should be restarted.
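A trigger of the form "a condition holds for about 30-50% of a duration between 1 and 6 hours" could be approximated as follows, assuming device data arrives as evenly spaced, timestamped samples so that the fraction of samples stands in for the fraction of time. The function names, the 3 hour window, and the 0.4 threshold are hypothetical values chosen from within the ranges given above.

```python
from typing import List, Tuple


def condition_fraction(samples: List[Tuple[float, bool]], now: float,
                       window_seconds: float) -> float:
    """Fraction of samples inside the window for which the condition held.

    Each sample is (timestamp_seconds, condition_held). Returns 0.0 when
    no samples fall inside the window.
    """
    recent = [held for ts, held in samples if now - window_seconds < ts <= now]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)


def bypass_alert_triggered(samples: List[Tuple[float, bool]], now: float,
                           window_seconds: float = 3 * 3600.0,
                           threshold: float = 0.4) -> bool:
    """True if the device was in bypass mode for at least `threshold` of the window."""
    return condition_fraction(samples, now, window_seconds) >= threshold
```

The same windowed test could serve other alerts described herein that use the same pattern, with only the sampled condition changing.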
[0044] Another critical alert may include critically high uncommitted data. Uncommitted data is data that is stored on a device, such as the edge device (106, 108), that needs to be transmitted to another device, such as the core device (110), another edge device (106, 108) or a storage repository. Uncommitted data may be pre-processing data (e.g., unmodified data that has been received by the edge device (106) and which the network optimization device (103) determines needs to be processed by the core device (110) or another edge device (108)), or post-processing data (e.g., data that has been received by the edge device (106) and processed to compress it, encrypt it, or reduce its overall size through paring irrelevant data or aggregating into a different format). One potential weakness with edge devices (106) is that they are generally not as powerful or scalable as core devices (110), and so relying on them for processing and storage tasks increases the risk that their modest capabilities will be overwhelmed, especially where uncommitted data remains on the device longer than necessary.
[0045] For example, where the edge device (106) uses between about 30% and about 50% of its local storage for uncommitted data, a critical alert may be appropriate. An appropriate preventative task may be to allocate or connect to additional storage, to clear the storage of unnecessary caches and stored files, to prioritize committing the data, or to soft-reboot or hard-reboot the device to see if data begins to commit normally. Other preventative and notification tasks for this alert may include reporting whether the device is optimizing traffic, reporting whether the device is in bypass mode, reporting whether the bypass mode was intentionally set or occurred unintentionally, reporting whether a SF core is connected, automatically restarting an optimization service, and reporting whether restarting the optimization service addressed the issue.
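The uncommitted data check and an escalating sequence of preventative tasks could be sketched as shown below. The ordering of tasks, the 0.3 threshold, and the names `uncommitted_ratio` and `remediate` are assumptions for illustration; the disclosure lists the tasks but does not prescribe an order.

```python
from typing import Callable, List, Tuple


def uncommitted_ratio(uncommitted_bytes: float, capacity_bytes: float) -> float:
    """Fraction of local storage occupied by uncommitted data."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return uncommitted_bytes / capacity_bytes


def remediate(read_ratio: Callable[[], float],
              tasks: List[Tuple[str, Callable[[], None]]],
              threshold: float = 0.3) -> List[str]:
    """Run tasks in order until the ratio drops below the threshold.

    Returns the names of the tasks that were actually run, which could be
    included in a notification task.
    """
    attempted = []
    for name, task in tasks:
        if read_ratio() < threshold:
            break  # condition cleared; skip more disruptive tasks
        task()
        attempted.append(name)
    return attempted
```

Placing the least disruptive tasks first means that a reboot is only attempted if allocating storage or prioritizing commits does not clear the condition.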
[0046] Another example of a critical alert is a Device Down alert, which indicates that a network optimization device (103) or other device is down or offline. This alert may trigger when a device on the network (100) does not respond to a ping on any interface, but a gateway associated with the device does respond. Preventative and notification tasks associated with this alert may include pinging the device on a primary interface, pinging a default gateway, pinging an in-path IP address, and reporting whether the device is not responding, or whether the gateway is not responding.
[0047] Another example of a critical alert is an Interface Issue alert, which indicates that the primary interface of the device is unresponsive. This alert may trigger when the primary interface does not respond to a ping, but a response is received from an in-path ping and a gateway ping, and the primary interface continues to be unresponsive after being reset by the predictive engine (102). Preventative and notification tasks associated with this alert may include logging the alert, resetting the primary interface one or more times, and reporting the results of resetting the primary interface.
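The ping-based conditions for the Device Down and Interface Issue alerts can be sketched as a small decision function. This is a minimal sketch under stated assumptions: the function name, the three boolean inputs, and the returned labels are hypothetical, and the gateway check is assumed to take precedence. Because the Interface Issue alert also requires that the primary interface remain unresponsive after a reset, the sketch only reports a candidate for that alert.

```python
def classify_connectivity(primary_ok: bool, in_path_ok: bool,
                          gateway_ok: bool) -> str:
    """Classify a device from the results of three ping attempts."""
    if not gateway_ok:
        # The gateway itself is silent, so the device cannot be judged.
        return "gateway_unreachable"
    if not primary_ok and not in_path_ok:
        # No interface responds, but the gateway does.
        return "device_down"
    if not primary_ok:
        # Only the primary interface is silent; a reset would be tried
        # before an Interface Issue alert is actually raised.
        return "interface_issue_candidate"
    return "ok"
```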
[0048] Another example of a critical alert is a Secure Vault Locked alert, which indicates that a device’s secure vault is locked or has an error. This alert may be triggered when the device data indicates that a device has been unable to access the secure vault for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting that the secure vault is locked and providing instructions to unlock the vault.
[0049] Another example of a critical alert is a Storage Optimization Service Replication Error alert, which indicates that an error in an optimization service is causing it to stop running. This alert may trigger when an optimization service has failed to run for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with the alert may include automatically restarting the service, and reporting the status of the service before and after an automatic restart attempt.
[0050] Other critical alerts and associated exemplary preventative tasks may include a deactivated logical unit number (“LUN”) alert (i.e., where one or more storage capabilities of the device are deactivated or unavailable) resulting in activation or reboot, or SSH authentication failure alert (i.e., credentials are being rejected when an SSH connection is attempted to the device) resulting in a configuration change to the device or the network optimization device (103). Other preventative tasks for various critical alerts, beyond those already discussed above, will be apparent to one of ordinary skill in the art in light of this disclosure.
[0051] An example of a major alert is a Degraded Mode alert, which indicates that a device has one or more alarms activated causing it to be in a degraded state. This alert may trigger when the device is in a Degraded Mode for between about 30 and 50% of a period of time between about 1 and 6 hours. Preventative and notification tasks performed for this alert may include reporting the number and description of alarms on the device.
[0052] Another example of a major alert is a Disk Error alert, which indicates that a device has a disk error. This alert may be triggered when the device reports disk errors for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting a disk error, reporting a disk state, and reporting any alarms from the device.
[0053] Another example of a major alert is a High Uncommitted Data alert, which indicates that a higher than usual amount of uncommitted data is stored on the datastore of the device. This alert may be similar to the Critical Uncommitted Data Alert, but may trigger at between about 10 and 20% of storage being utilized by uncommitted data, and may have similar preventative and notification tasks.
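The relationship between the High Uncommitted Data alert and the critical uncommitted data alert can be illustrated with a tiered threshold check. In this sketch the function name and the specific cut-offs (15% and 40%, picked from within the disclosed ranges of about 10 to 20% and about 30 to 50%) are assumptions for illustration only.

```python
from typing import Optional


def uncommitted_data_severity(uncommitted_bytes: int, capacity_bytes: int,
                              major_at: float = 0.15,
                              critical_at: float = 0.40) -> Optional[str]:
    """Map the share of local storage held by uncommitted data to a severity."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    share = uncommitted_bytes / capacity_bytes
    if share >= critical_at:
        return "critical"
    if share >= major_at:
        return "major"
    return None  # below both thresholds: no alert
```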
[0054] Another example of a major alert is a Service Halt Error alert, which indicates that a service such as an optimization service has stopped running on the device. This alert may trigger when the device data indicates that a service on the device has not functioned for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include restarting the service, and determining if the service crashed but restarted itself.
[0055] Another example of a major alert is a Site Down alert, which indicates that a site where the device is located is unreachable. This alert may trigger when a gateway device associated with the device does not respond to ping attempts. Preventative and notification tasks associated with this alert may include reporting that the gateway or site is unreachable, and attempting to automatically power cycle the gateway.
[0056] Another example of a major alert is an Expiring Peering Certificate alert, which indicates that a device has an expiring Peering Certificate. This alert may be triggered when the device data indicates a Peering Certificate may expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
[0057] Some examples of major alerts and associated exemplary preventative tasks include a no data reduction alert (i.e., device shows no data reduction due to no peers to connect to) resulting in a device or network switch reboot, SSH connection failure alert (i.e., system cannot reach the device to attempt SSH connection, indicating a possible firewall or IP address misconfiguration) resulting in a configuration change to the device or network, or an unused device alert (i.e., device is showing unusual data traffic over a period of time) resulting in a device reboot or a configuration change on the network optimization device (103).
[0058] An example of a minor alert is a Datastore Corruption alert, which indicates that the Device has a corrupt datastore. This alert may trigger when the device reports a corrupt Datastore between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting that the Datastore is corrupt and providing instruction on clearing the Datastore, and reporting the current state of the Datastore.
[0059] Another example of a minor alert is an Expiring CA Certificate alert, which indicates that a device has a certificate or authentication that will soon expire. This alert may be triggered when a device has a certificate or authentication that will expire within about thirty days or less. Preventative and notification tasks associated with this alert may include checking for expiring certificates, reporting expiring certificates and providing a copy of the certificate, and deleting the certificate so that it can be refreshed.
[0060] Another example of a minor alert is an Expiring Mobile Trusts Certificate alert, which indicates that a device has an expiring Mobile Trust certificate. This alert may be triggered when the device data indicates that a device has a mobile trust certificate that will expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
[0061] Another example of a minor alert is an Expiring Peering CA Certificate alert, which indicates that a device has an expiring Peering CA certificate. This alert may be triggered when the device data indicates that a Peering CA Certificate will expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
[0062] Another example of a minor alert is an Expiring Peering Whitelist Certificate alert, which indicates that a device has an expiring Peering Whitelist Certificate. This alert may be triggered when the device data indicates a Peering Whitelist Certificate may expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
[0063] Another example of a minor alert is an Expiring Server Certificate alert, which indicates that a device has an expiring Server Certificate. This alert may be triggered when the device data indicates a Server Certificate may expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
[0064] Another example of a minor alert is an Expiring Server Chain Certificate alert, which indicates that a device has an expiring Server Chain Certificate. This alert may be triggered when the device data indicates a Server Chain Certificate may expire in about thirty days or less. Preventative and notification tasks associated with this alert may include identifying the expiring certificate, reporting the certificate, and providing a copy or a description of the certificate.
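The expiring certificate alerts share one trigger: an expiry date that falls within about thirty days. A minimal sketch of that check follows, assuming a hypothetical mapping from certificate name to expiry time. The function name and the mapping are illustrative, and the sketch assumes all timestamps use the same time zone convention.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def expiring_certificates(certs: Dict[str, datetime], now: datetime,
                          warn_within: timedelta = timedelta(days=30)
                          ) -> List[str]:
    """Return the names of certificates that expire within warn_within of
    `now`, including any that have already expired."""
    return [name for name, not_after in certs.items()
            if not_after - now <= warn_within]


# Example: only the first and third certificates fall inside the window.
now = datetime(2019, 6, 11)
certs = {
    "peering": datetime(2019, 6, 30),
    "server": datetime(2020, 1, 1),
    "ca": datetime(2019, 7, 11),
}
flagged = expiring_certificates(certs, now)
```

The returned names could then feed the reporting tasks described for each alert, such as identifying the certificate and providing a copy or description of it.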
[0065] Another example of a minor alert is a Flash Error alert, indicating that a device flash drive has become unresponsive. This alert may be triggered when the device data indicates a flash error for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include providing a notification of the error and instructions on resetting the device and flash drive.
[0066] Another example of a minor alert is a Half Duplex Interface alert, which indicates that a device has an interface that is communicating at half duplex speed. This alert may trigger when the device data indicates that an interface is communicating at a reduced speed. Preventative and notification tasks associated with this alert may include reporting and identifying the interface and its status.
[0067] Another example of a minor alert is a High Average System Load alert, which indicates that the device has exceeded a recommended CPU utilization. This alert may trigger when the device data for a device indicates that the CPU utilization for the device is above about 70 to 95% for a period of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting the CPU utilization, reporting processes that are using the CPU, identifying processes that may be closed or restarted, reporting any virtual machines utilizing CPU, reporting compression level settings and indicating whether those settings may impact CPU utilization.
[0068] Another example of a minor alert is a High Paging Activity alert, which indicates that the device has run out of memory and is using a swap partition. This alert may trigger when the device data indicates a device has been using a swap partition for about 30 to 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting memory swap statistics, reporting the number of pages swapped, and providing an indicator of whether that is a higher than normal number.
[0069] Another example of a minor alert is a High System Temperature alert, which indicates a device has exceeded a recommended temperature. This alert may trigger when the device data indicates a device has been above a recommended temperature for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting the current temperature of the device, and reporting the status of any fans or other cooling devices.
[0070] Another example of a minor alert is a High CPU Utilization alert, which indicates that the device has exceeded a recommended CPU utilization. This alert may trigger when the device data for a device indicates that the CPU utilization for the device is above about 70 to 95% for a period of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting the CPU utilization, reporting processes that are using the CPU, identifying processes that may be closed or restarted, reporting any virtual machines utilizing CPU, reporting compression level settings and indicating whether those settings may impact CPU utilization.
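Unlike the percentage-of-time triggers, the CPU utilization trigger is described as utilization staying above a threshold for a whole period. One way this might be sketched is shown here. The function name, the sample format, and the defaults (85% over 2 hours, chosen from within the disclosed ranges) are assumptions, as is the requirement that the recorded history reach back to the start of the period.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def sustained_above(samples: Sequence[Tuple[datetime, float]], now: datetime,
                    threshold_pct: float = 85.0,
                    period: timedelta = timedelta(hours=2)) -> bool:
    """Return True when every reading in the period ending at `now`
    exceeds threshold_pct and the history covers the whole period."""
    start = now - period
    history = sorted(samples)
    if not history or history[0][0] > start:
        return False  # not enough history to cover the whole period
    recent = [value for ts, value in history if start <= ts <= now]
    return bool(recent) and all(value > threshold_pct for value in recent)
```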
[0071] Another example of a minor alert is an Interface Error alert, which indicates that a device has errors on a physical interface. This alert may trigger when the device data indicates a device has interface errors between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include checking the interface speed and duplex settings and reporting their status, and providing instructions on configuring the interfaces and duplex settings properly.
[0072] Another example of a minor alert is an Interface Link Lost alert, which indicates that a device has lost a link to a physical interface. This alert may trigger when the device data indicates a device has not communicated via a physical interface for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting instructions on physically verifying the interface link, reporting the identity and status of the interface, and reporting any recent hardware log entries associated with the device or interface.
[0073] Another example of a minor alert is an IPMI Event alert, which indicates that a device has triggered an IPMI event. This alert may be triggered when the device data indicates that a device has been in an IPMI event state for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting any error log entries associated with the device, and reporting any alarms associated with the device.
[0074] Another example of a minor alert is a Memory Error alert, which indicates that a device has errors in a memory module. This alert may be triggered when the device data indicates that a device memory module has been producing errors for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include identifying the memory and reporting its status.
[0075] Another example of a minor alert is a No Data Reduction - No Peers alert, which indicates that a device is not achieving any data reduction, and has no peers. This alert may be triggered when the device data indicates that a device is not reducing data, and there are no connected peers. Preventative and notification tasks associated with this alert may include reporting that there are no peers connected to the device.
[0076] Another example of a minor alert is a No Data Reduction - Peers Disconnected alert, which indicates that a device is not achieving any data reduction, and has peers configured but they are disconnected. This alert may be triggered when the device data indicates that a device is not reducing data, and there are configured peers that are all offline. Preventative and notification tasks associated with this alert may include reporting that there are no peers connected to the device, automatically restarting the optimization service, and reporting whether the restart was successful.
[0077] Another example of a minor alert is a No Data Reduction - Peers Connected alert, which indicates that a device has no data reduction and has connected peers. This alert may trigger when the device data indicates that the data reduction for a device is zero, but that there are configured and connected peers that are currently online. Preventative and notification tasks associated with this alert may include restarting the optimization service, and if the issue is not corrected, reporting that data is not being reduced despite peers being connected and that other configurations may need to be manually changed.
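The three No Data Reduction alerts differ only in the state of the device's peers, which suggests a single classification step. The sketch here is illustrative: the function name and its inputs are hypothetical, and it assumes that "no peers" means no peers are configured, while "peers disconnected" means peers are configured but none are connected.

```python
from typing import Optional


def no_data_reduction_alert(reduction_pct: float, configured_peers: int,
                            connected_peers: int) -> Optional[str]:
    """Pick which No Data Reduction alert applies, if any."""
    if reduction_pct > 0:
        return None  # the device is reducing data, so no alert applies
    if configured_peers == 0:
        return "No Data Reduction - No Peers"
    if connected_peers == 0:
        return "No Data Reduction - Peers Disconnected"
    return "No Data Reduction - Peers Connected"
```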
[0078] Another example of a minor alert is a Process Dump Staging Directory Inaccessible alert, which indicates that a device has experienced an error while creating a process dump. This alert may trigger when the device data indicates that a device has experienced errors while creating a process dump for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative tasks and notifications associated with this alert may include providing instructions for clearing space on the device, reporting a list of dump files on the device, and reporting disk information for the device.
[0079] Another example of a minor alert is a RAID Error alert, which indicates that a device has encountered RAID errors. This alert may trigger when the device data indicates that a device has experienced RAID errors for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting current RAID alarms, RAID failures, a RAID diagram, RAID array info, and RAID physical drive info.
[0080] Another example of a minor alert is a Service Error alert, which indicates that a service such as an optimization service has stopped running on a device. This alert may trigger when the device data indicates that a device has had a service inactive for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include automatically restarting the service, and checking if the service is currently running and indicating that it crashed but restarted automatically.
[0081] Another example of a minor alert is an SMB Authentication Error alert, which indicates that a device is unable to authenticate a domain, and is unable to optimize CIF and SMB connections. This alert may trigger when the device data indicates that a device has been unable to authenticate a domain for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting on a domain status, providing instructions to rejoin a domain, and providing instruction to initially join a domain.
[0082] Another example of a minor alert is a Storage Optimization Service Replication Paused alert, which indicates that an optimization service is paused. This alert may trigger when an optimization service has failed to run for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with the alert may include automatically restarting the service, and reporting the status of the service before and after an automatic restart attempt.
[0083] Another example of a minor alert is a System Disk Full alert, which indicates that a device's system disk partitions are full and cannot be written to. This alert may trigger when the device data indicates that a device system disk has been full for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include reporting that the disk is full and providing instructions on clearing space, reporting a list of dump files on the device, and reporting disk information of the device.
[0084] Another example of a minor alert is Admission Control, which indicates that a device has reached its license limitations for a number of optimized connections. New connections will not be optimized while the device is in Admission Control mode. This predictive rule may trigger when device data indicates that a device is in Admission Control mode for more than about 30-50% of a period of time between about 1 and 6 hours. Preventative and notification tasks associated with an Admission Control alert may include identifying protocols and reporting protocol percentages that are less than 6%, reporting protocols that are getting negative optimization and the extent of negative optimization, reporting a number of connections opened by a particular protocol, reporting that no protocols were identified that would benefit from passthrough, checking the current license on the device to determine if an upgraded license is available that would address the Admission Control mode, and reporting the possible upgrade.
[0085] Some additional examples of minor alerts that may only trigger notification tasks or updates to the predictive engine interface (104), but which may also be associated with preventative tasks if desired, include Under-Optimized Port Traffic (i.e., Device is performing below thresholds on a particular port), and Unknown SSH Error (i.e., An unknown error occurred when attempting to create a SSH connection to the device).
[0086] An example of an informational alert is Critical Mode, which indicates that an optimization service on the device has ceased functioning or is in bypass mode. This alert may trigger when a device is in critical mode between about 30 and 50% of a duration of time between 1 and 6 hours. Preventative and notification tasks associated with this alert may include determining whether the optimization service is currently running and reporting that status. Where the optimization service is not running, tasks may include indicating that the service is being restarted, automatically restarting the service, and then reporting whether the service restart was successful or not.
[0087] An example of an informational alert is an Inactive Monitored VM alert, which indicates that a device has a virtual machine that is currently offline or powered off. This alert may trigger when a virtual machine on the device is marked as monitored and powered on by default, but is currently powered off. Preventative and notification tasks may include reporting the VM that is currently powered off.
[0088] An example of an informational alert is an Internal Fan Error alert, which indicates a device has an error with one or more fans. This alert may be triggered when the device data indicates a fan on a device has failed to function normally for between about 30 and 50% of a duration of time between about 1 and 6 hours. Preventative and notification tasks associated with this alert may include identifying and reporting the fan that is not functioning normally, providing instructions for checking and cleaning the fan, and reporting any new error log entries for the device.
[0089] Some additional examples of informational alerts that may only trigger notification tasks or updates to the predictive engine interface (104), but which may also be associated with preventative tasks if desired, include Collection Protocol Syslog (i.e., agent can’t fetch the Syslogs from the device), and WAN0_0 Interface Out Errors (i.e., WAN0_0 interface on the device shows errors). Additional predictive rules, alert types, categorizations, and preventative tasks exist and will be apparent to one of ordinary skill in the art in light of this disclosure.
[0090] Turning now to FIG. 6, that figure shows a flowchart of an exemplary set of steps that could be performed by the predictive engine (102) to provide notifications based on alerts produced during predictive detection. When an alert is triggered that is associated with an executed (240) notification task, the specifics of the notification task will be determined (244) and the predictive engine interface (104) will be updated (246) to reflect the occurrence of the alert, so that a user of the predictive engine interface (104) may view information on the details and devices associated with the alert. It will also be determined (248) if the notification task is an active notification that requires some action beyond updating the predictive engine interface (246). This may be the case where a critical or major alert is configured to provide direct notifications as electronic communications to one or more support personnel. Where it is determined (248) that there is an active notification required, the predictive engine (102) will provide (25) the active notification based upon the configuration for that notification and predictive rule.
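The notification flow described for FIG. 6 might be sketched as follows. This is not the disclosed implementation: the `Alert` and `NotificationTask` types, the list standing in for the predictive engine interface, and the `send` callback standing in for an electronic communication channel are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Alert:
    name: str
    severity: str   # e.g. "critical", "major", "minor", "informational"
    device_id: str


@dataclass
class NotificationTask:
    active: bool    # True when a direct notification is also required
    recipients: List[str] = field(default_factory=list)


def handle_notification(alert: Alert, task: NotificationTask,
                        interface_log: List[Alert],
                        send: Callable[[str, str], None]) -> int:
    """Record the alert for the interface, then send any active
    notifications. Returns the number of notifications sent."""
    # Always update the interface so users can view the alert details.
    interface_log.append(alert)
    if not task.active:
        return 0
    message = f"[{alert.severity}] {alert.name} on {alert.device_id}"
    for recipient in task.recipients:
        send(recipient, message)
    return len(task.recipients)
```

Passing the delivery mechanism in as a callback keeps the sketch independent of any particular channel, such as email or text messaging.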
[0091] Turning now to FIG. 7, that figure shows a simulated screenshot of an exemplary monitor view (300) of an exemplary predictive engine interface such as the predictive engine interface (104). The monitor view (300) may be a view that is displayed to users by default when they access the predictive engine interface (104). The monitor view comprises a map pane (302), a data pane (304), a high priority alert pane (306), a low priority alert pane (308), an offline device indicator (310), and a new alert indicator (312).
[0092] The map pane (302) uses information that is associated with the network (100) and the configured devices (e.g., core device (110), edge device (106), edge device (108)) to display a map (314) that is associated with the geographical area covered by the network (100). The map may be marked with indicators showing the location and status of devices that are associated with a normal operational status (316) that are being monitored by the predictive engine (102), and may also be marked to show the location and status of devices associated with a low-priority alert (317) and devices associated with a high priority alert (318), by changing one or more of the color, texture, or graphical image of the devices to be differentiated.
[0093] The data pane (304) may show data associated with the monitored devices of the network (100) in a variety of ways, including as a chart, graph, table, or other desirable data visualization. The data pane (304) shown in FIG. 7 displays a graph of the data reduction achieved over the network (100) during a period of time. Data reduction is one of the advantages provided by the efficient management of the network (100) by a network optimization device (103), and the data reduction percentage indicates the extent to which data was pre-processed at an edge device (106, 108) before transport across the network (100) at various times.
[0094] As an example, at one period of time (326) on the graph, the percentage of data reduction (324) was 50%. Supposing that all edge devices on the network (100) received or produced a total of 1TB of data during that period of time which ultimately needs to be committed to the core device (110), a 50% data reduction indicates that edge computing allowed the data to be reduced to 500GB before being committed and transported across the network (100). The data pane (304) also shows a list (320) of the devices achieving the highest efficiency in data reduction, as well as a list (322) of the devices achieving the lowest efficiency in data reduction.
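The arithmetic in this example can be written out as a short Python sketch. The function names are hypothetical, and sizes are expressed in gigabytes with 1 TB taken as 1000 GB, which matches the 500 GB figure given above.

```python
def size_after_reduction(original_gb: float, reduction_pct: float) -> float:
    """Size left to transport after a given data reduction percentage."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return original_gb * (1 - reduction_pct / 100)


def data_reduction_pct(original_gb: float, transported_gb: float) -> float:
    """Data reduction percentage implied by original and transported sizes."""
    if original_gb <= 0:
        raise ValueError("original_gb must be positive")
    return 100 * (1 - transported_gb / original_gb)


# 1 TB (1000 GB) at 50% data reduction leaves 500 GB to transport.
remaining_gb = size_after_reduction(1000, 50)
```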
[0095] While the data pane (304) depicted in FIG. 7 shows data reduction information, it should be understood that the data pane (304) may show one or more types of data relating to the efficiency, performance, or health of the network (100). For example, one data pane may show a percentage of processor utilization of devices in the network (100), and yet another data pane may show a percentage of device storage that is being used on uncommitted data.
[0096] The offline device indicator (310) and the new alert indicator (312) each show important information and are positioned to be visible to a user. The offline device indicator (310) indicates any devices in the network (100) that are normally monitored by the predictive engine (102), but which have been excepted from monitoring for some reason. Halting monitoring on a device may be useful where it is being manually maintained, upgraded, or replaced, in order to avoid generating false critical alerts for the device throughout that time period. The offline device indicator (310) serves as a reminder that some devices may be offline, and that monitoring of those devices should be re-enabled at a future time.
[0097] The new alert indicator (312) shows alerts that have been generated and not yet viewed or addressed, and may be clicked on by a user to view additional information on each alert in a variety of summary views or detailed views. The predictive engine interface (104) may also allow users to review and search alert histories over varying periods of time, and could allow, for example, all alerts that occurred across the network (100) during a particular time period to be viewed in a list, or all alerts that occurred for a particular device during a particular time period, or other subsets of alerts as may be desired.
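The alert history searches described above, by time period across the network or by time period for one device, could be sketched as a simple filter. The `AlertRecord` type and the `search_alerts` function are hypothetical names used only for illustration, and the in-memory list stands in for whatever store an implementation would actually use.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AlertRecord:
    device_id: str
    name: str
    raised_at: datetime


def search_alerts(history: Iterable[AlertRecord], start: datetime,
                  end: datetime,
                  device_id: Optional[str] = None) -> List[AlertRecord]:
    """Return alerts raised between start and end (inclusive), optionally
    limited to one device, ordered from oldest to newest."""
    matches = [
        record for record in history
        if start <= record.raised_at <= end
        and (device_id is None or record.device_id == device_id)
    ]
    return sorted(matches, key=lambda record: record.raised_at)
```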
[0098] The predictive engine interface (104) may also offer additional tools and views that may be used to interact with and enhance the predictive maintenance and troubleshooting functionality provided by the system. As an example, FIG. 8 shows a simulated screenshot of an exemplary device view (328) of the predictive engine interface (104). The device view (328) comprises a set of interface features for selecting to view device images (330), selecting to view site install images (332), and selecting to view and select other command utilities (334). The device view (328) also displays a set of device images (336) for one or more of the devices that are monitored by the predictive engine (102), and may advantageously include simulated schematic diagrams of the devices. Schematic diagrams for devices may be desirable for providing remote support for devices, and may in some implementations be viewable via the predictive engine interface (104) by both centralized support personnel and personnel located proximate to the displayed device to provide a common reference during troubleshooting. Providing the device image (336) as a schematic diagram avoids the need for exchanging photographs or verbal descriptions of the device, which can be an inefficient and error-prone process.
[0099] The device images (336) may be manually added at the time a device is configured (218), or may be automatically retrieved and added based upon a device serial number, model type, or other characteristic at the time the device is configured (218), in order to ensure the accuracy and availability of the schematic. The device images (336) provide information on the appearance, features, connections, and interfaces of the device, and may assist users of the predictive engine (102) in effectively providing troubleshooting or support of a remotely located device that they are not directly viewing.
[00100] Another view that may be available via the predictive engine interface (104) is shown in FIG. 9, which shows a simulated screenshot of an exemplary site view (342). The site view (342) displays one or more images of the devices or the site at which they are installed, which may be captured by a technician at the site and provided to the predictive engine interface (104). While the device images (336) are a useful tool for providing remote troubleshooting, actual images of the site and device may also be useful. Site images may advantageously include a serial number image (338) showing the serial number of the device, a connection image (340) showing the cables and cords that are currently connected to the device, an install location image showing the rack, room, closet, or other area where the device is installed, and other images. In some implementations, the predictive engine interface (104) may require that one or more site images such as those described above be uploaded when the device is configured, or may produce notifications via the predictive engine interface (104) until such images are uploaded, in order to increase the likelihood that configured devices will be accurately and fully represented in the site view (342). The predictive engine (102) may also perform image analysis on uploaded pictures to extract and verify serial numbers or model numbers, verify installation of cables and connections, or verify that an uploaded image more likely than not actually displays the required image (i.e., that it is not a blank image uploaded simply to bypass the requirement).
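As one illustration of the last check, the following is a minimal sketch of how a blank upload might be flagged. It is not the disclosed implementation: the function name `looks_blank`, the `min_stddev` threshold, and the representation of an image as rows of 0-255 grayscale values are all assumptions made for the example, and a low standard deviation of pixel values is used here as a simple stand-in for whatever image analysis an implementation actually performs.

```python
from statistics import pstdev


def looks_blank(pixels, min_stddev=5.0):
    """Return True when a grayscale image shows almost no variation.

    `pixels` is a list of rows of 0-255 intensity values (an assumed,
    simplified image format). `min_stddev` is a hypothetical threshold.
    """
    flat = [value for row in pixels for value in row]
    if not flat:
        # An empty upload carries no picture content at all.
        return True
    # A uniformly coloured image has a population standard deviation near 0.
    return pstdev(flat) < min_stddev


# A solid white 8x8 image, as might be uploaded to bypass the requirement.
blank_upload = [[255] * 8 for _ in range(8)]

# A synthetic 8x8 image with widely varying intensities.
varied_upload = [[(x * 37 + y * 91) % 256 for x in range(8)] for y in range(8)]
```

A check like this only rules out uniform images; confirming that a picture actually shows a serial number label or cable connections would need the additional analysis described in the paragraph above.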
[00101] Other functionality available through the predictive engine interface (104) may include various features and functions available by selecting a command utilities (334) button. Such functionality may include tools for pinging site locations and devices to determine their status manually, testing interfaces to determine if primary and in-path interfaces are operating, restarting the command line interface if that interface becomes non-responsive, restarting a client web server if a device UI becomes non-responsive, and entering a command mode where commands can be typed, transmitted, and executed on remote monitored devices.
[00102] Further variations on, and features for, the inventors’ technology will be immediately apparent to, and could be practiced without undue experimentation by, those of ordinary skill in the art in light of this disclosure. Accordingly, instead of limiting the protection accorded by this document, or by any document which is related to this document, to the material explicitly disclosed herein, the protection should be understood to be defined by the claims, if any, set forth herein or in the relevant related document when the terms in those claims which are listed below under the label “Explicit Definitions” are given the explicit definitions set forth therein, and the remaining terms are given their broadest reasonable interpretation as shown by a general purpose dictionary. To the extent that the interpretation which would be given to such claims based on the above disclosure is in any way narrower than the interpretation which would be given based on the “Explicit Definitions” and the broadest reasonable interpretation as provided by a general purpose dictionary, the interpretation provided by the “Explicit Definitions” and broadest reasonable interpretation as provided by a general purpose dictionary shall control, and the inconsistent usage of terms in the specification or priority documents shall have no effect.
[00103] Explicit Definitions
[00104] When appearing in the claims, a statement that something is “based on” something else should be understood to mean that something is determined at least in part by the thing that it is indicated as being “based on.” When something is required to be completely determined by a thing, it will be described as being “based exclusively on” the thing.
[00105] When used in the claims, “configured” should be understood to mean that the thing “configured” is adapted, designed or modified for a specific purpose. An example of “configuring” in the context of computers is to provide a computer with specific data (which may include instructions) which can be used in performing the specific acts the computer is being “configured” to do. For example, installing Microsoft® WORD on a computer “configures” that computer to function as a word processor, which it does by using the instructions for Microsoft WORD in combination with other inputs, such as an operating system, and various peripherals (e.g., a keyboard, monitor, etc.).
[00106] When used in the claims, “determining” should be understood to refer to generating, selecting, defining, calculating or otherwise specifying something. For example, to obtain an output as the result of analysis would be an example of “determining” that output. As a second example, to choose a response from a list of possible responses would be a method of “determining” a response. As a third example, to identify data received from an external source (e.g., a microphone) as being a thing would be an example of “determining” the thing.
[00107] When used in the claims, a “set” should be understood to refer to a collection containing zero or more objects of the type that it refers to. So, for example, a “set of integers” describes an object configured to contain an integer value, which includes an object that contains multiple integer values, an object that contains only a single integer value, and an object that contains no integer value whatsoever.
[00108] When used in the claims, a “means for determining the existence of an anomaly caused by network device behavior” should be understood as a means plus function limitation as provided for in 35 USC § 112(f) where the function is “determining the existence of an anomaly caused by network device behavior” and the corresponding structure is a computer configured as described in FIGS. 4-5 and their associated discussion, as well as paragraphs [0035]-[0089].
[00109] We claim:
Claims
1. A system that is configurable on a network in order to provide predictive features, the system comprising:
(a) a predictive engine comprising a set of device definitions associated with a set of monitored devices on the network and usable to receive information associated with the set of monitored devices, and a set of predictive rules usable to analyze data received from the set of monitored devices; and
(b) a predictive engine interface accessible and viewable by a user to interact with the predictive engine;
wherein the predictive engine is configured to:
(i) receive a set of device data from a device of the set of monitored devices;
(ii) determine if any predictive rule from the set of predictive rules is triggered based upon the set of device data;
(iii) where a predictive rule from the set of predictive rules is triggered based upon the set of device data:
(A) generate a device alert based upon the predictive rule and the set of device data, and
(B) perform a preventative task associated with the device alert.
2. The system of claim 1, further comprising a network optimization device that is configured to control the transport and processing of data by one or more devices on the network, wherein the predictive engine is configured on the network optimization device.
3. The system of claim 1, wherein:
(a) the set of device definitions is usable to receive information associated with the set of monitored devices indirectly via another device of the network;
(b) wherein the network comprises a core device and two or more edge devices, and the set of monitored devices comprises the two or more edge devices; and
(c) the predictive engine is installed on the core device.
4. The system of claim 3, wherein the network further comprises:
(a) a set of network paths, wherein each of the two or more edge devices is connected to the core device by a network path of the set of network paths; and
(b) a network monitoring device configured on each network path of the set of network paths; and
wherein the set of device definitions is usable to receive information associated with the set of monitored devices via the network monitoring device configured on each network path.
5. The system of claim 1, wherein the preventative task comprises one or more of:
(a) a preventative maintenance task that, when performed, causes a change in state or configuration of the device;
(b) an active notification task that, when performed, causes an electronic communication to be provided to one or more personnel associated with the device;
(c) an update to the predictive engine interface to display information associated with the device alert to the user.
6. The system of claim 1, wherein the predictive engine is further configured to:
(a) receive a set of efficiency data from a network optimization device of the network, wherein the set of efficiency data describes one or more aspects of efficient performance of the device;
(b) determine if any efficiency rule from the set of predictive rules is triggered based upon the set of efficiency data;
(c) where an efficiency rule from the set of predictive rules is triggered based upon the set of efficiency data:
(A) generate an efficiency alert based upon the efficiency rule and the set of efficiency data, and
(B) perform a preventative task associated with the efficiency alert.
7. The system of claim 6, wherein:
(a) the set of efficiency data describes a level of data reduction achieved by the device; and
(b) the efficiency rule is usable to determine, based on the set of efficiency data, whether a threshold level of data reduction is being achieved on the device.
8. The system of claim 7, wherein the predictive engine interface is configured to display:
(a) a map pane comprising a geographical display and a set of device indicators positioned on the geographical display, wherein each device indicator of the set of device indicators shows the location and status of a monitored device of the set of monitored devices;
(b) a data pane comprising:
(i) a graph of data reduction for each of the set of monitored devices over a period of time;
(ii) an ordered list of the set of monitored devices having the highest data reduction over the period of time; and
(iii) an ordered list of the set of monitored devices having the lowest data reduction over the period of time; and
(c) an alert pane comprising a set of efficiency alerts and a set of device alerts.
9. The system of claim 1, wherein the predictive rule is usable to determine, based on the set of device data, whether a threshold level of storage of the device is being used to store a set of uncommitted data, wherein the set of uncommitted data is data that the device is configured to transmit to a second device of the network when possible.
10. The system of claim 1, wherein the predictive engine interface is configured to display a device image pane for any of the set of monitored devices, wherein the device image pane comprises a schematic diagram of that device showing its features and connections.
11. The system of claim 1, wherein the predictive engine interface is configured to display a site image pane for any of the set of monitored devices, wherein the site image pane comprises, for a viewed device for which the image pane is displayed, a set of photographic images, the set of photographic images comprising a device identifier for the viewed device, a photographic image of a set of connections for the viewed device, and a photographic image of an area in which the viewed device is located.
12. The system of claim 11, wherein the predictive engine interface is configured to, when the viewed device is being configured as one of the set of monitored devices:
(a) request from the user each of the set of photographic images; and
(b) when the user does not provide each of the set of photographic images, display an alert on the predictive engine interface.
13. A method for providing predictive maintenance for a network comprising the steps:
(a) configuring a predictive engine on a device present on the network;
(b) configuring the predictive engine to monitor a set of monitored devices present on the network;
(c) receiving a set of device data from the set of monitored devices;
(d) determining if any predictive rule from a set of predictive rules stored on the predictive engine is triggered based upon the set of device data;
(e) when any predictive rule from the set of predictive rules is triggered based upon the set of device data:
(i) generating a device alert based upon that predictive rule and the set of device data; and
(ii) performing a preventative task associated with that device alert.
14. The method of claim 13, further comprising the steps:
(a) receiving a set of efficiency data associated with an edge device of the set of monitored devices; and
(b) adding the set of efficiency data to the set of device data.
15. The method of claim 14, wherein the set of efficiency data describes a level of data reduction achieved by the edge device, and the set of the predictive rules is usable to determine, based on the set of efficiency data, whether a threshold level of data reduction is being achieved on the device.
16. The method of claim 15, wherein the set of device data comprises a level of uncommitted data stored on the edge device, and the set of predictive rules is usable to determine, based on the level of uncommitted data stored on the edge device, whether a threshold level of uncommitted data is stored on the edge device.
17. The method of claim 16, further comprising the steps of:
(a) performing the preventative task to increase the level of data reduction being achieved on the device when any predictive rule of the set of predictive rules is used to determine that less than the threshold level of data reduction is being achieved on the device; and
(b) performing the preventative task to cause the device to transmit the uncommitted data to another device when any predictive rule of the set of predictive rules is used to determine that more than the threshold level of uncommitted data is being stored on the device.
18. A system that is configurable on a network in order to provide predictive features, the system comprising:
(a) a means for determining the existence of an anomaly caused by network device behavior; and
(b) a predictive engine interface;
wherein the predictive engine interface is configured to:
(i) configure a set of monitored devices based upon input from a user;
(ii) provide the set of monitored devices to the means for determining the existence of an anomaly caused by network device behavior; and
(iii) where the existence of an anomaly is determined, generate a device alert based upon the anomaly and a monitored device associated with the anomaly, wherein the device alert comprises a description of a preventative task that was performed in response to the anomaly.
19. The system of claim 18, wherein the preventative task that was performed is one or more of:
(a) resetting one or more devices of the set of monitored devices;
(b) resetting one or more communication devices of the network; and
(c) causing one or more devices of the set of monitored devices to transmit a set of uncommitted data to another device.
20. The system of claim 18, wherein the set of monitored devices is usable by the means for determining the existence of an anomaly caused by network device behavior to receive a set of efficiency data from a network optimization device of the network.
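The rule-evaluation flow recited in claims 1, 7, 9 and 13 (receive device data, test each predictive rule, generate an alert, perform a preventative task) can be sketched as follows. This is only an illustrative reading under stated assumptions, not the claimed or disclosed implementation: the class names, the dictionary keys `data_reduction_pct` and `uncommitted_storage_pct`, the threshold values, and the string results returned by the tasks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical shape for a "set of device data": metric name -> reading.
DeviceData = Dict[str, float]


@dataclass
class PredictiveRule:
    name: str
    is_triggered: Callable[[DeviceData], bool]
    # Stand-in for a preventative task; returns a description of what was done.
    preventative_task: Callable[[str], str]


@dataclass
class DeviceAlert:
    device_id: str
    rule_name: str
    task_result: str


def low_data_reduction(threshold_pct: float) -> Callable[[DeviceData], bool]:
    # Triggers when data reduction falls below the threshold (cf. claim 7).
    # A missing reading is treated as healthy; that is an assumption.
    return lambda data: data.get("data_reduction_pct", 100.0) < threshold_pct


def high_uncommitted(threshold_pct: float) -> Callable[[DeviceData], bool]:
    # Triggers when too much storage holds uncommitted data (cf. claim 9).
    return lambda data: data.get("uncommitted_storage_pct", 0.0) > threshold_pct


def evaluate(device_id: str, data: DeviceData,
             rules: List[PredictiveRule]) -> List[DeviceAlert]:
    """Check every rule; for each triggered rule, run its task and alert."""
    alerts = []
    for rule in rules:
        if rule.is_triggered(data):
            result = rule.preventative_task(device_id)
            alerts.append(DeviceAlert(device_id, rule.name, result))
    return alerts


RULES = [
    PredictiveRule("low-data-reduction", low_data_reduction(30.0),
                   lambda dev: f"notified support personnel about {dev}"),
    PredictiveRule("uncommitted-data-backlog", high_uncommitted(80.0),
                   lambda dev: f"requested {dev} to transmit uncommitted data"),
]

alerts = evaluate(
    "edge-01",
    {"data_reduction_pct": 12.0, "uncommitted_storage_pct": 91.0},
    RULES,
)
```

In this sketch each rule pairs its trigger with one task, so the alert can carry a description of the task that was performed, similar to the device alert described in claim 18. An implementation could equally keep rules and tasks in separate tables keyed by alert type.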
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201862683832P | 2018-06-12 | 2018-06-12 | |
| US62/683,832 | 2018-06-12 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019241199A1 (en) | 2019-12-19 |
Family
ID=68843564
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2019/036478 Ceased WO2019241199A1 (en) | 2018-06-12 | 2019-06-11 | System and method for predictive maintenance of networked devices |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2019241199A1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11201779B1 (en) * | 2020-09-11 | 2021-12-14 | Cerner Innovation, Inc. | Generation of synthetic alerts and unified dashboard for viewing multiple layers of data center simultaneously |
| US11429474B2 (en) | 2020-08-03 | 2022-08-30 | Bank Of America Corporation | Enterprise IOT system for onboarding and maintaining peripheral devices |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6446123B1 (en) * | 1999-03-31 | 2002-09-03 | Nortel Networks Limited | Tool for monitoring health of networks |
| US20030224827A1 (en) * | 2002-05-17 | 2003-12-04 | Hideaki Chiba | Concentrator and reset control method therefor |
| US7120819B1 (en) * | 2001-11-15 | 2006-10-10 | 3Com Corporation | Method and system for fault diagnosis in a data network |
| US20080133178A1 (en) * | 2006-11-30 | 2008-06-05 | Solar Turbines Incorporated | Maintenance management of a machine |
| US7822417B1 (en) * | 2005-12-01 | 2010-10-26 | At&T Intellectual Property Ii, L.P. | Method for predictive maintenance of a communication network |
| US20160098337A1 (en) * | 2014-10-02 | 2016-04-07 | International Business Machines Corporation | Sampling of device states for mobile software applications |
| US9361348B1 (en) * | 2011-10-05 | 2016-06-07 | Google Inc. | Database replication |
| US20170011298A1 (en) * | 2015-02-23 | 2017-01-12 | Biplab Pal | Internet of things based determination of machine reliability and automated maintainenace, repair and operation (mro) logs |
-
2019
- 2019-06-11 WO PCT/US2019/036478 patent/WO2019241199A1/en not_active Ceased
Patent Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6446123B1 (en) * | 1999-03-31 | 2002-09-03 | Nortel Networks Limited | Tool for monitoring health of networks |
| US7120819B1 (en) * | 2001-11-15 | 2006-10-10 | 3Com Corporation | Method and system for fault diagnosis in a data network |
| US20030224827A1 (en) * | 2002-05-17 | 2003-12-04 | Hideaki Chiba | Concentrator and reset control method therefor |
| US7822417B1 (en) * | 2005-12-01 | 2010-10-26 | At&T Intellectual Property Ii, L.P. | Method for predictive maintenance of a communication network |
| US20080133178A1 (en) * | 2006-11-30 | 2008-06-05 | Solar Turbines Incorporated | Maintenance management of a machine |
| US9361348B1 (en) * | 2011-10-05 | 2016-06-07 | Google Inc. | Database replication |
| US20160098337A1 (en) * | 2014-10-02 | 2016-04-07 | International Business Machines Corporation | Sampling of device states for mobile software applications |
| US20170011298A1 (en) * | 2015-02-23 | 2017-01-12 | Biplab Pal | Internet of things based determination of machine reliability and automated maintainenace, repair and operation (mro) logs |
Non-Patent Citations (1)
| Title |
|---|
| ANONYMOUS: "What is predictive maintenance?", ACCELIX, 9 April 2018 (2018-04-09), XP055664401, Retrieved from the Internet <URL:https://www.accelix.com/community/predictive-maintenance/predictive-maintenance-explained/> [retrieved on 20190809] * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11429474B2 (en) | 2020-08-03 | 2022-08-30 | Bank Of America Corporation | Enterprise IOT system for onboarding and maintaining peripheral devices |
| US11201779B1 (en) * | 2020-09-11 | 2021-12-14 | Cerner Innovation, Inc. | Generation of synthetic alerts and unified dashboard for viewing multiple layers of data center simultaneously |
| US11558242B2 (en) | 2020-09-11 | 2023-01-17 | Cerner Innovation, Inc. | Generation of synthetic alerts and unified dashboard for viewing multiple layers of data center simultaneously |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11632320B2 (en) | Centralized analytical monitoring of IP connected devices | |
| JP7436737B1 (en) | Server management system that supports multi-vendors | |
| JP2022171958A (en) | System, method, apparatus, and non-temporary computer readable storage medium for providing mobile device support services | |
| US10048996B1 (en) | Predicting infrastructure failures in a data center for hosted service mitigation actions | |
| US9450700B1 (en) | Efficient network fleet monitoring | |
| US8910172B2 (en) | Application resource switchover systems and methods | |
| JP5211160B2 (en) | How to automatically manage computer network system downtime | |
| US20190379576A1 (en) | Providing dynamic serviceability for software-defined data centers | |
| US20080046552A1 (en) | Service resiliency within on-premise products | |
| US11599404B2 (en) | Correlation-based multi-source problem diagnosis | |
| US12181954B2 (en) | Computing cluster health reporting engine | |
| US10929259B2 (en) | Testing framework for host computing devices | |
| US20240370330A1 (en) | Method for managing server in information technology asset management system | |
| US12360857B2 (en) | System and method for managing automatic service requests for workload management | |
| WO2019241199A1 (en) | System and method for predictive maintenance of networked devices | |
| US20240356796A1 (en) | System for monitoring servers totally | |
| CN115934453A (en) | Troubleshooting method, troubleshooting device and storage medium | |
| US12386633B2 (en) | System and method for managing automatic service requests for scaling nodes in a client environment | |
| US20240372780A1 (en) | Information technology asset management system for providing server configuration automation | |
| US20240362104A1 (en) | Server management system using ai | |
| US12306698B2 (en) | Method and system for end-to-end prediction of unexpected events occurred in a disaster recovery system | |
| US12386691B1 (en) | Method and system for detecting anomalous sub- sequences in metadata using rolling windows | |
| US12229298B2 (en) | Method and system for generating an automatic service request based on metadata | |
| US20240303170A1 (en) | Proactive method to predict system failure and recommend user for an uninterrupted connection while remotely managing devices | |
| CN108920164A (en) | The management method and device of host in cloud computing system | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19819782; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 19819782; Country of ref document: EP; Kind code of ref document: A1 |