
WO2018228667A1 - Automatic feature selection in machine learning

Automatic feature selection in machine learning

Info

Publication number
WO2018228667A1
Authority
WO
WIPO (PCT)
Prior art keywords
analysis
factor
training data
features
rules
Application number
PCT/EP2017/064317
Other languages
French (fr)
Inventor
Janakiraman THIYAGARAJAH
Peter Valeryevich Bazanov
Peng Lv
Luca De Matteis
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to EP17731111.5A priority Critical patent/EP3612980A1/en
Priority to PCT/EP2017/064317 priority patent/WO2018228667A1/en
Publication of WO2018228667A1 publication Critical patent/WO2018228667A1/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 - Selection of the most significant subset of features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Definitions

  • a common group factor snapshot picture may, for example, be produced at this stage.
  • Fig. 7 shows an exemplary process of assigning an attribute vector to an area of analysis.
  • the attribute vector may be created from domain information. That is, for each domain, an attribute factor (AF) may be generated as a function of the context and the attributes of the domain: context-driven factors/weights for the attributes/properties of the relevant Managed Object types, based on their relevance to the context.
  • a typical implementation of attribute vector generation may use a factor analysis that allows to select independent feature components, and coefficients in a new reduced feature space that creates an attribute vector by unsupervised learning.
  • factor analysis may operate with variation and covariance matrix and hence be sensitive to fuzzification and normalization. Weighting of group factors may additionally be used to increase the confidence and robustness.
  • An analysis such as a root-cause analysis (RCA) is one example of an area of analysis to which an attribute vector may be assigned.
  • the unsupervised factor analysis groups may be associated with standard common factors.
  • the factor analysis groups may be checked and improved using expert rules from domain context. Also, there may be a tradeoff between unsupervised factor analysis groups and context domain driven factor groups.
  • As output, there may be a dynamic group factor snapshot for each major object resource in a cloud.
  • Factors may typically be common groups of features or a type of external or internal force. Some factors may be basic and evaluated as simple unique features.
  • Fig. 8 shows a process for fusing context factors and attribute factors.
  • Feature factors may be generated for the set of all features given as input, as a function of the attribute factor and the set of all features given as input.
  • FeatureFactor FF_i = f3(x_i, AF), where x_i ∈ X, and X represents the set of all features given as input for learning, with AF as the attribute factor which may be obtained from domain and context.
  • a fusion (concatenation) of the features from context factor generation and the features from attribute factor generation may be used, and factor analysis may be applied in order to find common group correlations between some of the features.
  • These features may be fused from the 'static context' indicator snapshot and the 'dynamic' attributes of each entity, object, and shared resources in the cloud. The features from selected groups may be controlled and evaluated.
  • Fig. 9 shows a process of selecting features.
  • Each feature 20 of the training data 16a, 16b may be assessed using the feature factor, and the features may be selected as a function of the input features and the feature factor, which is generated using the properties of the MO (App/Service/Resource/…) and the context.
  • the features 20 may be selected under a certain limitation regarding a confidence threshold.
  • Fig. 10 shows an overview of the steps of the feature selection process.
  • s_i could be basic fuzzified features (e.g., using softmax, hyperbolic tangent, or sigmoid functions) and f_i could be the mapping of the normalized features to factor components.
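The feature-factor computation FF_i = f3(x_i, AF) sketched in the bullets above can be illustrated as follows. This is a minimal sketch: the patent leaves the concrete form of f3 open, so an absolute correlation against an attribute-factor-weighted reference signal is assumed here, and all names are illustrative.

```python
import numpy as np

def feature_factors(X, attribute_factor):
    """Score each feature x_i as FF_i = f3(x_i, AF).

    f3 is assumed to be the absolute correlation between the feature
    column and a reference signal weighted by the attribute factor AF.
    """
    X = np.asarray(X, dtype=float)
    # Reference signal: attribute-factor-weighted combination of all columns.
    reference = X @ np.asarray(attribute_factor, dtype=float)
    scores = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if col.std() == 0.0 or reference.std() == 0.0:
            scores.append(0.0)  # constant signal: no meaningful correlation
        else:
            scores.append(abs(np.corrcoef(col, reference)[0, 1]))
    return scores
```

Features with high scores would then be the ones fed to the learning module.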


Abstract

Provided is a learning module to extract rules from training data and a feature selection module to determine features of the training data to be used for extracting the rules. The feature selection module is to receive context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains.

Description

AUTOMATIC FEATURE SELECTION IN MACHINE LEARNING
FIELD
The present disclosure relates to machine learning. In particular, the present disclosure relates to automatic feature selection in machine learning.
BACKGROUND
In machine learning, real-world problems often involve data with a large number of features. However, not all features may be essential, as features may be redundant or even irrelevant. Taking redundant or irrelevant features into account may reduce the performance of an algorithm. Feature selection aims to solve this problem by selecting only a subset of relevant features from a large set of available features. By removing redundant or irrelevant features, feature selection may help reduce the dimensionality of the data, speed up the learning process, simplify the learnt model, and/or increase the performance.
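The effect described above can be illustrated with a minimal filter that drops near-constant and highly correlated features. The variance and correlation criteria and the thresholds are illustrative assumptions, not the method claimed below.

```python
import numpy as np

def select_features(X, var_threshold=1e-3, corr_threshold=0.95):
    """Drop near-constant features, then drop one of each highly
    correlated (i.e., redundant) pair of features."""
    X = np.asarray(X, dtype=float)
    # Irrelevant: features that barely vary carry no information.
    keep = [j for j in range(X.shape[1]) if X[:, j].var() > var_threshold]
    selected = []
    for j in keep:
        # Redundant: strongly correlated with an already selected feature.
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > corr_threshold
            for k in selected
        )
        if not redundant:
            selected.append(j)
    return selected
```

Here column 0 is constant and column 2 is an exact multiple of column 1, so only columns 1 and 3 survive.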
SUMMARY
According to a first aspect of the present invention, there is provided a system comprising a learning module to extract rules from training data, and a feature selection module to determine features of the training data to be used for extracting the rules, wherein the feature selection module is to receive context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains.
In this regard, it is noted that the term "module" as used throughout the description and claims in particular refers to software, hardware, or a combination of software and hardware.
Hence, the feature selection module may automatically select the features based on the context data specifying the area of analysis in which the extracted rules are to be used, and the domain information indicating the one or more technical environments to which the training data pertains. Accordingly, the feature selection module may be enabled to map the area of analysis to the features which are relevant, while taking into account the technical environment in which the features were produced. In a first possible implementation form of the first aspect, the system comprises an analytics module, the analytics module to provide a plurality of services, wherein the services are directed at different areas of analysis, wherein the context data is to specify one area of analysis of the different areas of analysis at which the services are directed. In this regard, it is noted that the term "service" as used throughout the description and claims in particular refers to the provision of data in response to a request.
For example, the analytics module may be directed at data mining and provide data in response to a request to identify a pattern in live data.
In a second possible implementation form of the first aspect, the context data is to further specify a technique to be applied by the learning module.
Hence, the selection of relevant features may be particularly focused on features which lend themselves to application of a particular machine learning algorithm.
In a third possible implementation form of the first aspect, the technique comprises one or more of classification, regression, clustering, prediction, and anomaly detection. In a fourth possible implementation form of the first aspect, the different areas of analysis comprise one or more of a root cause analysis, a service impact analysis, a fault prediction analysis, a traffic prediction analysis, a security/threat analysis, a service/resource optimization analysis, and a service/application performance analysis.
In a fifth possible implementation form of the first aspect, the one or more technical environments include one or more of application management, server management, telecommunications networks, wide area networks, data center network operations, cloud operations, and security operations.
In a sixth possible implementation form of the first aspect, the feature selection module is to assign different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
Accordingly, a factor vector may indicate a relevance of factors to an area of analysis. In a seventh possible implementation form of the first aspect, the feature selection module is to determine a relationship between features and factors, wherein a congruence between a feature and a factor is to be determined based on fuzzification.
For instance, a feature may be processed based on a fuzzy logic normalization and a factor analysis may be performed to determine factors which identify the context.
In an eighth possible implementation form of the first aspect, the feature selection module is to assign different attribute vectors to the different areas of analysis, wherein an attribute vector comprises a subset of the factors, wherein the subset is selected based on a relevance score of the factors in view of the area of analysis. Hence, an attribute vector may indicate which factors are particularly relevant to an area of analysis.
In a ninth possible implementation form of the first aspect, the feature selection module is to assign scores to the features of the training data based on the attribute vector corresponding to the area of analysis. Thus, the features having scores above a threshold may be selected and used for training of the learning module.
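The chain from factor vector to attribute vector to feature scores described in the sixth to ninth implementation forms can be sketched as a toy example. The factor names, loadings, weighted-sum scoring function, and thresholds are all invented for illustration; the patent does not fix these specifics.

```python
# Hypothetical factor vector for a root-cause analysis: each entry is a
# congruence metric between a factor and the area of analysis.
factor_vector = {"control_reliability": 0.9, "resource_infrastructure": 0.7,
                 "migration_stability": 0.4, "energy_consumption": 0.1}

# Attribute vector: the subset of factors with a high relevance score.
attribute_vector = {f: w for f, w in factor_vector.items() if w >= 0.5}

# Invented feature-to-factor loadings for a few alarm features.
loadings = {
    "alarm_type":          {"control_reliability": 0.8, "resource_infrastructure": 0.3},
    "clearance_time":      {"control_reliability": 0.4, "resource_infrastructure": 0.5},
    "alarm_serial_number": {"control_reliability": 0.0, "resource_infrastructure": 0.0},
}

def score(feature):
    """Score a feature as the attribute-vector-weighted sum of its loadings."""
    return sum(attribute_vector.get(factor, 0.0) * loading
               for factor, loading in loadings[feature].items())

# Features whose score exceeds the threshold are used for training.
selected = [f for f in loadings if score(f) > 0.5]
```

The serial number, being a mere identifier, loads on no factor and is filtered out, while the alarm type and clearance time are retained.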
According to a second aspect of the present invention, there is provided a method of training data feature selection for extracting rules from the training data, comprising receiving context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains, selecting features of the training data based on the context data and the domain information, and feeding a machine learning module with the training data and information on the selected features.
Hence, the method may automatically select the features based on the context data specifying the area of analysis in which the extracted rules are to be used, and the domain information indicating the one or more technical environments to which the training data pertains. Accordingly, the method may map the area of analysis to the features which are relevant, while taking into account the technical environment in which the features were produced.
In a first possible implementation form of the second aspect, the different areas of analysis comprise one or more of a root cause analysis, a service impact analysis, a fault prediction analysis, a traffic prediction analysis, a security/threat analysis, a service/resource optimization analysis, and a service/application performance analysis.
In a second possible implementation form of the second aspect, the one or more technical environments include one or more of application management, server management, telecommunications networks, wide area networks, data center network operations, cloud operations, and security operations.
In a third possible implementation form of the second aspect, the method comprises assigning different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
Accordingly, as indicated above, a factor vector may indicate a relevance of factors to an area of analysis.
In a fourth possible implementation form of the second aspect, the method comprises determining a relationship between features and factors, wherein determining a congruence between a feature and a factor is based on fuzzification.
Hence, as indicated above, a feature may be processed based on a fuzzy logic normalization and a factor analysis may be performed to determine factors which identify the context.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 shows a block diagram of an exemplary system;
Fig. 2 shows a flow-chart of a machine learning process;
Fig. 3 shows another flow-chart of the machine learning process of Fig. 2;
Fig. 4 shows examples of domain information;
Fig. 5 shows examples of context data;
Fig. 6 shows an exemplary process of assigning factors to a context;
Fig. 7 shows an exemplary process of assigning an attribute vector to an area of analysis;
Fig. 8 shows a process for fusing context factors and attribute factors;
Fig. 9 shows a process of selecting features; and
Fig. 10 shows an overview of the steps of the feature selection process.
DETAILED DESCRIPTION
The following exemplary system and method relate to unsupervised machine learning for addressing challenges faced in the operation of complex systems, such as cloud computing systems involving a plurality of interoperating computing devices, although the system and method are not limited to cloud computing systems. In particular, the exemplary system and method are directed at optimizing the feature selection prior to machine learning, e.g., in the area of operational analytics, and may improve the usability and accuracy as compared to feature selection by experts.
Fig. 1 shows a block diagram of an exemplary system 10. The system 10 which may be a computing system comprising one or more interoperating computing devices may comprise a feature selection module 12 and a machine learning module 14. The feature selection module 12 may be provided with training data 16a, 16b. The training data 16a, 16b may be collected from a single source or multiple sources and may comprise a plurality of data fields 18. Each data field 18 may include one or more features 20.
For instance, a feature 20 may refer to an alarm serial number, an alarm type, a (first) occurrence time, a clearance time, a location, etc. For example, the training data 16a, 16b may be obtained from multiple sources wherein some training data 16a is sparse and some training data 16b is dense. Hence, in a pre-processing step multiple indicators may be extracted from the data distribution and a normalization and feature scaling may be performed, e.g., using softmax, sigmoid functions, etc.
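The normalization and feature scaling mentioned above may, for example, use standard sigmoid and softmax functions, sketched here in plain Python:

```python
import math

def sigmoid(x):
    """Squash a raw indicator into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(values):
    """Scale a list of raw indicators into a probability-like distribution."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

This brings sparse and dense indicators onto a comparable scale before the factor analysis.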
The feature selection module 12 may carry out a factor analysis to extract initial group components and then improve the grouping according to relevance based on context semantic reward functions and rules. The factor analysis may produce a decorrelation of features and an independent group extraction. A relevance mechanism may evaluate the factor groups and associate the factor groups with domain knowledge. The features 20 remaining after feature selection within the filtered training data 22 may then be used to train the machine learning module 14 for purposes such as pattern classification, regression, clustering, prediction, anomaly detection, etc. This may reduce the computational cost and infrastructure needed to select the features, while allowing the whole set of features relevant to the context, including the scope of a learnt model/rules, to be considered, and provides a semantic approach to complex problems, which may be fused with the factor analysis. As shown in Fig. 2, the trained machine learning module 14 may be validated using test data. Once validated, rules may be extracted from the machine learning module 14 and used to analyze a system.
As indicated in Fig. 3, selecting the features 20 may be based on domain information (such as the domain information shown in Fig. 4) indicating the source of the training data 16a, 16b. Furthermore, as also indicated in Fig. 3, selecting the features 20 may also be based on context data indicating the scope of application of the learnt model/rules as well as a type of machine learning algorithm/strategy employed by the machine learning module 14, as exemplarily illustrated in Fig. 5, where the broken line indicates an example of a chosen combination of a scope of application and a machine learning algorithm/strategy employed by the machine learning module 14. A context may be defined as a function of input factors that are assumed to impact the system under consideration. For example, a context may be expressed by C = f1(s_i), which may represent adaptive context modes of an environment factor that influence the feature selection, such as:
Cloud Operations
Multiple context indicators may be extracted. Raw context may be transformed to "feature space".
Cloud operation fuzzy functions may encode factors of reliability, availability of services and data, shared resources data, number of active clients, security, complexity, energy consumption and costs, regulations and legal issues, performance, migration, reversion, the lack of standards, limited customization, issues of privacy, etc.:
1. Control Reliability Factor
s1 - stddev()
s2 - mean()
s3 - snapshot entropy rate = entropy(current state snapshot) / max entropy(all states), where

entropy = -Σ_i p_i log(p_i),

with p_i the count statistic (prior probability) that the i-th cell-id occurs in serving cell-id #1, or that the next neighbor cell-id #2 occurred, within a timeslot (2 minutes / 1 minute / 30 sec). In this encoding there is also N, the cardinality (power of the alphabet), which represents the maximum entropy log(N); N is the total number of unique cells (an alphabet like 334-23799 'A', 334-11277 'B'), so that

rate entropy = entropy / max entropy.
s4 = presence of 'error type-1'

2. Availability of services and data
s5 = presence of service

Application Management
1. Number of clients and servers
2. Number of VMs; type of interaction (async/sync)

Server Management
1. Resource infrastructure factor
2. Runtime operation factor
3. Memory fragmentation factor
4. Memory free factor
5. Resource synchronization factor
6. Number of processes and their complexity

Network Operations
1. Migration stability factor
2. Network infrastructure stability factor

Security Operations
1. Access rights snapshots
2. Process and resource dependencies snapshot

A factor analysis may be applied to construct factor groups in unsupervised mode. The initial groups may be decomposed into 4 categories of the cloud state:
• Common factor components detected
• Unexplained factors
• New common factors established
• Tracked factors and factor age

Groups may be compared to the previous state.
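The snapshot entropy rate factor s3 above can be sketched in a few lines. This is a minimal illustration; the function name and the example cell-id strings are assumptions, not taken from the document:

```python
from collections import Counter
from math import log

def snapshot_entropy_rate(cell_ids):
    """Entropy of a snapshot of observed cell-ids, normalized by the
    maximum entropy log(N), where N is the number of unique cell-ids."""
    counts = Counter(cell_ids)
    total = sum(counts.values())
    # p_i: prior probability that the i-th cell-id occurs in the snapshot
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    n = len(counts)  # cardinality of the cell-id alphabet
    max_entropy = log(n) if n > 1 else 1.0  # guard against log(1) = 0
    return entropy / max_entropy

# A snapshot with uniformly distributed cell-ids reaches rate 1.0
print(snapshot_entropy_rate(["334-23799", "334-11277"]))  # -> 1.0
```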
Fig. 6 shows an exemplary process of assigning factors to a context. A context may be represented by a function of defined input factors that impact the system under consideration. Hence, contexts may be represented by numerical vectors. This may involve fuzzification of the input and basic features to numerical values and initial groups that represent stronger factors. As indicated above, the fuzzy functions could be sigmoid functions, a softmax transform, tanh, logsig, etc. For better accuracy, the input may be normalized, de-noised, and should follow a normal distribution. In the case of partially sparse data, features may be considered in aggregation, and additional fuzzification may be employed using expert rules.
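The fuzzification step could, for instance, be sketched as follows. All function names and the normalization constants are illustrative assumptions:

```python
import math

def sigmoid(x):
    """Map a real value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Softmax transform of a list of values; the outputs sum to 1."""
    m = max(xs)  # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuzzify(raw, mean, stddev):
    """Normalize a raw input (z-score) and map it to [0, 1] via a sigmoid."""
    z = (raw - mean) / stddev
    return sigmoid(z)

print(fuzzify(120.0, 100.0, 20.0))  # z = 1.0 -> sigmoid(1) ~ 0.731
```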
Hence, a context may be given by C = f1(si), where si ∈ Rm and si represents the factors considered for identifying a context, and may be represented by a context factor vector comprising common factors, new common factors, and unique factors.
Factor analysis may be regarded as a statistical method used to describe variability among observed variables in terms of fewer unobserved variables called factors. The observed variables may be modeled as linear combinations of the factors plus an error value:

x = Λ · F + μ + z

where x is the vector of observed variables, μ is the constant vector of means, Λ is the N×M matrix of factor loadings, F is the matrix of common factors, and z is the vector of independently distributed errors. This leads to:

xi = λi1 · F1 + ... + λiM · FM + μi + zi
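The factor model can be illustrated numerically: under the model, the covariance of x equals ΛΛᵀ + diag(ψ), where ψ holds the error variances. The sketch below draws samples from the model and checks this empirically; all sizes and distributions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, samples = 5, 2, 50000  # observed variables, factors, draws

L = rng.normal(size=(N, M))           # loadings Λ (N x M)
mu = rng.normal(size=N)               # means μ
psi = rng.uniform(0.1, 0.5, size=N)   # variances of the errors z

F = rng.normal(size=(M, samples))                          # common factors
z = rng.normal(size=(N, samples)) * np.sqrt(psi)[:, None]  # errors
x = L @ F + mu[:, None] + z           # the model: x = Λ·F + μ + z

# Under the model, cov(x) = Λ Λᵀ + diag(ψ); check empirically
cov_model = L @ L.T + np.diag(psi)
cov_emp = np.cov(x)
print(np.max(np.abs(cov_emp - cov_model)))  # small for 50,000 draws
```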
In factor analysis, two main types of rotation may be used: orthogonal, where the new axes are also orthogonal to each other, and oblique, where the new axes are not required to be orthogonal to each other. Because the rotations are always performed in a subspace (the so-called factor space), the new axes will always exhibit less variance than the original factors.
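An orthogonal rotation can be sketched with the standard SVD-based varimax iteration. This is given only as an illustration of the technique named above; the example loading matrix is made up:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal (varimax) rotation of a factor-loading matrix."""
    L = loadings.copy()
    n, m = L.shape
    R = np.eye(m)
    var_old = 0.0
    for _ in range(max_iter):
        basis = L @ R
        # gradient of the varimax criterion w.r.t. the rotation
        tmp = basis ** 3 - basis @ np.diag(np.sum(basis ** 2, axis=0)) / n
        u, s, vt = np.linalg.svd(L.T @ tmp)
        R = u @ vt  # nearest orthogonal matrix to the gradient product
        var_new = np.sum(s)
        if var_new < var_old * (1 + tol):
            break
        var_old = var_new
    # R is orthogonal, so the rotated loadings span the same factor space
    return L @ R

A = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
rotated = varimax(A)
```

Because R is orthogonal, the rotated loadings keep the Frobenius norm of the original matrix while concentrating each variable's loading on fewer factors.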
As output, a common group factor snapshot picture may, for example, be:
• Cloud operation
• Application management
• Server management
• Security operations
Fig. 7 shows an exemplary process of assigning an attribute vector to an area of analysis. In particular, the attribute vector may be created from domain information. That is, for each domain, an attribute factor (AF) may be generated as a function of the context and the attributes of the domain: context-driven factors/weights for the attributes/properties of the relevant Managed Object Types, based on their relevance to the context.
In this regard, each object and shared resource in the system/cloud may provide an additional specification of the factors: AttrFactor AFi = f2(ci, ai), where ci ∈ C and ai ∈ A, wherein A represents the attributes/properties of the relevant Managed Object Types of the domain model.
A typical implementation of attribute vector generation may use a factor analysis that allows selecting independent feature components, and coefficients in a new, reduced feature space that creates an attribute vector by unsupervised learning. Using factor analysis, common factor groups and very unique (or sparse) features without a common factor may be identified. Factor analysis operates on the variance and covariance matrix and is hence sensitive to fuzzification and normalization. Weighting of group factors may additionally be used to increase confidence and robustness. An analysis like an RCA (root-cause analysis) may operate with standard common factors that are defined from the domain context. The unsupervised factor analysis groups may be associated with standard common factors. The factor analysis groups may be checked and improved using expert rules from the domain context. Also, there may be a trade-off between unsupervised factor analysis groups and context-domain-driven factor groups. As output, there may be a dynamic group factor snapshot picture for each major object resource in a cloud:
• Factors selected from principal Objects of Cloud Operation indicators according to the timeframe, dynamic features and background context.
• Factors selected from principal Objects in Application Management & Context.
• Factors from principal Objects in Server Management & Context.
• Factors representing principal Objects in Security Operations.
Factors can usually be a common group of features and a type of external or internal force. Some factors may be basic and evaluated as simple unique features.
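Associating unsupervised factor groups with standard common factors via expert rules, as described above, could be sketched as follows. All feature names, loadings and keyword rules are illustrative assumptions, not taken from the document:

```python
import numpy as np

# Illustrative names and expert rules -- not from the document.
FEATURES = ["cpu_stddev", "mem_free", "mem_frag", "net_migration", "acl_changes"]
EXPERT_RULES = {  # domain-context keyword -> standard common factor
    "mem": "Server management",
    "net": "Network operations",
    "acl": "Security operations",
    "cpu": "Cloud operation",
}

def group_by_dominant_factor(loadings, features):
    """Assign each feature to the factor with the largest |loading|."""
    groups = {}
    for feat, row in zip(features, loadings):
        groups.setdefault(int(np.argmax(np.abs(row))), []).append(feat)
    return groups

def label_group(feats):
    """Associate an unsupervised group with a standard common factor via
    expert keyword rules; unmatched groups remain 'Unexplained'."""
    for feat in feats:
        for key, label in EXPERT_RULES.items():
            if key in feat:
                return label
    return "Unexplained"

loadings = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.85, 0.0], [0.7, 0.2]])
groups = group_by_dominant_factor(loadings, FEATURES)
labels = {idx: label_group(feats) for idx, feats in groups.items()}
print(labels)  # -> {0: 'Cloud operation', 1: 'Server management'}
```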
Fig. 8 shows a process for fusing context factors and attribute factors. Feature factors may be generated for the set of all input features, as a function of the attribute factor and the input features.
FeatureFactor FFi = f3(xi, AF), where xi ∈ X and X represents the set of all features given as input for learning, with AF as the attribute factor, which may be obtained from domain and context.
Hence, fusion by concatenation of the features from context factor generation and the features from attribute factor generation may be used, and factor analysis may be applied in order to find common group correlations between some of the features. These features may be fused from the 'static context' indicator snapshot and the 'dynamic' attributes of each entity, object, and shared resource in the cloud. The features from selected groups may be controlled and evaluated.
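The fusion-by-concatenation step might be sketched as follows, with correlation across the two sources standing in for the common-group analysis. The array sizes and the induced common factor are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
context_feats = rng.normal(size=(200, 3))    # 'static context' indicator snapshot
attribute_feats = rng.normal(size=(200, 4))  # 'dynamic' per-object attributes
attribute_feats[:, 0] += context_feats[:, 0] # induce a shared (common) factor

# Fuse the two feature sources by concatenation along the feature axis
fused = np.concatenate([context_feats, attribute_feats], axis=1)

# Off-diagonal blocks of the correlation matrix reveal common-group
# correlation between context features and attribute features
corr = np.corrcoef(fused, rowvar=False)
print(corr.shape)  # (7, 7)
```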
Fig. 9 shows a process of selecting features. Each feature 20 of the training data 16a, 16b may be assessed using the feature factor, and the features may be selected as a function of the input features and the feature factor, which is generated using the properties of the MO (App/Service/Resource/...) and the context.
The selected feature set may thus be represented as X" = f4(X, FF), where X represents the set of all features 20 given as input and FF represents the feature factor obtained from the domain and context. The features 20 may be selected subject to a confidence threshold. Fig. 10 shows an overview of the steps of the feature selection process. In particular, si could be basic fuzzified features (based on softmax, hyperbolic tangent, sigmoid functions, etc.) and fi could be the mapping of the normalized features to factor components.
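The threshold-based selection X" = f4(X, FF) could be sketched as follows. The feature names, factor values and threshold are illustrative assumptions:

```python
import numpy as np

def select_features(feature_names, feature_factors, threshold=0.5):
    """f4: keep the features whose feature factor FF meets a
    confidence threshold (the threshold value is illustrative)."""
    return [name for name, ff in zip(feature_names, feature_factors)
            if ff >= threshold]

names = ["s1_stddev", "s2_mean", "s3_entropy_rate", "s4_error_type"]
ff = np.array([0.9, 0.3, 0.7, 0.4])  # fused feature factors (illustrative)
print(select_features(names, ff))  # -> ['s1_stddev', 's3_entropy_rate']
```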

Claims

1. A system comprising: a learning module configured to extract rules from training data; and a feature selection module configured to determine features of the training data to be used for extracting the rules; wherein the feature selection module is configured to receive context data of the rules to be extracted and domain information on the training data, wherein the context data specify an area of analysis in which the extracted rules are to be used, and the domain information indicates one or more technical environments to which the training data pertains.
2. The system of claim 1, comprising: an analytics module configured to provide a plurality of services, wherein the services are directed at different areas of analysis, wherein the context data specify one area of analysis of the different areas of analysis at which the services are directed.
3. The system of claim 1 or 2, wherein the context data further specify a technique to be applied by the learning module.
4. The system of claim 3, wherein the technique comprises one or more of: classification; and clustering.
5. The system of any one of claims 1 to 4, wherein the different areas of analysis comprise one or more of: a root cause analysis; a service impact analysis; a fault prediction analysis; a traffic prediction analysis; a security/threat analysis; a service/resource optimization analysis; and a service/application performance analysis.
6. The system of any one of claims 1 to 5, wherein the one or more technical environments include one or more of: application management; server management; telecommunications networks; wide area networks; data center network operations; cloud operations; and security operations.
7. The system of any one of claims 1 to 6, wherein the feature selection module is configured to assign different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
8. The system of claim 7, wherein the feature selection module is configured to determine a relationship between features and factors, wherein a congruence between a feature and a factor is to be determined based on fuzzification.
9. The system of any one of claims 7 or 8, wherein the feature selection module is configured to assign different attribute vectors to the different areas of analysis, wherein an attribute vector comprises a subset of the factors, wherein the subset is selected based on a relevance score of the factors in view of the area of analysis.
10. The system of claim 9, wherein the feature selection module is configured to assign scores to the features of the training data based on the attribute vector corresponding to the area of analysis.
11. A method of training data feature selection for extracting rules from the training data, comprising: receiving context data of the rules to be extracted and domain information on the training data, the context data specifying an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains; selecting features of the training data based on the context data and the domain information; and feeding a machine learning module with the training data and information on the selected features.
12. The method of claim 11, wherein the different areas of analysis comprise one or more of: a root cause analysis; a service impact analysis; a fault prediction analysis; a traffic prediction analysis; a security/threat analysis; a service/resource optimization analysis; and a service/application performance analysis.
13. The method of claim 11 or 12, wherein the one or more technical environments include one or more of: application management; server management; telecommunications networks; wide area networks; data center network operations; cloud operations; and security operations.
14. The method of any one of claims 11 to 13, comprising: assigning different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
15. The method of claim 14, comprising: determining a relationship between features and factors, wherein determining a congruence between a feature and a factor is based on fuzzification.
PCT/EP2017/064317 2017-06-12 2017-06-12 Automatic feature selection in machine learning WO2018228667A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17731111.5A EP3612980A1 (en) 2017-06-12 2017-06-12 Automatic feature selection in machine learning
PCT/EP2017/064317 WO2018228667A1 (en) 2017-06-12 2017-06-12 Automatic feature selection in machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2017/064317 WO2018228667A1 (en) 2017-06-12 2017-06-12 Automatic feature selection in machine learning

Publications (1)

Publication Number Publication Date
WO2018228667A1 true WO2018228667A1 (en) 2018-12-20

Family

ID=59078048

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/064317 WO2018228667A1 (en) 2017-06-12 2017-06-12 Automatic feature selection in machine learning

Country Status (2)

Country Link
EP (1) EP3612980A1 (en)
WO (1) WO2018228667A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598760A (en) * 2019-08-26 2019-12-20 华北电力大学(保定) Unsupervised characteristic selection method for transformer vibration data
US12045317B2 (en) 2021-11-23 2024-07-23 International Business Machines Corporation Feature selection using hypergraphs

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009141631A2 (en) * 2008-05-23 2009-11-26 Sanctuary Personnel Limited An improved neuro type-2 fuzzy based method for decision making
US20100005042A1 (en) * 2003-11-18 2010-01-07 Aureon Laboratories, Inc. Support vector regression for censored data
AU2013100982A4 (en) * 2013-07-19 2013-08-15 Huaiyin Institute Of Technology, China Feature Selection Method in a Learning Machine
US20150058993A1 (en) * 2013-08-23 2015-02-26 The Boeing Company System and method for discovering optimal network attack paths


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
STEVEN LAUWEREINS ET AL: "Context-and cost-aware feature selection in ultra-low-power sensor interfaces", ESANN 2014 PROCEEDINGS, EUROPEAN SYMPOSIUM ON ARTIFICIAL NEURAL NETWORKS, COMPUTATIONAL INTELLIGENCE AND MACHINE LEARNING, 23 April 2014 (2014-04-23), Bruges, Belgium, XP055452498, ISBN: 978-2-87419-095-7, Retrieved from the Internet <URL:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.644.928&rep=rep1&type=pdf> [retrieved on 20180219] *
XINCHUAN ZENG ET AL: "Feature weighting using neural networks", NEURAL NETWORKS, 2004. PROCEEDINGS. 2004 IEEE INTERNATIONAL JOINT CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, vol. 2, 25 July 2004 (2004-07-25), pages 1327 - 1330, XP010758822, ISBN: 978-0-7803-8359-3, DOI: 10.1109/IJCNN.2004.1380137 *


Also Published As

Publication number Publication date
EP3612980A1 (en) 2020-02-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17731111

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017731111

Country of ref document: EP

Effective date: 20191119

NENP Non-entry into the national phase

Ref country code: DE
