WO2018228667A1 - Automatic feature selection in machine learning - Google Patents
- Publication number
- WO2018228667A1 (PCT/EP2017/064317)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- analysis
- factor
- training data
- features
- rules
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
Definitions
- A common group factor snapshot picture may, for example, be produced.
- Fig. 7 shows an exemplary process of assigning an attribute vector to an area of analysis.
- The attribute vector may be created from domain information. That is, for each domain, an attribute factor (AF) may be generated as a function of the context and the attributes of the domain: context-driven factors/weights for the attributes/properties of the relevant managed object types, based on their relevance to the context.
- A typical implementation of attribute vector generation may use a factor analysis, which allows selecting independent feature components, and coefficients in a new, reduced feature space, creating an attribute vector by unsupervised learning.
- Factor analysis may operate on the variance and covariance matrix and hence be sensitive to fuzzification and normalization. Weighting of group factors may additionally be used to increase the confidence and robustness.
- For an analysis like a root-cause analysis (RCA), the unsupervised factor analysis groups may be associated with standard common factors.
- the factor analysis groups may be checked and improved using expert rules from domain context. Also, there may be a tradeoff between unsupervised factor analysis groups and context domain driven factor groups.
- As output, there may be a dynamic group-factor snapshot picture for each major object resource in a cloud.
- Factors can usually be a common group of features or a type of external or internal force. Some factors may be basic and evaluated as simple unique features.
- Fig. 8 shows a process for fusing context factors and attribute factors.
- Feature factors may be generated for the set of all features given as input, as a function of the attribute factor and the set of all features given as input.
- FeatureFactor FF_i = f3(x_i, AF), where x_i ∈ X, X represents the set of all features given as input for learning, and AF is the attribute factor, which may be obtained from domain and context.
- Fusion by concatenation of the features from context factor generation and the features from attribute factor generation may be used, and factor analysis may be applied in order to find common group correlations between some of the features.
- These features may be fused from the 'static context' indicator snapshot and the 'dynamic' attributes of each entity, object, and shared resource in the cloud. The features from the selected groups may be controlled and evaluated.
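The fusion-by-concatenation step might be sketched as follows; the feature matrices, their dimensions, and the use of a plain correlation matrix to look for common group structure are illustrative assumptions, not taken from the disclosure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative feature matrices: 100 samples each of 'static context'
# indicators and 'dynamic' per-object attribute features (invented shapes).
context_features = rng.normal(size=(100, 3))
attribute_features = rng.normal(size=(100, 4))

# Fusion by concatenation, followed by a correlation check to find
# common group correlations across the two feature sets.
fused = np.concatenate([context_features, attribute_features], axis=1)
corr = np.corrcoef(fused, rowvar=False)
print(fused.shape, corr.shape)  # (100, 7) (7, 7)
```

In practice, a factor analysis would be applied to the fused matrix rather than a raw correlation matrix, but the correlation matrix makes the "common group" idea visible in a few lines.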
- Fig. 9 shows a process of selecting features.
- Each feature 20 of the training data 16a, 16b may be assessed using the feature factor, and the features may be selected as a function of the input features and the feature factor, which is generated using the properties of the MO (App/Service/Resource/...) and the context.
- The features 20 may be selected subject to a confidence threshold.
- Fig. 10 shows an overview of the steps of the feature selection process.
- s_i could be basic fuzzified features (based on softmax, hyperbolic tangent, sigmoid functions, etc.) and f_i could be the mapping of the normalized features to factor components.
Abstract
Provided is a learning module to extract rules from training data and a feature selection module to determine features of the training data to be used for extracting the rules. The feature selection module is to receive context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains.
Description
AUTOMATIC FEATURE SELECTION IN MACHINE LEARNING
FIELD
The present disclosure relates to machine learning. In particular, the present disclosure relates to automatic feature selection in machine learning. BACKGROUND
In machine learning, real-world problems often involve data with a large number of features. However, not all features may be essential, as features may be redundant or even irrelevant. Taking into account redundant or irrelevant features may reduce the performance of an algorithm. Feature selection aims to solve this problem by selecting only a subset of relevant features from a large set of available features. By removing redundant or irrelevant features, feature selection may help reduce the dimensionality of the data, speed up the learning process, simplify the learnt model, and/or increase the performance.
SUMMARY
According to a first aspect of the present invention, there is provided a system comprising a learning module to extract rules from training data, and a feature selection module to determine features of the training data to be used for extracting the rules, wherein the feature selection module is to receive context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains.
In this regard, it is noted that the term "module" as used throughout the description and claims in particular refers to software, hardware, or a combination of software and hardware.
Hence, the feature selection module may automatically select the features based on the context data specifying the area of analysis in which the extracted rules are to be used, and the domain information indicating the one or more technical environments to which the training data pertains. Accordingly, the feature selection module may be enabled to map the area of analysis to the features which are relevant, while taking into account the technical environment in which the features were produced.
In a first possible implementation form of the first aspect, the system comprises an analytics module, the analytics module to provide a plurality of services, wherein the services are directed at different areas of analysis, wherein the context data is to specify one area of analysis of the different areas of analysis at which the services are directed. In this regard, it is noted that the term "service" as used throughout the description and claims in particular refers to the provision of data in response to a request.
For example, the analytics module may be directed at data mining and provide data in response to a request to identify a pattern in live data.
In a second possible implementation form of the first aspect, the context data is to further specify a technique to be applied by the learning module.
Hence, the selection of relevant features may be particularly focused on features which lend themselves to application of a particular machine learning algorithm.
In a third possible implementation form of the first aspect, the technique comprises one or more of classification, regression, clustering, prediction, and anomaly detection. In a fourth possible implementation form of the first aspect, the different areas of analysis comprise one or more of a root cause analysis, a service impact analysis, a fault prediction analysis, a traffic prediction analysis, a security/threat analysis, a service/resource optimization analysis, and a service/application performance analysis.
In a fifth possible implementation form of the first aspect, the one or more technical environments include one or more of application management, server management, telecommunications networks, wide area networks, data center network operations, cloud operations, and security operations.
In a sixth possible implementation form of the first aspect, the feature selection module is to assign different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
Accordingly, a factor vector may indicate a relevance of factors to an area of analysis.
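One way to picture such a factor vector is sketched below; the factor names, areas of analysis, congruence values, and the 0.5 threshold are all illustrative assumptions, not values from the disclosure.

```python
import numpy as np

# Hypothetical factors and areas of analysis (illustrative only).
FACTORS = ["load", "latency", "error_rate", "access_pattern"]

# One factor vector per area of analysis; each entry is a congruence
# metric (0..1) between a factor and that area of analysis.
factor_vectors = {
    "root_cause_analysis": np.array([0.9, 0.7, 0.8, 0.2]),
    "security_threat":     np.array([0.1, 0.2, 0.5, 0.9]),
    "traffic_prediction":  np.array([0.8, 0.9, 0.3, 0.1]),
}

def relevant_factors(area, threshold=0.5):
    """Return the factors whose congruence with the area exceeds the threshold."""
    vec = factor_vectors[area]
    return [f for f, v in zip(FACTORS, vec) if v > threshold]

print(relevant_factors("root_cause_analysis"))  # ['load', 'latency', 'error_rate']
```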
In a seventh possible implementation form of the first aspect, the feature selection module is to determine a relationship between features and factors, wherein a congruence between a feature and a factor is to be determined based on fuzzification.
For instance, a feature may be processed based on a fuzzy logic normalization and a factor analysis may be performed to determine factors which identify the context.
In an eighth possible implementation form of the first aspect, the feature selection module is to assign different attribute vectors to the different areas of analysis, wherein an attribute vector comprises a subset of the factors, wherein the subset is selected based on a relevance score of the factors in view of the area of analysis. Hence, an attribute vector may indicate which factors are particularly relevant to an area of analysis.
In a ninth possible implementation form of the first aspect, the feature selection module is to assign scores to the features of the training data based on the attribute vector corresponding to the area of analysis. Thus, the features having scores above a threshold may be selected and used for training of the learning module.
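The scoring-and-thresholding idea in this implementation form might be sketched as follows; the feature names, the feature-to-factor weight matrix, the attribute vector, and the threshold are invented for illustration.

```python
import numpy as np

def select_features(features, attribute_vector, weights, threshold=0.5):
    """Score each training-data feature against the attribute vector of the
    chosen area of analysis and keep the features scoring above the threshold.

    weights[i, j] is an assumed congruence between feature i and factor j,
    e.g. as could be obtained from a factor analysis."""
    scores = weights @ attribute_vector          # one score per feature
    return [f for f, s in zip(features, scores) if s > threshold]

features = ["alarm_type", "occurrence_time", "location", "serial_number"]
attribute_vector = np.array([0.8, 0.6])          # two illustrative factors
weights = np.array([[0.9, 0.4],                  # alarm_type
                    [0.7, 0.1],                  # occurrence_time
                    [0.2, 0.9],                  # location
                    [0.0, 0.1]])                 # serial_number
print(select_features(features, attribute_vector, weights))
```

Here the serial number scores far below the threshold and is dropped, matching the intuition that an identifier carries little relevance to most areas of analysis.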
According to a second aspect of the present invention, there is provided a method of training data feature selection for extracting rules from the training data, comprising receiving context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains, selecting features of the training data based on the context data and the domain information, and feeding a machine learning module with the training data and information on the selected features.
Hence, the method may automatically select the features based on the context data specifying the area of analysis in which the extracted rules are to be used, and the domain information indicating the one or more technical environments to which the training data pertains. Accordingly, the method may map the area of analysis to the features which are relevant, while taking into account the technical environment in which the features were produced.
In a first possible implementation form of the second aspect, the different areas of analysis comprise one or more of a root cause analysis, a service impact analysis, a fault prediction analysis, a traffic prediction analysis, a security/threat analysis, a service/resource optimization analysis, and a service/application performance analysis.
In a second possible implementation form of the second aspect, the one or more technical environments include one or more of application management, server management, telecommunications networks, wide area networks, data center network operations, cloud operations, and security operations.
In a third possible implementation form of the second aspect, the method comprises assigning different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
Accordingly, as indicated above, a factor vector may indicate a relevance of factors to an area of analysis.
In a fourth possible implementation form of the second aspect, the method comprises determining a relationship between features and factors, wherein determining a congruence between a feature and a factor is based on fuzzification.
Hence, as indicated above, a feature may be processed based on a fuzzy logic normalization and a factor analysis may be performed to determine factors which identify the context.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 shows a block diagram of an exemplary system;
Fig. 2 shows a flow-chart of a machine learning process;
Fig. 3 shows another flow-chart of the machine learning process of Fig. 2;
Fig. 4 shows examples of domain information;
Fig. 5 shows examples of context data;
Fig. 6 shows an exemplary process of assigning factors to a context;
Fig. 7 shows an exemplary process of assigning an attribute vector to an area of analysis;
Fig. 8 shows a process for fusing context factors and attribute factors;
Fig. 9 shows a process of selecting features; and
Fig. 10 shows an overview of the steps of the feature selection process. DETAILED DESCRIPTION
The following exemplary system and method relate to unsupervised machine learning for addressing challenges faced in the operation of complex systems, such as cloud computing systems involving a plurality of interoperating computing devices, although the system and method are not limited to cloud computing systems. In particular, the exemplary system and method are directed at optimizing the feature selection prior to machine learning, e.g., in the area of operational analytics, and may improve the usability and accuracy as compared to feature selection by experts.
Fig. 1 shows a block diagram of an exemplary system 10. The system 10, which may be a computing system comprising one or more interoperating computing devices, may comprise a feature selection module 12 and a machine learning module 14. The feature selection module 12 may be provided with training data 16a, 16b. The training data 16a, 16b may be collected from a single source or multiple sources and may comprise a plurality of data fields 18. Each data field 18 may include one or more features 20.
For instance, a feature 20 may refer to an alarm serial number, an alarm type, a (first) occurrence time, a clearance time, a location, etc. For example, the training data 16a, 16b may be obtained from multiple sources, wherein some training data 16a is sparse and some training data 16b is dense. Hence, in a pre-processing step, multiple indicators may be extracted from the data distribution, and normalization and feature scaling may be performed, e.g., using softmax, sigmoid functions, etc.
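The normalization and feature-scaling step could look like the following sketch; the raw indicator values are invented for illustration, and standardizing before squashing is one common choice, not a requirement of the disclosure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

# Raw indicators extracted from sparse and dense sources (illustrative values
# on very different scales).
raw = np.array([3.0, 250.0, 0.02, 17.0])

# Standardize first so indicators from different sources are comparable,
# then squash into (0, 1) with a sigmoid.
standardized = (raw - raw.mean()) / raw.std()
scaled = sigmoid(standardized)

# Alternatively, softmax turns the indicators into a distribution summing to 1.
dist = softmax(standardized)
print(scaled.round(3), dist.sum())
```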
The feature selection module 12 may carry out a factor analysis to extract initial group components and then improve the grouping according to relevance based on context semantic reward functions and rules. The factor analysis may produce a decorrelation of features and an independent group extraction. A relevance mechanism may evaluate the factor groups and associate the factor groups with domain knowledge. The features 20 remaining after feature selection within the filtered training data 22 may then be used to train the machine learning module 14 for purposes such as pattern classification, regression, clustering, prediction, anomaly detection, etc.
This may reduce the computational cost and infrastructure needed to select the features, while allowing the whole set of features relevant to the context, including the scope of the learnt model/rules, to be considered, and provides a semantic approach to complex problems, which may be fused with the factor analysis. As shown in Fig. 2, the trained machine learning module 14 may be validated using test data. Once validated, rules may be extracted from the machine learning module 14 and used to analyze a system.
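As a rough illustration of the factor-analysis step, the sketch below uses scikit-learn's FactorAnalysis on synthetic data; the synthetic latent structure and the argmax-loading grouping heuristic are assumptions for illustration, not the patented procedure.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Synthetic training data: 200 samples, 6 features driven by 2 latent factors
# plus a little noise (invented structure for the example).
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 6))

fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(X)

# Group each feature with the latent factor on which it loads most strongly;
# this is one simple way to form initial, decorrelated factor groups.
groups = np.abs(fa.components_).argmax(axis=0)
print(groups)  # one group label (0 or 1) per feature
```

A relevance mechanism, as described above, would then score these unsupervised groups against domain knowledge rather than using them directly.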
As indicated in Fig. 3, selecting the features 20 may be based on domain information (such as the domain information shown in Fig. 4) indicating the source of the training data 16a, 16b. Furthermore, as also indicated in Fig. 3, selecting the features 20 may also be based on context data indicating the scope of application of the learnt model/rules as well as a type of machine learning algorithm/strategy employed by the machine learning module 14, as exemplarily illustrated in Fig. 5, where the broken line indicates an example of a chosen combination of a scope of application and a machine learning algorithm/strategy employed by the machine learning module 14. A context may be defined as a function of input factors that are assumed to impact the system under consideration. For example, a context may be expressed by C = f1(s_i), which may represent adaptive context modes of an environment factor that influence the feature selection, such as:
Cloud Operations

Multiple context indicators may be extracted. Raw context may be transformed to "feature space".
Cloud operation fuzzy functions may encode factors such as reliability, availability of services and data, shared resources data, number of active clients, security, complexity, energy consumption and costs, regulations and legal issues, performance, migration, reversion, lack of standards, limited customization, privacy issues, etc.:
1. Control Reliability Factor

s1 - stddev()
s2 - mean()
s3 - snapshot entropy rate = entropy(current state snapshot) / max entropy(all states)

entropy = -Σi pi log pi, where pi is the count statistic (prior probability) that the i-th cell-id occurs in serving cell-id #1 or next neighbor cell-id #2 within a timeslot (2 minutes / 1 minute / 30 sec). In this encoding there is also N, the cardinality (power of the alphabet, with entries like 334-23799 'A', 334-11277 'B'), i.e. the total number of unique cells, which gives the maximum entropy log(N).

s_rate entropy = entropy / max entropy

s4 - presence of 'error type -1'

2. Availability of services and data

s5 - presence of service

Application Management

1. Number of clients and servers
2. Number of VMs
3. Type of interaction (async/sync)

Server Management
1. Resource infrastructure factor
2. Runtime operation factor
3. Memory fragmentation factor
4. Memory free factor
5. Resource synchronization factor
6. Number of processes and their complexity
Network Operations
1. Migration stability factor
2. Network infrastructure stability factor

Security Operations
1. Access rights snapshots
2. Process and resource dependencies snapshot
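The entropy-based control reliability factor above, a snapshot entropy normalized by the maximum entropy log(N), may be sketched as follows. This is an illustrative sketch only and not part of the disclosed embodiment; the function name is an assumption.

```python
import math
from collections import Counter

def entropy_rate(cell_ids):
    """Normalized snapshot entropy: entropy of the observed cell-id counts
    divided by the maximum entropy log(N), N being the number of unique
    cell-ids (the power of the alphabet)."""
    counts = Counter(cell_ids)
    total = sum(counts.values())
    # p_i: prior probability that the i-th cell-id occurs within the timeslot
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    n = len(counts)
    max_entropy = math.log(n) if n > 1 else 1.0
    return entropy / max_entropy  # in [0, 1]; 1.0 for a uniform distribution
```

A value near 1 indicates the snapshot is spread evenly across cells; a value near 0 indicates it is dominated by a single cell.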
A factor analysis may be applied to construct factor groups in unsupervised mode. The initial groups may be decomposed into 4 categories of the cloud state:
• Common factor components detected
• Unexplained factors
• New common factors established
• Tracked factors and factor age
Groups may be compared to the previous state.
Fig. 6 shows an exemplary process of assigning factors to a context. A context may be represented by a function of defined input factors that impact the system under consideration. Hence, contexts may be represented by numerical vectors. This may involve fuzzification of the input and basic features to numerical values and initial groups that represent stronger factors. As indicated above, the fuzzy functions could be sigmoid functions, a softmax transform, tanh, logsig, etc. For better accuracy, the input may be normalized, de-noised and made to follow a normal distribution. In the case of partially sparse data, features may be considered in aggregation, and additional fuzzification may be employed using expert rules.
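The fuzzification step described above might be sketched as follows. The function name and the z-score normalization choice are illustrative assumptions; the text only names sigmoid, softmax, tanh and logsig as candidate fuzzy functions.

```python
import numpy as np

def fuzzify(raw, kind="sigmoid"):
    """Map raw inputs to fuzzy membership values after normalization."""
    x = np.asarray(raw, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)  # normalize (z-score)
    if kind in ("sigmoid", "logsig"):
        return 1.0 / (1.0 + np.exp(-x))     # values in (0, 1)
    if kind == "tanh":
        return np.tanh(x)                   # values in (-1, 1)
    if kind == "softmax":
        e = np.exp(x - x.max())
        return e / e.sum()                  # values sum to 1
    raise ValueError(f"unknown fuzzy function: {kind}")
```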
Hence, a context may be given by C = f1(si), where si ∈ R^m and si represents the factors considered for identifying a context, and may be represented by a context factor vector comprising common factors, new common factors, and unique factors.
Factor analysis may be regarded as a statistical method used to describe variability among observed variables in terms of fewer unobserved variables called factors. The observed variables may be modeled as linear combinations of the factors plus an error value:

x = Λ · F + μ + z

where x is the vector of observed variables, μ is the constant vector of means, Λ is the N×M matrix of factor loadings, F is the vector of common factors and z is the vector of independently distributed errors, which leads to:

xi = λi1 · F1 + … + λiM · FM + μi + zi
In factor analysis, two main types of rotation may be used: orthogonal, when the new axes are also orthogonal to each other, and oblique, when the new axes are not required to be orthogonal to each other. Because the rotations are always performed in a subspace (the so-called factor space), the new axes will always exhibit less variance than the original factors.
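The factor model x = Λ · F + μ + z and an orthogonal (varimax) rotation can be illustrated with scikit-learn's FactorAnalysis; the use of scikit-learn and the synthetic two-factor loadings below are assumptions for illustration, not part of the embodiment.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Synthetic data following x = Lambda . F + mu + z with M = 2 common factors
# and N = 6 observed variables (the loadings are illustrative assumptions).
F = rng.normal(size=(500, 2))
Lambda = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.0],
                   [0.0, 1.0], [0.1, 0.9], [0.0, 0.8]])
x = F @ Lambda.T + 0.05 * rng.normal(size=(500, 6))

# Orthogonal (varimax) rotation keeps the new axes orthogonal to each other.
fa = FactorAnalysis(n_components=2, rotation="varimax")
scores = fa.fit_transform(x)     # estimated common factors per observation
loadings = fa.components_.T      # estimated N x M loading matrix
```

The first three observed variables load on one recovered factor and the last three on the other, i.e. the unsupervised analysis recovers the independent groups.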
As output, a common group factor snapshot picture may, for example, be:
• Cloud operation
• Application management
• Server management
• Security operations
Fig. 7 shows an exemplary process of assigning an attribute vector to an area of analysis. In particular, the attribute vector may be created from domain information: for each domain, an attribute factor (AF) may be generated as a function of the context and the attributes of the domain, i.e. context-driven factors/weights for the attributes/properties of the relevant Managed Object Types, based on their relevance to the context.
In this regard, each object and shared resource in the system/cloud may provide an additional specification of the factors: AttrFactor AFi = f2(ci, ai), where ci ∈ C, ai ∈ A, and A represents the attributes/properties of the relevant Managed Object Types of the domain model.
A typical implementation of attribute vector generation may use a factor analysis that allows independent feature components, and coefficients in a new reduced feature space, to be selected, creating an attribute vector by unsupervised learning. Using factor analysis, common factor groups as well as unique (or sparse) features without a common factor may be defined. Factor analysis operates on the variance and covariance matrix and is hence sensitive to fuzzification and normalization. Weighting of group factors may additionally be used to increase confidence and robustness. An analysis such as an RCA (root-cause analysis) may operate with standard common factors that are defined from the domain context. The unsupervised factor analysis groups may be associated with these standard common factors, and may be checked and improved using expert rules from the domain context. There may also be a tradeoff between unsupervised factor analysis groups and context-domain-driven factor groups.
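The association of unsupervised factor-analysis groups with standard common factors via expert rules might be sketched as follows. The group names, member features, overlap rule and threshold are all hypothetical, chosen for illustration only.

```python
# Hypothetical standard common factors from the domain context; the group
# names and member features are invented for illustration.
EXPERT_GROUPS = {
    "cloud_operation": {"stddev", "mean", "entropy_rate"},
    "server_management": {"mem_free", "mem_fragmentation", "n_processes"},
}

def associate(fa_group, threshold=0.5):
    """Associate one unsupervised factor-analysis group (a set of feature
    names) with the standard common factor it overlaps most; groups that
    do not clear the threshold stay 'unexplained' (a hypothetical rule)."""
    best, score = "unexplained", 0.0
    for name, members in EXPERT_GROUPS.items():
        overlap = len(fa_group & members) / len(fa_group)
        if overlap > score:
            best, score = name, overlap
    return best if score >= threshold else "unexplained"
```

Groups left "unexplained" here correspond to the "unexplained factors" category of the cloud state mentioned earlier.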
As output, there may be a dynamic group factor snapshot picture for each major object resource in a cloud:
• Factors selected from principal Objects of Cloud operation indicators according to the timeframe dynamic features and background context.
• Factors selected from principal Objects in Application Management & Context.
• Factors from principal Objects in Server Management & Context.
• Factors representing principal Objects in Security Operations.
Factors can usually be a common group of features and a type of external or internal force. Some factors may be basic and evaluated as simple unique features.
Fig. 8 shows a process for fusing context factors and attribute factors. Feature factors may be generated for the set of all features given as input, as a function of the attribute factor and the set of all features given as input.
FeatureFactor FFi = f3(xi, AF), where xi ∈ X, and X represents the set of all features given as input for learning, with AF as the attribute factor, which may be obtained from the domain and context.
Hence, a fusion concatenation of the features of context factor generation and the features of attribute factor generation may be used, and factor analysis may be applied in order to find common-group correlations between some of the features. These features may be fused from the 'static context' indicator snapshot and the 'dynamic' attributes of each entity, object, and shared resource in the cloud. The features from the selected groups may be controlled and evaluated.
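A minimal sketch of the fusion concatenation and a correlation-based search for common groups is given below; the correlation threshold, function name, and synthetic data are assumptions made for illustration.

```python
import numpy as np

def fuse_and_group(context_feats, attribute_feats, corr_threshold=0.8):
    """Concatenate 'static context' indicator features with 'dynamic'
    attribute features, then flag fused feature pairs whose correlation
    suggests membership in a common group (threshold is an assumption)."""
    fused = np.hstack([context_feats, attribute_feats])
    corr = np.corrcoef(fused, rowvar=False)
    n = corr.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if abs(corr[i, j]) >= corr_threshold]
    return fused, pairs

rng = np.random.default_rng(1)
ctx = rng.normal(size=(100, 2))
# One attribute feature is a scaled copy of a context feature plus noise,
# so it should fall into a common group with that context feature.
attr = np.hstack([ctx[:, :1] * 2 + 0.01 * rng.normal(size=(100, 1)),
                  rng.normal(size=(100, 1))])
fused, pairs = fuse_and_group(ctx, attr)
```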
Fig. 9 shows a process of selecting features. Each feature 20 of the training data 16a, 16b may be assessed using the feature factor, and the features may be selected as a function of the input features and the feature factor, which is generated using the properties of the MO (App/Service/Resource/...) and the context.
The selected feature set may thus be represented as X″ = f4(X, FF), where X represents the set of all features 20 given as input and FF represents the feature factor obtained from the domain and context. The features 20 may be selected subject to a certain confidence threshold.
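The final selection step X″ = f4(X, FF) under a confidence threshold might, assuming FF holds one relevance score per feature, be sketched as:

```python
import numpy as np

def select_features(X, feature_factor, confidence=0.6):
    """f4: keep only the columns of X whose feature-factor score clears the
    confidence threshold. `feature_factor` is assumed to hold one relevance
    score in [0, 1] per input feature (the output of f3)."""
    keep = np.asarray(feature_factor) >= confidence
    return X[:, keep], np.flatnonzero(keep)

X = np.arange(12.0).reshape(3, 4)      # 3 samples, 4 candidate features
ff = [0.9, 0.2, 0.7, 0.4]              # hypothetical feature-factor scores
X_sel, idx = select_features(X, ff)    # columns 0 and 2 survive
```

The surviving columns would then form the filtered training data fed to the machine learning module.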
Fig. 10 shows an overview of the steps of the feature selection process. In particular, si could be basic fuzzified features (based on using softmax, hyperbolic tangent, sigmoid function, etc.) and fi could be the mapping of the normalized features to factor components.
Claims
1. A system comprising: a learning module configured to extract rules from training data; and a feature selection module configured to determine features of the training data to be used for extracting the rules; wherein the feature selection module is configured to receive context data of the rules to be extracted and domain information on the training data, wherein the context data specify an area of analysis in which the extracted rules are to be used, and the domain information indicates one or more technical environments to which the training data pertains.
2. The system of claim 1, comprising: an analytics module configured to provide a plurality of services, wherein the services are directed at different areas of analysis, wherein the context data specify one area of analysis of the different areas of analysis at which the services are directed.
3. The system of claim 1 or 2, wherein the context data further specify a technique to be applied by the learning module.
4. The system of claim 3, wherein the technique comprises one or more of: classification; and clustering.
5. The system of any one of claims 1 to 4, wherein the different areas of analysis comprise one or more of: a root cause analysis; a service impact analysis; a fault prediction analysis; a traffic prediction analysis; a security/threat analysis;
a service/resource optimization analysis; and a service/application performance analysis.
6. The system of any one of claims 1 to 5, wherein the one or more technical environments include one or more of: application management; server management; telecommunications networks; wide area networks; data center network operations; cloud operations; and security operations.
7. The system of any one of claims 1 to 6, wherein the feature selection module is configured to assign different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
8. The system of claim 7, wherein the feature selection module is configured to determine a relationship between features and factors, wherein a congruence between a feature and a factor is to be determined based on fuzzification.
9. The system of any one of claims 7 or 8, wherein the feature selection module is configured to assign different attribute vectors to the different areas of analysis, wherein an attribute vector comprises a subset of the factors, wherein the subset is selected based on a relevance score of the factors in view of the area of analysis.
10. The system of claim 9, wherein the feature selection module is configured to assign scores to the features of the training data based on the attribute vector corresponding to the area of analysis.
11. A method of training data feature selection for extracting rules from the training data, comprising: receiving context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains; selecting features of the training data based on the context data and the domain information; and feeding a machine learning module with the training data and information on the selected features.
12. The method of claim 11, wherein the different areas of analysis comprise one or more of: a root cause analysis; a service impact analysis; a fault prediction analysis; a traffic prediction analysis; a security/threat analysis; a service/resource optimization analysis; and a service/application performance analysis.
13. The method of claim 11 or 12, wherein the one or more technical environments include one or more of: application management; server management; telecommunications networks; wide area networks; data center network operations; cloud operations; and
security operations.
14. The method of any one of claims 11 to 13, comprising: assigning different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
15. The method of claim 14, comprising: determining a relationship between features and factors, wherein determining a congruence between a feature and a factor is based on fuzzification.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP17731111.5A EP3612980A1 (en) | 2017-06-12 | 2017-06-12 | Automatic feature selection in machine learning |
PCT/EP2017/064317 WO2018228667A1 (en) | 2017-06-12 | 2017-06-12 | Automatic feature selection in machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018228667A1 true WO2018228667A1 (en) | 2018-12-20 |
Family
ID=59078048
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2017/064317 WO2018228667A1 (en) | 2017-06-12 | 2017-06-12 | Automatic feature selection in machine learning |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3612980A1 (en) |
WO (1) | WO2018228667A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009141631A2 (en) * | 2008-05-23 | 2009-11-26 | Sanctuary Personnel Limited | An improved neuro type-2 fuzzy based method for decision making |
US20100005042A1 (en) * | 2003-11-18 | 2010-01-07 | Aureon Laboratories, Inc. | Support vector regression for censored data |
AU2013100982A4 (en) * | 2013-07-19 | 2013-08-15 | Huaiyin Institute Of Technology, China | Feature Selection Method in a Learning Machine |
US20150058993A1 (en) * | 2013-08-23 | 2015-02-26 | The Boeing Company | System and method for discovering optimal network attack paths |
2017
- 2017-06-12: EP application EP17731111.5A (published as EP3612980A1), not active (ceased)
- 2017-06-12: PCT application PCT/EP2017/064317 (published as WO2018228667A1), status unknown
Non-Patent Citations (2)
Title |
---|
STEVEN LAUWEREINS ET AL: "Context-and cost-aware feature selection in ultra-low-power sensor interfaces", ESANN 2014 PROCEEDINGS, EUROPEAN SYMPOSIUM ON ARTIFICIAL NEURAL NETWORKS, COMPUTATIONAL INTELLIGENCE AND MACHINE LEARNING, 23 April 2014 (2014-04-23), Bruges, Belgium, XP055452498, ISBN: 978-2-87419-095-7, Retrieved from the Internet <URL:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.644.928&rep=rep1&type=pdf> [retrieved on 20180219] * |
XINCHUAN ZENG ET AL: "Feature weighting using neural networks", NEURAL NETWORKS, 2004. PROCEEDINGS. 2004 IEEE INTERNATIONAL JOINT CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, vol. 2, 25 July 2004 (2004-07-25), pages 1327 - 1330, XP010758822, ISBN: 978-0-7803-8359-3, DOI: 10.1109/IJCNN.2004.1380137 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110598760A (en) * | 2019-08-26 | 2019-12-20 | 华北电力大学(保定) | Unsupervised characteristic selection method for transformer vibration data |
CN110598760B (en) * | 2019-08-26 | 2023-10-24 | 华北电力大学(保定) | Unsupervised feature selection method for vibration data of transformer |
US12045317B2 (en) | 2021-11-23 | 2024-07-23 | International Business Machines Corporation | Feature selection using hypergraphs |
Also Published As
Publication number | Publication date |
---|---|
EP3612980A1 (en) | 2020-02-26 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17731111; Country of ref document: EP; Kind code of ref document: A1
| ENP | Entry into the national phase | Ref document number: 2017731111; Country of ref document: EP; Effective date: 20191119
| NENP | Non-entry into the national phase | Ref country code: DE