Development #38
Merged
Push small update to dev
importance scores. Might as well do this as soon as possible.
scoring_utils.py. Inserted commented out code to fix the distance array calculation when missing data is present. Not implemented yet.
calculation is properly normalized by the number of uniquely missing features in comparing instance pair distances. This way distance is computed agnostically with respect to missing values. The previous implementation would make instances with more missing values appear closer to one another.
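The normalization described above can be sketched as follows. This is a minimal illustration, not the code in scoring_utils.py: the function name `pair_distance`, the use of NaN to mark missing values, and the choice to return infinity when no feature is observed in both instances are all assumptions made for the example.

```python
import math

def pair_distance(a, b):
    """Mean absolute difference over features observed in BOTH instances.

    Features missing (NaN) in either instance are skipped, and the sum is
    divided by the number of features actually compared, so a pair with many
    missing values does not look artificially close.
    """
    total = 0.0
    compared = 0
    for x, y in zip(a, b):
        if math.isnan(x) or math.isnan(y):
            continue  # feature is missing in at least one instance
        total += abs(x - y)
        compared += 1
    # Assumption: a pair with no comparable features is treated as maximally far.
    return total / compared if compared else math.inf

nan = float("nan")
sparse_a = [0.0, 1.0, nan, 0.5]
sparse_b = [1.0, 1.0, 0.2, nan]
print(pair_distance(sparse_a, sparse_b))  # averaged over the 2 shared features
```

Without the division, the pair above would score a raw distance of 1.0 from only two features, while a fully observed pair accumulates differences over all four, which is how instances with more missing values ended up appearing closer.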
distance calculation within get_row_missing(). Added 'cmins' variable passed forward for subtraction of minimum value. Some modifications still needed to complete this fix.
It appears it is only set up to handle binary endpoints correctly, not multiclass or continuous-valued endpoints. This problem does not exist for the other 4 core algorithms (SURF, SURF*, MultiSURF, and MultiSURF*).
binary class endpoints.
endpoint nearest neighbor determination. Also removed mmdiff in scoring_utils for discrete endpoints (it is only needed for continuous endpoints). Identified a problem in 'compute_score': for #far score contributions, continuous-valued features should not be checked for equality (doing so leads to many scores being left out).
original data array only for datasets with all continuous values); prenormalization is now run on 'xc' for such data. Also fixed the scoring update for any continuous features (no prior feature equivalence check). This definitely causes problems for data with continuous features, particularly in MultiSURF*. So far only fixed for binary endpoints. Fixed normalization for binary endpoint scoring. Added normalization by 'n' (number of training instances); this did not appear to be present anywhere for any of the Relief methods.
issues as well as normalization fixes and an update to be more relatable to the ReBATE papers.
and mmdiff. I also changed the use of the absolute-value continuous feature difference and mmdiff normalization so that they are only applied when a continuous feature is present, rather than for any feature regardless.
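A rough sketch of that per-feature branching is shown here. The helper name `feature_diff` and its signature are hypothetical, and `mmdiff` is assumed to be the feature's max-minus-min range, as the name suggests; the actual scoring_utils.py code may be organized differently.

```python
def feature_diff(x, y, is_continuous, mmdiff=1.0):
    """Difference between two values of a single feature, in [0, 1].

    Continuous features use the absolute difference scaled by the feature's
    range (mmdiff, assumed here to be max - min). Discrete features use a
    plain mismatch indicator and never touch mmdiff.
    """
    if is_continuous:
        if mmdiff == 0:
            return 0.0  # constant feature: no difference is possible
        return abs(x - y) / mmdiff
    return 0.0 if x == y else 1.0

print(feature_diff(2.0, 5.0, is_continuous=True, mmdiff=10.0))  # range-scaled
print(feature_diff("A", "G", is_continuous=False))              # mismatch
```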
count_miss happens to be zero.
happen when there are very few features and instances.
and multisurfstar for parallelization check.
Ryan multiclass - This merge incorporates a number of essential ReBATE bug fixes having to do with proper score normalization, multi-class functionality, continuous feature performance consistency, and the incorporation of a new ramp function that helps counter some of the discrete feature bias in data with a mix of discrete/continuous variables.
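The ramp function itself is not shown in this conversation. As a point of reference only, the sketch below uses the generic ramp form from the Relief literature, in which small continuous differences count as "same", large ones count as "different", and the region in between is interpolated linearly. The name `ramp_diff` and the thresholds `t_eq` and `t_diff` are hypothetical, and ReBATE's version may choose its thresholds differently.

```python
def ramp_diff(x, y, t_eq, t_diff):
    """Ramp-shaped difference for a continuous feature, in [0, 1].

    d <= t_eq            -> 0.0 (values treated as equal)
    d >= t_diff          -> 1.0 (values treated as fully different)
    t_eq < d < t_diff    -> linear interpolation between the two
    """
    if not t_eq < t_diff:
        raise ValueError("t_eq must be smaller than t_diff")
    d = abs(x - y)
    if d <= t_eq:
        return 0.0
    if d >= t_diff:
        return 1.0
    return (d - t_eq) / (t_diff - t_eq)

for d in (0.05, 0.3, 0.9):
    print(d, ramp_diff(0.0, d, t_eq=0.1, t_diff=0.5))
```

The motivation is that a discrete feature contributes a full 0 or 1 per comparison, while a range-scaled continuous feature usually contributes a small fraction; a ramp lets sufficiently large continuous differences count fully as well.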
Version 5 update including multiclass functionality, fixed score normalizations, and a ramp function for mixed variable datasets.