
Conversation

@ryanurbs
Member

@ryanurbs ryanurbs commented Apr 3, 2018

Version 5 update including multiclass functionality, fixed score normalizations, and a ramp function for mixed variable datasets.
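For readers unfamiliar with the idea, a ramp-style difference function for a continuous feature can be sketched as follows. This is an illustration only, not the code added in this pull request: the function name `ramp_diff`, its parameters, and the choice of cutoff are all assumptions. The sketch treats any gap larger than a cutoff as a full mismatch (1.0), the same penalty a discrete feature gets when two values differ, and scales smaller gaps by the feature's range.

```python
def ramp_diff(x, y, value_range, threshold):
    """Hypothetical ramp-style difference for one continuous feature.

    value_range: max - min of the feature over the training data (assumed > 0).
    threshold:   assumed cutoff above which two values count as fully
                 different (for example, the feature's standard deviation).
    """
    if value_range <= 0:
        # A constant feature carries no information; treat values as equal.
        return 0.0
    raw = abs(x - y)
    if raw > threshold:
        # Large gaps are scored like a discrete mismatch.
        return 1.0
    # Small gaps contribute proportionally to the feature's range.
    return raw / value_range
```

With a range of 10 and a cutoff of 2, values 1.0 and 2.0 differ by 0.1, while values 1.0 and 4.0 differ by the full 1.0.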

rhiever and others added 30 commits December 27, 2017 08:22
importance scores. Might as well do this as soon as possible.
scoring_utils.py.  Inserted commented out code to fix the distance array
calculation when missing data is present. Not implemented yet.
calculation is properly normalized by the number of uniquely missing
features in comparing instance pair distances. This way distance is
computed agnostically with respect to missing values.  The previous
implementation would make instances with more missing values appear
closer to one another.
distance calculation within get_row_missing().  Added 'cmins' variable
passed forward for subtraction of minimum value.  Some modifications
still needed to complete this fix.
It appears it is only set up to handle binary endpoints correctly, not
multiclass or continuous valued endpoints. This problem does not exist
for the other 4 core algorithms (SURF, SURF*, MultiSURF, and MultiSURF*)
endpoint nearest neighbor determination.  Also got rid of mmdiff in
scoring_utils for discrete endpoints (only needed for continuous
endpoints.) Identified a problem in 'compute_score'.  For #far score
contributions, continuous valued features should not check for equality
(this will lead to many scores being left out.)
original data array only for datasets with all continuous values) now
prenormalization is run on 'xc' for such data. 
Also fixed scoring update for any continuous features (no prior feature
equivalence check). This definitely causes problems for data with
continuous features, particularly in MultiSURF*. So far only fixed for
binary endpoints.
*fixed normalization for binary endpoint scoring.  Added normalization
by 'n' (number of training instances); this doesn't appear to be in here
anywhere for any of the Relief methods.
issues as well as normalization fixes and update to be more relatable to
the rebate papers.
and mmdiff.  I also changed the use of abs value continuous feature
difference and mmdiff normalization so it's only called when a
continuous feature is present rather than for any feature regardless.
happen when there are very few features and instances.
ryanurbs added 8 commits April 3, 2018 14:51
Ryan multiclass - This merge incorporates a number of essential ReBATE bug fixes having to do with proper score normalization, multi-class functionality, continuous feature performance consistency, and the incorporation of a new ramp function that helps counter some of the discrete feature bias in data with a mix of discrete/continuous variables.
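
The missing-data fix described in the commits above (normalizing pairwise distances so that instances with more missing values no longer appear closer) can be illustrated with a minimal sketch. It is not the code from `scoring_utils.py`: the function name `pair_distance` is hypothetical, features are assumed to be pre-scaled to [0, 1] with missing values encoded as NaN, and dividing by the number of features observed in both instances is one plausible reading of the normalization the commits describe.

```python
import numpy as np

def pair_distance(a, b):
    """Hypothetical missing-value-aware distance between two instances.

    Returns the mean absolute difference over the features that are
    observed (non-NaN) in both instances, or NaN if there are none.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # A feature is comparable only if neither instance is missing it.
    shared = ~(np.isnan(a) | np.isnan(b))
    n_shared = int(shared.sum())
    if n_shared == 0:
        return float("nan")  # nothing to compare
    # Dividing by the number of compared features keeps pairs with many
    # missing values from looking artificially close.
    return float(np.abs(a[shared] - b[shared]).sum() / n_shared)
```

Under this sketch, two instances that disagree on their only shared feature get the same distance as two complete instances that disagree on every feature, whereas an unnormalized sum would rank the pair with missing values as much closer.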
@coveralls

Coverage Status

Coverage increased (+6.5%) to 76.667% when pulling d5b6ef8 on development into e620798 on master.

@ryanurbs ryanurbs merged commit 08aa096 into master Apr 4, 2018

5 participants