Development #38

ryanurbs · 2018-04-03T23:08:29Z

Version 5 update including multiclass functionality, fixed score normalizations, and a ramp function for mixed variable datasets.

calculation is properly normalized by the number of uniquely missing features in comparing instance pair distances. This way distance is computed agnostically with respect to missing values. The previous implementation would make instances with more missing values appear closer to one another.
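The effect described above can be illustrated with a minimal sketch. This is not the code in scoring_utils.py: the function name `pair_distance`, the use of `None` as the missing-value marker, and the assumption that present values are already scaled to [0, 1] are all hypothetical choices made for illustration. It takes one plausible reading of the fix, namely averaging the per-feature differences over only those features observed in both instances.

```python
def pair_distance(a, b):
    """Mean per-feature difference over features observed in both instances.

    Hypothetical helper: None marks a missing value, and every present
    value is assumed to be pre-scaled to the range [0, 1].
    """
    total = 0.0
    compared = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip features missing in either instance
        total += abs(x - y)
        compared += 1
    if compared == 0:
        return float("nan")  # no shared features, so the distance is undefined
    return total / compared


# Two complete instances that differ on 2 of 4 features.
complete = pair_distance([0.0, 1.0, 0.5, 0.5], [1.0, 0.0, 0.5, 0.5])
# A sparse pair that shares only the first feature, and differs on it.
sparse = pair_distance([0.0, None, 0.5, None], [1.0, 0.0, None, 0.5])
```

With an unnormalized sum, the sparse pair would score 1 against 2 for the complete pair and so look closer simply because fewer features were compared. Dividing by the number of compared features gives 0.5 for the complete pair and 1.0 for the sparse pair.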

endpoint nearest neighbor determination. Also got rid of mmdiff in scoring_utils for discrete endpoints (only needed for continuous endpoints). Identified a problem in 'compute_score': for #far score contributions, continuous valued features should not check for equality (this will lead to many scores being left out).

original data array only for datasets with all continuous values); prenormalization is now run on 'xc' for such data. Also fixed the scoring update for continuous features (no prior feature equivalence check). The equivalence check definitely causes problems for data with continuous features, particularly in MultiSURF*. So far this is fixed only for binary endpoints. Fixed normalization for binary endpoint scoring. Added normalization by 'n' (number of training instances); this did not appear to be present anywhere for any of the Relief methods.
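As a rough illustration of the two scoring points above (no equality check for continuous features, and normalization by 'n'), here is a sketch that follows the standard Relief update convention for a binary endpoint. It is not the 'compute_score' implementation; `feature_diff`, `update_score`, and all of their parameters are hypothetical names.

```python
def feature_diff(x, y, is_continuous, value_range):
    """Per-feature difference in [0, 1] between two instance values."""
    if is_continuous:
        # No equality test: scale the absolute difference by the feature's range.
        return abs(x - y) / value_range
    # Discrete features either match (0) or differ (1).
    return 0.0 if x == y else 1.0


def update_score(score, diff, same_class, neighbor_count, n):
    """Add one neighbor's contribution, averaged over neighbors and over n."""
    contribution = diff / (neighbor_count * n)
    # Hits that differ lower the score; misses that differ raise it.
    return score - contribution if same_class else score + contribution


score = 0.0
# A miss on a continuous feature with range 1.0, one neighbor, n = 4 instances.
score = update_score(score, feature_diff(0.25, 0.75, True, 1.0), False, 1, 4)
after_miss = score
# A hit on a discrete feature whose values differ.
score = update_score(score, feature_diff("A", "B", False, None), True, 1, 4)
```

Dividing by both the neighbor count and the number of training instances keeps the final scores within [-1, 1] regardless of dataset size, which is the usual motivation for this normalization.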

Ryan multiclass - This merge incorporates a number of essential ReBATE bug fixes having to do with proper score normalization, multi-class functionality, continuous feature performance consistency, and the incorporation of a new ramp function that helps counter some of the discrete feature bias in data with a mix of discrete/continuous variables.
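For readers unfamiliar with the idea, a generic ramp function for continuous feature differences can be sketched as follows. This PR does not state how its ramp is parameterized, so the two thresholds `t_eq` and `t_diff` and the function name `ramp_diff` are assumptions for illustration, not the merged implementation.

```python
def ramp_diff(x, y, t_eq, t_diff):
    """Generic ramp: 0 below t_eq, 1 above t_diff, linear in between.

    t_eq and t_diff are hypothetical thresholds on the absolute difference.
    """
    if t_eq > t_diff:
        raise ValueError("t_eq must not exceed t_diff")
    d = abs(x - y)
    if d <= t_eq:
        return 0.0  # values close enough to count as equal
    if d > t_diff:
        return 1.0  # values far enough apart to count as fully different
    return (d - t_eq) / (t_diff - t_eq)


small = ramp_diff(0.0, 0.05, 0.1, 0.5)
large = ramp_diff(0.0, 0.9, 0.1, 0.5)
middle = ramp_diff(0.0, 0.5, 0.0, 1.0)
```

A discrete feature contributes a difference of exactly 0 or 1, while a range-scaled continuous feature usually contributes a smaller fraction. Letting sufficiently large continuous differences count as a full 1 narrows that gap in mixed datasets.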

coveralls · 2018-04-03T23:26:05Z

Coverage increased (+6.5%) to 76.667% when pulling d5b6ef8 on development into e620798 on master.

rhiever and others added 30 commits December 27, 2017 08:22

Merge pull request #32 from EpistasisLab/master

386ea28

Push small update to dev

TuRF detailed prints with header settings

3f33d0c

All core algos removed comments

bc0057f

Modified docs

f903f5a

Updated docstrings

b072185

multiclass

ae380ff

multiclass

5c3e72b

made attr a class variable and added multiclass capability for all core algos

52debc4

Updated far scoring in multiclass case

f02739f

Added clarifying comments to fit() in relieff

600ea71

Moved deletion of distance array to just after calculation of feature importance scores. Might as well do this as soon as possible.

660132d

Added clarifying comments to the rest of relieff.py and scoring_utils.py. Inserted commented-out code to fix the distance array calculation when missing data is present. Not implemented yet.

fe4f332

Partial fix to proper normalization of continuous value range in distance calculation within get_row_missing(). Added 'cmins' variable passed forward for subtraction of minimum value. Some modifications still needed to complete this fix.

66fc318

Added more comments. Identified an issue in relieff, find_neighbors(). It appears it is only set up to handle binary endpoints correctly, not multiclass or continuous valued endpoints. This problem does not exist for the other 4 core algorithms (SURF, SURF*, MultiSURF, and MultiSURF*).

6d5b150

Minor comment and code rearrangement

3c137a5

added further comments to the code for clarification and to identify possible changes

ea92361

Added Saurav's fix for nearest neighbor selection of multiclass and binary class endpoints.

e8742e9

Completed fixes to 'compute_score' to fix continuous feature scoring issues as well as normalization fixes and update to be more relatable to the ReBATE papers.

3e52bef

Added remaining clarifying comments to individual RBA module files.

594a92e

minor comment addition

24343f5

Manually added Weixuan Fu's bug fixes for scoring_utils about data_len and mmdiff. I also changed the use of abs value continuous feature difference and mmdiff normalization so it's only called when a continuous feature is present rather than for any feature regardless.

2b95aa0

Made fix pointed out by Weixuan regarding special case when count_hit or count_miss happens to be zero.

27177fa

analysis example dataset file path (csv changed to tsv)

0902bc5

Fixed special case where both count hit and miss are zero. This could happen when there are very few features and instances.

77dbeef

in tests.py, changed all TuRF step sizes from 0.1 to 0.4 to speed up unit testing

008fdf9

Fixed some errors in my ramp function implementation causing additional test errors.

d0778c4

Minor fix to ramp function implementation to fix error in integrated testing.

3551877

ryanurbs added 8 commits April 3, 2018 14:51

Included TuRF to init import, and removed unnecessary code from surfstar and multisurfstar for parallelization check.

2e2e214

Added documentation updates for scikit-rebate version 0.5

942bd76

minor update to README

d8c459f

update html documentation generated by mkdocs

08c11a1

update html documentation generated by mkdocs

44ade39

more html changes

9c38f30

Merge pull request #37 from sauravbose/ryan_multiclass

d5b6ef8

Ryan multiclass

ryanurbs merged commit 08aa096 into master Apr 4, 2018
