-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[ENH] fABBA Transform #8838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[ENH] fABBA Transform #8838
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Welcome to open source!
Some comments above.
Further questions/requests:
- code formatting seems to fail - if you want to know how to ensure code formatting is done automatically on your computer, follow this guide: https://www.sktime.net/en/latest/developer_guide/coding_standards.html
- this looks like a new implementation of fabba - I think there is one by the authors? This could be used, through an import: https://github.com/nla-group/fABBA/blob/master/fABBA/fabba.py no? It is of course legitimate to carry out a new implementation, I am just wondering why you chose to reimplement? Or is this partly copied code?
Will look into code formatting. |
I see - we should cite it and the paper then. |
Should I add the repo citation below the paper citation or somewhere else as well? |
Once in |
Hi @fkiraly and @poopsiclepooding, Thanks @poopsiclepooding for implementing it, later I will also update my repo to support length variability for time series. I think my paper [1] has mentioned this method can be easily extended to arbtrary length time series: I suggest the citations can be: Thanks, |
Thanks a lot @chenxinye . I will add those citations |
The failures seem to indicate that a single time series is getting transformed intio multiple. Is this expeted behaviour? If yes, we need to set tags differently. |
Yes that is expected behaviour. If we set partition hyperparameter then time series is broken into parts before apply fABBA and each part's symbolic representation is returned. What should I change the tags to? |
there is currently no valid tag for this behaviour, we will have to look into it on framework level. May I suggest to skip the failing tests for now via if an mtype is passed that can only represent equal length, and the transform changes it to unequal length, the transform tries to convert it back to the original mtype which fails. We need to have a design decision on this. |
@poopsiclepooding, I have updated the framework to handle transformers that turn series into unequal length, here: #8959 I will merge #8959 into this Pr and the tests for the fabba transformer should pass now. FYI, I changed the name to |
I think the example is now failing, see the failing tests. |
Yes fixed |
What does this implement/fix? Explain your changes.
This code implements fABBA transform as found in https://github.com/nla-group/fABBA/tree/master .
fABBA is a time series symbolic representation transform, it compresses and digitizes the time series
into series of symbols.
Have added a python implementation of fABBA along with example usage.
Depends on #8959 for the unequal length output handling, #8959 should be merged first.
Does your contribution introduce a new dependency? If yes, which one?
No new dependency.
What should a reviewer concentrate their feedback on?
Any formatting or major code issues. Any errors with particular combination of hyperparameters
Did you add any tests for the change?
Haven't added any particular test. The example provided illustrates the working of algorithm. If any test suggestions are there I can add them.
Any other comments?
Original fABBA implementation was really helpful to write this - https://github.com/nla-group/fABBA/tree/master
For all contributions
How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
See here for full badge reference
See here for further details on the algorithm maintainer role.
For new estimators