这是indexloc提供的服务,不要输入任何密码
Skip to main content

Succinct BWT-Based Sequence Prediction

  • Conference paper
  • First Online:
Database and Expert Systems Applications (DEXA 2019)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11707))

Included in the following conference series:

Abstract

Sequences of symbols can be used to represent data in many domains such as text documents, activity logs, customer transactions and website click-streams. Sequence prediction is a popular task, which consists of predicting the next symbol of a sequence, given a set of training sequences. Although numerous prediction models have been proposed, many have a low accuracy because they are lossy models (they discard information from training sequences to build the model), while lossless models are often more accurate but typically consume a large amount of memory. This paper addresses these issues by proposing a novel sequence prediction model named SuBSeq that is lossless and utilizes the succinct Wavelet Tree data structure and the Burrows-Wheeler Transform to compactly store and efficiently access training sequences for prediction. An experimental evaluation shows that SuBSeq has a very low memory consumption and excellent accuracy when compared to eight state-of-the-art predictors on seven real datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+
from $39.99 /Month
  • Starting from 10 chapters or articles per month
  • Access and download chapters and articles from more than 300k books and 2,500 journals
  • Cancel anytime
View plans

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    Logs are to base 2 unless stated otherwise.

  2. 2.

    Available at http://www.philippe-fournier-viger.com/spmf.

  3. 3.

    Details about QUEST exported data, are available at github.com/rafkt/SUBSEQ.

  4. 4.

    Same evaluation approach was followed for CPT+[6].

References

  1. Balle, B., Eyraud, R., Luque, F.M., Quattoni, A., Verwer, S.: Results of the sequence PredIction ChallengE (SPiCe): a competition on learning the next symbol in a sequence. In: Proceedings 13th International Conference in Grammatical Inference, vol. 57. JMLR W&CP, Delft (2016)

    Google Scholar 

  2. Burrows, M., Wheeler, D.: A block-sorting lossless data compression algorithm. Technical report, 124, Digital Equiptment Corporation (1994)

    Google Scholar 

  3. Cleary, J., Witten, I.: Data compression using adaptive coding and partial string matching. IEEE Trans. Commun. 32(4), 396–402 (1984)

    Article  Google Scholar 

  4. Gog, S.: simongog/sdsl-lite (2015). https://github.com/simongog/sdsl-lite

  5. Gog, S., Petri, M.: Optimized succinct data structures for massive data. Softw. Pract. Experience 44(11), 1287–1314 (2014)

    Article  Google Scholar 

  6. Gueniche, T., Fournier-Viger, P., Raman, R., Tseng, V.S.: CPT+: decreasing the time/space complexity of the compact prediction tree. In: Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., Motoda, H. (eds.) PAKDD 2015. LNCS (LNAI), vol. 9078, pp. 625–636. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18032-8_49

    Chapter  Google Scholar 

  7. Gueniche, T., Fournier-Viger, P., Tseng, V.S.: Compact prediction tree: a lossless model for accurate sequence prediction. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds.) ADMA 2013. LNCS (LNAI), vol. 8347, pp. 177–188. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-53917-6_16

    Chapter  Google Scholar 

  8. Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Hybrid compression of bitvectors for the FM-index. In: Proceedings DCC, pp. 302–311. IEEE (2014)

    Google Scholar 

  9. Laird, P., Saul, R.: Discrete sequence prediction and its applications. Mach. Learn. 15(1), 43–68 (1994)

    MATH  Google Scholar 

  10. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

    Article  MathSciNet  MATH  Google Scholar 

  11. Manzini, G.: An analysis of the Burrows-Wheeler transform. J. ACM 48(3), 407–430 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  12. Navarro, G.: Wavelet trees for all. J. Discrete Algorithms 25, 2–20 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  13. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1) (2007). Article 2

    Article  MATH  Google Scholar 

  14. Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In: Proceedings ALENEX, pp. 60–70. SIAM (2007)

    Google Scholar 

  15. Padmanabhan, V.N., Mogul, J.C.: Using predictive prefetching to improve world wide web latency. SIGCOMM Comput. Commun. Rev. 26(3), 22–36 (1996)

    Article  Google Scholar 

  16. Pitkow, J., Pirolli, P.: Mining longest repeating subsequences to predict world wide web surfing. In: Proceedings of the 2nd Conference on USENIX Symposium on Internet Technologies and Systems - Volume 2, USITS 1999, pp. 139–150 (1999)

    Google Scholar 

  17. Raman, R., Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms 3(4), 43 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  18. Rjeily, C.B., Badr, G., Al Hassani, A.H., Andres, E.: Predicting heart failure class using a sequence prediction algorithm. In: 2017 Fourth International Conference on Advances in Biomedical Engineering (ICABME), pp. 1–4. IEEE (2017)

    Google Scholar 

  19. Tax, N.: Human activity prediction in smart home environments with LSTM neural networks. In: 2018 14th International Conference on Intelligent Environments (IE), pp. 40–47. IEEE (2018)

    Google Scholar 

  20. Zheng, Z., Kohavi, R., Mason, L.: Real world performance of association rule algorithms. In: Proceedings ACM SIGKDD, pp. 401–406. ACM (2001)

    Google Scholar 

  21. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theor. 24(5), 530–536 (1978)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rafael Ktistakis .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Ktistakis, R., Fournier-Viger, P., Puglisi, S.J., Raman, R. (2019). Succinct BWT-Based Sequence Prediction. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science(), vol 11707. Springer, Cham. https://doi.org/10.1007/978-3-030-27618-8_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-27618-8_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27617-1

  • Online ISBN: 978-3-030-27618-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics