Cannot generate sff_file unlabelled data set file #51

@binhngoc17

My VW data is in this format:

| this is great
| I try to learn English everyday
[...]

It is saved as data.vw.
I tried to run this code:

from rosetta.text.vw_helpers import LDAResults
from rosetta.text.text_processors import SFileFilter, VWFormatter

def generate_filefilter():
    sff = SFileFilter(VWFormatter())
    sff.load_sfile('data.lda.vw')

    df = sff.to_frame()
    df.head()
    df.describe()

    sff.filter_extremes(doc_freq_min=5, doc_fraction_max=0.8)
    sff.compactify()
    sff.save('sff_file.pkl')

if __name__ == '__main__':
    generate_filefilter()

And the error is:

Traceback (most recent call last):
  File "/<home>/.venv/lib/python2.7/site-packages/rosetta/text/text_processors.py", line 380, in _parse_preamble
    if preamble[-1] != ' ':
IndexError: string index out of range
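
A minimal sketch of what may be triggering this, assuming the preamble is the text before the first `|` on each line. The helpers `split_preamble` and `last_preamble_char` are hypothetical and are not rosetta's actual code; only the `preamble[-1]` check is taken from the traceback. The label `1` and tag `doc_1` in the second line are made up for illustration.

```python
def split_preamble(line):
    """Return the text before the first '|' (hypothetical helper)."""
    return line.split('|', 1)[0]


def last_preamble_char(line):
    preamble = split_preamble(line)
    # Same indexing as in the traceback; it fails when preamble is ''.
    return preamble[-1]


unlabelled = "| this is great"        # line format from data.vw
labelled = "1 doc_1| this is great"   # hypothetical label and tag

try:
    last_preamble_char(unlabelled)
    raised = False
except IndexError:
    # An empty string has no last character, so indexing raises.
    raised = True

print(repr(split_preamble(unlabelled)), raised)
print(repr(split_preamble(labelled)))
```

If that assumption holds, lines that start directly with `|` have an empty preamble, which would explain the IndexError on unlabelled data.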
