
Add in a min_tokens flag in SFileFilter.filter_sfile()? #37

@ApproximateIdentity


When dealing with sfiles, I sometimes just want to drop documents (lines) altogether if they have too few tokens. Up until now I've just used hacked-together solutions and been too lazy to actually figure out how, and whether, this should be integrated into rosetta. Here is a mock-up commit of what I mean:

ApproximateIdentity@6e2916d

All it does is add a min_tokens flag to filter_sfile(), which makes the filter skip writing any line with fewer than that many tokens. Would it make sense to add this here? Is there a better place for it?
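To make the idea concrete, here is a rough standalone sketch of that kind of filtering. It is not the code from the linked commit and does not use rosetta's actual API. The helper names `count_tokens` and `filter_min_tokens` are hypothetical, and it assumes a VW-style line layout (`label doc_id| token:count ...`) where "number of tokens" means the number of distinct token entries after the `|`.

```python
def count_tokens(line):
    """Count token entries after the first '|' (assumed VW-style layout)."""
    # partition() returns an empty features string if there is no '|'.
    _, _, features = line.partition("|")
    return len(features.split())


def filter_min_tokens(lines, min_tokens=0):
    """Yield only the lines that have at least min_tokens token entries."""
    for line in lines:
        if count_tokens(line) >= min_tokens:
            yield line


# Hypothetical sample lines, made up for illustration.
sample = [
    "1 doc1| apple:2 banana:1 cherry:1",
    "1 doc2| apple:1",
    "1 doc3| banana:3 cherry:2",
]

# Keep only documents with two or more token entries.
kept = list(filter_min_tokens(sample, min_tokens=2))
```

With the default `min_tokens=0` nothing is dropped, so existing callers would see no change in behavior. Whether the threshold should count distinct tokens or sum the per-token counts is a separate design choice that this sketch does not settle.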

And maybe most importantly, does this functionality already exist somewhere else and I've just always missed it?
