When dealing with sfiles I sometimes just want to drop documents (lines) altogether if they have too few tokens in them. Up until now I've just used hacked solutions and been too lazy to actually figure how and if it should be integrated into rosetta. Here is a mock-up commit of what I mean:
ApproximateIdentity@6e2916d
All it does is add a flag min_tokens to filter_sfile() which will cause the filtering to not write lines with fewer than that many tokens. Would it make sense to add this thing here? Is there a better place to add it?
And maybe most importantly, does this functionality already exist somewhere else and I've just always missed it?