-
Notifications
You must be signed in to change notification settings - Fork 2
Description
We're working on PechaData, a multilingual Buddhist corpus project in collaboration with bdrc.io and pecha.org. As a format, Stam is a dream for our project, and we're starting to build our project on top of it with a mechanism to update annotation coordinates when the base text is updated.
However, our dataset includes many large texts (>10mb .txt) featuring multiple annotation layers often larger than the initial text file and we are concerned about performance issues when we have to load all the annotations in memory even when we only need a couple of sets of annotations. (i.e. we have a file with 15 annotation sets including POS tags and dependencies but we only need the text and the annotations for the table of content.)
Have you considered externalizing annotations in separate files like the .ann files of BrAT or do you have another solution to load annotations selectively? We thought about patching Stam to find a solution but we would much prefer a solution coming from the creators.
Thanks a lot for your work!