
Text segmentation will vary by language #6

@r12a


https://github.com/WICG/handwriting-recognition/blob/main/explainer.md#the-prediction-result

segmentationResult: [TODO] Come up with a way to represent text segmentation.

Just a reminder that segmentation strategies can be very different across languages.

Some scripts don't separate words at all; others do so with special wordspace characters rather than spaces. Some don't have sentence punctuation but separate phrases with gaps, or use punctuation in somewhat different ways. Some not only don't separate words but also combine letters at the end and start of adjacent words, etc.

So some flexibility will be needed, and it's really important to avoid the trap of relying on spaces to indicate segmentation boundaries.
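To make the point above concrete, here is a minimal sketch (not the explainer's design) of why splitting on spaces fails, and of one possible space-independent representation. The sample strings, the helper names `split_on_spaces` and `slice_segments`, and the hand-written offset ranges are all assumptions made for illustration; a real recognizer would have to supply the boundaries itself.

```python
from typing import List, Tuple

ENGLISH = "the quick fox"
JAPANESE = "今日は良い天気です"      # written with no word separators at all
ETHIOPIC = "ሰላም\u1361ዓለም"          # words divided by U+1361 ETHIOPIC WORDSPACE, not a space


def split_on_spaces(text: str) -> List[str]:
    """Naive segmentation: only works for scripts that put spaces between words."""
    return text.split()


def slice_segments(text: str, ranges: List[Tuple[int, int]]) -> List[str]:
    """Hypothetical representation: each segment is a (begin, end) offset pair
    into the recognized text, so no separator character is assumed."""
    return [text[begin:end] for begin, end in ranges]


# Hand-written boundaries for illustration only (assumed, not computed).
JA_RANGES = [(0, 2), (2, 3), (3, 5), (5, 7), (7, 9)]

print(split_on_spaces(ENGLISH))             # three words
print(split_on_spaces(JAPANESE))            # the whole sentence comes back as one "word"
print(split_on_spaces(ETHIOPIC))            # also one "word": the wordspace is not whitespace
print(slice_segments(JAPANESE, JA_RANGES))  # boundaries carried as offsets instead
```

Carrying boundaries as offsets rather than inferring them from spaces leaves room for scripts with no separators, with non-space separators, or with segments that are not words at all.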

Labels: i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response.)
