Unicode and regexes etc #168

@gtoal

You mentioned "mpc Only supports ASCII. Sorry! Writing a parser library that supports Unicode is pretty difficult. I welcome contributions!". I have a parser at https://github.com/gtoal/uparse which works entirely in Unicode: I modified a regex package to work on 32-bit code points rather than 8-bit characters. I don't expect you'll be able to take anything from that parser directly, but reading through it may give you some hints on how to update your own code to handle Unicode.

Basically it boils down to reading UTF-8 on input and writing UTF-8 on output, while doing all operations inside the code on a 32-bit object rather than a char. That turns out not to be all that difficult once you get started.
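As a rough illustration of that idea, here is a minimal C sketch of the two boundary conversions. It is not taken from uparse or mpc; the names `utf8_decode` and `utf8_encode` are hypothetical, and it assumes strict UTF-8 (overlong forms, surrogates, and values above U+10FFFF are rejected).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (n bytes available)
   into a 32-bit code point. Returns the number of bytes consumed, or 0 if
   the input is truncated or malformed. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }               /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;                                  /* stray or invalid lead byte */

    if (n < len) return 0;                          /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;        /* not a continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (c < min_for_len[len]) return 0;             /* overlong encoding */
    if (c > 0x10FFFF) return 0;                     /* beyond Unicode range */
    if (c >= 0xD800 && c <= 0xDFFF) return 0;       /* UTF-16 surrogate */
    *cp = c;
    return len;
}

/* Hypothetical helper: encode one code point as UTF-8 into out.
   Returns the number of bytes written (1 to 4), or 0 if cp is not encodable. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With something like this at the edges, the matcher itself would compare and store `uint32_t` values wherever it currently uses `char`, so a character-class test such as `lo <= cp && cp <= hi` works the same for any code point.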

Best regards,
Graham Toal
