@datalab-to

Datalab

Developing state-of-the-art document intelligence models.

Pinned

  1. marker Public

    Convert PDF to markdown + JSON quickly with high accuracy

    Python · 29.1k stars · 1.9k forks

  2. surya Public

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python · 18.7k stars · 1.3k forks

  3. pdftext Public

    Extract structured text from pdfs quickly

    Python · 610 stars · 58 forks

Repositories

Showing 8 of 8 repositories
  • marker Public

    Convert PDF to markdown + JSON quickly with high accuracy

    Python · 29,146 stars · 1,931 forks · 279 open issues · 39 open pull requests · Updated Oct 11, 2025
  • surya Public

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python · 18,693 stars · 1,269 forks · 123 open issues · 13 open pull requests · Updated Oct 9, 2025
  • oss_container Public
    Python · 0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Oct 2, 2025
  • sdk Public
    HTML · 5 stars · MIT · 3 forks · 4 open issues · 1 open pull request · Updated Sep 24, 2025
  • datalab-on-prem Public

    Scripts to run Datalab's self-service on-prem container

    Shell · 1 star · 0 forks · 0 open issues · 0 open pull requests · Updated Aug 29, 2025
  • inference-mirror
    Python · 3 stars · 1 fork · 0 open issues · 1 open pull request · Updated Aug 12, 2025
  • docext Public

    An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

    Python · 6 stars · Apache-2.0 · 1 fork · 0 open issues · 0 open pull requests · Updated Jun 18, 2025
  • pdftext Public

    Extract structured text from pdfs quickly

    Python · 610 stars · Apache-2.0 · 58 forks · 9 open issues · 5 open pull requests · Updated Jun 11, 2025
