
RvanB/fastmarc

fastmarc

A high-performance MARC file reader built on top of pymarc, written with Cython and memory-mapped I/O.

fastmarc lets you index, count, and retrieve MARC records in ways that vanilla pymarc does not support, while remaining fully compatible with pymarc.Record.

Features

Instant record counts (__len__)

The entire file is scanned once using a lightweight Cython indexer. You can get the number of records with len(reader) without parsing a single record.
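To illustrate why counting does not require parsing, here is a minimal pure-Python sketch of an offset scan. It is not fastmarc's Cython indexer, whose internals are not described here. It relies only on the ISO 2709 convention that the first five bytes of each record's leader hold the record length in ASCII digits. The names `scan_offsets` and `build_seek_map` are hypothetical and chosen for this example.

```python
import mmap
import os

LENGTH_DIGITS = 5  # leader positions 00-04 hold the total record length


def scan_offsets(buf):
    """Return the start offset of every record in a bytes-like buffer.

    Only the 5-digit length prefix of each record is read; the record
    body is skipped over, never parsed.
    """
    offsets = []
    pos = 0
    size = len(buf)
    while pos + LENGTH_DIGITS <= size:
        length = int(buf[pos:pos + LENGTH_DIGITS])
        if length < LENGTH_DIGITS:
            # A length this small would stall or derail the scan.
            raise ValueError(f"invalid record length {length} at byte {pos}")
        offsets.append(pos)
        pos += length  # hop straight to the next record
    return offsets


def build_seek_map(path):
    """Memory-map a file and scan it (hypothetical helper, not fastmarc API)."""
    if os.path.getsize(path) == 0:
        return []  # mmap cannot map an empty file
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return scan_offsets(mm)
```

With such a list in hand, the record count is simply `len(offsets)`, which is the idea behind `len(reader)`.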

Same iteration speed as pymarc.MARCReader

Iterating through a fastmarc reader yields pymarc.Record objects at the same throughput as pymarc.MARCReader, because parsing is still handled by pymarc internally.

Random access by record index

  • get_seek_map() gives you the exact byte offset of every record in the file.
  • get_record(i) returns the pymarc.Record for the record at index i without parsing any other records.
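The following sketch shows how a list of byte offsets makes random access cheap. It is an illustration under stated assumptions, not fastmarc's implementation: `read_record_bytes` is a hypothetical helper, and `offsets` is assumed to be a sequence of integer start positions like the one get_seek_map() is described as returning.

```python
def read_record_bytes(buf, offsets, index):
    """Return the raw bytes of one record without touching any other record.

    buf     -- bytes-like object holding the whole file (bytes or an mmap)
    offsets -- start offset of each record, in file order (assumed layout)
    index   -- zero-based record index
    """
    start = offsets[index]
    # ISO 2709: the first five bytes of the leader give the record length.
    length = int(buf[start:start + 5])
    return bytes(buf[start:start + length])
```

The returned bytes are a complete record, so they could then be handed to pymarc for parsing, for example with `pymarc.Record(data=raw)`. Only the one requested record is parsed, which is what keeps the lookup independent of file size.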

Drop-in compatibility

Iteration still yields pymarc.Record objects, so you can keep using existing pymarc workflows without modification.

Why use fastmarc?

If you work with large MARC files (hundreds of MBs to many GBs), pymarc can be limiting:

  • Counting records requires iterating through and parsing them all.
  • Random access is not possible — you must walk the file sequentially.
  • Large files can take minutes just to “skip ahead.”

With fastmarc you can:

  • Check the number of records almost instantly, without parsing them.
  • Look up any record’s byte position in constant time once the file has been indexed.
  • Iterate with the same speed as pymarc but without losing indexing metadata.

Installation

fastmarc is still in development and not yet published to PyPI. For now, you’ll need to install it from this repository:

pip install git+https://github.com/RvanB/fastmarc.git

Usage

from fastmarc import MARCReader

with open("records.mrc", "rb") as f:
    reader = MARCReader(f)

    # Get the record count without parsing any records
    print(len(reader))

    # Iterate through records (same as pymarc)
    for rec in reader:
        print(rec["245"]["a"])

    # Get the byte offset of the 100th record (indexes are zero-based)
    offsets = reader.get_seek_map()
    print("Record 100 starts at byte:", offsets[99])

    # Get the 100th record itself as a pymarc.Record
    print(reader.get_record(99))
