fastmarc

A high-performance MARC file reader built on top of pymarc, written with Cython and memory-mapped I/O.

fastmarc lets you index, count, and retrieve MARC records in ways that vanilla pymarc does not support, while remaining fully compatible with pymarc.Record.

Features

Instant record counts (`len`)

The entire file is scanned once using a lightweight Cython indexer. You can get the number of records with len(reader) without parsing a single record.
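To illustrate why counting does not require parsing, here is a minimal pure-Python sketch of an offset index. It is not fastmarc's Cython indexer; the function name build_seek_map is hypothetical. It relies only on the ISO 2709 rule that the first five bytes of each record's leader hold the record length as ASCII digits, and it assumes a well-formed file with no padding between records.

```python
import mmap
import os

LENGTH_DIGITS = 5  # ISO 2709: leader bytes 0-4 hold the total record length


def build_seek_map(path):
    """Return the starting byte offset of every record in a binary MARC file.

    Hypothetical helper for illustration; only the length prefix of each
    record is read, so no record is ever parsed.
    """
    if os.path.getsize(path) == 0:
        return []  # mmap cannot map an empty file

    offsets = []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = 0
            size = len(mm)
            while pos + LENGTH_DIGITS <= size:
                length = int(mm[pos:pos + LENGTH_DIGITS])
                if length < LENGTH_DIGITS:
                    raise ValueError(f"invalid record length at byte {pos}")
                offsets.append(pos)
                pos += length  # jump straight to the next record
    return offsets
```

With an index like this, the record count is simply len(offsets), which is why it can be reported without building any pymarc.Record objects.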

Same iteration speed as `pymarc.MARCReader`

Iterating through fastmarc.FastMARCReader yields pymarc.Record objects at the same throughput as pymarc.MARCReader, because parsing is still handled by pymarc internally.

Random access by record index to file position

get_seek_map() gives you the exact byte offset of every record in the file.
get_record(i) returns the pymarc.Record for the record at index i (zero-based) without parsing any other records.
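The idea behind offset-based random access can be sketched as follows. This is an assumption about the general technique, not fastmarc's actual code, and read_record_bytes is a hypothetical name. Given a known offset, one seek plus a read of the length prefix is enough to pull out a single record's bytes.

```python
def read_record_bytes(f, offset):
    """Return the raw bytes of the MARC record that starts at `offset`.

    `f` is any seekable binary file object. Hypothetical helper for
    illustration; it reads only the requested record.
    """
    f.seek(offset)
    prefix = f.read(5)       # leader bytes 0-4: record length as ASCII digits
    length = int(prefix)
    if length < 5:
        raise ValueError(f"invalid record length at byte {offset}")
    return prefix + f.read(length - 5)
```

The returned bytes could then be passed to pymarc.Record(data=...) to parse just that one record.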

Drop-in compatibility

Iteration still yields pymarc.Record objects, so you can keep using existing pymarc workflows without modification.

Why use fastmarc?

If you work with large MARC files (hundreds of MBs to many GBs), pymarc can be limiting:

Counting records requires iterating through and parsing them all.
Random access is not possible; you must walk the file sequentially.
Skipping ahead in a large file can take minutes.

With fastmarc you can:

Check the number of records almost instantly.
Jump to any record’s byte position in constant time.
Iterate at the same speed as pymarc without losing indexing metadata.

Installation

fastmarc is still in development and not yet published to PyPI. For now, you’ll need to install it from this repository:

pip install git+https://github.com/RvanB/fastmarc.git

Usage

from fastmarc import MARCReader

with open("records.mrc", "rb") as f:
    reader = MARCReader(f)

    # Record count instantly (no parsing)
    print(len(reader))

    # Iterate through records (same as pymarc)
    for rec in reader:
        print(rec["245"]["a"])

    # Get the byte offset of the 100th record (index 99, zero-based)
    offsets = reader.get_seek_map()
    print("Record 100 starts at byte:", offsets[99])

    # Get the 100th record
    print(reader.get_record(99))
