A high-performance MARC file reader built on top of pymarc, written with Cython and memory-mapped I/O.
fastmarc lets you index, count, and retrieve MARC records in ways that plain pymarc does not support, while remaining fully compatible with pymarc.Record.
The entire file is scanned once using a lightweight Cython indexer. You can get the number of records with len(reader) without parsing a single record.
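To illustrate why a single scan is enough, here is a minimal pure-Python sketch of the idea. It is not fastmarc's Cython indexer, and the function names `build_seek_map` and `build_seek_map_from_file` are hypothetical. It relies only on the ISO 2709 rule that the first five bytes of every record's leader hold the total record length as ASCII digits, so the scanner can hop from leader to leader without parsing any fields.

```python
import mmap

LENGTH_DIGITS = 5   # ISO 2709: leader bytes 0-4 hold the record length
LEADER_SIZE = 24    # a MARC leader is always 24 bytes long


def build_seek_map(buf):
    """Return the start offset of every record in a bytes-like MARC buffer.

    Hypothetical helper for illustration; works on bytes or an mmap object.
    """
    offsets = []
    pos = 0
    size = len(buf)
    while pos + LENGTH_DIGITS <= size:
        length = int(buf[pos:pos + LENGTH_DIGITS])
        if length < LEADER_SIZE:
            raise ValueError(f"invalid record length at byte {pos}")
        offsets.append(pos)
        pos += length  # jump straight to the next leader
    return offsets


def build_seek_map_from_file(path):
    """Memory-map a (non-empty) file and index it without loading it fully."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return build_seek_map(mm)
```

With an index like this, the record count is simply the length of the offset list, which is why counting does not require parsing.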
Iterating through fastmarc.FastMARCReader yields pymarc.Record objects at the same throughput as pymarc.MARCReader, because parsing is still handled by pymarc internally.
get_seek_map() gives you the exact byte offset of every record in the file. get_record(i) returns the pymarc.Record for the record at index i without parsing any other records.
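As a rough sketch of how offset-based random access can work (an illustration under stated assumptions, not fastmarc's actual code), the helper below takes a list of record start offsets and returns the raw bytes of one record. The name `read_record_bytes` is hypothetical. It reads the record length from the first five bytes of the leader, so only the requested record is touched.

```python
def read_record_bytes(buf, offsets, i):
    """Return the raw bytes of record i from a bytes-like MARC buffer.

    Hypothetical helper: `offsets` is a list of record start positions,
    such as the one a seek map provides.
    """
    start = offsets[i]                    # raises IndexError if i is out of range
    length = int(buf[start:start + 5])    # leader bytes 0-4: total record length
    return bytes(buf[start:start + length])
```

The returned bytes could then be handed to pymarc for parsing, for example with `pymarc.Record(data=raw)`, which keeps the cost of a lookup independent of the record's position in the file.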
Iteration still yields pymarc.Record objects, so you can keep using existing pymarc workflows without modification.
If you work with large MARC files (hundreds of MBs to many GBs), pymarc can be limiting:
- Counting records requires iterating through and parsing them all.
- Random access is not possible — you must walk the file sequentially.
- Large files can take minutes just to “skip ahead.”
With fastmarc you can:
- Check the number of records almost instantly.
- Jump to any record’s byte position in constant time.
- Iterate at the same speed as pymarc, without losing indexing metadata.
fastmarc is still in development and not yet published to PyPI. For now, you’ll need to install it from this repository:
pip install git+https://github.com/RvanB/fastmarc.git

from fastmarc import MARCReader

with open("records.mrc", "rb") as f:
    reader = MARCReader(f)

    # Record count instantly (no parsing)
    print(len(reader))

    # Iterate through records (same as pymarc)
    for rec in reader:
        print(rec["245"]["a"])

    # Get the byte offset of the 100th record (index 99)
    offsets = reader.get_seek_map()
    print("Record 100 starts at byte:", offsets[99])

    # Get the 100th record
    print(reader.get_record(99))