Comicbox

A comic book archive metadata reader and writer.

✨ Features

📚 Comic Formats

Comicbox reads CBZ, CBR, CBT, and optionally PDF. Comicbox writes CBZ archives and PDF metadata.

🏷️ Metadata Formats

Comicbox reads and writes:

- ComicRack ComicInfo.xml v2.1 (draft) schema.
- Metron MetronInfo.xml v1.0.
- Comic Book Lover ComicBookInfo schema.
- CoMet schema.
- PDF Metadata.
  - Embedding ComicInfo.xml or MetronInfo.xml inside PDFs.
- A variety of filename schemes that encode metadata.

Usefulness

Comicbox's primary purpose is to serve as a library for the Codex comic reader. The API isn't well documented, but you can infer what it does fairly easily from comicbox.comic_archive, the primary interface.

The command line can perform most of comicbox's functions including reading and writing metadata recursively, converting between metadata formats and extracting pages.

Limitations and Alternatives

Comicbox does not use popular metadata database APIs or have a GUI!

Comictagger is probably the most useful comic book tagger. It does most of what Comicbox does but also automatically tags comics with the ComicVine API and has a desktop UI.

📜 News

Comicbox has a NEWS file to summarize changes that affect users.

🕸️ HTML Docs

HTML formatted docs are available here

📦 Installation

pip install comicbox

Comicbox supports PDFs as an extra when installed like:

pip install comicbox[pdf]

Dependencies

Base

Comicbox generally works without any binary dependencies but requires unrar be on the path to convert CBR into CBZ or extract files from CBRs.

PDF

The pymupdf dependency has wheels that install a local version of libmupdf. But on some platforms (e.g. Linux on ARM, Windows) it may require libstdc++ and C/C++ build tools to be installed in order to compile libmupdf. More detail on this is available in the pymupdf docs.

Installing Comicbox on ARM (AARCH64) with Python 3.13

Pymupdf has no pre-built wheels for AARCH64, so pip must build it, and the build fails on Python 3.13 unless this environment variable is set:

PYMUPDF_SETUP_PY_LIMITED_API=0 pip install comicbox

You will also need the build-essential and python3-dev packages, or their equivalents, installed on your Linux system.

⌨️ Use

Related Projects

Comicbox makes use of two of my other small projects:

- comicfn2dict, which parses metadata in comic filenames into Python dicts. This library is also used by Comictagger.
- pdffile, which presents a ZipFile-like interface for PDF files.

Console

Type

comicbox -h

to see the CLI help.

Examples

comicbox test.cbz -m "{Tags: a,b,c, story_arcs: {d:1,e:'',f:3}}" -m "Publisher: SmallComics" -w cr

This writes those tags to comicinfo.xml in the archive.

Be sure to add spaces after colons so they parse as valid YAML key value pairs. This is easy to forget.

But it's probably better to use the --print action to see what it's going to do before you actually write to the archive:

comicbox test.cbz -m "{Tags: a,b,c, story_arcs: {d:1,e:'',f:3}}" -m "Publisher: SmallComics" -p

A recursive example:

comicbox --recurse -m "publisher: 'SC Comics'" -w cr ./SmallComicsComics/

This recursively changes the publisher to "SC Comics" for every comic found under the SmallComicsComics directory.
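When scripting the same steps from Python, it can help to build the argument list once and preview it with -p before writing. The following is a minimal sketch, assuming the comicbox executable is on the PATH. The helper name build_comicbox_args is hypothetical, and it only uses flags that appear in the examples above.

```python
import shlex


def build_comicbox_args(target, metadata, write_format=None, recurse=False):
    """Build an argument list for the comicbox CLI (hypothetical helper).

    Only flags shown in the examples above are used:
    --recurse, -m, -p and -w.
    """
    args = ["comicbox"]
    if recurse:
        args.append("--recurse")
    for snippet in metadata:
        # Each snippet is a YAML string, e.g. "publisher: 'SC Comics'".
        args += ["-m", snippet]
    if write_format is None:
        # No write format given: only print what would be written.
        args.append("-p")
    else:
        args += ["-w", write_format]
    args.append(target)
    return args


preview = build_comicbox_args(
    "./SmallComicsComics/", ["publisher: 'SC Comics'"], recurse=True
)
write = build_comicbox_args(
    "./SmallComicsComics/",
    ["publisher: 'SC Comics'"],
    write_format="cr",
    recurse=True,
)

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(preview))
print(shlex.join(write))
```

Once the preview output looks right, the write list could be passed to subprocess.run(write, check=True).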

Escaping YAML

The -m command line argument accepts YAML for tags. Certain characters like \,:;_()$%^@ are part of the YAML language. To successfully include them as data in your tags, look up "Escaping YAML" documentation online.

Deleting Metadata

To delete metadata from the CLI, you're best off exporting the current metadata, editing the file, and then re-importing it with the delete previous metadata option:

# export the current metadata
comicbox --export cix "My Overtagged Comic.cbz"
# Adjust the metadata in an editor.
nvim comicinfo.xml
# Check that importing the metadata will look how you like
comicbox --import comicinfo.xml -p "My Overtagged Comic.cbz"
# Delete all previous metadata from the comic (careful!)
comicbox --delete-all-tags "My Overtagged Comic.cbz"
# Import the metadata into the file and write it.
comicbox --import comicinfo.xml --write cix "My Overtagged Comic.cbz"

Quirks

--metadata parses all formats.

The comicbox.yaml format represents the ComicInfo.xml Web tag as a nested identifiers.<NID>.url tag. But fear not, you don't have to remember this. The CLI accepts heterogeneous tag types with the -m option, so you can type:

comicbox -p -m "Web: https://foo.com" mycomic.cbz

and the identifier tag should appear in comicbox.yaml as:

identifiers:
    foo.com:
        id_key: ""
        url: https://foo.com

You don't even need the root tag.
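The shape of that identifiers entry can be sketched in Python. This illustrates the data structure only, assuming the identifier key is the URL's host name as in the foo.com example. The function identifier_from_url is hypothetical and is not comicbox's own code.

```python
from urllib.parse import urlparse


def identifier_from_url(url):
    """Return an identifiers mapping shaped like the comicbox.yaml example.

    Assumption: the identifier key (the <NID>) is the URL's host name.
    """
    nid = urlparse(url).netloc
    return {"identifiers": {nid: {"id_key": "", "url": url}}}


print(identifier_from_url("https://foo.com"))
```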

Setting Title when Stories are present.

If the metadata contains Stories (MetronInfo.xml only) the title is computed from the Stories. If you wish to set the title regardless, use the --replace option. e.g.

comicbox -m "series: 'G.I. Robot', title: 'Foreign and Domestic'" -Rp

But be aware it will also create a story with the title's new name.

Identifiers

Comicbox aggregates IDs, GTINs and URLs from other formats into a common Identifiers structure.

Reprints

Comicbox aggregates Alternate Names, Aliases and IsVersionOf from other formats into a common Reprints list.

URNs

Because the Notes field is commonly abused in ComicInfo.xml to represent fields ComicInfo does not (yet?) support, comicbox parses the notes field heavily looking for embedded data. Comicbox also writes identifiers into the Notes field using a Uniform Resource Name format.

Comicbox also looks for identifiers in Tag fields of formats that don't have their own Identifiers field.
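To give a feel for what scanning free text for URNs involves, here is a minimal sketch based on the generic urn:<NID>:<NSS> shape from RFC 8141. It does not reproduce comicbox's own parser or its exact URN layout. The function name find_urns and the sample notes string are made up for illustration.

```python
import re

# Generic URN shape: urn:<NID>:<NSS>. The NID pattern follows RFC 8141
# (2 to 32 letters, digits or hyphens, not starting or ending with a hyphen).
# The NSS pattern is simplified to "any run of non-space characters".
URN_RE = re.compile(
    r"urn:([A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]):(\S+)", re.IGNORECASE
)


def find_urns(notes):
    """Return (nid, nss) pairs for every URN-like token in a notes string."""
    return [(m.group(1).lower(), m.group(2)) for m in URN_RE.finditer(notes)]


# Hypothetical notes text; the identifier values are made up.
notes = "Tagged by hand. urn:example:12345 urn:isbn:9780000000002"
print(find_urns(notes))
```

Because the simplified NSS pattern stops only at whitespace, trailing punctuation would be captured as part of the identifier; a real parser would need stricter rules.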

Prettified Fields

Comicbox liberally accepts all kinds of values that may be enums in other formats, like AgeRating, Formats and Credit Roles. In a weak attempt to standardize these values, comicbox will Title Case values submitted to these fields. When writing to standard formats, comicbox attempts to transform these values into enums supported by the output format.
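The two steps described above, Title Casing on input and mapping to an enum on output, might look roughly like this. It is a sketch of the idea, not comicbox's implementation. The helper names prettify and to_enum are hypothetical, and AGE_RATINGS is only an illustrative subset of values.

```python
def prettify(value):
    """Collapse whitespace and Title Case a free-form value."""
    return " ".join(value.split()).title()


def to_enum(value, allowed):
    """Return the member of `allowed` matching `value` case-insensitively, else None."""
    lookup = {member.casefold(): member for member in allowed}
    return lookup.get(" ".join(value.split()).casefold())


# Illustrative subset of age rating values; not a complete enum for any format.
AGE_RATINGS = ("Everyone", "Teen", "Mature 17+")

print(prettify("  mature   17+ "))
print(to_enum("TEEN", AGE_RATINGS))
print(to_enum("unrated by anyone", AGE_RATINGS))
```

Returning None for unknown values leaves it to the caller to decide whether to drop the value or write it unchanged.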

Packages

Comicbox actually installs three different packages:

- comicbox: The main API and CLI script.
- comicfn2dict: A separate library for parsing comic filenames into dicts. It also includes a CLI script.
- pdffile: A utility library for reading and writing PDF files with an API like Python's ZipFile.

⚙️ Config

Comicbox accepts command line arguments but also an optional config file and environment variables.

The variables have defaults specified in a default YAML file.

Each environment variable is the variable name prefixed with COMICBOX_ (e.g. COMICBOX_COMICINFOXML=0).
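When launching comicbox from a Python script, the prefix rule can be applied with a small helper. This is a minimal sketch. The function comicbox_env is hypothetical, and it assumes variable names are upper-cased as in the COMICBOX_COMICINFOXML example.

```python
import os


def comicbox_env(settings, base=None):
    """Return a copy of the environment with COMICBOX_-prefixed variables added.

    Assumption: variable names are upper-cased, as in COMICBOX_COMICINFOXML=0.
    """
    env = dict(os.environ if base is None else base)
    for name, value in settings.items():
        # Environment values must be strings.
        env["COMICBOX_" + name.upper()] = str(value)
    return env


# Use an empty base here so only the added variable is printed.
env = comicbox_env({"comicinfoxml": 0}, base={})
print(env)
```

The resulting mapping could be passed as the env argument of subprocess.run when invoking the comicbox command.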

Log Level

Change the logging level:

LOGLEVEL=ERROR comicbox -p <path>

🛠 API

Comicbox is mostly used by me in Codex as a metadata extractor. Here's a brief example, but the API remains undocumented.

from comicbox.box import Comicbox

# path_to_comic is the path to a comic archive on disk.
with Comicbox(path_to_comic) as cb:
    metadata = cb.to_dict()
    page_count = cb.page_count()
    file_type = cb.get_file_type()
    mtime = cb.get_metadata_mtime()
    image_data = cb.get_cover_page(to_pixmap=True)

Attached to these docs in the navigation header there are some auto generated API docs that might be better than nothing.

API Example

I don't have many examples yet. But here's one someone asked about on GitHub.

Adding a ComicInfo.xml formatted dict to the metadata

from argparse import Namespace
from pathlib import Path

from comicbox.box import Comicbox
from comicbox.transforms.comicinfo import ComicInfoTransform


CBZ_PATH = Path("/Users/GullyFoyle/Comics/DC Comics/Star Spangled War Stories/Star Spangled War Stories (1962) #101.cbz")
CONFIG = Namespace(
    # This config writes comicinfo.xml and also reads comicinfo.xml from the source file.
    # If you don't want to read old data, do not include the read argument.
    comicbox=Namespace(write=["cix"], read=["cix"], compute_pages=False)
)

# You can use any comic metadata format as long as it matches its transform class.
CIX_DICT = { .... } # A ComicInfo.xml style dict.
# xml dicts are those parsed and emitted by xmltodict https://github.com/martinblech/xmltodict
# read about complex elements with attributes on that page.
SOURCE_TRANSFORM_CLASS = ComicInfoTransform

with Comicbox(CBZ_PATH, config=CONFIG) as car:
    car.add_source(CIX_DICT, SOURCE_TRANSFORM_CLASS)
    car.write()   # this will write using the config to CBZ_PATH.

This code would be similar to these command line arguments:

comicbox --import my-own-comicbox.json --import my-own-comicinfo.xml --write cr "Star Spangled War Stories (1962) #101.cbz"

📋 Schemas

Comicbox supports most popular comicbook metadata schema definitions. These are defined on the SCHEMAS page.

🔀 Tag Translations

A rough table of how Comicbox handles tag translations between popular comic book metadata formats.

🛠 Development

Comicbox code is hosted on GitHub.

You may access most development tasks from the makefile. Run make to see documentation.

Environment variables

There is a special environment variable, DEBUG_TRANSFORM, that prints verbose schema transform information.
