avalpdf - PDF Accessibility Validator

A command-line tool for validating PDF accessibility, analyzing document structure, and generating detailed reports.

Features

Document structure analysis and support

Document structure analysis
Support for both local and remote PDF files

Document tags and metadata validation

Document tagging status
Title presence
Language declaration (Italian)

Heading hierarchy validation

H1 presence
Correct heading levels sequence

Figure alt text validation

Missing alternative text detection
Complex or problematic alt text patterns

Tables structure validation

Header presence and proper structure
Empty cells detection
Duplicate headers check
Multiple header rows warning
Empty tables detection

Lists structure validation

Proper list tagging
Detection of untagged lists (consecutive paragraphs with bullets/numbers)
Misused list types (numbered items in unordered lists)
List hierarchy consistency

Links validation

Detection of non-descriptive links
Raw URL text warnings
Email and institutional domain exceptions

Formatting issues detection

Excessive underscores (used for underlining)
Spaced capital letters (like "T E S T")
Extra spaces used for layout (3+ consecutive spaces)

Empty elements detection

Empty paragraphs
Whitespace-only elements
Empty headings
Empty spans
Empty table cells

Output formats

Detailed JSON structure
Simplified JSON
Accessibility validation report
Consolidated batch report for multiple files
Console reports with color-coded structure visualization

Scoring and reporting

Weighted scoring system based on accessibility criteria
Detailed issue categorization (issues, warnings, successes)

Batch processing

Process multiple files with glob patterns (e.g., *.pdf)
Directory scanning
Concise progress display for multiple files
Consolidated batch report with aggregated statistics
Parallel processing for faster validation on multi-core systems

Installation

Using pip

pip install avalpdf

Or uv

uv tool install avalpdf

Updates

Using pip

pip install avalpdf --upgrade

Or uv

uv tool install avalpdf --upgrade

Usage

After installation, you can run avalpdf from any directory.

Quick start

Simply run

avalpdf thesis.pdf

or

avalpdf https://example.com/document.pdf

to get a report like this

and a preview of the structure

Details

# Basic validation with console output
avalpdf document.pdf

# Display version information
avalpdf --version

Multi-file Analysis

avalpdf supports analyzing multiple PDF files in a single command using parallel processing:

# Multiple files specified directly
avalpdf file1.pdf file2.pdf file3.pdf

# Using wildcard pattern (use quotes on some shells)
avalpdf "*.pdf"

# Process all PDFs in a specific directory
avalpdf "reports/quarterly/*.pdf"

# Analyze all PDFs in the current directory
avalpdf *.pdf

# Specify a directory to scan
avalpdf /path/to/documents/

# Mix of patterns and specific files
avalpdf annual_report.pdf "monthly/*.pdf" project_docs/specs.pdf

When processing multiple files, avalpdf automatically uses parallel processing to take advantage of multi-core systems, significantly improving performance for large batches of documents.

When using wildcards on Unix/Linux shells, you may need to quote the pattern if you want avalpdf to handle the expansion rather than the shell.

Multi-file Output

When analyzing multiple files, avalpdf displays a concise progress view:

[1/5] ✅ document1.pdf: 0 issues, 2 warnings
[2/5] ❌ document2.pdf: 3 issues, 5 warnings
[3/5] ⚠️ document3.pdf: Error - Failed to open PDF
[4/5] ✅ document4.pdf: 0 issues, 0 warnings
[5/5] ❌ document5.pdf: 2 issues, 1 warnings

📊 Batch Processing Summary:
  • Total files processed: 5
  • Files with issues: 2
  • Total issues: 5
  • Total warnings: 8
  • Average accessibility score: 82.5%

✨ Batch processing complete!

By default, a consolidated batch report is saved when processing multiple files. This JSON file contains:

Analysis results for each file
Metadata and accessibility score for each file
Aggregated statistics across all files
Timestamp of the analysis

To specify the output location for the batch report, you have multiple options:

# Specify output directory (report will have a timestamp-based name)
avalpdf *.pdf -o /path/to/output/

# Specify exact filename (including path)
avalpdf *.pdf -o /path/to/output/report.json

# Alternative: specify output directory and custom filename
avalpdf *.pdf -o /path/to/output --batch-report=my_report.json

When -o points to a file ending with .json, it will be used as the exact batch report path. Otherwise, it's treated as a directory.

Analyzing Batch Reports

The batch report JSON file can be analyzed with command-line tools to extract useful information. For example, you can convert the batch report to CSV format for analysis in spreadsheet software:

avalpdf_batch_report_20250323_012754.json jq '.files[] | {filename, poducer: .metadata.producer, creator: .metadata.creator, standard: .metadata.standard, n_issues: .issues_count, n_warnings: .warnings_count, accessibility_score}' | mlr --j2c cat | vd

This command uses:

jq to extract specific fields from each file entry
miller (mlr) to convert JSON to CSV
visidata (vd) to view and analyze the data interactively

You can modify the jq query to extract different fields based on your analysis needs.

Common Multi-file Scenarios

# Analyze all PDFs in a directory, save individual reports
avalpdf "reports/*.pdf" --report

# Analyze multiple files silently and save batch report
avalpdf file1.pdf file2.pdf file3.pdf --quiet

# Process files in different directories
avalpdf "team1/*.pdf" "team2/*.pdf" "shared/*.pdf"

# Analyze all PDFs in a directory and subdirectories
# (use find in Unix/Linux or dir /s in Windows to collect paths)
find . -name "*.pdf" | xargs avalpdf

Command Line Options

--full: Save full JSON structure
--simple: Save simplified JSON structure
--report: Save validation report
--batch-report[=FILENAME]: Save consolidated batch report when processing multiple files. Optionally specify filename
--output-dir, -o: Specify output directory
--show-structure: Display document structure
--show-validation: Display validation results
--quiet, -q: Suppress console output
--rich: Use enhanced visual formatting for document structure
--tree: Use tree view instead of panel view with Rich formatting
--version, -v: Display the version number and exit

Examples

Quick accessibility check:

avalpdf thesis.pdf

Generate all reports:

avalpdf report.pdf --full --simple --report -o ./analysis

Silent operation with report generation:

avalpdf document.pdf --report -q

Analyze multiple files:

avalpdf *.pdf

Analyze directory:

avalpdf documents/

Process specific file pattern and save reports in output directory:

avalpdf "invoices/2023_*.pdf" -o validation_results --report

Quiet batch processing:

avalpdf *.pdf --quiet --batch-report -o reports

Batch Report Format

The consolidated batch report is saved as a JSON file with this structure:

{
  "timestamp": "2023-05-20T14:30:45.123456",
  "formatted_date": "2023-05-20 14:30:45",
  "summary": {
    "total_files": 3,
    "files_with_issues": 1,
    "total_issues": 3,
    "total_warnings": 7,
    "average_accessibility_score": 70.25,
    "successful_files": 2,
    "failed_files": 1
  },
  "files": [
    {
      "filename": "document1.pdf",
      "path": "/path/to/document1.pdf",
      "index": 1,
      "metadata": {
        "title": "Sample Document",
        "tagged": "true",
        "lang": "it",
        "num_pages": "10"
      },
      "issues_count": 0,
      "warnings_count": 2,
      "accessibility_score": 95.5,
      "success": true,
      "has_issues": false
    },
    {
      "filename": "document2.pdf",
      "path": "/path/to/document2.pdf",
      "index": 2,
      "metadata": {
        "title": "Another Document",
        "tagged": "false",
        "lang": "",
        "num_pages": "5"
      },
      "issues_count": 3,
      "warnings_count": 5,
      "accessibility_score": 45.0,
      "success": true,
      "has_issues": true
    },
    {
      "filename": "document3.pdf",
      "path": "/path/to/document3.pdf",
      "index": 3,
      "success": false,
      "error": "Failed to open PDF",
      "issues_count": 0,
      "warnings_count": 0,
      "accessibility_score": 0.0
    }
  ]
}

This structured format makes it easy to:

Sort files by name, accessibility score, or issues count
Filter files with issues or errors
Process results using data analysis tools
Generate custom reports from the consolidated data

Validation Output

The tool provides three types of findings:

✅ Successes: Correctly implemented accessibility features
⚠️ Warnings: Potential issues that need attention
❌ Issues: Problems that must be fixed

Report Format

{
  "validation_results": {
    "issues": ["..."],
    "warnings": ["..."],
    "successes": ["..."]
  }
}

License

MIT License

Support

For issues or suggestions:

Open an issue on GitHub
Provide the PDF file (if possible) and the complete error message
Include the command you used and your operating system information

Local development

uv venv .test
source .test/bin/activate
uv pip install -e . --upgrade

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
avalpdf		avalpdf
.gitignore		.gitignore
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

avalpdf - PDF Accessibility Validator

Features

Installation

Updates

Usage

Quick start

Details

Multi-file Analysis

Multi-file Output

Analyzing Batch Reports

Common Multi-file Scenarios

Command Line Options

Examples

Batch Report Format

Validation Output

License

Support

Local development

About

Uh oh!

Releases

Packages

Uh oh!

Languages

dennisangemi/avalpdf

Folders and files

Latest commit

History

Repository files navigation

avalpdf - PDF Accessibility Validator

Features

Installation

Updates

Usage

Quick start

Details

Multi-file Analysis

Multi-file Output

Analyzing Batch Reports

Common Multi-file Scenarios

Command Line Options

Examples

Batch Report Format

Validation Output

License

Support

Local development

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages