BigDiff is a cross-platform Python tool to compare two directory trees and generate a third one containing only the differences, in a human-readable and auditable way.
It is useful when you need to track new files, deletions, modifications, or even line-by-line changes with annotations that respect the comment syntax of each file type.
- Compares `folder1` (base) and `folder2` (target), producing `folder3` (output).
- Unchanged files → omitted from the output.
- New files → copied with the `.new` suffix.
- Deleted files → copied with the `.deleted` suffix.
- Deleted directories → appear with the `.deleted` suffix, including their contents.
- Modified files:
  - Copied with the `.modified` suffix.
  - Line-level diff is embedded directly in the file:
    - Deleted lines → commented out with `DELETED`.
    - Added lines → preserved with a `NEW` annotation.
    - Unchanged lines remain as-is for context.
  - Comment syntax matches the file type (`#`, `//`, `/* */`, `<!-- -->`, etc.).
- Binary or oversized files:
  - Copied directly with the `.modified` suffix.
  - An extra `.NOTE.txt` file explains that the line-level diff was skipped.
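The comparison rules above come down to set arithmetic on relative paths, plus a hash check for files that exist on both sides. Below is a minimal sketch, assuming plain `pathlib` traversal and SHA-256 digests as described further down. The function names (`sha256_of`, `classify`, `json_report`) and the report shape are hypothetical illustrations, not code taken from `bigdiff.py`. Deleted directories surface here only through the files they contained, and empty directories are ignored.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def relative_files(root: Path) -> set[str]:
    """All regular files under root, as POSIX-style paths relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def classify(base: Path, target: Path) -> dict[str, list[str]]:
    """Bucket every file path as new, deleted, modified or unchanged."""
    base_files = relative_files(base)
    target_files = relative_files(target)
    result: dict[str, list[str]] = {
        "new": sorted(target_files - base_files),
        "deleted": sorted(base_files - target_files),
        "modified": [],
        "unchanged": [],
    }
    for rel in sorted(base_files & target_files):
        # Equal SHA-256 digests are treated as identical content.
        same = sha256_of(base / rel) == sha256_of(target / rel)
        result["unchanged" if same else "modified"].append(rel)
    return result


def json_report(result: dict[str, list[str]]) -> str:
    """Hypothetical report shape: only the number of files per bucket."""
    return json.dumps({k: len(v) for k, v in result.items()}, indent=2, sort_keys=True)
```

Hashing in chunks keeps memory flat even for multi-gigabyte files. Comparing digests instead of raw bytes also means each file only has to be read once if the hashes are cached for later steps such as rename detection.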
Clone this repository:

```bash
git clone https://github.com/your-username/bigdiff.git
cd bigdiff
chmod +x bigdiff.py
```

Or just grab `bigdiff.py` on its own and run it with Python 3.8+.
Usage:

```bash
python bigdiff.py FOLDER1 FOLDER2 FOLDER3 [options]
```

Examples:

```bash
# Basic comparison
python bigdiff.py ./before ./after ./diff_out

# Normalize line endings and ignore temporary files
python bigdiff.py ./a ./b ./out --normalize-eol --ignore ".git,__pycache__,*.log"

# Dry run (shows the plan without writing anything)
python bigdiff.py ./a ./b ./out --dry-run
```
Options:

- `--ignore`, `-i`: glob patterns to ignore (repeatable or comma-separated).
- `--normalize-eol`, `-E`: normalize CRLF/LF line endings before comparing text.
- `--max-text-size`, `-S`: maximum file size for a text diff (default `5MB`).
- `--dry-run`: only show what would be done; no output is written.
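One way these options could be wired up with the standard `argparse` and `fnmatch` modules. This is a sketch: the flag names come from the list above, but the helper functions, the matching rule (a path is skipped if any of its components matches a pattern) and keeping `--max-text-size` as an unparsed string are assumptions, not the script's actual code.

```python
import argparse
import fnmatch
from pathlib import PurePosixPath
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; types and defaults here are assumptions."""
    parser = argparse.ArgumentParser(prog="bigdiff.py")
    parser.add_argument("folder1", help="base tree")
    parser.add_argument("folder2", help="target tree")
    parser.add_argument("folder3", help="output tree")
    # action="append" makes --ignore repeatable; commas are split afterwards.
    parser.add_argument("--ignore", "-i", action="append", default=None)
    parser.add_argument("--normalize-eol", "-E", action="store_true")
    # Kept as a string such as "5MB"; converting it to bytes is left out here.
    parser.add_argument("--max-text-size", "-S", default="5MB")
    parser.add_argument("--dry-run", action="store_true")
    return parser


def ignore_patterns(raw: List[str]) -> List[str]:
    """Flatten repeated and comma-separated --ignore values into one list."""
    return [p.strip() for item in raw for p in item.split(",") if p.strip()]


def is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """Skip a path if any of its components matches any glob pattern."""
    parts = PurePosixPath(rel_path).parts
    return any(fnmatch.fnmatch(part, pat) for part in parts for pat in patterns)


def normalize_eol(text: str) -> str:
    """Turn CRLF (and stray CR) line endings into LF before diffing."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

For example, `--ignore ".git,__pycache__" -i "*.log"` yields the three patterns `.git`, `__pycache__` and `*.log` (after `ignore_patterns(args.ignore or [])`), so `src/__pycache__/m.pyc` and `logs/run.log` are both skipped.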
If `example.py` was modified, its `.modified` copy contains:

```python
# DELETED: print("Hello World")
print("New line added") # NEW
```

If `config.ini` was removed, the output contains `config.ini.deleted`.

If `notes.txt` was created, the output contains `notes.txt.new`.
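The annotated `example.py` output can be reproduced with `difflib.ndiff`, which prefixes each line with `- `, `+ `, two spaces, or `? `. A minimal sketch, assuming a small hypothetical extension-to-comment table (`COMMENT_STYLES`) and hypothetical `comment`/`annotate` helpers; these are illustrations, not BigDiff's own code.

```python
import difflib
import os
from typing import List, Tuple

# Hypothetical extension -> (comment opener, comment closer) table; the
# tool's real mapping is not documented, so these entries are assumptions.
COMMENT_STYLES = {
    ".py": ("#", ""),
    ".sh": ("#", ""),
    ".ini": ("#", ""),
    ".js": ("//", ""),
    ".c": ("/*", " */"),
    ".css": ("/*", " */"),
    ".html": ("<!--", " -->"),
}


def comment(text: str, style: Tuple[str, str]) -> str:
    """Wrap text in the given comment delimiters."""
    opener, closer = style
    return f"{opener} {text}{closer}"


def annotate(old_lines: List[str], new_lines: List[str], filename: str) -> List[str]:
    """Merge two versions of a file into one annotated listing via difflib.ndiff."""
    style = COMMENT_STYLES.get(os.path.splitext(filename)[1].lower(), ("#", ""))
    out = []
    for line in difflib.ndiff(old_lines, new_lines):
        tag, text = line[:2], line[2:]
        if tag == "- ":    # only in the old file: keep it, commented out
            out.append(comment(f"DELETED: {text}", style))
        elif tag == "+ ":  # only in the new file: keep it and tag it
            out.append(f"{text} {comment('NEW', style)}")
        elif tag == "  ":  # unchanged: copied as-is for context
            out.append(text)
        # "? " lines are ndiff's intraline hints and are dropped here.
    return out
```

Calling `annotate(['print("Hello World")'], ['print("New line added")'], "example.py")` returns the two lines shown in the `example.py` example. Because markers are appended as trailing comments, the annotated copy is meant for reading rather than running: a line ending in `\` or sitting inside a multi-line string would change meaning.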
- Comparison by SHA-256 hash (files with matching hashes are treated as unchanged).
- Text diff via `difflib.ndiff`.
- Simple heuristics to detect binary files.
- Collision avoidance in output (creates `file (1).modified`, etc.).
- Ensures the output folder is never inside either input folder.
- JSON report with statistics.
- Parallel processing for large datasets.
- Rename detection (`delete+new` → `rename`).
- Plugins for binary formats (e.g., DICOM medical images).
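Several of the mechanisms listed above are small enough to sketch directly. The helpers below are hypothetical illustrations, not the script's implementation, and rest on these assumptions: a NUL byte in the first 8 KiB marks a file as binary; collision names follow the `file (1).modified` pattern; both paths are resolved before the nesting check; and a rename is recognized only when a deleted file and a new file have identical content hashes.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def looks_binary(path: Path, probe: int = 8192) -> bool:
    """Treat a file as binary if a NUL byte appears in its first `probe` bytes."""
    with path.open("rb") as fh:
        return b"\x00" in fh.read(probe)


def unique_name(directory: Path, stem: str, suffix: str) -> Path:
    """Return stem+suffix, or 'stem (1)suffix', 'stem (2)suffix', ... if taken."""
    candidate = directory / f"{stem}{suffix}"
    n = 1
    while candidate.exists():
        candidate = directory / f"{stem} ({n}){suffix}"
        n += 1
    return candidate


def is_inside(child: Path, parent: Path) -> bool:
    """True if child is parent itself or lies anywhere below it."""
    child, parent = child.resolve(), parent.resolve()
    return child == parent or parent in child.parents


def check_output_location(output: Path, *inputs: Path) -> None:
    """Refuse to write the diff tree inside either input tree."""
    for src in inputs:
        if is_inside(output, src):
            raise ValueError(f"output folder {output} is inside input folder {src}")


def pair_renames(deleted: Dict[str, str], new: Dict[str, str]) -> List[Tuple[str, str]]:
    """Match deleted and new files with identical digests (path -> hash maps)."""
    new_by_hash: Dict[str, List[str]] = {}
    for path, digest in sorted(new.items()):
        new_by_hash.setdefault(digest, []).append(path)
    pairs = []
    for old_path, digest in sorted(deleted.items()):
        candidates = new_by_hash.get(digest)
        if candidates:
            # Each new file is consumed at most once.
            pairs.append((old_path, candidates.pop(0)))
    return pairs
```

The NUL-byte test is the same cheap heuristic many diff tools use: text files almost never contain a zero byte, while most binary formats do within the first few kilobytes. Exact-hash rename pairing misses files that were renamed *and* edited; catching those would need a similarity threshold instead.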
Licensed under the MIT License: free to use and modify.