
Python tool that compares two directory trees and generates a third containing only the differences with line-by-line annotations respecting each file type's comment syntax.


ThalesMMS/BigDiff


BigDiff

BigDiff is a cross-platform Python tool that compares two directory trees and generates a third containing only the differences, in a human-readable and auditable form.

It is useful when you need to track new files, deletions, modifications, or even line-by-line changes with annotations that respect the comment syntax of each file type.


✨ Features

  • Compares folder1 (base) and folder2 (target), producing folder3 (output).
  • Unchanged files → omitted from the output.
  • New files → copied with .new suffix.
  • Deleted files → copied with .deleted suffix.
  • Deleted directories → appear with .deleted suffix, including their contents.
  • Modified files:
    • Copied with .modified suffix.
    • Line-level diff is embedded directly in the file:
      • Deleted lines → commented with DELETED.
      • Added lines → preserved with NEW annotation.
      • Unchanged lines remain unchanged for context.
    • Comment syntax matches file type (#, //, /* */, <!-- -->, etc).
  • Binary or oversized files:
    • Copied directly with .modified suffix.
    • An extra .NOTE.txt file explains that line-level diff was skipped.
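
The classification rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code in bigdiff.py: the function names `file_digest`, `list_files`, and `classify` are hypothetical, and the sketch ignores directories, ignore patterns, and end-of-line normalization.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def list_files(root: Path) -> set:
    """Relative POSIX-style paths of every regular file under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def classify(base: Path, target: Path) -> dict:
    """Map each relative path to 'new', 'deleted', or 'modified'.

    Unchanged files are left out, mirroring the rule that they are omitted
    from the output folder.
    """
    base_files, target_files = list_files(base), list_files(target)
    result = {}
    for rel in sorted(base_files | target_files):
        if rel not in base_files:
            result[rel] = "new"
        elif rel not in target_files:
            result[rel] = "deleted"
        elif file_digest(base / rel) != file_digest(target / rel):
            result[rel] = "modified"
    return result
```

Calling `classify(Path("./before"), Path("./after"))` would return a dictionary such as `{"notes.txt": "new", "config.ini": "deleted"}`, which a writer step could then turn into the suffixed copies described above.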

📦 Installation

Clone this repository or download the script directly:

git clone https://github.com/ThalesMMS/BigDiff.git
cd BigDiff
chmod +x bigdiff.py

Or just grab bigdiff.py and run it with Python 3.8+.


🚀 Usage

python bigdiff.py FOLDER1 FOLDER2 FOLDER3 [options]

Examples

# Basic comparison
python bigdiff.py ./before ./after ./diff_out

# Normalize line endings and ignore temporary files
python bigdiff.py ./a ./b ./out --normalize-eol --ignore ".git,__pycache__,*.log"

# Dry-run (does not write, only shows the plan)
python bigdiff.py ./a ./b ./out --dry-run

⚙️ Options

  • --ignore, -i : glob patterns to ignore (repeatable or comma-separated).
  • --normalize-eol, -E : normalize CRLF/LF before comparing text.
  • --max-text-size, -S : maximum file size for line-level text diff (default: 5 MB).
  • --dry-run : only show what would be done, no output written.
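
For readers curious how such options might be wired up, here is a hedged sketch using `argparse` and `fnmatch` from the standard library. It is not taken from bigdiff.py: the helper names `build_parser`, `split_patterns`, and `is_ignored` are hypothetical, and it assumes `--max-text-size` is given in bytes and that a pattern may match either the whole relative path or any single path component.

```python
import argparse
import fnmatch


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the flags listed in the Options section."""
    parser = argparse.ArgumentParser(prog="bigdiff.py")
    parser.add_argument("folder1", help="base tree")
    parser.add_argument("folder2", help="target tree")
    parser.add_argument("folder3", help="output tree")
    # Repeatable flag: each use appends one (possibly comma-separated) value.
    parser.add_argument("--ignore", "-i", action="append", default=[])
    parser.add_argument("--normalize-eol", "-E", action="store_true")
    # Assumption: the size is expressed in bytes, and 5 MB means 5 * 1024 * 1024.
    parser.add_argument("--max-text-size", "-S", type=int, default=5 * 1024 * 1024)
    parser.add_argument("--dry-run", action="store_true")
    return parser


def split_patterns(values):
    """Flatten repeated and comma-separated --ignore values into one list."""
    patterns = []
    for value in values:
        patterns.extend(p.strip() for p in value.split(",") if p.strip())
    return patterns


def is_ignored(rel_path: str, patterns) -> bool:
    """True if the path, or any of its components, matches a glob pattern."""
    parts = rel_path.split("/")
    return any(
        fnmatch.fnmatch(rel_path, pat) or any(fnmatch.fnmatch(part, pat) for part in parts)
        for pat in patterns
    )
```

With this sketch, `-i ".git,__pycache__" -i "*.log"` yields the three patterns `.git`, `__pycache__`, and `*.log`, so both the repeatable and the comma-separated forms behave the same way.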

📝 Example Output

If example.py was modified:

# DELETED: print("Hello World")
print("New line added")  # NEW

If config.ini was removed:

config.ini.deleted

If notes.txt was created:

notes.txt.new
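
The annotated output for a modified file can be approximated with `difflib.ndiff`. The sketch below is illustrative only: the `COMMENT_STYLES` table and the `annotate` function are hypothetical names, the extension mapping is a small assumed subset, and falling back to `#` for unknown extensions is an assumption rather than documented behavior.

```python
import difflib

# Hypothetical mapping from file extension to (comment prefix, comment suffix).
COMMENT_STYLES = {
    ".py": ("# ", ""),
    ".js": ("// ", ""),
    ".css": ("/* ", " */"),
    ".html": ("<!-- ", " -->"),
}


def annotate(old_lines, new_lines, ext):
    """Merge two versions of a file into one annotated list of lines.

    Deleted lines are commented out and marked DELETED, added lines are kept
    and marked NEW, and unchanged lines pass through as context.
    """
    # Assumption: unknown extensions fall back to '#' comments.
    prefix, suffix = COMMENT_STYLES.get(ext, ("# ", ""))
    out = []
    for entry in difflib.ndiff(old_lines, new_lines):
        tag, text = entry[:2], entry[2:]
        if tag == "- ":
            out.append(f"{prefix}DELETED: {text}{suffix}")
        elif tag == "+ ":
            out.append(f"{text}  {prefix}NEW{suffix}")
        elif tag == "  ":
            out.append(text)
        # Lines tagged "? " are ndiff's intraline hints and are skipped.
    return out
```

Passing the two versions of example.py shown above (without trailing newlines) and the extension `.py` reproduces the two annotated lines of the example; passing `.css` would wrap the same annotations in `/* */` instead.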

🔍 Internal Strategy

  • Comparison by SHA-256 content hash: files with matching hashes are treated as unchanged.
  • Text diff via difflib.ndiff.
  • Simple heuristics to detect binary files.
  • Collision avoidance in output (creates file (1).modified, etc).
  • Ensures output folder is never inside input folders.
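
Three of the points above (binary detection, collision avoidance, and the output-folder check) lend themselves to short helpers. The following is a sketch under stated assumptions, not the logic in bigdiff.py: the names `looks_binary`, `unique_path`, and `is_inside` are hypothetical, and the binary test used here (a NUL byte or invalid UTF-8) is one common heuristic chosen for illustration.

```python
from pathlib import Path


def looks_binary(data: bytes) -> bool:
    """Heuristic: treat data as binary if it has a NUL byte or is not UTF-8."""
    if b"\x00" in data:
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def unique_path(path: Path) -> Path:
    """Return path unchanged if free, else 'name (1).ext', 'name (2).ext', ..."""
    if not path.exists():
        return path
    counter = 1
    while True:
        candidate = path.with_name(f"{path.stem} ({counter}){path.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1


def is_inside(child: Path, parent: Path) -> bool:
    """True if child is the same as parent or located anywhere beneath it."""
    child, parent = child.resolve(), parent.resolve()
    return child == parent or parent in child.parents
```

A caller could refuse to run when `is_inside(output, folder1)` or `is_inside(output, folder2)` is true. Note that if only the first few kilobytes of a file are passed to `looks_binary`, a multi-byte UTF-8 character cut at the boundary can cause a false positive, so a real implementation would need to handle that case.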

🛠️ Future Improvements

  • JSON report with statistics.
  • Parallel processing for large datasets.
  • Rename detection (delete + new → rename).
  • Plugins for binary formats (e.g., DICOM, medical imaging).

📄 License

MIT – free to use and modify.

