A modernized implementation of the DIRT algorithm (Lin and Pantel, 2001) with modifications for action recognition from natural language text.
DIRTAR implements a modified version of the DIRT (Discovery of Inference Rules from Text) algorithm specifically designed for action recognition. The system processes movie scripts and other narrative text to discover semantic relationships and inference rules for action classification.
- Modified DIRT Algorithm: Enhanced with lemmatization, constituency parsing, slot similarity, slot types, hypernyms, and semantic discrimination
- Action Recognition: Specialized for recognizing and classifying actions in narrative text
- Movie Script Processing: Optimized for processing movie script corpora
- Semantic Parsing: FrameNet-style rules for discriminating candidate nouns from slots
- Modern Python: Updated for Python 3.8+ with modern dependencies
- Python 3.8 or higher
- Virtual environment (recommended)
- Clone the repository:
git clone <repository-url>
cd DIRTAR
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Download required NLTK data:
python -c "import nltk; nltk.download('wordnet')"
- Install the package in development mode:
pip install -e .
DIRTAR/
├── src/dirtar/ # Main package source code
│ ├── __init__.py # Package initialization
│ ├── dirtar.py # Core DIRT algorithm implementation
│ ├── sentence_parser.py # Sentence and clause parsing
│ ├── semantic_parser.py # Semantic parsing and FrameNet-style rules
│ ├── sentence_splitter.py # Text preprocessing utilities
│ ├── moviescript_crawler.py # Movie corpus collection
│ ├── assign_labels_*.py # Label assignment modules
│ ├── score_labels_*.py # Evaluation modules
│ └── run_dirtar_tests.py # Test runner
├── data/ # Data directories
│ ├── experimental_labels/ # Experimental condition outputs
│ ├── scored_labels/ # Evaluation results
│ └── redo_labels_420/ # Additional label data
├── tests/ # Unit tests
├── docs/ # Documentation
├── requirements.txt # Python dependencies
├── setup.py # Package setup configuration
└── README.md # This file
dirtar.py contains the main implementation of the modified DIRT algorithm, with the following experimental conditions:
- Lemma-based processing
- Constituency parse integration
- Slot similarity calculations
- Semantic type discrimination
- Hypernym-based generalization
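For orientation, the slot and path similarity defined in the original DIRT paper (Lin and Pantel, 2001) can be sketched as follows. This is a minimal illustration of the published formulas on a toy list of triples, not the code in dirtar.py. The function names and the toy data are hypothetical, and clamping negative mutual information to zero is a choice made for this sketch.

```python
import math
from collections import Counter

def build_counts(triples):
    """Count slot fillers from (x, path, y) triples."""
    psw, ps, sw, s = Counter(), Counter(), Counter(), Counter()
    for x, path, y in triples:
        for slot, word in (("X", x), ("Y", y)):
            psw[(path, slot, word)] += 1   # |path, slot, word|
            ps[(path, slot)] += 1          # |path, slot, *|
            sw[(slot, word)] += 1          # |*, slot, word|
            s[slot] += 1                   # |*, slot, *|
    return psw, ps, sw, s

def mutual_info(counts, path, slot, word):
    """mi(p, s, w) = log(|p,s,w| * |*,s,*| / (|p,s,*| * |*,s,w|))."""
    psw, ps, sw, s = counts
    joint = psw[(path, slot, word)]
    if joint == 0:
        return 0.0
    mi = math.log(joint * s[slot] / (ps[(path, slot)] * sw[(slot, word)]))
    return max(0.0, mi)  # sketch-only choice: ignore negative associations

def slot_similarity(counts, p1, p2, slot):
    """Shared-filler MI mass divided by the total MI mass of both slots."""
    psw = counts[0]
    fillers1 = {w for (p, sl, w) in psw if p == p1 and sl == slot}
    fillers2 = {w for (p, sl, w) in psw if p == p2 and sl == slot}
    mi1 = {w: mutual_info(counts, p1, slot, w) for w in fillers1}
    mi2 = {w: mutual_info(counts, p2, slot, w) for w in fillers2}
    total = sum(mi1.values()) + sum(mi2.values())
    if total == 0:
        return 0.0
    return sum(mi1[w] + mi2[w] for w in fillers1 & fillers2) / total

def path_similarity(counts, p1, p2):
    """Geometric mean of the X-slot and Y-slot similarities."""
    sim_x = slot_similarity(counts, p1, p2, "X")
    sim_y = slot_similarity(counts, p1, p2, "Y")
    return math.sqrt(sim_x * sim_y)

# Toy data, invented for illustration only.
triples = [
    ("hero", "X shoots Y", "villain"),
    ("sheriff", "X shoots Y", "outlaw"),
    ("hero", "X fires at Y", "villain"),
    ("sheriff", "X fires at Y", "bandit"),
    ("hero", "X kisses Y", "bride"),
]
counts = build_counts(triples)
print(path_similarity(counts, "X shoots Y", "X fires at Y"))
```

Two paths score highly only when both their X fillers and their Y fillers overlap, which is why the geometric mean is used rather than a sum.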
sentence_parser.py handles sentence and clause parsing:
- Parses movie scripts into sentences and clauses
- Uses constituency parsing for clause extraction
- Outputs structured clause triples
semantic_parser.py handles semantic parsing:
- Hand-written FrameNet-style rules
- Discriminates candidate nouns from slots
- Supports experimental semantic conditions
- Movie Corpus Collection: moviescript_crawler.py
- Sentence Splitting: sentence_splitter.py
- Label Assignment: assign_labels_*.py
- Evaluation: score_labels_*.py
from dirtar import dirtar
# Load and process corpus
database = dirtar.readCorpus('movie_clauses.txt')
# Run DIRT algorithm with experimental conditions
results = dirtar.run_experiments(database)
# Process movie scripts
python src/dirtar/moviescript_crawler.py
# Parse sentences into clauses
python src/dirtar/sentence_parser.py
# Run DIRT algorithm
python src/dirtar/dirtar.py
# Assign labels for evaluation
python src/dirtar/assign_labels_moviedirt.py
# Score results
python src/dirtar/score_labels_dirtar.py
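The five commands above can also be chained from Python so that the run stops at the first failing step. The following is a minimal sketch that assumes each script runs without arguments from the repository root, as shown above. The `run_pipeline` helper and its `runner` parameter are hypothetical and not part of the package.

```python
import subprocess
import sys

# Script order taken from the command list in this README.
PIPELINE = [
    "src/dirtar/moviescript_crawler.py",      # collect movie scripts
    "src/dirtar/sentence_parser.py",          # parse sentences into clauses
    "src/dirtar/dirtar.py",                   # run the DIRT algorithm
    "src/dirtar/assign_labels_moviedirt.py",  # assign labels for evaluation
    "src/dirtar/score_labels_dirtar.py",      # score results
]

def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script with the current interpreter, stopping on failure."""
    for script in scripts:
        print(f"Running {script}")
        # check=True raises CalledProcessError on a non-zero exit code.
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` from the repository root runs all five steps in order. The `runner` argument exists only so the sequence can be dry-run or tested without launching the scripts.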
- IE_sent_key.txt: Test sentences from DUEL corpus with action class labels
- movie_combo.txt: Combined movie script corpus (not included due to size)
- movie_clauses.txt: Preprocessed clause triples with parse annotations
- experimental_labels/: Text files for each experimental condition
- scored_labels/: F-score evaluations per experimental condition
- dirtar_database_*.pkl: Serialized DIRT databases
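The serialized databases can be inspected with the standard `pickle` module. The sketch below is hedged on two points this README does not state: the directory that holds the files (the `directory` argument is a placeholder) and the type of the pickled objects. Unpickling may also require the `dirtar` package to be importable, and pickle files from untrusted sources should not be loaded.

```python
import glob
import os
import pickle

def load_databases(directory="data"):
    """Load every dirtar_database_*.pkl file found in a directory.

    Returns a dict mapping file name to the unpickled object.
    The default directory is a guess, not a documented location.
    """
    databases = {}
    pattern = os.path.join(directory, "dirtar_database_*.pkl")
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as handle:
            databases[os.path.basename(path)] = pickle.load(handle)
    return databases
```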
The system supports multiple experimental conditions:
- Baseline DIRT: Standard algorithm
- Lemma Integration: Using lemmatized forms
- Slot Similarity: Enhanced slot matching
- Semantic Types: Type-based discrimination
- Hypernym Generalization: WordNet-based generalization
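This README does not describe how hypernym generalization is implemented. As a rough illustration of the idea only, slot fillers can be replaced by a more general term so that rare fillers share counts. In the sketch below the hand-built `HYPERNYMS` map is a stand-in for WordNet lookups, and all names are hypothetical.

```python
# Toy child -> parent map standing in for WordNet hypernym lookups.
HYPERNYMS = {
    "pistol": "gun",
    "revolver": "gun",
    "gun": "weapon",
    "knife": "weapon",
}

def generalize(word, levels=1, hypernyms=HYPERNYMS):
    """Climb up to `levels` steps in the hypernym map; stop at unknown words."""
    for _ in range(levels):
        parent = hypernyms.get(word)
        if parent is None:
            break
        word = parent
    return word

def generalize_triple(triple, levels=1):
    """Generalize both slot fillers of an (x, path, y) triple; keep the path."""
    x, path, y = triple
    return (generalize(x, levels), path, generalize(y, levels))

print(generalize_triple(("hero", "X draws Y", "revolver")))
```

After generalization, "pistol" and "revolver" both count as "gun" in the Y slot, so paths that take either filler gain overlap.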
The system evaluates action recognition performance using:
- F-score calculation for each action class
- Overall performance across all experimental conditions
- Per-action analysis for detailed evaluation
Results are saved in the scored_labels/ directory with detailed breakdowns.
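A per-class F-score can be computed along the following lines. This is a minimal sketch that assumes one gold label and one predicted label per test sentence and uses the balanced F1 measure. The scoring scripts in this repository may differ, and the function name is hypothetical.

```python
from collections import Counter

def per_class_f1(gold, predicted):
    """Return {label: (precision, recall, f1)} for parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = {}
    for label in set(gold) | set(predicted):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores

# Toy labels, invented for illustration only.
gold = ["shoot", "shoot", "kiss", "kiss"]
predicted = ["shoot", "kiss", "kiss", "kiss"]
for label, (p, r, f) in sorted(per_class_f1(gold, predicted).items()):
    print(f"{label}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```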
- nltk>=3.8: Natural language processing
- pycorenlp>=0.3.0: Python client for a Stanford CoreNLP server
- setuptools>=65.0: Package management
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this work, please cite:
@misc{winer2017dirtar,
title={DIRTAR: Discovery of Inference Rules from Text for Action Recognition},
author={Winer, David},
year={2017},
note={Modernized implementation 2024}
}
Based on the DIRT algorithm:
- Lin, D., & Pantel, P. (2001). DIRT - Discovery of Inference Rules from Text. ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
For questions or issues, please contact David Winer.
- Modernized for Python 3.8+
- Reorganized project structure
- Updated dependencies
- Added proper packaging
- Enhanced documentation
- Fixed compatibility issues
- Original implementation
- Core DIRT algorithm
- Movie script processing
- Action recognition evaluation