A comprehensive system for evaluating and comparing different information retrieval methods on product search datasets, specifically designed for Amazon ESCI (Shopping Queries Dataset).
This package compares three search approaches:
- BM25: Traditional keyword-based search
- SBERT: Dense vector search using sentence transformers
- Hybrid: Combination of BM25 + SBERT using Reciprocal Rank Fusion
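As a sketch of how Reciprocal Rank Fusion combines the BM25 and SBERT result lists (the doc IDs below are made up for illustration; `k=60` is the constant from the original RRF paper):

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists by summing 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d1", "d2", "d3"]   # hypothetical keyword results
sbert_ranking = ["d3", "d1", "d4"]  # hypothetical dense results
fused = rrf_fuse([bm25_ranking, sbert_ranking])
```

Because `d1` sits near the top of both lists, it ends up first after fusion, even though neither ranker placed it first in isolation.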
## Requirements

- Python 3.8+
- Elasticsearch 8.0+ running on localhost:9200
- Your data files:
  - Product data in JSONL format
  - Queries in ESCI JSON format
## Installation

```bash
# Clone or download this package
cd ir_evaluation_package

# Install dependencies
pip install -r requirements.txt

# Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"

# Download spaCy model (optional)
python -m spacy download en_core_web_sm

# Start Elasticsearch (adjust for your installation)
# Default: localhost:9200

# Verify it's running
curl -X GET "localhost:9200/"
```
## Quick Start

Update the file paths in `main_pipeline.py`:

```python
# Update these paths
jsonl_file = "data/your_products.jsonl"   # Product data
queries_file = "data/your_queries.json"   # ESCI queries
```

Then run the full pipeline:

```bash
python main_pipeline.py
```
## Project Structure

```
ir_evaluation_package/
├── main_pipeline.py        # Main execution script
├── ir_setup.py             # Data preprocessing & Elasticsearch setup
├── search_engines.py       # Search method implementations
├── evaluation.py           # Run generation & evaluation
├── advanced_analysis.py    # Advanced analysis & visualizations
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── example_config.py       # Configuration examples
├── data/                   # Your data files
│   ├── products.jsonl
│   └── queries.json
└── results/                # Generated results
    ├── summary_report.txt
    ├── comparison_table.csv
    └── *.png
```
## Data Formats

Product data (JSONL, one record per line):

```json
{"id": "B0013REMTE", "title": "Product Title", "description": "Product description...", "full_text": "Complete product text..."}
```
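A quick way to sanity-check that a product file matches this schema before indexing; the field names come from the example record above, and `sample_products.jsonl` is just an illustrative filename:

```python
import json

# Field names taken from the example product record
REQUIRED_FIELDS = {"id", "title", "description", "full_text"}

def load_products(path):
    """Load a JSONL product file, raising if a record misses a required field."""
    products = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            record = json.loads(line)
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                raise ValueError(f"line {line_no} missing fields: {missing}")
            products.append(record)
    return products

# Tiny demo file
with open("sample_products.jsonl", "w", encoding="utf-8") as f:
    f.write('{"id": "B0013REMTE", "title": "Product Title", '
            '"description": "...", "full_text": "..."}\n')
products = load_products("sample_products.jsonl")
```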
Queries (ESCI JSON format):

```json
[
  {
    "query": "search query text",
    "query_id": 1,
    "product_asins": ["B0013REMTE"],
    "esci_labels": ["E"],
    "product_locales": ["us"]
  }
]
```
ESCI labels map to graded relevance scores:

- E (Exact): 3 relevance points
- S (Substitute): 2 relevance points
- C (Complement): 1 relevance point
- I (Irrelevant): 0 relevance points
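The label-to-gain mapping above can be applied directly when turning query records into graded qrels; a minimal sketch, with field names following the example query format:

```python
# E/S/C/I gains from the mapping above
GAIN = {"E": 3, "S": 2, "C": 1, "I": 0}

def build_qrels(queries):
    """Map each query's (ASIN, ESCI label) pairs to graded relevance."""
    qrels = {}
    for q in queries:
        qid = str(q["query_id"])
        for asin, label in zip(q["product_asins"], q["esci_labels"]):
            qrels.setdefault(qid, {})[asin] = GAIN[label]
    return qrels

queries = [{"query": "search query text", "query_id": 1,
            "product_asins": ["B0013REMTE"], "esci_labels": ["E"]}]
qrels = build_qrels(queries)  # {"1": {"B0013REMTE": 3}}
```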
## Output Files

- `summary_report.txt`: Executive summary with key findings
- `comparison_table.csv`: Side-by-side performance metrics
- `evaluation_results.csv`: Detailed metrics for all methods
- `per_query_results.csv`: Per-query performance breakdown
- `performance_comparison.png`: Visualization charts
- `run_*.txt`: TREC format runs for each method
- `significance_tests.json`: Statistical significance results
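For reference, the `run_*.txt` files follow the standard six-column TREC run format (`qid Q0 docid rank score tag`). A minimal writer sketch, using made-up run data:

```python
def write_trec_run(path, run, tag):
    """run: {qid: [(doc_id, score), ...]}, each list sorted by descending score."""
    with open(path, "w", encoding="utf-8") as f:
        for qid, ranked in run.items():
            for rank, (doc_id, score) in enumerate(ranked, start=1):
                # Standard TREC columns: qid Q0 docid rank score tag
                f.write(f"{qid} Q0 {doc_id} {rank} {score} {tag}\n")

write_trec_run("run_bm25.txt", {"1": [("B0013REMTE", 12.5)]}, "bm25")
```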
## Configuration

Edit `main_pipeline.py` to adjust:

- File paths
- Elasticsearch connection
- Processing parameters
- Search parameters

Or create `config.py`:

```python
config = {
    'max_products': 10000,  # Limit for testing
    'sentence_model': 'all-MiniLM-L6-v2',
    'use_stemming': True,
    'remove_stopwords': True,
    'top_k': 1000,
    'embedding_batch_size': 32
}
```
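A dependency-free sketch of what the `use_stemming` and `remove_stopwords` options control (the real pipeline uses NLTK's tokenizer, stopword list, and stemmer; the tiny stopword set and trailing-"s" rule here are toy stand-ins):

```python
# Toy stand-in for NLTK's English stopword list
STOPWORDS = {"a", "an", "the", "for", "of", "and"}

def preprocess(text, use_stemming=True, remove_stopwords=True):
    """Lowercase, tokenize, and optionally drop stopwords / stem tokens."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        # Crude stand-in for Porter stemming: strip a trailing "s"
        tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    return tokens
```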
## Evaluation Metrics

The system evaluates using standard IR metrics:
- MAP: Mean Average Precision
- nDCG@k: Normalized Discounted Cumulative Gain
- P@k: Precision at k
- R@k: Recall at k
- MRR: Mean Reciprocal Rank
- Success@k: Success rate at k
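To make the graded metrics concrete, here is a hand-rolled nDCG@k (the pipeline itself computes metrics via `ir-measures`); `gains` are the E/S/C/I relevance points of the retrieved documents in rank order:

```python
import math

def ndcg_at_k(gains, k):
    """nDCG@k over a list of relevance gains in retrieved order."""
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# An exact match ranked first is ideal; demoting it lowers the score
perfect = ndcg_at_k([3, 2, 0], 3)  # 1.0
swapped = ndcg_at_k([0, 2, 3], 3)
```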
## Visualizations

Automatically generated visualizations include:
- Performance comparison charts
- Query difficulty analysis
- System agreement heatmaps
- Per-query performance distributions
- ESCI label breakdown analysis
## Running Individual Components

```bash
# Setup only
python ir_setup.py

# Test search engines
python search_engines.py

# Generate runs and evaluate
python evaluation.py

# Advanced analysis
python advanced_analysis.py
```
## Custom Analysis

```python
from ir_setup import setup_ir_system
from search_engines import SearchEngineManager
from evaluation import QrelsManager, EvaluationManager

# Setup your custom analysis
es_manager, preprocessor, processor = setup_ir_system("data/products.jsonl")
search_manager = SearchEngineManager(es_manager.es)

# Run custom searches
results = search_manager.search_all("your query", top_k=100)
```
## Troubleshooting

- **Elasticsearch Connection Error**

  ```bash
  # Check if ES is running
  curl -X GET "localhost:9200/"

  # Check logs
  tail -f /path/to/elasticsearch/logs/elasticsearch.log
  ```

- **Memory Issues**

  ```python
  # Reduce batch sizes in config
  config['max_products'] = 1000
  config['embedding_batch_size'] = 16
  ```

- **NLTK Download Issues**

  ```python
  import nltk
  nltk.download('punkt')
  nltk.download('stopwords')
  ```
## Performance Tips

- Start with smaller datasets (`max_products=1000`)
- Use faster embedding models for testing
- Increase batch sizes if you have more RAM
- Use SSD storage for Elasticsearch
## Dependencies

Core libraries:

- `elasticsearch`: Search engine interface
- `sentence-transformers`: Dense vector embeddings
- `ir-measures`: Standard IR evaluation metrics
- `pandas`/`numpy`: Data processing
- `matplotlib`/`seaborn`: Visualizations
- `scipy`: Statistical testing
- `nltk`: Text preprocessing
## Extending the System

To extend the system:

- Add new search methods in `search_engines.py`
- Add new metrics in `evaluation.py`
- Add new visualizations in `advanced_analysis.py`
- Update the main pipeline accordingly
## License

This package is provided for research and educational purposes.
## Support

For issues:

- Check the logs in `ir_evaluation.log`
- Verify your data format matches the expected input
- Ensure all dependencies are installed correctly
- Check Elasticsearch is running and accessible
Happy Information Retrieval! 🔍📊