A comprehensive Medical Retrieval-Augmented Generation (RAG) application that allows users to upload medical PDF documents and query them using natural language through an AI-powered interface.
- PDF Processing: Extract and process medical documents with intelligent text cleaning
- Vector Database: Store document embeddings using ChromaDB for efficient similarity search
- Medical Context Awareness: Specialized prompt engineering for medical document queries
- Interactive Web Interface: User-friendly Streamlit interface for document upload and querying
- Source Attribution: Always cite the source documents when providing answers
- Medical Term Normalization: Automatically expand common medical abbreviations (a sketch of this idea follows the list)
- Advanced Search: Configurable similarity thresholds and retrieval parameters
- Batch Processing: Support for multiple document uploads
- System Monitoring: Comprehensive logging and system statistics
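Conceptually, the abbreviation expansion mentioned in the Medical Term Normalization item is a lookup table applied to query text. The sketch below is illustrative only: the MEDICAL_ABBREVIATIONS table and the normalize_medical_terms function are hypothetical names, not the project's actual code.

```python
import re

# Hypothetical abbreviation map; the project's real table may differ.
MEDICAL_ABBREVIATIONS = {
    "htn": "hypertension",
    "dm": "diabetes mellitus",
    "mi": "myocardial infarction",
    "copd": "chronic obstructive pulmonary disease",
    "bp": "blood pressure",
}

def normalize_medical_terms(text: str) -> str:
    """Expand known medical abbreviations, leaving the rest of the text untouched."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        return MEDICAL_ABBREVIATIONS.get(word.lower(), word)
    return re.sub(r"\b\w+\b", replace, text)

print(normalize_medical_terms("Patient has HTN and DM."))
# -> "Patient has hypertension and diabetes mellitus."
```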
- Python 3.12 or higher
- OpenAI API key
- Medical PDF documents to query
- Clone and navigate to the project:
  cd /path/to/your/project
- Install dependencies:
  pip install -e .
- Set up environment variables: create a .env file in the project root with the following values (a quick check of these values is sketched after this list):
  OPENAI_API_KEY=your_openai_api_key_here
  OPENAI_MODEL=gpt-3.5-turbo
  OPENAI_MAX_TOKENS=1000
  OPENAI_TEMPERATURE=0.7
- Run the application:
  streamlit run main.py
- Open your browser and navigate to http://localhost:8501
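As noted in the environment setup step, you can confirm that the values in .env are visible to Python before launching the app. This check assumes the python-dotenv package is installed; how MEDRAG itself loads its configuration is not shown here, so treat it as a standalone sanity check.

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Read key/value pairs from .env into the process environment.
load_dotenv()

print("OPENAI_API_KEY set:", bool(os.getenv("OPENAI_API_KEY")))
print("Model:", os.getenv("OPENAI_MODEL", "gpt-3.5-turbo"))
```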
- Click "Browse files" to select one or more PDF documents
- Supported format: PDF files
- Documents are automatically processed and added to the vector database
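Behind that automatic processing, the documented CHUNK_SIZE and CHUNK_OVERLAP settings suggest a sliding-window split of the extracted text before embedding. The sketch below illustrates the idea with pypdf; the function name chunk_pdf is hypothetical, and pdf_processor.py may use a different library and cleaning rules.

```python
from pypdf import PdfReader  # pip install pypdf; the project may use another library

def chunk_pdf(path: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Extract text from a PDF and split it into overlapping fixed-size chunks."""
    reader = PdfReader(path)
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

# Example: chunks = chunk_pdf("cardiology_guidelines.pdf")
```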
- Type your medical question in the text area
- Examples:
- "What are the symptoms of diabetes?"
- "What medications are mentioned in the documents?"
- "What are the treatment options for hypertension?"
- AI Response: The main answer based on your documents
- Sources: Expandable sections showing the source documents and relevance scores
- Advanced Options: Adjust retrieval parameters for better results
- Sidebar: Shows document count and processed files
- Settings: Clear all documents or view system statistics
| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | Required | Your OpenAI API key |
| OPENAI_MODEL | gpt-3.5-turbo | OpenAI model to use |
| OPENAI_MAX_TOKENS | 1000 | Maximum tokens for responses |
| OPENAI_TEMPERATURE | 0.7 | Response creativity (0-1) |
| VECTOR_STORE_PATH | ./vector_store | Path for ChromaDB storage |
| EMBEDDING_MODEL | sentence-transformers/all-MiniLM-L6-v2 | Embedding model |
| CHUNK_SIZE | 1000 | Text chunk size for processing |
| CHUNK_OVERLAP | 200 | Overlap between chunks |
| TOP_K_RETRIEVAL | 5 | Number of documents to retrieve |
| SIMILARITY_THRESHOLD | 0.7 | Minimum similarity score |
You can modify the configuration in medrag/utils/config.py
or set environment variables for runtime configuration.
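The configuration module is expected to read these variables from the environment and fall back to the defaults in the table above. Here is a minimal sketch of that pattern; the field names mirror the table, but the actual medrag/utils/config.py may be structured differently.

```python
import os
from dataclasses import dataclass

@dataclass
class Config:
    """Runtime configuration with environment-variable overrides (illustrative)."""
    openai_api_key: str = os.getenv("OPENAI_API_KEY", "")  # required, no default
    openai_model: str = os.getenv("OPENAI_MODEL", "gpt-3.5-turbo")
    openai_max_tokens: int = int(os.getenv("OPENAI_MAX_TOKENS", "1000"))
    openai_temperature: float = float(os.getenv("OPENAI_TEMPERATURE", "0.7"))
    vector_store_path: str = os.getenv("VECTOR_STORE_PATH", "./vector_store")
    embedding_model: str = os.getenv("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
    chunk_size: int = int(os.getenv("CHUNK_SIZE", "1000"))
    chunk_overlap: int = int(os.getenv("CHUNK_OVERLAP", "200"))
    top_k_retrieval: int = int(os.getenv("TOP_K_RETRIEVAL", "5"))
    similarity_threshold: float = float(os.getenv("SIMILARITY_THRESHOLD", "0.7"))

config = Config()
```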
MEDRAG Application
├── main.py # Main Streamlit application
├── medrag/
│ ├── core/
│ │ ├── pdf_processor.py # PDF text extraction and chunking
│ │ ├── vector_store.py # ChromaDB vector database management
│ │ ├── llm_client.py # OpenAI API integration
│ │ └── rag_system.py # RAG pipeline orchestration
│ └── utils/
│ └── config.py # Configuration management
└── vector_store/ # ChromaDB data storage
- PDFProcessor: Handles PDF text extraction, cleaning, and chunking
- VectorStore: Manages document embeddings and similarity search
- LLMClient: Communicates with OpenAI for response generation
- RAGSystem: Orchestrates the complete RAG pipeline
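For orientation, here is a minimal retrieve-then-generate sketch built directly on ChromaDB and the OpenAI client. It mirrors what the components above are described as doing, but it does not reproduce the actual interfaces in medrag/core, so the collection name, prompt, and answer function are illustrative only.

```python
import chromadb
from openai import OpenAI

# Local persistent vector store (mirrors the VECTOR_STORE_PATH default).
client = chromadb.PersistentClient(path="./vector_store")
collection = client.get_or_create_collection(name="medical_documents")  # name is illustrative

# Chunks must already be stored, e.g.:
# collection.add(documents=chunks, ids=[f"doc-{i}" for i in range(len(chunks))])

def answer(question: str, top_k: int = 5) -> str:
    """Retrieve the most similar chunks, then ask the LLM to answer from them."""
    results = collection.query(query_texts=[question], n_results=top_k)
    context = "\n\n".join(results["documents"][0])
    prompt = (
        "Answer the medical question using only the context below and cite the sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    llm = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = llm.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000,
        temperature=0.7,
    )
    return response.choices[0].message.content
```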
The application includes built-in system testing:
from medrag.core.rag_system import RAGSystem

# Initialize the pipeline (assuming a no-argument constructor), then test it
rag_system = RAGSystem()
test_results = rag_system.test_system()
print(test_results)
- Upload a medical PDF document
- Ask a question about the document content
- Verify the response cites the source document
- Check that the answer is based on the uploaded content
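If you prefer to script this check, something like the following could work; add_document and query are hypothetical method names standing in for whatever ingestion and query entry points rag_system.py actually exposes.

```python
from medrag.core.rag_system import RAGSystem

rag_system = RAGSystem()  # assuming a no-argument constructor

# Hypothetical method names; substitute the real ingestion/query entry points.
rag_system.add_document("diabetes_overview.pdf")
result = rag_system.query("What are the symptoms of diabetes?")

print(result)  # the answer should cite diabetes_overview.pdf as a source
```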
- Local Processing: All document processing happens locally
- No Data Sharing: Full documents are never uploaded to external services; only the retrieved text chunks included in a query prompt are sent to OpenAI
- Secure Storage: Vector database is stored locally
- API Key Protection: Use environment variables for sensitive configuration
This application is for educational and research purposes only. It should not be used for medical diagnosis or treatment decisions. Always consult qualified healthcare professionals for medical advice.
- The quality of responses depends on the uploaded documents
- Complex medical queries may require multiple documents for comprehensive answers
- The system works best with well-structured medical documents
- Large PDF files may take longer to process
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues:
- Check the console logs for error messages
- Verify your OpenAI API key is valid (a quick standalone check is sketched after this list)
- Ensure your PDF documents are not password-protected
- Check that you have sufficient disk space for the vector database
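For the API key item above, a standalone check with the official openai package can quickly rule the key in or out before digging into the app itself:

```python
from openai import OpenAI

# Raises an authentication error if OPENAI_API_KEY is missing or invalid.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
models = client.models.list()
print(f"API key OK ({len(models.data)} models visible)")
```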
- v0.1.0: Initial release with basic RAG functionality
- Future versions will include:
- Support for more document formats
- Advanced medical term recognition
- Multi-language support
- Enhanced UI features