An AI-powered file analysis tool that converts various document formats into clear, concise summaries using OpenAI.
- 📄 Supports multiple file formats (PDF, DOCX, PPTX, Images, etc.)
- 🤖 AI-powered document summarization
- 🖼️ Intelligent image analysis and text extraction
- 📊 Text extraction from various formats
- 📱 Mobile-friendly interface
- 🔄 Conversion history tracking
- 🔒 Secure file handling
- ⚡ Fast processing
- Python 3.8 or higher
- OpenAI API key
- Docker (optional, for containerized deployment)
- Clone the repository:

  ```bash
  git clone https://github.com/timtech4u/ai-file-analyzer.git
  cd ai-file-analyzer
  ```
- Create and activate a virtual environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables: create a `.env` file in the project root (a sketch of how these values might be read follows these steps):

  ```
  OPENAI_API_KEY=your-api-key-here
  DEBUG=False
  MAX_FILE_SIZE=10  # in MB
  ```
- Start the Streamlit app:

  ```bash
  streamlit run app.py
  ```

- Visit `http://localhost:8501` in your web browser.
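For reference, here is a minimal sketch of how the variables from the `.env` example above might be read at startup. It assumes `python-dotenv` is installed (e.g. via `requirements.txt`); the variable names match the example, but the snippet itself is illustrative rather than the app's actual code:

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

# Pull the variables from .env into the process environment.
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")               # required
DEBUG = os.getenv("DEBUG", "False").lower() == "true"      # optional, defaults to False
MAX_FILE_SIZE_MB = int(os.getenv("MAX_FILE_SIZE", "10"))   # optional, upload limit in MB

if not OPENAI_API_KEY:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")
```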
The easiest way to run File Analyzer is using our pre-built Docker image:
```bash
docker pull ghcr.io/timtech4u/ai-file-analyzer:latest
docker run -p 8501:8501 -e OPENAI_API_KEY=your-api-key ghcr.io/timtech4u/ai-file-analyzer:latest
```
Available tags:

- `latest`: Latest stable release
- `vX.Y.Z`: Specific version releases
- `main`: Latest development build
If you prefer to build the image yourself:
- Build the Docker image:

  ```bash
  docker build -t file-analyzer .
  ```

- Run the container:

  ```bash
  docker run -p 8501:8501 -e OPENAI_API_KEY=your-api-key file-analyzer
  ```
The application processes files through four stages (a minimal sketch follows the list):
- Upload: Files are temporarily stored and validated
- Processing: Content is extracted based on file type
- Analysis: AI generates summaries and extracts key information
- Response: Results are displayed and stored in history
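Below is a minimal, self-contained sketch of these four stages. The class and function names are illustrative assumptions, not the actual structure of `app.py` (which supports more formats and calls the OpenAI API in the analysis step):

```python
from dataclasses import dataclass
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".jpg", ".png", ".csv", ".json"}
MAX_FILE_SIZE_MB = 10


@dataclass
class AnalysisResult:
    filename: str
    summary: str


def validate(path: Path) -> None:
    """Upload stage: reject unsupported or oversized files."""
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {path.suffix}")
    if path.stat().st_size > MAX_FILE_SIZE_MB * 1024 * 1024:
        raise ValueError("File exceeds the configured size limit")


def extract_content(path: Path) -> str:
    """Processing stage: placeholder for format-specific text extraction."""
    return path.read_text(errors="ignore")


def summarize(text: str) -> str:
    """Analysis stage: stand-in for the OpenAI summarization call."""
    return text[:200]


def analyze_file(path: Path) -> AnalysisResult:
    """Run the full pipeline and return a result for display and history."""
    validate(path)                              # Upload
    text = extract_content(path)                # Processing
    summary = summarize(text)                   # Analysis
    return AnalysisResult(path.name, summary)   # Response
```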
Supported file types (an extraction sketch follows the table):

| Type | Extensions | Features |
|---|---|---|
| Documents | PDF, DOCX, PPTX, XLSX | Full text extraction, summarization |
| Images | JPG, PNG | OCR, content description |
| Audio | MP3, WAV | Transcription (coming soon) |
| Web/Data | HTML, CSV, JSON, XML | Structure preservation |
| Archives | ZIP | Content listing |
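Since the project relies on MarkItDown for file conversion (see the acknowledgements below), extraction for most of the document and data formats above likely reduces to a call along these lines; the exact usage inside `app.py` may differ:

```python
from markitdown import MarkItDown  # the converter credited below for file conversion

md = MarkItDown()
result = md.convert("report.pdf")   # also handles DOCX, PPTX, XLSX, HTML, CSV, and more
print(result.text_content[:500])    # extracted text, ready to be summarized
```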
Project layout:

```
file-analyzer/
├── app.py             # Main application
├── requirements.txt   # Dependencies
├── Dockerfile         # Container configuration
├── .env               # Environment variables
└── tests/             # Test suite
```
Run the test suite with:

```bash
pytest tests/
```
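A new test might look like the following; this example is illustrative only and defines its own tiny `validate` helper (in the spirit of the pipeline sketch above) rather than importing from `app.py`:

```python
# tests/test_validation.py -- illustrative example, not part of the existing suite
from pathlib import Path

import pytest

SUPPORTED = {".pdf", ".docx", ".pptx", ".xlsx", ".jpg", ".png"}


def validate(path: Path) -> None:
    """Tiny stand-in for the app's upload validation."""
    if path.suffix.lower() not in SUPPORTED:
        raise ValueError(f"Unsupported file type: {path.suffix}")


def test_rejects_unsupported_extension(tmp_path: Path) -> None:
    bad_file = tmp_path / "notes.xyz"
    bad_file.write_text("hello")
    with pytest.raises(ValueError):
        validate(bad_file)
```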
This project follows PEP 8 guidelines. Run linting with:
```bash
flake8 .
```
The application implements comprehensive error handling (see the sketch after this list) for:
- File validation errors
- API connection issues
- Processing timeouts
- Exceeded size limits
- Unsupported file types
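As a rough illustration, validation failures like these can be surfaced in the Streamlit UI instead of crashing the app; the helper and messages below are assumptions, not the app's actual code:

```python
import streamlit as st

MAX_FILE_SIZE_MB = 10
SUPPORTED = {".pdf", ".docx", ".pptx", ".xlsx", ".jpg", ".png", ".csv", ".json", ".xml", ".html", ".zip"}


def check_upload(name: str, size_bytes: int) -> None:
    """Raise ValueError for the validation failures listed above."""
    suffix = ("." + name.rsplit(".", 1)[-1].lower()) if "." in name else ""
    if suffix not in SUPPORTED:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_MB * 1024 * 1024:
        raise ValueError(f"File exceeds the {MAX_FILE_SIZE_MB} MB limit")


uploaded = st.file_uploader("Upload a file")
if uploaded is not None:
    try:
        check_upload(uploaded.name, uploaded.size)
        st.success("File accepted; processing would start here.")
    except ValueError as exc:
        st.error(str(exc))  # shown to the user rather than surfacing a stack trace
```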
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- Add tests for new features
- Update documentation
- Follow the existing code style
- Add type hints to new functions
- Include error handling (an illustrative example follows this list)
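As a purely illustrative example of that style (a hypothetical helper, not existing code), a new function might look like:

```python
from pathlib import Path


def extract_plain_text(path: Path, max_chars: int = 5000) -> str:
    """Read a text-based file with type hints and explicit error handling."""
    if not path.exists():
        raise FileNotFoundError(f"No such file: {path}")
    try:
        return path.read_text(errors="ignore")[:max_chars]
    except OSError as exc:
        raise RuntimeError(f"Failed to read {path.name}") from exc
```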
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Streamlit
- Powered by OpenAI
- Uses MarkItDown for file conversion
For support, please open an issue in the GitHub repository.