AI-powered PDF data extraction tool with visual verification and confidence
Precision PDF is an open-source document processing platform that extracts structured data from PDFs while showing you exactly where every piece of data comes from. Built with Next.js 15, Convex, and Clerk authentication.
- 🔍 Visual Data Verification - See exactly where extracted data comes from in the original PDF
- ⚡ Real-time Processing - Live updates as documents are processed
- 📊 Smart Table Recognition - Automatic table detection and CSV export
- 📄 Multiple Export Formats - JSON, CSV, DOCX, Markdown, Text, XLSX
- 🏥 Document Type Support - Invoices, medical records, bank statements, forms
- 📱 Multi-page Documents - Handle complex documents with multiple pages
- 🔌 API Access - Full REST API for developers
- 🎯 Interactive Demo - Try 8 real examples without signing up
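Exporting a detected table to CSV mostly comes down to quoting cells correctly. The sketch below is illustrative only. It assumes a hypothetical `Table` shape made of header strings plus rows of cell strings, because the README does not document the app's internal table model. It applies RFC 4180 quoting so that cells containing commas, quotes, or line breaks survive a round trip through a spreadsheet.

```typescript
// Hypothetical table shape: the app's real internal model is not documented,
// so this type is an assumption made for illustration.
type Table = {
  headers: string[];
  rows: string[][];
};

// Quote a single field per RFC 4180: wrap it in double quotes when it
// contains a comma, a double quote, CR, or LF, and double any embedded quotes.
function escapeCsvField(value: string): string {
  if (/[",\r\n]/.test(value)) {
    return `"${value.replace(/"/g, '""')}"`;
  }
  return value;
}

// Serialize the header row and data rows, joined with CRLF as RFC 4180 recommends.
function tableToCsv(table: Table): string {
  const lines = [table.headers, ...table.rows].map((row) =>
    row.map(escapeCsvField).join(","),
  );
  return lines.join("\r\n");
}

// Example: invoice line items, including a comma and embedded quotes in cells.
const invoiceItems: Table = {
  headers: ["Description", "Qty", "Amount"],
  rows: [
    ["Consulting, March", "10", "1500.00"],
    ['Widget "Pro"', "2", "49.98"],
  ],
};

console.log(tableToCsv(invoiceItems));
```

Running it prints the header row followed by the two line items. The first description is wrapped in quotes because it contains a comma, and the quotes inside `Widget "Pro"` are doubled.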
This repository is currently configured for easy local development with ALL AUTHENTICATION AND SECURITY FEATURES DISABLED.
For production deployment, you MUST:
- Re-enable authentication in `middleware.ts`
- Configure all environment variables properly
- Follow the Security Configuration Guide
See Security Documentation for complete details.
- Node.js (Latest LTS recommended)
- pnpm package manager
- Convex CLI (`npm install -g convex`)

```bash
# Clone the repository
git clone https://github.com/yourusername/precision-pdf.git
cd precision-pdf

# Install dependencies
pnpm install

# Set up environment variables
cp .env.example .env.local

# Initialize Convex (creates a new deployment)
npx convex dev

# Start the development server
pnpm run dev
```
Your app will be running at http://localhost:3000
Note: The FastAPI processing service is optional for local development. Example documents work without it.
```mermaid
graph TD
    A[Next.js Frontend] --> B[Convex Backend]
    A --> C[API Routes]
    C --> D[FastAPI Service]
    D --> E[Landing AI]
    B --> F[Document Storage]
    B --> G[User Management]
    A --> H[Clerk Auth - DISABLED]
    C --> I[Stripe Payments]
```
Core Components:
- Frontend: Next.js 15 with App Router and Tailwind CSS
- Backend: Convex for real-time database and serverless functions
- Authentication: Clerk (currently disabled for local development)
- Processing: External FastAPI service with Landing AI
- Payments: Stripe integration
- UI Components: shadcn/ui component library
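Visual verification amounts to drawing each extracted value's source region on top of the rendered PDF page. As a rough sketch, assume the processing service returns each extracted chunk with a zero-based page index and a bounding box in page-relative coordinates between 0 and 1. The `ExtractedChunk` and `NormalizedBox` types below are hypothetical, not the app's actual schema. Under that assumption, turning a box into an on-screen highlight is just a scale by the rendered page size.

```typescript
// Hypothetical shapes: the README does not specify the extraction payload.
// A box is assumed to use page-relative coordinates in [0, 1],
// measured from the top-left corner of the page.
type NormalizedBox = { left: number; top: number; right: number; bottom: number };

type ExtractedChunk = {
  text: string;
  page: number; // zero-based page index (assumption)
  box: NormalizedBox;
};

type PixelRect = { x: number; y: number; width: number; height: number };

// Clamp to [0, 1] so a slightly out-of-range box still lands on the page.
const clamp01 = (v: number): number => Math.min(1, Math.max(0, v));

// Map a normalized box onto a page rendered at renderWidth x renderHeight pixels.
function toPixelRect(box: NormalizedBox, renderWidth: number, renderHeight: number): PixelRect {
  const left = clamp01(box.left) * renderWidth;
  const top = clamp01(box.top) * renderHeight;
  const right = clamp01(box.right) * renderWidth;
  const bottom = clamp01(box.bottom) * renderHeight;
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Collect the highlight rectangles to draw over one rendered page.
function highlightsForPage(
  chunks: ExtractedChunk[],
  page: number,
  renderWidth: number,
  renderHeight: number,
): PixelRect[] {
  return chunks
    .filter((c) => c.page === page)
    .map((c) => toPixelRect(c.box, renderWidth, renderHeight));
}

// Illustrative data only.
const chunks: ExtractedChunk[] = [
  { text: "Invoice #1042", page: 0, box: { left: 0.1, top: 0.05, right: 0.4, bottom: 0.1 } },
  { text: "Total: $1,549.98", page: 1, box: { left: 0.6, top: 0.8, right: 0.9, bottom: 0.85 } },
];

console.log(highlightsForPage(chunks, 0, 800, 1000));
```

Because the boxes are page-relative, the same data works at any zoom level. Only `renderWidth` and `renderHeight` change when the viewer rescales the page.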

Developer Documentation:

| Topic | Description | Link |
|---|---|---|
| Getting Started | Complete setup guide | 📖 Getting Started |
| Security Config | Production security setup | 🔐 Security Guide |
| Architecture | System design & diagrams | 🏗 Architecture |
| API Reference | All endpoints & examples | 📡 API Docs |
| Components | UI components & styling | 🎨 Components |
| Testing | Writing & running tests | 🧪 Testing |
| Deployment | Production deployment | 🚀 Deployment |

User Documentation:

| Topic | Description | Link |
|---|---|---|
| Getting Started | How to use the app | 👤 User Guide |
| Uploading Documents | PDF upload process | 📄 Upload Guide |
| Export Formats | Available export options | 💾 Export Guide |
| Troubleshooting | Common issues | 🔧 Troubleshooting |

API Examples:

| Resource | Description | Link |
|---|---|---|
| curl Examples | Command-line usage | 💻 curl Examples |
| JavaScript SDK | JS/TS integration | ⚛️ JavaScript |
| Python Examples | Python integration | 🐍 Python |

```bash
# Start development servers (both frontend and backend)
pnpm run dev

# Run only frontend (Next.js)
pnpm run dev:frontend

# Run only backend (Convex)
pnpm run dev:backend

# Build for production
pnpm run build

# Run tests
pnpm run test        # Unit tests with Vitest
pnpm run pw:test     # E2E tests with Playwright
pnpm run pw:test:ui  # Playwright UI mode

# Linting and formatting
pnpm run lint
```
Copy `.env.example` to `.env.local` and configure:

```bash
# Core Services (Required)
NEXT_PUBLIC_CONVEX_URL="https://your-deployment.convex.cloud"
NEXT_PUBLIC_APP_URL="http://localhost:3000"

# Authentication (Clerk) - Currently disabled
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY="pk_test_your-clerk-key"
CLERK_SECRET_KEY="sk_test_your-clerk-secret"

# Document Processing (Optional for local dev)
FAST_API_URL="http://localhost:8000"
FAST_API_SECRET_KEY="your-secret-key"

# Payments (Stripe) - Optional for local dev
STRIPE_PUBLISHABLE_KEY="pk_test_your-stripe-key"
STRIPE_SECRET_KEY="sk_test_your-stripe-secret"
```
See Environment Variables Guide for complete reference.
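A missing variable tends to surface later as a confusing runtime error, so it can help to fail fast at startup. This is a minimal sketch, not part of the repository. `missingEnvVars` and `assertEnv` are hypothetical helpers. Treating the two core URLs as always required and the Clerk keys as required only in production is an assumption, drawn from the comments in `.env.example` and the production security notes at the top of this README.

```typescript
// Hypothetical startup check; which variables count as required is an assumption.
type Env = Record<string, string | undefined>;

// Marked "(Required)" in .env.example.
const CORE_VARS = ["NEXT_PUBLIC_CONVEX_URL", "NEXT_PUBLIC_APP_URL"] as const;
// Clerk keys: unused while auth is disabled locally, but needed once it is re-enabled.
const AUTH_VARS = ["NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY", "CLERK_SECRET_KEY"] as const;

// Return the names of required variables that are unset or blank.
function missingEnvVars(env: Env, isProduction: boolean): string[] {
  const required: string[] = [...CORE_VARS];
  if (isProduction) {
    // A production deployment should not start without authentication configured.
    required.push(...AUTH_VARS);
  }
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throw one readable error that lists every missing variable at once.
function assertEnv(env: Env, isProduction: boolean): void {
  const missing = missingEnvVars(env, isProduction);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}
```

In a Node.js entry point it could be called as `assertEnv(process.env, process.env.NODE_ENV === "production")`.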
- Convex - Backend database and serverless functions
  - Sign up at convex.dev
  - Free tier available
- FastAPI Service (Optional for local development)
  - Repository: precision_pdf_fast_api
  - Handles PDF processing with Landing AI
  - Can run locally or deploy to Render
- Clerk - Authentication (currently disabled)
- Stripe - Payment processing
- Landing AI - Document processing AI
- Sentry - Error monitoring

The project includes testing infrastructure:

```bash
# Unit Tests (Vitest)
pnpm run test           # Run once
pnpm run test:watch     # Watch mode
pnpm run test:ui        # UI interface

# E2E Tests (Playwright)
pnpm run pw:test        # Headless
pnpm run pw:test:ui     # UI mode
pnpm run pw:test:debug  # Debug mode
```

No tests are implemented yet, but the infrastructure is ready. See Testing Guide.
- Next.js 15 - React framework with App Router
- React 19 - UI library
- Tailwind CSS - Utility-first styling
- shadcn/ui - Component library
- TypeScript - Type safety
- Convex - Real-time database and serverless functions
- Clerk - Authentication (currently disabled)
- Stripe - Payment processing
- FastAPI - Document processing service
- Landing AI - AI-powered document extraction
- Vercel - Frontend hosting
- Render - FastAPI hosting
- Sentry - Error monitoring
- PostHog - Analytics
The app includes 8 pre-processed example documents:
- 📧 Invoice
- 🏦 Bank Statements (2)
- 🏥 Medical Reports (2)
- 📑 Medical Journal Article
- 🏠 Mortgage Application
- 📋 Settlement Statement

Examples are stored in `/public/examples/` and can be explored without authentication.
We welcome contributions! Please see our Contributing Guide for details on:
- Code style and standards
- Development workflow
- Pull request process
- Issue reporting

```bash
# Fork the repo and clone your fork
git clone https://github.com/yourusername/precision-pdf.git

# Create a feature branch
git checkout -b feature/your-feature-name

# Make your changes and test
pnpm run test
pnpm run lint

# Submit a pull request
```
"User not authenticated" errors in development:
- This is expected since authentication is disabled
- Check the security configuration guide
Documents not processing:
- Ensure FastAPI service is running
- Check environment variable configuration
- See Troubleshooting Guide
Build errors:
- Ensure you're using the latest Node.js LTS
- Delete `node_modules` and run `pnpm install`
- ✅ Core document processing
- ✅ Visual verification interface
- ✅ Multiple export formats
- ✅ Real-time processing updates
- ⚠️ Authentication (disabled for local dev)
- ⚠️ Testing (infrastructure ready)
- 🔄 Documentation (in progress)
This project is open source. License details coming soon.
- Documentation: Browse the `/docs` folder
- Issues: GitHub Issues
- Discussions: GitHub Discussions
For questions about this project:
- GitHub: @robertguss
- Website: precisionpdf.com
⭐ Star this repository if you find it useful! ⭐