robertguss/convex-precision-pdf

Precision PDF

AI-powered PDF data extraction tool with visual verification and confidence

Precision PDF is an open-source document processing platform that extracts structured data from PDFs while showing you exactly where every piece of data comes from. Built with Next.js 15, Convex, and Clerk authentication.

✨ Key Features

  • 🔍 Visual Data Verification - See exactly where extracted data comes from in the original PDF
  • Real-time Processing - Live updates as documents are processed
  • 📊 Smart Table Recognition - Automatic table detection and CSV export
  • 📄 Multiple Export Formats - JSON, CSV, DOCX, Markdown, Text, XLSX
  • 🏥 Document Type Support - Invoices, medical records, bank statements, forms
  • 📱 Multi-page Documents - Handle complex documents with multiple pages
  • 🔌 API Access - Full REST API for developers
  • 🎯 Interactive Demo - Try 8 real examples without signing up

🚨 Security Notice (Important for Developers)

This repository is currently configured for easy local development with ALL AUTHENTICATION AND SECURITY FEATURES DISABLED.

For production deployment, you MUST re-enable authentication and the other disabled security features first.

See Security Documentation for complete details.

🚀 Quick Start

Prerequisites

  • Node.js (Latest LTS recommended)
  • pnpm package manager
  • Convex CLI (npm install -g convex)

5-Minute Setup

# Clone the repository
git clone https://github.com/robertguss/convex-precision-pdf.git
cd convex-precision-pdf

# Install dependencies
pnpm install

# Set up environment variables
cp .env.example .env.local

# Initialize Convex (creates a new deployment)
npx convex dev

# Start the development server
pnpm run dev

Your app will be running at http://localhost:3000

Note: The FastAPI processing service is optional for local development. Example documents work without it.

🏗 Architecture Overview

graph TD
    A[Next.js Frontend] --> B[Convex Backend]
    A --> C[API Routes]
    C --> D[FastAPI Service]
    D --> E[Landing AI]
    B --> F[Document Storage]
    B --> G[User Management]
    A --> H[Clerk Auth - DISABLED]
    C --> I[Stripe Payments]

Core Components:

  • Frontend: Next.js 15 with App Router and Tailwind CSS
  • Backend: Convex for real-time database and serverless functions
  • Authentication: Clerk (currently disabled for local development)
  • Processing: External FastAPI service with Landing AI
  • Payments: Stripe integration
  • UI Components: shadcn/ui component library
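
As a rough illustration of the "API Routes → FastAPI Service" hop in the diagram, the sketch below builds the outbound request an API route might send to the processing service. Only the `FAST_API_URL` and `FAST_API_SECRET_KEY` settings come from this README; the `/process/<documentId>` path, the `x-api-key` header name, and the function and type names are hypothetical and would need to be checked against the actual FastAPI service.

```typescript
// Hypothetical request builder; endpoint path and header name are assumptions.
interface ProcessingConfig {
  fastApiUrl: string; // value of FAST_API_URL
  fastApiSecretKey: string; // value of FAST_API_SECRET_KEY
}

interface OutboundRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
}

function buildProcessingRequest(
  config: ProcessingConfig,
  documentId: string,
): OutboundRequest {
  // Drop trailing slashes so the joined URL never contains "//".
  const base = config.fastApiUrl.replace(/\/+$/, "");
  return {
    // Assumed route shape; the real service may use a different path.
    url: `${base}/process/${encodeURIComponent(documentId)}`,
    method: "POST",
    // Assumed header name for the shared secret.
    headers: { "x-api-key": config.fastApiSecretKey },
  };
}
```

Building the request as plain data keeps the secret handling on the server side and makes the route easy to unit test; the resulting object could then be passed to `fetch`.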

📚 Documentation

For Developers

| Topic | Description | Link |
| --- | --- | --- |
| Getting Started | Complete setup guide | 📖 Getting Started |
| Security Config | ⚠️ Critical: Auth setup | 🔐 Security Guide |
| Architecture | System design & diagrams | 🏗 Architecture |
| API Reference | All endpoints & examples | 📡 API Docs |
| Components | UI components & styling | 🎨 Components |
| Testing | Writing & running tests | 🧪 Testing |
| Deployment | Production deployment | 🚀 Deployment |

For End Users

| Topic | Description | Link |
| --- | --- | --- |
| Getting Started | How to use the app | 👤 User Guide |
| Uploading Documents | PDF upload process | 📄 Upload Guide |
| Export Formats | Available export options | 💾 Export Guide |
| Troubleshooting | Common issues | 🔧 Troubleshooting |

API Integration

| Resource | Description | Link |
| --- | --- | --- |
| curl Examples | Command-line usage | 💻 curl Examples |
| JavaScript SDK | JS/TS integration | ⚛️ JavaScript |
| Python Examples | Python integration | 🐍 Python |

🛠 Development Commands

# Start development servers (both frontend and backend)
pnpm run dev

# Run only frontend (Next.js)
pnpm run dev:frontend

# Run only backend (Convex)
pnpm run dev:backend

# Build for production
pnpm run build

# Run tests
pnpm run test              # Unit tests with Vitest
pnpm run pw:test          # E2E tests with Playwright
pnpm run pw:test:ui       # Playwright UI mode

# Linting and formatting
pnpm run lint

🌍 Environment Variables

Copy .env.example to .env.local and configure:

# Core Services (Required)
NEXT_PUBLIC_CONVEX_URL="https://your-deployment.convex.cloud"
NEXT_PUBLIC_APP_URL="http://localhost:3000"

# Authentication (Clerk) - Currently disabled
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY="pk_test_your-clerk-key"
CLERK_SECRET_KEY="sk_test_your-clerk-secret"

# Document Processing (Optional for local dev)
FAST_API_URL="http://localhost:8000"
FAST_API_SECRET_KEY="your-secret-key"

# Payments (Stripe) - Optional for local dev
STRIPE_PUBLISHABLE_KEY="pk_test_your-stripe-key"
STRIPE_SECRET_KEY="sk_test_your-stripe-secret"

See Environment Variables Guide for complete reference.
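
A small startup check can catch a misconfigured `.env.local` early. The following is a minimal sketch, not part of the project: the variable names and their required/optional grouping are taken from the listing above, while `checkEnv`, `REQUIRED`, and `OPTIONAL_GROUPS` are hypothetical names introduced here.

```typescript
// Hypothetical helper; variable names mirror the .env listing in this README.
type Env = Record<string, string | undefined>;

const REQUIRED = ["NEXT_PUBLIC_CONVEX_URL", "NEXT_PUBLIC_APP_URL"] as const;

// Optional services: each group is only useful when all of its keys are set.
const OPTIONAL_GROUPS: Record<string, readonly string[]> = {
  clerk: ["NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY", "CLERK_SECRET_KEY"],
  fastapi: ["FAST_API_URL", "FAST_API_SECRET_KEY"],
  stripe: ["STRIPE_PUBLISHABLE_KEY", "STRIPE_SECRET_KEY"],
};

function checkEnv(env: Env): { missingRequired: string[]; partialGroups: string[] } {
  // Treat undefined, empty, and whitespace-only values as "not set".
  const isSet = (name: string): boolean => (env[name] ?? "").trim() !== "";

  const missingRequired = REQUIRED.filter((name) => !isSet(name));

  // A group is "partial" when some, but not all, of its keys are set.
  const partialGroups = Object.entries(OPTIONAL_GROUPS)
    .filter(([, names]) => names.some(isSet) && !names.every(isSet))
    .map(([group]) => group);

  return { missingRequired, partialGroups };
}
```

In a Node.js context this could be called as `checkEnv(process.env)` at startup, failing fast when `missingRequired` is non-empty and logging a warning for each entry in `partialGroups`.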

🔌 External Dependencies

Required Services

  1. Convex - Backend database and serverless functions

  2. FastAPI Service (Optional for local development)

    • Repository: precision_pdf_fast_api
    • Handles PDF processing with Landing AI
    • Can run locally or deploy to Render

Optional Services (For production)

  1. Clerk - Authentication (currently disabled)
  2. Stripe - Payment processing
  3. Landing AI - Document processing AI
  4. Sentry - Error monitoring

🧪 Testing

The project ships with testing infrastructure for both unit and end-to-end tests:

# Unit Tests (Vitest)
pnpm run test          # Run once
pnpm run test:watch    # Watch mode
pnpm run test:ui       # UI interface

# E2E Tests (Playwright)
pnpm run pw:test       # Headless
pnpm run pw:test:ui    # UI mode
pnpm run pw:test:debug # Debug mode

No tests are implemented yet, but the infrastructure is ready. See Testing Guide.

📦 Tech Stack

Frontend

  • Next.js 15 - React framework with App Router
  • React 19 - UI library
  • Tailwind CSS - Utility-first styling
  • shadcn/ui - Component library
  • TypeScript - Type safety

Backend

  • Convex - Real-time database and serverless functions
  • Clerk - Authentication (currently disabled)
  • Stripe - Payment processing

External Services

  • FastAPI - Document processing service
  • Landing AI - AI-powered document extraction

DevOps & Monitoring

  • Vercel - Frontend hosting
  • Render - FastAPI hosting
  • Sentry - Error monitoring
  • PostHog - Analytics

📄 Example Documents

The app includes 8 pre-processed example documents:

  • 📧 Invoice
  • 🏦 Bank Statements (2)
  • 🏥 Medical Reports (2)
  • 📑 Medical Journal Article
  • 🏠 Mortgage Application
  • 📋 Settlement Statement

Examples are stored in /public/examples/ and can be explored without authentication.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details on:

  • Code style and standards
  • Development workflow
  • Pull request process
  • Issue reporting

Quick Contribution Setup

# Fork the repo and clone your fork
git clone https://github.com/yourusername/precision-pdf.git

# Create a feature branch
git checkout -b feature/your-feature-name

# Make your changes and test
pnpm run test
pnpm run lint

# Submit a pull request

🐛 Troubleshooting

Common Issues

"User not authenticated" errors in development:

  • This is expected since authentication is disabled
  • Check the security configuration guide

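Documents not processing:

  • Uploaded documents are processed by the FastAPI service; confirm it is running and that FAST_API_URL and FAST_API_SECRET_KEY are set
  • The bundled example documents work without the FastAPI service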

Build errors:

  • Ensure you're using the latest Node.js LTS
  • Delete node_modules and run pnpm install

📊 Project Status

  • ✅ Core document processing
  • ✅ Visual verification interface
  • ✅ Multiple export formats
  • ✅ Real-time processing updates
  • ⚠️ Authentication (disabled for local dev)
  • ⚠️ Testing (infrastructure ready)
  • 🔄 Documentation (in progress)

📜 License

This project is open source. License details coming soon.

🆘 Support

📞 Contact

For questions about this project:


Star this repository if you find it useful!
