Tabula helps you liberate data tables trapped inside PDF files.
© 2012-2013 Manuel Aristarán. Available under MIT License. See
AUTHORS.md and LICENSE.md.
Notice: July 8, 2013 --- If you are using the Amazon EC2 AMI for Tabula (released earlier this year), it will cease to function on next reboot. You should terminate all instances using this AMI. See the Using Tabula section below for instructions on using the new, desktop-oriented version of Tabula.
If you've ever tried to do anything with data provided to you in PDFs, you know how painful it is: you can't easily copy and paste rows of data out of PDF files. Tabula allows you to extract that data in CSV format through a simple web interface. (Check out this short screencast.)
Caveat: Tabula only works on text-based PDFs, not scanned documents.
First, make sure you have a recent copy of Java installed. You can download Java here. Tabula requires a Java Runtime Environment compatible with Java 6 or Java 7.
- Windows -- Download tabula-win.zip from the download site. Unzip the whole thing and open the tabula.exe file inside. A browser should automatically open to http://127.0.0.1:8080/ . If not, open your web browser of choice and visit that link. To close Tabula, just go back to the console window and press "Control-C" (as if to copy).
- Mac OS X -- Download tabula-mac.zip from the download site. Unzip and open the Tabula app inside. A browser should automatically open to http://127.0.0.1:8080/ . If not, open your web browser of choice and visit that link. To close Tabula, find the Tabula icon in your dock, right-click (or control-click) on it, and press "Quit".
- Other platforms -- Download tabula-jar.zip from the download site and unzip it to the directory of your choice. Open a terminal window, and cd into the tabula directory you just unzipped. Then run:
java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar tabula.jar
If the program fails to run, double-check that you have Java installed and then try again.
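A quick way to double-check the Java installation from a terminal is sketched below. It is only an illustration and not part of Tabula: the function names `java_major_version` and `installed_java_version` are made up for this example, and the parsing assumes the usual `java -version` output, where the version appears in double quotes.

```shell
#!/bin/sh
# Illustrative helpers, not shipped with Tabula.

# Extract the major version from a Java version string
# such as "1.6.0_45" or "1.7.0_80".
java_major_version() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;                 # legacy "1.x" scheme (Java 6, Java 7)
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;   # later scheme, e.g. "11.0.2"
  esac
}

# Print the quoted version string reported by the java on PATH.
# Returns non-zero if no java executable is found.
installed_java_version() {
  command -v java >/dev/null 2>&1 || return 1
  # java -version writes to stderr, so merge it into stdout first.
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

if version=$(installed_java_version) && [ -n "$version" ]; then
  echo "Found Java $(java_major_version "$version") ($version)"
else
  echo "Could not detect Java on PATH; install a JRE and try again."
fi
```

If the script reports a major version other than 6 or 7, compare it against the Java requirement stated above before troubleshooting further.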
- Download JRuby. You can install it from its website, or using tools like rvm or rbenv.
- Download Tabula and install the Ruby dependencies. (Note: if using rvm or rbenv, ensure that JRuby is being used.)
git clone git://github.com/jazzido/tabula.git
cd tabula
gem install bundler
gem install tabula-extractor
bundle install
Then, start the development server:
bundle exec rackup
The site instance should now be viewable at http://127.0.0.1:9292/ .
You can set a couple of options when running the server in this manner:
TABULA_DATA_DIR="/tmp/tabula" \
TABULA_DEBUG=1 \
bundle exec rackup
- TABULA_DATA_DIR controls where uploaded data for Tabula is stored. By default, data is stored in the OS-dependent application data directory for the current user (similar to C:\Users\foo\AppData\Roaming\Tabula on Windows, ~/Library/Application Support/Tabula on Mac, ~/.tabula on Linux/UNIX).
- TABULA_DEBUG prints out extra status data when PDF files are being processed (false by default).
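To see which directory the rules above would pick on your machine, something like the following sketch can help. It only mirrors the documented defaults for Mac and Linux/UNIX; it is not Tabula's actual lookup code, the function name `tabula_data_dir` is hypothetical, and the Windows case is left out because it would not normally run under a POSIX shell.

```shell
#!/bin/sh
# Illustrative only: reproduces the documented defaults,
# not the logic inside Tabula itself.
tabula_data_dir() {
  # An explicit TABULA_DATA_DIR always wins.
  if [ -n "${TABULA_DATA_DIR:-}" ]; then
    echo "$TABULA_DATA_DIR"
    return
  fi
  # Otherwise fall back to the per-user default for this OS.
  case "$(uname -s)" in
    Darwin) echo "$HOME/Library/Application Support/Tabula" ;;
    *)      echo "$HOME/.tabula" ;;
  esac
}

echo "Tabula data directory: $(tabula_data_dir)"
```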
Alternatively, running the server as a JAR file
Testing in this manner will be closer to testing the "packaged application" version of the app.
bundle exec rake war
java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar build/tabula.jar
After performing the above steps ("Running Tabula from source"), you can compile Tabula into a standalone application:
Mac OS X
rake macosx
This will result in a portable "tabula_mac.zip" archive (inside the build directory)
for Mac OS X users.
Windows
You can build .exe files for the Windows target on any platform.
Download a 3.1.X (beta) copy of Launch4J.
Unzip it into the Tabula repo so that "launch4j" (with subdirectories "bin", etc.) is in the repository root.
Then:
rake windows
This will result in a portable "tabula_win.zip" archive (inside the build directory)
for Windows users.
If you have issues, you can try building manually. (These commands are for OS X/Linux and may need to be adjusted for Windows users.)
# (from the root directory of the repo)
rake war
cd launch4j
ant -f ../build.xml windows
A "tabula.exe" file will be generated in "build/windows". To run, the exe file needs "tabula.jar" (contained in "build") in the same directory. You can create a .zip archive by doing:
# (from the root directory of the repo)
cd build/windows
mkdir tabula
cp tabula.exe ./tabula/
cp ../tabula.jar ./tabula/
zip -r9 tabula_win.zip tabula
rm -fr tabula
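Because the exe only runs with "tabula.jar" next to it, it can be worth checking the staging directory before zipping. The helper below is a minimal sketch under that assumption; `check_windows_bundle` is a hypothetical name and is not part of the Tabula build.

```shell
#!/bin/sh
# Illustrative helper, not part of the Tabula build scripts.
# Succeeds only if the given directory contains both files
# that the Windows bundle needs.
check_windows_bundle() {
  dir=$1
  for f in tabula.exe tabula.jar; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
}

# Possible use in the manual steps, after the two cp commands:
#   check_windows_bundle tabula && zip -r9 tabula_win.zip tabula
```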
Interested in helping out? See TODO.md for ideas.