Pfs is a distributed filesystem for large-scale data processing, similar to HDFS. It's designed to integrate painlessly into a Docker stack by using containers itself for deployment and by letting users specify distributed computations as containers. Furthermore, pfs leverages btrfs (the same filesystem that powers Docker itself) to offer cluster-wide filesystem snapshotting.
- Fault tolerant system built around CoreOS primitives (implemented)
- Rich commit history (implemented)
- Branching (not implemented)
- Dockerized Map Reduce (not implemented)
Pfs is not production ready; it has only recently hit MVP status.
Pfs is designed to run on CoreOS. To start, you'll need a working CoreOS cluster. Currently, global containers, which are required by pfs, are only available in the beta channel (CoreOS 444.5.0).
- Google Compute Engine (recommended): [https://coreos.com/docs/running-coreos/cloud-providers/google-compute-engine/]
- Amazon EC2: [https://coreos.com/docs/running-coreos/cloud-providers/ec2/]
SSH into one of your new CoreOS machines.
$ wget https://github.com/pachyderm-io/pfs/raw/master/deploy/static/3Node.tar.gz
$ tar -xvf 3Node.tar.gz
$ fleetctl start 3Node/*
The startup process takes a little while the first time you run it because each node has to pull a Docker image.
The easiest way to see what's going on in your cluster is to use list-units:
$ fleetctl list-units
If things are working correctly you should see something like:
UNIT                          MACHINE                     ACTIVE  SUB
announce-master-0-3.service   3817102d.../10.240.199.203  active  running
announce-master-1-3.service   06c6dba9.../10.240.177.113  active  running
announce-master-2-3.service   3817102d.../10.240.199.203  active  running
announce-replica-0-3.service  f652105a.../10.240.229.124  active  running
announce-replica-1-3.service  06c6dba9.../10.240.177.113  active  running
announce-replica-2-3.service  f652105a.../10.240.229.124  active  running
master-0-3.service            3817102d.../10.240.199.203  active  running
master-1-3.service            06c6dba9.../10.240.177.113  active  running
master-2-3.service            3817102d.../10.240.199.203  active  running
replica-0-3.service           f652105a.../10.240.229.124  active  running
replica-1-3.service           06c6dba9.../10.240.177.113  active  running
replica-2-3.service           f652105a.../10.240.229.124  active  running
router.service                06c6dba9.../10.240.177.113  active  running
router.service                3817102d.../10.240.199.203  active  running
router.service                f652105a.../10.240.229.124  active  running
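With this many units, it can be easier to filter the listing than to read it by eye. The following is a minimal sketch, not part of pfs: `pfs_unhealthy_units` is a hypothetical helper name, and it assumes the four-column `UNIT MACHINE ACTIVE SUB` layout shown above. It reads `fleetctl list-units` output on stdin and prints only the units that are not yet `active running`.

```shell
# Hypothetical helper (not shipped with pfs or fleet).
# Reads `fleetctl list-units` output on stdin and prints "UNIT ACTIVE SUB"
# for every unit whose last two columns are not "active running".
# Exit status is 1 if any such unit was found, 0 otherwise.
pfs_unhealthy_units() {
    awk '
        NR == 1 && $1 == "UNIT" { next }    # skip the header row
        NF < 4 { next }                     # skip blank or truncated lines
        $(NF-1) != "active" || $NF != "running" {
            print $1, $(NF-1), $NF
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

One way to use it: `fleetctl list-units | pfs_unhealthy_units && echo "all units running"`. While the nodes are still pulling the Docker image, rerunning this should show the list of pending units shrinking.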
Pfs exposes a git-like interface to the file system:
$ curl -XPOST localhost/pfs/file_name -d @local_file
$ curl localhost/pfs/file_name
$ curl -XPUT localhost/pfs/file_name -d @local_file
$ curl -XDELETE localhost/pfs/file_name
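The four requests above can be wrapped in small shell functions for scripting. This is only a sketch: the function names and the `PFS_HOST` and `PFS_CURL` variables are made up for illustration and are not part of pfs. One deliberate difference is the use of `--data-binary` instead of `-d`: curl documents that `-d @file` strips carriage returns and newlines from the file, while `--data-binary @file` sends the bytes unchanged. Whether the pfs server treats the two differently beyond that is not something the text above specifies.

```shell
# Hypothetical wrappers around the HTTP calls shown above.
# PFS_HOST and PFS_CURL are names invented for this sketch.
PFS_HOST="${PFS_HOST:-localhost}"
PFS_CURL="${PFS_CURL:-curl}"    # set PFS_CURL=echo to print arguments instead of sending

# pfs_post NAME LOCAL_FILE: POST the local file to /pfs/NAME
pfs_post() {
    "$PFS_CURL" -XPOST "$PFS_HOST/pfs/$1" --data-binary "@$2"
}

# pfs_get NAME: GET /pfs/NAME
pfs_get() {
    "$PFS_CURL" "$PFS_HOST/pfs/$1"
}

# pfs_put NAME LOCAL_FILE: PUT the local file to /pfs/NAME
pfs_put() {
    "$PFS_CURL" -XPUT "$PFS_HOST/pfs/$1" --data-binary "@$2"
}

# pfs_delete NAME: DELETE /pfs/NAME
pfs_delete() {
    "$PFS_CURL" -XDELETE "$PFS_HOST/pfs/$1"
}
```

For example, `pfs_post notes.txt ./notes.txt` followed by `pfs_get notes.txt` should round-trip the file, assuming the router is reachable on `localhost` as in the commands above.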
$ curl localhost/commit
Committing in pfs creates a lightweight snapshot of the file system state and pushes it to replicas, where it remains accessible by commit id.
$ curl 'localhost/pfs/file_name?commit=n'
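Because every commit stays readable, two versions of the same file can be compared with ordinary tools. The sketch below assumes only the `?commit=` query parameter shown above; `pfs_get_at`, `pfs_diff`, and `PFS_HOST` are hypothetical names, and the commit ids passed in are whatever ids your cluster has issued.

```shell
# Hypothetical helpers for comparing a file across two pfs commits.
PFS_HOST="${PFS_HOST:-localhost}"

# pfs_get_at NAME COMMIT: print NAME as of the given commit id.
# The URL is quoted so the shell does not treat "?" as a glob character.
pfs_get_at() {
    curl -s "$PFS_HOST/pfs/$1?commit=$2"
}

# pfs_diff NAME OLD_COMMIT NEW_COMMIT: diff NAME between two commits.
# Returns the exit status of diff (0 = identical, 1 = different).
pfs_diff() {
    old=$(mktemp) && new=$(mktemp) || return 2
    pfs_get_at "$1" "$2" > "$old"
    pfs_get_at "$1" "$3" > "$new"
    diff "$old" "$new"
    status=$?
    rm -f "$old" "$new"
    return "$status"
}
```

Calling `pfs_diff file_name a b`, with `a` and `b` replaced by two real commit ids, prints the lines that changed between the two snapshots.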