Maestro

Python API Usage

from maestro import Maestro
import os


# init maestro
m = Maestro()


# set conf for the master driver
MASTER_DRIVER_CONF = {
    "security_group": "default",
    "access_key": os.environ["AWS_ACCESS_KEY_ID"],
    "secret_key": os.environ["AWS_SECRET_ACCESS_KEY"],
    "instance_type": "r4.xlarge",
    "region": "eu-west-1",
    "ssh_keypath": "/Users/thomasopsomer/AWS/asgard.pem",
    "request_spot_instance": True,
    "spot_price": 0.2,
    "zone": "a",
    "iam_instance_profile": "asgard",
    "root_size": "50"
}

# set conf for the workers
WORKER_DRIVER_CONF = {
    "security_group": "default",
    "access_key": os.environ["AWS_ACCESS_KEY_ID"],
    "secret_key": os.environ["AWS_SECRET_ACCESS_KEY"],
    "instance_type": "r4.4xlarge",
    "region": "eu-west-1",
    "ssh_keypath": "/Users/thomasopsomer/AWS/asgard.pem",
    "zone": "a",
    "iam_instance_profile": "asgard",
    "root_size": "50",
    "request_spot_instance": True,
    "spot_price": 0.4
}


# cluster name and docker image
cluster_name = "ac"
spark_image = "asgard/asgard:develop"


# launch a spark cluster
m.add_spark_cluster(cluster_name,
                    spark_image=spark_image,
                    provider="amazonec2",
                    n_workers=4,
                    master_driver_conf=MASTER_DRIVER_CONF,
                    worker_driver_conf=WORKER_DRIVER_CONF)


# run a task on the cluster
cmd = "spark-submit --master spark://master:7077 --class org.apache.spark.examples.SparkPi /usr/spark/lib/spark-examples-1.6.0-hadoop2.6.0.jar"

m.run_on_cluster(cluster_name=cluster_name,
                 docker_image=spark_image,
                 command=cmd)
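
The master and worker confs above share most of their keys. A small helper can keep them in sync; this is just a sketch layered over the configuration shown above, not part of the Maestro API:

import os

# Illustrative sketch: build both confs from one shared base so that
# credentials, region, and keypair stay consistent between the master
# and the workers.
BASE_DRIVER_CONF = {
    "security_group": "default",
    "access_key": os.environ["AWS_ACCESS_KEY_ID"],
    "secret_key": os.environ["AWS_SECRET_ACCESS_KEY"],
    "region": "eu-west-1",
    "zone": "a",
    "ssh_keypath": "/Users/thomasopsomer/AWS/asgard.pem",
    "iam_instance_profile": "asgard",
    "root_size": "50",
    "request_spot_instance": True,
}

MASTER_DRIVER_CONF = dict(BASE_DRIVER_CONF, instance_type="r4.xlarge", spot_price=0.2)
WORKER_DRIVER_CONF = dict(BASE_DRIVER_CONF, instance_type="r4.4xlarge", spot_price=0.4)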

REST API Usage

  • Start RabbitMQ and a Celery worker:
celery -A maestro.api.task worker --loglevel=info
  • Start the Maestro server:
python maestro/api/server.py
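
With RabbitMQ, the Celery worker, and the server running, clusters can be managed over HTTP. The snippet below is a minimal sketch assuming a JSON endpoint such as POST /cluster on localhost:5000; the actual routes, port, and payload shape are defined in maestro/api/server.py and may differ.

import requests

# Hypothetical call to the Maestro REST API: the route ("/cluster"),
# the port (5000), and the payload keys are assumptions made for
# illustration; check maestro/api/server.py for the real ones.
resp = requests.post(
    "http://localhost:5000/cluster",
    json={
        "cluster_name": "ac",
        "spark_image": "asgard/asgard:develop",
        "provider": "amazonec2",
        "n_workers": 4,
    },
)
print(resp.status_code, resp.json())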

TODO

  • Allow installing additional packages on each machine on request

  • Try Flask + Celery

  • Add a database layer with SQLite for cluster data persistence and for listing all created clusters ...

  • Issue with pydm: make sure that when a machine is deleted, its created keypair is also deleted

  • Specify a network for each cluster based on the cluster name

About

Spark clusters on the fly with Docker.
