Apache Spark is a research project for distributed computing which interacts with HDFS and heavily utilizes in-memory caching. Spark 1.0.0 can be deployed to traditional cloud and job management services such as EC2, Mesos, or Yarn. Further, Spark's standalone cluster mode enables Spark to run on other servers without installing other job management services.However, configuring and submitting applications to a Spark 1.0.0 standalone cluster currently requires files to be synchronized across the entire cluster, including the Spark installation directory. This project utilizes Fabric and Puppet to further automate the Spark standalone cluster. The Puppet scripts are MIT-licensed from stefanvanwouw/puppet-spark and wikimedia/puppet-cdh4.