How to run with a standalone cluster #57
Hi, can you tell me which sample you are referring to? Is your standalone cluster an EMR cluster?
Hi, I am referring to the HadoopTerasort sample, and yes, I want to run it against my own cluster. https://github.com/awslabs/data-pipeline-samples/tree/master/samples/HadoopTerasort Thanks, Srikrishna
Hi Srikrishna, you will need to run Task Runner on your cluster. Please see this link for more details: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-using-task-runner.html I think you can process as much data as you want; of course, runtime will depend on your cluster size. Marc
Hi, thanks for your quick response. Can you explain the high-level steps? I don't want to use an EMR cluster; I have a standalone Spark 1.6.1 cluster. Have a nice day. Thanks, Srikrishna
Hi, so you are saying the EMR cluster can be replaced with a physical cluster? I am very eager to receive your response. Thanks, Srikrishna
Yes, that is correct. Task Runner is an agent that runs on AWS or on-premises resources to execute the activities in the pipeline. The documentation above explains this in more detail. Please follow up if you have questions once you get started.
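For concreteness, starting Task Runner on one of your own machines looks roughly like the sketch below. This is based on the Task Runner guide linked above rather than on the sample itself; the jar name, credentials file, region, S3 bucket, and worker group name are placeholders, and the exact options should be double-checked against that guide.

```sh
# Minimal sketch: start Task Runner on a node of the standalone cluster.
# Prerequisites: Java installed, outbound access to the Data Pipeline endpoint,
# and an AWS credentials file in the format described in the guide above.
# All concrete values below are placeholders.
java -jar TaskRunner-1.0.jar \
  --config ~/credentials.json \
  --workerGroup=my-spark-wg \
  --region=us-east-1 \
  --logUri=s3://my-bucket/task-runner-logs
```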
Hi, thanks. How do I modify the script/code to point it at my standalone Spark cluster? Srikrishna
To connect a Task Runner that you've installed to the pipeline activities it should process, add a workerGroup field to the object, and configure Task Runner to poll for that worker group value. You do this by passing the worker group string as a parameter (for example, --workerGroup=wg-12345) when you run the Task Runner JAR file. http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-how-task-runner-user-managed.html
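To tie that back to the sample: in the pipeline definition, the activity that would otherwise use runsOn to point at an EmrCluster resource instead carries a workerGroup field whose value matches the string you pass to Task Runner. The fragment below is only an illustrative sketch, not taken from the HadoopTerasort definition; the object id, the command, and the worker group value are placeholder assumptions.

```json
{
  "objects": [
    {
      "id": "TeraSortOnMyCluster",
      "type": "ShellCommandActivity",
      "command": "hadoop jar /path/to/hadoop-examples.jar terasort <input-dir> <output-dir>",
      "workerGroup": "my-spark-wg"
    }
  ]
}
```

Since a ShellCommandActivity simply runs its command on whichever Task Runner picks it up, anything cluster-specific (for example, a spark-submit against your standalone Spark 1.6.1 master) would go into that command rather than into a Data Pipeline resource object.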
If I have my standalone Spark cluster with HDFS/YARN configured, what changes are required to run this code?