Will you provide the "job submitting" scripts (i.e., bash shell)? #6

Open
xiandong79 opened this issue Aug 25, 2017 · 5 comments

@xiandong79

Hi, this is Xiandong (显东).

Would you provide the "job submitting" scripts from your original experiments? I am preparing a similar project but have no idea how to design a "load generator" as described in your paper.

My Email: [email protected], Wechat: qi839395901.

@yncxcw
Owner

yncxcw commented Aug 25, 2017

Hi, Xiandong

Thanks for your interest. I wrote a generator for my paper here:
https://github.com/yncxcw/YarnBench

It currently (1) supports trace-based job generation (from a trace file) and Poisson job generation (from user configuration), and (2) monitors cluster information via the REST API. I do not have documentation for it yet, but I will write some soon. You can use conf.default as a starting point, and borrow some of the code for your own purposes.
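As a rough illustration of what Poisson job generation means (this is not YarnBench's actual code; the function name `poisson_arrival_times` and its parameters are hypothetical), submission times can be drawn by summing exponentially distributed gaps:

```python
import random


def poisson_arrival_times(rate_per_sec, num_jobs, seed=None):
    """Return cumulative submission times (in seconds) for a Poisson arrival process.

    rate_per_sec: assumed average number of job submissions per second.
    num_jobs:     how many submission times to generate.
    seed:         optional seed so a run can be reproduced.
    """
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    rng = random.Random(seed)
    times = []
    now = 0.0
    for _ in range(num_jobs):
        # In a Poisson process, gaps between arrivals are exponentially
        # distributed with mean 1 / rate_per_sec.
        now += rng.expovariate(rate_per_sec)
        times.append(now)
    return times


if __name__ == "__main__":
    # Example: on average one job every 2 seconds.
    for i, t in enumerate(poisson_arrival_times(rate_per_sec=0.5, num_jobs=5, seed=42)):
        print(f"job-{i} would be submitted at t={t:.2f}s")
```

A real generator would sleep until each time and then invoke the submission command; a trace-driven mode would read the times from a file instead of sampling them.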

For more questions, feel free to contact me.

Wei Chen

@xiandong79
Author

Thank you so much.

Q1: Would you categorize Big-C as YARN on Docker, Docker on YARN, or a new type?

Maybe Docker on YARN?

Q2: Is a new Docker container launched each time a new job is submitted?

If so, I am concerned about the cost of launching Docker containers so frequently. Is it about 10 ms per job?

@yncxcw
Owner

yncxcw commented Aug 29, 2017

Hi, Xiandong

For Q1, I would say Big-C is YARN on Docker. Since YARN already supports Docker containers, we leverage Docker as well as cgroups in our project.

For Q2, it depends on many factors, such as your hard drive and the image size. I tested one of my images (it includes the JDK, Hadoop, and Spark) on my server (equipped with RAID-5 HDDs):

admin@host7:~$ time docker run -d cwei/hadoop:3.0.0 /bin/bash
0ab19bae13f7d2020a966aed6c9c76712faa73d194377941a154c8ba2fe4f65a

real 0m0.354s
user 0m0.020s
sys 0m0.004s

So launching a container takes about 0.35 s, which could be a big problem for a 10 ms task.
However, in my experience I have not yet seen any framework task (MapReduce task or Spark executor) that runs for as little as 10 ms, since JVM startup is still slow.
The shortest task I observed was about 10 seconds.
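To put those numbers side by side, here is a small sketch that computes what share of total time the container launch takes, using the 0.354 s measurement above. The helper name `launch_overhead_fraction` is made up for illustration, and the model assumes launch and task run strictly one after the other:

```python
def launch_overhead_fraction(launch_s, task_s):
    """Fraction of (launch + task) wall time spent launching the container."""
    if launch_s < 0 or task_s <= 0:
        raise ValueError("launch_s must be >= 0 and task_s must be > 0")
    return launch_s / (launch_s + task_s)


# "real" time reported by the docker run measurement above.
LAUNCH_S = 0.354

if __name__ == "__main__":
    # A hypothetical 10 ms task versus the ~10 s shortest task observed.
    for task_s in (0.010, 10.0):
        frac = launch_overhead_fraction(LAUNCH_S, task_s)
        print(f"task of {task_s:g}s: launch is {frac:.1%} of total time")
```

Under this simple model the launch dominates a 10 ms task but is only a few percent of a 10 s task.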

By the way, I would love to know if you are using some tiny workloads, since I am trying to optimize the scheduling delay for YARN.

@xiandong79
Author

Thanks for your explanation.

  1. By "10ms per job", I meant that launching one container costs about 10 ms on average (I remember seeing this 10 ms figure in a report, possibly from Google). At the time, I thought it would be wonderful if we could reuse JVMs or already-launched containers when they are launched and shut down frequently.

I understand your point that, compared with a job that takes 10 minutes, a 10 ms launch time is negligible.

  2. Sorry, I have not found any new tiny workloads, since I also mainly use HiBench/SparkBench as test benchmarks. Definitely, the first step is to show that the current policy is not good enough (i.e., large scheduling delays relative to small task durations).

What about Spark Streaming jobs in HiBench, where task durations are relatively small?

@yncxcw
Owner

yncxcw commented Sep 5, 2017

@xiandong79

I understand. I think streaming jobs may not fit this situation, because the Spark executors are allocated at the beginning of the job and hold their resources for the job's whole lifetime.

Wei
