DC/OS tests with torpedo #63
Conversation
Can you separate out the vendor changes into a separate commit?
@disrani-px do you mean just a separate commit or a separate review as well?
Just a separate commit in the same PR.
Updated branch from 685bf0a to a933249.
Done
deployments/torpedo-dcos-ssh.json
Outdated
"id": "torpedo", | ||
"description": "Run torpedo on DC/OS", | ||
"run": { | ||
"cmd": "docker run -e 'TORPEDO_SSH_PASSWORD=root' --entrypoint ginkgo piyushpx/torpedo:latest --slowSpecThreshold 180 -v bin/basic.test -- --spec-dir ../drivers/scheduler/dcos/specs --scheduler dcos", |
Can we pass in environment variables? Many of the params to ginkgo need to be controlled by the caller (e.g. which tests to skip, scale factor, etc.). See the AWS job for an example: https://github.com/portworx/torpedo/blob/master/deployments/deploy-aws.sh
This is run as a job in DC/OS. The user will have to create it manually using the file and change the appropriate env variables.
For deployment from Jenkins, we will later have a shell script like the one for deploy-aws, which can take multiple params.
return specs, nil
}

func (d *dcos) IsNodeReady(n node.Node) error {
Add TODOs stating that these functions are to be implemented, for all such places in the PR.
done in incremental
drivers/scheduler/dcos/dcos.go
Outdated
nodes, _ := getNodes()

for _, n := range nodes {
	node.AddNode(n)
Add the call to IsNodeReady (even though unimplemented) here.
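Something like this, as a rough sketch against the types in this diff (it assumes the surrounding function returns an error, and IsNodeReady is still a stub at this point):

```go
nodes, err := getNodes()
if err != nil {
	return err
}
for _, n := range nodes {
	// IsNodeReady is unimplemented for now; wiring it in here means the
	// readiness check takes effect as soon as it is filled in.
	if err := d.IsNodeReady(n); err != nil {
		return err
	}
	node.AddNode(n)
}
```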
done in incremental
var nodes []node.Node
nodes = append(nodes, node.Node{
	Name:      "a1.dcos",
	Addresses: []string{"192.168.65.111"},
I'm guessing these are IPs of your private setup? Add a comment if so.
Is there a way to query DC/OS for the nodes in the cluster?
If DCOS does not have such an API, can we pass it through ginkgo params or env variables?
There is a way to get nodes from Mesos, but no way to get the Marathon or DC/OS master. So here the user will have to give us the master IP using an env variable, which we can use to get nodes or communicate with Marathon.
Will be doing the whole thing in the next PR. This PR is mainly focused on getting torpedo to run the setup/teardown test.
Actually, that was wrong info. Mesos provides a nice set of DNS names: we can use "leader.mesos" to get to the Mesos master, and similarly for Marathon.
Marathon is running on the master node, so it'll be the same as leader.mesos.
You can use the Mesos APIs to get the list of nodes, for example using curl: curl http://master.mesos:5050/slaves. The mesos-go library should have APIs for this.
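For reference, a rough self-contained sketch of listing agents via the master's HTTP API instead of hardcoding IPs; the /slaves response fields shown here are from memory, so treat them as an assumption:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// mesosSlave holds the only field we need from each agent entry.
type mesosSlave struct {
	Hostname string `json:"hostname"`
}

// slavesResponse models the top-level shape of GET /slaves.
type slavesResponse struct {
	Slaves []mesosSlave `json:"slaves"`
}

// getAgentHostnames queries the Mesos master (via the master.mesos DNS name)
// and returns the hostnames of all registered agents.
func getAgentHostnames() ([]string, error) {
	resp, err := http.Get("http://master.mesos:5050/slaves")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var out slavesResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}

	hostnames := make([]string, 0, len(out.Slaves))
	for _, s := range out.Slaves {
		hostnames = append(hostnames, s.Hostname)
	}
	return hostnames, nil
}

func main() {
	hosts, err := getAgentHostnames()
	if err != nil {
		panic(err)
	}
	fmt.Println(hosts)
}
```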
Yes, the mesos-go library seemed a bit complicated, hence decided to do that in a later commit.
drivers/volume/portworx/portworx.go
Outdated
func (d *portworx) updateNode(n node.Node, pxNodes []api.Node) {
	for _, address := range n.Addresses {
		for _, pxNode := range pxNodes {
			if address == pxNode.DataIp {
Add check for MgmtIP too.
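Rough sketch of the condition with both addresses; it assumes the openstorage api.Node struct exposes MgmtIp alongside DataIp:

```go
if address == pxNode.DataIp || address == pxNode.MgmtIp {
	// ... existing node update logic ...
}
```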
done
},
{
	"key": "volume",
	"value": "size=20,repl=2,name=px_mysql_vol:/var/lib/mysql"
Let's use repl 3. Will give write quorum if we kill one of the nodes.
done
return nil, err
}

_, err := m.client.Application(name)
Can be: if _, err := m.client.Application(name); err != nil
done
// Initialize Marathon client if not initialized
func (m *marathonOps) initMarathonClient() error {
	if m.client == nil {
		marathonURL := "http://192.168.65.90:8080"
How are we eventually going to find the Marathon URL? (For example, k8s gets it from the mounted px-account or the KUBE_CONFIG env variable.)
Use leader.mesos
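Something along these lines (sketch only; the MARATHON_URL override is hypothetical and not part of this PR, and it needs the os import):

```go
// Initialize Marathon client if not initialized
func (m *marathonOps) initMarathonClient() error {
	if m.client == nil {
		// leader.mesos resolves to the current Mesos master, which is where
		// Marathon runs, so no hardcoded IP is needed.
		marathonURL := os.Getenv("MARATHON_URL") // hypothetical override
		if marathonURL == "" {
			marathonURL = "http://leader.mesos:8080"
		}
		// ... existing client construction using marathonURL ...
	}
	return nil
}
```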
return err
}

return m.client.WaitOnApplication(name, 5*time.Minute)
Is this a blocking call?
The callers are not expecting ValidateApplication to be a blocking call. Also the time period can be a constant.
Renamed the methods
return nil, nil
}

if _, err := task.DoRetryWithTimeout(t, 5*time.Minute, 10*time.Second); err != nil {
Maybe we could regulate the timeout and wait-period constants by defining them in a constants file. All these constants should be common to both k8s and DC/OS.
I agree that we should have constants for all these things. There are many such places, let's change it everywhere all at once.
Created an issue to track: #67
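For what it's worth, the shared constants file could be as small as this; the package name and values are placeholders, not something in this PR:

```go
// Package scheduler holds defaults shared by the k8s and DC/OS drivers.
package scheduler

import "time"

const (
	// DefaultRetryInterval is how often validation loops poll for a state change.
	DefaultRetryInterval = 10 * time.Second
	// DefaultTimeout is the overall deadline for those validation loops.
	DefaultTimeout = 5 * time.Minute
)
```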
drivers/volume/portworx/portworx.go
Outdated
t := func() (interface{}, error) {
	clusterManager := d.getClusterManager()
	if status, _ := clusterManager.NodeStatus(); status != api.Status_STATUS_OK {
	if status, _ := d.getClusterManager().NodeStatus(); status != api.Status_STATUS_OK {
I don't understand the use of this function if we have WaitForNode; this function is only going to check the status of the node which is currently the endpoint.
If the intention of this function is to just check cluster status, then use clusterManager.Enumerate(), which returns an api.Cluster object that has the cluster status.
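i.e. roughly this (sketch, assuming the usual openstorage cluster client signatures):

```go
t := func() (interface{}, error) {
	// Enumerate returns the whole cluster, so its Status reflects cluster
	// health rather than just the node behind the current endpoint.
	cluster, err := d.getClusterManager().Enumerate()
	if err != nil {
		return nil, err
	}
	if cluster.Status != api.Status_STATUS_OK {
		return nil, fmt.Errorf("cluster status is %v, expected STATUS_OK", cluster.Status)
	}
	return nil, nil
}
```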
done in incremental
deployments/torpedo-dcos-ssh.json
Outdated
"cpus": 0.5, | ||
"mem": 256, | ||
"docker": { | ||
"image": "piyushpx/torpedo:latest" |
Before final checkin, change to portworx/torpedo:latest
done
drivers/scheduler/dcos/dcos.go
Outdated
if value, ok := opts[scheduler.OptionsWaitForResourceLeakCleanup]; ok && value {
	if err := d.WaitForDestroy(ctx); err != nil {
		return err
	}
Add a TODO to add a call to waitForCleanup here once it's implemented.
done
},
{
	"key": "volume",
	"value": "size=10,repl=3,name=px_mysql_vol:/var/lib/mysql"
Reduce size to 2. 10G will cause issues on dev systems, which might have less storage.
done
tests/common.go
Outdated
Step(fmt.Sprintf("wait for %s app to start running", ctx.App.Key), func() {
	err := Inst().S.WaitForRunning(ctx)
	expect(err).NotTo(haveOccurred())
})

Step(fmt.Sprintf("validate %s app's volumes", ctx.App.Key), func() {
Any reason to move the volume check after the app check? The volume check was before applications since volumes are a prerequisite for an app container to start. If that itself fails, there is no need to WaitForRunning.
Done as discussed.
drivers/volume/portworx/portworx.go
Outdated
cluster, err := task.DoRetryWithTimeout(t, 2*time.Minute, 10*time.Second)
if err != nil {
	return api.Cluster{}, err
In an error case, it's better to return a nil object than a default one.
It was an object, hence could not return nil, but I changed it to a reference.
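i.e. roughly this (sketch; it assumes the function signature now returns *api.Cluster and that t yields an api.Cluster value):

```go
cluster, err := task.DoRetryWithTimeout(t, 2*time.Minute, 10*time.Second)
if err != nil {
	// With a pointer return type the error path can hand back nil
	// instead of a zero-value api.Cluster.
	return nil, err
}
c := cluster.(api.Cluster)
return &c, nil
```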
Thanks for the review guys!
** Hardcoded nodes and marathon leader, will change in next review **
- setup/tear down test works with DC/OS
- creates/supports application using px volumes
- creates application and waits for it to be ready
- deletes and validates deletion of application
- sample mysql app deployed using marathon - no volume inspection
- torpedo runs as a metronome job
Updated branch from 236c18b to e07ad9a.
// ValidateApplication checks if the application is running and healthy
ValidateApplication(string) error
// DeleteApplication deletes the given application
DeleteApplication(string) error
Marathon has 3 different operations: stop, remove, and kill. All 3 behave differently and need to be tested, so there should be interfaces for all of them.
Follow-up on the discussion: there are counter-intuitive APIs for the stop and kill operations. Stop is basically updating the app with an instance count of 0, while the kill operation calls a DELETE on the tasks of that app. We can implement these as we need them going forward.
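Sketch of how stop and kill could be layered on later; the wrapper names are placeholders and the go-marathon calls are from memory, so double-check the signatures:

```go
// StopApplication implements "stop" by scaling the app down to 0 instances.
func (m *marathonOps) StopApplication(name string) error {
	_, err := m.client.ScaleApplicationInstances(name, 0, false)
	return err
}

// KillApplication deletes the app's running tasks without removing the app.
func (m *marathonOps) KillApplication(name string) error {
	_, err := m.client.KillApplicationTasks(name, nil)
	return err
}
```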
"parameters": [ | ||
{ | ||
"key": "volume-driver", | ||
"value": "pxd" |
These specs should be templated out. The driver field should be populated depending on which volume driver is being tested. It looks like it is the same for the k8s specs. So they can be fixed together.
Created an issue to track this: #70
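As an illustration of the templating idea (nothing here is in the PR; it's just Go's text/template over the spec fragment above):

```go
package main

import (
	"os"
	"text/template"
)

// paramsTmpl is the spec fragment with the volume driver templated out.
const paramsTmpl = `"parameters": [
  { "key": "volume-driver", "value": "{{ .VolumeDriver }}" }
]
`

func main() {
	t := template.Must(template.New("params").Parse(paramsTmpl))
	// The driver name would come from the test configuration, e.g. "pxd".
	if err := t.Execute(os.Stdout, struct{ VolumeDriver string }{VolumeDriver: "pxd"}); err != nil {
		panic(err)
	}
}
```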
** Hardcoded nodes and marathon leader, will change in next review **