GitHub - jatinganhotra/modelCheckingYCSB

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
YCSB		YCSB
clusterSetup		clusterSetup
monotonic_reads		monotonic_reads
.gitignore		.gitignore
README		README
YCSBResultParser.py		YCSBResultParser.py
YCSBRun.sh		YCSBRun.sh
plot.py		plot.py
plot_bins_count.py		plot_bins_count.py
sshall.sh		sshall.sh

Repository files navigation

Cassandra Maude project

Codebase overview:
-------------------
There are 2 repositories:
1. https://github.com/jatinganhotra/modelCheckingYCSB
This contains YCSB, cluster management and deployment code as well as
scripts to run experiments and analyize results.
The scripts are designed to run on any computer (either an emulab server
or your own local pc).
2. https://github.com/jatinganhotra/modelCheckingCassandra
This contains modified Cassandra.
Currently there are two branches: "timestampBased" and "agnostic"
corressponding to TB and TA strategies.

Set up cluster
-------------------
0. Variables

You need to supply correct values for variables in the commands below.
Here is the explanation:

N: the number of nodes in your cluster

1. Initialization

The initialzation is to be done once even if your nodes are swapped out and
in again.

a. Symlink

You need to be able to compile Cassandra and YCSB without going over disk
quota assigned by Emulab. Since compiling Cassandra and YCSB involves
downloading a lots of required packages to ~/.m2 folder, you need to first
symlink it to a directory with higher quota (for example /proj or /scratch).

mkdir /scratch/.m2
ln -s /scratch/.m2 /users/yourid/.m2

b. Environment variables:

You need to export the following environment variables to both
~/.bashrc and and ~/.profile:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
export CASSANDRA_HOME=/modelCheckingCassandra
export CASSANDRA_INCLUDE=$CASSANDRA_HOME/bin/cassandra.in.sh

c. Modify account file
Modify clusterSetup/account with your Emulab account and experiment's domain:

For example, I normally ssh using
[email protected] then my USER is sonnbc
and DOMAIN is riak.confluence.emulab.net

2. Install necessary packages

After everytime your nodes are swapped in, you need to install necessary
packages for the nodes in your cluster. This can be done by:

clusterSetup/install_packages.sh N

3. Set up cluster

Run the script to set up the cluster.
usage: clusterSetup/cluster.sh -n numnode -b cassandraBranch [timestampBased | agnostic] [-d (deploy) | -u (update) recompile/reset/keep]
recompile: recompile cassandra and YCSB and reset cluster
reset: don't recompile the code but reset cluster (empty all data)
keep: just git pull but do nothing else

This will set up Cassandra cluster for your Emulab nodes. In order to do this,
you need access to the two github repos above. You might need to change the
the variables at top of the file if necessary. These variables include:

CASSANDRA_HOME: Where Cassandra binary will be on each node.
This needs to be a local directory (non NFS)

SYNC_POINT_CASSANDRA: Where Cassandra code will be cloned from git and compiled.
This needs to be a directory on the NFS (synced between nodes)

SYNC_POINT_YCSB: Where YCSB and scripts will be cloned from git and YCSB is
compiled. This needs to be a directory on the NFS.

CASSANDRA_DATA: Where Cassandra stores its data. Non-NFS. You typically
dont need to change this unless you know what you are doing.

CASSANDRA_LOG: Where Cassandra stores its log. Non-NFS. You typically
dont need to change this unless you know what you are doing.

The -d switch is typically used when you first swap your nodes in. It first
deletes everything in your current nodes, then clones the git repos and
compiles the code from scratch. Cassandra cluster is deployed after that.

The -u switch assumes you already have git repos in your NFS. It will just call
"git pull" instead of "git clone". The 3 options are explained in the usage above.
Typically you want to use 'keep' when you just change YCSB workload file.

Make sure to check the log on the screen to see if YCSB and Cassandra are
compiled correctly. For YCSB, you just need the following. The rest might
fail to compile but it's not important.

[INFO] YCSB Root ......................................... SUCCESS [36.239s]
[INFO] Core YCSB ......................................... SUCCESS [26.911s]
[INFO] Cassandra DB Binding .............................. SUCCESS [30.420s]

4. Check cluster status

Make sure Cassandra is up and running for all nodes. This is done by going to
CASSANDRA_HOME from one of the nodes and calling:

bin/nodetool status

Check if all nodes are in the UP state

5. Set up database

If you have chosen to run cluster.sh with options different than "-u keep",
you need to reinitialize the database. Go to one of the nodes, invoke
Cassandra-cli (CASSANDRA_HOME/bin/cassandra-cli) and issue these
commands:

CREATE KEYSPACE usertable with placement_strategy = 'SimpleStrategy' and strategy_options = {replication_factor:3};
use usertable;
create column family data with comparator=UTF8Type and default_validation_class=UTF8Type and read_repair_chance=0.0;

6. Use netem to change network delay if necessary (ask Muntasir)

Run experiments
-------------------

You might need to change YCSB_HOME in YCSBRun.sh if you have changed
SYNC_POINT_YCSB in clusterSetup/cluster.sh

The workload used is YCSB_HOME/workloads/modelCheckingWorkload. Since
the nodes have a copy of the file in their cloned repo, you NEED TO MAKE SURE
any changes in this file are reflected at the nodes. Otherwise they will
just run YCSB with the old version that they have. So you need to commit your
changes and push them to github. Then call clusterSeup/cluster.sh with
"-u keep" option to instruct the nodes to pull the file.

1. YCSB Load:
./YCSBRun.sh -l -n N

2. YCSB Test:
./YCSBRun.sh -t -n N 2>&1 | grep INFO | tee temp.log

3. Parse result:
python YCSBResultParser temp.log

Contact:
-------------------
Jatin Ganhotra - [email protected]