qwercxzsda/cs434-project


Distributed Sorting

Overview

This is a git repository for CS434-AdvancedProgramming at POSTECH.

This repository implements the final project: distributed sorting across multiple servers.

You can find the implementation details here.

Among the three teams (Red, Blue, and Green), our team (Blue) achieved the best performance by a significant margin. You can find the final results here.

Usage

Clone the git repository.

git clone https://github.com/qwercxzsda/cs434-project.git

Run sbt assembly.

cd cs434-project
sbt assembly

Two files, master.jar and worker.jar, will be created in the project root directory.

Run Master

Run the following command.

java -jar master.jar [worker_number]

For example,

java -jar master.jar 1

Run Worker

Run the following command. Use 30962 for master_port; see the Caution section for details.

java -jar worker.jar [master_ip]:[master_port] -I [input_directory1] [input_directory2] ... -O [output_directory]

For example,

java -jar worker.jar 2.2.2.142:30962 -I /home/blue/test2/input_dir -O /home/blue/test2/output_dir

Output will be created in the output_directory as partition*.

Disregard the tmp1 and tmp2 folders created in the output_directory. They hold temporary files used during worker execution and are not part of the output.

Caution

master_port is always assumed to be 30962. If you pass a different number as master_port, it is ignored.

For the master and workers to run properly, port 30962 must be free (not in use). If you really need to change the port because 30962 is already in use, change the variable NetworkConfig.port in the file cs434-project/utils/src/main/scala/NetworkConfig.scala and run sbt assembly again (for both master and worker).
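Before starting the master, it can help to check whether anything is already listening on port 30962. The following is a minimal sketch, not part of the repository: `port_in_use` is a hypothetical helper, it relies on the bash-only `/dev/tcp` pseudo-device (so it must run under bash), and it only detects listeners reachable through the loopback address.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# Succeeds (exit status 0) if a TCP connection to HOST:PORT can be opened,
# which means some process is already listening there.
port_in_use() {
  local host="$1" port="$2"
  # bash treats /dev/tcp/HOST/PORT as "open a TCP connection".
  # The subshell keeps file descriptor 3 from leaking into the caller.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if port_in_use 127.0.0.1 30962; then
  echo "port 30962 is in use"
else
  echo "port 30962 looks free"
fi
```

If the port turns out to be in use, the NetworkConfig.port change described above is the documented way out.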

Tests

Test Large

1 Master
4 Workers, each with 20000 * 2490 = 49.8 M Records (49.8 M * 100 B = 4.98 GB)
Total 49.8 M * 4 = 199.2 M Records (199.2 M * 100 B = 19.92 GB)
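The sizes above can be re-derived with shell arithmetic. This is only a sanity check of the numbers; the variable names are illustrative, and the 100-byte record size is the one used throughout this README.

```shell
# Re-derive the Test Large sizes from the figures listed above.
records_per_worker=$((20000 * 2490))   # 49,800,000 records per worker
workers=4
record_bytes=100                       # each record is 100 B

total_records=$((records_per_worker * workers))
total_bytes=$((total_records * record_bytes))

echo "records per worker: $records_per_worker"
echo "total records:      $total_records"
echo "total bytes:        $total_bytes"
```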

[5 images]

Result

[4 images]

Test Simple

1 Master
2 Workers, each with 2 Records (2 * 100 B = 200 B)
Total 2 * 2 = 4 Records (4 * 100 B = 400 B)

[3 images]

Result

input
[2 images]
output
[2 images]

Test Large2 (no duplicates)

1 Master
4 Workers, 1 directory, each directory with 100,000,000 = 100 M Records (100 M * 100 B = 10 GB)
Total 100 M * 4 = 400 M Records (400 M * 100 B = 40 GB)

Result

Input

100,000,000 * 4 = 400 M Records

Created using the following command on each worker.

gensort -b[start_index] [number] partition0

For example, on worker 3,

gensort -b200000000 100000000 partition0

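The start index has to differ per worker so that no two workers generate the same records. A minimal sketch of how the per-worker commands could be derived, assuming workers are numbered from 1 and each one takes the next contiguous block of 100,000,000 records (this matches the worker 3 example, but the numbering scheme and the `start_index` helper are assumptions, not taken from the repository):

```shell
# Hypothetical helper: first record number for a given 1-based worker id.
records_per_worker=100000000
start_index() {
  echo $(( ($1 - 1) * records_per_worker ))
}

# Print the gensort command each of the 4 workers would run.
for worker in 1 2 3 4; do
  echo "worker ${worker}: gensort -b$(start_index "$worker") ${records_per_worker} partition0"
done
```

With this numbering, worker 3 gets `-b200000000`, the same value as in the example.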
Output

99586002 + 94949919 + 107194027 + 98270052 = 400,000,000 = 400 M
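The sum can be re-checked mechanically. The sketch below adds the four per-worker output counts quoted above and compares the result with the 400 M input records; a mismatch would mean records were lost or duplicated during sorting. Variable names are illustrative.

```shell
# Sum the per-worker output record counts and compare with the input size.
expected=400000000
total=0
for n in 99586002 94949919 107194027 98270052; do
  total=$((total + n))
done

if [ "$total" -eq "$expected" ]; then
  echo "ok: $total records"
else
  echo "mismatch: got $total, expected $expected"
fi
```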

All sorted, no duplicates

[6 images]

Test Large3 (no duplicates)

1 Master
8 Workers, 3 directories, each directory with 100,000 = 100 K Records (100 K * 100 B = 10 MB)
Total 100 K * 3 * 8 = 2.4 M Records (2.4 M * 100 B = 240 MB)

Result

Input

100,000 * 3 * 8 = 2,400,000 = 2.4 M Records

Created using the following command on each worker.

gensort -b[start_index] [number] partition0

For example,

[image]

Output

283952 + 295771 + 308125 + 307138 + 303761 + 305078 + 307586 + 288589 = 2,400,000 = 2.4 M
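Besides the total, the per-worker counts show how evenly the records were partitioned. The following sketch computes the total, mean, smallest, and largest partition from the eight counts quoted above; it only inspects these numbers and makes no claim about how the project picks its partition boundaries.

```shell
# Per-worker output record counts from the line above.
counts="283952 295771 308125 307138 303761 305078 307586 288589"

total=0
workers=0
min=""
max=0
for n in $counts; do
  total=$((total + n))
  workers=$((workers + 1))
  # Track the smallest and largest partition seen so far.
  if [ -z "$min" ] || [ "$n" -lt "$min" ]; then min=$n; fi
  if [ "$n" -gt "$max" ]; then max=$n; fi
done
mean=$((total / workers))

echo "total=$total mean=$mean min=$min max=$max"
```

Here the mean is 300,000 records per worker, and every partition lies within about 6% of it.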

All sorted, no duplicates

[13 images]
