Update README.md
mertaytore authored Dec 24, 2016
1 parent 802a26c commit 97cb503
Showing 1 changed file with 23 additions and 0 deletions.
23 changes: 23 additions & 0 deletions README.md
@@ -1,2 +1,25 @@
# IMDb-degrees-of-separation
Degrees of separation applied to IMDb film dataset.

This repository applies the idea of [Six degrees of separation](https://en.wikipedia.org/wiki/Six_degrees_of_separation) to IMDb's [datasets](http://www.imdb.com/interfaces).

- The dataset is processed with MapReduce to extract meaningful data.
- `IMDb_adjustment/trim_dataset.sh` was run to get the data needed for the MapReduce steps.
- Next, the dataset was processed in two MapReduce steps, `/HADOOP/try.jar` and `/HADOOP/try2.jar`.
- The output of the second MapReduce step was passed through `IMDb_adjustment/rm_duplicates.py` to remove any duplicates. This step yields a graph that links every actor to the actors who appeared in the same movie.
- Lastly, the resulting graph is fed into an ANF algorithm to find the average degree of separation among all actors.
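
The steps above can be illustrated on a toy scale. The following is only a minimal sketch and not the code in this repository: the function names, the `movie -> cast` input shape, and the sample data are all assumptions made for illustration. It also swaps the ANF approximation for an exact breadth-first search, which is practical only for small graphs.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_costar_edges(movie_casts):
    """Return the set of undirected actor pairs that share at least one movie."""
    edges = set()
    for cast in movie_casts.values():
        # Sorting the de-duplicated cast makes every pair canonical, so
        # (a, b) and (b, a) collapse into one entry and repeats are dropped.
        for a, b in combinations(sorted(set(cast)), 2):
            edges.add((a, b))
    return edges


def average_separation(edges):
    """Mean shortest-path length over all connected ordered actor pairs."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    total, pairs = 0, 0
    for source in neighbours:
        # Plain BFS from each actor (exact, unlike the ANF approximation).
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy data, not taken from the IMDb dataset.
toy_casts = {
    "movie_1": ["A", "B"],
    "movie_2": ["B", "C"],
    "movie_3": ["C", "D", "B"],
}
toy_edges = build_costar_edges(toy_casts)
toy_average = average_separation(toy_edges)
```

Running a BFS from every actor costs roughly O(V * (V + E)), which is why an approximation such as ANF becomes attractive once the graph covers the full dataset.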

## Requirements
* [Hadoop](https://github.com/apache/hadoop) for MapReduce
* Python 3.x for duplicate removal
* Gephi for graph visualization

## A graph visualization with ~9000 nodes (actors)
![9k nodes](https://github.com/mertaytore/IMDb-degrees-of-separation/blob/master/9k_nodes.jpg "A visualization with 9k nodes")

### Questions?
Reach out to us!
- :octocat: @mertaytore
- :octocat: @gokhansim
- :octocat: @ebsenol
- :octocat: @orhca
