spellings and coma #1

Open · wants to merge 1 commit into master
README.md: 6 changes (3 additions & 3 deletions)
@@ -3,7 +3,7 @@
This repo contains code to take [hourly page view count data](https://dumps.wikimedia.org/other/pagecounts-raw/)
from Wikipedia, and construct page counts for longer periods using PostgreSQL.

- **Note: This requires PostgreSQL 9.5 to take advantage of the new
+ **Note: This requires PostgreSQL 9.5 to take advantages of the new
[UPSERT](https://wiki.postgresql.org/wiki/UPSERT) feature, or it's just too slow**


@@ -17,7 +17,7 @@ fun queries can be run against the resulting data.

So far this has been used only to aggregate a single month of Wikipedia logs.

- * 64GB of disk space and/or an internet connection fast enough to download 64GB, to store one months
+ * 64GB of disk space and/or an internet connection fast enough to download 64GB, to one month
worth of logs
* Another 35GB or so of disk space to be used by Postgres
* About a day of processing on a modern quad core CPU
@@ -29,6 +29,6 @@ So far this has been used only to aggregate a single month of Wikipedia logs.
[here](https://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Hauptseite/Wikipedia-World/en)
3. Gernerate a list of pagecount files to download (currently manual, a sample for Sept 2015 is
included)
- 4. Read and modify crunch.sh to suit your needs, run it, and wait
+ 4. Read and modify crunch.sh to your needs, run it and wait
5. View the output of top viewed pages with location data that is calculated automatically
6. Run other interesting queries and report back!
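
A few notes on the technical content this diff touches. The PostgreSQL 9.5 requirement in the note is about the `INSERT ... ON CONFLICT` (UPSERT) syntax, which lets each hourly count be folded into a running total in one statement. A minimal sketch, assuming a hypothetical `pagecounts` table with a unique constraint on the conflict target (the repo's real schema may differ):

```sql
-- Hypothetical table; the repo's actual schema may differ.
CREATE TABLE IF NOT EXISTS pagecounts (
    project text NOT NULL,
    page    text NOT NULL,
    views   bigint NOT NULL DEFAULT 0,
    UNIQUE (project, page)
);

-- Fold an hourly count into the running total in a single statement.
INSERT INTO pagecounts (project, page, views)
VALUES ('en', 'PostgreSQL', 42)
ON CONFLICT (project, page)
DO UPDATE SET views = pagecounts.views + EXCLUDED.views;
```

Without UPSERT, every row needs a separate check-then-insert-or-update round trip, which is the slowness the note warns about.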
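
Step 3 (generating the list of pagecount files to download) can be sketched directly in SQL with `generate_series`. The URL pattern below is an assumption based on the pagecounts-raw naming convention (`pagecounts-YYYYMMDD-HHMMSS.gz`); some hours are published with slightly offset timestamps, so check the generated list against the dump index before downloading:

```sql
-- Hypothetical sketch: build one URL per hour for September 2015.
SELECT 'https://dumps.wikimedia.org/other/pagecounts-raw/2015/2015-09/pagecounts-'
       || to_char(h, 'YYYYMMDD-HH24') || '0000.gz' AS url
FROM generate_series(
         '2015-09-01 00:00'::timestamp,
         '2015-09-30 23:00'::timestamp,
         interval '1 hour'
     ) AS h;
```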
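
Step 5's top viewed pages with location data presumably comes from joining the aggregated counts against the coordinates imported in step 2. A hedged sketch with made-up table and column names:

```sql
-- Hypothetical schema: geo(page, lat, lon) holds the Wikipedia-World coordinates.
SELECT p.page, p.views, g.lat, g.lon
FROM pagecounts AS p
JOIN geo AS g USING (page)
ORDER BY p.views DESC
LIMIT 20;
```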