This little program can:
- Parse Wikipedia dump files (XML) into wiki-talk networks. Original Wikipedia user IDs (UIDs) are retained.
- "Shrink" the resulting network into an unweighted directed network without self-loops, as in the SNAP wiki-Talk dataset.
- Group users according to their roles.
Use Stu for an easy life. The only file you need is main.stu. Simply run:
$ stu
or, to keep it running in the background:
$ nohup stu -k -j 3 &
Stu will automatically download this program and the datasets, then start parsing.
The parameter -j defines the number of jobs to run in parallel. For downloading, more than 3 parallel jobs is not recommended.
Alternatively, download the latest jar files manually and run:
$ java -jar parser.jar *input-file* *lang* > *output-file*
$ java -jar shrinker.jar *input-file* > *output-file*
$ java -jar grouper.jar *input-file* > *output-file*
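Putting the three steps together, a full run might look like the following sketch. The dump file name, the language code `en`, and the intermediate file names are illustrative, not prescribed by this project:

```shell
# Hypothetical end-to-end run: parse a dump, shrink the network, group users.
# File names and the "en" language code are examples only.
java -jar parser.jar enwiki-latest-pages-meta-history.xml en > wiki-talk-en.txt
java -jar shrinker.jar wiki-talk-en.txt > wiki-talk-en-shrunk.txt
java -jar grouper.jar wiki-talk-en-shrunk.txt > wiki-talk-en-groups.txt
```

Each stage reads the previous stage's output from a file and writes its result to stdout, so the intermediate files can be named however you like.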
To build the uberjars from source:
$ lein with-profile parser:shrinker:grouper uberjar
Copyright © 2023 Yfiua
Distributed under the Eclipse Public License either version 1.0 or any later version.