This little program can:
- Parse Wikipedia dump files (XML) into wiki-talk networks. Original Wikipedia user IDs (UIDs) are retained.
- "Shrink" the resulting network into an unweighted directed network without self-loops, as in the SNAP wiki-Talk dataset.
- Group users according to their roles.
Use Stu for an easy life. The only file you need is main.stu. Simply run:
$ stu
or, to keep it running in the background:
$ nohup stu -k -j 3 &
Stu will automatically download this program and the datasets, then start parsing.
The parameter -j defines the number of jobs to run in parallel. For downloading, more than 3 parallel jobs is not recommended.
Alternatively, download the latest jar files manually and run:
$ java -jar parser.jar *input-file* *lang* > *output-file*
$ java -jar shrinker.jar *input-file* > *output-file*
$ java -jar grouper.jar *input-file* > *output-file*
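Putting the three steps together, a full run might look like the following sketch. The dump file name, the language code `en`, and the intermediate file names are illustrative, not prescribed by this project:

```shell
# Hypothetical end-to-end run: parse a dump, shrink the network, group users.
# File names and the "en" language code are examples only.
java -jar parser.jar enwiki-latest-pages-meta-history.xml en > wiki-talk-en.txt
java -jar shrinker.jar wiki-talk-en.txt > wiki-talk-en-shrunk.txt
java -jar grouper.jar wiki-talk-en-shrunk.txt > wiki-talk-en-groups.txt
```

Each stage reads the previous stage's output from a file and writes its result to stdout, so the intermediate files can be named however you like.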
To build the uberjars from source:
$ lein with-profile parser:shrinker:grouper uberjar
Copyright © 2023 Yfiua
Distributed under the Eclipse Public License either version 1.0 or any later version.