This project is composed of four applications:
- Hephaestus: Workers and background tasks
- Aphrodite: Web application
- Chaos: Core models and libs
- Aeolus: JS app
- Ruby 2.1.2
- Bundler (bundler gem)
- Nokogiri dependencies
- MongoDB server
- Redis server
- elasticsearch
- Node.js
- Docsplit
- FreeLing 3.1
- Poppler 0.20+
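Before working through the steps below, it can help to see which of these tools are already on your PATH. The following is a minimal sketch: the helper name missing_commands is hypothetical, and the binary names passed to it are assumptions about what each package installs (for example, pdftotext for Poppler), so adjust them to your system.

```shell
# Print each name from "$@" that cannot be found on PATH, one per line.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Assumed binary names for the requirements listed above.
missing_commands ruby bundle mongod redis-server node docsplit pdftotext
```

An empty output means every listed command was found. This does not check versions.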
It's highly recommended that you install a Ruby version manager such as RVM or rbenv.
$ \curl -sSL https://get.rvm.io | bash -s stable --ruby
$ source ~/.bashrc
$ rvm install ruby-2.1.2
$ rvm use 2.1.2 --default
We recommend rvm for installing and managing the Ruby interpreter and environment. Refer to the installation page for instructions on installing Ruby 2.1.2 with rvm.
# apt-get install libxslt-dev libxml2-dev
On Debian / Ubuntu machines, install from the package manager:
# apt-get install mongodb mongodb-server redis-server
You need Java 6 (or newer) to run ElasticSearch. If on Debian / Ubuntu, you can install OpenJDK JRE from the package manager:
# apt-get install openjdk-7-jre
Then, download and install the .deb package:
$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.2.deb
# dpkg -i elasticsearch-1.3.2.deb
Keep in mind that elasticsearch produces a large amount of logs, so it's a good idea to set up logrotate for it. Also, elasticsearch needs to keep lots of files open simultaneously, so you'll probably need to run this (as the user that runs elasticsearch):
$ ulimit -n 32000
For more information about this issue, please read.
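To confirm that the limit took effect, you can compare the current soft limit with the value you need. This is a minimal sketch: the function name has_enough_open_files is hypothetical, and it only inspects the shell it runs in, so run it as the same user that starts elasticsearch.

```shell
# Succeed if the current soft limit on open files is at least $1.
has_enough_open_files() {
  wanted="$1"
  current="$(ulimit -n)"
  # Some systems report "unlimited" instead of a number.
  [ "$current" = "unlimited" ] && return 0
  [ "$current" -ge "$wanted" ]
}

if has_enough_open_files 32000; then
  echo "open-file limit is sufficient: $(ulimit -n)"
else
  echo "open-file limit is too low: $(ulimit -n) (wanted 32000)"
fi
```

A limit set with ulimit applies only to the current shell session and the processes started from it.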
There are alternative downloads here.
$ curl https://raw.github.com/creationix/nvm/v0.3.0/install.sh | sh
$ source ~/.bashrc
$ nvm install 0.10
$ nvm alias default 0.10
Install the Docsplit dependencies:
# apt-get install -y graphicsmagick poppler-utils poppler-data ghostscript pdftk libreoffice
Detailed dependencies are listed in the Docsplit documentation.
Download the tarball of Poppler 0.20.1 and extract it somewhere, like /usr/local/src. Run apt-get build-dep poppler-utils to make sure you have all of its dependencies. Then, just execute ./configure, make and make install as usual.
The NER module currently uses FreeLing, an open source suite of language analyzers written in C++.
This has been tested on FreeLing 3.1. You can download the source here (~147 MB) and compile it. If you are a happy Ubuntu user, check out this link, where you will be able to find easy-to-use .deb files.
For compiling the source, you need the build-essential, libboost and libicu libraries. On Debian / Ubuntu machines, you can run:
# apt-get install build-essential libboost-dev libboost-filesystem-dev \
libboost-program-options-dev libboost-regex-dev \
libicu-dev
Then, just execute ./configure, make and make install as usual.
# apt-get install git
$ git clone git@github.com:hhba/mapa76.git
First, run bundle install to install all gem dependencies.
$ cd mapa76
$ cd aphrodite
$ bundle install
$ cd ../hephaestus
$ bundle install
$ cd ../aeolus
Both aphrodite and hephaestus are Ruby applications, and each has its own configuration files. They live in ./config and have the .yml extension. You need to adjust them to your workstation's needs. For easy setup, just rename *.yml.sample to *.yml.
If the servers will be running on the same machine as Mapa76, you don't need to change anything.
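The sample-config step can be scripted. The following is a minimal sketch, assuming it is run from the repository root: the function name copy_sample_configs is hypothetical, and it copies rather than renames, so the .sample files stay in place and an existing .yml is never overwritten.

```shell
# Copy every *.yml.sample in directory $1 to *.yml, skipping existing files.
copy_sample_configs() {
  dir="$1"
  for sample in "$dir"/*.yml.sample; do
    [ -e "$sample" ] || continue          # directory has no sample files
    target="${sample%.sample}"            # strip the trailing ".sample"
    if [ -e "$target" ]; then
      echo "skip $target (already exists)"
    else
      cp "$sample" "$target"
      echo "created $target"
    fi
  done
}

copy_sample_configs aphrodite/config
copy_sample_configs hephaestus/config
```

Because existing files are skipped, the script is safe to run again after pulling new sample files.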
Just follow the instructions here.
You will need to run the aeolus file watcher:
$ cd aeolus
$ grunt w
Fire up the Rails app:
$ cd aphrodite
$ rails s
To start workers for document processing, you need to run at least one Resque worker:
$ cd hephaestus
$ QUEUE=* bundle exec rake resque:work
You can run multiple workers with the resque:workers task:
$ COUNT=2 QUEUE=* bundle exec rake resque:workers
You also need to run FreeLing as a server. The .sh file only works on OS X, but it shouldn't be hard to make it work on Ubuntu:
$ cd hephaestus
$ sh ./start-freeling.sh
- Split workers from web app.
- Upgrade to Rails 4.