A simple and easy-to-use crawler for web sources (Facebook, Twitter, NodeBB, etc.)
Favara is a Siculo-Arabic word meaning "water source". Siculo-Arabic is now a dead language (IX-XIV century), but we believe the word favara sounds great and its meaning really reflects the purpose of the project.
- crawls posts and events from several sources, inserting them into a database
- only Facebook is supported at the moment
- A recent version of ruby and rubygems installed.
- A Postgres database, where Favara will put the crawled data.
- Clone this repo
- Install the dependencies via `bundle install`
- Configure the database. You can override the settings in `database.yml` using the following environment variables (see the example after this list):
  - FAVARA_DB_ADAPTER
  - FAVARA_DB_ENCODING
  - FAVARA_DB_POOL
  - FAVARA_DB_USERNAME
  - FAVARA_DB_PASSWORD
  - FAVARA_DB_HOST
  - FAVARA_DB_DATABASE
- Configure the sources in `config.yml`.
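For example, assuming the database variables are read from the environment at run time, you can override the connection settings by exporting them in your shell before running any favara task (the values below are just placeholders): `export FAVARA_DB_HOST=localhost FAVARA_DB_DATABASE=favara_dev FAVARA_DB_USERNAME=favara FAVARA_DB_PASSWORD=secret`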
You will then have to make a choice regarding the ownership of the database tables favara uses:
- If you want to run Favara with isamuni, let isamuni create the tables for you; no other configuration is required.
- You can edit the models in the `models` folder to reflect your tables' structure.
- You can ask favara to create the required tables via `rake create_tables`.
- If you run a Rails app, you can generate a new migration and then copy the contents of `migrations/001_init.rb` into it (see the sketch after this list).
- You can also create the required tables manually yourself, referring to `migrations/001_init.rb`.
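If you take the Rails route, the wrapper migration might look like the sketch below; the file and class names are arbitrary, and the actual table definitions must be copied from `migrations/001_init.rb`.

```ruby
# db/migrate/20240101000000_create_favara_tables.rb
# Hypothetical wrapper migration: generate it with
#   rails generate migration CreateFavaraTables
# and paste the table definitions from migrations/001_init.rb into change.
class CreateFavaraTables < ActiveRecord::Migration[7.0]  # adjust to your Rails version
  def change
    # ...contents of migrations/001_init.rb go here...
  end
end
```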
- Run favara by issuing `rake favara` to crawl only the latest contents.
- Run `rake "favara[true]"` to crawl all posts from all sources.
- Run `clockwork clock.rb` to leave favara running and automatically crawl the latest posts at regular intervals (the default configuration runs a complete crawl between 11pm and 5am); a sketch of a clock file is shown after this list.
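For reference, a clock file for the clockwork gem generally looks like the sketch below; the repository's own `clock.rb` may differ, and the intervals and job names here are only illustrative.

```ruby
# Illustrative clock file for the clockwork gem; the real clock.rb may differ.
require 'clockwork'

module Clockwork
  # Crawl the latest contents every 30 minutes (interval chosen only as an example).
  every(30 * 60, 'favara.latest') do
    system('rake favara')
  end

  # Run a complete crawl once a day, starting at 23:00 as in the default schedule.
  every(24 * 60 * 60, 'favara.full', at: '23:00') do
    system('rake "favara[true]"')
  end
end
```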
Favara is designed to import the crawled contents into a database. If that doesn't suit your needs, feel free to copy the files in `crawlers/lib/*`, which contain the database-independent logic, and use them as you would any other Ruby library.
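As a rough illustration only (every file, class, and method name below is hypothetical; check the actual files under `crawlers/lib/` for the real API):

```ruby
# Hypothetical usage sketch: the real file, class, and method names under
# crawlers/lib/ may differ from the ones used here.
require_relative 'crawlers/lib/facebook'                # hypothetical path

crawler = Favara::Facebook.new(page: 'example_page')   # hypothetical constructor
crawler.latest_posts.each do |post|                     # hypothetical method
  puts post
end
```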
We also provide a very thin Sinatra web service. It is not meant to be used in production, but it may come in handy for testing or diagnostics. To run it, simply run `ruby server.rb`, then point your browser to `localhost:4567`. You can check the crawled events under `/events` and the crawled posts under `/posts`.
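For a quick scripted check (assuming the server is running locally on the default port), something like the following will print the raw responses:

```ruby
# Minimal sketch: fetch the diagnostic endpoints and print a slice of each response.
require 'net/http'

%w[/events /posts].each do |path|
  body = Net::HTTP.get(URI("http://localhost:4567#{path}"))
  puts "#{path}: #{body[0, 200]}"   # show only the first 200 characters
end
```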