Fetchino is a lightweight web scraper that helps you extract structured information from websites. It uses configuration files to describe how the data you are looking for can be retrieved, and it lets you access that data in an API-like way.
The following configuration tells Fetchino to open the imdb.com full credits page for "The Lord of the Rings: The Fellowship of the Ring" and find all actors:
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <data>
        <list name="actors" type="string" />
    </data>
    <workflow>
        <request url="http://www.imdb.com/title/tt0120737/fullcredits" />
        <addToList list="actors" path="//table[@class='cast_list']/tbody/tr[@class]/td[2]/a" />
    </workflow>
</config>
The names of the actors are stored in a list named "actors", which can be accessed as shown below:
Fetchino fetchino = Fetchino.fromConfig("./LordOfTheRingsActors.xml");
fetchino.fetch();
fetchino.getContext().getList("actors").forEach(System.out::println);
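
To see what the path attribute in the configuration above selects, it can help to evaluate the same XPath expression against a small fragment outside of Fetchino. The following is a minimal sketch that uses only the JDK's built-in javax.xml.xpath API. The class name XPathSketch, the select helper, and the sample markup are assumptions made up for illustration; the markup is a simplified stand-in, not the real imdb.com page, and this does not show how Fetchino evaluates paths internally.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSketch {
    // Hypothetical, simplified stand-in for a cast table (not copied from imdb.com).
    // The first row has no class attribute, so tr[@class] should skip it.
    public static final String SAMPLE =
        "<html><body><table class='cast_list'><tbody>"
      + "<tr><td>x</td><td><a href='/x'>Header link</a></td></tr>"
      + "<tr class='odd'><td>photo</td><td><a href='/name/1'>Actor One</a></td></tr>"
      + "<tr class='even'><td>photo</td><td><a href='/name/2'>Actor Two</a></td></tr>"
      + "</tbody></table></body></html>";

    // Evaluates an XPath expression against well-formed XML and returns
    // the trimmed text content of every matching node.
    public static List<String> select(String xml, String path) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Same expression as the path attribute of <addToList> in the configuration.
        String path = "//table[@class='cast_list']/tbody/tr[@class]/td[2]/a";
        select(SAMPLE, path).forEach(System.out::println);
    }
}
```

On this sample, the expression matches the links in the second cell of the two rows that carry a class attribute and skips the row without one. Note that the JDK parser used here requires well-formed XML, so real-world HTML would typically need a lenient HTML parser first.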