Add grabber for Slovenia #110

Open: wants to merge 14 commits into base branch master

@sagudev sagudev commented Aug 28, 2020

What type of Pull Request is this?

  • adds new functionality
  • fixes/improves existing functionality

Does this PR close any currently open issues?

No

Please explain what this PR does

It adds a grabber for Slovenia. It takes its data from spored.siol.net.

Any other information?

No

Where have you tested these changes?

Operating System: Ubuntu 20.04

Perl Version: v5.30.0

@knowledgejunkie
Contributor

@sagudev Thank you for your contribution! A couple of questions:

i) is there anything on siol.net that prohibits using their listings data?

ii) as this would be a new addition to XMLTV, have you considered using ParseOptions (see lib/Options.pm) to manage the grabber at runtime and simplify the structure of the code?

@sagudev
Author

sagudev commented Sep 1, 2020

i) is there anything on siol.net that prohibits using their listings data?

Not that I am aware of. Is there any place I should be looking at (like robots.txt)?

ii) as this would be a new addition to XMLTV, have you considered using ParseOptions (see lib/Options.pm) to manage the grabber at runtime and simplify the structure of the code?

I based my grabber on tv_grab_ch_search because it was mentioned in the latest commit, so I assumed it used the new addition to the XMLTV framework (ParseOptions). When I was almost done, I realized that it does not use ParseOptions. If I have time, I will rewrite it using ParseOptions.

@knowledgejunkie
Contributor

i) is there anything on siol.net that prohibits using their listings data?

Not that I am aware of. Is there any place I should be looking at (like robots.txt)?

I was thinking of the site's Terms and Conditions of use, or any agreement in order to use the site.

ii) as this would be a new addition to XMLTV, have you considered using ParseOptions (see lib/Options.pm) to manage the grabber at runtime and simplify the structure of the code?

I based my grabber on tv_grab_ch_search because it was mentioned in the latest commit, so I assumed it used the new addition to the XMLTV framework (ParseOptions). When I was almost done, I realized that it does not use ParseOptions. If I have time, I will rewrite it using ParseOptions.

No problem. ParseOptions takes care of a lot of things, but if the grabber is already done, don't worry about taking time to refactor it.

@sagudev
Author

sagudev commented Sep 4, 2020

I was thinking of the site's Terms and Conditions of use, or any agreement in order to use the site.

As far as I can tell, it's all clear. WebGrab+Plus also uses it as a data source.

@garybuhrmaster
Contributor

garybuhrmaster commented Sep 4, 2020

i) is there anything on siol.net that prohibits using their listings data?

Not that I am aware of. Is there any place I should be looking at (like robots.txt)?

While you are likely not a lawyer (and no one here would expect a legal opinion anyway), one should generally review the site's Terms of Use / Terms of Service (in the native language of those terms) to determine whether they mention restricting use to subscribers, the data on the site being copyrighted (i.e. not available for use without permission), prohibiting screen scraping, or allowing only linking to (and not retrieval from) the site. There are, of course, many possible restrictions, but you likely get the idea of what to look for. Sometimes the restrictions or requirements are clear, and sometimes the site explicitly allows the data to be accessed, but more commonly the terms are a bit vague, which makes it much more of a judgement call. I would think that a good-faith review of the terms of service / terms of use is about all one can be expected to perform.

@sagudev
Author

sagudev commented Sep 6, 2020

It does not mention any grabbing or scraping. Only article 4 states:

  1. Copyright

Content posted by the service owner is allowed to be reviewed. Content may not be reproduced, modified, transcribed, republished or distributed for either commercial or non-commercial purposes without the express prior written permission of the service owner.
In the event of any permitted use of the content of these pages, all copyright and industrial property rights notices and other notices and warnings must be retained.

So does that mean we can grab the data?

@pmhahn
Contributor

pmhahn commented Sep 11, 2020

It does not mention any grabbing or scraping. Only article 4 states:

  1. Copyright

Content posted by the service owner is allowed to be reviewed. Content may not be reproduced, modified, transcribed, republished or distributed for either commercial or non-commercial purposes without the express prior written permission of the service owner.
In the event of any permitted use of the content of these pages, all copyright and industrial property rights notices and other notices and warnings must be retained.

So does that mean we can grab the data?

My reading of this is no, as your grabber is transcoding the web page into another format (XMLTV).

The business model of those pages is mostly to get you there in person so that you see their advertisements. A grabber filters out those advertisements.

If in doubt, ask them directly and get their written permission.

@knowledgejunkie
Contributor

It's good to see this grabber being developed - do we have a consensus about whether we can add this grabber to the project?

@sagudev
Author

sagudev commented Oct 21, 2020

It's good to see this grabber being developed - do we have a consensus about whether we can add this grabber to the project?

I didn't get any answer. I will keep maintaining the grabber for my personal needs.

Lower max retries

4 participants