Add grabber for Slovenia #110

sagudev · 2020-08-28T08:36:35Z

What type of Pull Request is this?

adds new functionality
fixes/improves existing functionality

Does this PR close any currently open issues?

No

Please explain what this PR does

It adds a grabber for Slovenia. It takes data from spored.siol.net.

Any other information?

No

Where have you tested these changes?

Operating System: Ubuntu 20.04

Perl Version: v5.30.0

knowledgejunkie · 2020-08-31T20:25:42Z

@sagudev Thank you for your contribution! A couple of questions:

i) is there anything on siol.net that prohibits using their listings data?

ii) as this would be a new addition to XMLTV, have you considered using ParseOptions (see lib/Options.pm) to manage the grabber at runtime and simplify the structure of the code?

sagudev · 2020-09-01T04:32:34Z

> i) is there anything on siol.net that prohibits using their listings data?

Not that I am aware of. Is there any place I should be looking at (like robots.txt)?
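
For reference, the robots.txt part of that question can be checked mechanically. The following is a minimal sketch using Python's standard `urllib.robotparser`; the sample rules, the `example.org` URLs, and the `is_allowed` helper are hypothetical and are not taken from siol.net. A robots.txt file only states crawler rules, so it does not cover whatever the site's terms of use say.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only; not the real file of any site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("https://example.org/listings/today",
                "https://example.org/private/feed"):
        print(url, is_allowed(SAMPLE_ROBOTS, "tv_grab_si", url))
```

To check a live site instead, `RobotFileParser.set_url()` followed by `read()` fetches the file before calling `can_fetch()`.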

> ii) as this would be a new addition to XMLTV, have you considered using ParseOptions (see lib/Options.pm) to manage the grabber at runtime and simplify the structure of the code?

I based my grabber on tv_grab_ch_search, as it was mentioned in the last commit, so I thought it used the new addition to the XMLTV framework (ParseOptions). When I was almost done, I realized that it does not use ParseOptions. If I have time, I will rewrite the grabber to use ParseOptions.

knowledgejunkie · 2020-09-01T20:31:40Z

> > i) is there anything on siol.net that prohibits using their listings data?
>
> Not that I am aware of. Is there any place I should be looking at (like robots.txt)?

I was thinking of the site's Terms and Conditions of use, or any agreement in order to use the site.

> > ii) as this would be a new addition to XMLTV, have you considered using ParseOptions (see lib/Options.pm) to manage the grabber at runtime and simplify the structure of the code?
>
> I based my grabber on tv_grab_ch_search, as it was mentioned in the last commit, so I thought it used the new addition to the XMLTV framework (ParseOptions). When I was almost done, I realized that it does not use ParseOptions. If I have time, I will rewrite the grabber to use ParseOptions.

No problem. ParseOptions takes care of a lot of things but if the grabber is already done don't worry about taking time to refactor it.

sagudev · 2020-09-04T14:59:20Z

> I was thinking of the site's Terms and Conditions of use, or any agreement in order to use the site.

As far as I could check, it's all clear. WebGrab++ also uses it as a data source.

garybuhrmaster · 2020-09-04T16:44:54Z

> > i) is there anything on siol.net that prohibits using their listings data?
>
> Not that I am aware of. Is there any place I should be looking at (like robots.txt)?

While you are likely not a lawyer (and no one here would expect a legal opinion anyway), one should commonly review the site's Terms of Use / Terms of Service (in the native language of those terms) to determine whether they mention anything about restricting use to subscribers, the data on the site being copyrighted (i.e. not available for use without permission), not allowing screen scraping, or allowing only linking to (and not retrieval from) the site. There are, of course, many possible restrictions, but you likely get the idea of what to look for.

Sometimes the restrictions or requirements are clear, and sometimes the site explicitly allows the data to be accessed, but more commonly the terms are a bit vague, which makes it much more of a judgement call. I would think that a good-faith review of the terms of service / terms of use is about all one can be expected to perform.

sagudev · 2020-09-06T09:20:41Z

It does not mention any grabbing or scraping. Only article 4 states:

> Copyright
>
> Content posted by the service owner is allowed to be reviewed. Content may not be reproduced, modified, transcribed, republished or distributed for either commercial or non-commercial purposes without the express prior written permission of the service owner.
> In the event of any permitted use of the content of these pages, all copyright and industrial property rights notices and other notices and warnings must be retained.

So does that mean we can grab the data?

pmhahn · 2020-09-11T09:21:18Z

> It does not mention any grabbing or scraping. Only article 4 states:
>
> > Copyright
> >
> > Content posted by the service owner is allowed to be reviewed. Content may not be reproduced, modified, transcribed, republished or distributed for either commercial or non-commercial purposes without the express prior written permission of the service owner.
> > In the event of any permitted use of the content of these pages, all copyright and industrial property rights notices and other notices and warnings must be retained.
>
> So does that mean we can grab the data?

My reading of this is no, as your grabber transcodes the web page into another format (XMLTV).

The business model of such pages is mostly to get you there as a person so that you see their advertisements. A grabber filters out those advertisements.

If in doubt, ask them directly and get their written permission.

knowledgejunkie · 2020-10-20T20:32:58Z

It's good to see this grabber being developed - do we have a consensus about whether we can add this grabber to the project?

sagudev · 2020-10-21T14:29:15Z

> It's good to see this grabber being developed - do we have a consensus about whether we can add this grabber to the project?

I didn't get any answer. I will keep maintaining the grabber for my personal needs.

sagudev added 2 commits August 28, 2020 10:24

Added tv_grab_si for grabing data for Slovenia

f7615c2

Added Slovenia to QuickStart

91500c6

tv_grab_si now uses ParseOptions

30008ad

Minor fixes

7507857

sagudev and others added 9 commits September 12, 2020 09:15

Fix for website errors

d33282b

Season fix

cd5e445

Fix date looping

acd6bdd

dbg

56acd8a

Quiet dbg

0af2f80

Fix: Can't call method "extract_links" on an undefined value

d9a2b7f

die if 10

98161fe

Fix nested whiles

854d74a

Do not die on max retry, but skip.

e8d0a78

Update tv_grab_si.in

9c1943a

Lower max retries
