Scraping Pinterest? #25

Open
venuv opened this issue Oct 18, 2019 · 11 comments

Comments

@venuv

venuv commented Oct 18, 2019

First of all, nicely done (good docs, easy install, works like a charm). I would find Pinterest scraping quite useful.

@cwerner
Owner

cwerner commented Oct 18, 2019

Hi.

Thanks for using the tool. I have not had this use case myself, but if there is an easy way to include it, why not? However, the tool piggybacks on icrawler, which only deals with Google, Bing and Baidu.
Do you know which tools would be required for this?

@venuv
Author

venuv commented Oct 18, 2019

I could manually search for the Pinterest board relevant to the keyword(s) I'm interested in (say, 'recliners') and put its URL in the Excel file. fastclass could then crawl that board using a capability such as https://github.com/xjdeng/pinterest-image-scraper

@cwerner
Owner

cwerner commented Oct 18, 2019

Interesting.

I had a quick look at the package you mentioned. Unfortunately it requires Selenium, which is a pretty hefty dependency for such a small package.
I will think about it. I would be happy to look at a PR, though, if you want to give it a shot...

@oezeadi

oezeadi commented Mar 30, 2020

How can I use the tool to process a folder of my own images? I can't seem to get it to work (I'm new).

@cwerner
Owner

cwerner commented Mar 30, 2020

Hi @oezeadi ,

Not quite sure what you are trying to do? If you want to classify a bunch of images you use the fc_clean command...

@oezeadi

oezeadi commented Mar 30, 2020

I have a folder of my own images and I'd like to process that folder the same way fcd processes images after it crawls.

@cwerner
Owner

cwerner commented Mar 30, 2020

Let me get back to you tomorrow... I’ll have a look at it 👨‍💻

@oezeadi

oezeadi commented Mar 31, 2020

ok thanks, can't wait!

@cwerner
Owner

cwerner commented Mar 31, 2020

Ok @oezeadi

I have some unfinished changes to fcc (there are also Tk UI bugs I want to fix), but I just gave the current GitHub version a spin and I get the result I expect...

So if you want to assign images to certain classes you start the command with

fcc folder/where/your/files/are

Then you use the number keys 1-9 to assign a class number to the image. The tool automatically moves to the next image after each key press. If you need to change an assignment, you can go back and forth with the left/right arrow keys. Once done, you save the assignments by pressing the X key.
You should then get a report file in the folder that lists each filename with its assigned class.

Is this what you are seeing, too?

@oezeadi

oezeadi commented Apr 5, 2020

Hey, so I guess my first question should have been: does your program actually do other types of image processing (resizing, checking for the right number of channels, removing duplicates, etc.)? That is what I assumed fcd was doing after it crawled for images.

If yes to the above, then does the fcc you updated do these checks too? I don't know how to tell whether a batch of images has been properly "checked/processed"...
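
To make the kind of check meant here concrete, duplicate detection could look roughly like the sketch below. It uses only the Python standard library and is purely an illustration: it is not what fcd or fcc do, and the function name `find_duplicates` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(folder):
    """Return groups of file names in `folder` whose bytes are identical.

    Hypothetical helper for illustration only; not part of fastclass.
    """
    by_digest = defaultdict(list)
    # Sort the entries so the result is deterministic across runs.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.name)
    # Keep only digests shared by more than one file.
    return [names for names in by_digest.values() if len(names) > 1]
```

Note that this only catches byte-for-byte copies. A resized or re-encoded version of the same picture would hash differently, so it would need a perceptual comparison instead.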

@cwerner
Owner

cwerner commented Apr 5, 2020

Ah ok.

Right, this would make sense. I intended fcc to weed through the download and mark bad or misclassified images (so it runs after fcd).
I’ll have a look at how to share these checks between the scripts. I'm a bit busy next week but hope to update soon. However, if you have specific needs or ideas, I would also appreciate a PR 😉

C
