ignore and continue: STOPPED ('The read operation timed out',) #6

Open
drandreaskrueger opened this issue Nov 5, 2017 · 3 comments


drandreaskrueger commented Nov 5, 2017

EDIT: sorted, haha, but the error mentioned at the bottom is still worth looking at, so I am renaming this issue to

ignore and continue: STOPPED ('The read operation timed out',)


old:

for each retweet, the image is redownloaded. And some of the #devcon tweets are retweeted dozens of times, so I have tons of duplicate images now.

Do you have any idea how to prevent that? Perhaps keep some sqlite of which tweet's image is already downloaded, and then ignore that image for all of its retweets?
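A minimal sketch of that idea, under assumptions: in Twitter API v1.1 JSON a retweet carries the original tweet under `retweeted_status`, so the original tweet's ID can serve as the deduplication key. The helper names `image_key` and `should_download` are hypothetical and not part of TwitterGeoPics, and an in-memory set stands in for the sqlite table here.

```python
def image_key(tweet):
    """Return the ID of the tweet that actually owns the image.

    For a retweet this is the ID of the original tweet (found under
    'retweeted_status' in Twitter API v1.1 JSON); otherwise it is the
    tweet's own ID.
    """
    original = tweet.get("retweeted_status") or tweet
    return original["id"]


def should_download(tweet, seen):
    """Return True the first time an image owner is seen, False afterwards."""
    key = image_key(tweet)
    if key in seen:
        return False
    seen.add(key)
    return True


# Dummy tweets for illustration only (not real API responses).
seen_ids = set()
original = {"id": 1, "text": "original tweet with a photo"}
retweet = {"id": 2, "text": "RT ...", "retweeted_status": original}
results = [should_download(t, seen_ids) for t in (original, retweet, retweet)]
```

With this check in front of the download step, the tweet text could still be printed for every retweet while the image is fetched only once.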

For my one task, all the #devcon3 images, I will now just be patient and hope it won't stop prematurely with the error (*) described below; afterwards I will use a photo-sorting program to remove duplicates. But for general usefulness, it would be nice to have a switch -noretweetimages which still prints the tweet texts but does not download the image again.

But hey - great tool, I am very very happy.


mentioned error (*)

BobSummerwill: RT @slockitproject: USN Architecture Diagram - want to work with us? [email protected]. #devcon #ethereum #blockchain https://t.co/7hjpKUbjHU
Sun Nov 05 09:12:39 +0000 2017
*** STOPPED ('The read operation timed out',)

probably the gps lookup failed?

in most cases, we would like to (perhaps retry once, then) simply catch-and-ignore that exception; so I am waiting for it to happen again, to see which exact exception to catch: drandreaskrueger@0deb046
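The retry-once-then-ignore behaviour could be sketched roughly as follows. The exact exception type is not known yet (see above), so `socket.timeout` is only an assumption based on the message text; an HTTP library may wrap it in its own exception class. `call_with_retry` and `flaky_lookup` are hypothetical names, not part of the tool.

```python
import socket


def call_with_retry(func, *args, retries=1, ignore=(socket.timeout,)):
    """Call func(*args); on a listed exception retry, then give up quietly.

    Returns func's result, or None if every attempt failed.
    Exceptions not listed in `ignore` still propagate.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except ignore as exc:
            print("attempt %d failed: %r" % (attempt + 1, exc))
    return None


# Stand-in for a geo lookup or image download that times out once.
calls = {"n": 0}


def flaky_lookup(place):
    calls["n"] += 1
    if calls["n"] == 1:
        raise socket.timeout("The read operation timed out")
    return "ok:" + place


def always_failing(place):
    raise socket.timeout("The read operation timed out")


recovered = call_with_retry(flaky_lookup, "somewhere")
skipped = call_with_retry(always_failing, "somewhere")
```

Returning `None` instead of raising lets the main loop move on to the next tweet; the caller has to treat `None` as "no result for this tweet".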

@drandreaskrueger
Author

hahaha:

[-no_retweets]

usage: SearchOldTweets.py [-h] [-count COUNT] [-location LOCATION]
                          [-oauth FILENAME] [-no_retweets]
                          [-photo_dir DIRECTORYNAME] [-stalk]
                          [-words W [W ...]]

@geduldig
Owner

geduldig commented Nov 5, 2017

Hi - Glad you are liking the tool. I'll follow up on your comments and patches in the next couple of days. Keep them coming!
Jonas

@drandreaskrueger drandreaskrueger changed the title (feature request) retweets - do not re-download image - any ideas? ignore and continue: STOPPED ('The read operation timed out',) Nov 5, 2017
@drandreaskrueger
Author

drandreaskrueger commented Nov 5, 2017

Great. Thanks, Jonas.

I have improved it a bit more.

e.g.

-no_images_of_retweets

Also, any download-img or geo-stalk errors are getting ignored now:

https://github.com/drandreaskrueger/TwitterGeoPics/blob/0609dddb751a269efcef520e5e07a3c9cd3cb3a0/TwitterGeoPics/SearchOldTweets.py#L64-L70

not perfect, but a good temporary workaround.

The hashtag #devcon3 has already resulted in 450 images for the past 3 days, so I cannot afford for the tool to stop and restart. I would rather lose a few pics in between.

Idea: Store tweet IDs, and if a tweet has already been stored, skip it - then the tool could simply be restarted, and just those tweets are added which are not in the database yet.
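A rough sketch of such a store, using Python's built-in `sqlite3` module. The table name `seen_tweets` and the helpers `open_store` and `mark_if_new` are made up for illustration; the tool itself has no such database yet.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database of already-processed tweet IDs.

    Pass a file path instead of ':memory:' so the IDs survive a restart.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_tweets (tweet_id INTEGER PRIMARY KEY)"
    )
    conn.commit()
    return conn


def mark_if_new(conn, tweet_id):
    """Record tweet_id; return True if it was new, False if already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen_tweets (tweet_id) VALUES (?)", (tweet_id,)
    )
    conn.commit()
    # rowcount is 1 when a row was inserted, 0 when the insert was ignored.
    return cur.rowcount == 1


# Dummy IDs for illustration; the third one repeats the first.
store = open_store()
results = [mark_if_new(store, tweet_id) for tweet_id in (101, 102, 101)]
```

After a restart, the main loop would call `mark_if_new` for each tweet and skip those for which it returns `False`.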
