This repository has been archived by the owner on Nov 10, 2024. It is now read-only.

save_as_csv not working: "data frame still contains recursive columns!" #759

Closed
ccamara opened this issue Jan 23, 2023 · 5 comments

Comments

@ccamara

ccamara commented Jan 23, 2023

Problem

I've retrieved some tweets and want to store them in a CSV file. Both save_as_csv() and write_as_csv() fail with the following error:

> save_as_csv(rt, "rt.csv")
Error in `vectbl_as_row_location()`:
! Can't subset rows with `i`.
✖ Logical subscript `i` must be size 1 or 997, not 43.
Run `rlang::last_error()` to see where the error occurred.
Warning message:
In flatten(x) : data frame still contains recursive columns!

Expected behavior

I'd like my dataframe to be flattened and stored in a csv file.

Reproduce the problem

library(rtweet)

# Authenticate
# auth_setup_default() # No longer working.
auth <- rtweet_app()

rt <- search_tweets("#rstats", n = 1000, include_rts = FALSE, token = auth)

write_as_csv(rt, "rt.csv")

The same happens if I try to write only a subset of the data (i.e. save_as_csv(head(rt), "rt.csv")) or if I run another query, so I don't think it is about the specific data in my data frame.

rtweet version

## copy/paste output
packageVersion("rtweet")

‘1.1.0’

Session info

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] rtweet_1.1.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10       compiler_4.1.3    pillar_1.8.1      later_1.3.0       prettyunits_1.1.1 tools_4.1.3       progress_1.2.2   
 [8] digest_0.6.31     bit_4.0.5         jsonlite_1.8.4    evaluate_0.20     lifecycle_1.0.3   tibble_3.1.8      pkgconfig_2.0.3  
[15] rlang_1.0.6       cli_3.6.0         curl_5.0.0        yaml_2.3.6        xfun_0.36         fastmap_1.1.0     withr_2.5.0      
[22] httr_1.4.4        knitr_1.41        vctrs_0.5.2       askpass_1.1       hms_1.1.2         bit64_4.0.5       glue_1.6.2       
[29] R6_2.5.1          fansi_1.0.4       rmarkdown_2.20    magrittr_2.0.3    promises_1.2.0.1  ellipsis_0.3.2    htmltools_0.5.4  
[36] renv_0.16.0       httpuv_1.6.8      utf8_1.2.2        openssl_2.0.5     crayon_1.5.2    
@llrs
Collaborator

llrs commented Jan 23, 2023

Hi, I assume you tried that several times because a warning or an error should have explained that this function no longer works with the latest output of the rtweet functions.

The reason this fails, and why I decided not to flatten the information, is explained in this post (basically, I found no good way to flatten the nested data in a tweet). There is a specific section explaining how to save the data. You should decide how to flatten your dataset, but I recommend saving it via saveRDS(rt, "rt.RDS").

@ccamara
Author

ccamara commented Jan 23, 2023

Thanks for your reply, @llrs

Yes, you're right: I've tried several times before and I got that warning. I was confused, though, by the warning message and thought there was a typo or some error in the wording. After all, if the functions are no longer working in 1.x, why is this function included in that version?

But then I tried finding info about saving data and didn't find anything in the docs, so thanks for pointing me to this blog post.

I did save the data frame as an RDS object, but this only partially solves my problem, as I wanted to store it in a CSV so I could read it from different software later.

Because of this and the error in the authentication (#756), I may revert to 0.7.0.

@ccamara ccamara closed this as completed Jan 23, 2023
@llrs
Collaborator

llrs commented Jan 23, 2023

write_as_csv is still there because there is a function in rtweet, read_twitter_csv, that reads a CSV and reverts it to the old structure. You might edit or filter the data and save it back to CSV. I will remove it in the next release. Thanks for the hint!

To save it as CSV you'll need to decide what to do with the user data and with all the data about entities that each tweet provides. I spent 6 months looking for such a way the last time I updated rtweet, and I didn't arrive at any good solution or receive any proposal. If you find a way to save the output of the function as a flat CSV that makes sense, I will gladly incorporate it into rtweet and restore the functionality of save_as_csv and read_twitter_csv for the current output.

The authentication error, as far as I know, is a problem on the Twitter side: some users report that it works, while for others it doesn't. But we can discuss this in the other issue if you have more information. I hope you have a great analysis (by the way, there is a newer version of rtweet with some bugs fixed; you might want to update if you want to keep using rtweet > 0.7 with fewer bugs).

@ccamara
Author

ccamara commented Jan 23, 2023

Thank you so much, Lluís! I have little time for this project, hence my intention to revert to the version I used before, which worked for me in previous code. But I'll have to reconsider the strategy and maybe decide how to flatten the data frame myself. In my case, that would probably mean storing multiple values separated by some delimiter, or storing values as a vector that I can then expand if needed. But probably neither of those approaches makes sense, as I do not intend to solve in a minute what you couldn't in six months.
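
A minimal sketch of the delimiter idea mentioned above, in base R only. It runs on a toy data frame rather than on real rtweet output, so the column names and the helper collapse_list_columns() are hypothetical and not part of rtweet. Deeply nested fields would be lumped into a single string by unlist(), and values that already contain the delimiter would become ambiguous.

```r
# Toy stand-in for a nested result; column names are illustrative only.
tweets <- data.frame(
  id_str = c("1", "2"),
  text = c("first tweet", "second tweet"),
  stringsAsFactors = FALSE
)
tweets$hashtags <- list(c("rstats", "rtweet"), character(0))

# Hypothetical helper (not an rtweet function): collapse every list-column
# into one delimited string per row, leaving atomic columns untouched.
collapse_list_columns <- function(df, sep = ";") {
  is_list <- vapply(df, is.list, logical(1))
  df[is_list] <- lapply(df[is_list], function(col) {
    vapply(
      col,
      function(x) paste(as.character(unlist(x)), collapse = sep),
      character(1)
    )
  })
  df
}

flat <- collapse_list_columns(tweets)

# With only atomic columns left, base write.csv() can store the result.
write.csv(flat, file.path(tempdir(), "rt_flat.csv"), row.names = FALSE)
```

Empty entries become empty strings, and the original values can later be recovered with strsplit() on the same delimiter.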

@llrs
Collaborator

llrs commented Jan 23, 2023

Oh, understood, familiarity counts.
You might solve the problem for yourself, in that you will keep only the fields you need or want (probably text, id_str of the tweet, id_str of the user, and username), but as a package maintainer I cannot guess how best to organize 2 urls and 3 media in a tweet while keeping all the information.
Best of luck with your project! You might want to submit it to rOpenSci use cases.
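
One possible way around the "2 urls and 3 media" problem is to avoid a single flat table: keep the scalar fields in one CSV and write each multi-valued field to its own long-format CSV keyed by the tweet id. The sketch below is base R on a toy data frame. The column names are assumptions for illustration and do not reflect the actual rtweet output structure.

```r
# Toy stand-in: one row per tweet, with a list-column of URLs.
tweets <- data.frame(
  id_str = c("1", "2", "3"),
  text = c("two links", "no links", "one link"),
  stringsAsFactors = FALSE
)
tweets$urls <- list(
  c("https://a.example", "https://b.example"),
  character(0),
  "https://c.example"
)

# Long format: one row per (tweet, url) pair; tweets without URLs drop out.
n_urls <- lengths(tweets$urls)
urls_long <- data.frame(
  id_str = rep(tweets$id_str, times = n_urls),
  url = as.character(unlist(tweets$urls)),
  stringsAsFactors = FALSE
)

# Scalar fields go to one file, the expanded field to another.
write.csv(tweets[, c("id_str", "text")],
          file.path(tempdir(), "tweets.csv"), row.names = FALSE)
write.csv(urls_long,
          file.path(tempdir(), "tweet_urls.csv"), row.names = FALSE)
```

The two files can be joined again on id_str (for example with merge()), so no value has to be squeezed into a single cell.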
