Multiple author institutions lost from works #50
Comments
Hi @zilch42, thanks for raising this issue. 🌈 You're right: by default, only one institution is kept for each author in the tibble output. If you'd like the original nested list without any simplification, you could do
library(openalexR)
dat <- oa_fetch("W2898962279", output = "list")
do.call(rbind.data.frame, dat$author[[3]]$institutions)
#> id display_name
#> 1 https://openalex.org/I1337719021 Australian Research Council
#> 2 https://openalex.org/I4210161554 CSIRO Land and Water
#> ror country_code type
#> 1 https://ror.org/05mmh0f86 AU government
#> 2 https://ror.org/057xz1h85 AU facility
Created on 2022-11-21 with reprex v2.0.2
TL;DR: we were intentional in keeping only one institution for each author in the author column of the tibble output. But we would love to hear other ideas on this simplification. 🌱
Thanks @trangdata, glad to know there is a method for getting at the data.
It would be intuitive in my mind to flatten to the lowest possible level and therefore include duplication rather than drop data. So in the case of the author table I would have expected to see 2 rows for Tim McVicar, each with a different institution, but I can appreciate how that may confuse other folks, as you would then have to deduplicate on au_id if authors were what you were interested in. In my personal opinion though, that would still be easier than needing to use a list approach for one case and a tibble approach for another (me being not very familiar with lists 😄).
It would be good to have some more detailed documentation, particularly on this page. Correct me if I'm wrong, but Example 2 on that page, which is about finding the institutions associated with a group of works, isn't actually correct because it is using the
It would be great to have a list of any other tables one needs to be careful with, where records may be dropped using
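For illustration, the "flatten to the lowest level, then deduplicate on au_id" idea described above could be sketched in base R as follows. This is only a sketch under stated assumptions: the nested list is a hand-built mock rather than real oa_fetch() output, the author-level field names (id, display_name) and the placeholder author IDs are assumptions, and flatten_author() is a hypothetical helper, not part of openalexR. Only the two institutions listed for Tim McVicar come from this thread.

```r
# Hand-built mock of a nested author list (NOT real API output).
# Author IDs and the first author's record are placeholders.
authors <- list(
  list(
    id = "A_placeholder_1",
    display_name = "Author One",
    institutions = list(
      list(id = "I_placeholder", display_name = "Example University")
    )
  ),
  list(
    id = "A_placeholder_2",
    display_name = "Tim McVicar",
    institutions = list(
      list(id = "https://openalex.org/I1337719021",
           display_name = "Australian Research Council"),
      list(id = "https://openalex.org/I4210161554",
           display_name = "CSIRO Land and Water")
    )
  )
)

# Hypothetical helper: one row per (author, institution) pair.
flatten_author <- function(a) {
  inst <- a$institutions
  # Keep authors without any institution as a single row of NAs
  if (length(inst) == 0) {
    inst <- list(list(id = NA_character_, display_name = NA_character_))
  }
  data.frame(
    au_id = a$id,
    au_display_name = a$display_name,
    institution_id = vapply(inst, function(i) i$id, character(1)),
    institution_display_name = vapply(inst, function(i) i$display_name, character(1)),
    stringsAsFactors = FALSE
  )
}

# Stack the per-author data frames; author fields repeat per institution
authors_long <- do.call(rbind, lapply(authors, flatten_author))

# When only distinct authors matter, deduplicate on au_id
authors_unique <- authors_long[!duplicated(authors_long$au_id), ]
```

The trade-off is the one discussed in this thread: the long table keeps every affiliation, but it no longer has one unique OpenAlex ID per row, so author counts need the deduplication step.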
Thank you for this explanation @zilch42. 🌻 The purpose of the "About the tibble output" vignette was to show a few different ways for the user to extract the data. But you're right. I have added more information and clarified the assumption we made in 77b37b8. A table of specific simplifications makes sense. I came to this project later and these simplifications were already there, but we should revisit and write up more clearly what is being done in oa2df. 👍🏽
Thanks @trangdata. The updated documentation is definitely clearer.
Hi @trangdata, I have modified
@zilch42 I'm happy to look at what you'd like to change, but I'm not sure if 2 rows for one author in the author column is intuitive. I was hoping to keep only one unique OpenAlex ID for each row. @massimoaria what do you think?
@trangdata I fully agree with you.
I would be interested in an option for getting all institution data in works2df, as mentioned above. I updated works2df in https://github.com/mariusbommert/openalexR/blob/main/R/oa2df.R with 2 additional parameters, use_first_institution and use_first_affiliation_string, to allow getting multiple affiliations. If both parameters are TRUE (the default), you get the same result as in the original version of works2df. If one or both parameters are FALSE, multiple institutions are considered and the corresponding information is available as a tibble. There is still only one row per author, and only the institution and/or raw_affiliation are changed. Is there any chance that such a feature will be added/merged?
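To make the "one row per author, institutions kept as a nested table" behaviour concrete, here is a rough base-R sketch. It is not the code from the fork linked above: pick_institutions() and make_author_table() are hypothetical helpers, the nested list is a hand-built mock with placeholder author IDs, and only the flag name use_first_institution is borrowed from the comment.

```r
# Hand-built mock of a nested author list (NOT real API output).
authors <- list(
  list(
    id = "A_placeholder_1",
    display_name = "Author One",
    institutions = list(
      list(id = "I_placeholder", display_name = "Example University")
    )
  ),
  list(
    id = "A_placeholder_2",
    display_name = "Tim McVicar",
    institutions = list(
      list(id = "https://openalex.org/I1337719021",
           display_name = "Australian Research Council"),
      list(id = "https://openalex.org/I4210161554",
           display_name = "CSIRO Land and Water")
    )
  )
)

# Hypothetical helper: institutions of one author as a data frame,
# optionally truncated to the first institution.
pick_institutions <- function(a, use_first_institution = TRUE) {
  if (length(a$institutions) == 0) return(data.frame())
  inst <- do.call(
    rbind,
    lapply(a$institutions, as.data.frame, stringsAsFactors = FALSE)
  )
  if (use_first_institution) inst <- inst[1, , drop = FALSE]
  inst
}

# Hypothetical helper: always one row per author; the institutions
# column is a list-column holding one data frame per author.
make_author_table <- function(authors, use_first_institution = TRUE) {
  out <- data.frame(
    au_id = vapply(authors, function(a) a$id, character(1)),
    au_display_name = vapply(authors, function(a) a$display_name, character(1)),
    stringsAsFactors = FALSE
  )
  out$institutions <- lapply(
    authors, pick_institutions,
    use_first_institution = use_first_institution
  )
  out
}

first_only <- make_author_table(authors)                               # default
all_inst   <- make_author_table(authors, use_first_institution = FALSE)
```

In this sketch both tables have one row per author, so the "one unique OpenAlex ID per row" property is preserved; only the contents of the nested institutions column differ between the two settings.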
Hi there,
oa2df() appears to be dropping subsequent institutional affiliations from authors when returning works.
See this example:
https://explore.openalex.org/works/W2898962279
The 3rd Author Tim McVicar is affiliated with both the Australian Research Council and CSIRO Land and Water.
The raw JSON from oa_request() includes both affiliations
(output below is just the relevant subset because it's long)
But when using oa_fetch(), the flattening process appears to lose CSIRO Land and Water.
[Screenshot: author table in RStudio]