Open access #135

Merged: trangdata merged 5 commits into main from open-access on Jul 21, 2023

@trangdata (Collaborator) commented Jul 20, 2023

Closes #118, #134

@trangdata requested a review from @yjunechoe on July 20, 2023 22:28

@yjunechoe (Collaborator) left a comment

Thanks! Works as expected!

@yjunechoe (Collaborator)

Good to merge as is, but one thing for us to think about is that sometimes the is_oa column doesn't match the open_access$is_oa nested column.

Our hands are of course kinda tied in these cases, but there are some false negatives (is_oa is FALSE but the nested open_access$is_oa column is TRUE) which we could catch and fix (by also setting is_oa to TRUE in those cases):

library(openalexR)

# Works whose top-level is_oa is FALSE while the nested open_access$is_oa is TRUE
false_negative_OAs <- c(
  "https://openalex.org/W2165545766",
  "https://openalex.org/W2104143313",
  "https://openalex.org/W1533806699",
  "https://openalex.org/W2005616914",
  "https://openalex.org/W106193068",
  "https://openalex.org/W2626261620",
  "https://openalex.org/W2219574487",
  "https://openalex.org/W4236254570",
  "https://openalex.org/W2800319334",
  "https://openalex.org/W1947610836",
  "https://openalex.org/W3168388584",
  "https://openalex.org/W2325295993",
  "https://openalex.org/W4235001864"
)

# Compare the top-level flag with the nested open_access fields
oa_fetch(identifier = false_negative_OAs) |> 
  dplyr::select(id, is_oa, open_access) |> 
  tidyr::unnest_longer(open_access)
#> # A tibble: 13 × 3
#>    id                               is_oa open_access$is_oa $oa_status $oa_url                                                                                   $any_repository_has_fulltext
#>    <chr>                            <lgl> <lgl>             <chr>      <chr>                                                                                     <lgl>                       
#>  1 https://openalex.org/W2165545766 FALSE TRUE              green      http://www.mapageweb.umontreal.ca/tuitekj/cours/chomsky/Hauser-Chomsky-Fitch.pdf          TRUE                        
#>  2 https://openalex.org/W2104143313 FALSE TRUE              green      https://dash.harvard.edu/bitstream/1/3117935/1/Hauser_EvolutionLanguageFaculty.pdf        TRUE                        
#>  3 https://openalex.org/W1533806699 FALSE TRUE              green      https://dspace.mit.edu/bitstream/1721.1/86586/2/48125267-MIT.pdf                          TRUE                        
#>  4 https://openalex.org/W2005616914 FALSE TRUE              green      https://dspace.mit.edu/bitstream/1721.1/103525/2/10936_2014_9331_ReferencePDF.pdf         TRUE                        
#>  5 https://openalex.org/W106193068  FALSE TRUE              green      http://www.mapageweb.umontreal.ca/tuitekj/cours/chomsky/Hauser-Chomsky-Fitch.pdf          TRUE                        
#>  6 https://openalex.org/W2626261620 FALSE TRUE              green      https://dspace.mit.edu/bitstream/1721.1/128683/2/EveraertTICS2017online.pdf               TRUE                        
#>  7 https://openalex.org/W2219574487 FALSE TRUE              green      https://europepmc.org/articles/pmc2755450?pdf=render                                      TRUE                        
#>  8 https://openalex.org/W4236254570 FALSE TRUE              green      https://dspace.mit.edu/bitstream/1721.1/127796/2/10767_2015_9206_ReferencePDF.pdf         TRUE                        
#>  9 https://openalex.org/W2800319334 FALSE TRUE              green      https://openyls.law.yale.edu/bitstream/20.500.13051/15438/2/51_80YaleLJ1456_June1971_.pdf TRUE                        
#> 10 https://openalex.org/W1947610836 FALSE TRUE              green      http://www.sfu.ca/~jeffpell/papers/Lepore-Pelletier.pdf                                   TRUE                        
#> 11 https://openalex.org/W3168388584 FALSE TRUE              green      http://escholarship.mcgill.ca/downloads/3j333404c                                         TRUE                        
#> 12 https://openalex.org/W2325295993 FALSE TRUE              green      https://journals.openedition.org/lettre-cdf/pdf/898                                       TRUE                        
#> 13 https://openalex.org/W4235001864 FALSE TRUE              green      https://ejop.psychopen.eu/index.php/ejop/article/download/574/574.pdf                     TRUE
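A minimal base-R sketch of the fix proposed above (promote is_oa to TRUE whenever the nested flag is TRUE). It runs on a hypothetical mock rather than live oa_fetch() output: the assumption is that open_access is a list-column of named lists with the four fields shown in the output, and the second id, W_hypothetical, is made up for illustration.

```r
# Hypothetical mock of the pre-PR shape (structure assumed, not openalexR's
# exact internals): top-level is_oa plus an open_access list-column
works <- data.frame(
  id = c("https://openalex.org/W2165545766", "W_hypothetical"),
  is_oa = c(FALSE, FALSE)
)
works$open_access <- list(
  list(is_oa = TRUE, oa_status = "green",
       oa_url = "http://www.mapageweb.umontreal.ca/tuitekj/cours/chomsky/Hauser-Chomsky-Fitch.pdf",
       any_repository_has_fulltext = TRUE),
  list(is_oa = FALSE, oa_status = "closed", oa_url = NULL,
       any_repository_has_fulltext = FALSE)
)

# Pull the nested flag out of each element (missing or NULL counts as FALSE)
nested_is_oa <- vapply(works$open_access, function(x) isTRUE(x$is_oa), logical(1))

# False negatives: top-level FALSE but nested TRUE
false_negative <- !works$is_oa & nested_is_oa

# Proposed fix: also set the top-level is_oa to TRUE in those cases
works$is_oa[false_negative] <- TRUE
```

Note that this overwrites the top-level value, so the information that the primary location itself is closed is lost.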

@trangdata (Collaborator, Author)

Nice catch, June 👀. It's interesting that primary_location$is_oa is not the same as open_access$is_oa: the copy of this work closest to the version of record is not open, but some other version of it is open elsewhere.

I think I'll do this: unnest the open_access column into 4 columns and rename open_access$is_oa to is_oa_anywhere. Another advantage is that this brings oa_url up one level and makes it more obvious to the user. What do you think, @yjunechoe?
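The proposed reshaping (hoist the four open_access fields to top-level columns and rename the nested is_oa to is_oa_anywhere) can be sketched in base R. This is an illustration on a hypothetical mock, not the code merged in this PR; the list-of-named-lists shape, the get_field() helper, and the W_hypothetical id are assumptions.

```r
# Hypothetical mock of the pre-PR shape: open_access is a list-column of
# named lists (field names from the output above; structure assumed)
works <- data.frame(
  id = c("https://openalex.org/W2165545766", "W_hypothetical"),
  is_oa = c(FALSE, FALSE)
)
works$open_access <- list(
  list(is_oa = TRUE, oa_status = "green",
       oa_url = "http://www.mapageweb.umontreal.ca/tuitekj/cours/chomsky/Hauser-Chomsky-Fitch.pdf",
       any_repository_has_fulltext = TRUE),
  list(is_oa = FALSE, oa_status = "closed", oa_url = NULL,
       any_repository_has_fulltext = FALSE)
)

# Hypothetical helper: extract one field per element, coerced to the type of
# `na`, falling back to `na` when the field is NULL. vapply() errors if any
# value is not length 1.
get_field <- function(lst, field, na) {
  vapply(lst, function(x) {
    val <- x[[field]]
    if (is.null(val)) na else as.vector(val, typeof(na))
  }, na)
}

oa <- works$open_access
works$is_oa_anywhere <- get_field(oa, "is_oa", NA)  # renamed from open_access$is_oa
works$oa_status <- get_field(oa, "oa_status", NA_character_)
works$oa_url <- get_field(oa, "oa_url", NA_character_)
works$any_repository_has_fulltext <- get_field(oa, "any_repository_has_fulltext", NA)
works$open_access <- NULL  # drop the nested column once its fields are hoisted
```

Keeping the top-level is_oa untouched and adding is_oa_anywhere next to it preserves both signals instead of overwriting one with the other.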

@yjunechoe (Collaborator)

I like the is_oa_anywhere renaming/hoisting - good idea!

I'm also on board with bringing oa_url up, but I'm curious whether it's guaranteed to be length-1 (I didn't find any exceptions in my quick search, but 👀).

In any case, I agree that a top-level oa_url column would be super useful!

@trangdata (Collaborator, Author)

> curious whether that's guaranteed to be length-1

You're thorough as always! I just checked, and the docs describe oa_url as the "best" open-access URL for the work, so we can expect oa_url to be length-1:
https://docs.openalex.org/api-entities/works/work-object#oa_url
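Even with the documented single "best" URL, a cheap defensive check before flattening oa_url into an atomic column could look like the sketch below. The input list is a hypothetical illustration of per-work values (a URL, a missing NULL, and an explicit NA), not real API output.

```r
# Hypothetical per-work oa_url values before hoisting (illustrative only)
oa_urls <- list(
  "https://dash.harvard.edu/bitstream/1/3117935/1/Hauser_EvolutionLanguageFaculty.pdf",
  NULL,
  NA_character_
)

# Defensive check: each work should carry at most one oa_url
stopifnot(all(lengths(oa_urls) <= 1))

# Flatten to an atomic character vector, mapping NULL to NA
oa_url_col <- vapply(
  oa_urls,
  function(u) if (length(u) == 0) NA_character_ else as.character(u),
  character(1)
)
```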

@trangdata (Collaborator, Author)

What it looks like now:

library(openalexR)

false_negative_OAs <- c(
  "https://openalex.org/W2165545766",
  "https://openalex.org/W2104143313",
  "https://openalex.org/W1533806699",
  "https://openalex.org/W2005616914",
  "https://openalex.org/W106193068",
  "https://openalex.org/W2626261620",
  "https://openalex.org/W2219574487",
  "https://openalex.org/W4236254570",
  "https://openalex.org/W2800319334",
  "https://openalex.org/W1947610836",
  "https://openalex.org/W3168388584",
  "https://openalex.org/W2325295993",
  "https://openalex.org/W4235001864"
)

# open_access fields are now top-level columns (is_oa_anywhere, oa_status, oa_url, ...)
oa_fetch(identifier = false_negative_OAs) |> 
  dplyr::select(id, dplyr::contains("oa"), pdf_url)
#> # A tibble: 13 × 6
#>    id                               is_oa is_oa_anywhere oa_sta…¹ oa_url pdf_url
#>    <chr>                            <lgl> <lgl>          <chr>    <chr>  <lgl>  
#>  1 https://openalex.org/W2165545766 FALSE TRUE           green    http:… NA     
#>  2 https://openalex.org/W2104143313 FALSE TRUE           green    https… NA     
#>  3 https://openalex.org/W1533806699 FALSE TRUE           green    https… NA     
#>  4 https://openalex.org/W2005616914 FALSE TRUE           green    https… NA     
#>  5 https://openalex.org/W106193068  FALSE TRUE           green    http:… NA     
#>  6 https://openalex.org/W2626261620 FALSE TRUE           green    https… NA     
#>  7 https://openalex.org/W2219574487 FALSE TRUE           green    https… NA     
#>  8 https://openalex.org/W4236254570 FALSE TRUE           green    https… NA     
#>  9 https://openalex.org/W2800319334 FALSE TRUE           green    https… NA     
#> 10 https://openalex.org/W1947610836 FALSE TRUE           green    http:… NA     
#> 11 https://openalex.org/W3168388584 FALSE TRUE           green    http:… NA     
#> 12 https://openalex.org/W2325295993 FALSE TRUE           green    https… NA     
#> 13 https://openalex.org/W4235001864 FALSE TRUE           green    https… NA     
#> # … with abbreviated variable name ¹oa_status

Created on 2023-07-21 with reprex v2.0.2
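With the flattened columns, finding works whose primary location is closed but which are open elsewhere becomes a plain row filter. A minimal sketch on a hypothetical mock that reuses the column names from the reprex above (W_hypothetical and its URL are made up; real data would come from oa_fetch()):

```r
# Hypothetical mock using the post-PR column names shown in the reprex
works <- data.frame(
  id = c("https://openalex.org/W2165545766",
         "https://openalex.org/W2104143313",
         "W_hypothetical"),
  is_oa = c(FALSE, FALSE, TRUE),
  is_oa_anywhere = c(TRUE, TRUE, TRUE),
  oa_url = c("http://www.mapageweb.umontreal.ca/tuitekj/cours/chomsky/Hauser-Chomsky-Fitch.pdf",
             "https://dash.harvard.edu/bitstream/1/3117935/1/Hauser_EvolutionLanguageFaculty.pdf",
             "https://example.org/hypothetical.pdf")
)

# Works closed at their primary location but open somewhere else
open_elsewhere <- works[!works$is_oa & works$is_oa_anywhere, c("id", "oa_url")]
```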

@yjunechoe (Collaborator)

> this is the "best" URL, so we're expecting length-1

Oh this is neat!

That's it from me then - everything looks good!

@trangdata merged commit b4fe83d into main on Jul 21, 2023
9 checks passed
@trangdata deleted the open-access branch on July 21, 2023 18:56
Successfully merging this pull request may close these issues.

Add language and open_access as columns in works2df output
2 participants