getpapers has many fewer hits than EUPMC interface #140
I'm having a similar problem. I'm only getting around 10 papers with an EUPMC 'AUTH:' query, when I know I should be getting nearly 60.
We get our results directly from the EUPMC API, so this sounds like an API bug. @tarrow, can you follow up with EUPMC?
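Because getpapers takes its results from the EUPMC API, one way to check the reported numbers is to ask the API for its own hit count and compare it with the web interface. This is a minimal sketch, assuming the public Europe PMC REST search endpoint and the `hitCount` field of its JSON response. The author name is a hypothetical placeholder, and the network call is kept in its own function so that URL building and parsing can be checked offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the public Europe PMC REST search service.
EUPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def eupmc_search_url(query, page_size=25):
    """Build a search URL that asks the API for JSON output."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return EUPMC_SEARCH + "?" + urlencode(params)


def hit_count(json_text):
    """Extract the total number of hits from a JSON search response."""
    return int(json.loads(json_text)["hitCount"])


def fetch_hit_count(query):
    """Query the live API (needs network access) and return its hit count."""
    with urlopen(eupmc_search_url(query)) as response:
        return hit_count(response.read().decode("utf-8"))


# Hypothetical author query; substitute the one used in the web interface.
print(eupmc_search_url('AUTH:"Smith J"'))
# https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=AUTH%3A%22Smith+J%22&format=json&pageSize=25
```

Passing the exact query string typed into the web interface to `fetch_hit_count` gives a number that can be compared directly with the count the interface displays.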
As scientists and as developers, it is imperative to keep an open mind. Pure logic suggests that only the following can be inferred with certainty:
Being in favour of either inference without checking the facts is detrimental to both research and development. I say this because I strongly suspect that it is rather the EUPMC online interface that has a bug (or a feature, depending on interpretation). :-) Let's do an EUPMC search with an undocumented search feature (undocumented in the API, but fully working both online and in the API): search by SUBJECT.

NOTE: Subject areas, as PLOS calls them, are documented here: Search PLOS by Subject Areas.

Online

Open http://journals.plos.org/plosone/search, make sure you click on 'Advanced Search', then click on 'All Fields' and select 'Subject'. Type 'Algebra' as the subject. I have checked that this is a valid subarea (I looked at the HTML code of http://journals.plos.org/plosone/browse/mathematics?resultView=list; if you are the 'GUI type', you can just click on the unintuitive 'down arrow', which looks like an inverted ^, on the black navigation bar). You get a page saying "1,208 results for subject:algebra" (notice the notation: that's undocumented!), all neatly listed in 81 pages of 15 results each: http://journals.plos.org/plosone/search?q=subject%3AAlgebra&page=1

EUPMC API (undocumented)

Encouraged by the above, let's try the undocumented SUBJECT:XXX method in the API using getpapers:
197 results for algebra ONLY? This cannot be explained solely by the open/closed access dichotomy, so who is right? Well, I did NOT check ALL 1,000+ results after result no. 197 in the online output, but the few I checked convinced me that everything after roughly result 200 in the online output is unrelated to algebra, even in the loosest sense of "related". It's all biology, and no algebra at all. The right results, the ones that seem to implement what is described in Search PLOS by Subject Areas, are the ones served by the API! My explanation for this discrepancy is that the online interface adds results from a different source to those returned by the exact query, possibly in an attempt to increase what search specialists call recall, at the cost of what they call precision. So what do we learn from this? Three things:
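The figures in the comment above can be checked with a little arithmetic. The sketch below rebuilds the PLOS subject-search URL quoted there, confirms that 1,208 results at 15 per page fill 81 pages, and computes precision and recall for the two result lists. It rests on a strong, hypothetical assumption taken from the commenter's own reading of the results: that the 197 API hits are exactly the algebra papers and that every one of them also appears among the 1,208 online results. Under that assumption both lists have full recall, and the precision of the online list is about 0.163.

```python
import math
from urllib.parse import urlencode

PLOS_SEARCH = "http://journals.plos.org/plosone/search"


def plos_subject_search_url(subject, page=1):
    # "subject:" is the notation the PLOS results page displays.
    return PLOS_SEARCH + "?" + urlencode({"q": "subject:" + subject, "page": page})


def page_count(total_results, per_page=15):
    return math.ceil(total_results / per_page)


def precision(relevant_retrieved, retrieved):
    return relevant_retrieved / retrieved


def recall(relevant_retrieved, relevant_total):
    return relevant_retrieved / relevant_total


# Hypothetical assumption: the 197 API hits are exactly the relevant algebra
# papers, and every one of them is also in the 1,208 online results.
RELEVANT_TOTAL = 197
result_lists = {"API": 197, "online": 1208}

print(plos_subject_search_url("Algebra"))
# http://journals.plos.org/plosone/search?q=subject%3AAlgebra&page=1
print(page_count(1208))  # 81

for name, retrieved in result_lists.items():
    p = precision(RELEVANT_TOTAL, retrieved)
    r = recall(RELEVANT_TOTAL, RELEVANT_TOTAL)
    print(f"{name}: precision={p:.3f}, recall={r:.3f}")
```

If the relevance assumption is wrong in either direction, only `RELEVANT_TOTAL` and the relevant-retrieved counts change; the definitions of precision and recall stay the same.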
For the sake of completeness: this issue is a duplicate of #95.
@petermr, I'm starting to realize that if the method I found to search by (sub)category/(sub)subject area is really new, it opens many new possibilities. This is because these categories/subject areas implement a kind of semantic ontology; see Search PLOS by Subject Areas for how they were created. You might want to experiment with SUBJECT:XXX and blog about it to spread the word. ;-)
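For anyone who wants to experiment with SUBJECT:XXX, a small helper that builds such queries may be handy. This is a minimal sketch: SUBJECT is the undocumented field described in this thread, and wrapping multi-word values in double quotes (so that a subject such as 'mathematical economics' is searched as a phrase) is an assumption carried over from the quoted-phrase syntax used with fields like AUTH. The helper name `field_query` is hypothetical.

```python
def field_query(field, value):
    """Return FIELD:value, quoting values that contain spaces."""
    if '"' in value:
        # Escaping rules for embedded quotes are not covered by this sketch.
        raise ValueError("values containing double quotes are not supported")
    if " " in value:
        # Assumption: a double-quoted value is matched as a phrase.
        value = f'"{value}"'
    return f"{field}:{value}"


print(field_query("SUBJECT", "Algebra"))  # SUBJECT:Algebra
print(field_query("SUBJECT", "mathematical economics"))  # SUBJECT:"mathematical economics"
```

The resulting string is what would be given to getpapers as its query; whether the API treats the quoted form as intended is exactly the kind of thing worth testing.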
Searching for the subject 'mathematical economics' gives a different picture that seems to contradict my observations above. With getpapers:
I get exactly one result:
which is definitely NOT a mathematical economics paper. On the other hand, the online interface at http://journals.plos.org/plosone/browse/mathematical_economics brings up 31 results which DO look like mathematical economics papers! So, again, who is right: the API or the online interface? At this point I have only theories to offer, theories that need experimental evidence:
It seems that Europe PMC (EUPMC) has listened to complaints about sudden API changes and has modified its procedures. I have just stumbled upon the EUPMC SOAP Web Service Reference Guide. In its Introduction (p. 6 of the document, p. 7 of the PDF file), it says:
You can thus
From a correspondent:
I installed ContentMine on my Mac laptop and tried to mine content on my research topic, "Postdoc career outcome". I was able to get 78 open-access full-text papers. See the getpapers output logs below:
I did the same search through the Europe PMC web interface and got 297 results in total, of which 296 are full-text articles and 172 are open-access articles. See the screenshot below:
My questions are:
Thank you so much for developing this great open-source software! I’m looking forward to hearing from you soon.
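The correspondent's numbers (297 results, 296 full-text, 172 open-access, versus 78 papers from getpapers) suggest checking how the count changes as filters are added to the same query. A minimal sketch that builds the three variants, assuming Europe PMC's `HAS_FT:y` and `OPEN_ACCESS:y` search fields; the topic string is taken from the message, but the exact syntax of the original search is unknown, so treat it as a placeholder. The function name `filter_variants` is hypothetical.

```python
def filter_variants(base_query):
    """Return the base query plus full-text and open-access restricted versions."""
    base = f"({base_query})"
    return {
        "all": base,
        # Assumed field names for full-text availability and open access.
        "full text": f"{base} AND HAS_FT:y",
        "open access": f"{base} AND OPEN_ACCESS:y",
    }


for label, query in filter_variants("postdoc career outcome").items():
    print(f"{label}: {query}")
```

Sending each variant to the API and comparing the counts with the web interface shows at which step the numbers start to diverge.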