
(Draft) ScienceDirect: Object Retrieval API #353

Closed
nils-herrmann opened this issue Sep 3, 2024 · 7 comments

@nils-herrmann (Collaborator) commented Sep 3, 2024

Research for implementing the ScienceDirect: Object Retrieval API. The API description states: "These interfaces represent retrieval of objects associated with a full text article. This resource can also return reference details for an individual object or an entire full text article. The reference metadata response will contain links to the associated Full-Text article."

@nils-herrmann (Collaborator, Author) commented Sep 3, 2024

The Object Retrieval API returns different things depending on the query:

  • Object references metadata of a document
  • A specific object
  • A thumbnail image
  • A regular-sized image
  • A high resolution image

I think the best choice is to expose a document's object references metadata as a property and to retrieve specific objects through methods. Here is an example:

  1. All object references metadata of a document is retrieved at initialisation:

```python
>>> objects = ObjectRetrieval('S0360131524001623', refresh=True)
>>> objects.object_references
[{'url': 'https://api.elsevier.com/content/object/eid/1-s2.0-S0360131524001623-gr1.jpg?httpAccept=%2A%2F%2A',
  'eid': '1-s2.0-S0360131524001623-gr1.jpg',
  'ref': 'gr1',
  'filename': 'gr1.jpg',
  'mimetype': 'image/jpeg',
  'size': '135574',
  'height': '346',
  'width': '491',
  'type': 'IMAGE-DOWNSAMPLED'},
 {'url': 'https://api.elsevier.com/content/object/eid/1-s2.0-S0360131524001623-gr2.jpg?httpAccept=%2A%2F%2A',
  'eid': '1-s2.0-S0360131524001623-gr2.jpg',
  'ref': 'gr2',
  'filename': 'gr2.jpg',
  'mimetype': 'image/jpeg',
  'size': '129935',
  'height': '365',
  'width': '624',
  'type': 'IMAGE-DOWNSAMPLED'}
  ...]
```
  2. Specific objects can be queried with a method:

```python
from IPython.display import display  # provided by IPython/Jupyter
from PIL import Image

gr1 = objects.get_specific_object('gr1')
image = Image.open(gr1)
display(image)
```
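A minimal sketch of how `get_specific_object` could work, assuming the metadata list shown above: look up the entry whose `ref` matches, download its `url`, and wrap the bytes in `io.BytesIO`, which `PIL.Image.open` accepts as a file-like object. The helper names are hypothetical, the downloader is passed in so it can be swapped or stubbed, and authenticating with an `X-ELS-APIKey` header is an assumption.

```python
from io import BytesIO
from typing import Callable
from urllib.request import Request, urlopen


def fetch_with_urllib(url: str, api_key: str) -> bytes:
    """Hypothetical downloader; assumes the API key goes in X-ELS-APIKey."""
    request = Request(url, headers={"X-ELS-APIKey": api_key})
    with urlopen(request) as response:
        return response.read()


def get_specific_object(object_references: list[dict], ref: str,
                        fetch: Callable[[str], bytes]) -> BytesIO:
    """Return the object whose 'ref' matches as an in-memory file.

    Hypothetical helper: finds `ref` in the metadata and downloads its URL.
    """
    for entry in object_references:
        if entry["ref"] == ref:
            # BytesIO lets PIL.Image.open read the bytes like a file.
            return BytesIO(fetch(entry["url"]))
    raise KeyError(f"No object with ref {ref!r} in this document")
```

With a real key, `functools.partial(fetch_with_urllib, api_key=...)` can be passed as `fetch`; keeping the download separate from the lookup makes the lookup testable without network access.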

It seems pretty cool that the library will also let users access more than text (images, videos, Excel sheets, Word documents). What do you think, @Michael-E-Rose?
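Because every metadata entry carries a `mimetype`, non-text objects can be grouped or filtered before anything is downloaded. A minimal sketch using the two entries from the output above (trimmed to the fields used); the helper functions are hypothetical, and since `size` comes back as a string it is converted before summing.

```python
from collections import defaultdict

# The two entries shown in the output above, trimmed to the fields used here.
object_references = [
    {"ref": "gr1", "filename": "gr1.jpg", "mimetype": "image/jpeg",
     "size": "135574", "type": "IMAGE-DOWNSAMPLED"},
    {"ref": "gr2", "filename": "gr2.jpg", "mimetype": "image/jpeg",
     "size": "129935", "type": "IMAGE-DOWNSAMPLED"},
]


def group_by_mimetype(references: list[dict]) -> dict[str, list[str]]:
    """Map each MIME type to the refs of the objects that have it."""
    groups = defaultdict(list)
    for entry in references:
        groups[entry["mimetype"]].append(entry["ref"])
    return dict(groups)


def images_only(references: list[dict]) -> list[dict]:
    """Keep entries whose MIME type starts with 'image/'."""
    return [e for e in references if e["mimetype"].startswith("image/")]


def total_size(references: list[dict]) -> int:
    """Sum the 'size' field, which the response encodes as a string."""
    return sum(int(entry["size"]) for entry in references)


print(group_by_mimetype(object_references))  # {'image/jpeg': ['gr1', 'gr2']}
print(total_size(images_only(object_references)))  # 265509
```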

nils-herrmann added a commit to nils-herrmann/pybliometrics that referenced this issue Sep 5, 2024
@Michael-E-Rose (Contributor)

We had a similar case before with the SerialSearch() and SerialTitle() classes, which both access the Serial Title API. That API has a search part and a retrieval part, so we created two classes. I'm thinking of doing the same here. However, we might combine the three image access points into one class and handle the image quality via the view parameter. Could you please check whether the return values of the three image access points are (almost) the same?
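A rough sketch of the proposed split, under stated assumptions: one class holds the object references metadata of an article, the other retrieves a single object and takes a `view` argument instead of having one class per image quality. The class names, constructor signatures and injected fetch callables are hypothetical and only illustrate the structure.

```python
from typing import Callable


class ObjectMetadata:
    """Hypothetical: all object references of one article (search-like part)."""

    def __init__(self, identifier: str,
                 fetch_references: Callable[[str], list[dict]]):
        self.identifier = identifier
        # One request returns the whole list of object references.
        self.object_references = fetch_references(identifier)

    @property
    def refs(self) -> list[str]:
        """Short names such as 'gr1' that can be passed to ObjectRetrieval."""
        return [entry["ref"] for entry in self.object_references]


class ObjectRetrieval:
    """Hypothetical: one object of one article (retrieval-like part)."""

    def __init__(self, identifier: str, ref: str, view: str,
                 fetch_object: Callable[[str, str, str], bytes]):
        self.identifier = identifier
        self.ref = ref
        # Image quality is selected by `view` rather than by separate classes.
        self.view = view
        self.content = fetch_object(identifier, ref, view)
```

This mirrors the SerialSearch()/SerialTitle() pattern: the metadata class lists what exists, and the retrieval class fetches one item.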

@nils-herrmann (Collaborator, Author) commented Sep 9, 2024

Implementing two classes (one for the metadata and one for the objects) is a good idea.

Regarding the images:

  • There are 3 image views (STANDARD, THUMBNAIL, HIGH)
  • STANDARD is always available
  • THUMBNAIL and HIGH are not always available
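Since STANDARD is always available while THUMBNAIL and HIGH are not, a retrieval method could fall back to STANDARD. A minimal sketch, assuming the downloader signals a missing view by raising an exception; the `ViewNotAvailable` exception, the `fetch_image` helper and the fetch signature are hypothetical, and how the API actually reports a missing view is not established in this thread.

```python
from typing import Callable

VIEWS = ("STANDARD", "THUMBNAIL", "HIGH")


class ViewNotAvailable(Exception):
    """Hypothetical: raised by the downloader when an object lacks a view."""


def fetch_image(ref: str, view: str,
                fetch: Callable[[str, str], bytes]) -> tuple[str, bytes]:
    """Fetch `ref` in the requested view, falling back to STANDARD.

    Returns the view actually used together with the raw image bytes.
    """
    view = view.upper()
    if view not in VIEWS:
        raise ValueError(f"view must be one of {VIEWS}, got {view!r}")
    try:
        return view, fetch(ref, view)
    except ViewNotAvailable:
        if view == "STANDARD":
            # STANDARD is reported to be always available, so re-raise.
            raise
        return "STANDARD", fetch(ref, "STANDARD")
```

Returning the view that was actually used lets the caller notice when it received a lower-quality image than requested.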

@nils-herrmann (Collaborator, Author) commented Sep 9, 2024

(Update with answers) For the sake of documentation: I'm having trouble retrieving .svg objects. The attached file illustrates the problem with two questions:

  1. Why do I get a 404 when retrieving .svg objects?

It is an Elsevier error. The solution is to query with a view (id/{id}/ref/{ref}/{view}).
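A small helper for the view-based path might look like this. It is a sketch under assumptions: the base URL is inferred from the object URLs in the metadata response, `id_type` (for example 'pii') stands in for however `{id}` is qualified, and the `httpAccept=*/*` query mirrors those URLs. None of these details are confirmed in this thread.

```python
from urllib.parse import quote, urlencode

# Assumption: base path inferred from the object URLs in the metadata response.
BASE_URL = "https://api.elsevier.com/content/object"


def object_view_url(id_type: str, identifier: str, ref: str, view: str) -> str:
    """Build a .../{id}/ref/{ref}/{view} URL (hypothetical helper).

    `id_type` (for example 'pii') is an assumption about how {id} is qualified.
    """
    segments = (id_type, identifier, "ref", ref, view)
    # Percent-encode each path segment individually.
    path = "/".join(quote(segment, safe="") for segment in segments)
    # Mirror the httpAccept=*/* query seen in the metadata URLs.
    query = urlencode({"httpAccept": "*/*"})
    return f"{BASE_URL}/{path}?{query}"
```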

  2. Why does the metadata return the MIME type 'image/svg+xml' although it is not listed in the documentation?

SVG is extensible. Conformant "image/svg+xml" processors must expect that content received is well-formed XML, but it cannot be guaranteed that the content is valid to a particular DTD or Schema or that the processor will recognize all of the elements and attributes in the document.
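In practice, a client can only rely on well-formedness. A minimal sketch with the standard library's `xml.etree.ElementTree`: it checks that the bytes parse as XML and that the root element is `<svg>`, but, like the processors described above, it does no DTD or schema validation and accepts unknown elements. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Root tag with and without the SVG namespace.
SVG_ROOT_TAGS = ("svg", "{http://www.w3.org/2000/svg}svg")


def is_well_formed_svg(data: bytes) -> bool:
    """Return True if `data` is well-formed XML whose root element is <svg>.

    No DTD or schema validation is done, and unknown elements are accepted.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    return root.tag in SVG_ROOT_TAGS
```

For untrusted input, the Python documentation recommends `defusedxml` over the standard-library XML parsers.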

nils-herrmann added a commit to nils-herrmann/pybliometrics that referenced this issue Sep 10, 2024
@nils-herrmann nils-herrmann changed the title ScienceDirect: Object Retrieval API (Documentation) ScienceDirect: Object Retrieval API Sep 11, 2024
@nils-herrmann nils-herrmann changed the title (Documentation) ScienceDirect: Object Retrieval API (Draft) ScienceDirect: Object Retrieval API Sep 11, 2024
@Michael-E-Rose (Contributor)

Scopus doesn't maintain its documentation well: some things are in the API but not in the documentation, and vice versa. So don't investigate the issue too much; pybliometrics can do without SVG if needed.

@nils-herrmann (Collaborator, Author)

I contacted Elsevier's Data Support Team and they clarified the issue. The answers are documented above.

@nils-herrmann (Collaborator, Author)

This draft resulted in two issues: #355 and #360
