Implemented autocomplete endpoint and added documentation #33

Merged: 8 commits, Mar 25, 2024
Changes from 5 commits
44 changes: 41 additions & 3 deletions README.md
@@ -25,7 +25,7 @@ The following features of OpenAlex are currently supported by PyAlex:
- [x] Select fields
- [x] Sample
- [x] Pagination
- [ ] [Autocomplete endpoint](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/autocomplete-entities)
- [x] Autocomplete endpoint
- [x] N-grams
- [x] Authentication

@@ -47,10 +47,10 @@ pip install pyalex

## Getting started

PyAlex offers support for all [Entity Objects](https://docs.openalex.org/api-entities/entities-overview): [Works](https://docs.openalex.org/api-entities/works), [Authors](https://docs.openalex.org/api-entities/authors), [Sources](https://docs.openalex.org/api-entities/sourcese), [Institutions](https://docs.openalex.org/api-entities/institutions), [Concepts](https://docs.openalex.org/api-entities/concepts), [Publishers](https://docs.openalex.org/api-entities/publishers), and [Funders](https://docs.openalex.org/api-entities/funders).
PyAlex offers support for all [Entity Objects](https://docs.openalex.org/api-entities/entities-overview): [Works](https://docs.openalex.org/api-entities/works), [Authors](https://docs.openalex.org/api-entities/authors), [Sources](https://docs.openalex.org/api-entities/sourcese), [Institutions](https://docs.openalex.org/api-entities/institutions), [Concepts](https://docs.openalex.org/api-entities/concepts), [Publishers](https://docs.openalex.org/api-entities/publishers), [Funders](https://docs.openalex.org/api-entities/funders), and [Autocompletes](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/autocomplete-entities).

```python
from pyalex import Works, Authors, Sources, Institutions, Concepts, Publishers, Funders
from pyalex import Works, Authors, Sources, Institutions, Concepts, Publishers, Funders, Autocompletes
```

### The polite pool
@@ -65,6 +65,18 @@ import pyalex
pyalex.config.email = "[email protected]"
```

### Max retries

By default, PyAlex raises an error at the first failure when querying the OpenAlex API. Set `max_retries` to a number higher than 0 to let PyAlex retry when an error occurs. `retry_backoff_factor` scales the delay between two retries, and `retry_http_codes` lists the HTTP error codes that should trigger a retry.

```python
from pyalex import config

config.max_retries = 0
config.retry_backoff_factor = 0.1
config.retry_http_codes = [429, 500, 503]
```
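
To make the backoff setting more concrete, here is a minimal sketch of an exponential backoff schedule. It assumes the delay doubles with each consecutive failure, which is a common scheme; the helper `backoff_delays` is hypothetical and not part of PyAlex, and the exact schedule PyAlex applies may differ.

```python
def backoff_delays(backoff_factor, max_retries):
    """Return the assumed wait (in seconds) before each retry.

    Assumption: the n-th retry (counting from 1) waits
    backoff_factor * 2 ** (n - 1) seconds.
    This helper is illustrative only and is not part of PyAlex.
    """
    return [backoff_factor * 2 ** attempt for attempt in range(max_retries)]


# With a factor of 0.1 and three retries, each wait is twice the previous one.
delays = backoff_delays(0.1, 3)
print(delays)
```

With `max_retries = 0`, as in the configuration above, the list is empty and no retry takes place.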

### Get single entity

Get a single Work, Author, Source, Institution, Concept, Publisher or Funder from OpenAlex by the
@@ -307,6 +319,32 @@ for page in pager:
```


### Autocomplete

OpenAlex reference: [Autocomplete entities](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/autocomplete-entities).

Autocomplete a string:
```python
from pyalex import Autocompletes

Autocompletes()["stockholm"]
```

Owner commented:

Suggested change:
- from pyalex import Autocompletes
- Autocompletes()["stockholm"]
+ from pyalex import autocomplete
+ autocomplete("stockholm")

My suggestion would be to rewrite it like this, as autocomplete is not a separate resource (although the OpenAlex API design might lead users to think so).

With this change, we reserve item retrieval for identifiers.

Autocomplete a string to get a specific type of entities:
```python
from pyalex import Institutions

Institutions().autocomplete("stockholm")
```

You can also use the filters to autocomplete:
```python
from pyalex import Works

Works().filter(cited_by_count=">1000", publication_year=2009).autocomplete("planetary boundaries")
```
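
For orientation, the calls above correspond to requests against the `/autocomplete/<entity>` path of the OpenAlex API, with the search string in the `q` parameter. The sketch below builds such a URL using only the standard library. `autocomplete_url` is a hypothetical helper, not a PyAlex function, and the base URL and `filter` syntax are assumptions based on the OpenAlex documentation.

```python
from urllib.parse import urlencode

OPENALEX_URL = "https://api.openalex.org"  # assumed base URL


def autocomplete_url(entity, q, filters=None):
    """Build an autocomplete URL for one entity type (hypothetical helper)."""
    params = {}
    if filters:
        # Assumption: filters are written as comma-separated key:value pairs.
        params["filter"] = ",".join(f"{k}:{v}" for k, v in filters.items())
    params["q"] = q
    # urlencode escapes spaces and reserved characters in the values.
    return f"{OPENALEX_URL}/autocomplete/{entity}?{urlencode(params)}"


print(autocomplete_url("institutions", "stockholm"))
print(
    autocomplete_url(
        "works",
        "planetary boundaries",
        {"cited_by_count": ">1000", "publication_year": 2009},
    )
)
```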


### Get N-grams

OpenAlex reference: [Get N-grams](https://docs.openalex.org/api-entities/works/get-n-grams).
4 changes: 4 additions & 0 deletions pyalex/__init__.py
@@ -7,6 +7,8 @@

from pyalex.api import Author
from pyalex.api import Authors
from pyalex.api import Autocomplete
from pyalex.api import Autocompletes
from pyalex.api import Concept
from pyalex.api import Concepts
from pyalex.api import Funder
@@ -31,6 +33,8 @@
"Work",
"Authors",
"Author",
"Autocomplete",
"Autocompletes",
"Sources",
"Source",
"Funder",
41 changes: 35 additions & 6 deletions pyalex/api.py
@@ -169,8 +169,12 @@ def __init__(self, params=None):
     def _get_multi_items(self, record_list):
         return self.filter(openalex_id="|".join(record_list)).get()

-    def _full_collection_name(self):
-        return config.openalex_url + "/" + self.__class__.__name__.lower()
+    def _full_collection_name(self, autocomplete=False):
+        if autocomplete:
+            base_url = config.openalex_url + "/autocomplete/"
+            return base_url + self.__class__.__name__.lower()
+        else:
+            return config.openalex_url + "/" + self.__class__.__name__.lower()
Owner commented:

I think we can simplify this and keep it unchanged.

Contributor Author replied:

We could replace `if autocomplete:` with `if 'q' in self.params.keys():`.


     def __getattr__(self, key):
         if key == "groupby":
@@ -197,7 +201,13 @@ def __getitem__(self, record_id):
     @property
     def url(self):
         if not self.params:
-            return self._full_collection_name()
+            return self._full_collection_name(autocomplete=False)
+
+        if 'q' in self.params.keys():
+            # if q is in params, then it means that an autocomplete query was asked
+            autocomplete = True
+        else:
+            autocomplete = False
Owner commented:

Same here

Contributor Author replied:

And delete this part


         l_params = []
         for k, v in self.params.items():
@@ -212,9 +222,10 @@ def url(self):
                 l_params.append(k + "=" + quote_plus(str(v)))

         if l_params:
-            return self._full_collection_name() + "?" + "&".join(l_params)
+            return self._full_collection_name(
+                autocomplete=autocomplete) + "?" + "&".join(l_params)
Contributor Author commented:

Remove `autocomplete=autocomplete`.


-        return self._full_collection_name()
+        return self._full_collection_name(autocomplete=autocomplete)
Contributor Author commented:

same here


     def count(self):
         _, m = self.get(return_meta=True, per_page=1)
@@ -264,7 +275,6 @@ def get(self, return_meta=False, page=None, per_page=None, cursor=None):
         self._add_params("per-page", per_page)
         self._add_params("page", page)
         self._add_params("cursor", cursor)
-
         return self._get_from_url(self.url, return_meta=return_meta)

     def paginate(self, method="cursor", page=1, per_page=None, cursor="*", n_max=10000):
@@ -321,6 +331,11 @@ def select(self, s):
         self._add_params("select", s)
         return self

+    def autocomplete(self, s, **kwargs):
+        """ autocomplete the string s, for a specific type of entity """
+        self._add_params("q", s)
+        return self.get(**kwargs)
Comment on lines +349 to +352

Owner commented:

My suggestion would be to implement it like this:

Suggested change:
-    def autocomplete(self, s, **kwargs):
-        """ autocomplete the string s, for a specific type of entity """
-        self._add_params("q", s)
-        return self.get(**kwargs)
+    def autocomplete(self, q, return_meta=False):
+        """Autocomplete query q for entity"""
+        self._add_params("q", q)
+        # manipulate URL for autocomplete
+        url_split = urlsplit(self.url)
+        new_url = urlunsplit(
+            (
+                url_split.scheme,
+                url_split.netloc,
+                f"/autocomplete{url_split.path}",
+                url_split.query,
+                url_split.fragment,
+            )
+        )
+        return self._get_from_url(
+            new_url, return_meta=return_meta, resource_class=Autocomplete
+        )

Also, it's not the nicest option, but it keeps the autocomplete functionality separated from everything else. We don't need the changes above. As I don't think this was the most intuitive implementation OpenAlex chose, we might benefit from keeping it separate from the rest of the class.

I changed `_get_from_url` to accept an extra argument.
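
The URL manipulation in this suggestion can be tried in isolation. The following sketch applies the same `urlsplit`/`urlunsplit` steps to a plain string. `to_autocomplete_url` is a hypothetical standalone function, and the example URL is an assumption rather than output captured from PyAlex.

```python
from urllib.parse import urlsplit, urlunsplit


def to_autocomplete_url(url):
    """Prefix the path of an entity URL with /autocomplete (illustrative)."""
    parts = urlsplit(url)
    return urlunsplit(
        (
            parts.scheme,
            parts.netloc,
            # e.g. /institutions becomes /autocomplete/institutions
            f"/autocomplete{parts.path}",
            parts.query,
            parts.fragment,
        )
    )


print(to_autocomplete_url("https://api.openalex.org/institutions?q=stockholm"))
```

Because only the path component is rewritten, any query parameters already present in the URL (filters, `q`) pass through unchanged.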



# The API

@@ -401,6 +416,20 @@ class Funders(BaseOpenAlex):
     resource_class = Funder


+class Autocomplete(OpenAlexEntity):
+    pass
+
+
+class Autocompletes(BaseOpenAlex):
+    """ Class to autocomplete without being based on the type of entity """
+    resource_class = Autocomplete
+
+    def __getitem__(self, key):
+        return self._get_from_url(
+            config.openalex_url + "/autocomplete" + "?q=" + key, return_meta=False
+        )
Owner commented:

Ideally, we can make this into something like this:

def autocomplete(q, return_meta=False):
    BaseOpenAlex().autocomplete(q, return_meta=return_meta)

I expect this to require some changes to the base class as well.
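
One detail worth checking in either variant is escaping: the `__getitem__` in this hunk appends `key` to the URL unchanged, so a query containing characters such as `&` or `#` may not reach the API intact. Below is a minimal sketch of the escaped form, using `quote_plus` as the `url` property in this diff does for other parameters. `generic_autocomplete_url` and the base URL are hypothetical and not part of PyAlex.

```python
from urllib.parse import quote_plus

OPENALEX_URL = "https://api.openalex.org"  # assumed base URL


def generic_autocomplete_url(q):
    """Build the entity-independent autocomplete URL with q escaped."""
    return OPENALEX_URL + "/autocomplete?q=" + quote_plus(q)


print(generic_autocomplete_url("planetary boundaries"))
print(generic_autocomplete_url("R&D"))
```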



 def Venue(*args, **kwargs):  # deprecated
     # warn about deprecation
     warnings.warn(