add support for TinyDB feature provider #1724

Merged
merged 14 commits into from
Jul 16, 2024

Conversation

tomkralidis
Member

Overview

Add support for TinyDB as a feature provider.

Additionally updates CITE test setup and docs.

Related Issue / discussion

Fixes #1723

Additional information

Dependency policy (RFC2)

  • I have ensured that this PR meets RFC2 requirements

Updates to public demo

Contributions and licensing

(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)

  • I'd like to contribute [feature X|bugfix Y|docs|something else] to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution
  • I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines

@tomkralidis tomkralidis added the enhancement and OGC API - Features labels Jul 15, 2024
@tomkralidis tomkralidis added this to the 0.18.0 milestone Jul 15, 2024
@tomkralidis tomkralidis changed the title from "add support for TInyDB feature provider" to "add support for TinyDB feature provider" Jul 15, 2024
@tomkralidis
Member Author

FYI the associated CITE instance PR can be merged if/once this is approved/merged.

Member

@justb4 justb4 left a comment

Two main comments:

  • refactor Catalogue-specific statements into subclass TinyDBCatalogueProvider
  • add also unit tests for TinyDBCatalogueProvider

Member

There are quite a few if self._catalogue statements in the base class TinyDBProvider, while at the same time there is a subclass TinyDBCatalogueProvider. My suggestion is to refactor those conditionals into the subclass.

Member Author

That was my original implementation, but I decided to add this conditional (3x) in the code so as not to have to loop after search results which would affect performance.

Member

But you may use overridden methods and make _excluded a class var that is extended in the subclass. See rough example attached:

tinydb_.py.txt

Member

Maybe do not use return values, but something like:

    def _add_text_search_field(self, fields):
        fields['q'] = {'type': 'string'}
        # return fields
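A minimal sketch of the refactoring suggested in this thread, assuming hypothetical stand-in classes (TinyDBProviderSketch and TinyDBCatalogueProviderSketch are not pygeoapi's actual classes, and placing the q field in the catalogue subclass is an assumption). The base class exposes a no-op hook, and the subclass overrides it and extends a class-level _excludes, so no if self._catalogue checks are needed:

```python
class TinyDBProviderSketch:
    """Hypothetical stand-in for the base provider (not pygeoapi's code)."""

    # Class-level default: the base provider hides nothing.
    _excludes = []

    def __init__(self, sample_record):
        self.sample_record = sample_record

    def _add_text_search_field(self, fields):
        """Hook: the base provider adds no free-text search field."""

    def get_fields(self):
        fields = {
            name: {'type': 'string'}
            for name in self.sample_record
            if name not in self._excludes
        }
        # The hook mutates the dict in place, so no return value is needed.
        self._add_text_search_field(fields)
        return fields


class TinyDBCatalogueProviderSketch(TinyDBProviderSketch):
    # Extends the parent's list instead of replacing it.
    _excludes = TinyDBProviderSketch._excludes + ['_metadata-anytext']

    def _add_text_search_field(self, fields):
        fields['q'] = {'type': 'string'}


record = {'title': 'x', '_metadata-anytext': 'x y z'}
base_fields = TinyDBProviderSketch(record).get_fields()
catalogue_fields = TinyDBCatalogueProviderSketch(record).get_fields()
print(sorted(base_fields))       # ['_metadata-anytext', 'title']
print(sorted(catalogue_fields))  # ['q', 'title']
```

The filtering still happens inside the single pass that builds the fields, so this layout does not add a second loop over results.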

@tomkralidis
Member Author

Two main comments:

* refactor Catalogue-specific statements into subclass `TinyDBCatalogueProvider`

* add also unit tests for `TinyDBCatalogueProvider`

Note that these already exist in tests/test_tinydb_catalogue_provider.py.

@tomkralidis
Member Author

@justb4 PR updated, for review.

@tomkralidis tomkralidis requested a review from justb4 July 16, 2024 00:54
Member

@justb4 justb4 left a comment

See inline comments: I think all will be resolved by simply removing self.fields = self.get_fields() from the constructor.

def __init__(self, provider_def):
    super().__init__(provider_def)

    self._excludes = ['_metadata-anytext']
Member

Major: IMO this will not have the desired effect, because self.fields = self.get_fields() is already called in the base class constructor. Either self.fields = self.get_fields() should be called again, or (see my suggestion) _excludes should become a class var, as it is in effect a property of the classes.

Member

To consider: self.fields is never referenced after assignment, so maybe self.fields = self.get_fields() can be removed, and then the above self._excludes = ... statement will be ok. get_fields() is a method of the Provider base class (base.py) and, I guess, the intended way to access fields.

Member Author

If we remove self.fields = self.get_fields() from the constructor then this will affect other parts of pygeoapi.api.itemtypes. Suggest we leave this as is in this PR and address in a subsequent PR later on.

Member

Ok, then the only remaining item is IMO to resolve the proper _excludes.

Member

@justb4 justb4 Jul 16, 2024

Do you agree that the current order will not work? Hence my suggestion for a class var (see my example) or another solution? i.e. get_fields() will not include self._excludes = ['_metadata-anytext'] as it is set after the parent class constructor. Reversing will also not work, as it will be overridden as self._excludes = [] in base class. (That is why I usually do not recommend too much initialization in a constructor, but explicit init/lifecycle methods.)
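A reduced sketch of the ordering concern raised here, assuming hypothetical stand-in classes (none of these are pygeoapi's actual providers). The first pair sets _excludes as an instance attribute after the parent constructor has already computed self.fields; the second pair declares it as a class variable, which is visible while the parent constructor runs:

```python
class BaseSketch:
    """Hypothetical stand-in; mirrors only the constructor ordering."""

    def __init__(self, names):
        self._names = names
        self._excludes = []              # instance attribute, reset on init
        self.fields = self.get_fields()  # computed before subclass code runs

    def get_fields(self):
        return [n for n in self._names if n not in self._excludes]


class LateExcludes(BaseSketch):
    def __init__(self, names):
        super().__init__(names)
        self._excludes = ['_metadata-anytext']  # too late for self.fields


class ClassVarBase:
    _excludes = []                       # class variable, not reset in init

    def __init__(self, names):
        self._names = names
        self.fields = self.get_fields()

    def get_fields(self):
        return [n for n in self._names if n not in self._excludes]


class ClassVarExcludes(ClassVarBase):
    _excludes = ['_metadata-anytext']    # seen by the parent constructor


names = ['title', '_metadata-anytext']
late = LateExcludes(names)
fixed = ClassVarExcludes(names)
print(late.fields)        # ['title', '_metadata-anytext'] (stale)
print(late.get_fields())  # ['title'] (a fresh call honours the assignment)
print(fixed.fields)       # ['title']
```

In this sketch a fresh get_fields() call returns the filtered result even in the first variant, so only code that reads the attribute computed in the constructor sees the stale value.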

Member Author

The default TinyDB provider as a result of this PR has an empty _excludes member. The TinyDBCatalogue provider sets _excludes to remove a given property that is a pygeoapi "extra".

Member

@webb-ben webb-ben Jul 16, 2024

fwiw other providers make the call to self.get_fields() at the end of the init method which will either fetch self.fields if set, or set self.fields.

esri.py

sqlite.py

Happy to establish a normative behavior between the two implementations. self.fields = self.get_fields() is present in

postgresql.py

elasticsearch_.py
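A minimal sketch of the "fetch self.fields if set, or set self.fields" behaviour described in this comment, assuming a hypothetical provider (MemoisingProviderSketch and _describe_backend are illustrative names, not the actual esri.py or sqlite.py code):

```python
class MemoisingProviderSketch:
    """Hypothetical provider; not taken from pygeoapi."""

    def __init__(self, columns):
        self._columns = columns
        self.backend_calls = 0
        self.fields = {}
        # Last statement of the constructor, once all other state is set.
        self.get_fields()

    def _describe_backend(self):
        # Stand-in for an expensive lookup (database, remote service, ...).
        self.backend_calls += 1
        return {name: {'type': 'string'} for name in self._columns}

    def get_fields(self):
        if not self.fields:          # nothing cached yet: set it
            self.fields = self._describe_backend()
        return self.fields           # otherwise: fetch what is already set


p = MemoisingProviderSketch(['title', 'created'])
p.get_fields()
p.get_fields()
print(p.backend_calls)  # 1
```

With this convention, reading the attribute and calling the method give the same result, which is the consistency the thread is asking for.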

Member Author

Good points. Suggest we address this as part of another PR

@tomkralidis tomkralidis requested a review from justb4 July 16, 2024 10:12

@tomkralidis
Member Author

@justb4 this order works and accomplishes the need to remove the specified fields from catalogue responses.

@justb4
Member

justb4 commented Jul 16, 2024

@justb4 this order works and accomplishes the need to remove the specified fields from catalogue responses.

Hmm, I still don't think so. Here is a similar example I used to investigate, showing both the instance and class var:

class Base:
    # class variable
    class_var = ['Base']
    def __init__(self):
        self.exclude = 'Exclude by Base'
        print('Parent init')
        self.show_info()
    def show_info(self):
        # Accessing class variable
        print("Base: self.exclude=", self.exclude)
        print("Base: Base.class_var=", Base.class_var)

class Derived(Base):

    def __init__(self):
        print('Derived init')
        Base.class_var.append('Derived')
        super().__init__()
        self.exclude = 'Exclude by Derived'

    def show_info(self):
        # Accessing class variable
        super().show_info()
        print("Derived: self.exclude=", self.exclude)
        print("Derived: Base.class_var=", Base.class_var)

derived = Derived()

self.exclude = 'Exclude by Derived' is called after the Base constructor. Guess the outcome...

@tomkralidis
Member Author

@justb4 feel free to test the branch with/without the exclude in the TinyDBCatalogue class?

@justb4
Member

justb4 commented Jul 16, 2024

Ok, all tests pass. But that makes sense, as the tests call p.get_fields(). If I change the test to

def test_query(config):
    p = TinyDBCatalogueProvider(config)

    fields = p.fields

so that fields is what results from the constructor, the field is not excluded and the test fails:

(pygeoapi-3.10.12) just@savu:~/project/pygeoapi/pygeoapi.git$ pytest tests/test_tinydb_catalogue_provider.py 
========================================================= test session starts =========================================================
platform darwin -- Python 3.10.12, pytest-8.2.2, pluggy-1.5.0
rootdir: /Users/just/project/pygeoapi/pygeoapi.git
configfile: pytest.ini
plugins: Faker-26.0.0, cov-5.0.0, env-1.1.3, anyio-4.4.0
collected 5 items                                                                                                                     

tests/test_tinydb_catalogue_provider.py F....                                                                                   [100%]

============================================================== FAILURES ===============================================================
_____________________________________________________________ test_query ______________________________________________________________

config = {'data': PosixPath('/private/var/folders/89/ggkklkwd5d7g9sw3pfbqwcvm0000gn/T/pytest-of-just/pytest-1/test_query0/sample-records.tinydb'), 'id_field': 'externalId', 'name': 'TinyDBCatalogue', 'time_field': 'created', ...}

    def test_query(config):
        p = TinyDBCatalogueProvider(config)
    
        fields = p.fields
>       assert len(fields) == 9
E       AssertionError: assert 10 == 9
E        +  where 10 = len({'_metadata-anytext': {'type': 'string'}, 'created': {'format': 'date-time', 'type': 'string'}, 'description': {'type': 'string'}, 'externalIds': {'type': 'string'}, ...})

tests/test_tinydb_catalogue_provider.py:97: AssertionError
======================================================= short test summary info =======================================================
FAILED tests/test_tinydb_catalogue_provider.py::test_query - AssertionError: assert 10 == 9
===================================================== 1 failed, 4 passed in 0.10s =====================================================

'_metadata-anytext' is not excluded... as can be expected.

@justb4
Member

justb4 commented Jul 16, 2024

By using the class var, and an extra test, all is now well. Shall I commit/push my version on this branch?

@tomkralidis
Member Author

What if we normalized all access via get_fields(), and removed the .fields property? Again, this would be part of another PR.

@justb4
Member

justb4 commented Jul 16, 2024

Function access is always to be preferred over exposing internal instance vars.

But the (performance) impact is harder to assess, as for some Providers it may mean accessing a remote source like a DB or even WFS. But then again, get_fields() may smart-cache. It could even be implemented once in the base class, e.g. with @property, like in BaseProvider:

@property
def fields(self):
    if not self._fields:
        self._fields = self.get_fields()
    return self._fields
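A runnable sketch of the cache-on-first-access idea, assuming a hypothetical base class (BaseProviderSketch and CatalogueSketch are illustrative, and pygeoapi's BaseProvider may differ). Because nothing is computed in the constructor, an _excludes value assigned after super().__init__() is still honoured, and the lookup counter shows the backend is queried only once:

```python
class BaseProviderSketch:
    """Hypothetical base class; pygeoapi's BaseProvider may differ."""

    _excludes = []

    def __init__(self, names):
        self._names = names
        self._fields = None   # nothing is computed in the constructor
        self.lookups = 0

    def get_fields(self):
        self.lookups += 1     # stand-in for a costly backend query
        return {
            n: {'type': 'string'}
            for n in self._names
            if n not in self._excludes
        }

    @property
    def fields(self):
        # Compute on first access only, then reuse the cached result.
        if self._fields is None:
            self._fields = self.get_fields()
        return self._fields


class CatalogueSketch(BaseProviderSketch):
    def __init__(self, names):
        super().__init__(names)
        # Assigned after the parent constructor, yet still honoured,
        # because fields are only computed when first requested.
        self._excludes = ['_metadata-anytext']


p = CatalogueSketch(['title', '_metadata-anytext'])
first = p.fields
second = p.fields
print(list(first), p.lookups)  # ['title'] 1
```

The sketch tests the cache with is None rather than truthiness, so a provider that legitimately has no fields is not queried again on every access.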

@tomkralidis
Member Author

tomkralidis commented Jul 16, 2024

\@Property
def fields(self):
return self.get_fields()

I guess you mean:

@property
def fields(self):
    return self.get_fields()

Suggest we address as part of another PR (given the scope of this PR is to expand TinyDB support). While we've identified some inconsistency/issue with .fields/get_fields(), this is better addressed in a separate PR given it will be a substantial change across numerous feature/record providers and their calling code.

@justb4 works for you?

@justb4
Member

justb4 commented Jul 16, 2024

Ok, yes, but in my code I meant to cache the fields on first access (or leave that to each provider). Still in the current state TinyDBCatalogueProvider self.fields is incorrect. But don't know how much impact that has.

@tomkralidis
Member Author

Still in the current state TinyDBCatalogueProvider self.fields is incorrect. But don't know how much impact that has.

I think there is no impact in the context of the codepath and how the plugin is implemented.

Member

@justb4 justb4 left a comment

Ok, we'll fix the '_excludes' issue later, for the sake of progress.

@tomkralidis tomkralidis merged commit d9adbbd into master Jul 16, 2024
9 checks passed
@tomkralidis tomkralidis deleted the tinydb-features branch July 16, 2024 15:41
Contributor

@doublebyte1 doublebyte1 left a comment

Nice job!

Development

Successfully merging this pull request may close these issues.

add TinyDB feature provider
4 participants