Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix _get_dialects - properties #1859

Merged
merged 3 commits into from
Jan 23, 2025
Merged

Conversation

markus-96
Copy link
Contributor

Description

properties in _db_{dialect} classes were not taken into account.

Motivation and Context

see #1858

How Has This Been Tested?

make build test_sqlite test_asyncpg

also, I am not able to test against an actual oracle database, and as far as I can tell is that it is currently not tested at all in CICD. So please, someone check it for me? :)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added the changelog accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Copy link

codspeed-hq bot commented Jan 22, 2025

CodSpeed Performance Report

Merging #1859 will not alter performance

Comparing markus-96:get_dialects (6ff7cee) with develop (948ccdb)

Summary

✅ 8 untouched benchmarks

@coveralls
Copy link

coveralls commented Jan 22, 2025

Pull Request Test Coverage Report for Build 12924178121

Details

  • 8 of 8 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.002%) to 89.423%

Totals Coverage Status
Change from base Build 12906318884: 0.002%
Covered Lines: 6494
Relevant Lines: 7079

💛 - Coveralls

@markus-96
Copy link
Contributor Author

the following get_for_dialect is like 100 times faster:

    def get_for_dialect(self, dialect: str, key: str) -> Any:
        """
        Returns a field by dialect override.

        :param dialect: The requested SQL Dialect.
        :param key: The attribute/method name.
        """
        cls = getattr(self, f"_db_{dialect}", None)
        value = getattr(cls, key, None)
        if value is None:
            v = getattr(self, key, None)
            if isinstance(v, property):
                return getattr(self, key)
            return v
        elif isinstance(value, property):
            return getattr(cls(self), key)
        return value

with the optimized version of get_for_dialect, get_db_field_types can also be optimized like 40%:

    def get_db_field_types(self) -> Optional[dict[str, str]]:
        """
        Returns the DB types for this field.

        :return: A dictionary that is keyed by dialect.
            A blank dialect `""` means it is the default DB field type.
        """
        if not self.has_db_field:  # pragma: nocoverage
            return None
        default = getattr(self, "SQL_TYPE")
        return {
            "": getattr(self, "SQL_TYPE"),
            **{
                dialect: sql_type
                for dialect, sql_type in (
                    (key[4:], self.get_for_dialect(key[4:], "SQL_TYPE"))
                    for key in dir(self)
                    if key.startswith("_db_")
                )
                if sql_type is not None and sql_type != default
            },
        }

should I include it in this PR? It is not much, but maybe for large models the startup time could be increased by like one second

**Full example**

{func}_alternative are the optimized versions...

import time

from tortoise.fields import Field


class MyField(Field):
    @property
    def MY_PROPERTY(self):
        return f"hi from {self.__class__.__name__}!"

    OTHER_PROPERTY = "something else"

    class _db_sqlite:
        def __init__(self, field: "Field"):
            self.field = field

        @property
        def MY_PROPERTY(self):
            return f"hi from {self.__class__.__name__} of {self.field.__class__.__name__}!"

    class _db_oracle:
        MY_PROPERTY = "who uses oracle?"


if __name__ == "__main__":
    print(MyField().get_for_dialect("sqlite", "MY_PROPERTY"))
    print(MyField().get_for_dialect_alternative("sqlite", "MY_PROPERTY"))
    print(MyField().get_for_dialect("postgres", "MY_PROPERTY"))
    print(MyField().get_for_dialect_alternative("postgres", "MY_PROPERTY"))
    print(MyField().get_for_dialect("oracle", "MY_PROPERTY"))
    print(MyField().get_for_dialect_alternative("oracle", "MY_PROPERTY"))
    print(MyField().get_for_dialect("sqlite", "OTHER_PROPERTY"))
    print(MyField().get_for_dialect_alternative("sqlite", "OTHER_PROPERTY"))

    start_time = time.time()
    for i in range(1000):
        MyField().get_for_dialect("sqlite", "MY_PROPERTY")
        MyField().get_for_dialect("postgres", "MY_PROPERTY")
        MyField().get_for_dialect("oracle", "MY_PROPERTY")
        MyField().get_for_dialect("sqlite", "OTHER_PROPERTY")
    end_time = time.time()
    duration = end_time - start_time
    print(f"Time before: {duration}")

    start_time = time.time()
    for i in range(1000):
        MyField().get_for_dialect_alternative("sqlite", "MY_PROPERTY")
        MyField().get_for_dialect_alternative("postgres", "MY_PROPERTY")
        MyField().get_for_dialect_alternative("oracle", "MY_PROPERTY")
        MyField().get_for_dialect_alternative("sqlite", "OTHER_PROPERTY")
    end_time = time.time()
    duration = end_time - start_time
    print(f"Time after: {duration}")

    start_time = time.time()
    for i in range(1000):
        MyField().get_db_field_types()
    end_time = time.time()
    duration = end_time - start_time
    print(f"Time before: {duration}")

    start_time = time.time()
    for i in range(1000):
        MyField().get_db_field_types_alternative()
    end_time = time.time()
    duration = end_time - start_time
    print(f"Time after: {duration}")

output:

hi from _db_sqlite of MyField!
hi from _db_sqlite of MyField!
hi from MyField!
hi from MyField!
who uses oracle?
who uses oracle?
something else
something else
Time before: 0.10262465476989746
Time after: 0.003044605255126953
Time before: 0.025957345962524414
Time after: 0.015998125076293945

@waketzheng
Copy link
Contributor

the following get_for_dialect is like 100 times faster:

    def get_for_dialect(self, dialect: str, key: str) -> Any:
        """
        Returns a field by dialect override.

        :param dialect: The requested SQL Dialect.
        :param key: The attribute/method name.
        """
        cls = getattr(self, f"_db_{dialect}", None)
        value = getattr(cls, key, None)
        if value is None:
            v = getattr(self, key, None)
            if isinstance(v, property):
                return getattr(self, key)
            return v
        elif isinstance(value, property):
            return getattr(cls(self), key)
        return value

with the optimized version of get_for_dialect, get_db_field_types can also be optimized like 40%:

    def get_db_field_types(self) -> Optional[dict[str, str]]:
        """
        Returns the DB types for this field.

        :return: A dictionary that is keyed by dialect.
            A blank dialect `""` means it is the default DB field type.
        """
        if not self.has_db_field:  # pragma: nocoverage
            return None
        default = getattr(self, "SQL_TYPE")
        return {
            "": getattr(self, "SQL_TYPE"),
            **{
                dialect: sql_type
                for dialect, sql_type in (
                    (key[4:], self.get_for_dialect(key[4:], "SQL_TYPE"))
                    for key in dir(self)
                    if key.startswith("_db_")
                )
                if sql_type is not None and sql_type != default
            },
        }

should I include it in this PR? It is not much, but maybe for large models the startup time could be increased by like one second

Full example
{func}_alternative are the optimized versions...

import time

from tortoise.fields import Field


class MyField(Field):
    @property
    def MY_PROPERTY(self):
        return f"hi from {self.__class__.__name__}!"

    OTHER_PROPERTY = "something else"

    class _db_sqlite:
        def __init__(self, field: "Field"):
            self.field = field

        @property
        def MY_PROPERTY(self):
            return f"hi from {self.__class__.__name__} of {self.field.__class__.__name__}!"

    class _db_oracle:
        MY_PROPERTY = "who uses oracle?"


if __name__ == "__main__":
    print(MyField().get_for_dialect("sqlite", "MY_PROPERTY"))
    print(MyField().get_for_dialect_alternative("sqlite", "MY_PROPERTY"))
    print(MyField().get_for_dialect("postgres", "MY_PROPERTY"))
    print(MyField().get_for_dialect_alternative("postgres", "MY_PROPERTY"))
    print(MyField().get_for_dialect("oracle", "MY_PROPERTY"))
    print(MyField().get_for_dialect_alternative("oracle", "MY_PROPERTY"))
    print(MyField().get_for_dialect("sqlite", "OTHER_PROPERTY"))
    print(MyField().get_for_dialect_alternative("sqlite", "OTHER_PROPERTY"))

    start_time = time.time()
    for i in range(1000):
        MyField().get_for_dialect("sqlite", "MY_PROPERTY")
        MyField().get_for_dialect("postgres", "MY_PROPERTY")
        MyField().get_for_dialect("oracle", "MY_PROPERTY")
        MyField().get_for_dialect("sqlite", "OTHER_PROPERTY")
    end_time = time.time()
    duration = end_time - start_time
    print(f"Time before: {duration}")

    start_time = time.time()
    for i in range(1000):
        MyField().get_for_dialect_alternative("sqlite", "MY_PROPERTY")
        MyField().get_for_dialect_alternative("postgres", "MY_PROPERTY")
        MyField().get_for_dialect_alternative("oracle", "MY_PROPERTY")
        MyField().get_for_dialect_alternative("sqlite", "OTHER_PROPERTY")
    end_time = time.time()
    duration = end_time - start_time
    print(f"Time after: {duration}")

    start_time = time.time()
    for i in range(1000):
        MyField().get_db_field_types()
    end_time = time.time()
    duration = end_time - start_time
    print(f"Time before: {duration}")

    start_time = time.time()
    for i in range(1000):
        MyField().get_db_field_types_alternative()
    end_time = time.time()
    duration = end_time - start_time
    print(f"Time after: {duration}")

output:

hi from _db_sqlite of MyField!
hi from _db_sqlite of MyField!
hi from MyField!
hi from MyField!
who uses oracle?
who uses oracle?
something else
something else
Time before: 0.10262465476989746
Time after: 0.003044605255126953
Time before: 0.025957345962524414
Time after: 0.015998125076293945

Would be better to start a new PR.

@@ -15,6 +15,7 @@ Fixed
^^^^^
- Rename pypika to pypika_tortoise for fixing package name conflict (#1829)
- Concurrent connection pool initialization (#1825)
- `_get_dialects`: support properties (#1859)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more explicit to use the issue title: _get_dialects not working for properties

@waketzheng
Copy link
Contributor

@markus-96 I updated the _get_dialects function to make it more readable.

@waketzheng waketzheng mentioned this pull request Jan 23, 2025
@henadzit
Copy link
Contributor

@markus-96

the following get_for_dialect is like 100 times faster:

A PR would be appreciated. We use Codspeed for benchmarking and it would be great if your changes are measurable with benchmarks. You can find the benchmarks here. But your changes won't necessary be reflected in the existing benchmarks, so you might need a separate PR to add a benchmark first. You can run the benchmarks locally too but the results might be a bit inconsistent: pytest tests/benchmarks --codspeed.

@henadzit henadzit merged commit 415d1ae into tortoise:develop Jan 23, 2025
9 checks passed
@markus-96
Copy link
Contributor Author

Thanks for reviewing @henadzit and @waketzheng !

@markus-96 I updated the _get_dialects function to make it more readable.

there might be the usecase where you want to declare something as instance attribute in the init of a dialect class, this is not covered in this case. like this:

class MyField(Field):
    OTHER_PROPERTY = "something"

    class _db_sqlite:
        def __init__(self, field: "Field"):
            self.field = field
            self.OTHER_PROPERTY = "something else"

@waketzheng
Copy link
Contributor

@markus-96 It is recommended to use like this:

class MyField(Field):
    OTHER_PROPERTY = "something"

    class _db_sqlite:
        OTHER_PROPERTY = "something else"

        def __init__(self, field: "Field"):
            self.field = field

@markus-96
Copy link
Contributor Author

I know, I just wanted to point this out in case someone stumbles across this in a few years or so ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants