
Use tags = NULL in middle tables if object doesn't have any tags #2099

Merged
2 commits merged on Oct 31, 2023


joto (Collaborator) commented on Oct 28, 2023

This doesn't make much of a difference for the ways and rels tables, but if we store all nodes in the database, it makes a huge difference, because most nodes don't have any tags. For a current planet, disk usage for the nodes table drops from 476 GB to 409 GB, saving 67 GB, or about 14%.
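The storage rule itself is simple. As a minimal sketch, assuming a hypothetical helper `tags_to_db_value` (osm2pgsql is written in C++ and its actual code is not reproduced here), an untagged object gets NULL, represented by Python's `None`, instead of the JSON text `{}`:

```python
import json
from typing import Mapping, Optional


def tags_to_db_value(tags: Mapping[str, str]) -> Optional[str]:
    """Hypothetical encoder for the jsonb tags column.

    Returns None (written as SQL NULL) for an object without tags,
    otherwise the tags as a JSON object string.
    """
    if not tags:
        return None  # untagged object: store NULL instead of '{}'
    # sort_keys only makes the output deterministic for this example
    return json.dumps(dict(tags), ensure_ascii=False, sort_keys=True)


# Illustrative nodes: most nodes carry no tags at all.
nodes = [
    (1, {}),
    (2, {"amenity": "cafe", "name": "Café Zentral"}),
    (3, {}),
]
for node_id, tags in nodes:
    print(node_id, tags_to_db_value(tags))
```

Every untagged row then costs only a NULL bit instead of a small jsonb value, which is where the savings on the nodes table come from.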

It also makes these tables simpler to use. If you want to run any queries on tags, you need an index on the tags column of the nodes/ways/rels tables, like this:

CREATE INDEX ON planet_osm_ways USING gin (tags);

But that is wasteful, because the index also covers all the objects with empty tags. We probably want to create it as a partial index instead:

CREATE INDEX ON planet_osm_ways USING gin (tags) WHERE tags != '{}'::jsonb;

But now every query on those tables has to include that extra condition, otherwise the query planner will not use the index:

SELECT * FROM planet_osm_ways WHERE tags ? 'highway' AND tags != '{}'::jsonb;

If we use NULLs, the index can be created as:

CREATE INDEX ON planet_osm_ways USING gin (tags) WHERE tags IS NOT NULL;

And now the query becomes simpler, because the query planner automatically takes the IS NOT NULL condition into account:

SELECT * FROM planet_osm_ways WHERE tags ? 'highway';
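Why the planner can do this can be modelled in a few lines. This is a toy Python model, not PostgreSQL code, and the names `has_key` and `where` are hypothetical: the jsonb `?` operator is strict, so a NULL input yields a NULL result, and WHERE keeps only rows whose condition is true. Every row matching `tags ? 'highway'` therefore has non-NULL tags, and PostgreSQL has a generic rule for proving `x IS NOT NULL` from a strict operator applied to `x`, which is what lets it pick the partial index.

```python
from typing import Mapping, Optional


def has_key(tags: Optional[Mapping[str, str]], key: str) -> Optional[bool]:
    """Toy model of the strict jsonb `?` operator: NULL in, NULL out."""
    if tags is None:
        return None  # strict operator: a NULL argument gives a NULL result
    return key in tags


def where(rows, predicate):
    """Toy model of SQL WHERE: keep a row only if the predicate is TRUE.

    Rows where the predicate is FALSE or NULL (None) are dropped.
    """
    return [row for row in rows if predicate(row) is True]


rows = [
    {"id": 1, "tags": None},                       # untagged object
    {"id": 2, "tags": {"highway": "residential"}},
    {"id": 3, "tags": {"building": "yes"}},
]

matches = where(rows, lambda r: has_key(r["tags"], "highway"))

# Every match also satisfies the partial index predicate "tags IS NOT NULL".
assert all(r["tags"] is not None for r in matches)
print([r["id"] for r in matches])
```

Logically, `tags ? 'highway'` also excludes rows whose tags are `'{}'`, but the planner knows nothing about jsonb semantics, so it cannot prove `tags != '{}'::jsonb` from it. That is why the earlier variant needs the condition spelled out in every query.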

Note that this is an incompatible change to the new-format middle tables, but they are still marked as experimental, so we can make it.

This PR also contains a second commit that future-proofs the members list of the rels middle table, in case we want to make a similar change for that column later.

joto (Collaborator, Author) commented on Oct 29, 2023

The failing test is not related to this PR. I re-ran an older run on master and it now fails too. It looks like a bug in the fmt library or the standard library to me, and it only appears when compiling with clang in C++20 mode. We run this test so that if and when we eventually switch to C++20, we can be sure it will work, but that is way off, so I think we can ignore this for now.

joto added 2 commits October 30, 2023 13:57

This makes osm2pgsql a bit more future-proof by allowing the list of
members (which is encoded as JSON in the new middle format) to be NULL,
meaning an empty member list. We currently write empty member lists as
an empty JSON list rather than as NULL, but if we change this in the
future, older versions of osm2pgsql will still be able to read the data
correctly.
joto force-pushed the middle-tags-allow-null branch from 4f34ee9 to 5b25afe on October 30, 2023 12:57
lonvia merged commit d80bbcb into osm2pgsql-dev:master on Oct 31, 2023
28 checks passed
joto deleted the middle-tags-allow-null branch on November 2, 2023 14:47

pnorman (Collaborator) commented on Nov 3, 2023

Note: The documentation needs updating to reflect this, as it currently states that these columns are NOT NULL.


3 participants