Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ Remove usage of db.getCommittedOpVersion() #657

Closed
wants to merge 1 commit into from

Conversation

alecgibson
Copy link
Collaborator

The DB.getCommitedOpVersion() function is only used in one place: in a corner case that differentiates between:

  • a create op that conflicts with another create op
  • a create op that has been re-submitted because the connection was closed before the ack was received

The first of these cases should result in an error, and the second is non-fatal and the error can be swallowed.

At the moment, this differentiation is made using the special DB.getCommittedOpVersion() function, which - given an op with a src and seq combination - will return null if the op hasn't been committed, or its version number if it has.

If the op has a committed version number, we know that the submit is a duplicate, and we can safely ignore it.

This approach is problematic, because:

This change proposes that we actually don't need this function at all, and implements an alternative approach to differentiate between the above cases.

Instead of having to fetch all ops, we just fetch a single op, by version number, which is already indexed anyway. We then check to see if this retrieved op is our creation op.

You can't do this in the general case, because an op's version number may be changed between submit and commit because of conflicts with remote ops, which cause it to be transformed up, and its version number incremented.

However, the create op is a special case, which cannot be transformed against any other op, which means that the version number it's submitted with must be the version number it's committed with.

We can therefore use this special property of the create op to safely retrieve itself from the data store by version if it has been committed.

The `DB.getCommitedOpVersion()` function is only used [in one place][1]:
in a corner case that differentiates between:

 - a create op that conflicts with another create op
 - a create op that has been re-submitted because the connection was
   closed before the ack was received

The first of these cases should result in an error, and the second is
non-fatal and the error can be swallowed.

At the moment, this differentiation is made using the special
`DB.getCommittedOpVersion()` function, which - given an op with a
`src` and `seq` combination - will return `null` if the op hasn't been
committed, or its version number if it has.

If the op has a committed version number, we know that the submit is a
duplicate, and we can safely ignore it.

This approach is problematic, because:

 - it [needs a whole op index][2] just to handle this niche corner case
 - the default behaviour [fetches **all** ops][3] unless the driver
   overrides this behaviour

This change proposes that we actually don't need this function
**at all**, and implements an alternative approach to differentiate
between the above cases.

Instead of having to fetch all ops, we just fetch a *single* op, by
version number, which [is already indexed][4] anyway. We then check to
see if this retrieved op is our creation op.

You can't do this in the general case, because an op's version number
may be changed between submit and commit because of conflicts with
remote ops, which cause it to be transformed up, and its version number
incremented.

**However**, the create op is a **special case**, which
[**cannot** be transformed against **any** other op][5], which means
that the version number it's submitted with **must** be the version
number it's committed with.

We can therefore use this special property of the create op to safely
retrieve itself from the data store by version if it has been committed.

[1]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/submit-request.js#L112
[2]: share/sharedb-mongo#94
[3]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/db/index.js#L69
[4]: https://github.com/share/sharedb-mongo/blob/7e88b0a23e968e3672ee151bf6abaf3bfdf62484/index.js#L401
[5]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/test/ot.js#L152-L162
@alecgibson alecgibson requested review from nateps and ericyhwang May 17, 2024 14:52
@alecgibson
Copy link
Collaborator Author

@nateps I know you're busy, but could you please sense-check this change and the assumptions that underpin it? It would be good to know if I'm missing something here?

@coveralls
Copy link

Coverage Status

coverage: 97.24% (-0.1%) from 97.353%
when pulling d8ba01c on no-commited-op
into 7b20313 on master.

@alecgibson
Copy link
Collaborator Author

NB: this actually increases coverage in submit-request.js; the overall drop is because we're now not covering getCommittedOpVersion(), which I'd like to deprecate/remove in a future change anyway.

alecgibson added a commit to share/sharedb-mongo that referenced this pull request May 20, 2024
- Fixes #94
- Follows on from share/sharedb#657

This change is to work alongside upstream work to remove ShareDB's
usage of `getCommittedOpVersion()`.

In `sharedb-mongo` this function requires an entire extra op index,
just to handle a corner case where two `create` ops are submitted at the
same time, which should happen relatively infrequently.

This change allows consumers to opt out of individual indexes. For
example, if we're not using `getCommittedOpVersion()`, consumers can
opt out of the automatic `src`/`seq`/`v` index creation with:

```js
new ShareDbMongo(
  mongoUrl,
  {
    disableIndexCreation: {
      src_seq_v: true,
    },
  },
);
```

Note that if this index already exists, consumers will need to manually
remove it.

Previous behaviour will still work, so setting:

```js
new ShareDbMongo(
  mongoUrl,
  {
    disableIndexCreation: true,
  },
);
```

disables **all** index creation.
@alecgibson
Copy link
Collaborator Author

Okay as discussed, I'm an idiot, and the op.v in this if check is actually null (should add a test for conflicting creates with non-zero version number after a del).

Will instead try a new strategy: add src/seq/v of the op that created a snapshot to snapshot.m, so we can quickly check ops against the snapshot if the op is the op that created the snapshot or not. We then fall back to existing getCommittedOpVersion() behaviour, but we should rarely (if ever) hit that condition, so consumers could opt out of the expensive extra src+seq+v index.

@alecgibson alecgibson closed this May 21, 2024
alecgibson added a commit that referenced this pull request May 22, 2024
We have [logic][1] that tries to differentiate between:

 - a create op that is incorrectly trying to create a `Doc` that already
   exists
 - a create op that has already been committed, and has been resubmitted
   (possibly because the connection closed before the ack was received)

The latter case is [not currently covered by tests][2].

This change adds a test to cover this branch, and a second similar test
which explains why #657 was not a
viable change (and correctly fails for that case).

[1]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/submit-request.js#L105-L121
[2]: https://coveralls.io/builds/67552613/source?filename=lib%2Fsubmit-request.js#L117
alecgibson added a commit that referenced this pull request May 22, 2024
Supersedes #657

The `DB.getCommitedOpVersion()` function is only used [in one place][1]:
in a corner case that differentiates between two "blind", versionless
(created without first fetching) create op conflict cases:

 - a versionless create op that conflicts with another create op
 - a versionless create op that has been re-submitted because the
   connection was closed before the ack was received

The first of these cases should result in an error, and the second is
non-fatal and the error can be swallowed.

At the moment, this differentiation is made using the special
`DB.getCommittedOpVersion()` function, which - given an op with a
`src` and `seq` combination - will return `null` if the op hasn't been
committed, or its version number if it has.

If the op has a committed version number, we know that the submit is a
duplicate, and we can safely ignore it.

This approach is problematic, because:

 - it [needs a whole op index][2] just to handle this niche corner case
 - the default behaviour [fetches **all** ops][3] unless the driver
   overrides this behaviour

This change proposes that we actually don't need this function in most
cases, and implements an alternative approach to differentiate between
the above cases.

When creating an snapshot, we store the `src`, `seq` and `v` of the op
that was used to create it.

If a conflicting `create` is received, we can then compare directly with
this metadata, without needing to fetch anything extra from the
database.

This should work in the majority of cases, but won't work if the
metadata is missing, which could happen if:

 - the snapshot is "old": it was created before this change
 - the driver doesn't support metadata (eg [Postgres][4])

In the case where metadata is unavailable, we fall back to the existing
method, using `getCommittedOpVersion()`. However, if drivers support
metadata, this should happen sufficiently infrequently that consumers
could potentially remove the `src`/`seq`/`v` index with no noticeable
performance penalty.

[1]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/submit-request.js#L112
[2]: share/sharedb-mongo#94
[3]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/db/index.js#L69
[4]: https://github.com/share/sharedb-postgres/blob/499702acd478645bcc249fa50ba6fc066d257d04/index.js#L140
alecgibson added a commit that referenced this pull request May 23, 2024
Supersedes #657

The `DB.getCommitedOpVersion()` function is only used [in one place][1]:
in a corner case that differentiates between two "blind", versionless
(created without first fetching) create op conflict cases:

 - a versionless create op that conflicts with another create op
 - a versionless create op that has been re-submitted because the
   connection was closed before the ack was received

The first of these cases should result in an error, and the second is
non-fatal and the error can be swallowed.

At the moment, this differentiation is made using the special
`DB.getCommittedOpVersion()` function, which - given an op with a
`src` and `seq` combination - will return `null` if the op hasn't been
committed, or its version number if it has.

If the op has a committed version number, we know that the submit is a
duplicate, and we can safely ignore it.

This approach is problematic, because:

 - it [needs a whole op index][2] just to handle this niche corner case
 - the default behaviour [fetches **all** ops][3] unless the driver
   overrides this behaviour

This change proposes that we actually don't need this function in most
cases, and implements an alternative approach to differentiate between
the above cases.

When creating an snapshot, we store the `src`, `seq` and `v` of the op
that was used to create it.

If a conflicting `create` is received, we can then compare directly with
this metadata, without needing to fetch anything extra from the
database.

This should work in the majority of cases, but won't work if the
metadata is missing, which could happen if:

 - the snapshot is "old": it was created before this change
 - the driver doesn't support metadata (eg [Postgres][4])

In the case where metadata is unavailable, we fall back to the existing
method, using `getCommittedOpVersion()`. However, if drivers support
metadata, this should happen sufficiently infrequently that consumers
could potentially remove the `src`/`seq`/`v` index with no noticeable
performance penalty.

[1]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/submit-request.js#L112
[2]: share/sharedb-mongo#94
[3]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/db/index.js#L69
[4]: https://github.com/share/sharedb-postgres/blob/499702acd478645bcc249fa50ba6fc066d257d04/index.js#L140
alecgibson added a commit that referenced this pull request May 23, 2024
Supersedes #657

The `DB.getCommitedOpVersion()` function is only used [in one place][1]:
in a corner case that differentiates between two "blind", versionless
(created without first fetching) create op conflict cases:

 - a versionless create op that conflicts with another create op
 - a versionless create op that has been re-submitted because the
   connection was closed before the ack was received

The first of these cases should result in an error, and the second is
non-fatal and the error can be swallowed.

At the moment, this differentiation is made using the special
`DB.getCommittedOpVersion()` function, which - given an op with a
`src` and `seq` combination - will return `null` if the op hasn't been
committed, or its version number if it has.

If the op has a committed version number, we know that the submit is a
duplicate, and we can safely ignore it.

This approach is problematic, because:

 - it [needs a whole op index][2] just to handle this niche corner case
 - the default behaviour [fetches **all** ops][3] unless the driver
   overrides this behaviour

This change proposes that we actually don't need this function in most
cases, and implements an alternative approach to differentiate between
the above cases.

When creating an snapshot, we store the `src`, `seq` and `v` of the op
that was used to create it.

If a conflicting `create` is received, we can then compare directly with
this metadata, without needing to fetch anything extra from the
database.

This should work in the majority of cases, but won't work if the
metadata is missing, which could happen if:

 - the snapshot is "old": it was created before this change
 - the driver doesn't support metadata (eg [Postgres][4])

In the case where metadata is unavailable, we fall back to the existing
method, using `getCommittedOpVersion()`. However, if drivers support
metadata, this should happen sufficiently infrequently that consumers
could potentially remove the `src`/`seq`/`v` index with no noticeable
performance penalty.

[1]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/submit-request.js#L112
[2]: share/sharedb-mongo#94
[3]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/db/index.js#L69
[4]: https://github.com/share/sharedb-postgres/blob/499702acd478645bcc249fa50ba6fc066d257d04/index.js#L140
alecgibson added a commit that referenced this pull request May 28, 2024
Supersedes #657

The `DB.getCommitedOpVersion()` function is only used [in one place][1]:
in a corner case that differentiates between two "blind", versionless
(created without first fetching) create op conflict cases:

 - a versionless create op that conflicts with another create op
 - a versionless create op that has been re-submitted because the
   connection was closed before the ack was received

The first of these cases should result in an error, and the second is
non-fatal and the error can be swallowed.

At the moment, this differentiation is made using the special
`DB.getCommittedOpVersion()` function, which - given an op with a
`src` and `seq` combination - will return `null` if the op hasn't been
committed, or its version number if it has.

If the op has a committed version number, we know that the submit is a
duplicate, and we can safely ignore it.

This approach is problematic, because:

 - it [needs a whole op index][2] just to handle this niche corner case
 - the default behaviour [fetches **all** ops][3] unless the driver
   overrides this behaviour

This change proposes that we actually don't need this function in most
cases, and implements an alternative approach to differentiate between
the above cases.

When creating an snapshot, we store the `src`, `seq` and `v` of the op
that was used to create it.

If a conflicting `create` is received, we can then compare directly with
this metadata, without needing to fetch anything extra from the
database.

This should work in the majority of cases, but won't work if the
metadata is missing, which could happen if:

 - the snapshot is "old": it was created before this change
 - the driver doesn't support metadata (eg [Postgres][4])

In the case where metadata is unavailable, we fall back to the existing
method, using `getCommittedOpVersion()`. However, if drivers support
metadata, this should happen sufficiently infrequently that consumers
could potentially remove the `src`/`seq`/`v` index with no noticeable
performance penalty.

[1]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/submit-request.js#L112
[2]: share/sharedb-mongo#94
[3]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/db/index.js#L69
[4]: https://github.com/share/sharedb-postgres/blob/499702acd478645bcc249fa50ba6fc066d257d04/index.js#L140
@alecgibson alecgibson deleted the no-commited-op branch May 28, 2024 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants