Remove timestamp optimization for full syncs #1907

Merged

seanstory
Member

Closes #1372

When connectors were first being built, there was an assumption that a distinction between "full" and "incremental" syncs would not be needed. We hoped to only have one sync type, and to have it be capable of doing full table scans on its first run, but only partial scans thereafter.

In reality, this has not worked, and we've begun to introduce new sync types, specifically the Incremental sync. At the same time, we've received more and more reports that this optimization is actually a hindrance to customers working to move into production. While it was intended to save time, it in fact costs more, because every code, pipeline, or configuration change requires existing data to be deleted or modified, or a new index to be started from scratch. This friction was neither intended nor anticipated, and we consider it a bug.

In the unlikely event that a customer is relying on this optimization, the diff is quite simple and can be added back from a fork as a stop-gap until we are able to deliver Incremental Syncs to all our connectors.
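For anyone weighing that trade-off, here is a minimal sketch of the behavior difference. It is not the actual diff: `plan_full_sync`, the `skip_unchanged` flag, and the use of `_id` and `_timestamp` as the document fields are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

Doc = Dict[str, str]


def plan_full_sync(
    existing_timestamps: Dict[str, str],
    source_docs: Iterable[Doc],
    skip_unchanged: bool = False,
) -> Tuple[List[Doc], List[str]]:
    """Return (docs_to_index, ids_to_delete) for one full sync.

    existing_timestamps maps document id -> timestamp already in the index.
    skip_unchanged=True mimics the removed optimization (hypothetical flag).
    """
    to_index: List[Doc] = []
    seen = set()
    for doc in source_docs:
        doc_id = doc["_id"]
        seen.add(doc_id)
        # A document counts as unchanged only if it is already indexed
        # with the same timestamp the source reports now.
        unchanged = (
            doc_id in existing_timestamps
            and existing_timestamps[doc_id] == doc.get("_timestamp")
        )
        if skip_unchanged and unchanged:
            continue  # old behavior: never re-ingested, so pipeline changes are missed
        to_index.append(doc)
    # Documents in the index that the source no longer returns get deleted.
    to_delete = [doc_id for doc_id in existing_timestamps if doc_id not in seen]
    return to_index, to_delete


existing = {"1": "2023-11-01T00:00:00Z", "2": "2023-11-01T00:00:00Z"}
source = [
    {"_id": "1", "_timestamp": "2023-11-01T00:00:00Z"},
    {"_id": "3", "_timestamp": "2023-11-20T00:00:00Z"},
]

# With the optimization, only document "3" is re-indexed.
old_index, old_delete = plan_full_sync(existing, source, skip_unchanged=True)
# Without it, documents "1" and "3" are both re-indexed.
new_index, new_delete = plan_full_sync(existing, source)
```

In both cases document `"2"` is deleted because the source no longer returns it. The only difference is that the unchanged document `"1"` is sent through indexing again, which is what lets a modified ingest pipeline or code change take effect, at the cost of a longer sync.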

Checklists

Pre-Review Checklist

  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)

Related Pull Requests

Release Note

Addressed a bug where full syncs would not fully re-sync data as expected, which caused friction when upgrading, making custom code changes, or modifying ingest pipelines. The fix may result in increased full sync times, as more documents may now be downloaded and indexed.

(Note to the docs team: please link this PR in the release note, so that users can easily revert it on a fork if the performance impact for them outweighs the value of fixing the bug.)

jedrazb previously approved these changes Nov 20, 2023
artem-shelkovnikov force-pushed the seanstory/1372-remove-full-sync-overoptimization branch from e5dc859 to 7d3cdff November 20, 2023 14:15
artem-shelkovnikov enabled auto-merge (squash) November 20, 2023 14:15

💚 Backport PR(s) successfully created

Status Branch Result
8.10 #1909
8.11 #1910

The backport PRs will be merged automatically after passing CI.

Development

Successfully merging this pull request may close these issues.

Ensure that connector full syncs fully re-sync
3 participants