Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(datastore): base sync when sync expression changes #2937

Merged
merged 36 commits into from
Jan 2, 2025

Conversation

edisooon
Copy link
Member

@edisooon edisooon commented Oct 10, 2024

  • PR title and description conform to Pull Request guidelines.

Issue #, if available:

Description of changes:
This is in an effort to address a current issue customers had reported in our Datastore category.

The issue is that the "changing sync expression in Runtime" doesn't work as expected as described in our doc.

For example, when we initialize the DataStorePlugin with a sync expression for Student model to only sync down all the students who are >= 20-year-old, even if we change this sync expression to sync down students who are >= 17-year-old in run-time followed by Amplify.DataStore.stop({},{}) and Amplify.DataStore.start({}, {}): the new synced expression will only be applied to the newly-created/updated students, i.e. pre-existing instances of Students whose ages are in between 17 and 20 in remote database won't get synced, which is not as expected.

The bug is originated from the hydrate() method in SyncProcessor.java, which is being called by the startApiSync() method in Orchestrator.java, a component responsible for "Synchronizing changed data between the LocalStorageAdapter and AppSync".
To build a Sync Request (syncModel(..) method in SyncProcessor.java), we need to pass in

  • a model schema of the Model we want to sync
  • last_sync_timestamp of this model
  • predicate / sync expression

In order to achieve this, our implementation persists the last_sync_timestamp of models in LastSyncMetadata table whenever the code initiate a sync request, and uses the persisted last_sync_timestamp from LastSyncMetadata to initiate the next sync. (createHydrationTasks(..) in SyncProcessor.java)

This would allow us to initiate either a Delta sync or Base sync based on the lastSyncTime parameter we pass in, which is defined by the nature of AppSync's Sync API.

To further explain the cause of this bug, let's keep using the previous example:
After we initiate the first sync with sync expression (age>=20), we would add a new row in LastSyncMetadata, which might look like [Student(model name), 3(last sync timestamp)], assuming the current timestamp is 3.
After we change the sync expression in runtime (age>=17), the implementation would:

  • retrieve the Student model's last sync time (3) from the database
  • use 3 as last sync time and the updated sync expression (age>=17) to build a new sync request for Student model
  • update the row in LastSyncMetadata table, which might look like [Student(model name), 4(last sync timestamp)]

This will lead to the bug behavior described above, because:
AppSync will use BOTH updated_sync_expression(age>=17) and last_sync_timestamp(3) to initiate a delta sync (base sync will be performed only when last_sync_timestamp is 0). And for delta sync, under the hood, this last_sync_timestamp will be used to compare with the metadata _lastChangedAt in each rows of Student table in remote database.

So previous students who haven't been updated since last_sync_timestamp(3), i.e. _lastChangedAt<3, won't get synced down, even though they meet the updated sync expression, e.g. a row in remote Student table like [student1(name), 18(age), 2(_lastChangedAt)].

But students who are added/updated later will get synced, e.g., a row in remote Student table like [student2(name), 17(age), 5(_lastChangedAt)].

The essence of this problem is that: the last_sync_timestamp was only associated with the model, but should be associated with the model and the last_sync_expression being used.

To address this, we need to:

  • add a new column in LastSyncMetadata to store the last_sync_expression being used
  • store the last_sync_expression after we request a sync, and use last_sync_expression to adjust the retrieved last_sync_timestamp when we try to build a new sync request. To be specific, if the current_sync_expression != last_sync_expression, we return 0 as adjusted last_sync_timestamp to initiate a full sync.

The solution has been broke down into three steps:

  1. DB migration for LastSyncMetadata table
  2. add new QueryPredicate field in LastSyncMetadata model [There are some changes related to (de)serialization of QueryPredicate in GsonPredicateAdapater]
  3. logic changes in SyncProcessor & SyncTimeRegistry to make use of the new field

How did you test these changes?
(Please add a line here how the changes were tested)
manual testing, new and existing test cases in SyncProcessorTest

Documentation update required?

  • No
  • Yes (Please include a PR link for the documentation update)

General Checklist

  • Added Unit Tests
  • Added Integration Tests
  • Security oriented best practices and standards are followed (e.g. using input sanitization, principle of least privilege, etc)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@edisooon edisooon requested a review from a team as a code owner October 10, 2024 17:42
@edisooon edisooon marked this pull request as draft October 10, 2024 18:11
@edisooon edisooon changed the base branch from edisooon/db-migration-for-syncexpression-field-in-LastSyncMetadata to main October 15, 2024 23:49
@edisooon edisooon changed the title fix(datastore): Store and use syncExpression field in LastSyncMetadata table to retrieve adjusted lastSyncTime fix(datastore): base sync when sync expression changes Oct 16, 2024
@edisooon edisooon marked this pull request as ready for review October 16, 2024 22:45
@edisooon edisooon requested a review from a team as a code owner October 21, 2024 19:56
Copy link
Member

@tylerjroach tylerjroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets say a customer runs the app for the first time after updating with this fix.

You migrate the database and add null to the syncExpression. How do you differentiate between a real null sync expression, and one from the migration? We don't really know if thats accurate or not right? Would we be better or not removing last sync for models? I understand this would force a full sync for all initial app opens on this upgrade.

@edisooon
Copy link
Member Author

How do you differentiate between a real null sync expression, and one from the migration? We don't really know if thats accurate or not right?

@tylerjroach Sync expression wouldn't be stored as null if a sync operation occurs - it would be the serialization value of either MatchAllQueryPredicate, MatchNoneQueryPredicate, QueryPredicateGroup or QueryPredicateOperation. So it should be accurate.

tylerjroach
tylerjroach previously approved these changes Dec 31, 2024
@edisooon edisooon merged commit 7d1603c into main Jan 2, 2025
3 checks passed
@edisooon edisooon deleted the edisooon/use-last-syncexpression-in-lookupLastSyncTime branch January 2, 2025 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants