Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL event.timestamp is not always unique #13019

Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion rasa/core/exporter.py
Original file line number Diff line number Diff line change
Expand Up @@ -160,7 +160,7 @@ def _validate_all_requested_ids_exist(
self, conversation_ids_in_tracker_store: Set[Text]
) -> None:
"""Warn user if `self.requested_conversation_ids` contains IDs not found in
`conversation_ids_in_tracker_store`
`conversation_ids_in_tracker_store`.

Args:
conversation_ids_in_tracker_store: Set of conversation IDs contained in
Expand Down Expand Up @@ -241,6 +241,11 @@ async def _fetch_events_within_time_range(self) -> AsyncIterator[Dict[Text, Any]
continue

events = self._get_events_for_conversation_id(_events, conversation_id)

# the order of events was changed after ATO-2192
ancalita marked this conversation as resolved.
Show resolved Hide resolved
# we should sort the events by timestamp to keep the order
events.sort(key=lambda x: x["timestamp"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certain tests in exporter failed nonetheless so I've added this statement to sort the event list. This sorting does not inflate the performance cost much as it is only used during rasa export command -- which is a fire-and-forget command that is seldom used

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we update the tests instead of adding this extra sort function 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance may be inflated quite well for large dataset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this sorting shouldn't be required, we only need this here because exporter relies on synthetic data that isn't realistic. However adding sorting here is a way to minimise the 2nd order effects from this change. In terms of performance, sorting the events explicitly here is better than sorting by timestamp in the tracker store query. The tracker store query is used much more frequently compared to rasa export command


# the conversation IDs are needed in the event publishing
for event in events:
if (
Expand Down
2 changes: 1 addition & 1 deletion rasa/core/tracker_store.py
Original file line number Diff line number Diff line change
Expand Up @@ -1325,7 +1325,7 @@ def _event_query(
)
)

return event_query.order_by(self.SQLEvent.timestamp)
return event_query.order_by(self.SQLEvent.id)
ancalita marked this conversation as resolved.
Show resolved Hide resolved

async def save(self, tracker: DialogueStateTracker) -> None:
"""Update database with events from the current conversation."""
Expand Down
6 changes: 3 additions & 3 deletions tests/core/test_exporter.py
Original file line number Diff line number Diff line change
Expand Up @@ -73,9 +73,9 @@ async def test_fetch_events_within_time_range():
conversation_ids = ["some-id", "another-id"]

# prepare events from different senders and different timestamps
event_1 = random_user_uttered_event(3)
event_2 = random_user_uttered_event(2)
event_3 = random_user_uttered_event(1)
event_1 = random_user_uttered_event(1)
event_2 = random_user_uttered_event(3)
event_3 = random_user_uttered_event(2)
Comment on lines +76 to +78
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The arguments here refer to the timestamp of events. The test here was to assume that,
conversation1 = [event1, event2] where event2 happens before event1. However this is not a fair assumption since the event timestamps are not externally assigned but depend on the timestamp of machine -- as defined here https://github.com/RasaHQ/rasa/blob/3.6.x/rasa/shared/core/events.py#L284

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand why the initial test setup assumed the order of events is [event1, event2] if event2 happens first? Was that to test that the timestamp ordering happens? If so, why do you need to alter the test if you've sorted the events by timestamp in the exporter code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ancalita I'm not sure why did the test assumed that order of events. The changes we've made in the Tracker Store are only limited to SQL Tracker Store. However there's an in-memory tracker store being used here. My additional changes to sort events in exporter still apply. The test fails with the following diff,

>           assert event_timestamps == [e.timestamp for e in events[_id]]
E           assert [2, 3] == [3, 2]
E             At index 0 diff: 2 != 3
E             Full diff:
E             - [3, 2]
E             + [2, 3]

Which is, that it is trying to assert that the order of events returned is the same as given. I changed the given here since it is a false test case.

events = {conversation_ids[0]: [event_1, event_2], conversation_ids[1]: [event_3]}

def _get_tracker(conversation_id: Text) -> DialogueStateTracker:
Expand Down
Loading