initial proposal for history fetching through contacts #167

Draft · wants to merge 2 commits into base: master

iurimatias
Member

No description provided.

@iurimatias changed the title from "initial proposal for history fetching through contact history" to "initial proposal for history fetching through contacts" on Mar 10, 2022

`HistoryResponse.message` contains the encoded original message.
A `HistoryResponse` is sent for each message in the requested time range.
Contributor

I feel like this could be burdensome. Why not package chats into a blob and send that?

Member

We can potentially turn these messages into WakuMessageArchives, marshal them, and send them across the wire. However, that requires individual nodes to store messages as waku messages (instead of just application messages), which means that in the worst case they'll store every message twice.

Member

To add to this, even without message archives and bundling, I think messages have to be sent as waku messages. That means, when users switch between different capabilities, we need to think about how to handle this. Do we start/stop storing waku messages when it's turned on/off? Do we just always store waku messages, in which case turning it on/off won't be an issue because all messages always exist?

Member

@0x-r4bbit left a comment

Interesting idea, but it has some impact on storage that should be considered.


```
{
"0x123": -1, // all available chat history
```
Member

@0x-r4bbit Mar 11, 2022

What does "all available chat history" mean here? Is this just whatever is in the local database of the contact?


## Requesting Chat history

The client chooses one of his online contacts at random that indicated capability to send the history for the target chat or community.
Member

@0x-r4bbit Mar 11, 2022

> target chat or community

Does this mean individual contacts should be able to provide the history of an entire community?


## Requesting a community chat history

The client chooses one of the online contacts that displayed intentions to send chat history for the community in question.
Member

In all of these scenarios, status nodes will have to store each message as a waku message.
This is something to take into consideration, as it will have an impact on storage.

Also, the spec should take into account that this in fact needs to happen (waku messages need to be stored separately).

Contributor

Ah, this ties into the fact that we should have a dont-store flag on waku-2. I think there should be an issue somewhere about it in the vac repos.

Member

For reference and context, the Community History Archive Protocol requires the same #164


John-44 commented Mar 13, 2022

@iurimatias I love this idea :-)

From a UX perspective, would the following be possible:

  • Minimize this service's UI to the greatest extent possible. Ideally, the only UI related to this entire service would be a single checkbox in settings, labelled something like "Fetch message history gaps from mutual contacts". This option would be on by default. This toggle switch in settings would give users a global ability to switch this service off if they want to.

  • In the future, in Settings we could add the ability to switch this service off for a specific mutual contact. But perhaps the ability to do this can be descoped from the MVP. Obviously the reason to give the user more granularity is to protect the user from a mutual contact who has gone rogue and has modified their Status client to spam their mutual contacts with fake chat history that they have generated. But in this case, the user could (and should) 'un-friend' the attacker, which would stop the attack. Therefore it might be better to just let users know that if they are attacked by a mutual contact, they need to un-friend that contact (so they are no longer a mutual contact) to stop the attack. Fewer options is better from a UX perspective! To help users identify which mutual contact is performing an attack via this service, we may want to add some ephemeral indication in the message UI indicating which messages have been backfilled by a mutual contact using this service. Design will look into how to do this from a design perspective.

Some random thoughts and questions:

  • For users who have this service switched on, it could automatically query all of the user's mutual contacts (who also have this service switched on), say every 3(??) days, to see if any of their mutual contacts have any messages (in shared chats) that they don't have (or vice versa). This could be done by sharing hashes of all the messages in a shared chat. If it's detected that a mutual contact may have messages that the user doesn't have or vice versa (I'm not sure if it's possible to detect on which side the missing messages are via hash comparison, and in some cases both sides might have missing messages), then both clients could send binary blobs containing all the messages they store for that specific chat in the date range where missing messages have been detected via the hash comparison. More details on how this could work below.

  • To check if a mutual contact has messages the user doesn't have in a shared chat (or vice versa), could both users start by sending each other hashes of all the messages in each of the chats they have in common? If the hashes differ for a particular chat, that chat could be split into four date range segments, with a hash generated for each segment and the four hashes exchanged between both clients. This would let the clients narrow down the date range that contains the missing messages. This process (splitting the history of a chat into 4 smaller segments and hashing the progressively smaller segments) could continue for further cycles, iteratively narrowing the time range that contains missing messages down to (for example) a 5 day date range. Both clients would then package all the messages they each have for that 5 day date range in that specific chat into a binary blob and send these blobs to each other. The blobs are unpacked on both ends and used by both clients to fill in any messages missing from their message databases. This functionality could re-use the 'export messages from a date range into a binary blob' and the 'unpack the binary blob and fill the message database with any missing messages that the binary blob contains' functionality that @PascalPrecht has already built for the 'Community history archive service'.

  • If we were to use a scheme similar to the one described above, as an optimisation we might want to bias it to check for differences in the most recent messages. For example, the initial hash comparison could (for each chat in common) send one hash for messages from the last 5 days, a second for messages from 5 to 10 days ago, and a third hash for all messages in that chat older than 10 days.

  • If a mutual contact isn't seen for more than 3(??) days, as soon as that mutual contact is next seen this check is performed.

  • The 'message history fetching through contacts' service could exclude (i.e. not check for, and not attempt to fetch) messages older than 7 days in Communities that have the 'community history archive service' switched on. This is because we should be able to rely on the 'community history archive service' to backfill any missing messages older than 7 days in these communities.

  • What frequency of checking with mutual contacts for missing messages would be the best tradeoff? Check every 12h? Or every 24h? Or every 2 days? Or every 3 days? Checking more frequently will increase load on the clients and their bandwidth usage. But checking less frequently means a user will have to wait longer for any missing messages to be filled from mutual contacts. More frequent checks are better from a user experience perspective, but technical constraints probably need to be taken into account.

  • If the most straightforward approach to implementing this service increases storage requirements I think this is absolutely ok. We can always build additional functionality to optimise storage usage in a subsequent iteration of this service. IMHO best to keep the MVP of this service as minimal as possible.
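
For illustration only, the hash-narrowing idea in the bullets above could be sketched roughly as follows. This is not part of the proposal: the `(timestamp, message_id)` tuple shape, the function names (`range_hash`, `find_gaps`, `merge`), and the use of SHA-256 over sorted message ids are all assumptions made for the sketch. Both histories are held locally here, whereas in practice each `range_hash` result would be exchanged between the two clients.

```python
import hashlib
from typing import List, Tuple

Message = Tuple[int, str]  # (timestamp in seconds, message id): hypothetical shape
DAY = 86_400


def range_hash(messages: List[Message], start: int, end: int) -> str:
    """Hash the sorted ids of all messages with start <= timestamp < end."""
    ids = sorted(mid for ts, mid in messages if start <= ts < end)
    return hashlib.sha256("\n".join(ids).encode()).hexdigest()


def find_gaps(mine: List[Message], theirs: List[Message], start: int, end: int,
              min_span: int = 5 * DAY, fanout: int = 4) -> List[Tuple[int, int]]:
    """Return the date ranges (at most min_span wide) whose hashes differ."""
    if range_hash(mine, start, end) == range_hash(theirs, start, end):
        return []                      # both sides agree on this range
    if end - start <= min_span:
        return [(start, end)]          # narrow enough: exchange blobs for it
    step = -(-(end - start) // fanout)  # ceiling division into `fanout` segments
    gaps: List[Tuple[int, int]] = []
    for seg_start in range(start, end, step):
        seg_end = min(seg_start + step, end)
        gaps.extend(find_gaps(mine, theirs, seg_start, seg_end, min_span, fanout))
    return gaps


def merge(mine: List[Message], theirs: List[Message],
          gaps: List[Tuple[int, int]]) -> List[Message]:
    """Fill in messages from `theirs` that fall in a gap and are unknown locally."""
    known = {mid for _, mid in mine}
    missing = [(ts, mid) for ts, mid in theirs
               if mid not in known and any(s <= ts < e for s, e in gaps)]
    return sorted(mine + missing)
```

With an 80 day history and the default 5 day floor, a single missing message is located after two rounds of four hashes each (80 days, then 20 days, then 5 days), which is the kind of bandwidth saving the bullets are after. As noted above, a hash mismatch alone does not say which side is missing messages, so `merge` would need to run on both ends.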


John-44 commented Mar 14, 2022

Some further random thoughts:

  • If we go with the approach of creating hashes of blocks of message history, these hashes could be created ahead of when they need to be shared and cached.

  • Any time missing messages are inserted into the message database for a chat, this could trigger a recompute of the hashes for that particular chat

  • One possible scheme for the hashes for each individual conversation: have hashes of each week of the conversation history, have hashes for every 4 weeks of conversation history, have hashes for each 16 weeks of conversation history, etc. (obvs dependent on the length of time that conversation has existed). Then comparisons of conversation history between Mutual Contacts can happen very quickly and with little bandwidth, because when a mutual contact of the user comes online, the precomputed and cached hashes can be quickly and easily compared to see if they match.
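
As a rough sketch of the caching idea in the bullets above (again an illustration, not something the proposal specifies): the `ChatHashCache` class, its method names, and the choice of SHA-256 are hypothetical. Hashes are kept per 1, 4, and 16 week block, and inserting a message invalidates only the blocks that contain it. The sketch recomputes lazily on the next lookup rather than eagerly on insert, which is one possible reading of "trigger a recompute".

```python
import hashlib
from collections import defaultdict

WEEK = 7 * 86_400
SPANS = (1, 4, 16)  # block sizes in weeks, as suggested in the comment above


class ChatHashCache:
    """Per-chat cache of block hashes; hypothetical helper for illustration."""

    def __init__(self):
        self._ids_by_week = defaultdict(set)  # week index -> message ids
        self._cache = {}                      # (span, block index) -> hex digest

    def insert(self, timestamp: int, message_id: str) -> None:
        """Store a message and invalidate every cached block that contains it."""
        week = timestamp // WEEK
        self._ids_by_week[week].add(message_id)
        for span in SPANS:
            self._cache.pop((span, week // span), None)

    def block_hash(self, span: int, block: int) -> str:
        """Return the (cached) hash of weeks [block*span, (block+1)*span)."""
        key = (span, block)
        if key not in self._cache:
            ids = sorted(
                mid
                for week in range(block * span, (block + 1) * span)
                for mid in self._ids_by_week.get(week, ())
            )
            self._cache[key] = hashlib.sha256("\n".join(ids).encode()).hexdigest()
        return self._cache[key]


def differing_blocks(a: ChatHashCache, b: ChatHashCache, span: int, blocks):
    """List the block indices whose hashes differ between two caches."""
    return [blk for blk in blocks if a.block_hash(span, blk) != b.block_hash(span, blk)]
```

When a mutual contact comes online, the clients could compare the 16 week hashes first and only descend to the 4 week and 1 week hashes for blocks that differ, so a fully synced chat costs a handful of hash comparisons.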
