initial proposal for history fetching through contacts #167

iurimatias · 2022-03-10T19:36:25Z

No description provided.

corpetty · 2022-03-11T12:32:01Z

docs/draft/17-contact-history.md

+```
+
+`HistoryResponse.message` contains the encoded original message.
+A `HistoryResponse` is sent for each message in the requested time range.


I feel like this could be burdensome. Why not package chats into a blob and send that?

We can potentially turn these messages into WakuMessageArchives, marshal them and send them across the wire. However, that requires individual nodes to store messages as waku messages (instead of just application messages), which means that, in the worst case, every stored message takes up twice the space.

To add to this, even without message archives and bundling, I think messages have to be sent as waku messages. That means, when users switch between different capabilities, we need to think about how to handle this. Do we start/stop storing waku messages when it's turned on/off? Do we just always store waku messages, in which case turning it on/off won't be an issue because all messages always exist?
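To make the "package chats into a blob" suggestion concrete, here is a minimal sketch of one way to bundle already-encoded messages into a single payload. The length-prefixed framing and the `pack_messages` / `unpack_messages` names are assumptions for illustration only; they are not part of the draft spec and not the WakuMessageArchive format.

```python
import struct
from typing import List


def pack_messages(encoded_messages: List[bytes]) -> bytes:
    """Bundle encoded messages into one blob (hypothetical framing).

    Layout: 4-byte big-endian message count, then for each message a
    4-byte big-endian length followed by the raw message bytes.
    """
    parts = [struct.pack(">I", len(encoded_messages))]
    for raw in encoded_messages:
        parts.append(struct.pack(">I", len(raw)))
        parts.append(raw)
    return b"".join(parts)


def unpack_messages(blob: bytes) -> List[bytes]:
    """Inverse of pack_messages: split a blob back into encoded messages."""
    (count,) = struct.unpack_from(">I", blob, 0)
    offset = 4
    messages = []
    for _ in range(count):
        (size,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        messages.append(blob[offset:offset + size])
        offset += size
    return messages
```

With a scheme like this, a single response would carry every message in the requested time range instead of one `HistoryResponse` per message. Whether the bundled items are application messages or waku messages is exactly the open storage question discussed above.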

0x-r4bbit

Interesting idea, but it has some impact on storage that should be considered.

0x-r4bbit · 2022-03-11T14:13:49Z

docs/draft/17-contact-history.md

+
+```
+{
+  "0x123": -1, // all available chat history


What does "all available chat history" mean here? Is this just whatever is in the local database of the contact?

0x-r4bbit · 2022-03-11T14:15:39Z

docs/draft/17-contact-history.md

+
+## Requesting Chat history
+
+The client chooses one of his online contacts at random that indicated capability to send the history for the target chat or community.


target chat or community

Does this mean individual contacts should be able to provide the history of an entire community?

0x-r4bbit · 2022-03-11T14:17:26Z

docs/draft/17-contact-history.md

+```
+
+`HistoryResponse.message` contains the encoded original message.
+A `HistoryResponse` is sent for each message in the requested time range.


We can potentially turn these messages into WakuMessageArchives, marshal them and send them across the wire. However, that requires individual nodes to store messages as waku messages (instead of just application messages), which means that, in the worst case, every stored message takes up twice the space.

0x-r4bbit · 2022-03-11T14:19:30Z

docs/draft/17-contact-history.md

+
+## Requesting a community chat history
+
+The client chooses one of the online contacts that displayed intentions to send chat history for the community in question.


In all of these scenarios, status nodes will have to store each message as a waku message.
Something to take into consideration, as this will have an impact on storage.

Also, the spec should take into account that this in fact needs to happen (waku messages need to be stored separately).

Ah, this ties into the fact that we should have a dont-store flag on waku-2. I think there should be an issue about it somewhere in the vac repos.

For reference and context, the Community History Archive Protocol requires the same #164

vacp2p/rfc#441

John-44 · 2022-03-13T15:44:16Z

@iurimatias I love this idea :-)

From a UX perspective, would the following be possible:

Minimize this service's UI to the greatest extent possible. Ideally, the only UI related to this entire service would be a single checkbox in settings, labelled something like "Fetch message history gaps from mutual contacts". This option would be on by default. This toggle switch in settings would give users a global ability to switch this service off if they want to.
In the future, in Settings we could add the ability to switch this service off for a specific mutual contact. But perhaps the ability to do this can be descoped from the MVP. Obviously the reason to give the user more granularity is to protect the user from a mutual contact who has gone rogue and has modified their Status client to spam their mutual contacts with fake chat history that they have generated. But in this case, the user could (and should) 'un-friend' the attacker, which would stop the attack. Therefore it might be better to just let users know that if they are attacked by a mutual contact, they need to un-friend the mutual contact (so they are no longer a mutual contact) to stop the attack. Fewer options is better from a UX perspective! To help users identify which mutual contact is performing an attack via this service, we may want to add some ephemeral indication in the message UI showing which messages have been backfilled by a mutual contact using this service. Design will look into how to do this from a design perspective.

Some random thoughts and questions:

For users who have this service switched on, it could automatically query all of the user's mutual contacts (who also have this service switched on), say every 3(??) days, to see if any of their mutual contacts have any messages (in shared chats) that they don't have (or vice versa). This could be done by sharing hashes of all the messages in a shared chat. If it's detected that a mutual contact may have messages that the user doesn't have or vice versa (I'm not sure if it's possible to detect on which side the missing messages are via hash comparison, and in some cases both sides might have missing messages), then both clients could send binary blobs containing all the messages they store for that specific chat in the date range where missing messages have been detected via the comparison of hashes. More details on how this could work below.
To check whether a mutual contact has messages the user doesn't have in a shared chat (or vice versa), both users could start by sending each other a hash over all the messages in each of the chats they have in common. If the hashes differ for a particular chat, that chat could be split into four date range segments, a hash generated for each of the four segments, and the four hashes exchanged between both clients. This would let the clients narrow down the date range that contains the missing messages. This process (splitting the history of a chat into 4 smaller segments and hashing the progressively smaller segments) could continue for further cycles, iteratively narrowing down the time range that contains missing messages until it has shrunk to (for example) a 5 day date range. Then both clients package all the messages they each have for that 5 day date range in that specific chat (identified via the hash comparison process) into a binary blob, and send these binary blobs to each other. The binary blobs are unpacked on both ends and used by both clients to fill in any messages missing from their message databases. This functionality could re-use the 'export messages from a date range into a binary blob' and the 'unpack the binary blob and fill the message database with any missing messages that the binary blob contains' functionality that @PascalPrecht has already built for the 'Community history archive service'.
If we were to use a scheme similar to the one described above, as an optimisation, we might want to bias it to check for differences in the most recent messages. For example, the initial hash comparison could (for each chat in common) send one hash for messages from the last 5 days, one for messages between 5 and 10 days old, and a third hash for all messages in that chat older than 10 days.
If a mutual contact isn't seen for more than 3(??) days, as soon as that mutual contact is next seen this check is performed.
The 'message history fetching through contacts' service could exclude (i.e. neither check for nor attempt to fetch) messages older than 7 days in Communities that have the 'community history archive service' switched on. This is because we should be able to rely on the 'community history archive service' to backfill any missing messages older than 7 days in these communities.
What frequency of checking with mutual contacts for missing messages would be the best tradeoff? Check every 12h? Or every 24h? Or every 2 days? Or every 3 days? Checking more frequently will increase load on the clients, and bandwidth usage by the clients. But checking less frequently means a user will have to wait longer for any missing messages to be filled from mutual contacts. More frequent checks are better from a user experience perspective, but technical constraints probably need to be taken into account.
If the most straightforward approach to implementing this service increases storage requirements I think this is absolutely ok. We can always build additional functionality to optimise storage usage in a subsequent iteration of this service. IMHO best to keep the MVP of this service as minimal as possible.
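The narrowing scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: the names `range_hash` and `find_mismatched_ranges` are hypothetical, messages are modelled as `(timestamp, message id)` pairs whose timestamp is identical on both sides, and both message lists are held locally to simulate the exchange (in a real exchange only the hashes would cross the wire).

```python
import hashlib
from typing import List, Tuple

DAY = 24 * 60 * 60  # seconds

Message = Tuple[int, bytes]  # (timestamp in seconds, message id), hypothetical model


def range_hash(messages: List[Message], start: int, end: int) -> str:
    """Hash the ids of all messages with start <= timestamp < end, in sorted order."""
    digest = hashlib.sha256()
    for _, msg_id in sorted(m for m in messages if start <= m[0] < end):
        digest.update(msg_id)
    return digest.hexdigest()


def find_mismatched_ranges(
    mine: List[Message],
    theirs: List[Message],
    start: int,
    end: int,
    min_span: int = 5 * DAY,
    fanout: int = 4,
) -> List[Tuple[int, int]]:
    """Return the date ranges (at most min_span wide) whose hashes differ."""
    # Stand-in for comparing our hash with the hash received from the contact.
    if range_hash(mine, start, end) == range_hash(theirs, start, end):
        return []
    if end - start <= min_span:
        return [(start, end)]
    step = -(-(end - start) // fanout)  # ceiling division
    ranges: List[Tuple[int, int]] = []
    for seg_start in range(start, end, step):
        seg_end = min(seg_start + step, end)
        ranges.extend(
            find_mismatched_ranges(mine, theirs, seg_start, seg_end, min_span, fanout)
        )
    return ranges
```

Note that a differing hash only says that the two sides disagree about a range, not which side is missing messages, which matches the uncertainty raised above. That is why both clients would exchange blobs for each returned range.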

John-44 · 2022-03-14T10:27:54Z

Some further random thoughts:

If we go with the approach of creating hashes of blocks of message history, these hashes could be created ahead of when they need to be shared and cached.
Any time missing messages are inserted into the message database for a chat, this could trigger a recompute of the hashes for that particular chat.
One possible scheme for the hashes for each individual conversation: have hashes of each week of the conversation history, have hashes for every 4 weeks of conversation history, have hashes for each 16 weeks of conversation history, etc. (obviously dependent on the length of time that conversation has existed). Then comparisons of conversation history between mutual contacts can happen very quickly and with little bandwidth, because when a mutual contact of the user comes online, the precomputed and cached hashes can be quickly and easily compared to see if they match.
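As a rough sketch of the precomputed, cached hash idea: the `ChatHashCache` class, the `differing_buckets` helper and the epoch-aligned bucketing below are all assumptions made for illustration, and messages are again modelled as `(timestamp, message id)` pairs.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

WEEK = 7 * 24 * 60 * 60  # seconds
LEVELS = (1, 4, 16)  # bucket widths in weeks, following the scheme above


class ChatHashCache:
    """Caches per-bucket hashes for one chat; recomputed lazily after inserts."""

    def __init__(self) -> None:
        self._messages: Set[Tuple[int, bytes]] = set()
        self._hashes: Dict[Tuple[int, int], str] = {}
        self._dirty = True

    def insert(self, timestamp: int, msg_id: bytes) -> None:
        # Inserting a (backfilled) message invalidates the cached hashes.
        self._messages.add((timestamp, msg_id))
        self._dirty = True

    def hashes(self) -> Dict[Tuple[int, int], str]:
        """Map (width in weeks, bucket index) -> hex digest of that bucket."""
        if self._dirty:
            buckets = defaultdict(hashlib.sha256)
            for timestamp, msg_id in sorted(self._messages):
                for width in LEVELS:
                    buckets[(width, timestamp // (width * WEEK))].update(msg_id)
            self._hashes = {key: d.hexdigest() for key, d in buckets.items()}
            self._dirty = False
        return self._hashes


def differing_buckets(a: ChatHashCache, b: ChatHashCache, width: int) -> List[int]:
    """Bucket indices at the given width whose hashes differ between two caches."""
    ha, hb = a.hashes(), b.hashes()
    keys = {k for k in ha.keys() | hb.keys() if k[0] == width}
    return sorted(k[1] for k in keys if ha.get(k) != hb.get(k))
```

A comparison could start at the coarsest width and only descend into finer widths for buckets that differ, so matching histories cost a handful of hash comparisons.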

initial proposal for history fetching through contact history

59a72c8

iurimatias changed the title ~~initial proposal for history fetching through contact history~~ initial proposal for history fetching through contacts Mar 10, 2022

update table of contents

3bd7ec0

corpetty reviewed Mar 11, 2022

0x-r4bbit reviewed Mar 11, 2022

0x-r4bbit mentioned this pull request Mar 14, 2022

Handle history archive magnetlink messages status-im/status-go#2585

Merged
