Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question on usage #40

Open
ccollie opened this issue Oct 8, 2024 · 3 comments
Open

Question on usage #40

ccollie opened this issue Oct 8, 2024 · 3 comments

Comments

@ccollie
Copy link

ccollie commented Oct 8, 2024

I have another use-case i'd like to use blart for. Assume I have a series of chunks bounded by timestamp and containing samples.

pub struct Sample {
  pub timestamp: Timestamp,
  pub value: f64
}

pub struct Chunk {
  pub start: Timestamp,
  pub end: Timestamp,
  pub samples: Vec<Sample>
}

I store multiple chunks per timeseries, and later I need to upsert a Sample in a chunk. For this i need to efficiently locate a chunk into which to insert the sample.

It seems to me that chunks.range(new_sample.timestamp..) won't do the trick, but I can't see any means short of iteration from the start.

Edit: chunks would be indexed by chunk.start

@declanvk
Copy link
Owner

declanvk commented Oct 8, 2024

So you would be using a TreeMap<Timestamp, Chunk> as your index structure? And you mentioned that the key would be chunk.start?

I think it would possibly work if you did chunks.range(..=new_sample.timestamp).rev(). The ..=new_sample.timestamp will give you an iterator of chunks where the chunk.start <= new_sample.timestamp. The .rev() is so that the first element in the iterator will be the most likely match.

One underlying assumption I have is that the byte representation of Timestamp is ordered in the same way that Timestamp is


Somewhat off-topic questions:

For this i need to efficiently locate a chunk into which to insert the sample.

When you insert a new sample into a chunk, do you update the start and end bounds of the chunk? Or is it the case that you'd instead insert a new chunk if the new_sample.timestamp is outside the bounds of any existing chunk?

@ccollie
Copy link
Author

ccollie commented Oct 8, 2024

So you would be using a TreeMap<Timestamp, Chunk> as your index structure? And you mentioned that the key would be chunk.start?

Yes.Timestamp is essentially an alias for i64

When you insert a new sample into a chunk, do you update the start and end bounds of the chunk? Or is it the case that you'd instead insert a new chunk if the new_sample.timestamp is outside the bounds of any existing chunk?

In this case I'm specifically looking at upserts. Imagine a TimeSeries struct that tracks its range..

pub struct TimeSeries {
  pub start: Timestamp,
  pub end: Timestamp,
  pub chunks: TreeMap<Timestamp, Chunk>
}

Checking an incoming Sample against the range we can determine if we need to upsert or add a new chunk. It is of course possible that a chunk may have to be re-indexed (we also want to handle the case when chunks are imbalanced w.r.t the number of samples)

@declanvk
Copy link
Owner

declanvk commented Oct 8, 2024

Yes.Timestamp is essentially an alias for i64

One thing to note if you're using an i64 (or any unsigned integer) as a key, is that you'll need to apply a mapping so that the ordering of the byte representation matches the ordering of the type. https://docs.rs/blart/latest/blart/struct.ToIBE.html#impl-BytesMapping%3Ci32%3E-for-ToIBE is the impl you're looking for, and you would combine it with the Mapped<_, _> struct like so:

use blart::{TreeMap, Mapped, ToIBE, BytesMapping};

// assumption
type Timestamp = i64;

type Map = TreeMap<Mapped<ToIBE, Timestamp>, Chunk>

This does make keys a little more difficult to use sadly, but not impossible. Let me know about any difficulties in this part or any proposals to make this translation more transparent. Its definitely a rough area of the crate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants