Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No way to query past Query.Prefix? #116

Open
drew-512 opened this issue Feb 4, 2019 · 31 comments
Open

No way to query past Query.Prefix? #116

drew-512 opened this issue Feb 4, 2019 · 31 comments

Comments

@drew-512
Copy link

drew-512 commented Feb 4, 2019

Hi gents!

Currently, Query.Prefix is used to do the initial seek when a Query begins, and iteration only continues while that prefix holds. There needs to be a flag that basically says, hey, don't use Prefix once iteration has begun. Otherwise, there's no way to effectively do a seek and do an arbitrary amount of scanning. This seems like a critical feature to leave behind, yikes.

Hopefully, I'm missing something!?

For example, in go-ds-badger: (line 270+)

it := txn.NewIterator(opt)
it.Seek(prefix)
...
qrb.Process.Go(func(worker goprocess.Process) {
    ...
    for sent := 0; it.ValidForPrefix(prefix); sent++ {
        ...
    }
}
...

could be enhanced to:

it := txn.NewIterator(opt)
it.Seek(prefix)
if q.PrefixOnlySeeks {
    prefix = nil
}
...

Adding this would be easy, but all the impls would need to be updated of course. So maybe it's done as a selectively available feature?

This is a dealbreaker for us, but we love Datastore and want to use and support it! If some at PL agrees to accept our pull requests in a timely fashion, we'll help do the work. I'm sad we're now in this position, and would take less than a few mins for dbs like leveldb, badger, and bolt. What do y'all think?

We kindly need resolution on this relatively soon one way or the other so that we know we need to pass on Datastore.

Thanks,
Drew

@drew-512 drew-512 changed the title No way to query past the Query.Prefix? No way to query past Query.Prefix? Feb 4, 2019
@magik6k
Copy link
Member

magik6k commented Feb 4, 2019

Some datastores may not be able to support this (like flatfs, but it doesn't even support ordered iterating, so it's probably fine).

👍 from me, just make sure that it's optional and not enabled by default (default = zero value)

cc @Stebalien @bigs

@drew-512
Copy link
Author

drew-512 commented Feb 4, 2019

Ok, cool, thanks for the quick response.

How about I whip up some changes, post them here for initial review/discussion and if they look good, we can take it from there.

@drew-512
Copy link
Author

drew-512 commented Feb 4, 2019

Ok, so so here's my proposed change to Query.

query.go ~ line 60

type Query struct {
	Prefix            string   // namespaces the query to results whose keys have Prefix
	IterationPrefix   string   // if set, this overrides Prefix during query iteration (n/a for some impls)

At first I was thinking a bool as I described in my prev post, but the drawback is that client would be forced to step entry by entry in order to close the query (as she checks each Result.Entry.Key to see if iteration has proceeded sufficiently far). That would mean the client couldn't use Query.Rest() since iteration would go on and on.

So If we go with adding the IterationPrefix field (above), the behavior augmentation is clear and well-defined. Impl changes are also trivial and unambiguous.

If y'all like this, give me the green light and I'll start submitting the changes. I'm OCD when it comes to naming and comments, so don't be shy for any name, style, or comment edits -- I get it. :)

@magik6k
Copy link
Member

magik6k commented Feb 4, 2019

Prefix/IterationPrefix don't really seem intuitive, how about Prefix/StartPrefix:

type Query struct {
  Prefix            string
  StartPrefix     string
[...]
}

func Query[...]
  if q.StartPrefix == "" {
    q.StartPrefix = q.Prefix
  }
  if !strings.HasPrefix(q.StartPrefix, q.Prefix) {
    return errors.New("StartPrefix not within Prefix")
  }
  [...]

  it.Seek(q.StartPrefix)
  for [...] it.ValidForPrefix(q.Prefix)

@drew-512
Copy link
Author

drew-512 commented Feb 4, 2019

Love it -- great suggestion!

@drew-512
Copy link
Author

drew-512 commented Feb 4, 2019

SeekPrefix or StartPrefix? SeekPrefix may more accurately describe the behavior for impls that literally have seek (vs prefix) behavior.

@magik6k
Copy link
Member

magik6k commented Feb 4, 2019

Yeah, SeekPrefix makes more sense

@bigs
Copy link
Contributor

bigs commented Feb 4, 2019

i'm definitely not against this—i think it could be quite useful. i think we'll need to update datastores that don't natively support this to provide it or at least heavily document its absence.

@drew-512
Copy link
Author

drew-512 commented Feb 4, 2019

How about:

type Query struct {
	Prefix            string   // namespaces the query to results whose keys have Prefix
	SeekPrefix        string   // if set, an alternate to Prefix that initially positions the query (n/a for some impls)
...

@drew-512
Copy link
Author

drew-512 commented Feb 6, 2019

Ok, so I made the discussed changes to leveldb, badger, and bolt -- changes were clean, cost-free, were only a couple lines, and have been tested. Meanwhile, I'm not familiar with S3 to know the obvious path, so I was going to pass on S3 for now. If y'all want, I can have go-ds-S3/s3.go::Query() return an unimp error if Query.SeekPrefix is set -- just let me know.

@magik6k
Copy link
Member

magik6k commented Feb 6, 2019

Feel free to open the PRs, it's easier to discuss changes there. (https://github.com/whyrusleeping/sql-datastore/ will also need support)

@drew-512
Copy link
Author

drew-512 commented Feb 6, 2019

ok great

@magik6k
Copy link
Member

magik6k commented Feb 6, 2019

Can you link the PRs here? (makes it easier for others to follow)

@drew-512
Copy link
Author

drew-512 commented Feb 6, 2019

Oops, yeah good idea. :D

4 PRs submitted. Will do SQL once we're past this step if that werks.

The 3 ds impl changes will of course get compile fails until the Query.SeekPrefix change is merged.

ds.Query:
#119

badger:
ipfs/go-ds-badger#47

leveldb:
ipfs/go-ds-leveldb#26

bolt: (I also added support for Query.Prefix)
ipfs/go-ds-bolt#1

@Stebalien
Copy link
Member

Note: S3 calls this "StartAfter" (but SeekPrefix is fine with me).

@Stebalien
Copy link
Member

Question: How should this interact with sorting by value?

There are two reasonable solutions I can think of:

  1. Apply after sorting. This is useful for resuming query.
  2. Apply before sorting. This is useful for, e.g., scanning a range of dates. Maybe we even want to replace this field with a range (Start, Stop)? This is also likely easier to implement everywhere.

Thoughts?

@bigs
Copy link
Contributor

bigs commented Feb 7, 2019 via email

@drew-512
Copy link
Author

drew-512 commented Feb 7, 2019

Liking that Start Stop idea, and it maps well to most impls (tho I can't speak for S3). Steb, as you raised in the other thread, there's the bookkeeping factor w/ ascending vs descending order thing, so that's the main PITA factor I see. With a Stop though, that means there'd now be another reason for a query to stop (which is useful b/c even w/ the SeekPrefix I'm using for now, for example, I have to check each result key to see if it's time to stop). So I'm seeing that Start/Stop is a nice alternative to Prefix (using Prefix is equivalent to using the same key for Start and Stop if I'm seeing things right).

So the bottom line seems to be that Start+Stop seems to be a wealthier alternative to Prefix, but having both may be over-defined-ish.

I'd happily help support a switch to Start+Stop if that's what others would be into. I already have test code that tests our ds stuff (independent of the impl used), so I'd be up to make common_test.go that would run core tests for bolt, badger, SQL, leveldb, and other impls that support the whole shootin match. I was surprised it didn't already exist, but who wants to do that -- I get it. Literally the opposite of feeling productive lol.

@Stebalien
Copy link
Member

So I'm seeing that Start/Stop is a nice alternative to Prefix

Yes.

(using Prefix is equivalent to using the same key for Start and Stop if I'm seeing things right).

Almost. Stop would have to be Start+1. E.g., /a -> /b (non-inclusive).

I already have test code that tests our ds stuff (independent of the impl used), so I'd be up to make common_test.go that would run core tests for bolt, badger, SQL, leveldb, and other impls that support the whole shootin match. I was surprised it didn't already exist, but who wants to do that -- I get it. Literally the opposite of feeling productive lol.

We have a shared test suite in the test directory of this package (the entrypoint is SubtestAll). However, we're not using it everywhere and it's far from complete. See https://github.com/ipfs/go-ds-leveldb/blob/d9761ea1cccd53b8a27a50c1e6a2c4e975567699/ds_test.go#L325-L329 for an example.

But yeah, a complete shared test suite would be amazing.

I'd happily help support a switch to Start+Stop if that's what others would be into.

Thoughts @magik6k, @bigs? TL;DR: "seeking" implies seeking through some ordered set while a range is just a range of keys. IMO, this better matches user needs and should be easier to implement correctly.

@bigs
Copy link
Contributor

bigs commented Feb 12, 2019

yeah, i'm into it. we could even implement a member method, Prefix(string) that configures this for you. this definitely increases the power of datastores.

@drew-512
Copy link
Author

drew-512 commented Feb 13, 2019

(using Prefix is equivalent to using the same key for Start and Stop if I'm seeing things right).

Almost. Stop would have to be Start+1. E.g., /a -> /b (non-inclusive).

Well, my thought was to define the Stop key as inclusive, which of course means either <= or >=, depending on ascending/descending keys. This would appear to fit in everywhere. So maybe instead of Start/Stop, it's Start/Last, First/Last, or Seek/Last?

@bigs
Copy link
Contributor

bigs commented Feb 13, 2019

i think the [Start, Stop) are a bit more reasonable, even if they require a little extra legwork for the Prefix case. that's not a particularly strong feeling, however, so feel free to disregard if you feel the implementation benefits. for your semantics, [Start, End] may suit as names

@Stebalien
Copy link
Member

which of course means either <= or >=, depending on ascending/descending keys.

With start/stop, I wouldn't take order into account (we also support things like ordering by value). That is, start/stop mean "all keys between start and stop" where start <= stop. That's really the main reason I prefer start/stop over seek.

Well, my thought was to define the Stop key as inclusive,

Stop inclusive can't mimic Prefix. For example, [/a, /a/zzz] doesn't include /a/zzzz.


Note: We could also just use a custom range type:

type Bound {
    Value string
    Exclusive bool
}

func (b Bound) Open() bool {
    return b == Bound{} // zero value is "open"
}

func (b Bound) Before(key string) bool {
    if b.Exclusive {
        return bound < key
    } else if key == "" {
        return true
    } else {
        return bound <= key
    }
}

func (b Bound) After(key string) bool {
    if b.Exclusive {
        return key < bound
    } else if key == "" {
        return true // open range
    } else {
        return key <= bound
    }
}

type Range struct {
    Start, Stop Bound
}

func (r *Range) Contains(key string) bool {
    return r.Start.Before(key) && r.Stop.After(key)
}

yeah, i'm into it. we could even implement a member method, Prefix(string) that configures this for you. this definitely increases the power of datastores.

You mean get rid of Query.Prefix?

@bigs
Copy link
Contributor

bigs commented Feb 13, 2019

my thinking was that if Start and Stop were still performing prefix matching, ["/a", "/a"] would be equivalent to {Prefix: "/a"}. i do agree, however, that exclusive upper bounds are, perhaps, more intuitive.

re: removing Prefix, i think i misunderstood earlier the intent of this change. i thought we were replacing it with stop/start functionality. i don't know how the two would interact.

@Stebalien
Copy link
Member

my thinking was that if Start and Stop were still performing prefix matching, ["/a", "/a"] would be equivalent to {Prefix: "/a"}. i do agree, however, that exclusive upper bounds are, perhaps, more intuitive.

Got it. Yeah, that would also work (although I mildly prefer ranges).

re: removing Prefix, i think i misunderstood earlier the intent of this change. i thought we were replacing it with stop/start functionality. i don't know how the two would interact.

It just feels like an unnecessary breaking change (although we could deprecate it). With prefix and start/stop, I assume you'd turn the prefix into a range and then take the intersection. We can even add helper functions for this.

@Stebalien
Copy link
Member

Range code: https://gist.github.com/66d63607ee56de9192025c8606f57e3b

(needs tests).

Really, this is starting to make me like my ranges less... (but at least that code only has to be written once).

@jsimnz
Copy link

jsimnz commented Nov 14, 2021

Is there anything notable blocking from this landing in master? Would be great to use proper ranges, and if possible, let the underlying storage backend optimize range queries.

@Stebalien
Copy link
Member

Stebalien commented Nov 15, 2021 via email

@jsimnz
Copy link

jsimnz commented Nov 16, 2021

Would it make sense to scope this to another interface, something like RangeDatastore or something.

Implemented as:

type RangeDatastore interface {
   Datastore
   QueryRange(ctx context.Context, q query.Range)
}

Implementation based on your range code gist

This way it can be implemented by only those datastore implementations that actually support this kind of Query effectiently, and not worry about the weird edge case of implementing range queries on those that don't make sense to.

@Stebalien
Copy link
Member

Stebalien commented Nov 16, 2021 via email

@jsimnz
Copy link

jsimnz commented Nov 17, 2021

Would this then need to be split up into two issues?

  1. A Query refactor to remove the noted elements
  2. A updated query implementation that operates on Ranges instead of Prefix

Or leave it as a single refactor + feature PR?

I have a pretty good sense of how a Range might be implemented based on my existing usage and understanding of the package, along with your example gist for ranges.

I would need to look a little more into the NaiveQuery implementation stuff to make sure I knew all the various elements to effectively and safely remove points #1 and #2 you suggested from the existing codebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants