Dict-like interface isn't looking good! #102

Open
buriy opened this issue Jul 2, 2019 · 7 comments

buriy commented Jul 2, 2019

first_batch = x[{"batch": 1}]
three_examples = x[{"batch": slice(1, 4)}]

It doesn't read well, gives no IDE hints, and constructs a new dict every time.

Would you consider a pandas-like interface instead?
The x[x.batch == 1] and x[(x.batch >= 1) & (x.batch < 4)] syntax looks much better and allows for IDE hints.
x.batch[1] and x.batch[1:4] could also be used for this specific case -- that would be amazing!
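
For illustration only (this is a hedged sketch, not the namedtensor API; NamedTensorSketch and DimAccessor are invented names, and the results drop their names for brevity), a small accessor object along these lines could back the x.batch[1] / x.batch[1:4] form:

import torch

class DimAccessor:
    def __init__(self, tensor, names, dim):
        self._tensor, self._names, self._dim = tensor, names, dim

    def __getitem__(self, index):  # x.batch[1] or x.batch[1:4]
        slices = [slice(None)] * len(self._names)
        slices[self._dim] = index
        return self._tensor[tuple(slices)]

class NamedTensorSketch:
    def __init__(self, tensor, names):
        self._tensor, self._names = tensor, tuple(names)

    def __getattr__(self, name):  # exposes x.batch, x.channel, ... as accessors
        if name in self._names:
            return DimAccessor(self._tensor, self._names, self._names.index(name))
        raise AttributeError(name)

x = NamedTensorSketch(torch.zeros(8, 3, 32), ("batch", "channel", "time"))
first_batch = x.batch[1]        # same selection as x[{"batch": 1}]
three_examples = x.batch[1:4]   # same selection as x[{"batch": slice(1, 4)}]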

@EndingCredits

I was just about to make a separate issue for this, but I'll roll it into this one.

I have been thinking about this issue a bit, and these are the solutions I've come up with:

Method 1:
We set an order for slicing, and then use regular slicing. N.B.: we set this order as an attribute of the named tensor; we don't do any transposing of the tensor.

e.g. we can set an order by using t['name1','name2','name3'] and then access using t[a, b, c], which returns the equivalent of t[{'name1': a, 'name2': b, 'name3': c}].

This can be used in a number of ways:

  • t1 = t['name1','name2','name3'][a, b, c]
  • t1 = t['name1','name2','name3', a, b, c]
  • t1 = t['name1', a, 'name2', b, 'name3', c]

Plus we can set once and then treat our tensor as a normal tensor for a while:

t = t['name1','name2','name3']
t1 = t[a, b, c]
t2 = t[d, e, f]

Or maybe something like:

with t.order('name1','name2','name3'):
    t1 = t[a, b, c]
    t2 = t[d, e, f]
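
For concreteness, here is a minimal sketch of the first usage pattern under Method 1. OrderedNamedTensor is a hypothetical class (not the implementation linked below), and index_by_dict stands in for the existing dict-style lookup:

import torch

class OrderedNamedTensor:
    def __init__(self, tensor, names, order=None):
        self.tensor, self.names, self.order = tensor, tuple(names), order

    def index_by_dict(self, spec):  # stand-in for the current t[{...}] lookup
        slices = [spec.get(n, slice(None)) for n in self.names]
        return self.tensor[tuple(slices)]

    def __getitem__(self, key):
        key = key if isinstance(key, tuple) else (key,)
        if all(isinstance(k, str) for k in key):   # t['h', 'w'] records an order
            return OrderedNamedTensor(self.tensor, self.names, key)
        if self.order is not None:                 # t[a, b] uses the recorded order
            return self.index_by_dict(dict(zip(self.order, key)))
        raise TypeError("set an order with t['name1', ...] first")

t = OrderedNamedTensor(torch.arange(24).reshape(2, 3, 4),
                       ("batch", "height", "width"))
t2 = t["height", "width"]   # record the order; no data is moved
row = t2[1, 2]              # equivalent to t[{'height': 1, 'width': 2}]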

Method 2:

We abuse slicing notation and allow:
t1 = t['name1': x, 'name2': y:z, 'name3': w]
In this case we have three slices where the start member of each slice is set to the name string. This frees up the stop and step parts of the slice for our use, which we can treat as start and stop for our actual slicing. This does remove the capability to use the step component, although that currently isn't implemented anyway, and can be achieved through other means.

Example:
assert t['name1': 3, 'name2': 2:5, 'name3': -1] == t[{'name1': 3, 'name2': slice(2,5), 'name3': -1}]
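
As a hedged sketch (decode_named_slices is an invented helper, not library code): inside __getitem__, the expression t['name1': 3, 'name2': 2:5, 'name3': -1] arrives as a tuple of slice objects whose start fields hold the names, so decoding it into the existing dict form could look like this:

def decode_named_slices(key):
    # t['name': x] arrives as slice('name', x, None);
    # t['name': y:z] arrives as slice('name', y, z).
    key = key if isinstance(key, tuple) else (key,)
    spec = {}
    for item in key:
        if not (isinstance(item, slice) and isinstance(item.start, str)):
            raise TypeError("expected 'name': index entries")
        if item.step is None:
            spec[item.start] = item.stop                    # single index
        else:
            spec[item.start] = slice(item.stop, item.step)  # y:z range
    return spec

# The same tuple a __getitem__ would receive for the example above:
assert decode_named_slices((slice('name1', 3), slice('name2', 2, 5),
                            slice('name3', -1))) == {'name1': 3,
                                                     'name2': slice(2, 5),
                                                     'name3': -1}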

Both these methods can be used together, although there's some question about certain minor details of how this would work, e.g. what the behaviour of t['name1', 'name2', 1, 'name3': 3, 2] should be.

If you want to have a play around, I have a basic implementation here:
https://github.com/EndingCredits/namedtensor


buriy commented Jul 5, 2019

@EndingCredits
Using raw names still isn't good.
You seem to have ignored all the important points in my message.
Someone will write t['neme1'] and it won't work, because it's t['name1'].
t[t.name1] is much better, and allows operations like t[t.name1 > 5] or t.name1[1:5].
And your t['name1': 3, 'name2': 2:5, 'name3': -1] isn't even valid Python code (list indices must be integers or slices, not tuple).
There are two good sources to look at for similar well-known solutions:
the Django API and the SQLAlchemy API. Pandas used the SQLAlchemy API (and did it poorly).
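
For what it's worth, here is a hedged sketch of the boolean-selection idea (t[t.name1 > 5]): comparisons on a per-dimension coordinate view produce a selector that the indexing code can apply. DimCoords, DimSelector and select_along are invented names and not part of any of the APIs mentioned above:

import torch

class DimCoords:
    # hypothetical per-dimension coordinate view, e.g. t.name1
    def __init__(self, dim, size):
        self.dim, self.values = dim, torch.arange(size)

    def __gt__(self, other):  # t.name1 > 5 -> a boolean selector, not data
        return DimSelector(self.dim, self.values > other)

class DimSelector:
    def __init__(self, dim, mask):
        self.dim, self.mask = dim, mask

def select_along(tensor, selector):
    # keep only the positions of the selector's dimension where the mask is True
    idx = selector.mask.nonzero(as_tuple=True)[0]
    return tensor.index_select(selector.dim, idx)

t = torch.arange(30).reshape(10, 3)
name1 = DimCoords(dim=0, size=t.shape[0])
print(select_along(t, name1 > 5).shape)   # torch.Size([4, 3])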

@EndingCredits

I wasn't really commenting on your suggestion, just rolling the discussion into one thread.

Also, looks fine to me


buriy commented Jul 6, 2019

@EndingCredits it is much better when a linter/compiler warns you about this than finding out at run-time.


srush commented Jul 7, 2019

Thanks for the suggestions, this is helpful and I hear the two different approaches.

I agree the dict construction approach is a bit heavyweight, particularly in loops. I need to look into whether that is actually a problem from a memory standpoint, though.

I see why the x.batch approach is appealing in an editor (if you somehow declare the names). However, [x.batch == 1] seems like a strange way to do constant-time indexing, unless you see that being evaluated lazily somehow.


buriy commented Jul 8, 2019

@srush for that you might have x.batch[1] as well, and in any case it's no worse than x[{"batch": 1}].

@EndingCredits

Memory-wise, I imagine the overhead of constructing dicts is negligible compared to the actual tensor operations, unless somehow the tensor itself is being copied.
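
One quick, unscientific way to check that claim (the numbers will vary by machine and tensor size; this just compares building the dict against the indexing it wraps):

import timeit
import torch

x = torch.randn(64, 128, 256)
# cost of constructing the indexing dict alone
print(timeit.timeit(lambda: {"batch": 1, "time": slice(1, 4)}, number=100_000))
# cost of the underlying indexing operation
print(timeit.timeit(lambda: x[1, 1:4], number=100_000))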

More importantly, the current way of doing things is just very clunky and a bit unintuitive. I feel any of the methods above would be a big improvement in usability.
