Simplify SharedTensor syncing #37
The interface can be implemented like this (breaking change):
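A minimal sketch of what such an interface could look like, assuming the `read`/`read_write`/`write_only` names that the later commits settle on; the stub types and signatures below are assumptions for illustration, not the actual prototype code:

```rust
// Sketch only: stub types stand in for collenchyma's real `DeviceType`,
// `MemoryType` and `Error`; signatures are illustrative, not the final API.
use std::marker::PhantomData;

pub struct DeviceType;
pub struct MemoryType;
pub struct Error;

pub struct SharedTensor<T> {
    marker: PhantomData<T>,
}

impl<T> SharedTensor<T> {
    /// Borrow memory for `device`, syncing it from the latest copy if needed.
    pub fn read(&self, _device: &DeviceType) -> Result<&MemoryType, Error> {
        unimplemented!()
    }

    /// Borrow memory mutably; all other copies become outdated afterwards.
    pub fn read_write(&mut self, _device: &DeviceType) -> Result<&mut MemoryType, Error> {
        unimplemented!()
    }

    /// Borrow memory for writing only: no sync is performed and the previous
    /// contents are treated as undefined.
    pub fn write_only(&mut self, _device: &DeviceType) -> Result<&mut MemoryType, Error> {
        unimplemented!()
    }
}
```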
It's also possible to add some restricted slicing to it. What I don't like here:
I've created a draft/prototype here. It's still a work in progress, but please let me know what you think about it!
I've updated the prototype. There is a full list of features in the comments at the head of the file; here is the short version:
What is not done:
@hobofan @MichaelHirn I'm ready to refactor collenchyma, plugins and leaf according to this proposal, but I'd like to know your opinion on it beforehand. It would be a waste if, after all the work is done, it turned out that it was clear from the start that the PR wouldn't be accepted. Of course, something unexpected may still show up during the refactoring and those changes may turn out to be a bad idea after all, but that's another matter.
Hey @alexandermorozov, that is pretty huge. I looked through the code and read about the intention. Following are my thoughts and ideas (it might be that I take concepts from your earlier comments which are no longer in the code; I try to avoid it, though):
Conclusion: I like your approach and your proof of concept. I just have the feeling that the code changes might have gone a bit further than necessary. Before we move further, I would like to find a solution that gives us all the benefits while changing only what is really needed.
@MichaelHirn, hi! I agree, the scope is rather big, but that's mostly to explore what is and isn't possible within this approach, to avoid restricting future possibilities. I don't think that every feature mentioned here should be implemented. Answering your questions:
Can you tell me more about multi-device and async? Multi-device as in syncing between OpenCL devices and CUDA, or between several CUDA devices? If the latter, would they share the same CUDA context? From a cursory glance it looks like async transfers require pinned host memory, is this the case? That may not mix well with native ndarray types... It also looks like, from a performance point of view, collenchyma should be able to set up several CUDA -> Host transfers and wait on all of them for completion...
Nice, thank you so much for the clarification, it helped a lot. I think there are some great concepts here that we should borrow and use for the 0.1 release of Collenchyma. I am very excited to move this forward with you. I will make some free time to expand my answer and suggestions on Thursday.
Wow, that looks pretty awesome! It generally looks good to me. As already mentioned in 1., I think an enum instead of version numbers is probably faster and should be well optimizable by the compiler. Apart from that I can only provide the bikeshedding opinion that I think...
@hobofan Well, if you mean enums like... I'll begin to work on a stand-alone PR for enhanced synchronization, since it mostly has no downsides (and big upsides!). After that I'll try to benchmark/implement the decoupling; it'll become clear what it costs at runtime and whether it's worth it.
@alexandermorozov

```rust
enum SyncState {
    Uninitialized,
    Outdated,
    Latest,
}
```

though that would require setting all other copies to `Outdated`.
@hobofan Ah, I see. Well, in practice...
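To make the trade-off discussed above concrete, here is a small self-contained sketch of the two bookkeeping schemes being compared: per-copy version counters versus an explicit state enum. All names (`VersionedTensor`, `EnumTensor`, `mark_written`, ...) are hypothetical and only illustrate the discussion; neither is the prototype's actual code.

```rust
// Option A: per-copy version counters. A copy is up to date iff its version
// equals the tensor-wide `latest_version`; a write bumps `latest_version`,
// and a later sync can bring several copies up to the same version at once.
struct VersionedTensor {
    latest_version: u64,
    copies: Vec<u64>, // one version counter per device copy
}

impl VersionedTensor {
    fn is_up_to_date(&self, i: usize) -> bool {
        self.copies[i] == self.latest_version
    }
    fn mark_written(&mut self, i: usize) {
        self.latest_version += 1;
        self.copies[i] = self.latest_version;
    }
}

// Option B: an explicit state enum, as suggested above. Writing one copy
// requires walking over all other copies and downgrading them to `Outdated`.
#[derive(PartialEq)]
enum SyncState {
    Uninitialized,
    Outdated,
    Latest,
}

struct EnumTensor {
    copies: Vec<SyncState>,
}

impl EnumTensor {
    fn mark_written(&mut self, i: usize) {
        for (j, state) in self.copies.iter_mut().enumerate() {
            *state = if j == i { SyncState::Latest } else { SyncState::Outdated };
        }
    }
}
```

With version counters a write is O(1) and several copies can legitimately share the latest version; with the enum a write has to touch every copy, but the per-copy state is arguably easier to read.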
I need to push it back until tomorrow, I think. I reviewed the topic regarding the...
@MichaelHirn No problem! I'm quite busy too; most likely I won't have enough mindspace to implement anything until the weekend.
I discussed most of the topics with @hobofan as well last week, and we reached a conclusion on nearly all points. The summary is that the implementations you proposed make a lot of sense and would provide great value for the future. Also, sorry for the delay; the last days were kind of unusual and I wanted to make sure that I could follow the reasoning of the new...
Remove the methods `sync()`, `get()`, `get_mut()` and `remove_copy()` of `SharedTensor` and introduce a new set of methods: `read()`, `read_write()`, `write_only()`, `drop_at()`. The signature of `SharedTensor::new()` has also changed. The new API has the following benefits: limited checks against use of uninitialized memory; better memory tracking (several memories can be simultaneously marked as up-to-date, so some synchronization operations can be skipped); backing memory is automatically allocated on first use and added to the `SharedTensor`, even if it is immutable. Mutability is required only for reshaping, modifying the actual data and dropping memories. Rationale and design decisions are discussed in the corresponding bugtracker issue. BREAKING CHANGE: sync and memory management API of `SharedTensor`. CLOSE: autumnai#37
Remove the methods `sync()`, `get()`, `get_mut()` and `remove_copy()` of `SharedTensor` and introduce a new set of methods: `read()`, `read_write()`, `write_only()`, `drop_device()`. The signature of `SharedTensor::new()` has also changed. The new API has the following benefits: limited checks against use of uninitialized memory; better memory tracking (several memories can be simultaneously marked as up-to-date, so some synchronization operations can be skipped); backing memory is automatically allocated on first use and added to the `SharedTensor`, even if it is immutable. Mutability is required only for reshaping, modifying the actual data and dropping memories. Rationale and design decisions are discussed in the corresponding bugtracker issue. BREAKING CHANGE: sync and memory management API of `SharedTensor`. CLOSE: autumnai#37
During the refactoring (autumnai#37) several errors were upgraded into panics. Those errors can happen only if the internal logic of `SharedTensor` is incorrect and leads to an inconsistent state and broken invariants.
…_CHANGELOG] REFERENCE: autumnai#37
Refactor the CUDA and Native backend code to match autumnai/collenchyma#62, which provides enhanced memory management and synchronization. Since memory management is now automatic, the `*_plain` variants of functions are removed. BREAKING CHANGE: `*_plain` versions of API functions are removed; arguments of their counterpart functions may have changed in mutability. REFERENCE: autumnai/collenchyma#37, autumnai/collenchyma#62. refactor/native: convert to the new memory management API. Convert the Native backend; the code now compiles.
Use .read()/.write_only()/.read_write() instead of .sync()/.add_device()/.get() calls. REFERENCE: autumnai/collenchyma#37, autumnai/collenchyma#62
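For illustration, a minimal before/after sketch of such a call-site migration. The stub types below stand in for collenchyma's real backend and memory types; only the method names (`read`, `write_only`, and the removed `sync`/`add_device`/`get`) come from the commit messages above, everything else is hypothetical.

```rust
// Hypothetical plugin call site, migrated to the new accessors. The stubs
// below are illustrative; real collenchyma memory lives behind device
// handles rather than a plain Vec<f32>.
struct Device;
struct Memory(Vec<f32>);
struct Error;

struct SharedTensor {
    mem: Option<Memory>,
}

impl SharedTensor {
    // New-style accessors: allocate backing memory on first use; `read`
    // would also sync an outdated copy (not modelled here), while
    // `write_only` skips the sync because the caller overwrites everything.
    fn read(&mut self, _d: &Device) -> Result<&Memory, Error> {
        Ok(self.mem.get_or_insert_with(|| Memory(vec![0.0; 4])))
    }
    fn write_only(&mut self, _d: &Device) -> Result<&mut Memory, Error> {
        Ok(self.mem.get_or_insert_with(|| Memory(vec![0.0; 4])))
    }
}

fn scale(x: &mut SharedTensor, out: &mut SharedTensor, a: f32, dev: &Device) -> Result<(), Error> {
    // Old style (now removed): x.add_device(dev), x.sync(dev), x.get(dev),
    // plus the same dance for `out` with get_mut(dev).
    let src = x.read(dev)?.0.clone();
    let dst = out.write_only(dev)?; // previous contents treated as undefined
    dst.0 = src.iter().map(|v| v * a).collect();
    Ok(())
}
```

Note that, per the commit message above, mutability in the real API is required only for reshaping, modifying data and dropping memories, so the real `read` presumably takes `&self`; the stub uses `&mut self` only to keep the example short.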
…I [SKIP_CHANGELOG] REFERENCE: autumnai/collenchyma#37, autumnai/collenchyma#62
Fix SharedTensor::new() usage. REFERENCE: autumnai/collenchyma#37, autumnai/collenchyma#62
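The commits above also mention that the signature of `SharedTensor::new()` changed, but the thread does not spell out the old or new form. The snippet below is only a guess at the kind of call-site fix such a commit might contain, assuming the new constructor takes just a shape and no longer returns a `Result`; the stub type is not collenchyma's code.

```rust
// Assumption: the new constructor takes only a shape (no device, no Result);
// this stub merely illustrates that guess.
use std::marker::PhantomData;

struct SharedTensor<T> {
    shape: Vec<usize>,
    marker: PhantomData<T>,
}

impl<T> SharedTensor<T> {
    fn new(shape: &[usize]) -> Self {
        SharedTensor { shape: shape.to_vec(), marker: PhantomData }
    }
}

fn main() {
    // Old (assumed): SharedTensor::<f32>::new(backend.device(), &shape).unwrap()
    // New (assumed): no device argument and no Result to unwrap.
    let x = SharedTensor::<f32>::new(&[32, 64]);
    assert_eq!(x.shape, vec![32, 64]);
}
```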
Here are links to the related PRs: in my experience those PRs have introduced no regressions and can be merged. Though it's better to resolve autumnai/collenchyma-nn#46 before that, so that I can port those changes to my patches.
The current way of syncing a SharedTensor is a bit unintuitive and in some parts not quite complete.
Things to consider changing (loosely ordered by importance):
1. Change `add_device` to one that doesn't return an `Error` if the Tensor is already tracking memory for that device. The current behaviour is unintuitive. This change is breaking and will require adjustments in all plugins.
2. Automatically `add_device` and `sync`. This should clean up end-applications and the "managed" part of plugins.
3. Easy retrieval of the data as `Native` memory without manually having to use the function from 1.