I've implemented the decoupling-related feature from #37. The main commit can be viewed here; below is the commit message for convenience:
Change SharedTensor::read() signature from fn read(&self, device: &DeviceType) -> Result<&MemoryType, ...>
into fn read<D: IDevice>(&self, device: &D) -> Result<&D::M, ...>
The new signature provides a type-level guarantee that if a Cuda device is passed
into read(), then it will return Cuda memory (and not Native or OpenCL memory).
The additional unwraps that were previously required (.as_native().unwrap()) are
no longer needed, and the code is clearer and more concise.
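For illustration, here is a minimal, self-contained sketch of the associated-type pattern behind the new signature; the names (FlatBox, CudaMemory, the String error) are placeholders, not the crate's actual definitions:

```rust
// Sketch only: placeholder types, not the real API.
trait IDevice {
    /// Concrete memory type handled by this device.
    type M;
}

struct Native;
struct FlatBox;                 // stand-in for native host memory
impl IDevice for Native { type M = FlatBox; }

struct Cuda;
struct CudaMemory;              // stand-in for CUDA device memory
impl IDevice for Cuda { type M = CudaMemory; }

struct SharedTensor;

impl SharedTensor {
    // The return type is fixed by the device's associated type, so passing
    // a Cuda device can only ever yield &CudaMemory -- no .as_native()
    // style unwrapping is possible or necessary.
    fn read<D: IDevice>(&self, _device: &D) -> Result<&D::M, String> {
        Err("per-device copy lookup omitted in this sketch".into())
    }
}
```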
Internally, SharedTensor uses the Any type to store objects of different types
uniformly. Synchronization between memories is also done through a type-erased
interface. This makes it possible to define a new Framework in an external
crate, or to extract the Cuda and OpenCL frameworks into their own crates,
though the error types would require some additional work.
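A self-contained sketch of the Any-based storage idea follows (placeholder names again; the real SharedTensor also handles synchronization between copies, which is omitted here):

```rust
use std::any::Any;

// Per-device memory copies are held as Box<dyn Any> and downcast back to
// their concrete type on access. Placeholder types, not the real internals.
struct NativeMemory { data: Vec<u8> }

struct SharedTensor {
    copies: Vec<Box<dyn Any>>,  // one entry per device copy
}

impl SharedTensor {
    fn read_native(&self) -> Option<&NativeMemory> {
        // downcast_ref() returns None on a type mismatch; the typed
        // read<D: IDevice>() wrapper rules such a mismatch out statically.
        self.copies.get(0)?.downcast_ref::<NativeMemory>()
    }
}

fn main() {
    let t = SharedTensor {
        copies: vec![Box::new(NativeMemory { data: vec![0u8; 16] })],
    };
    assert_eq!(t.read_native().unwrap().data.len(), 16);
}
```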
Use of "dynamic typing" has drawbacks -- mainly slightly larger runtime
overhead. Before this patch benchmarks showed that SharedTensor::read() takes
19-22ns, now it takes 23-26ns. For comparison, minimal synchronized CUDA
operation will take about 10-40us. Small NN layers on CPU are much faster,
e.g. 10-input softmax layer takes about 500ns. Still, in typical NNs overhead
looks negligible, and I think it's fair tradeoff for code clarity and better
decoupling.
Here are actual benches, before:
test bench_shared_tensor_access_time_first ... bench: 19 ns/iter (+/- 2)
test bench_shared_tensor_access_time_second ... bench: 21 ns/iter (+/- 0)
after:
test bench_shared_tensor_access_time_first ... bench: 23 ns/iter (+/- 0)
test bench_shared_tensor_access_time_second ... bench: 26 ns/iter (+/- 3)
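For context, these numbers come from the nightly bench harness; a hedged sketch of what such a benchmark looks like (the tensor setup is only indicated in comments, and the real benchmark code lives in the repository):

```rust
#![feature(test)]
extern crate test;
use test::Bencher;

#[bench]
fn bench_shared_tensor_access_time_first(b: &mut Bencher) {
    // Construct a SharedTensor already synchronized to the device under
    // test (setup omitted here).
    b.iter(|| {
        // black_box(tensor.read(&device).unwrap())
    });
}
```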
What's your opinion on it?