This repository has been archived by the owner on Jul 7, 2021. It is now read-only.

Fix up chunked uploading #36

Merged — 4 commits, Nov 2, 2020

Conversation

@xloem (Contributor) commented Oct 2, 2020

Somewhere along the way it looks like the chunked uploading feature broke.

This change fixes it up again.

Chunked uploading is done by passing an iterator as the body data; see https://requests.readthedocs.io/en/master/user/advanced/#chunk-encoded-requests

I believe you can also pass a file-like object to stream data with a known size, which is relevant to #30.
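To make the mechanism concrete, here is a minimal sketch of the two requests patterns referred to above; the URL is a placeholder rather than a real portal endpoint:

```python
import requests

def chunk_generator():
    # Any generator/iterator passed as the body makes requests send the
    # request with Transfer-Encoding: chunked.
    yield b"first chunk"
    yield b"second chunk"

requests.post("https://example.com/skynet/skyfile", data=chunk_generator())

# A file-like object passed as the body is streamed too, but with a known
# Content-Length taken from the file size.
with open("large.bin", "rb") as f:
    requests.post("https://example.com/skynet/skyfile", data=f)
```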

@mrcnski (Contributor) left a comment

Thanks a lot for the contribution. A couple of comments:

  1. Can you please create a quick test for this function? See the existing upload tests for examples.
  2. The lint CI is failing.

@mrcnski (Contributor) commented Oct 2, 2020

Hmm, should we keep the path parameter? If custom_filename is not passed, then the filename is empty. Requiring a filename would be consistent with the Go SDK, where we require one when uploading generic data.

@xloem (Contributor, Author) commented Oct 2, 2020

The existing tests upload a 5-byte file and verify that the upload matches the skylink for a 156 KB PDF ... this does not seem correct ... EDIT: learning about responses.activate.

@xloem (Contributor, Author) commented Oct 2, 2020

Whew! I confused myself with the tests.

> Hmm, should we keep the path parameter? If custom_filename is not passed, then the filename is empty. Requiring a filename would be consistent with the Go SDK, where we require one when uploading generic data.

I've made the interface change you requested, but I'll note that this changes the interface from how chunked uploading used to work here.

I like the way the Go library takes a map of filenames to content: do you think that would work here?

@mrcnski (Contributor) commented Oct 5, 2020

> I like the way the Go library takes a map of filenames to content: do you think that would work here?

That would be excellent! Ideally the SDKs are consistent with each other, especially the APIs, but I am short on time. So these kinds of fixes are really appreciated!

@xloem (Contributor, Author) commented Oct 6, 2020

Looking at this some more, I'm noticing that chunked uploading (which wonderfully supports live-streaming uploads) only works for single-file uploads with the requests library. Additionally, the Go SDK does not have a feature for chunked uploads. It seems like supporting generic upload, which the Go SDK has normalized, is a separate beast from supporting chunked upload. It might not be hard to upload an iterator as chunked if only one file is provided; see the sketch below.
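A rough sketch of what that dispatch could look like, assuming the method receives a dict mapping filenames to data (all names here are illustrative, not the final API):

```python
import requests

def upload(portal_url, upload_data, fieldname="file"):
    """Illustrative dispatch only: a single iterator is streamed as a
    chunked request body; everything else goes through multipart form data."""
    if len(upload_data) == 1:
        filename, data = next(iter(upload_data.items()))
        is_iterator = (hasattr(data, "__iter__")
                       and not isinstance(data, (bytes, str))
                       and not hasattr(data, "read"))
        if is_iterator:
            # requests turns an iterator body into a chunked upload.
            return requests.post(portal_url, data=data,
                                 params={"filename": filename})
    # Multiple files, bytes, or file-like objects: multipart upload.
    ftuples = [(fieldname, (filename, data))
               for filename, data in upload_data.items()]
    return requests.post(portal_url, files=ftuples)
```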

@xloem (Contributor, Author) commented Oct 6, 2020

@m-cat I've squashed, rebased, and simplified this down to the addition of a method comparable to the generic one used by the Go SDK. It takes a map of filenames to file-like objects or bytes data, and if there is only one file and its data is an iterator, it can do chunked uploading via the requests library's feature. Does this look good?
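For illustration, a hedged usage sketch of what calling such a method might look like; the method name `upload` on `SkynetClient` and the exact return value are assumptions, not necessarily the final interface:

```python
import io
import siaskynet

client = siaskynet.SkynetClient()

# Several files: a map of filenames to bytes or file-like objects,
# uploaded together as multipart form data.
skylink = client.upload({  # hypothetical method name
    "index.html": b"<html>hello</html>",
    "data.bin": io.BytesIO(b"\x00\x01\x02"),
})

# A single file whose data is an iterator: streamed as a chunked upload.
def frames():
    for i in range(3):
        yield ("frame %d\n" % i).encode()

skylink = client.upload({"stream.log": frames()})  # hypothetical method name
print(skylink)
```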

siaskynet/_upload.py (outdated; resolved)
@mrcnski (Contributor) commented Oct 20, 2020

Sorry for the delay in reviewing; we were super busy. Taking a look now.



def upload_file_request_with_chunks(self, path, custom_opts=None):
mrcnski (Contributor):

We can't delete any methods, because that would break backwards compatibility. I would keep this and just have it call the new method.

@xloem (Contributor, Author) commented Oct 23, 2020

I can do this, but I think there are compatibility issues elsewhere too; I originally made this PR while porting my code to the newer interfaces. (EDIT: notably, upload_file_request_with_chunks has never functioned in the new API, ever since the path argument, which is actually a data iterator, started being treated as a string and normalised.)

siaskynet/_upload.py (outdated; resolved)
not isinstance(data, bytes) and
not isinstance(data, str) and
not hasattr(data, 'read')):
# an iterator for chunked uploading
mrcnski (Contributor):

This looks good, but it should be documented so people know this functionality is there.

@xloem (Contributor, Author) commented Oct 23, 2020

Is the documentation I added to upload_file_request_with_chunks sufficient?

@mrcnski (Contributor) left a comment

Looks good apart from a few points. Thanks for taking the time to make this conform to the Go SDK. BTW, have you tested uploads to Skynet with this code?

@mrcnski (Contributor) commented Oct 20, 2020

Just ran into an issue with the Go SDK that I think applies here too. If we upload a directory with a single file, it will be treated as a file upload instead of a directory upload. This can be unexpected for the user, and a directory should be uploaded even if it's only a single file.

Edit: I'm fixing this in the Go SDK by checking if opts.CustomDirname is set -- if it is, we upload as a directory.
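For reference, the analogous guard on the Python side might look like the sketch below; the option key `custom_dirname` is an assumption borrowed from the Go SDK's CustomDirname, and the helper name is hypothetical:

```python
def should_upload_as_directory(upload_data, opts):
    # If the caller explicitly set a custom directory name, treat the
    # upload as a directory even when it contains only a single file.
    if opts.get("custom_dirname"):  # assumed option key
        return True
    # Otherwise, only multi-file uploads are treated as directories.
    return len(upload_data) > 1
```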

@xloem (Contributor, Author) commented Oct 23, 2020

@m-cat I believe I've added all the changes you requested.

I'm testing this locally, and I discovered that Python requests doesn't stream file objects when they are passed as multipart data (it reads the whole content of each file in one go). If you pass a single file as the body of a POST request, it does stream it, reading it in a configurable chunk size defaulting to 8 KB.

Would it be reasonable to change the behavior of python-skynet to pass all single files directly as body content? This would simplify the implementation and provide streaming for all single files. The tests would change to expect this.
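A small sketch of the two requests code paths described above (the URL is a placeholder, not the SDK's actual endpoint):

```python
import requests

url = "https://example.com/skynet/skyfile"

# Multipart upload via files=: requests builds the whole multipart body in
# memory, reading each file's full contents up front, so nothing streams.
with open("big.bin", "rb") as f:
    requests.post(url, files={"file": ("big.bin", f)})

# Single file passed directly as the body via data=: requests streams it,
# reading blocks of a configurable size (8192 bytes by default) and sending
# a Content-Length derived from the file size.
with open("big.bin", "rb") as f:
    requests.post(url, data=f)
```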

@xloem mentioned this pull request Oct 23, 2020

ftuples = []
for filename, data in upload_data.items():
    ftuples.append((fieldname,
                    (filename, data)))

if len(upload_data) == 1:
    if opts['portal_file_fieldname'] == fieldname:
mrcnski (Contributor):

This isn't very elegant. I would change this check.

xloem (Contributor, Author):

I've changed it. The reason the code is awkward here is that I wanted to keep it similar to the Go code to make future changes easier. Let me know if you have more preferences around it.

@mrcnski (Contributor) left a comment

Looks great! Thanks for your patience! Approving this on its 1-month anniversary 🙂

@mrcnski merged commit 354cbe3 into NebulousLabs:master on Nov 2, 2020
@mrcnski (Contributor) commented Nov 2, 2020

Published version 2.1.0 which includes these changes 👍
