Extending RSV to support Base64-encoded binary data out-of-the-box #1
Comments
To my knowledge, if you know that the resulting binary is 8-bit aligned, you can also skip the = character at the end of the Base64 string. So you would get |
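As a rough illustration of skipping the = padding, here is a minimal Python sketch. The helper names are made up for this example; it assumes the decoder knows the payload is a whole number of bytes, so the padding can be recomputed from the length of the Base64 text.

```python
import base64

def b64_encode_unpadded(data: bytes) -> bytes:
    """Base64-encode and drop the trailing '=' padding (hypothetical helper)."""
    return base64.b64encode(data).rstrip(b"=")

def b64_decode_unpadded(text: bytes) -> bytes:
    """Restore the padding implied by the text length, then decode."""
    padding = b"=" * (-len(text) % 4)
    return base64.b64decode(text + padding)

original = b"\x49\x96\x02\xd2"           # four arbitrary bytes
encoded = b64_encode_unpadded(original)  # 6 characters instead of 8
decoded = b64_decode_unpadded(encoded)
```

The padding only rounds the text up to a multiple of four characters, so dropping it loses no information as long as the end of the value is known from elsewhere.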
Base64 encoding is a 6-bit encoding scheme, but since only The only thing would be that you can't cleanly view the characters, which also hinders the ability to copy. I don't know if that's an important consideration though. |
I would like to see the RSV spec kept as minimal as possible, so that it might someday replace CSV, which itself is very simple (and I think that's one of its strongest attributes). I like that RSV has a formal spec. It may not be complete and may need improvement, but it sure beats the raft of different conventions that plague CSV. I also believe that RSV not being plain text, and therefore not being editable by humans in a dumb text editor, will remove validation errors that people can introduce. @CC007, I'd like to see something very close to RSV as it's currently defined be formalized. Once we have a very simple and strong base, other formats/encodings, like the one you've proposed, can be built on top of it. |
The author also addressed this concern:
|
Idea
From a comment on your YouTube video by Rik Schaaf (me) (https://www.youtube.com/watch?v=tb_70o6ohMA&lc=Ugzsfj_OUAK4s_IYaNZ4AaABAg):
Example
So:
Would translate to:
So in essence, without a prefix you get UTF-8 encoded data, and with the \FB prefix you get Base64 encoded data (ASCII and UTF-8 compatible, to my knowledge).
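A minimal Python sketch of this rule for a single value could look like the following. The function names and the BASE64_PREFIX constant are hypothetical, and the row and value terminators of RSV are left out on purpose; only the proposed \FB prefix is modelled.

```python
import base64
from typing import Union

BASE64_PREFIX = b"\xfb"  # byte proposed in this issue, not part of the current RSV spec

def encode_value(value: Union[str, bytes]) -> bytes:
    """Strings are stored as UTF-8; raw bytes are stored as \\FB + Base64."""
    if isinstance(value, bytes):
        return BASE64_PREFIX + base64.b64encode(value)
    return value.encode("utf-8")

def decode_value(raw: bytes) -> Union[str, bytes]:
    """Undo encode_value by checking for the \\FB prefix."""
    if raw.startswith(BASE64_PREFIX):
        return base64.b64decode(raw[len(BASE64_PREFIX):])
    return raw.decode("utf-8")

text_value = encode_value("Hello")            # plain UTF-8 bytes, no prefix
binary_value = encode_value(b"\xff\xfe\xfd")  # \FB followed by Base64 text
```

This works without ambiguity because the byte 0xFB never occurs in valid UTF-8, so a string value can never be mistaken for a prefixed one, and the Base64 text after the prefix is plain ASCII.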
What is this addition trying to do
The advantage of this encoding addition is that non-Unicode data could also be represented without risk of collisions, including the RSV special characters themselves.
Another advantage is that some data types can be stored more efficiently, like numbers and dates.
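To get a feel for the sizes involved, here is a small Python comparison. It assumes fixed-width little-endian packing via the struct module, which is only one possible choice and is not prescribed by this proposal.

```python
import base64
import struct

PREFIX = b"\xfb"  # proposed Base64 prefix byte

number = 1234567890
number_as_text = str(number).encode("utf-8")                        # 10 bytes
number_as_b64 = PREFIX + base64.b64encode(struct.pack("<i", number))  # 1 + 8 = 9 bytes

pi = 3.141592653589793
pi_as_text = repr(pi).encode("utf-8")                        # 17 bytes
pi_as_b64 = PREFIX + base64.b64encode(struct.pack("<d", pi))  # 1 + 12 = 13 bytes
```

The saving grows with the number of digits. Short values such as "7" are still smaller as plain text, so the gain depends on the data.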
What is this addition NOT trying to do (but what could be added in a separate issue)
This is not a change to add the data types themselves to RSV. This additional special character only signifies the encoding, not the datatype, so you wouldn't know if the data represents an integer, timestamp, float, etc., just like you wouldn't know this with the current implementation. This is still left to the program that is using the RSV file.
If the data type would have to be derived from this binary data, the base64 value could be prefixed (after the \FB) by a string surrounded by non-base64 characters, to signify the data type, like (i32) for 32-bit integers. Example:
...which would represent a single integer (int32) value that equals 1234567890.
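A sketch of how such a typed value might be written and read in Python follows. The byte order (little-endian here), the helper names, and the exact tag syntax are assumptions made for illustration; none of them are fixed by this issue.

```python
import base64
import struct

BASE64_PREFIX = b"\xfb"  # proposed Base64 prefix byte

def encode_i32(number: int) -> bytes:
    """Build \\FB + "(i32)" + Base64 of the 4-byte integer (little-endian assumed)."""
    payload = base64.b64encode(struct.pack("<i", number))
    return BASE64_PREFIX + b"(i32)" + payload

def decode_typed(raw: bytes) -> int:
    """Parse the type tag between the parentheses and decode the payload."""
    if not raw.startswith(BASE64_PREFIX + b"("):
        raise ValueError("not a typed Base64 value")
    close = raw.index(b")")
    type_tag = raw[2:close].decode("ascii")
    data = base64.b64decode(raw[close + 1:])
    if type_tag == "i32":
        return struct.unpack("<i", data)[0]
    raise ValueError(f"unknown type tag: {type_tag}")

typed_value = encode_i32(1234567890)
round_tripped = decode_typed(typed_value)
```

Since "(" and ")" are not in the Base64 alphabet, the tag can always be separated from the payload without escaping.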
Or you could use a simpler but more restrictive typing system that uses a single non-base64 character to define the type, followed by a single character for the size.
...where # defines an integer and 4 defines a size of 4 bytes (32 bit):
1234567890
...where ~ defines a floating point value and 4 defines a size of 4 bytes (32 bit):
3.141592...
This is out of scope for this issue though.
Considerations
With this addition, the name isn't really accurate anymore, so would this be RBSV (Rows of Binary or String Values)?