First pass on JOSS paper #72
base: master
Codecov Report
@@           Coverage Diff           @@
##           master      #72   +/-   ##
=======================================
  Coverage   94.90%   94.90%
=======================================
  Files           6        6
  Lines         157      157
=======================================
  Hits          149      149
  Misses          8        8
Getting there
Somewhere toward the introduction we should say that best practice would be to use the standard formats all the time, but that this is not practical, particularly for data intended only for short-term use.
But data originally intended for short-term use often turns out, much later, to be something one wants to open again.
Because these non-standardized formats depend on the exact state of the environment, which determines the object layout, one might not be able to open them.
JLSO makes sure you can always recover.
JOSS/paper.md
Outdated
As scientific computing software grows increasingly complex, the need to efficiently and reliably store sophisticated program objects has become a growing need. The expanding list of file formats for serializing objects is evidence of this problem. Unfortunately, these file formats typically come with usability versus reliability tradeoffs.

Choosing serialization formats such as CSV, JSON, BSON, and HDF5 are prudent choices for long term storage because they are application and language agnostic. Projects that evolve beyond an initial language or library decision can still load old experimental data.
These formats often support a limited set of types, requiring applications to define and maintain serialization methods for their custom objects.
- These formats often support a limited set of types, requiring applications to define and maintain serialization methods for their custom objects.
+ These formats only ever support a limited set of types in the standard.
+ This requires applications to define and maintain their own serialization methods for objects outside that standard set.
JOSS/paper.md
Outdated
Choosing serialization formats such as CSV, JSON, BSON, and HDF5 are prudent choices for long term storage because they are application and language agnostic. Projects that evolve beyond an initial language or library decision can still load old experimental data.
These formats often support a limited set of types, requiring applications to define and maintain serialization methods for their custom objects.

Using formats such as JLD (Julia Data), Python pickles or MAT files work best for convenient saving of arbitrary objects. Unfortunately, these formats are often highly coupled to the software dependencies (e.g., language version, software packages), which makes restoring the data more challenging as time passes and the software evolves.
Probably we should mention specifically in this section that JLD and MAT are extensions of HDF5, and that while this gives them some of the advantages of HDF5, in other cases they fall back to being no better than pickle or the Julia serializer.
We should mention fully bespoke things like pickle and the Julia serializer first.
We should make it clear that JLSO is firmly inside the second category and is vulnerable to all the same problems, but that it seeks to ameliorate them by keeping enough metadata around to recover.
Hmmm, should we maybe have 3 categories then?
- General/agnostic file formats (e.g., CSV, BSON, HDF5)
- Language-specific serializers like pickles or the Julia serializer
- File formats that merge the two. Arguably JLSO fits into this category, but in a different way.
Didn't MAT files only extend the HDF spec in 7.3 though?
Yes, but MatLab 7.3 came out 14 years ago now.
I would say 1 and 2 exist, and that .JLD, .MAT, and BSON.jl are special cases of 2 which, for the right inputs, may be partially or totally readable with tooling for 1.
JLSO is a related but different case to that.
> MatLab 7.3 came out 14 years ago now.

Lol, good point.

> JLSO is a related but different case to that.

Okay, I've opted to remove the reference to JLD and MAT for now. I think this simplifies the discussion, but I can add it back if you think it's worth distinguishing JLSO from those specific formats.
JOSS/paper.md
Outdated
Normal{Float64}(μ=50.2, σ=4.3)
```

While the metadata in JLSO files caters towards Julia users, the format itself is mostly language agnostic. A variety of internal object serialization formats can be used, and the metadata itself is saved as a BSON documented. The Julia language provides first-class support for package environments via Project.toml and Manifest.toml files, which provided an intuitive platform for building our prototype.
I would restructure this paragraph, possibly into two.
Somewhere in here should be the implementation details, e.g.
"JLSO achieves this by working with a language-dependent serializer (the Julia Serialization stdlib, or the extended BSON.jl format) and pairing that with metadata stored in a language-independent BSON top level ..."
We should also say that the concept of this format is not Julia specific, but JLSO.jl is.
We also want to avoid being as unclear as "the format itself is mostly language agnostic".
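To make the "concept is not Julia specific" point concrete, here is a minimal Python sketch of the same idea: a language-dependent serializer (here `pickle`) wrapped in a language-independent top level that records environment metadata. This is only an analogy under stated assumptions. It uses JSON where JLSO uses BSON, and the function names `save`, `load_metadata`, and `load` and the metadata keys are hypothetical; it does not reproduce JLSO's actual file layout.

```python
import json
import pickle
import platform


def save(path, **objects):
    """Write a JSON document holding environment metadata plus pickled objects."""
    doc = {
        # Language-independent metadata: readable without unpickling anything.
        "metadata": {
            "format": "pickle",
            "python": platform.python_version(),
            "modules": sorted({type(o).__module__ for o in objects.values()}),
        },
        # Language-dependent payloads, hex-encoded so they fit in JSON.
        "objects": {name: pickle.dumps(obj).hex() for name, obj in objects.items()},
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(doc, fh)


def load_metadata(path):
    """Read only the metadata, e.g. to rebuild a matching environment first."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["metadata"]


def load(path):
    """Deserialize every stored object; requires a compatible environment."""
    with open(path, encoding="utf-8") as fh:
        doc = json.load(fh)
    return {name: pickle.loads(bytes.fromhex(blob)) for name, blob in doc["objects"].items()}
```

The design point is that `load_metadata` works with any JSON reader even when `load` fails, so the information needed to recreate a working environment is never locked inside the language-specific payload.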
To simplify things, I've condensed the discussion about other file formats into the first paragraph and extended the format specification part.