First pass on JOSS paper #72

Open: wants to merge 2 commits into base: master


rofinn
Member

@rofinn rofinn commented Aug 26, 2020

No description provided.

@codecov

codecov bot commented Aug 26, 2020

Codecov Report

Merging #72 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master      #72   +/-   ##
=======================================
  Coverage   94.90%   94.90%           
=======================================
  Files           6        6           
  Lines         157      157           
=======================================
  Hits          149      149           
  Misses          8        8           

Last update 03c30e5...98b6f0c.

Member

@oxinabox oxinabox left a comment


Getting there

Somewhere toward the introduction we should say that best practice would be to use the standard formats all the time,
but that this is not practical, particularly for data that was intended only for short-term use.
Often, though, what was originally intended as short-term data turns out, much later, to be something one wants to open again.
Because these non-standardized formats depend on the exact state of the environment, which determines the object layout, one might not be able to open them.
JLSO makes sure you can always recover.
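
The failure mode described here can be reproduced with Python's `pickle`, used only as a stand-in for any environment-coupled serializer (this is not JLSO code). In this minimal sketch, `demo_models` and `Point` are hypothetical names for a third-party package and one of its types; deleting the class simulates a later release that removed or renamed it.

```python
import pickle
import sys
import types

# Hypothetical module standing in for a third-party package.
models = types.ModuleType("demo_models")
sys.modules["demo_models"] = models
exec(
    "class Point:\n"
    "    def __init__(self, x, y):\n"
    "        self.x, self.y = x, y\n",
    models.__dict__,
)

# "Short-term" data saved while the original package version is installed.
blob = pickle.dumps(models.Point(1, 2))

# Simulate a later environment in which the class no longer exists.
del models.Point

try:
    pickle.loads(blob)
    outcome = "loaded"
except AttributeError as err:
    # The bytes are intact, but they cannot be interpreted without the class.
    outcome = f"failed: {err}"

print(outcome)
```

The stored bytes never changed; only the environment did, which is why recording the environment alongside the data matters.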

JOSS/paper.bib Outdated (resolved)
JOSS/paper.bib Outdated (resolved)
JOSS/paper.md (resolved)
JOSS/paper.md Outdated (resolved)
JOSS/paper.md Outdated (resolved)
JOSS/paper.md Outdated (resolved)
JOSS/paper.md Outdated
As scientific computing software grows increasingly complex, the need to efficiently and reliably store sophisticated program objects has become a growing need. The expanding list of file formats for serializing objects is evidence of this problem. Unfortunately, these file formats typically come with usability versus reliability tradeoffs.

Choosing serialization formats such as CSV, JSON, BSON, and HDF5 are prudent choices for long term storage because they are application and language agnostic. Projects that evolve beyond an initial language or library decision can still load old experimental data.
These formats often support a limited set of types, requiring applications to define and maintain serialization methods for their custom objects.
Member

Suggested change
- These formats often support a limited set of types, requiring applications to define and maintain serialization methods for their custom objects.
+ These formats only ever support a limited set of types in the standard.
+ This requires applications to define and maintain their own serialization methods for objects outside that standard set.
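
To make the point about types outside the standard set concrete, here is a minimal Python sketch using JSON. The `Normal` class mirrors the distribution shown in the paper's example, but the class itself, the `__type__` tag, and the `encode`/`decode` helpers are illustrative assumptions, not part of any library.

```python
import json
from dataclasses import dataclass


@dataclass
class Normal:
    """Hypothetical custom type that JSON knows nothing about."""
    mu: float
    sigma: float


def encode(obj):
    # Called by json.dumps only for objects it cannot serialize natively.
    if isinstance(obj, Normal):
        return {"__type__": "Normal", "mu": obj.mu, "sigma": obj.sigma}
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def decode(d):
    # Called by json.loads for every decoded JSON object (dict).
    if d.get("__type__") == "Normal":
        return Normal(d["mu"], d["sigma"])
    return d


text = json.dumps({"prior": Normal(50.2, 4.3)}, default=encode)
restored = json.loads(text, object_hook=decode)
```

Every additional custom type needs its own branch in both helpers, which is the maintenance burden the suggested wording describes.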

JOSS/paper.md Outdated
Choosing serialization formats such as CSV, JSON, BSON, and HDF5 are prudent choices for long term storage because they are application and language agnostic. Projects that evolve beyond an initial language or library decision can still load old experimental data.
These formats often support a limited set of types, requiring applications to define and maintain serialization methods for their custom objects.

Using formats such as JLD (Julia Data), Python pickles or MAT files work best for convenient saving of arbitrary objects. Unfortunately, these formats are often highly coupled to the software dependencies (e.g., language version, software packages), which makes restoring the data more challenging as time passes and the software evolves.
Member

Probably we should mention specifically in this section that JLD and MAT are extensions of HDF5, and that while this gives them some of the advantages of HDF5, in other cases they fall back to being no better than pickle or the Julia serializer.
We should mention fully bespoke things like pickle and the Julia serializer first.

Member

We should make it clear that JLSO is firmly inside the second category and is vulnerable to all the same problems,
but that it seeks to ameliorate them by keeping enough metadata around to recover.

Member Author

Hmmm, should we maybe have 3 categories then?

  1. General/agnostic file formats (e.g., CSV, BSON, HDF5)
  2. Language-specific serializers like pickles or the Julia serializer
  3. File formats that merge the two. Arguably JLSO fits into this category, but in a different way.

Didn't MAT files only extend the HDF spec in 7.3 though?

Member

Yes, but MatLab 7.3 came out 14 years ago now.

Member

I would say 1 and 2 exist,

and that .JLD, .MAT, and BSON.jl are special cases of 2
that, for the right inputs, may be partially or totally readable with tooling for 1,

and JLSO is a related but different case to that.

Member Author

> MatLab 7.3 came out 14 years ago now.

Lol, good point.

> JLSO is a related but different case to that.

Okay, I've opted to remove the reference to JLD and MAT for now. I think this simplifies the discussion, but I can add it back if you think it's worth distinguishing JLSO from those specific formats.

JOSS/paper.md Outdated (resolved)
JOSS/paper.md Outdated
Normal{Float64}(μ=50.2, σ=4.3)
```

While the metadata in JLSO files caters towards Julia users, the format itself is mostly language agnostic. A variety of internal object serialization formats can be used, and the metadata itself is saved as a BSON documented. The Julia language provides first-class support for package environments via Project.toml and Manifest.toml files, which provided an intuitive platform for building our prototype.
Member

I would restructure this paragraph,
possibly into two?

Somewhere in here should be implementation stuff:
"JLSO achieves this by working with a language-dependent serializer (the Julia Serialization stdlib, or the extended BSON.jl format), and pairing that with metadata stored in a language-independent BSON top level ...",
and that the concept of this format is not Julia-specific, but JLSO.jl is.

We also want to avoid being as unclear as "the format itself is mostly language agnostic".
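
A rough Python analogy of that pairing may help pin down the wording. It is a sketch of the concept only, not JLSO's implementation or file layout: JSON stands in for the language-independent top level, `pickle` for the language-dependent serializer, and every name here (`save`, `load`, `read_metadata`, the metadata keys) is hypothetical.

```python
import base64
import json
import os
import pickle
import platform
import tempfile
from datetime import date


def save(path, **objects):
    """Write pickled objects plus plain-JSON metadata describing the environment."""
    doc = {
        # Readable by any JSON parser, even when the payloads are not.
        "metadata": {
            "format_version": 1,  # hypothetical version tag
            "python": platform.python_version(),
            "pickle_protocol": pickle.DEFAULT_PROTOCOL,
        },
        # Language-dependent payloads, base64-encoded so they fit in JSON.
        "objects": {
            name: base64.b64encode(
                pickle.dumps(obj, protocol=pickle.DEFAULT_PROTOCOL)
            ).decode("ascii")
            for name, obj in objects.items()
        },
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(doc, fh)


def read_metadata(path):
    """Inspect the environment record without unpickling anything."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["metadata"]


def load(path):
    """Unpickle every stored object; only use on trusted files."""
    with open(path, encoding="utf-8") as fh:
        doc = json.load(fh)
    return {
        name: pickle.loads(base64.b64decode(blob))
        for name, blob in doc["objects"].items()
    }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.json")
    save(path, run_date=date(2020, 8, 26), labels={"a", "b"})
    meta = read_metadata(path)
    restored = load(path)
```

The design point is that `read_metadata` keeps working in any environment, so a reader who cannot deserialize the payloads can still learn which environment to rebuild.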

@rofinn
Member Author

rofinn commented Aug 26, 2020

To simplify things, I've condensed the discussion about other file formats into the first paragraph and extended the format specification part.
