Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make dependencies optional #18

Open
wants to merge 52 commits into
base: master
Choose a base branch
from

Conversation

har7an
Copy link

@har7an har7an commented Mar 5, 2024

Hello,

thanks for creating this amazing package! I'll use it in a new project of mine to work with JSON and, being a Rust programmer, I was missing a replacement for serde until I found this. Amazing work!

I tend to be very picky about the dependencies I pull into a project and try to keep them minimal. Since I only need JSON personally, I was wondering if the rest could be left out of the package.

This is my first attempt at writing an extension in Julia. What I did is make JSON a weak dependency and create a new module in the ext/ subfolder for this purpose. Thanks to how the package is written, I pulled out the JSON-specific SerJson.jl, ParJson.jl and DeJson.jl into that new extension as Ser.jl, Par.jl and De.jl respectively and had to change barely a thing in the process.

The Serde package now has a new file JSON.jl that checks at runtime whether the JSON extension is present and exposes the functions from there to the user. Users that don't import the JSON package will not be able to serialize/deserialize objects, but neither will JSON be pulled in unconditionally by Serde. The code isn't pretty yet, I was mostly trying to see if it works and wanted to discuss the idea with you before investing more time.

If you're interested, I'd be happy to hear your feedback and convert the other "languages" into extensions as well!

Pull request checklist

  • Did you bump the project version?
  • Did you add a description to the Pull Request?
  • Did you add new tests?
  • Did you add reviewers?

@har7an har7an requested a review from gryumov as a code owner March 5, 2024 09:25
@har7an
Copy link
Author

har7an commented Mar 5, 2024

Oh and here's a REPL session to prove it basically works:

julia> import Pkg;
julia> Pkg.activate(;temp=true);
julia> Pkg.develop(;path=".");
julia> Pkg.add("JSON");
julia> import Serde, JSON;
julia> x = Serde.to_json(5)
"5"

julia> @assert 5 == Serde.deser_json(Number, x)

@dmitrii-doronin
Copy link
Contributor

Hello, @har7an!

We are big fans of Rust here, too.

Thank you for your contribution. I really like the idea of giving options for providing your own parser. Having JSON as a weak dependency is definitely a boon, too. We should include something like that in a future release.

I have only one concern though. Wouldn't it mean that we would have to keep track of compatibility with every parser option we add to ext? Maybe an overloaded function for a parser package would be a better course of action in this case?

@har7an
Copy link
Author

har7an commented Mar 5, 2024

Hi @dmitrii-doronin,

I have only one concern though. Wouldn't it mean that we would have to keep track of compatibility with every parser option we add to ext?

I'm not sure I understand what you're referring to. Serde has to keep track of the version of the parsers it relies on at the moment in any case. At the moment that's managed with the compat table in Project.toml and I see you're not checking in a Manifest.toml either.

With my changes in place, the compat table in Project.toml still exists and is considered by Pkg when performing package resolution. The only difference w.r.t. dependency management that I see is that the dependencies declared as weakdeps aren't installed unconditionally. Everything else should basically stay the same as before.

But I'm not a seasoned Julia developer, so I may have a wrong picture there. That's how I understand the situation.

Maybe an overloaded function for a parser package would be a better course of action in this case?

How exactly would that change the "keeping track of parser compatibility" bit?

@har7an
Copy link
Author

har7an commented Mar 5, 2024

Ah, and I forgot:

I really like the idea of giving options for providing your own parser.

I'm not saying we let users provide their own parsers, that is not this PRs scope (Although I guess that technically it might be possible). I'm merely allowing users to pull in the dependencies they want, but only from the set we support. So if you want to use Serde with Json, you must pull in the JSON package we depend on in the Project.toml (with the same UUID and everything). If a user wants to use Serde with some custom MyCoolJSON package, they'll still have to patch Serde to allow that, same as before.

@dmitrii-doronin
Copy link
Contributor

Ah, I think i get you now. So in the grand scheme of things you want to make CSV, YAML, etc. optional for a user to install?

@har7an
Copy link
Author

har7an commented Mar 5, 2024

Ah, I think i get you now. So in the grand scheme of things you want to make CSV, YAML, etc. optional for a user to install?

Exactly, IMO users should "pay" only for what they really want to have in their application.

@dmitrii-doronin
Copy link
Contributor

That's a really good idea. I'd have to look deeper into Julia extensions, since, to be fair, I have never used this feature before. But we definitely want to merge this at some point.

@gryumov cc

@gryumov
Copy link
Member

gryumov commented Mar 7, 2024

Hello @har7an! Thank you for the suggestion – I really like this idea. @dmitrii-doronin, let's assess the user-friendliness of this feature and aim to propose a solution by next week.

@dmitrii-doronin
Copy link
Contributor

propose a solution by next week.

Great, I'll look into it. @har7an does this sit well with you?

@har7an
Copy link
Author

har7an commented Mar 7, 2024

@dmitrii-doronin I think that'll work, or else I'll let you know. I should have a solution by tuesday, but probably earlier.

I'll leave the question of whether you like my "design decisions" for later then and make a usable "prototype" first. Changing how we integrate the extensions into the main thing is comparatively little effort.

@har7an
Copy link
Author

har7an commented Mar 11, 2024

Okay I added some more changes now, mostly to restore the tests back to working order. All the tests from the "JSON" group seem to pass now, but I get failures for test case 30 from deser.jl where it checks for nothing and missing. As far as I can tell it doesn't construct the expected error type (WrongType), but raises an error somewhere else instead.

My knowledge about Julia macros and the codebase isn't really big enough to determine the cause yet. Pretty much the only change I can think of that may have caused this is in Utl/Macros.jl, where I had to patch a JSON-specific macro. I left the macro there, although one may consider moving it into the JSON extension. If we do that, however, I'm not sure how we could keep the documentation intact since afaik Documenter cannot document items from foreign packages and I don't know how extensions are handled in that regard.

Next I'll work on making all the other "markup languages" extensions as well.

@har7an
Copy link
Author

har7an commented Mar 11, 2024

Moved CSV into an extension now as well.

It just occured to me: I think it would be a good idea to define the prototypes for all the to_* and parse_* functions inside the Serde package. That way we could point to them inside the Serde documentation and I assume that LSPs would pick them up as well. We could make the default impl raise a descriptive error in case the extension is missing and otherwise, transparently forward the arguments to the respective functions inside the extensions. What do you think? And should I keep going to implement the other extensions as well right now, or do JSON and CSV suffice for the user-friendliness evaluation you had in mind?

@dmitrii-doronin
Copy link
Contributor

As far as I can tell it doesn't construct the expected error type (WrongType), but raises an error somewhere else instead.

I've seen this issue, too. It's version dependent, unfortunately. Conversion to an output type of functions seems to throw ErrorException in 1.10 instead of MethodError. Check the tests with 1.9.4.

@dmitrii-doronin
Copy link
Contributor

dmitrii-doronin commented Mar 11, 2024

It just occured to me: I think it would be a good idea to define the prototypes for all the to_* and parse_* functions inside the Serde package. That way we could point to them inside the Serde documentation and I assume that LSPs would pick them up as well.

I like the idea. We could move all the interface functions into a singular module and then have extensions expand on them. it has recently occurred to me that we're missing 'ignore_field' for queries. Extending a singular interface to modules should alleviate this.

What do you think? And should I keep going to implement the other extensions as well right now, or do JSON and CSV suffice for the user-friendliness evaluation you had in mind?

The best course I think would be to go in small incremental steps. We can workout this PR and use it for some time to see, whether it's convenient or not. Does it mean that usage of JSON extension would force user to explicitly import JSON in every use-case?

@dmitrii-doronin
Copy link
Contributor

Btw, you can ping me in Julia official slack.

@har7an
Copy link
Author

har7an commented Mar 12, 2024

@dmitrii-doronin

Check the tests with 1.9.4.

Jup, that works, thanks. Should I investigate that as part of this PR, or do we
accept the situation for now?

We could move all the interface functions into a singular module and then
have extensions expand on them.

I'm not sure we mean the same thing (yet). You can have a look at
src/{CSV,JSON,YAML,...}.jl to see what I have settled on for the moment.
Essentially I'm collecting all args and kwargs (as varargs) and passing them
through to the extension explicitly. I tried defining the raw interface (as
function foo end) and then just extending that in the extensions, but this
would throw errors about unknown functions when calling them. Maybe there was
some "glue" missing, I don't know. My current approach of using the Ext
module I introduced has the benefit that it allows us to throw a descriptive
error message when the extension isn't available, giving the user instructions
how to fix that. It's not ideal yet, but we can smooth that out further, of
course.

Does it mean that usage of JSON extension would force user to explicitly
import JSON in every use-case?

That is correct. Afaik that's also how e.g. savefig in Plots works. At
least the last time I checked it required me to import FreeType, FileIO, but
apart from that I don't have to do anything with these packages.

For the tests, I "solve" this by just importing all the packages at the top of
runtests.jl.

Btw, you can ping me in Julia official slack.

Thanks for the offer, but I don't have Slack and don't feel like changing that.
For the development here I'd prefer communication in this PR so "external"
users can follow what was done and why. :)

I have now moved all the bits that need dependencies to extensions (JSON, YAML,
TOML, XML, CSV). Query doesn't really qualify as an extension since it doesn't
have an external dependency, so maybe we can compact the Query bits into a
single file, wdyt? On the same topic: The src/{De,Ser,Par} folders are now
pretty empty. We could compact/flatten their contents into files perhaps? I'll
leave that decision up to you.

The package in its current state, when imported, pulls in a total of 4
dependencies:

  • Serde
  • Dates
  • Printf
  • Unicode

Afaik all of these are part of the Stdlib, so they don't require external code
at all. All the rest is loaded "on-demand", when the dependencies for the
extensions are imported.

@har7an har7an changed the title WIP: Make dependencies optional Make dependencies optional Mar 12, 2024
@henrik-wolf
Copy link

Hey there! I was playing around with this package a bit and came here to see if there is already some work going into making the parsers optional. Glad to see this already tackled!

From my (little) experience / what I read and heard about package extensions I would like to offer a bit of information about how they are supposed to work. I believe this could be helpful to remove the fairly complicated (and possibly hacky...) infrastructure about exposing the extension functions from the main package.

package extensions are supposed to extend on already existing functions rather than exporting new symbols. The docs have a nice example for this: https://pkgdocs.julialang.org/v1/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)

The general workflow for the extension should thus be that all functions you want to extend are defined in the main package, either as simple as

module Serde
function to_json end
function from_json end
...
end

or with a default method which throws an informative error:

module Serde
to_json(args...; kwargs...) = error("please import the JSON module...")
from_json(args...; kwargs...) = error("please import the JSON module...")
...
end

and then in the extension you add more specific methods to the respective functions:

module SerdeJSONExt

using Serde  # get access to barebones functions
using JSON

function Serde.to_json(...)
# do things
end

function Serde.from_json(...)
# do things
end

When you want add methods to a function which is owned by another package, you either have to explicity import it as
import Serde.from_joson or put the owning package in front of the definition. Otherwise, you just define a new function in this module which happens to have the same name... (I ran into this multiple times in the past...)

@har7an
Copy link
Author

har7an commented Mar 26, 2024

Hi @SuperGrobi,

thanks for sharing your insights! I already noticed that the current approach doesn't really work, unfortunately. It fails for example when trying to override the serialization behavior for a custom type. So I'll try to rebase the code and then pull out more potentially interesting functions into the base package, so we only override existing things in the extension. I'll keep you posted here.

@har7an har7an force-pushed the feature/optional-dependencies branch from 87d52a8 to 123ee10 Compare March 26, 2024 16:41
@henrik-wolf
Copy link

henrik-wolf commented Mar 26, 2024

Ah. I think I see the problem now. The relevant functions are currently under Serde.SerJSON.ser_value, so you would have to define the methods in Serde.SerJSON and then overwrite them in the extension?

I just had another idea, which might or might not work with the whole package architecture... in theory it should be possible to dispatch on the modules used for serialisation:

ser_value(m::Module, ::Type{T}, ::Val{x}, v::V) where {T,x,V}) = ser_value(Val(m), T, x, v)

and then have users extend for JSON

ser_value(::Val{JSON}, ::Type{MyType}, Val{MyFieldName}, v::MyFieldType) = ...

(just an idea. this breaks if there is no external dependecy for serialisation.)

@har7an
Copy link
Author

har7an commented Mar 26, 2024

@SuperGrobi I've had the same thought already, but I'm indecisive which is better. My thought was that, since we have the "module" names included in all the public functions already (i.e. to_json, parse_json, ...), we may as well stick with how it works right now. This way it will be less surprising for users upgrading to this version (although this change probably justifies a major version increase in any case).

If I were to write the code from scratch I'd probably go for your solution, though.

@henrik-wolf
Copy link

Fair. That would come pretty close to a full rewrite.

@har7an
Copy link
Author

har7an commented Mar 26, 2024

JSON and YAML should be back in a working state now, the rest I'll take care of another day. I've moved the SyntaxError-variants into the public API of the Serde base package now, because I consider them to be part of the public API. This has the direct consequence, that the exception field of YamlSyntaxError is now of type Any, since YAML.ParseError would require us to pull in the YAML dependency (which is exactly what I'm trying not to do) and isn't a subtype of Exception either, unfortunately. I hope that's okay!

If you disagree with my decision please let me know, but then we'll have to introduce a different, opaque error mechanism.

@har7an
Copy link
Author

har7an commented Apr 2, 2024

So I've just tried to integrate the current state of the package with an application of mine, but it doesn't work as I expect. I'm at a loss with the dispatch mechanism and how methods for functions are actually chosen. I think that with the current design, my approach doesn't work. :(

I'll try a few more things with dispatching on modules to see whether what I had initially intended is possible at all. I'll keep you updated.

@dmitrii-doronin
Copy link
Contributor

Hi, @har7an and @SuperGrobi. You're both doing amazing work here. I am preoccupied with other tasks right now, so I can't try tackling this issue currently. However, I know this package pretty well. So, if you need any insight into the inner workings of the package, just give me a ping here and I'll do my best to help you out with it.

@dmitrii-doronin
Copy link
Contributor

@har7an @SuperGrobi I gave it a thought and the best course of action, IMO, is to have own parsers inside Serde. We a) are not breaking the current interface and b) allow for the extensions without major rewrites. What do you think?

@har7an
Copy link
Author

har7an commented May 6, 2024

Hey @dmitrii-doronin,

sorry for the delay, I got a little sidetracked the last weeks. I'm not sure about that. Doesn't that entail effectively rewriting everything that's already done in JSON.jl, YAML.jl, ... ?

Today I did a little exercise with dispatching on Val{...} and it looks like a viable approach. My idea is that we implement a "generic" to_string and from_str(ing) (and possible also to_writer and from_reader) which take a Module type as first argument in the public API. Then inside that function we convert the Module to a Val (thanks for the tip, @henrik-wolf), which we can in turn implement in the package extensions.

If an extension isn't loaded (due to e.g. the dependency not being loaded), it will throw a MethodError (dispatching fails due to unknown signature). So I'd like to catch that MethodError and then display an error message explaining the situation. I'm honestly no fan of exceptions because I don't see how you can tell where the MethodError happened, but I'll see what I can do to determine whether it failed on dispatch. In my little toy example this already (sort of) works.

Then we can decide what to do: We can either keep the current interface and redirect each of these calls. For example: to_yaml then becomes to_string(YAML, ... internally (if we say we want to stick with this interface for the time being). Or we deprecate these "old" functions via the @deprecate macro if we want to nudge users to using the new API.

I'll go ahead and implement this today to see whether it can actually work. My plan is to rewrite the YAML submodule at least (because I already have code that leverages this particular functionality). I'll let you know when it's done.

I see that quite a bit of code has been added since I last rebased. I'll first try to make this version of the code work (somehow) and then think about incorporating all the new code, I think. This is gonna be fun...

@har7an
Copy link
Author

har7an commented May 6, 2024

TOML and YAML are now reimplemented, the tests still work. I'll implement the rest later today and then give you a quick rundown of how it works and what it looks like.

har7an added 28 commits June 4, 2024 07:46
and define all functions to be extended in the "base" module, rather
than defining new functions inside the extension and making them
accessible from the base package.
and define all functions to be extended in the "base" module, rather
than defining new functions inside the extension and making them
accessible from the base package.
of all code in the body to ensure they're recognized as methods without
concrete implementation.
that will allow us to dispatch on functions from package extensions.
for the new public API to allow better function dispatch.
for the new public API to allow better function dispatch.
for the new public API to allow better function dispatch.
for the new public API to allow better function dispatch.
for the new public API to allow better function dispatch.
for when a required module isn't present, to inform the user what they
have to do now.
for all extension modules and throw a descriptive error message when the
user attempts to call a function without having the module imported.
for the `to_string` method and friends by deriving a symbol across the
modules `fullname` rather than the first name component. This
(hopefully) makes symbols unambiguous and allows adding methods for
arbitrary user-provided modules, too.
in function definition and don't derive it by hand any longer.
in the XML and YAML extensions, which would cause `from_string` to fail
in unexpected ways.
for all types that are now implemented with the extension mechanism.
`to_string`, `from_string` and `parse`.
that houses the new public API using the `to_string` and `from_string`
functions and friends. These aren't exported since their names are
pretty nondescript by themselves, so we're reimporting the in the base
module to make them accessible there instead.
and integrate Query into the new module structure. It is now also
implemented using dispatch and the new API functions `to_string` and
friends. The old-style API has been kept intact.
into a new subdirectory `Formats/` so they don't clutter up the project
base.
and, since the containing folder is not empty, move it to the `src/`
directory.
and move the file to the `src/` folder since the containing folder is
now empty.
@har7an har7an force-pushed the feature/optional-dependencies branch from 37f2dd7 to 2716445 Compare June 4, 2024 07:50
@har7an
Copy link
Author

har7an commented Jun 4, 2024

I tried integrating this into some code of mine and I noticed that, since we're using extensions now to provide YAML/JSON/..., users can no longer override the Ser${LANG}/${LANG}_value function at will.

I guess technically these functions have never been part of the public API (they aren't exported, nor documented, nor mentioned in the docs) so we actually achieved encapsulation in Julia. I could imagine that more users than only me sort of use this in their own code, though...

I think in the long run it would be neat to have something like a serialize_with, as Rusts serde crate has. Then users can bring their own transformation functions to modify values as they see fit before we dump them into some IO buffer. But given the size of my PR as it currently is, that's probably out of scope for here. :(

Or do we already have such a thing and I'm unaware of it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants