Work on YAML-LD #3

Closed
gkellogg opened this issue May 16, 2022 · 29 comments

Comments

@gkellogg
Member

In the w3c/json-ld-syntax repository, w3c/json-ld-syntax#389 proposes advancing work on YAML-LD. I was not able to transfer the issue to this repository, but further discussion and votes in support of starting the initiative in the JSON-LD CG should be voiced here. Given sufficient support, we'll create a yaml-ld repository for work to proceed.

@gkellogg
Member Author

With support, I'll set up a new repo with a template ReSpec document and pr-preview.

I support starting such an initiative.

@pchampin
Contributor

Quoting @anatoly-scherbakov from w3c/json-ld-syntax#389 (comment)

I would propose the following grounds for the @$ replacement: while JSON is machine readable and writable, it is not very human readable and — especially — writable. YAML is much friendlier in that regard, due to much lower syntactic noise. The replacement of these characters helps to further reduce the said syntactic noise, making the data files therefore faster to type.

In general, I'd name manually writable semantic data the main purpose for YAML-LD.

I will be happy to participate in the standardization process if one is to be initiated, and to assist however I can.

I see your point about easing the manual editing of YAML-LD. It is a valid point.

But on the other hand, the principle of least surprise is important. Having YAML-LD behave differently from YAML2JSON + JSON-LD sounds like a very bad idea.

What I guess I could live with is for "native YAML-LD" processors to accept $-keywords in addition to @-keywords, with a flag (named e.g. strict-keywords) to disable this behaviour (in case someone wants to use $-prefixed names for another purpose).

@gkellogg
Member Author

What I guess I could live with is for "native YAML-LD" processors to accept $-keywords in addition to @-keywords, with a flag (named e.g. strict-keywords) to disable this behaviour (in case someone wants to use $-prefixed names for another purpose).

Yes, and I think a similar flag for using $ instead of @ when serializing. Trying to preserve the original form using $ or @ is probably too complicated.

Note that using the $ "namespace" will overlap with other existing uses, at least from JSON. For example, JSON Schema has $schema, $vocabulary and other keywords that would not otherwise overlap with the JSON-LD keyword namespace.
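
To make the serialization flag concrete, here is a rough Python sketch. Nothing in it comes from a spec: the function name at_to_dollar, the use_dollar parameter, and the hard-coded keyword set (the JSON-LD 1.1 keywords as I understand them) are all assumptions made for illustration.

```python
# Assumed list of JSON-LD 1.1 keywords, without the leading "@".
JSONLD_KEYWORDS = frozenset({
    "base", "container", "context", "direction", "graph", "id", "import",
    "included", "index", "json", "language", "list", "nest", "none",
    "prefix", "propagate", "protected", "reverse", "set", "type", "value",
    "version", "vocab",
})


def _swap(text, use_dollar):
    # Rewrite "@keyword" to "$keyword" only when the flag is set and the
    # remainder is a known JSON-LD keyword; everything else passes through.
    if use_dollar and text.startswith("@") and text[1:] in JSONLD_KEYWORDS:
        return "$" + text[1:]
    return text


def at_to_dollar(node, use_dollar=True):
    """Hypothetical serializer step: emit $-keywords instead of @-keywords."""
    if isinstance(node, dict):
        return {_swap(key, use_dollar): at_to_dollar(value, use_dollar)
                for key, value in node.items()}
    if isinstance(node, list):
        return [at_to_dollar(item, use_dollar) for item in node]
    if isinstance(node, str):
        return _swap(node, use_dollar)
    return node


doc = {"@context": {"name": {"@container": "@set"}}, "@id": "ex:a", "email": "@home"}
print(at_to_dollar(doc))
```

The output uses a single prefix throughout, whatever the input happened to use, which is in line with not trying to preserve the original form. Strings such as "@home" that are not keywords are left alone.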

@tetron

tetron commented May 16, 2022

A formal YAML-LD variant that is a minimum-surprise syntax variant of JSON-LD is a good idea, and I don't want to get in the way of that discussion.

However, I just wanted to link my comment from the other issue:

w3c/json-ld-syntax#389 (comment)

JSON-LD has some limitations when working with certain idiomatic JSON structures. Schema Salad is a schema language for YAML and JSON documents which describes validation and transformation from idiomatic YAML structures to JSON-LD structures and then on to RDF. We have used it extensively for our own use case (describing the Common Workflow Language), but I wanted to mention it here to see if there's interest in generalizing it, separately from this YAML-LD discussion.

@anatoly-scherbakov
Contributor

@pchampin @gkellogg I agree with using @-keywords as a default, unless a flag is supplied to the processor to enable $-keywords. The YAML-LD preprocessor which converts $ to @ might also take care to convert only the keywords which are reserved by JSON-LD. This will also resolve the possible conflict with JSON Schema (I guess the two standards have different sets of keywords).
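
A minimal sketch of such a preprocessor, in Python, might look like the following. The function name dollar_keys_to_at is made up, the strict_keywords parameter borrows the flag name floated earlier in this thread, and the keyword set is my reading of JSON-LD 1.1 rather than something verified against the spec. It only looks at keys and assumes the YAML has already been parsed into plain dicts and lists.

```python
# Assumed list of JSON-LD 1.1 keywords, without the leading "@".
JSONLD_KEYWORDS = frozenset({
    "base", "container", "context", "direction", "graph", "id", "import",
    "included", "index", "json", "language", "list", "nest", "none",
    "prefix", "propagate", "protected", "reverse", "set", "type", "value",
    "version", "vocab",
})


def dollar_keys_to_at(node, strict_keywords=False):
    """Hypothetical preprocessor: rewrite "$keyword" keys to "@keyword"."""
    if strict_keywords:
        # Flag set: $-prefixed names are left untouched for other uses.
        return node
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if (isinstance(key, str) and key.startswith("$")
                    and key[1:] in JSONLD_KEYWORDS):
                key = "@" + key[1:]
            out[key] = dollar_keys_to_at(value)
        return out
    if isinstance(node, list):
        return [dollar_keys_to_at(item) for item in node]
    return node


doc = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$vocabulary": {},
    "$type": "Person",
}
print(dollar_keys_to_at(doc))
print(dollar_keys_to_at(doc, strict_keywords=True))
```

Here $schema and $vocabulary pass through unchanged while $type becomes @type. One caveat: JSON Schema also defines $id, so restricting the rewrite to JSON-LD keywords narrows the overlap but would not remove it entirely.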

@tetron Thank you for the information, will have a look to compare Schema Salad with plain YAML-LD that I am currently using.

@VladimirAlexiev
Contributor

@tetron
There is no dispute that JSON-LD is no schema language and is often complemented with JSON Schema.

JSON-LD has some limitations when working with certain idiomatic JSON structures.

But can you elaborate on this point using examples? I think that JSON-LD 1.1 is very flexible, e.g. local term definitions, add/remove auxiliary keys that have no reflection in RDF, etc.

@VladimirAlexiev
Contributor

@OR13 @nicholascar please vote for YAML-LD CG workgroup above

@OR13

OR13 commented May 17, 2022

I'm not sure I have the cycles to help much with YAML-LD, but I think it's a good idea. We use OAS / YAML with JSON-LD and JSON Schema often.

In particular, I like the idea of controlling both semantics and data shape at the same time, using only 1 file.

@tetron

tetron commented May 17, 2022

But can you elaborate on this point using examples? I think that JSON-LD 1.1 is very flexible, e.g. local term definitions, add/remove auxiliary keys that have no reflection in RDF, etc.

So, Schema Salad was created in response to JSON-LD 1.0 (and roughly 5 years before JSON-LD 1.1). The main motivations were:

  • Use of YAML because it has comments and multi-line strings
  • Desire to be able to express the subject in the "key" part of an object. This is basically id maps of JSON-LD 1.1
  • Similarly, an equivalent to type maps
  • Default predicate assignment when a value is a scalar instead of an object
  • Relative identifiers that are syntactically scoped

Examples of the last two:

steps:
  # this is an identifier map, the key is the id
  step1:
    in:
      # another identifier map,
      # the identifier is syntactically scoped so it gets appended to the 
      # enclosing id
      # the value is assigned to a default predicate of "source" since it's a scalar
      input_parameter: source_parameter_uri
    out: [output_parameter]

This is translated to JSON-LD 1.0 that looks something like this:

{
  "steps": [{
    "@id": "step1",
    "in": {
      "@id": "step1/input_parameter",
      "source": "source_parameter_uri"
    },
    "out": [{
      "@id": "step1/output_parameter"
    }]
  }]
}

  • Validation, but using Avro schema instead of JSON Schema. One of the reasons was that you could hoist Avro schema on top of JSON-LD (so the schema itself could be valid JSON-LD); at the time it seemed impossible to write a JSON-LD context for JSON Schema structures
  • Having a single file that describes the JSON data shape/validation, can generate the JSON-LD context, produce RDFS, and produce documentation (for the low, low price of inventing and maintaining a new data definition system).

We've also implemented code generators for Python and Java. It would probably also be straightforward to write a translator to express the schema as SHACL.
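
For readers who want to see the identifier-map idea spelled out, here is a toy Python sketch that reproduces the mapping in the steps example above. It is not how Schema Salad is implemented; expand_id_map and expand_steps are hypothetical names, the input is assumed to be the already-parsed YAML, and unlike the hand-written JSON above it always emits a list for an expanded map.

```python
def expand_id_map(id_map, parent_id=None, default_predicate=None):
    """Turn {"key": value} entries into a list of nodes carrying "@id"."""
    nodes = []
    for key, value in id_map.items():
        # Scoped identifiers are appended to the enclosing id.
        node_id = f"{parent_id}/{key}" if parent_id else key
        if isinstance(value, dict):
            node = {"@id": node_id, **value}
        else:
            # Scalar value: assign it to the default predicate.
            node = {"@id": node_id, default_predicate: value}
        nodes.append(node)
    return nodes


def expand_steps(doc):
    """Expand the toy workflow document from the example above."""
    steps = []
    for step in expand_id_map(doc["steps"]):
        step_id = step["@id"]
        step["in"] = expand_id_map(step.get("in", {}), step_id, "source")
        step["out"] = [{"@id": f"{step_id}/{name}"}
                       for name in step.get("out", [])]
        steps.append(step)
    return {"steps": steps}


parsed = {
    "steps": {
        "step1": {
            "in": {"input_parameter": "source_parameter_uri"},
            "out": ["output_parameter"],
        }
    }
}
print(expand_steps(parsed))
```

In a real schema-driven system the default predicate ("source" here) and the scoping rules would come from the schema rather than being hard-coded.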

@VladimirAlexiev
Contributor

I'll start a "YAML-LD UCRs" issue and include "polyglot modeling"

@nichtich

Having YAML-LD behave differently from YAML2JSON + JSON-LD sounds like a very bad idea.

I fully agree. To avoid confusion I'd vote to

  1. either limit the specification of YAML-LD to half a page, essentially stating that YAML-LD is JSON-LD expressed in YAML, so any standard conversion YAML <-> JSON (without any exceptions such as $ / @ keys) can do

  2. or to make clear that YAML-LD is not "JSON-LD in YAML" but an independent data format, loosely based on JSON-LD. Even if it's 99% like JSON-LD, you cannot use standard tools but need a specific YAML-LD aware software library to avoid running into edge cases.

Option 1 can be done as part of JSON-LD, option 2 is an independent project with unknown outcome and adoption.

@anatoly-scherbakov
Contributor

In my opinion, writability is the primary reason to even have YAML-LD, and it is also my feeling that the @$ replacement has greatly improved my own writing experience when authoring YAML-LD data files and contexts.

I believe the -LD suffix can be generally used to describe different data formats somehow augmented with Linked Data. For instance,

  • CSV-LD had been mentioned before (though I am uncertain what exactly it is and how it relates to CSVW);
  • IPLD is a medium to describe data in a distributed peer to peer way, and is a foundation for IPFS;
  • One can easily imagine TOML-LD, TSV-LD, Parquet-LD, Protobuf-LD, perhaps even Excel-LD.

This is not an argument but I am also using Markdown with YAML-LD front matter meta data, and call it Markdown-LD :)

Thus, if I had to choose among the options @nichtich has proposed, I would vote for (2). I still believe the specification can be half a page to describe the exact logic of the conversion from YAML-LD to JSON-LD and vice versa, but I believe that the potential space of -LD data formats is vast. Each of them should be equipped with its own tools, even though the meaning of those formats can probably be in most cases derived from the JSON-LD 1.1 specification, and we could use JSON-LD contexts to interpret the data files.

I have written up a paper about YAML-LD recently and I would be happy to get feedback from the community, but the conference I submitted the paper to requires double blind review so I am uncertain whether I am at liberty to share the draft. Guess I would have to wait till the organizers' decision about whether they're going to publish it or not.

@pchampin
Contributor

@nichtich I believe that your option 2 above would be far too confusing for many people. I would rather make YAML-LD a superset of JSON-LD in YAML, i.e. adding some specific idioms / patterns that "native" YAML-LD processors would understand (e.g. the $-keywords replacement). But basically those specific idioms would be pre-processed for producing valid JSON-LD, which could then be fully compliant.

And I would make it easy to recognize/require YAML-LD documents that are strictly JSON-LD in YAML (i.e. do not require the pre-processing), like a media-type parameter (e.g. text/ld+yaml;profile=strict).
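
As a rough illustration of how a processor might act on such a parameter, here is a small Python sketch. The media type text/ld+yaml and the profile value strict are just the example values from this comment, not registered names, and parse_media_type and needs_preprocessing are hypothetical helpers.

```python
def parse_media_type(value):
    """Split a Content-Type style string into (type, parameters)."""
    parts = [part.strip() for part in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            name, _, raw = part.partition("=")
            # Parameter names are case-insensitive; drop optional quotes.
            params[name.strip().lower()] = raw.strip().strip('"')
    return media_type, params


def needs_preprocessing(content_type):
    """True unless the document declares itself strictly JSON-LD in YAML."""
    media_type, params = parse_media_type(content_type)
    if media_type != "text/ld+yaml":  # hypothetical media type
        raise ValueError(f"not a YAML-LD media type: {media_type}")
    return params.get("profile") != "strict"


print(needs_preprocessing("text/ld+yaml;profile=strict"))
print(needs_preprocessing("text/ld+yaml"))
```

A consumer that only has a plain YAML parser and a JSON-LD processor could then require the strict profile and reject everything else.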

@VladimirAlexiev
Contributor

Folks, please contribute requirements in #2

@VladimirAlexiev
Contributor

VladimirAlexiev commented May 18, 2022

@pchampin @nichtich Sorry, I find your "will be confusing" argument unconvincing. Let me play devil's advocate and apply your argument to other situations:

  • "why do we need Turtle shortenings like s p o1, o2 and a: these are too confusing, let's stick with ntriples"
  • "why do we need N3 short forms like -> for log:infers and {graph1} -> {graph2} for inference. Too confusing, let's stick with Turtle". I think @josd and TimBL may have an issue with that ;-)
  • "why do we need JSON-LD given that RDF JSON can express any RDF? Will be too confusing for the hordes of novices who won't read the (excellent) JSON-LD spec". (I hope I won't get a beating from the folks in this repo for these words)
  • "why do we need SHACL-Compact, just write SHACL. People don't want to learn a second language, and it would be too confusing". But I find SHACLC a major feature for parity against SHEX. (And the argument even goes into details such as what does the grammar allow, see shaclc grammar: nodeOr vs propertyOr w3c/data-shapes#179)

As a data architect, I want to be able to write shortcut notation like this, and get proper RDF (see #2 Shortcuts), and even the other way around. This below is shorthand turtle, but I'd like a similar spirit in YAML.

:Person a rdf:Class;
:born a rdfs:Property; <- :Person; -> xsd:date.
Doc1234 :creator ~tobyink .
~tobyink a :Person; :born 1980-01-01 .
`Example-Distribution 0.001 cpan:TOBYINK` issued 2012-06-18 .

@OR13

OR13 commented May 18, 2022

I left an example of our use of OAS (Open API Specification) and JSON-LD here: #2

TLDR; OAS supports JSON Schema represented in YAML, we tweaked the JSON Schema to support JSON-LD terms, so now we can present RDF types and JSON Schema types in a single YAML file.

@gkellogg
Member Author

Having YAML-LD behave differently from YAML2JSON + JSON-LD sounds like a very bad idea.

I fully agree. To avoid confusion I'd vote to

  1. either limit the specification of YAML-LD to half a page, essentially stating that YAML-LD is JSON-LD expressed in YAML, so any standard conversion YAML <-> JSON (without any exceptions such as $ / @ keys) can do
  2. or to make clear that YAML-LD is not "JSON-LD in YAML" but an independent data format, loosely based on JSON-LD. Even if it's 99% like JSON-LD, you cannot use standard tools but need a specific YAML-LD aware software library to avoid running into edge cases.

Option 1 can be done as part of JSON-LD, option 2 is an independent project with unknown outcome and adoption.

JSON-LD 1.1 explicitly chose to use an intermediate representation to allow for other formats to map easily into that representation. While your option 1 can't realistically be done in 1/2 page, IMO any spec should confine itself to this level of interoperability. (This is consistent with allowing @ to be optionally replaced with $, I believe).

The challenges of getting JSON-LD standardized in the first place should be a cautionary tale for trying to do something similar but different, so I don't see option 2 as viable.

@cmungall

I was asked by @VladimirAlexiev to vote on this, so I did, but to be clear I am voting that there should be some official position on how to represent JSON-LD in YAML.

I am equally convinced by the two schools here:

  1. Avoid surprises and use identical keywords
  2. optimize for writability and use more yaml-friendly characters

I am personally biased towards 1 as I rarely author contexts directly, instead I autogenerate these from a yaml-native "polyglot" language (LinkML), but I am sympathetic to those who do author the contexts directly.

@pchampin
Contributor

@VladimirAlexiev regarding your examples: Turtle (respectively N3) is a strict superset of N-Triples (respectively Turtle) so I don't see this as confusing. Shacl-C is a completely different syntax from Shacl in Turtle, so I don't see this as confusing. I agree that the co-existence of JSON-LD and RDF/JSON might be confusing, and notice that the RDF WG decided at the time to recommend only one of them (RDF/JSON is only a note).

@nichtich's proposition 2 is "an independent data format, loosely based on JSON-LD", where "loosely" could mean up to "99%". The closer this YAML-LD is to JSON-LD, the more surprising it will be for users when they hit a difference. Furthermore "you cannot use standard tools but need a specific YAML-LD aware software" sounds a lot like reinventing the wheel.

I guess what I really don't want is to define something from scratch. If YAML-LD is to be based (even loosely) on JSON-LD, then I would strongly advise that YAML-LD processors are based on a JSON-LD processor under the hood. Experience shows that developing a bug-free JSON-LD processor is tricky. Requiring a similar, but different, effort for YAML-LD, seems like a bad idea.

@nichtich

nichtich commented May 23, 2022

If YAML-LD is going to be more than a simple application of YAML syntax to express JSON-LD documents (proposition 1), it may help to reduce references to JSON-LD to avoid mixing levels of description (YAML, JSON-LD, RDF...). The specification could be limited to rules for how to transform a YAML-LD document into a JSON document (e.g. replace defined use of $ with @, expand YAML tags...) without detailed knowledge of JSON-LD and RDF.

For instance the specification could explain how to transform these YAML documents

$id: "http://example.org"
---
$id: 1

into JSON documents {"@id": "http://example.org"} and {"@id": 1} respectively. The latter is not valid JSON-LD but this would be irrelevant to transformation rules from YAML-LD to JSON.

The final sentence of the specification would be a requirement that JSON encoded by YAML-LD transformation rules must be valid JSON-LD. This can (and should) be checked with existing JSON-LD parsers.

@anatoly-scherbakov
Contributor

@nichtich I'd agree. I tried to formalize this as follows.

A document D is designated as a valid YAML-LD document if, and only if:

  • D is a valid YAML document;
  • The following transformation converts D into a valid JSON-LD-document:
    • convert to JSON,
    • for every key and value that is a string and starts with the $ character, replace $ with @ if and only if the resulting string is a valid JSON-LD reserved keyword as per its specification.
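
The definition above can be sketched in Python roughly as follows. This is only an illustration under several assumptions: the input is taken to be a YAML document already parsed by some YAML library (so the "valid YAML" bullet is outside the sketch), the keyword set is my reading of JSON-LD 1.1, and the JSON-LD validity check is passed in as a callable because the standard library has none. All function names are made up.

```python
# Assumed list of JSON-LD 1.1 keywords, without the leading "@".
JSONLD_KEYWORDS = frozenset({
    "base", "container", "context", "direction", "graph", "id", "import",
    "included", "index", "json", "language", "list", "nest", "none",
    "prefix", "propagate", "protected", "reverse", "set", "type", "value",
    "version", "vocab",
})


def _to_at(text):
    # Replace "$" with "@" only if the result is a JSON-LD keyword.
    if text.startswith("$") and text[1:] in JSONLD_KEYWORDS:
        return "@" + text[1:]
    return text


def to_jsonld(node):
    """Apply the $ -> @ rule to every string key and string value."""
    if isinstance(node, dict):
        return {(_to_at(k) if isinstance(k, str) else k): to_jsonld(v)
                for k, v in node.items()}
    if isinstance(node, list):
        return [to_jsonld(item) for item in node]
    if isinstance(node, str):
        return _to_at(node)
    return node


def is_valid_yaml_ld(parsed_yaml, is_valid_jsonld):
    """D is valid YAML-LD iff its transformed form is valid JSON-LD."""
    return is_valid_jsonld(to_jsonld(parsed_yaml))


def toy_check(doc):
    # Stand-in validator for the demo only: "@id" must be a string.
    return isinstance(doc.get("@id", ""), str)


print(to_jsonld({"$id": 1}))
print(is_valid_yaml_ld({"$id": "http://example.org"}, toy_check))
print(is_valid_yaml_ld({"$id": 1}, toy_check))
```

The transformation itself never fails on {"$id": 1}; it is only the injected JSON-LD check that rejects the result, which keeps the two levels separate as suggested earlier in the thread.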

@pchampin
Contributor

Idea: instead of $-keywords, why not define a YAML tag for each JSON-LD keyword (i.e. !context, !type, ...)? These tags would only expect an empty string, so they should be used "on their own", e.g.

!context :
  !vocab : http://example.com/ns/
!id : "#test"
!type : Foo
bar: baz

PROS: while $context and other $-keywords could (in theory) be intended as a regular JSON keys, tags have no direct JSON interpretation, so tag-keywords are unambiguous
CONS: from a short experiment, it seems you need a space between the tag and the colon (:) (while you don't need it with "regular" keys), which is error-prone...

@gkellogg
Member Author

A couple of thoughts on how to specify YAML-LD as an extension of JSON-LD API:

  • All JSON is retrieved using the document loader. A YAML-LD spec can describe the requirements for an alternative loader.
  • Some consideration (for or against) an equivalent for finding a context using an HTTP Link Header.
  • YAML-LD and JSON-LD documents and contexts (and embedded HTML variations) should probably be intermixable, which wouldn't really require much extra to support.
  • Most API entry points end with words including "Resolve the promise with flattened output transforming flattened output from the internal representation to a JSON serialization, if necessary." (There may be some more points internally). Abstracting this to allow the output format to be specified with a new API option would yield more benefits beyond YAML-LD.

@gkellogg
Member Author

There is certainly enough interest to start an activity for YAML-LD. I'll create and set up a repo for this purpose. I think both the spec and the UCR documents can be in the same repo, but PR Preview will only use one of them for nicely formatted PRs.

I'll put out a proposal for a CG call (maybe @pchampin can help with a Zoom setup), which would be useful for more than just YAML-LD.

@gkellogg
Member Author

Repo has been set up at https://github.com/json-ld/yaml-ld. If you would like to contribute, and are a member of the JSON-LD Community Group, I can add you to the contributors team. Please create an issue (or respond to an already existing issue) to be added to the team.

@gkellogg
Member Author

Moving this issue to the yaml-ld repo.

@gkellogg gkellogg transferred this issue from json-ld/json-ld.org May 25, 2022
@anatoly-scherbakov
Contributor

@pchampin this is an interesting idea but I'd say that the required space character is a great irregularity introduced to the syntax, and is a potential source of mistakes.

It might be interesting to use YAML tags for something in YAML-LD context, but I have no idea how at present. $type seems to work fine to assign RDF types to nodes. I do not have any other ideas.

@pchampin
Contributor

@anatoly-scherbakov

@pchampin this is an interesting idea but I'd say that the required space character is a great irregularity introduced to the syntax, and is a potential source of mistakes.

I agree, unfortunately.

It might be interesting to use YAML tags for something in YAML-LD context, but I have no idea how at present.

I created a dedicated issue for this: w3c/shacl#6.

@VladimirAlexiev
Contributor

Closing this. If you've made important remarks above, please post them as separate issues.
In particular @gkellogg #3 (comment), and maybe @pchampin ?
