Schema.org-flavored Content Models #2
We could also talk about how we mapped the M1313 schema onto schema.org to identify the most salient metadata :)
Yes, we should say that. :) I also recall us taking some lessons from RDFa syntax. We should mention that at our webinar next week.
I just received a notice about this, and have looked at the website. I get what the webinar is supposed to communicate, but I have some questions:
It looks like a reasonably good idea, but I don't see any technical artifacts in this repo.
Hi Mark, I'll let the rest of the group chime in, but I do recall that one of the reasons we didn't use vanilla schema.org is its breadth. It's pretty vast, and we were trying to pull out the core parts that we felt would be manageable for authors. We used the M1313 (Project Open Data) schema to help identify candidates (which overlapped with schema.org). Would you have preferred just using schema.org instead?
I'd be sure to mention that the schema.org content models not only improve SEO (by allowing your templates to include data appropriately for Google and other crawlers), but can also be reused to populate the same meta tags that Twitter Cards (and other social networking sites) use. @elucify, the "schema" really just means that those collections of fields should be present as attributes in your rendered HTML; it's not a strict standard for what needs to be in the content types of your CMS. It is not a schema in the truest sense of the word, which I found confusing at first.
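A minimal sketch of that reuse, assuming a content record keyed by schema.org Article property names (`headline`, `description`, `image`). The helper `twitter_card_tags` is hypothetical and not part of any project artifact; the `twitter:*` meta names are the standard Twitter Card tags.

```python
from html import escape


def twitter_card_tags(article: dict) -> str:
    """Render Twitter Card meta tags from a schema.org-style Article record.

    Assumption: the record uses schema.org property names; 'headline' is
    required here, 'description' and 'image' are optional.
    """
    tags = {"twitter:card": "summary", "twitter:title": article["headline"]}
    if "description" in article:
        tags["twitter:description"] = article["description"]
    if "image" in article:
        tags["twitter:image"] = article["image"]
    # Escape values so text like "&" or quotes cannot break the attribute.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags.items()
    )


print(twitter_card_tags({"headline": "Hello & welcome", "description": "Intro"}))
```

Because the same record can also be rendered as schema.org markup in the page body, the crawler-facing data and the social-card data stay in sync from one source.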
@logantpowell, we do discuss the relationship to schema.org and RDFa a little on the FAQ page, in the "what are content models" question, but this could probably be strengthened/clarified. Any suggestions for updated language? @elucify, we've thought about creating formalized schema definitions, but decided to start with the HTML descriptions you see currently. If we do this, what format(s) do you think would be the most useful?
This discussion really mirrors some of the conversations in the working group and brings up some good issues. Content models can serve two purposes:
Thanks for all the responses! That's a lot to reply to, but I'll try to hit the points one at a time.

@logantpowell: About the vastness of schema.org: yes, it's big, but it has the benefit that it is standardized. Many of the elements are optional, so one approach would have been to just write a document indicating whether elements are required, recommended, optional, or discouraged based on your organizational priorities. I guess you could consider M1313 to be a standard also; as they say, the nice thing about standards is that there are so many to choose from. I suppose the difference here is philosophical. It seems to me that a large, possibly unwieldy model that is standardized is better than something that is more or less isomorphic to an existing standard, but not exactly, and that introduces more accidental complexity. On the other hand, there's enormous, unwieldy HL7.

@vito, when you say the elements should be present as attributes of Web documents, do you mean these elements should be used to mark up HTML semantically? What systems are going to be able to make use of that annotation? Or am I misunderstanding your point?

@logantpowell: Personally, I would not necessarily have "preferred" pure schema.org. It just seems to me more useful to choose a standardized markup scheme that is already being interpreted by search engines. Maybe schema.org elements marked up in webpages using RDFa (and not using microdata) would be a good standards-based approach. I think it might be clearer what your content models are for if you were to provide concrete examples of how to use them in information processing systems: for example, using them to mark up webpages for SEO, or as a document exchange format between agencies.

@smileytech, it seems to me that creating a RelaxNG compact syntax schema for the model you have created would not be too much work. [Since we're on GitHub, I imagine I will be immediately invited to send a pull request :-).]
I prefer RelaxNG-CS because it is more easily readable, writable, and explainable than XSD, yet it can be transformed to XSD, which most validators use. In fact, if your content model were expressed in RelaxNG, you could use it to generate the documentation. Generally, I think it's easier to start with the machine-processable format and transform it into something human-readable than to try to go the other way.

@lgrama, if your content models truly are simplifications of schema.org, and there is an unambiguous map between them, it would be nice to see that map in the repo. Looking between the models, it seems that your Title element is the same as the schema.org event.name. But I'm not really sure, and an explicit map would clarify such questions.

Some of the elements I find puzzling. For example, I can't tell what RelatedURLs means. As far as I can tell, there's no way to indicate the relationship between the document and the URLs that it says are "related". This severely limits the utility of those links, because there's no way for an information processing system to know what the links are for. An additional benefit of adopting RDFa for link markup would be that the links within the related URLs could be scoped by additional RDFa statements, providing semantics to what is now just a bag of links.

These are just some thoughts. I look forward to seeing where this goes. Sorry I won't be able to make it to your demos tomorrow; I'm sure things would be clearer to me if I could. Cheers
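To illustrate the RDFa point about RelatedURLs, here is a hedged sketch that gives each link an explicit schema.org property instead of leaving it in an untyped list. The link data and the `render_article` helper are hypothetical; `citation` and `isBasedOn` are real schema.org CreativeWork properties, but choosing them for these links is an assumption for illustration, not part of the current content model.

```python
from html import escape

# Hypothetical related links: (schema.org property, URL, link text).
# The property states *how* each URL relates to the article.
related = [
    ("citation", "https://example.gov/report.pdf", "Source report"),
    ("isBasedOn", "https://example.gov/data.csv", "Underlying dataset"),
]


def render_article(headline: str, links: list[tuple[str, str, str]]) -> str:
    """Render an Article with RDFa attributes; each link carries a property."""
    parts = [
        '<article vocab="https://schema.org/" typeof="Article">',
        f'  <h1 property="headline">{escape(headline)}</h1>',
        "  <ul>",
    ]
    for prop, href, label in links:
        # In RDFa, property + href makes the URL the value of that property.
        parts.append(
            f'    <li><a property="{escape(prop)}" href="{escape(href)}">'
            f"{escape(label)}</a></li>"
        )
    parts += ["  </ul>", "</article>"]
    return "\n".join(parts)


print(render_article("Budget & You", related))
```

A crawler reading this markup can tell a cited source from an underlying dataset, which a plain list of "related" URLs cannot convey.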
@elucify I like the idea of putting up a table that shows the schema.org schema alongside the M1313 schema to identify 'required' or 'highly recommended' metadata. Actually, this is very much how we ended up with the schema. @ALL: should / could we do something like this using GitHub's new interactive tables?
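A table of requirement levels like this could also be kept in machine-readable form, so the same source drives both the documentation and a validator. This is a minimal sketch assuming a profile over schema.org Article property names; the levels assigned below and the `check` helper are hypothetical, not the working group's actual choices.

```python
from enum import Enum


class Level(Enum):
    REQUIRED = "required"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"
    DISCOURAGED = "discouraged"


# Hypothetical profile: the levels are illustrative only.
ARTICLE_PROFILE = {
    "headline": Level.REQUIRED,
    "datePublished": Level.REQUIRED,
    "author": Level.RECOMMENDED,
    "description": Level.RECOMMENDED,
    "keywords": Level.OPTIONAL,
    "wordCount": Level.DISCOURAGED,
}


def check(doc: dict, profile: dict) -> list[str]:
    """Return error/warning messages for a document checked against a profile."""
    problems = []
    for prop, level in profile.items():
        present = prop in doc
        if level is Level.REQUIRED and not present:
            problems.append(f"error: missing required property '{prop}'")
        elif level is Level.RECOMMENDED and not present:
            problems.append(f"warning: missing recommended property '{prop}'")
        elif level is Level.DISCOURAGED and present:
            problems.append(f"warning: discouraged property '{prop}' is present")
    return problems


for msg in check({"headline": "x", "wordCount": 10}, ARTICLE_PROFILE):
    print(msg)
```

Rendering `ARTICLE_PROFILE` as a two-column table would give the human-readable version of the same data.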
FWIW, if it'll help, I'm happy to help anybody get things into GitHub Pages, a la Project Open Data's tables and charts. G
I do think it would be helpful to see a mapping to Schema.org, at least to see where better alignment is easily possible. In some cases it almost looks like the Article content model intentionally diverges from schema.org. For example the property

@logantpowell The M1313 schema (which I'll refer to as the Project Open Data or POD schema) is actually based on DCAT, and the schema.org Dataset schema was later based on DCAT as well. You can find the mapping between DCAT and the Schema.org Dataset schema at: http://www.w3.org/wiki/WebSchemas/Datasets#Mappings Unfortunately, the current POD schema was developed before DCAT was finalized, and DCAT has evolved a bit since Project Open Data came out. We're now in the process of updating the POD schema to address issues that have come up in the past year and also to better align with DCAT/Schema.org. You can track that progress at https://github.com/project-open-data/project-open-data.github.io/labels/schema You can also see the mapping between the POD Schema, DCAT, and Schema.org at:
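A POD/DCAT/schema.org crosswalk like the one described here can also be kept as data and used for translation. The rows below are a small illustrative subset (POD field, DCAT term, schema.org Dataset property) written from the general alignment between these vocabularies; check them against the W3C mapping page before relying on them. The `pod_to_schema_org` helper is hypothetical.

```python
# Illustrative crosswalk rows: (POD field, DCAT term, schema.org Dataset property).
# A subset only; the W3C mapping page is the authoritative reference.
CROSSWALK = [
    ("title", "dct:title", "name"),
    ("description", "dct:description", "description"),
    ("keyword", "dcat:keyword", "keywords"),
    ("modified", "dct:modified", "dateModified"),
    ("publisher", "dct:publisher", "publisher"),
]

POD_TO_SCHEMA = {pod: schema for pod, _dcat, schema in CROSSWALK}


def pod_to_schema_org(record: dict) -> dict:
    """Translate a POD-style record into a schema.org Dataset JSON-LD object.

    Fields not covered by the crosswalk are dropped rather than guessed.
    """
    out = {"@context": "https://schema.org", "@type": "Dataset"}
    for key, value in record.items():
        if key in POD_TO_SCHEMA:
            out[POD_TO_SCHEMA[key]] = value
    return out


print(pod_to_schema_org({"title": "Crime stats", "keyword": ["crime"], "accessLevel": "public"}))
```

Dropping unmapped fields (here `accessLevel`) makes gaps in the crosswalk visible instead of silently inventing a schema.org property.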
@philipashlock as we progress down this path, I wonder if we should just be spreading the existing gospel rather than rolling our own... Perhaps, if we feel really passionate about adding some metadata to a schema.org model, we could rather act as a liaison for government within the schema.org community? Thoughts?
@philipashlock Will the updated POD schema include 'audience'?
I just created a pull request (#6) with a crosswalk table to see where there were opportunities for more alignment between this Article content model and the schema.org Article Type. There's definitely a lot of alignment to begin with, although none of the camel-cased capitalization matches. The places where they diverge seem more accidental than intentional, since it's not clear what value might come from the alternative. It does look like there are a few instances where fields were based on DCAT/POD instead of Schema.org, but since the Schema.org Dataset Type already defines a mapping to DCAT, it would make more sense to stick with one vocabulary. A few notes about ambiguities or incompatibilities in the mapping:
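The capitalization mismatch noted here is easy to detect mechanically when building a crosswalk. A minimal sketch: apart from `Title` (mentioned earlier in the thread), the content-model field names below are hypothetical, while the schema.org list is a real subset of Article/CreativeWork properties. The helpers `case_only_mismatches` and `unmatched` are illustrative names.

```python
# Hypothetical content-model fields; only "Title" comes from the discussion.
MODEL_FIELDS = ["Title", "Description", "Keywords", "DatePublished", "Author"]
# A subset of schema.org Article (CreativeWork) property names.
SCHEMA_ARTICLE = ["headline", "name", "description", "keywords",
                  "datePublished", "dateModified", "author", "articleBody"]


def case_only_mismatches(model: list[str], schema: list[str]) -> dict[str, str]:
    """Map model fields to schema.org properties that match only when case is ignored."""
    by_lower = {p.lower(): p for p in schema}
    return {f: by_lower[f.lower()] for f in model
            if f.lower() in by_lower and f != by_lower[f.lower()]}


def unmatched(model: list[str], schema: list[str]) -> list[str]:
    """Model fields with no schema.org counterpart even ignoring case."""
    lowered = {p.lower() for p in schema}
    return [f for f in model if f.lower() not in lowered]


print(case_only_mismatches(MODEL_FIELDS, SCHEMA_ARTICLE))
print(unmatched(MODEL_FIELDS, SCHEMA_ARTICLE))
```

Fields in the first group could align simply by adopting schema.org's casing; anything left in the second group (such as `Title`) needs an explicit entry in the crosswalk table.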
@logantpowell There's no proposal I'm aware of to add I personally do think it would make more sense to extend schema.org rather than do something sort of inspired by it but not actually using it. There have already been a number of Schema.org Types developed for more government-specific purposes, e.g. GovernmentService (which is particularly relevant to usa.gov) and Dataset, which is based on DCAT (which is primarily focused on use cases where government is the publisher). There's information about extending schema.org at http://schema.org/docs/extension.html, or you can join the mailing list at http://lists.w3.org/Archives/Public/public-vocabs/
One point that I think is worth emphasizing is that the nested types for properties in schema.org aren't necessarily required; a property value can instead just be text. Viewed this way, schema.org actually seems quite simple and not so vast. With this in mind, the schema.org Article type could be even simpler than the proposed Article Content Model, which has several nested types. That said, I do see the value in providing a profile that's a little more of an explicit usage of a schema.org type, so it might still be good to articulate required properties and whether nested types should be required, encouraged, discouraged, or prohibited for certain properties. Here's the clarifying language:
I actually don't think we will need to extend schema.org for our purposes as a working group, but we could definitely pass along the guidance you shared and perhaps go over some reasons folks might want to do this. I also think we should further discuss your comment about the advantages and disadvantages of the nested versus text-only approaches.
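To make the nested-versus-text distinction concrete, here is a sketch of the same schema.org Article expressed as JSON-LD in both forms, plus a consumer that accepts either. The headline, author name, and `author_name` helper are hypothetical. schema.org's data model explicitly allows a text value where a Person is expected, which is what makes the flat form acceptable.

```python
import json

# Text-only form: the author property is a plain string.
flat = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Hypothetical headline",
    "author": "Jane Doe",
}

# Nested form: the same property as a Person object (flat is left unchanged).
nested = dict(flat, author={"@type": "Person", "name": "Jane Doe"})


def author_name(article: dict) -> str:
    """Read the author's name whether it is plain text or a nested Person."""
    author = article["author"]
    return author if isinstance(author, str) else author["name"]


print(json.dumps(nested, indent=2))
```

The trade-off is visible here: the flat form is simpler to author, while the nested form lets a consumer attach more facts (such as a URL or affiliation) to the author, at the cost of every consumer handling both shapes.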
I want to second the need to map these models to schema.org.
Should we mention that these models are mostly derived from schema.org? It might help folks sell it in their agencies if they can say that they're in cahoots with Google, Facebook, Etsy, GitHub, etc.
source: http://getschema.org/index.php/List_of_websites_using_Schema.org