Experimental language parsing #96
Comments
I'm interested in working on this, as I'm trying to add mf2 parsing to https://github.com/fguillot/picoFeed, which currently supports language detection for XML feeds.
Recent conversation: https://indiewebcamp.com/irc/2016-05-07#t1462646589527
A tricky scenario that @voxpelli raised, with nested p-* properties and languages specific to them: https://indiewebcamp.com/irc/2016-05-07#t1462651125104
@gRegorLove I'm looking at the parsed result and it looks like it's including an
<div class="h-entry" lang="sv" id="postfrag123">
  <h1 class="p-name">En svensk titel</h1>
  <div class="e-content" lang="en">With an <em>english</em> summary</div>
  <div class="e-content">Och <em>svensk</em> huvudtext</div>
</div>
{
  "type": [
    "h-entry"
  ],
  "properties": {
    "name": [
      "En svensk titel"
    ],
    "content": [
      {
        "html": "With an <em>english<\/em> summary",
        "value": "With an english summary",
        "html-lang": "en"
      },
      {
        "html": "Och <em>svensk<\/em> huvudtext",
        "value": "Och svensk huvudtext",
        "html-lang": "sv"
      }
    ],
    "html-lang": "sv"
  }
}
The
Yeah ... had to solve this locally as well yesterday (it kept breaking iteration over the properties by not providing an array).
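To illustrate the kind of breakage mentioned here, a minimal Python sketch follows. It assumes the problem is the top-level "html-lang": "sv" entry in the parsed result above, which is a bare string where every other property value is an array. The helper names `first_values` and `non_list_properties` are hypothetical and exist only for this illustration; they are not part of php-mf2 or picoFeed.

```python
# Parsed result from the comment above, trimmed to the relevant keys.
parsed = {
    "type": ["h-entry"],
    "properties": {
        "name": ["En svensk titel"],
        "html-lang": "sv",  # a bare string, unlike the other property values
    },
}


def first_values(item):
    """Hypothetical consumer that assumes every property value is a list."""
    return {key: values[0] for key, values in item["properties"].items()}


def non_list_properties(item):
    """List the property names whose value is not an array."""
    return [
        key
        for key, values in item["properties"].items()
        if not isinstance(values, list)
    ]


# Indexing the string "sv" yields its first character instead of a value.
print(first_values(parsed))
# Reports which properties would trip up a consumer like the one above.
print(non_list_properties(parsed))
```

In this sketch the consumer does not crash; it silently returns "s" for "html-lang", which is arguably worse than an error. A consumer in another language may fail outright when it tries to iterate over the string.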
I am moving the language parsing behind a feature flag until this is sorted out. That way you can opt in to have the language parsing happen, but must be aware that it's still experimental.
Ok. I'm generally interested as other formats support languages as well. Still working on implementing it though.
Oops. I'll add some explicit tests for that and work on the fix.
@aaronpk Edit: disregard. Per later conversation, "lang" doesn't appear at the same level as any mf properties in the parsed results, so shouldn't cause conflicts.
It would be valuable to get a working proof of concept of language parsing built for one of the mf2 parsers. The php-mf2 library and the JavaScript one are two good candidates for that.
The discussion around language parsing is happening here: http://microformats.org/wiki/microformats2-parsing-brainstorming#Parse_language_information
There's a similar issue for the JavaScript mf2 parser here: glennjones/microformat-shiv#22
And the original PR that created a proof of concept for an old version of the JavaScript mf2 parser can be found here: glennjones/microformat-node#23
To achieve the language parsing in php-mf2 one can probably utilize the fact that a DOMNode has a parentNode property (see docs) and use that to traverse the document tree upwards until one reaches the first lang= attribute or the end of the tree. Then one knows what the language of a node is (apart from defaults that may have been specified elsewhere, e.g. in the HTTP response; see the HTML5 docs) and can decide whether to add the language attribute or not.
Update: As @gRegorLove pointed out on IRC, it may be hard to add the proposed output without breaking backwards compatibility. The new output would therefore have to be introduced either as a new major version or, probably preferably, as an opt-in feature flag for now. Those who want to use language data right away can enable the flag, while those who prefer to wait for a future major version before updating to support the new output can do so.
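The upward traversal described above can be sketched as follows. This is an illustration only, not php-mf2 code: it uses Python's `xml.dom.minidom`, whose nodes expose the same DOM `parentNode` property as PHP's DOMNode. The function name `effective_lang` and its `default` parameter are hypothetical, and defaults coming from HTTP headers are not handled.

```python
from xml.dom import minidom


def effective_lang(node, default=None):
    """Walk up via parentNode until an element with a lang attribute is found."""
    current = node
    while current is not None:
        # Only element nodes carry attributes; the Document node at the top does not.
        if current.nodeType == current.ELEMENT_NODE and current.hasAttribute("lang"):
            return current.getAttribute("lang")
        current = current.parentNode
    # Reached the end of the tree without finding a lang attribute.
    return default


# The markup from the comment thread; it is well-formed, so an XML parser accepts it.
SAMPLE = """<div class="h-entry" lang="sv" id="postfrag123">
<h1 class="p-name">En svensk titel</h1>
<div class="e-content" lang="en">With an <em>english</em> summary</div>
<div class="e-content">Och <em>svensk</em> huvudtext</div>
</div>"""

doc = minidom.parseString(SAMPLE)
contents = [
    el
    for el in doc.getElementsByTagName("div")
    if el.getAttribute("class") == "e-content"
]
langs = [effective_lang(el) for el in contents]
print(langs)  # ['en', 'sv']
```

The first e-content element carries its own lang attribute, while the second inherits "sv" from the enclosing h-entry. That matches the "html-lang" values in the parsed result quoted in the comments.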