Experimental language parsing #96

Open
voxpelli opened this issue May 7, 2016 · 8 comments

@voxpelli
Contributor

voxpelli commented May 7, 2016

It would be valuable to get a working proof of concept of language parsing built for one of the mf2 parsers. The php-mf2 library and the JavaScript one are two good candidates for that.

The discussion around language parsing is happening here: http://microformats.org/wiki/microformats2-parsing-brainstorming#Parse_language_information

There's a similar issue for the JavaScript mf2 parser here: glennjones/microformat-shiv#22
The original PR that created a proof of concept for an old version of the JavaScript mf2 parser can be found here: glennjones/microformat-node#23

To implement language parsing in php-mf2, one can probably use the fact that a DOMNode has a parentNode property (see docs) to traverse the document tree upwards until reaching either the first lang= attribute or the top of the tree. That gives the language of a node (apart from defaults that may have been specified elsewhere, e.g. in the HTTP response; see HTML5 docs), and from that one can decide whether or not to add the language attribute.
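A minimal sketch of that upward walk, written in Python rather than PHP. It uses the standard library's xml.dom.minidom as a stand-in for PHP's DOM (its nodes expose the same parentNode, hasAttribute and getAttribute members) and assumes well-formed markup; the function name effective_lang and its default parameter are hypothetical, for illustration only.

```python
from typing import Optional
from xml.dom import Node
from xml.dom.minidom import parseString


def effective_lang(node: Node, default: Optional[str] = None) -> Optional[str]:
    """Return the language of `node` by walking up the tree via parentNode."""
    current = node
    while current is not None:
        # Only element nodes carry attributes; text and document nodes are skipped.
        if current.nodeType == Node.ELEMENT_NODE and current.hasAttribute("lang"):
            return current.getAttribute("lang")
        current = current.parentNode
    # Reached the top of the tree without a lang attribute: fall back to a
    # default supplied from outside the markup (e.g. from the HTTP response).
    return default


doc = parseString(
    '<div class="h-entry" lang="sv">'
    '<h1 class="p-name">En svensk titel</h1>'
    '<div class="e-content" lang="en">With an <em>english</em> summary</div>'
    "</div>"
)
h1 = doc.getElementsByTagName("h1")[0]
em = doc.getElementsByTagName("em")[0]
print(effective_lang(h1))  # sv (inherited from the h-entry)
print(effective_lang(em))  # en (nearest ancestor with lang is the e-content div)
```

A real parser would run this over a document produced by an HTML parser. Comparing the result for a property's element with the result for its parent item is one way to decide whether the language attribute needs to be added.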

Update: As @gRegorLove pointed out on IRC, it may be hard to add the proposed output without breaking backwards compatibility. The new output would therefore have to be introduced either in a new major version or, probably preferably, behind an opt-in feature flag for now. That way, those who want to use language data right away can enable it, while those who prefer to wait for a future major version before updating to support the new output can do so.

@gRegorLove
Member

I'm interested in working on this, as I'm trying to add mf2 parsing to https://github.com/fguillot/picoFeed and it currently supports language detection for XML feeds.

Recent conversation: https://indiewebcamp.com/irc/2016-05-07#t1462646589527

A tricky scenario that @voxpelli raised, involving nested p-* properties with their own languages: https://indiewebcamp.com/irc/2016-05-07#t1462651125104

@aaronpk
Member

aaronpk commented May 27, 2017

@gRegorLove I'm looking at the parsed result and it looks like it's including an html-lang property in the wrong place.

<div class="h-entry" lang="sv" id="postfrag123">
  <h1 class="p-name">En svensk titel</h1>
  <div class="e-content" lang="en">With an <em>english</em> summary</div>
  <div class="e-content">Och <em>svensk</em> huvudtext</div>
</div>

{
    "type": [
        "h-entry"
    ],
    "properties": {
        "name": [
            "En svensk titel"
        ],
        "content": [
            {
                "html": "With an <em>english<\/em> summary",
                "value": "With an english summary",
                "html-lang": "en"
            },
            {
                "html": "Och <em>svensk<\/em> huvudtext",
                "value": "Och svensk huvudtext",
                "html-lang": "sv"
            }
        ],
        "html-lang": "sv"
    }
}

The html-lang property in the content is correct, but there's also an html-lang property inside properties, which isn't what's described on the brainstorming page.
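To make the expected shape concrete, here is a consumer-side sketch in Python that moves the stray entry out of properties and up to the item level, next to type. It assumes, as the later comments in this thread indicate, that the item-level language belongs alongside type and properties. The function name hoist_item_lang is hypothetical, and the key is a parameter because the thread uses both "html-lang" and "lang".

```python
import copy
from typing import Any, Dict


def hoist_item_lang(item: Dict[str, Any], key: str = "html-lang") -> Dict[str, Any]:
    """Return a copy of a parsed mf2 item whose language entry sits next to
    "type" and "properties" instead of inside "properties"."""
    fixed = copy.deepcopy(item)  # leave the caller's parsed output untouched
    properties = fixed.get("properties", {})
    if key in properties and not isinstance(properties[key], list):
        # Real mf2 property values are always lists, so a bare string under
        # this key is the misplaced item-level language, not a property.
        fixed[key] = properties.pop(key)
    return fixed


parsed = {
    "type": ["h-entry"],
    "properties": {
        "name": ["En svensk titel"],
        "content": [
            {"html": "Och <em>svensk</em> huvudtext",
             "value": "Och svensk huvudtext", "html-lang": "sv"},
        ],
        "html-lang": "sv",
    },
}
fixed = hoist_item_lang(parsed)
print(fixed["html-lang"])                  # sv
print("html-lang" in fixed["properties"])  # False
```

The per-value language on the content entry is left alone, since that placement is the part the comment above calls correct.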

@jkphl
Contributor

jkphl commented May 27, 2017

Yeah... I had to solve this locally as well yesterday (it kept breaking iteration over the properties because its value isn't an array).
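Consumers usually rely on every property value being an array. A small Python sketch that flags values breaking that assumption and shows what a naive loop does with the string "sv"; the helper name non_list_properties is hypothetical.

```python
from typing import Any, Dict, List


def non_list_properties(item: Dict[str, Any]) -> List[str]:
    """Names of properties whose value is not a list."""
    return [
        name
        for name, values in item.get("properties", {}).items()
        if not isinstance(values, list)
    ]


buggy = {
    "type": ["h-entry"],
    "properties": {"name": ["En svensk titel"], "html-lang": "sv"},
}
print(non_list_properties(buggy))  # ['html-lang']

# A typical consumer loop assumes every value is a list. In Python a bare
# string does not raise here; it silently yields one character at a time.
for name, values in buggy["properties"].items():
    for value in values:
        print(name, value)
```

Silently iterating characters is arguably worse than an error, which is why a check like this is useful as a guard in tests.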

@aaronpk
Member

aaronpk commented May 27, 2017

I am moving the language parsing behind a feature flag until this is sorted out. That way you can opt in to have the language parsing happen, but must be aware that it's still experimental.
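A minimal sketch of the opt-in idea, in Python: the flag defaults to off, so existing consumers see unchanged output. The function embedded_value and the with_lang flag are hypothetical names for illustration, not php-mf2's actual API, and the key name is an assumption.

```python
from typing import Any, Dict, Optional

LANG_KEY = "lang"  # assumed key name; the thread discusses both "lang" and "html-lang"


def embedded_value(html: str, text: str, lang: Optional[str], *,
                   with_lang: bool = False) -> Dict[str, Any]:
    """Build an e-* property value. Language data is only emitted when the
    caller opts in, so the default output matches what existed before."""
    value: Dict[str, Any] = {"html": html, "value": text}
    if with_lang and lang:
        value[LANG_KEY] = lang
    return value


print(embedded_value("Och <em>svensk</em> huvudtext", "Och svensk huvudtext", "sv"))
print(embedded_value("Och <em>svensk</em> huvudtext", "Och svensk huvudtext", "sv",
                     with_lang=True))
```

Making the flag keyword-only keeps call sites explicit about opting in to experimental output.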

@jkphl
Contributor

jkphl commented May 27, 2017

OK. I'm generally interested, as other formats support languages as well. Still working on implementing it, though.

@gRegorLove
Member

Oops. I'll add some explicit tests for that and work on the fix.

@aaronpk
Member

aaronpk commented May 27, 2017

Fixed in #124!

I'll push out a new release with this change once #112 is done too!

@gRegorLove
Member

gRegorLove commented May 30, 2017

@aaronpk Before you push out a new release, we'll need to switch back to "html-lang" per https://chat.indieweb.org/microformats/2017-05-30/1496166813294000

Edit: disregard. Per later conversation, "lang" doesn't appear at the same level as any mf properties in the parsed results, so shouldn't cause conflicts.

jkphl added a commit to jkphl/micrometa that referenced this issue Dec 25, 2018
4 participants