Alphabetic characters in @after attribute #76

Open
jonathanrobie opened this issue Jun 14, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@jonathanrobie
Contributor

When the text of a node starts with punctuation, the @after attribute contains the last character of the word, and that character is cut from the node's text (for example, text "—λέγε" with after="ι"). This is a bug in our transform.

You can find examples of this using this expression:

//*[matches(@after, '\p{L}')]

Result:

<w role="adv" ref="ROM 15:25!1" after="" class="adv" xml:id="n45015025001" lemma="νυνί" normalized="νυνί" strong="3570" gloss="Now" domain="067002" ln="67.39" morph="ADV" unicode="—νυνὶ">—νυν</w>
<w role="v" ref="2CO 6:2!1" after="ι" class="verb" xml:id="n47006002001" lemma="λέγω" normalized="λέγει" strong="3004" number="singular" person="third" tense="present" voice="active" mood="indicative" gloss="He says" domain="033006" ln="33.69" morph="V-PAI-3S" unicode="—λέγει" frame="A0:n47006001011" subjref="n47006001011">—λέγε</w>
<w ref="2CO 12:2!8" after="ε" class="conj" xml:id="n47012002008" lemma="εἴτε" normalized="εἴτε" strong="1535" gloss="whether" domain="089010" ln="89.69" morph="CONJ" unicode="—εἴτε">—εἴτ</w>
<w role="v" ref="2CO 12:2!22" after="α" class="verb" xml:id="n47012002022" lemma="ἁρπάζω" normalized="ἁρπαγέντα" strong="726" number="singular" gender="masculine" case="accusative" tense="aorist" voice="passive" mood="participle" gloss="having been caught up" domain="018001" ln="18.4" morph="V-2APP-ASM" unicode="—ἁρπαγέντα" frame="A1:n47012002024">—ἁρπαγέντ</w>
<w ref="2CO 12:3!6" after="ε" class="conj" xml:id="n47012003006" lemma="εἴτε" normalized="εἴτε" strong="1535" gloss="whether" domain="089010" ln="89.69" morph="CONJ" unicode="—εἴτε">—εἴτ</w>
<w ref="JHN 4:2!1" after="ε" class="conj" xml:id="n43004002001" lemma="καίτοιγε" normalized="καίτοιγε" strong="2544" gloss="although indeed" domain="089011" ln="89.72" morph="CONJ" unicode="—καίτοιγε">—καίτοιγ</w>
<w ref="JHN 7:22!8" after="χ" class="adv" xml:id="n43007022008" lemma="οὐ" normalized="οὐχ" strong="3756" gloss="not" domain="069002" ln="69.3" morph="PRT-N" unicode="—οὐχ">—οὐ</w>
<w ref="EPH 5:9!1" after="" class="det" xml:id="n49005009001" lemma="" normalized="" strong="3588" number="singular" gender="masculine" case="nominative" gloss="-" domain="092004" ln="92.24" morph="T-NSM" unicode="—ὁ">—</w>
<w role="v" ref="EPH 5:10!1" after="ς" class="verb" xml:id="n49005010001" lemma="δοκιμάζω" normalized="δοκιμάζοντες" strong="1381" number="plural" gender="masculine" case="nominative" tense="present" voice="active" mood="participle" gloss="discerning" domain="027004" ln="27.45" morph="V-PAP-NPM" unicode="—δοκιμάζοντες" frame="A0:n49003001013 A1:n49005010004" subjref="n49003001013">—δοκιμάζοντε</w>
<w role="s" ref="HEB 7:20!7" after="" class="det" xml:id="n58007020007" lemma="" normalized="οἱ" strong="3588" number="plural" gender="masculine" case="nominative" gloss="those ones" domain="092004" ln="92.24" morph="T-NPM" unicode="—οἱ" referent="n58007008006">—ο</w>
<w ref="HEB 7:22!1" after="" class="prep" xml:id="n58007022001" lemma="κατά" normalized="κατά" strong="2596" gloss="By" domain="089005" ln="89.8" morph="PREP" unicode="—κατὰ">—κατ</w>
<w role="p" ref="GAL 2:6!7" after="ί" class="adj" type="interrogative" xml:id="n48002006007" lemma="ὁποῖος" normalized="ὁποῖοι" strong="3697" number="plural" gender="masculine" case="nominative" gloss="whatsoever" domain="058004" ln="58.30" morph="A-NPM" unicode="—ὁποῖοί">—ὁποῖο</w>
<w role="v" ref="ACT 22:2!1" after="ς" class="verb" xml:id="n44022002001" lemma="ἀκούω" normalized="ἀκούσαντες" strong="191" number="plural" gender="masculine" case="nominative" tense="aorist" voice="active" mood="participle" gloss="Having heard" domain="032001" ln="32.1" morph="V-AAP-NPM" unicode="—ἀκούσαντες" frame="A0:n44021040014" subjref="n44021040014">—ἀκούσαντε</w>
<w ref="LUK 2:35!1" after="" class="conj" xml:id="n42002035001" lemma="καί" normalized="καί" strong="2532" gloss="and" domain="089017" ln="89.93" morph="CONJ" unicode="—καὶ">—κα</w>
<w role="s" ref="LUK 23:51!1" after="ς" class="pron" type="demonstrative" xml:id="n42023051001" lemma="οὗτος" normalized="οὗτος" strong="3778" number="singular" gender="masculine" case="nominative" gloss="he" domain="092007" ln="92.29" morph="D-NSM" unicode="—οὗτος" referent="n42023050003">—οὗτο</w>
<w ref="1CO 9:15!22" after="" class="det" xml:id="n46009015022" lemma="" normalized="τό" strong="3588" number="singular" gender="neuter" case="accusative" gloss="the" domain="092004" ln="92.24" morph="T-ASN" unicode="—τὸ">—τ</w>

Here is the text of those nodes:

—νυν
—λέγε
—εἴτ
—ἁρπαγέντ
—εἴτ
—καίτοιγ
—οὐ
—
—δοκιμάζοντε
—ο
—κατ
—ὁποῖο
—ἀκούσαντε
—κα
—οὗτο
—τ
@jonathanrobie added the bug label on Jun 14, 2023
@jonathanrobie
Contributor Author

Note that you can get exactly the same set of results with this query:

//w[
  matches(substring(normalize-space(.), 1, 1), '\P{L}')
]

This confirms that the problem occurs when the first character is punctuation (strictly, any non-letter, since '\P{L}' matches every character that is not a letter).
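
For anyone checking this outside an XPath processor, here is a rough Python sketch of both queries. Python's built-in `re` module does not support `\p{L}`, so the sketch uses `unicodedata.category` instead. The sample keeps only a few attributes from the results above, and the `X 1:1!1` element is a made-up well-formed token added for contrast; the function names are illustrative and not part of the transform.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Attributes trimmed from the issue's results. The "X 1:1!1" element is a
# hypothetical well-formed token, added only for contrast.
SAMPLE = """<sentence>
  <w ref="2CO 6:2!1" after="ι" unicode="—λέγει">—λέγε</w>
  <w ref="X 1:1!1" after=" " unicode="λέγει">λέγει</w>
  <w ref="JHN 7:22!8" after="χ" unicode="—οὐχ">—οὐ</w>
</sentence>"""


def is_letter(ch: str) -> bool:
    # Any Unicode letter category (Lu, Ll, Lt, Lm, Lo), like the \p{L} class.
    return unicodedata.category(ch).startswith("L")


def letter_in_after(w: ET.Element) -> bool:
    # Mirrors //*[matches(@after, '\p{L}')]
    return any(is_letter(ch) for ch in w.get("after", ""))


def starts_with_non_letter(w: ET.Element) -> bool:
    # Mirrors matches(substring(normalize-space(.), 1, 1), '\P{L}')
    text = " ".join("".join(w.itertext()).split())
    return bool(text) and not is_letter(text[0])


root = ET.fromstring(SAMPLE)
by_after = {w.get("ref") for w in root.iter("w") if letter_in_after(w)}
by_text = {w.get("ref") for w in root.iter("w") if starts_with_non_letter(w)}

# On this sample both checks select the same two nodes.
print(sorted(by_after), by_after == by_text)
```

Comparing the two sets over the full data is a quick way to confirm that every bad @after value comes from a token with leading punctuation.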

@jcuenod

jcuenod commented Jun 14, 2023

Is the solution here that we need a "before" field (and a less naive implementation of @after)?
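
As a sketch of what a less naive split could look like, the function below scans inward from both ends of a token and returns a hypothetical (before, word, after) triple. It is an assumption about the approach, not the project's actual transform, and `split_token` and `is_word_char` are names invented here.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    # Letters (L*) and combining marks (M*) count as part of the word; marks
    # matter when accented Greek is stored in decomposed (NFD) form.
    return unicodedata.category(ch)[0] in ("L", "M")


def split_token(token: str) -> tuple[str, str, str]:
    """Split a raw token into (before, word, after)."""
    # Skip leading characters that are not part of the word.
    start = 0
    while start < len(token) and not is_word_char(token[start]):
        start += 1
    # Skip trailing characters that are not part of the word.
    end = len(token)
    while end > start and not is_word_char(token[end - 1]):
        end -= 1
    return token[:start], token[start:end], token[end:]


print(split_token("—λέγει"))  # leading dash goes to "before"
print(split_token("λέγει,"))  # trailing comma goes to "after"
```

One known limitation of this sketch: an elision mark at the end of a word is punctuation by category, so it would land in "after" unless it is special-cased.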

@jonathanrobie
Contributor Author

Maybe. I'm not sure we need a "before": it would be used only rarely, and it's easy to overlook the fields you don't often use. It might make more sense to put the "—" in the "after" of the previous word.
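
A minimal sketch of that alternative, assuming the input is a list of whitespace-separated tokens and each word becomes a record with "text" and "after": leading punctuation is appended to the previous word's "after" so that no "before" field is needed. The record shape and the names `to_words` and `split_token` are hypothetical, not the transform's real data model.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    # Letters and combining marks belong to the word.
    return unicodedata.category(ch)[0] in ("L", "M")


def split_token(token: str) -> tuple[str, str, str]:
    """Split a raw token into (leading punctuation, word, trailing punctuation)."""
    start = 0
    while start < len(token) and not is_word_char(token[start]):
        start += 1
    end = len(token)
    while end > start and not is_word_char(token[end - 1]):
        end -= 1
    return token[:start], token[start:end], token[end:]


def to_words(tokens: list[str]) -> list[dict[str, str]]:
    """Build word records, moving leading punctuation to the previous word."""
    words: list[dict[str, str]] = []
    pending = ""  # leading punctuation seen before any word exists
    for token in tokens:
        before, text, after = split_token(token)
        if words:
            # Attach e.g. the dash to the previous word's "after".
            words[-1]["after"] += before
        else:
            # No previous word. This sketch keeps the punctuation with the
            # first word's text; how to handle that case is an open question.
            pending += before
        if text:
            words.append({"text": pending + text, "after": after + " "})
            pending = ""
        elif words:
            # Punctuation-only token: keep the space that followed it.
            words[-1]["after"] += " "
    return words


for word in to_words(["Καιρῷ,", "—λέγει"]):
    print(repr(word["text"]), repr(word["after"]))
```

Concatenating text and after for every record reproduces the original spacing and punctuation, which makes it easy to check that nothing was dropped.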
