Alphabetic characters in @after attribute #76

jonathanrobie · 2023-06-14T15:55:30Z

When the text of a node starts with punctuation, the @after attribute contains the last character of the word, and the node text is missing that character. This is a bug in our transform.

You can find examples of this using this expression:

//*[matches(@after, '\p{L}')]

Result:

<w role="adv" ref="ROM 15:25!1" after="ὶ" class="adv" xml:id="n45015025001" lemma="νυνί" normalized="νυνί" strong="3570" gloss="Now" domain="067002" ln="67.39" morph="ADV" unicode="—νυνὶ">—νυν</w>
<w role="v" ref="2CO 6:2!1" after="ι" class="verb" xml:id="n47006002001" lemma="λέγω" normalized="λέγει" strong="3004" number="singular" person="third" tense="present" voice="active" mood="indicative" gloss="He says" domain="033006" ln="33.69" morph="V-PAI-3S" unicode="—λέγει" frame="A0:n47006001011" subjref="n47006001011">—λέγε</w>
<w ref="2CO 12:2!8" after="ε" class="conj" xml:id="n47012002008" lemma="εἴτε" normalized="εἴτε" strong="1535" gloss="whether" domain="089010" ln="89.69" morph="CONJ" unicode="—εἴτε">—εἴτ</w>
<w role="v" ref="2CO 12:2!22" after="α" class="verb" xml:id="n47012002022" lemma="ἁρπάζω" normalized="ἁρπαγέντα" strong="726" number="singular" gender="masculine" case="accusative" tense="aorist" voice="passive" mood="participle" gloss="having been caught up" domain="018001" ln="18.4" morph="V-2APP-ASM" unicode="—ἁρπαγέντα" frame="A1:n47012002024">—ἁρπαγέντ</w>
<w ref="2CO 12:3!6" after="ε" class="conj" xml:id="n47012003006" lemma="εἴτε" normalized="εἴτε" strong="1535" gloss="whether" domain="089010" ln="89.69" morph="CONJ" unicode="—εἴτε">—εἴτ</w>
<w ref="JHN 4:2!1" after="ε" class="conj" xml:id="n43004002001" lemma="καίτοιγε" normalized="καίτοιγε" strong="2544" gloss="although indeed" domain="089011" ln="89.72" morph="CONJ" unicode="—καίτοιγε">—καίτοιγ</w>
<w ref="JHN 7:22!8" after="χ" class="adv" xml:id="n43007022008" lemma="οὐ" normalized="οὐχ" strong="3756" gloss="not" domain="069002" ln="69.3" morph="PRT-N" unicode="—οὐχ">—οὐ</w>
<w ref="EPH 5:9!1" after="ὁ" class="det" xml:id="n49005009001" lemma="ὁ" normalized="ὁ" strong="3588" number="singular" gender="masculine" case="nominative" gloss="-" domain="092004" ln="92.24" morph="T-NSM" unicode="—ὁ">—</w>
<w role="v" ref="EPH 5:10!1" after="ς" class="verb" xml:id="n49005010001" lemma="δοκιμάζω" normalized="δοκιμάζοντες" strong="1381" number="plural" gender="masculine" case="nominative" tense="present" voice="active" mood="participle" gloss="discerning" domain="027004" ln="27.45" morph="V-PAP-NPM" unicode="—δοκιμάζοντες" frame="A0:n49003001013 A1:n49005010004" subjref="n49003001013">—δοκιμάζοντε</w>
<w role="s" ref="HEB 7:20!7" after="ἱ" class="det" xml:id="n58007020007" lemma="ὁ" normalized="οἱ" strong="3588" number="plural" gender="masculine" case="nominative" gloss="those ones" domain="092004" ln="92.24" morph="T-NPM" unicode="—οἱ" referent="n58007008006">—ο</w>
<w ref="HEB 7:22!1" after="ὰ" class="prep" xml:id="n58007022001" lemma="κατά" normalized="κατά" strong="2596" gloss="By" domain="089005" ln="89.8" morph="PREP" unicode="—κατὰ">—κατ</w>
<w role="p" ref="GAL 2:6!7" after="ί" class="adj" type="interrogative" xml:id="n48002006007" lemma="ὁποῖος" normalized="ὁποῖοι" strong="3697" number="plural" gender="masculine" case="nominative" gloss="whatsoever" domain="058004" ln="58.30" morph="A-NPM" unicode="—ὁποῖοί">—ὁποῖο</w>
<w role="v" ref="ACT 22:2!1" after="ς" class="verb" xml:id="n44022002001" lemma="ἀκούω" normalized="ἀκούσαντες" strong="191" number="plural" gender="masculine" case="nominative" tense="aorist" voice="active" mood="participle" gloss="Having heard" domain="032001" ln="32.1" morph="V-AAP-NPM" unicode="—ἀκούσαντες" frame="A0:n44021040014" subjref="n44021040014">—ἀκούσαντε</w>
<w ref="LUK 2:35!1" after="ὶ" class="conj" xml:id="n42002035001" lemma="καί" normalized="καί" strong="2532" gloss="and" domain="089017" ln="89.93" morph="CONJ" unicode="—καὶ">—κα</w>
<w role="s" ref="LUK 23:51!1" after="ς" class="pron" type="demonstrative" xml:id="n42023051001" lemma="οὗτος" normalized="οὗτος" strong="3778" number="singular" gender="masculine" case="nominative" gloss="he" domain="092007" ln="92.29" morph="D-NSM" unicode="—οὗτος" referent="n42023050003">—οὗτο</w>
<w ref="1CO 9:15!22" after="ὸ" class="det" xml:id="n46009015022" lemma="ὁ" normalized="τό" strong="3588" number="singular" gender="neuter" case="accusative" gloss="the" domain="092004" ln="92.24" morph="T-ASN" unicode="—τὸ">—τ</w>

Here is the text of those nodes:

—νυν
—λέγε
—εἴτ
—ἁρπαγέντ
—εἴτ
—καίτοιγ
—οὐ
—
—δοκιμάζοντε
—ο
—κατ
—ὁποῖο
—ἀκούσαντε
—κα
—οὗτο
—τ
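A minimal Python sketch of the symptom, using four (text, @after, @unicode) triples copied from the results above. It only describes what these rows look like, not what the transform actually does: the helper names `leading_non_letters` and `describe` are made up for illustration. For each row it checks whether text plus @after rejoins to the @unicode value, and compares the length of @after with the number of leading non-letter characters.

```python
import unicodedata

# (node text, @after, @unicode) copied from four of the results above
ROWS = [
    ("—νυν", "ὶ", "—νυνὶ"),
    ("—οὐ", "χ", "—οὐχ"),
    ("—", "ὁ", "—ὁ"),
    ("—τ", "ὸ", "—τὸ"),
]


def nfc(s):
    # Normalize so an accented Greek letter counts as one code point
    return unicodedata.normalize("NFC", s)


def leading_non_letters(s):
    """Count characters before the first letter or combining mark."""
    n = 0
    for ch in s:
        if unicodedata.category(ch)[0] in ("L", "M"):
            break
        n += 1
    return n


def describe(text, after, surface):
    """Summarize one row (hypothetical helper, not part of the transform)."""
    text, after, surface = nfc(text), nfc(after), nfc(surface)
    return {
        "rejoins": text + after == surface,
        "after_len": len(after),
        "leading": leading_non_letters(surface),
    }


reports = [describe(*row) for row in ROWS]
for row, report in zip(ROWS, reports):
    print(row[2], report)
```

In these rows the split between text and @after sits one character too early, which is the same as the number of leading punctuation characters.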

jonathanrobie · 2023-06-14T15:59:54Z

Note that you can get exactly the same set of results with this query:

//w[
  matches(substring(normalize-space(.), 1, 1), '\P{L}')
]

This confirms that the problem occurs when the first character is punctuation.
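For anyone checking this outside an XPath processor, here is a rough Python equivalent of the two queries. It is a sketch under assumptions: the `SAMPLE` document is made up for illustration (rows 0 and 2 use values from the results above, rows 1 and 3 are invented control rows), and `unicodedata` categories starting with "L" stand in for `\p{L}`, since Python's built-in `re` module does not support that escape.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Illustrative sample: rows 0 and 2 come from the results above,
# rows 1 and 3 are invented control rows with clean @after values.
SAMPLE = """<s>
  <w after="ὶ" unicode="—νυνὶ">—νυν</w>
  <w after=" " unicode="λέγει">λέγει</w>
  <w after="ὁ" unicode="—ὁ">—</w>
  <w after=", " unicode="εἴτε,">εἴτε</w>
</s>"""


def is_letter(ch):
    # Stand-in for the XPath regex class \p{L}
    return unicodedata.category(ch).startswith("L")


def after_has_letter(w):
    # Mirrors //*[matches(@after, '\p{L}')]
    return any(is_letter(ch) for ch in w.get("after", ""))


def starts_with_non_letter(w):
    # Mirrors //w[matches(substring(normalize-space(.), 1, 1), '\P{L}')]
    text = " ".join("".join(w.itertext()).split())
    return bool(text) and not is_letter(text[0])


words = ET.fromstring(SAMPLE).findall(".//w")
by_after = [i for i, w in enumerate(words) if after_has_letter(w)]
by_first = [i for i, w in enumerate(words) if starts_with_non_letter(w)]
print(by_after, by_first)
```

On this sample both lists pick out the same rows, which is the same agreement the two XPath queries show on the full data.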

jcuenod · 2023-06-14T19:57:16Z

Is the solution here that we need a "before" field (and a less naive implementation of @after)?

jonathanrobie · 2023-06-14T20:00:03Z

Maybe. I'm not sure we need a "before". It would be used only rarely, and it's easy to overlook fields you don't often use. It might make more sense to put the -- in the @after of the previous word.
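A minimal Python sketch of that idea, not the actual transform. The function names `split_token` and `assign_after` are hypothetical, tokens are assumed to be already split on whitespace, and whitespace handling in @after is left out entirely. Leading punctuation is moved onto the @after of the previous word, and trailing punctuation stays with the current word.

```python
import unicodedata


def is_word_char(ch):
    # Letters and combining marks count as part of the word
    return unicodedata.category(ch)[0] in ("L", "M")


def split_token(surface):
    """Split a surface form into (leading, word, trailing) parts."""
    start = 0
    while start < len(surface) and not is_word_char(surface[start]):
        start += 1
    end = len(surface)
    while end > start and not is_word_char(surface[end - 1]):
        end -= 1
    return surface[:start], surface[start:end], surface[end:]


def assign_after(surfaces):
    """Return (word, after) pairs for a sequence of surface forms."""
    pairs = []
    for surface in surfaces:
        leading, word, trailing = split_token(surface)
        if leading and pairs:
            # Attach leading punctuation to the previous word's after
            prev_word, prev_after = pairs[-1]
            pairs[-1] = (prev_word, prev_after + leading)
        elif leading:
            # No previous word: this sketch keeps it on the word text
            word = leading + word
        if word:
            pairs.append((word, trailing))
    return pairs


# Illustrative sequence, not a real verse
surfaces = ["καὶ,", "—νυνὶ", "—ὁ"]
pairs = assign_after(surfaces)
for word, after in pairs:
    print(repr(word), repr(after))
```

With this split, no @after value contains a letter, and concatenating word and after across the sequence still reproduces the original characters. The open case is punctuation at the very start of a sequence, where there is no previous word to carry it.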

jonathanrobie added the bug label Jun 14, 2023
