Ketiv/Qere without catchWord? #80

Closed
jonathanrobie opened this issue Jul 7, 2021 · 35 comments

@jonathanrobie
Contributor

jonathanrobie commented Jul 7, 2021

In most instances of Ketiv/Qere, the pattern seems to be:

  • The Ketiv appears as a w element
  • Immediately after the Ketiv, there is a note
  • The note contains a catchWord that corresponds to the word just before the note, representing the Ketiv, and a rdg, representing the Qere.

For instance:

          <w lemma="3318" morph="HVhv2ms" id="01Pdv">הוצא</w>
          <note type="variant">
            <catchWord>הוצא</catchWord>
            <rdg type="x-qere">
              <w lemma="3318" morph="HVhv2ms" id="01S7t">הַיְצֵ֣א</w>
            </rdg>
          </note>

Am I right to expect that for all readings?

The following 9 instances do not follow this pattern because the note does not contain a catchWord:

<note type="variant">
  <rdg type="x-qere">
    <w lemma="6635 b" n="0.0" morph="HNcbpa" id="12kTe">צְבָא֖וֹת</w>
  </rdg>
</note>
<note type="variant">
  <rdg type="x-qere">
    <w lemma="1121 a" n="1.1.0" morph="HNcmpc/Sp3ms" id="12uR9">בָּנָי/ו֙</w>
  </rdg>
</note>
<note  type="variant">
  <rdg type="x-qere">
    <w lemma="1121 a" morph="HNcmpc" id="07Kvd">בְּנֵ֣י</w>
  </rdg>
</note>
<note type="variant">
  <rdg type="x-qere">
    <w lemma="6578" n="0" morph="HNp" id="10dp9">פְּרָֽת</w>
  </rdg>
</note>
<note type="variant">
  <rdg type="x-qere">
    <w lemma="376" n="1.0" morph="HNcmsa" id="10Gmc">אִ֖ישׁ</w>
  </rdg>
</note>
<note type="variant">
  <rdg type="x-qere">
    <w lemma="935" n="1.0" morph="HVqrmpa" id="24hu2">בָּאִ֖ים</w>
  </rdg>
</note>
<note type="variant">
  <rdg type="x-qere">
    <w lemma="l" n="1.2.0" morph="HR/Sp3fs" id="24CRs">לָ/הּ֙</w>
  </rdg>
</note>
<note type="variant">
  <rdg type="x-qere">
    <w lemma="413" n="0.0" morph="HR/Sp1cs" id="08k23">אֵלַ֖/י</w>
  </rdg>
</note>
<note type="variant">
  <rdg type="x-qere">
    <w lemma="413" n="0.1" morph="HR/Sp1cs" id="086C3">אֵלַ֔/י</w>
  </rdg>
</note>
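
A minimal sketch of how such notes could be listed, assuming Python's standard `xml.etree.ElementTree` and a simplified sample without the OSIS namespace (the real files declare one, so the tag names would need qualifying). The `notes_without_catchword` helper, the `Example.1.1` osisID, and the trimmed attributes are hypothetical and only illustrate the check:

```python
import xml.etree.ElementTree as ET

# Simplified sample: the first variant note has a catchWord, the second does not.
# Namespaces and most attributes are left out for brevity (an assumption).
SAMPLE = """<verse osisID="Example.1.1">
  <w lemma="3318">הוצא</w>
  <note type="variant">
    <catchWord>הוצא</catchWord>
    <rdg type="x-qere"><w lemma="3318">הַיְצֵ֣א</w></rdg>
  </note>
  <note type="variant">
    <rdg type="x-qere"><w lemma="376">אִ֖ישׁ</w></rdg>
  </note>
</verse>"""


def notes_without_catchword(root):
    """Return variant notes that have no catchWord child (Qere without a Ketiv)."""
    return [
        note
        for note in root.iter("note")
        if note.get("type") == "variant" and note.find("catchWord") is None
    ]


root = ET.fromstring(SAMPLE)
missing = notes_without_catchword(root)
for note in missing:
    print(note.find("rdg/w").get("lemma"))
```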
@DavidTroidl
Member

There are qere elements without corresponding ketiv in the WLC source.

@jonathanrobie
Contributor Author

What I really need right now is an algorithm to construct the Qere reading of a text. I'm sure someone else is doing that. Is this documented somewhere?

I also see 23 instances where the catchWord corresponds to more than one preceding w element, but it's not clear how I am supposed to identify the start and end of the Ketiv reading without using string operations and comparing as I go. Here are some:

  <out count="1" osisID="1Sam.9.1">
    <w lemma="c/1961" morph="HC/Vqw3ms" id="09wci">וַֽ/יְהִי</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="376" morph="HNcmsa" id="09MpA">אִ֣ישׁ</w>
    <w lemma="m/1121 a" morph="HR/Np" id="09Una">מ/בן</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="3225" morph="HNp" id="09jgC">ימין</w>
    <note type="variant">
      <catchWord>מ/בן־ימין</catchWord>
      <rdg type="x-qere">
        <w lemma="m/1144" n="1.0.1" morph="HR/Np" id="09EC9">מִ/בִּנְיָמִ֗ין</w>
      </rdg>
    </note>
  </out>
  <out count="2" osisID="1Sam.20.2">
    <w lemma="c/559" morph="HC/Vqw3ms" id="09EWb">וַ/יֹּ֨אמֶר</w>
    <w lemma="l" morph="HR/Sp3ms" id="09R77">ל֣/וֹ</w>
    <w lemma="2486" n="1.2.0" morph="HTj/Sh" id="09rin">חָלִילָ/ה֮</w>
    <w lemma="3808" morph="HTn" id="09kCJ">לֹ֣א</w>
    <w lemma="4191" n="1.2" morph="HVqi2ms" id="09b5f">תָמוּת֒</w>
    <w lemma="2009" n="1.1.1.1" morph="HTm" id="091nK">הִנֵּ֡ה</w>
    <w lemma="l" morph="HR/Sp3ms" id="09UFD">ל/ו</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="6213 a" morph="HVqp3ms" id="09atM">עשה</w>
    <note type="variant">
      <catchWord>ל/ו־עשה</catchWord>
      <rdg type="x-qere">
        <w lemma="3808" morph="HTn" id="09uGu">לֹֽא</w>
        <seg type="x-maqqef">־</seg>
        <w lemma="6213 a" morph="HVqi3ms" id="09dRu">יַעֲשֶׂ֨ה</w>
      </rdg>
    </note>
  </out>
  <out count="3" osisID="1Sam.24.9">
    <note>KJV:1Sam.24.8</note>
    <w lemma="c/6965 b" morph="HC/Vqw3ms" id="09WJN">וַ/יָּ֨קָם</w>
    <w lemma="1732" n="1.1.1.0" morph="HNp" id="09BZ4">דָּוִ֜ד</w>
    <w lemma="310 a" morph="HR" id="09TCU">אַחֲרֵי</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="3651 c" n="1.1.1" morph="HD" id="09qjr">כֵ֗ן</w>
    <w lemma="c/3318" n="1.1.0" morph="HC/Vqw3ms" id="09wRN">וַ/יֵּצֵא֙</w>
    <w lemma="4480 a" morph="HR" id="0984p">מן</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="d/4631" morph="HTd/Ncfsa" id="09Mw5">ה/מערה</w>
    <note type="variant">
      <catchWord>מן־ה/מערה</catchWord>
      <rdg type="x-qere">
        <w lemma="m/d/4631" n="1.1" morph="HR/Td/Ncfsa" id="098Qr">מֵֽ/הַ/מְּעָרָ֔ה</w>
      </rdg>
    </note>
  </out>
  <out count="4" osisID="Isa.44.24">
    <w lemma="3541" morph="HD" id="23SM6">כֹּֽה</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="559" morph="HVqp3ms" id="237aJ">אָמַ֤ר</w>
    <w lemma="3068" n="1.1.0" morph="HNp" id="232XC">יְהוָה֙</w>
    <w lemma="1350 a" n="1.1" morph="HVqrmsc/Sp2ms" id="23g2n">גֹּאֲלֶ֔/ךָ</w>
    <w lemma="c/3335" n="1.0" morph="HC/Vqrmsc/Sp2ms" id="23BLL">וְ/יֹצֶרְ/ךָ֖</w>
    <w lemma="m/990" n="1" morph="HR/Ncfsa" id="23jck">מִ/בָּ֑טֶן</w>
    <w lemma="595" morph="HPp1cs" id="23Mxs">אָנֹכִ֤י</w>
    <w lemma="3068" n="0.2.0" morph="HNp" id="23y1n">יְהוָה֙</w>
    <w lemma="6213 a" morph="HVqrmsa" id="23FQH">עֹ֣שֶׂה</w>
    <w lemma="3605" n="0.2" morph="HNcmsa" id="23Xv1">כֹּ֔ל</w>
    <w lemma="5186" morph="HVqrmsa" id="23yTs">נֹטֶ֤ה</w>
    <w lemma="8064" n="0.1.0" morph="HNcmpa" id="23ZN9">שָׁמַ֨יִם֙</w>
    <w lemma="l/905" n="0.1" morph="HR/Ncmsc/Sp1cs" id="237wi">לְ/בַדִּ֔/י</w>
    <w lemma="7554" morph="HVqrmsc" id="23ogM">רֹקַ֥ע</w>
    <w lemma="d/776" n="0.0" morph="HTd/Ncbsa" id="23sdY">הָ/אָ֖רֶץ</w>
    <w lemma="4325" morph="HNcmpc" id="23Pzf">מי</w>
    <w lemma="854" morph="HR/Sp1cs" id="23yH3">את/י</w>
    <note type="variant">
      <catchWord>מי את/י</catchWord>
      <rdg type="x-qere">
        <w lemma="m/854" n="0" morph="HR/R/Sp1cs" id="23BvR">מֵ/אִתִּֽ/י</w>
      </rdg>
    </note>
  </out>
  <out count="5" osisID="Isa.52.5">
    <w lemma="c/6258" morph="HC/D" id="23c5v">וְ/עַתָּ֤ה</w>
    <w lemma="4100" morph="HTi" id="235WL">מי</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="l" morph="HR/Sp1cs" id="23JfN">ל/י</w>
    <note type="variant">
      <catchWord>מי־ל/י</catchWord>
      <rdg type="x-qere">
        <w lemma="4100" morph="HTi" id="23VCc">מַה</w>
        <seg type="x-maqqef">־</seg>
        <w lemma="l" morph="HR/Sp1cs" id="23aeF">לִּ/י</w>
      </rdg>
    </note>
  </out>
  <out count="6" osisID="2Chr.34.6">
    <w lemma="c/b/5892 b" morph="HC/R/Ncfpc" id="14Mu8">וּ/בְ/עָרֵ֨י</w>
    <w lemma="4519" morph="HNp" id="14kJu">מְנַשֶּׁ֧ה</w>
    <w lemma="c/669" n="1.0.0" morph="HC/Np" id="143VV">וְ/אֶפְרַ֛יִם</w>
    <w lemma="c/8095" n="1.0" morph="HC/Np" id="14yH2">וְ/שִׁמְע֖וֹן</w>
    <w lemma="c/5704" morph="HC/R" id="14xfs">וְ/עַד</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="5321" n="1" morph="HNp" id="14TB9">נַפְתָּלִ֑י</w>
    <w lemma="b/2022" morph="HR/Ncmsc" id="14iDc">ב/הר</w>
    <w lemma="1004 b" morph="HNcmpc/Sp3mp" id="14rFV">בתי/הם</w>
    <note type="variant">
      <catchWord>ב/הר בתי/הם</catchWord>
      <rdg type="x-qere">
        <w lemma="b/2719" n="0.0" morph="HR/Ncfpc/Sp3mp" id="14CXE">בְּ/חַרְבֹתֵי/הֶ֖ם</w>
      </rdg>
    </note>
  </out>
  <out count="7" osisID="2Kgs.6.25">
    <w lemma="c/1961" morph="HC/Vqw3ms" id="12sGR">וַ/יְהִ֨י</w>
    <w lemma="7458" morph="HNcmsa" id="12FKm">רָעָ֤ב</w>
    <w lemma="1419 a" n="1.1.0" morph="HAamsa" id="12eb4">גָּדוֹל֙</w>
    <w lemma="b/8111" n="1.1" morph="HR/Np" id="12fqb">בְּ/שֹׁ֣מְר֔וֹן</w>
    <w lemma="c/2009" n="1.0" morph="HC/Tm" id="12DUs">וְ/הִנֵּ֖ה</w>
    <w lemma="6696 a" morph="HVqrmpa" id="12NwS">צָרִ֣ים</w>
    <w lemma="5921 a" n="1" morph="HR/Sp3fs" id="12SdY">עָלֶ֑י/הָ</w>
    <w lemma="5704" morph="HR" id="121bU">עַ֣ד</w>
    <w lemma="1961" morph="HVqc" id="12jno">הֱי֤וֹת</w>
    <w lemma="7218 a" morph="HNcmsc" id="12E7d">רֹאשׁ</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="2543" n="0.1.0" morph="HNcbsa" id="12z2B">חֲמוֹר֙</w>
    <w lemma="b/8084" morph="HR/Acbpa" id="12WGp">בִּ/שְׁמֹנִ֣ים</w>
    <w lemma="3701" n="0.1" morph="HNcmsa" id="12CRu">כֶּ֔סֶף</w>
    <w lemma="c/7255" n="0.0.0" morph="HC/Ncmsc" id="12fwL">וְ/רֹ֛בַע</w>
    <w lemma="d/6894" morph="HTd/Ncmsa" id="12FAr">הַ/קַּ֥ב</w>
    <w lemma="2755" morph="HNcmsc" id="12TtK">חרי</w>
    <w lemma="3123" morph="HNcfpa" id="12vw5">יונים</w>
    <note type="variant">
      <catchWord>חרייונים</catchWord>
      <rdg type="x-qere">
        <w lemma="1686" n="0.0" morph="HNcmpa" id="12Bqs">דִּבְיוֹנִ֖ים</w>
      </rdg>
    </note>
  </out>
  <out count="10" osisID="Ezek.42.9">
    <w lemma="c/m/8478" morph="HC/R/R/Sd" id="26Mwv">ו/מ/תחת/ה</w>
    <w lemma="3957" morph="HNcfpa" id="26DfV">לשכות</w>
    <note type="variant">
      <catchWord>ו/מ/תחת/ה לשכות</catchWord>
      <rdg type="x-qere">
        <w lemma="c/m/8478" n="1.0" morph="HC/R/R" id="262Lg">וּ/מִ/תַּ֖חַת</w>
        <w lemma="d/3957" morph="HTd/Ncfpa" id="26CYE">הַ/לְּשָׁכ֣וֹת</w>
      </rdg>
    </note>
  </out>

  <out count="11" osisID="Judg.16.25">
    <w lemma="c/1961" n="1.2.0" morph="HC/Vqw3ms" id="07JGX">וַֽ/יְהִי֙</w>
    <w lemma="3588 a" morph="HC" id="07rb5">כי</w>
    <w lemma="2896 a" morph="HVqp3ms" id="07saF">טוב</w>
    <note type="variant">
      <catchWord>כי טוב</catchWord>
      <rdg type="x-qere">
        <w lemma="k/2896 a" morph="HR/Vqc" id="07URH">כְּ/ט֣וֹב</w>
      </rdg>
    </note>
  </out>
  <out count="12" osisID="2Sam.21.12">
    <w lemma="c/3212" morph="HC/Vqw3ms" id="10ngQ">וַ/יֵּ֣לֶךְ</w>
    <w lemma="1732" n="1.2.2" morph="HNp" id="106FS">דָּוִ֗ד</w>
    <w lemma="c/3947" n="1.2.1.0" morph="HC/Vqw3ms" id="106U7">וַ/יִּקַּ֞ח</w>
    <w lemma="853" morph="HTo" id="1075S">אֶת</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="6106" morph="HNcfpc" id="10Fo2">עַצְמ֤וֹת</w>
    <w lemma="7586" n="1.2.1" morph="HNp" id="10Cb1">שָׁאוּל֙</w>
    <w lemma="c/853" morph="HC/To" id="10z1M">וְ/אֶת</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="6106" n="1.2.0" morph="HNcfpc" id="10e7f">עַצְמוֹת֙</w>
    <w lemma="3083" morph="HNp" id="10pst">יְהוֹנָתָ֣ן</w>
    <w lemma="1121 a" n="1.2" morph="HNcmsc/Sp3ms" id="10THR">בְּנ֔/וֹ</w>
    <w lemma="m/854" n="1.1" morph="HR/R" id="10aXS">מֵ/אֵ֕ת</w>
    <w lemma="1167" n="1.0" morph="HNcmpc" id="10JeL">בַּעֲלֵ֖י</w>
    <w lemma="3003" morph="HNp" id="10Kff">יָבֵ֣ישׁ</w>
    <w lemma="1568" n="1" morph="HNp" id="10YxZ">גִּלְעָ֑ד</w>
    <w lemma="834 a" morph="HTr" id="10jf2">אֲשֶׁר֩</w>
    <w lemma="1589" morph="HVqp3cp" id="10cuN">גָּנְב֨וּ</w>
    <w lemma="853" n="0.1.1.0" morph="HTo/Sp3mp" id="109eN">אֹתָ֜/ם</w>
    <w lemma="m/7339" morph="HR/Ncfsc" id="10rVU">מֵ/רְחֹ֣ב</w>
    <w lemma="1052+" morph="HNp HNp HTr" id="103wq">בֵּֽית שַׁ֗ן אֲשֶׁ֨ר</w>
    <seg type="x-maqqef">־</seg>
    <w lemma="8518" morph="HVqp3cp/Sp3mp" id="104P3">תלו/ם</w>
    <note type="variant">
      <catchWord>תלו/ם</catchWord>
      <rdg type="x-qere">
        <w lemma="8518" morph="HVqp3cp/Sp3mp" id="102PP">תְּלָא֥וּ/ם</w>
      </rdg>
    </note>
    <w lemma="8033" morph="HD" id="10nn6">שם</w>
    <w lemma="d/6430" morph="HTd/Ngmpa" id="10srQ">ה/פלשתים</w>
    <note type="variant">
      <catchWord>שם ה/פלשתים</catchWord>
      <rdg type="x-qere">
        <w lemma="8033" n="0.1.0" morph="HD/Sd" id="10gnL">שָׁ֨מָּ/ה֙</w>
        <w lemma="6430" n="0.1" morph="HNgmpa" id="10u2Q">פְּלִשְׁתִּ֔ים</w>
      </rdg>
    </note>
  </out>

@jonathanrobie
Contributor Author

Wouldn't this be a lot easier if all readings were put in an app element with a rdg element for each reading?

<app>
      <rdg type="x-ketiv">
        <w lemma="3588 a" morph="HC" id="07rb5">כי</w>
        <w lemma="2896 a" morph="HVqp3ms" id="07saF">טוב</w>
      </rdg>
      <rdg type="x-qere">
        <w lemma="k/2896 a" morph="HR/Vqc" id="07URH">כְּ/ט֣וֹב</w>
      </rdg>
</app>

@DavidTroidl
Member

is not an OSIS element. This would also disrupt the choice of using elements for the qere, because that is really what they are.
I checked the schema and found that @type is allowed on elements. So I would propose putting @type="x-ketiv" on the ketiv elements. If that seems reasonable, I may take some time to implement it.

@DavidTroidl
Member

Sorry, the elements disappeared. app is not an OSIS element. We are using note elements for qere, because that is what they are.
@type is allowed on a w element, so we could use @type="x-ketiv". Does that make sense?

@jonathanrobie
Contributor Author

jonathanrobie commented Jul 7, 2021

Yes, I think that could work, if I understand correctly. As far as timing goes, it's possible that I can produce this and do a pull request, not promising just yet, but that could happen.

You are proposing this?

    <w lemma="c/m/8478" morph="HC/R/R/Sd" id="26Mwv"  type="x-ketiv">ו/מ/תחת/ה</w>
    <w lemma="3957" morph="HNcfpa" id="26DfV"  type="x-ketiv">לשכות</w>
    <note type="variant">
      <catchWord>ו/מ/תחת/ה לשכות</catchWord>
      <rdg type="x-qere">
        <w lemma="c/m/8478" n="1.0" morph="HC/R/R" id="262Lg">וּ/מִ/תַּ֖חַת</w>
        <w lemma="d/3957" morph="HTd/Ncfpa" id="26CYE">הַ/לְּשָׁכ֣וֹת</w>
      </rdg>
    </note>

@DavidTroidl
Member

@ just stands for attribute, so it would be type="x-ketiv".
On the 1Sam.9.1 example, I think you would also have to do the maqqef:
<seg type="x-ketiv">־</seg>

@jonathanrobie
Contributor Author

jonathanrobie commented Jul 8, 2021

Sorry, the elements disappeared. app is not an OSIS element. We are using note elements for qere, because that is what they are.
@type is allowed on a w element, so we could use @type="x-ketiv". Does that make sense?

rdgGroup is an OSIS element. Could we use that instead of app?

Perhaps:

   <verse osisID="Ezek.42.9">
      <rdgGroup type="variant">
         <rdg type="x-ketiv">
            <w lemma="c/m/8478" morph="HC/R/R/Sd" id="26Mwv">ו/מ/תחת/ה</w>
            <w lemma="3957" morph="HNcfpa" id="26DfV">לשכות</w>            
         </rdg>
         <rdg type="x-qere">
            <w lemma="c/m/8478" n="1.0" morph="HC/R/R" id="262Lg">וּ/מִ/תַּחַת</w>
            <w lemma="d/3957" morph="HTd/Ncfpa" id="26CYE">הַ/לְּשָׁכ֣וֹת</w>
         </rdg>
      </rdgGroup>
      <w lemma="d/428" n="1" morph="HTd/Pdxcp" id="26FUJ">הָ/אֵ֑לֶּה</w>
      <rdgGroup type="variant">
         <rdg type="x-ketiv">
            <w lemma="d/3996" morph="HTd/Ncmsa" id="26pCg">ה/מבוא</w>
         </rdg>
         <rdg type="x-qere">
            <w lemma="d/935" n="0.2.0" morph="HTd/Ncmsa" id="26wTv">הַ/מֵּבִיא֙</w>
         </rdg>
      </rdgGroup>
      <w lemma="m/d/6921" n="0.2" morph="HR/Td/Ncmsa" id="26rrD">מֵֽ/הַ/קָּדִ֔ים</w>
      <w lemma="b/935" morph="HR/Vqc/Sp3ms" id="261fi">בְּ/בֹא֣/וֹ</w>
      <w lemma="l/2007" n="0.1" morph="HR/Pp3fp" id="26i6h">לָ/הֵ֔נָּה</w>
      <w lemma="m/d/2691 a" n="0.0" morph="HR/Td/Ncbsa" id="26rx5">מֵֽ/הֶ/חָצֵ֖ר</w>
      <w lemma="d/2435" n="0" morph="HTd/Aafsa" id="26sM6">הַ/חִצֹנָֽה</w>
      <seg type="x-sof-pasuq">׃</seg>
   </verse>   

@DavidTroidl
Member

The rdgGroup is not allowed as a direct child of a verse element. It appears to be designed to be inside a note element.
In addition, I would really prefer the format as it stands. Adding the type="x-ketiv" should resolve the issue.

@jonathanrobie
Contributor Author

jonathanrobie commented Jul 10, 2021 via email

@jonathanrobie
Contributor Author

I have a transformation that seems to work, but there's a problem with maqqef, which already uses the @type attribute.

I can't do this because an element cannot have two type attributes:

  <w type="x-ketiv" lemma="m/1121 a" morph="HR/Np" id="09Una">מ/בן</w>
  <seg type="x-ketiv" type="x-maqqef">־</seg>
  <w type="x-ketiv" lemma="3225" morph="HNp" id="09jgC">ימין</w>
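
As a quick check of that constraint, here is a small sketch using Python's standard `xml.etree.ElementTree`; the `is_well_formed` helper is hypothetical. A repeated attribute is a well-formedness error, so the parser rejects the element before schema validation is even attempted:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The same attribute name twice on one element: rejected by the parser.
duplicate = '<seg type="x-ketiv" type="x-maqqef">־</seg>'
# A single type attribute: accepted.
single = '<seg type="x-maqqef">־</seg>'

print(is_well_formed(duplicate), is_well_formed(single))
```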

I could provide one type attribute with two tokens, but that's likely to mess up existing software if it is assuming there can only be one value in the type attribute:

  <w type="x-ketiv" lemma="m/1121 a" morph="HR/Np" id="09Una">מ/בן</w>
  <seg type="x-ketiv x-maqqef">־</seg>
  <w type="x-ketiv" lemma="3225" morph="HNp" id="09jgC">ימין</w>

Or I could simply leave out the ketiv marking for the maqqef:

  <w type="x-ketiv" lemma="m/1121 a" morph="HR/Np" id="09Una">מ/בן</w>
  <seg type="x-maqqef">־</seg>
  <w type="x-ketiv" lemma="3225" morph="HNp" id="09jgC">ימין</w>

Or I could add some other attribute. Certainly not k, but let me use that as a placeholder to illustrate what this would look like:

<verse osisID="1Sam.9.1">
  <w lemma="c/1961" morph="HC/Vqw3ms" id="09wci">וַֽ/יְהִי</w>
  <seg type="x-maqqef">־</seg>
  <w lemma="376" morph="HNcmsa" id="09MpA">אִ֣ישׁ</w>
  <w k="x-ketiv" lemma="m/1121 a" morph="HR/Np" id="09Una">מ/בן</w>
  <seg k="x-ketiv" type="x-maqqef">־</seg>
  <w k="x-ketiv" lemma="3225" morph="HNp" id="09jgC">ימין</w>
  <note type="variant">
    <catchWord>מ/בן־ימין</catchWord>
    <rdg type="x-qere">
      <w lemma="m/1144" n="1.0.1" morph="HR/Np" id="09EC9">מִ/בִּנְיָמִ֗ין</w>
    </rdg>
  </note>
  <note n="a">Adaptations to a Qere which L and BHS, by their design, do not indicate.</note>

Is there a suitable attribute name to use instead of k here? Is there a better option that I have not considered?

I can generate a patch quickly once we agree on the format.

@DavidTroidl
Copy link
Member

I would think the best way to do it would be the two token option:

<w type="x-ketiv" lemma="m/1121 a" morph="HR/Np" id="09Una">מ/בן</w>
<seg type="x-ketiv x-maqqef">־</seg>
<w type="x-ketiv" lemma="3225" morph="HNp" id="09jgC">ימין</w>

If you feel that would be problematic, the next best option would be to leave it out altogether, as long as that doesn't hinder your efforts.
One thing to watch is that you maintain the white space within the verse elements.

@pdurusau

David, can you say a word or two about why the seg for a maqqef carries a type attribute naming it as a maqqef? Maqqef is U+05BE, so why is the type attribute required on the seg that encloses it? Apologies if this is well known, but Jonathan asked me to take a look at the issue, and dropping it would free up the type attribute for x-ketiv with one token.

@DavidTroidl
Member

It is essentially the same reason Jonathan wants to identify the ketiv elements. We want to be able to deal with the OSIS elements without having to analyze the character content.

@pdurusau

David, likely my bad but a maqqef in a ketiv doesn't have any character content to be analyzed does it? If I understand your rule, you want to avoid the sometimes problematic parsing of Hebrew by assigning it a fixed value? As an attribute. Yes?

OK, that's understandable but a maqqef, in my ignorance, has no such other parsing. There may be traditions where it has varying meanings, I simply mean to say I am not aware of them.

It would be the difference in part of speech for words in English, your non-parsing of character content, and separating marking a comma with an element, using an attribute that reads k="comma". I'm not seeing what the consistency adds, sorry.

@jonathanrobie
Contributor Author

jonathanrobie commented Jul 16, 2021

I think I agree with Patrick. Each seg type simply names the character that the element contains. I think it's just as easy to test the character as the attribute.

  <seg type="x-maqqef">־</seg>
  <seg type="x-sof-pasuq">׃</seg>
  <seg type="x-pe">פ</seg>

In a path expression, I can say if (seg='פ') or I can say if (seg/@type='x-pe'); they are equivalent, and I don't think the one that uses the @type attribute is simpler.
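
For comparison, a small sketch of the same two tests in Python's standard `xml.etree.ElementTree`. The wrapping `verse` element and the variable names are illustrative assumptions, and the OSIS namespace is left out:

```python
import xml.etree.ElementTree as ET

# The three seg elements quoted above, wrapped in a bare verse for parsing.
ROOT = ET.fromstring(
    "<verse>"
    '<seg type="x-maqqef">־</seg>'
    '<seg type="x-sof-pasuq">׃</seg>'
    '<seg type="x-pe">פ</seg>'
    "</verse>"
)

# Select by attribute value...
by_attribute = ROOT.findall("seg[@type='x-pe']")
# ...or by the character the element contains.
by_content = [seg for seg in ROOT.findall("seg") if seg.text == "פ"]

print(by_attribute == by_content)
```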

To me, that's different from the Ketiv case. If I want to ignore the Ketiv and follow the Qere, I have to parse the catchWord and work backwards to identify the elements that correspond to it using string operations. Nothing explicitly marks the elements that correspond to the Ketiv reading. The query I do to identify these nodes is more complex than if (seg='פ'), and I don't want to do this every time I need to ignore the Ketiv in a query. Or is there a simpler trick I am missing?

declare function local:get-ketiv($base, $catchword)
{
  (: Walk backwards from $base, collecting preceding siblings whose text matches the tail of $catchword. :)
  let $prev := $base/preceding-sibling::*[1]
  let $prevstring := fn:string($prev)
  where $prev and fn:ends-with($catchword, $prevstring)
  return (
    $prev
    ,
    if ($prevstring != $catchword)
    then local:get-ketiv($prev, fn:substring($catchword, 1, fn:string-length($catchword) - fn:string-length($prevstring)))
    else ()
  )
};

declare updating function local:mark-ketiv($variant)
{
  (: Append "x-ketiv" to the existing @type (if any) of each Ketiv element. :)
  for $ketiv in local:get-ketiv($variant, $variant/catchWord)
  return (
    delete node $ketiv/@type,
    insert node attribute type { fn:string-join(($ketiv/@type, "x-ketiv")," ") } into $ketiv
  )
};
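
The same backward walk can be sketched in Python with the standard `xml.etree.ElementTree`, for readers who do not use XQuery. This is an illustration under assumptions, not the transformation used for the pull request: the `ketiv_elements` helper is hypothetical, the OSIS namespace and most attributes are omitted, and stripping the space left between space-separated Ketiv words is an addition that the XQuery above does not make.

```python
import xml.etree.ElementTree as ET

# 1Sam.9.1 as quoted earlier in this thread, trimmed to the attributes needed here.
VERSE = """<verse osisID="1Sam.9.1">
  <w lemma="c/1961">וַֽ/יְהִי</w>
  <seg type="x-maqqef">־</seg>
  <w lemma="376">אִ֣ישׁ</w>
  <w lemma="m/1121 a">מ/בן</w>
  <seg type="x-maqqef">־</seg>
  <w lemma="3225">ימין</w>
  <note type="variant">
    <catchWord>מ/בן־ימין</catchWord>
    <rdg type="x-qere"><w lemma="m/1144">מִ/בִּנְיָמִ֗ין</w></rdg>
  </note>
</verse>"""


def ketiv_elements(verse, note):
    """Walk backwards from a variant note and collect the siblings whose text
    matches the tail of the note's catchWord, in document order."""
    children = list(verse)
    remaining = note.findtext("catchWord") or ""
    found = []
    i = children.index(note) - 1
    while i >= 0 and remaining:
        text = children[i].text or ""
        if not text or not remaining.endswith(text):
            break  # mismatch: stop and return whatever matched so far
        found.append(children[i])
        # Drop the matched tail, plus any space separating it from the previous word.
        remaining = remaining[: len(remaining) - len(text)].rstrip(" ")
        i -= 1
    return list(reversed(found))


verse = ET.fromstring(VERSE)
note = verse.find("note")
found = ketiv_elements(verse, note)
print([(el.tag, el.get("lemma")) for el in found])
```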

@jonathanrobie
Contributor Author

jonathanrobie commented Jul 16, 2021

David, likely my bad but a maqqef in a ketiv doesn't have any character content to be analyzed does it? If I understand your rule, you want to avoid the sometimes problematic parsing of Hebrew by assigning it a fixed value? As an attribute. Yes?

OK, that's understandable but a maqqef, in my ignorance, has no such other parsing. There may be traditions where it has varying meanings, I simply mean to say I am not aware of them.

It would be the difference in part of speech for words in English, your non-parsing of character content, and separating marking a comma with an element, using an attribute that reads k="comma". I'm not seeing what the consistency adds, sorry.

I wonder if @DavidTroidl's concern is CSS selectors, which work with attribute values but not element content? But in that case, I would think Ketiv would also be important. Ketiv is often formatted differently.

@DavidTroidl
Member

These arguments seem to make sense, but the point was raised earlier about necessitating a rewrite of existing applications. I don't think a second token on the maqqef type should have any impact, but the suggested changes certainly would. Testing an attribute value for x-maqqef or for x-ketiv should both work.

@jonathanrobie
Contributor Author

jonathanrobie commented Jul 16, 2021 via email

@DavidTroidl
Member

I apologize for my mistake. Unfortunately the two tokens on the @type do not validate.
Can we just drop the x-ketiv from the maqqef?

@jonathanrobie
Contributor Author

jonathanrobie commented Jul 17, 2021

I apologize for my mistake. Unfortunately the two tokens on the @type do not validate.

Ouch. I should have validated before issuing a pull request, my bad.

Can we just drop the x-ketiv from the maqqef?

If we do, neither CSS nor programs have an easy way to see exactly what is in the Ketiv vs. Qere. To me, the main use cases for marking up Ketiv or Qere are:

  • Formatting a text to distinguish the two. Supporting CSS is helpful: we want attributes to clearly say what is Ketiv vs. Qere, including both words and maqqef
  • Making it easy for a program or a query to follow either the Ketiv or Qere reading, ignoring the other
  • Making it easy for a program or a query to compare Ketiv to Qere for one or more passages

Am I missing important use cases? For my syntax trees and queries I generally want to follow the Qere as the main reading. If we want to make this possible and make it easy to format like the editions Wikipedia describes, we need to mark the Ketiv somehow. I'm trying to get a feel for what others do here. According to Wikipedia:

Modern editions of the Chumash and Tanakh include information about the qere and ketiv, but with varying formatting, even among books from the same publisher. Usually, the qere is written in the main text with its vowels, and the ketiv is in a side- or footnote (as in the Gutnick and Stone editions of the Chumash, from Kol Menachem[15] and Artscroll,[16] respectively). Other times, the ketiv is indicated in brackets, in-line with the main text (as in the Rubin edition of the Prophets, also from Artscroll).

If we want to follow that approach, then we could put the Ketiv in a note and put the Qere in the mainline text, but I think you would probably dislike that because you think of the Ketiv as the main reading and the Qere as mere commentary. For the record, that would look like this, and it would make it easy to format a text as described above:

  <w lemma="c/1961" morph="HC/Vqw3ms" id="09wci">וַֽ/יְהִי</w>
  <seg type="x-maqqef">־</seg>
  <w lemma="376" morph="HNcmsa" id="09MpA">אִ֣ישׁ</w>
  <note type="variant">
      <rdg type="x-ketiv">
        <w lemma="m/1121 a" morph="HR/Np" id="09Una">מ/בן</w>
        <seg type="x-maqqef">־</seg>
       <w lemma="3225" morph="HNp" id="09jgC">ימין</w>
    </rdg>
  </note>
  <w lemma="m/1144" n="1.0.1" morph="HR/Np" id="09EC9">מִ/בִּנְיָמִ֗ין</w>

Another approach would be to get rid of @type attributes that simply say that a maqqef is a maqqef, a sof-pasuq is a sof-pasuq, a samekh is a samekh, a pe is a pe, etc. They do not add anything to the semantics. Are they needed for CSS selectors? They certainly are not needed for programs or queries. And for CSS selectors, I would think it is more important to say whether it is Ketiv or not, since Ketiv is often in a different font. If we do that, it would look like this:

  <w lemma="c/1961" morph="HC/Vqw3ms" id="09wci">וַֽ/יְהִי</w>
  <seg type="x-maqqef">־</seg>
  <w lemma="376" morph="HNcmsa" id="09MpA">אִ֣ישׁ</w>
  <w type="x-ketiv" lemma="m/1121 a" morph="HR/Np" id="09Una">מ/בן</w>
  <seg type="x-ketiv">־</seg>
  <w type="x-ketiv" lemma="3225" morph="HNp" id="09jgC">ימין</w>
  <note type="variant">
    <catchWord>מ/בן־ימין</catchWord>
    <rdg type="x-qere">
      <w lemma="m/1144" n="1.0.1" morph="HR/Np" id="09EC9">מִ/בִּנְיָמִ֗ין</w>
    </rdg>
  </note>

To me, the cleanest approach would be to wrap both the Ketiv and Qere in rdg elements, but not in a note. @pdurusau, you are the OSIS expert, is there a way to do that? I suggested this earlier, but the rdgGroup element was not allowed there in the OSIS schema:

   <verse osisID="Ezek.42.9">
      <rdgGroup type="variant">
         <rdg type="x-ketiv">
            <w lemma="c/m/8478" morph="HC/R/R/Sd" id="26Mwv">ו/מ/תחת/ה</w>
            <w lemma="3957" morph="HNcfpa" id="26DfV">לשכות</w>            
         </rdg>
         <rdg type="x-qere">
            <w lemma="c/m/8478" n="1.0" morph="HC/R/R" id="262Lg">וּ/מִ/תַּחַת</w>
            <w lemma="d/3957" morph="HTd/Ncfpa" id="26CYE">הַ/לְּשָׁכ֣וֹת</w>
         </rdg>
      </rdgGroup>
      <w lemma="d/428" n="1" morph="HTd/Pdxcp" id="26FUJ">הָ/אֵ֑לֶּה</w>
      <rdgGroup type="variant">
         <rdg type="x-ketiv">
            <w lemma="d/3996" morph="HTd/Ncmsa" id="26pCg">ה/מבוא</w>
         </rdg>
         <rdg type="x-qere">
            <w lemma="d/935" n="0.2.0" morph="HTd/Ncmsa" id="26wTv">הַ/מֵּבִיא֙</w>
         </rdg>
      </rdgGroup>
      <w lemma="m/d/6921" n="0.2" morph="HR/Td/Ncmsa" id="26rrD">מֵֽ/הַ/קָּדִ֔ים</w>
      <w lemma="b/935" morph="HR/Vqc/Sp3ms" id="261fi">בְּ/בֹא֣/וֹ</w>
      <w lemma="l/2007" n="0.1" morph="HR/Pp3fp" id="26i6h">לָ/הֵ֔נָּה</w>
      <w lemma="m/d/2691 a" n="0.0" morph="HR/Td/Ncbsa" id="26rx5">מֵֽ/הֶ/חָצֵ֖ר</w>
      <w lemma="d/2435" n="0" morph="HTd/Aafsa" id="26sM6">הַ/חִצֹנָֽה</w>
      <seg type="x-sof-pasuq">׃</seg>
   </verse>   

Crosswire uses a seg for marking variants. I tried this in Amos, and the following seems to validate just fine:

        <verse osisID="Amos.9.6">
          <w lemma="d/1129" morph="HTd/Vqrmsa" id="30Rr8">הַ/בּוֹנֶ֤ה</w>
          <w lemma="b/8064" n="1.1.0" morph="HRd/Ncmda" id="30q3q">בַ/שּׁמַ֨יִם֙</w>
          <seg type="x-variant" subType="x-ketiv">
            <w lemma="4609 b" morph="HNcfpc/Sp3ms" id="30zBj">מעלות/ו</w>
          </seg>
          <seg type="x-variant" subType="x-qere">
            <w lemma="4609 b" n="1.1" morph="HNcfpc/Sp3ms" id="30mM8">מַעֲלוֹתָ֔י/ו</w>
          </seg>
          <w lemma="c/92" n="1.0" morph="HC/Ncfsc/Sp3ms" id="30Boi">וַ/אֲגֻדָּת֖/וֹ</w>
       !!! SNIP !!!

That validates (except for the id attributes on w elements, see #84) and does not interfere with using a seg element and a @type attribute for the maqqef.

@DavidTroidl Am I missing any possibilities or use cases? What would you prefer?

@pdurusau What would you suggest? Does the OSIS schema allow us any kind of direct representation of Ketiv and Qere as parallel readings?

@jonathanrobie
Contributor Author

Once we agree on what to do, I should be able to turn this around quickly so we don't have XML that won't validate.

@DavidTroidl
Member

In developing the OSHB, one of our primary concerns was how to faithfully represent the text, within the limitations of OSIS. We had numerous extended discussions, to resolve various issues. I am in no way saying that the ketiv is the "preferred reading". The point is that the ketiv is an actual part of the consonantal text, the qere is a Massoretic note on that text. Of course, vowel points and cantillation are too. But this gives the underlying logic for the format we are using.
Setting apart maqqef, paseq, etc. and assigning attributes to them, I'm sure goes back to one of these discussions, but I don't recall all the details. Now that they are there, removing the attributes would wreak havoc on existing software developed for dealing with the OSHB.
Looking at the schema, I find that the w and seg elements both have a subType attribute. Would it work to use @subType at least on the maqqef, or possibly on the w too?

@jonathanrobie
Contributor Author

jonathanrobie commented Jul 17, 2021

To me, @subType only makes sense when it describes a subtype of the @type attribute.

Of all the approaches discussed so far, I think this is the one that allows the maqqef attribute, validates under OSIS, and clearly identifies both Ketiv and Qere:

        <verse osisID="Amos.9.6">
          <w lemma="d/1129" morph="HTd/Vqrmsa" id="30Rr8">הַ/בּוֹנֶ֤ה</w>
          <w lemma="b/8064" n="1.1.0" morph="HRd/Ncmda" id="30q3q">בַ/שּׁמַ֨יִם֙</w>
          <seg type="x-variant" subType="x-ketiv">
            <w lemma="4609 b" morph="HNcfpc/Sp3ms" id="30zBj">מעלות/ו</w>
          </seg>
          <seg type="x-variant" subType="x-qere">
            <w lemma="4609 b" n="1.1" morph="HNcfpc/Sp3ms" id="30mM8">מַעֲלוֹתָ֔י/ו</w>
          </seg>
          <w lemma="c/92" n="1.0" morph="HC/Ncfsc/Sp3ms" id="30Boi">וַ/אֲגֻדָּת֖/וֹ</w>

It is also the approach that is documented in Crosswire's documentation.

Would you be OK with that?

I could fix the ID validation problem in a separate pull request.

@DavidTroidl
Member

Maqqefs that appear in ketivs would be a valid subtype of maqqefs in general. That would narrow down the usage to the maqqef only.
The seg suggestion would again wreak havoc with existing software.

@jonathanrobie
Contributor Author

jonathanrobie commented Jul 17, 2021

Sounds like existing software means we can't change much beyond adding an attribute. If so, I think we are probably looking at solutions that are not semantically clean, but can work in a program. Would it be possible to include the developers of this other software in the conversation? Or perhaps that's you? I'd like to know if any more wiggle room is possible.

Using @subType without a @type is odd, but I don't know what @type would work for both the w elements and the seg elements.

  <w lemma="c/1961" morph="HC/Vqw3ms" id="09wci">וַֽ/יְהִי</w>
  <seg type="x-maqqef">־</seg>
  <w lemma="376" morph="HNcmsa" id="09MpA">אִ֣ישׁ</w>
  <w subType="x-ketiv" lemma="m/1121 a" morph="HR/Np" id="09Una">מ/בן</w>
  <seg type="x-maqqef" subType="x-ketiv">־</seg>
  <w subType="x-ketiv" lemma="3225" morph="HNp" id="09jgC">ימין</w>
  <note type="variant">
    <catchWord>מ/בן־ימין</catchWord>
    <rdg type="x-qere">
      <w lemma="m/1144" n="1.0.1" morph="HR/Np" id="09EC9">מִ/בִּנְיָמִ֗ין</w>
    </rdg>
  </note>

I could do that. It would work. It would probably confuse some people. It's not semantically clean.

@DavidTroidl
Member

Actually, my last suggestion was intended to confine the subType to the maqqef and leave the type="x-ketiv" on the w elements. This would be cleaner and still accomplish the objective. The subType="x-ketiv" would then be a refinement of type="x-maqqef".

@jonathanrobie
Contributor Author

Ah, I misunderstood you. I agree that would be cleaner for maqqefs used within ketiv.

What about <seg> elements that are not ketiv? What type and subtype would a maqqef have in that case?

@DavidTroidl
Member

That's akin to asking what a w element would be, when it's not a ketiv. It would be just an ordinary word, or in this case an ordinary maqqef.

jonathanrobie added a commit to biblicalhumanities/morphhb that referenced this issue Jul 18, 2021
… added a leading alphabetic character, (2) single values for @type attribute in Ketiv readings.  See openscriptures#84, openscriptures#80.
@jonathanrobie
Contributor Author

jonathanrobie commented Jul 18, 2021

I just created a pull request that fixes the id problem and this one; it now validates. This example shows how I treated maqqef inside and outside of the ketiv. Did I understand your intent correctly?

<verse osisID="1Sam.9.1">
          <w ID="i09wci" lemma="c/1961" morph="HC/Vqw3ms">וַֽ/יְהִי</w><seg type="x-maqqef">־</seg><w ID="i09MpA" lemma="376" morph="HNcmsa">אִ֣ישׁ</w>
          <w ID="i09Una" type="x-ketiv" lemma="m/1121 a" morph="HR/Np">מ/בן</w><seg subType="x-maqqef" type="x-ketiv">־</seg><w ID="i09jgC" type="x-ketiv" lemma="3225" morph="HNp">ימין</w><note type="variant"><catchWord>מ/בן־ימין</catchWord><rdg type="x-qere"><w ID="i09EC9" lemma="m/1144" n="1.0.1" morph="HR/Np">מִ/בִּנְיָמִ֗ין</w></rdg></note>
          <note n="a">Adaptations to a Qere which L and BHS, by their design, do not indicate.</note>

@DavidTroidl
Member

The id fix was only meant to be a temporary stopgap for testing. I am still discussing the ID format with the authors of the IDs.
Making the maqqef the subType is still going to cause problems.
I don't want to commit either of these to the files.

@jonathanrobie
Contributor Author

jonathanrobie commented Jul 18, 2021 via email

@DavidTroidl
Member

Thanks for your help.
I corrected the two token issue using type="x-maqqef" subType="x-ketiv".
This validates, using the temporary id correction.
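
With the Ketiv marked this way (type="x-ketiv" on w, and type="x-maqqef" subType="x-ketiv" on the maqqef), following the Qere no longer needs any string matching against the catchWord. A minimal sketch in Python's standard `xml.etree.ElementTree`, assuming that markup, with the OSIS namespace and most attributes omitted; the `qere_reading` helper is hypothetical and not part of the OSHB tooling:

```python
import xml.etree.ElementTree as ET

# 1Sam.9.1 with the agreed Ketiv markup, trimmed to the attributes needed here.
VERSE = """<verse osisID="1Sam.9.1">
  <w lemma="c/1961">וַֽ/יְהִי</w>
  <seg type="x-maqqef">־</seg>
  <w lemma="376">אִ֣ישׁ</w>
  <w type="x-ketiv" lemma="m/1121 a">מ/בן</w>
  <seg type="x-maqqef" subType="x-ketiv">־</seg>
  <w type="x-ketiv" lemma="3225">ימין</w>
  <note type="variant">
    <catchWord>מ/בן־ימין</catchWord>
    <rdg type="x-qere"><w lemma="m/1144">מִ/בִּנְיָמִ֗ין</w></rdg>
  </note>
</verse>"""


def qere_reading(verse):
    """Return the verse's tokens following the Qere: skip elements marked as
    Ketiv and substitute the contents of each x-qere reading."""
    tokens = []
    for el in verse:
        if el.tag in ("w", "seg"):
            if el.get("type") == "x-ketiv" or el.get("subType") == "x-ketiv":
                continue  # part of the Ketiv: leave it out
            tokens.append(el.text or "")
        elif el.tag == "note" and el.get("type") == "variant":
            for rdg in el.findall("rdg"):
                if rdg.get("type") == "x-qere":
                    tokens.extend(child.text or "" for child in rdg)
        # Any other note is commentary and is skipped.
    return tokens


verse = ET.fromstring(VERSE)
tokens = qere_reading(verse)
print(tokens)
```

Following the Ketiv instead would be the mirror image: keep the marked elements and skip the variant notes.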

@jonathanrobie
Contributor Author

Thanks - I had misunderstood your earlier comment. This will work for me.

@DavidTroidl
Member

Glad we came up with something that works.

3 participants