Accept SUGGEST instead of &SUGGEST
also SUGGESTWF instead of &SUGGESTWF

cf. #73
unhammer committed Apr 30, 2024
1 parent e434139 commit 0772649
Showing 34 changed files with 577 additions and 577 deletions.
60 changes: 30 additions & 30 deletions README.md
@@ -708,11 +708,11 @@ interface.

We can add a suggestion as well with a `COPY` rule:

-COPY:msyn-hallan (Inf &SUGGEST) EXCEPT (Imprt Pl1 Dial/-KJ) TARGET (Imprt Pl1 Dial/-KJ &real-hallan) ;
+COPY:msyn-hallan (Inf SUGGEST) EXCEPT (Imprt Pl1 Dial/-KJ) TARGET (Imprt Pl1 Dial/-KJ &real-hallan) ;

This creates a new reading where the tags `Imprt Pl1 Dial/-KJ` have
-been changed into `Inf &SUGGEST` (and other tags are unchanged). The
-`&SUGGEST` tag is necessary to get `divvun-suggest` (the `<suggest>`
+been changed into `Inf SUGGEST` (and other tags are unchanged). The
+`SUGGEST` tag is necessary to get `divvun-suggest` (the `<suggest>`
module) to try to generate a form from that reading. It is smart
enough to skip things like weights, tracing and syntax tags when
trying to suggest, but all morphological tags need to be correct and
@@ -824,7 +824,7 @@ instead of deleting the word "dego" to the left, we should change the
case of the word "lávvomuorran" from essive to nominative case:

ADD (&syn-dego-nom) TARGET Ess IF (-1 ("dego"));
-COPY (Sg Nom &SUGGEST) EXCEPT (Ess) TARGET (&syn-dego-nom) ;
+COPY (Sg Nom SUGGEST) EXCEPT (Ess) TARGET (&syn-dego-nom) ;

Here we want to keep the suggestions for `&syn-dego-nom` separate from
the suggestions for `&syn-not-dego` – in particular, we don't want to
@@ -836,12 +836,12 @@ same time. But if we use the above rules, CG gives us this output:
:
"<lávvomuorran>"
"lávvomuorra" N Ess @COMP-CS< &syn-not-dego ID:12 R:DELETE1:11
-"lávvomuorra" N Sg Nom @COMP-CS< &syn-dego-nom ID:12 R:DELETE1:11 &SUGGEST
+"lávvomuorra" N Sg Nom @COMP-CS< &syn-dego-nom ID:12 R:DELETE1:11 SUGGEST

Notice how the DELETE relation is on both readings, and also how
the relation target id (`11`) refers to a cohort, not a reading of a
cohort. There is no way from this output to know that "dego" should
-not also be deleted from the `&SUGGEST` reading.
+not also be deleted from the `SUGGEST` reading.

So when there are such multiple alternative interpretations for errors
spanning multiple words, the less central parts ("dego" above) need a
@@ -873,7 +873,7 @@ changed to "boahtit" (infinitive). Alternatively, only the first part
is changed and the second part remains unchanged. In this case we can
change the "soaitá" (3.Sg.) to the adverb "kánske".

-As usual, this requires `&SUGGEST` readings for the parts that are to
+As usual, this requires `SUGGEST` readings for the parts that are to
be changed, and one unique error tag for each interpretation, i.e.
`&msyn-kánske` for the "Kánske boađán" correction and
`&msyn-fin_fin-fin_inf` for the "Soaittán boahtit" correction.
@@ -966,7 +966,7 @@ Then you can first of all turn that blanktag tag into an error tag with

Now, we could just suggest a wordform on the comma and call it a day:

-COPY ("<, >" &SUGGESTWF) TARGET ("," &no-space-after-punct-mark) ;
+COPY ("<, >" SUGGESTWF) TARGET ("," &no-space-after-punct-mark) ;

but that will

@@ -996,7 +996,7 @@ word is a "link" word. In the above rules,

Then we can add a suggestion that puts a space between the forms:

-COPY:no-space-after-punct ("<$1 $2>"v &SUGGESTWF)
+COPY:no-space-after-punct ("<$1 $2>"v SUGGESTWF)
TARGET ("<(.*)>"r &no-space-after-punct-mark)
IF (1 ("<(.*)>"r))
(NOT 0 (co&no-space-after-punct-mark))
@@ -1014,7 +1014,7 @@ We don't put a suggestion-tag on the `co&` cohort (here the word
`<ja>`), which would lead to some strange suggestions since it is
already part of the suggestion-tag on the comma `<,>` cohort. See
[How underlines and replacements are built](#orgb25740d) for more
-on the relationship between `&SUGGESTWF` and replacements.
+on the relationship between `SUGGESTWF` and replacements.

Now the output is

@@ -1024,7 +1024,7 @@ Now the output is
"3" Num Arab Sg Ill Attr @HNOUN
"<,>"
"," CLB <NoSpaceAfterPunctMark> &no-space-after-punct-mark ID:3 R:RIGHT:4
-"," CLB <NoSpaceAfterPunctMark> "<, ja>" &no-space-after-punct-mark &SUGGESTWF ID:3 R:RIGHT:4
+"," CLB <NoSpaceAfterPunctMark> "<, ja>" &no-space-after-punct-mark SUGGESTWF ID:3 R:RIGHT:4
"<ja>"
"ja" CC @CNP co&no-space-after-punct-mark ID:4

@@ -1126,17 +1126,17 @@ Note that the readings added by the speller don't include any error
tags (tags with `&` in front). To turn these readings into error
underlines and actually show the suggestions, add a rule like

-ADD (&typo &SUGGESTWF) (<spelled>) ;
+ADD (&typo SUGGESTWF) (<spelled>) ;

-to the grammar checker CG. The reason we add `&SUGGESTWF` and not
-`&SUGGEST` is that we're using the wordform-tag directly as the
+to the grammar checker CG. The reason we add `SUGGESTWF` and not
+`SUGGEST` is that we're using the wordform-tag directly as the
suggestion, and not sending each analysis through the generator (as
-`&SUGGEST` would do). See also the next section on how replacements
+`SUGGEST` would do). See also the next section on how replacements
are built. So if, after disambiguation and grammarchecker CG's, we had

"<coffes>"
-"coffee" N Pl <W:37.3018> <WA:17.3018> <spelled> "<coffees>" &typo &SUGGESTWF
-"coffer" N Pl <W:39.1010> <WA:17.3018> <spelled> "<coffers>" &typo &SUGGESTWF
+"coffee" N Pl <W:37.3018> <WA:17.3018> <spelled> "<coffees>" &typo SUGGESTWF
+"coffer" N Pl <W:39.1010> <WA:17.3018> <spelled> "<coffers>" &typo SUGGESTWF

then the final `divvun-suggest` step would simply use the contents of
the tags
@@ -1181,38 +1181,38 @@ different parts of the error](#orge26043f) for more info on this.
By default, *a cohort's word form is used to construct the
replacement*. So if we have the sentence "we was" where "was" is
**central** and tagged `&typo`, and there's a `LEFT` relation to "we",
-then the default replacement if there were no `&SUGGEST` tags would
+then the default replacement if there were no `SUGGEST` tags would
simply be the input "we was" (which would be filtered out since it's
equal, giving no suggestions).

-If we now add a `&SUGGEST` reading on "we" that generates "he" then we
-get a "he was" suggestion. `&SUGGEST` readings with matching
+If we now add a `SUGGEST` reading on "we" that generates "he" then we
+get a "he was" suggestion. `SUGGEST` readings with matching
(co-)error tags are prioritised over input word form.

-If we also have a `&SUGGEST` for was→were, giving the possible replacement
+If we also have a `SUGGEST` for was→were, giving the possible replacement
"we were" (tagged `&agr`) – now we don't want both of these to apply at
the same time giving *"he were". In this case, we need to ensure we have
-disambiguating `co&errtype` tags on the `&SUGGEST` readings. The
+disambiguating `co&errtype` tags on the `SUGGEST` readings. The
following CG parse:

"<we>"
"we" Prn &agr ID:1 R:RIGHT:2
-"he" Prn &SUGGEST co&agr-typo ID:1 R:RIGHT:2
+"he" Prn SUGGEST co&agr-typo ID:1 R:RIGHT:2
:
"<was>"
"be" V 3Sg &agr-typo ID:2 R:LEFT:1
-"be" V 3Pl co&agr &SUGGEST ID:2 R:LEFT:1
+"be" V 3Pl co&agr SUGGEST ID:2 R:LEFT:1

will give us all and only the suggestions we want ("he was" and "we
were", but not *"he were").
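
To see why the error tags keep the two interpretations apart, here is a minimal C++ sketch of the selection rule. It makes several assumptions: `SketchReading`, `SketchCohort` and `replacement_for` are hypothetical names rather than the real types in `src/suggest.hpp`, error and co-error tags are merged into one set with the `&`/`co&` prefix stripped, and the generator is assumed to return "he" and "were" for the two `SUGGEST` readings.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of a CG reading; not the real `Reading` struct.
struct SketchReading {
    std::set<std::string> errtags;  // error and co-error tags, prefix stripped
    bool suggest;                   // true if the reading carries SUGGEST
    std::string generated;          // form the generator is assumed to return
};

struct SketchCohort {
    std::string wordform;
    std::vector<SketchReading> readings;
};

// Build one replacement for the error tag `err`: for each cohort, use the
// generated form of a SUGGEST reading tagged with `err` if there is one,
// otherwise fall back to the input word form.
std::string replacement_for(const std::string& err,
                            const std::vector<SketchCohort>& underline) {
    std::string out;
    for (const SketchCohort& c : underline) {
        std::string form = c.wordform;
        for (const SketchReading& r : c.readings) {
            if (r.suggest && r.errtags.count(err) > 0) {
                form = r.generated;
                break;
            }
        }
        if (!out.empty()) out += ' ';
        out += form;
    }
    return out;
}

// The "we was" parse from the text, encoded in the simplified model.
std::vector<SketchCohort> we_was_example() {
    SketchCohort we{"we", {SketchReading{{"agr"}, false, ""},
                           SketchReading{{"agr-typo"}, true, "he"}}};
    SketchCohort was{"was", {SketchReading{{"agr-typo"}, false, ""},
                             SketchReading{{"agr"}, true, "were"}}};
    return {we, was};
}
```

Calling `replacement_for("agr", we_was_example())` picks "were" only, and `replacement_for("agr-typo", we_was_example())` picks "he" only, so no single error tag ever combines both changes.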

There is one exception to the above principles; for
-backwards-compatibility, `&SUGGESTWF` is still used to mean that the
-whole underline should be replaced by what's in `&SUGGESTWF`. This
-means that if you combine `&SUGGESTWF` with `RIGHT/LEFT`, you will not
+backwards-compatibility, `SUGGESTWF` is still used to mean that the
+whole underline should be replaced by what's in `SUGGESTWF`. This
+means that if you combine `SUGGESTWF` with `RIGHT/LEFT`, you will not
automatically get the word form for the relation target(s) in your
replacement, you have to construct the whole replacement yourself.
-This also means you cannot combine `&SUGGESTWF` with `&SUGGEST` on
+This also means you cannot combine `SUGGESTWF` with `SUGGEST` on
other words. (If we ever change how this works, we will have to first
update many existing CG3 rules.)
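
Since a `SUGGESTWF` replacement is taken verbatim from the wordform-tag, the rule author has to put the full replacement text into that tag. As a rough sketch of the lookup (the function `wordform_tag` is a hypothetical name, and the real parser in `src/suggest.cpp` may well work differently), reading the tag contents off a reading line could look like this:

```cpp
#include <string>

// Hypothetical helper: return the contents of the first `"<...>"`
// wordform-tag on a reading line, or an empty string if there is none.
std::string wordform_tag(const std::string& reading_line) {
    const std::string open = "\"<";
    const std::string close = ">\"";
    const std::string::size_type start = reading_line.find(open);
    if (start == std::string::npos) return "";
    const std::string::size_type from = start + open.size();
    const std::string::size_type end = reading_line.find(close, from);
    if (end == std::string::npos) return "";
    return reading_line.substr(from, end - from);
}
```

For a reading carrying `"<, ja>"`, this returns `, ja`, which under the rule above would replace the whole underline, including the text of the relation target.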

@@ -1233,10 +1233,10 @@ don't conflict with the below special tags.

### Tags

-- `&SUGGEST` on a reading means that `divvun-suggest` should try to
+- `SUGGEST` on a reading means that `divvun-suggest` should try to
generate this reading into a form for suggestions, using the
generator FST. See [Simple grammarchecker.cg3 rules](#org0955ce1).
-- `&SUGGESTWF` on a reading means that `divvun-suggest` should use the
+- `SUGGESTWF` on a reading means that `divvun-suggest` should use the
reading's wordform-tag (e.g. a tag like

"<Cupertino>"
18 changes: 9 additions & 9 deletions src/suggest.cpp
@@ -234,17 +234,17 @@ const Reading proc_subreading(const string& line, bool generate_all_readings) {
 if (tag == "COERROR") {
     r.coerror = true;
 }
+else if (tag == "&SUGGEST" || tag == "SUGGEST") { // &SUGGEST kept for backward-compatibility
+    r.suggest = true;
+}
+else if (tag == "&SUGGESTWF" || tag == "SUGGESTWF") { // &SUGGESTWF kept for backward-compatibility
+    r.suggestwf = true;
+}
 else if (result.empty()) {
     gentags.push_back(tag);
 }
 else if (result[2].length() != 0) {
-    if (tag == "&SUGGEST") {
-        r.suggest = true;
-    }
-    else if (tag == "&SUGGESTWF") {
-        r.suggestwf = true;
-    }
-    else if (tag == "&ADDED" || tag == "&ADDED-AFTER-BLANK") {
+    if (tag == "&ADDED" || tag == "&ADDED-AFTER-BLANK") {
         r.added = AddedAfterBlank;
     }
     else if (tag == "&ADDED-BEFORE-BLANK") {
@@ -612,7 +612,7 @@ if(verbose) std::cerr << "\t\033[0;35mr.suggest=" << tr.suggest << "\033[0m" <
reps_suggestwf.push_back(fromUtf8(withCasing(tr.fixedcase, casing, sf)));
}
else {
-std::cerr << "divvun-suggest: WARNING: Saw &SUGGESTWF on non-central (co-)cohort, ignoring" << std::endl;
+std::cerr << "divvun-suggest: WARNING: Saw SUGGESTWF on non-central (co-)cohort, ignoring" << std::endl;
}
}
if(verbose) std::cerr << "\t\t\033[1;36msform=\t'" << sf << "'\033[0m" << std::endl;
@@ -719,7 +719,7 @@ variant<Nothing, Err> Suggest::cohort_errs(const ErrId& err_id, size_t i_c,
UStringVector rep;
for (const Reading& r : c.readings) {
if(r.errtypes.find(err_id) == r.errtypes.end()) {
-continue; // We consider sforms of &SUGGEST readings in build_squiggle_replacement
+continue; // We consider sforms of SUGGEST readings in build_squiggle_replacement
}
// If there are LEFT/RIGHT added relations, add suggestions with those concatenated to our form
// TODO: What about our current suggestions of the same error tag? Currently just using wordform
2 changes: 1 addition & 1 deletion src/suggest.hpp
@@ -174,7 +174,7 @@ struct Reading {
StringVector sforms;
relations rels; // rels[relname] = target.id
rel_id id = 0; // id is 0 if unset, otherwise the relation id of this word
-string wf; // tag of type "wordform"S for use with &SUGGESTWF
+string wf; // tag of type "wordform"S for use with SUGGESTWF
bool suggestwf = false;
bool coerror = false; // cohorts that are not the "core" of the underline never become Err's; message template offsets refer to the cohort of the Err
Added added = NotAdded;
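
The backward-compatible tag check introduced in `proc_subreading` in `src/suggest.cpp` can be looked at in isolation. The following is a standalone sketch, not the actual function: `SuggestFlags` and `classify_tag` are hypothetical names that only mirror the `r.suggest` and `r.suggestwf` fields set in the diff.

```cpp
#include <string>

// Hypothetical flags mirroring `r.suggest` and `r.suggestwf` in the diff.
struct SuggestFlags {
    bool suggest;
    bool suggestwf;
};

// Accept the new bare spellings as well as the legacy `&`-prefixed ones,
// so that existing grammars keep working after the rename.
SuggestFlags classify_tag(const std::string& tag) {
    SuggestFlags f{false, false};
    if (tag == "SUGGEST" || tag == "&SUGGEST") {
        f.suggest = true;
    }
    else if (tag == "SUGGESTWF" || tag == "&SUGGESTWF") {
        f.suggestwf = true;
    }
    return f;
}
```

Any other tag, including error tags such as `&typo`, leaves both flags unset in this sketch.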

0 comments on commit 0772649
