feat: add dart implementation #210

akshdeep-singh · 2023-07-20T00:18:28Z

Summary

Init Dart Package.

Add Features:

Conversion: Ascii to Gurmukhi
Gurmukhi Utils: Remove vishraams, First Letters, Split, Normalization

Test

PASS

…lization

bhajneet

Great start! I have a few questions/thoughts.

In addition, I would highly recommend you to refactor the tests to get data from the json files in the <root>/test folder. You check the readme there or see how the python/javascript test files did it. This will make it much easier to maintain test data across all the languages. Also it will make the logic of testing simpler (create wrapper functions, pull data, and loop through tests).

dart/LICENSE

dart/CHANGELOG.md

dart/README.md

dart/lib/src/gurmukhi.dart

bhajneet · 2023-07-20T02:04:38Z

dart/lib/src/gurmukhi.dart

+int _gurmukhiDiacriticComparator(int a, int b) {
+  return _getGurmukhiDiacriticOrder(a) - _getGurmukhiDiacriticOrder(b);
+}


Considering the other libs use an array elements position to do the same thing, is there a reason you opted for a verbose switch here?

I am not sure I understand you correctly, but python library uses sorting function to sort the diacritics:
https://github.com/ShabadOS/gurmukhi-utils/blob/e7d7527bcfb2a98b160a7957079a7c67694d7221/python/gurmukhiutils/unicode.py#L345C1-L354C22

In place if sorted(key=...), I have used sort(_gurmukhiDiacriticComparator).

Basically instead of using a switch (which is using up many lines of code), we can just use indexOf in Dart. So you could have an array called gurmukhiDiacriticOrder for example and check indexOf the two characters to get the sorting order.

final gurmukhiDiacriticOrder = <String>['਼', 'ੑ', 'ੵ', '੍', 'ਹ', 'ਰ', 'ਵ', 'ਟ', 'ਤ', 'ਨ', 'ਚ', 'ਿ', 'ੇ', 'ੈ', 'ੋ', 'ੌ', 'ੁ', 'ੂ', 'ਾ', 'ੀ', 'ਁ', 'ੱ', 'ਂ', 'ੰ', 'ਃ', ];

Effectively you're doing the same thing with the switch (iterating case by case until you find a match). This would do the same thing but with a single line of code.

The only thing you have to check is if it messes up the _isGurmukhiDiacritic function used in splitGurmukhi. As far as I can reason, it should not be a problem.

This let's you also control which characters can be used after a virama.

I want to avoid String.indexOf() method, it will be relatively more expensive with worst case complexity O(n), what if we the user wants to convert a very long String?

I can use HashMap instead, it will reduce the number of lines, look simpler, and with O(1) lookup.

Awesome, the performance improvement is worth the extra lines imo.

Last question: How are you going to handle consonants that are not supposed to be subscript letters? For example any letter that's not ਹ, ਰ, ਵ, ਟ, ਤ, ਨ, or ਚ?

Okay, this is what I think we should do (since we have a few things going on now), please let me know your thoughts:

Allow any VIRAMA + CONSONANT sequence (this includes things like ੍ + [ੲ, ਅ, ੳ] which is quite odd!)

Include a linter/checker that warns odd/unusual sequences (like the virama + vowel above) and errors for malformed sequences (e.g. Halant/Udaat + Virama on the same character). This can be done in a separate PR. For the linter, I suggest the following:

normal (no warning/error): ਹ, ਰ, ਵ, ਯ

warn (unsupported by Android, we can term them "archaic/historical"): ਟ, ਤ, ਨ, ਚ, ਠ, ਗ, ਦ, ਚ, ਮ, ਥ, ਖ

error: any other character

Normalize Gurmukhi function can take a flag to normalize ਡ + ਼ = ਡ਼ (would like a specific name for the flag of sorts if possible). This also can be done in a new PR if desired. (Edit: it might also be nice if a linter warned that this may be intended to be a ਡ਼ when it comes across the ਡ + ਼ sequence)

Just providing some insight here: The reason I would like to know about any potential incorrect sequences is for example when we generate an API that could be consumed by a hukamnama bot or such (used in places like Discord, WhatsApp, Telegram, etc.). We already know what to do with yayya, I'm really not sure what we do about the unsupported pairin-akhar, but at the least we might be able to provide a note to end users that the text can't be accurately displayed and to use the website/app to see it properly (using Sant Lipi).

I have some points:

How do you want handle invalid input from user? I have seen in some tests ascii+unicode mixed string is provided, which is in invalid input. What if the user inputs an Ascii letter not having any mapping? What would you like to with a string of random gurmukhi letters?

If you check the behaviour of the text render engine, for following two letters: ਕ੍ਰ ਕ੍ੳ, if you try to move cursor left/right on them, it behaves as they are a single letter.

For linter/checker: It definitely needs a different PR and proper discussion. For virama part, we can give the user an optional parameter so that list of valid subscript letters can be set. It can be similar as in English, red underlines for spelling mistakes (invalid sequence of chars), and green underlines for grammar mistakes (valid sequence but meaningless).

Normalization flag can be named such as - nukta, nuktaNormalization, etc. I have checked on Google fonts, I believe ਡ + ਼ = ਡ਼ is more of font specific, rather than platform specific.

Running toUnicode should convert any ascii that is meant to be converted to gurmukhi, and then normalize all the gurmukhi as best as it can. It's up to the user to provide the correct string. Similarly, going to ascii should only convert unicode characters and ignore the existing ascii, and then normalize all ascii sequences as best as it can (very important for the user to know how the functions work). In either function, any characters that are not meant to be changed should be left untouched. (Edit: but of course, normalization can mess things up especially in the toAscii function, that's just up to the user to understand, and we can also let them do toAscii without normalization by running individual functions or adding flags (Edit Edit: it should also be noted that we really need to focus more on moving people towards unicode, I dont even know what the toascii function is for tbh, it is something we can expose in an API for people making apps that don't use unicode, but it's not going to be something we use in our first-party apps).

That is interesting! I did not know that

I like the idea of adding some linter configuration to control which errors/warnings to show/hide

Yes, this is a font choice. It happens on android keyboards as well as iOS keyboards (because it's built into Noto Gurmukhi as well as Mukta Mahee). The flag nuktaNormalization makes me think it will fix normal nukta letters like ਸ + ਼ = ਸ਼, it doesn't make me think it's going to take a non-normal nukta character and change it into a ੜ. I think we need a better phrase/term.

Just sharing an image showing the font features for Noto Gurmukhi:

The virama part, which is in sort diacritics, will only happen if user inputs a string with a virama between invalid pair of characters (I wouldn't call it a mess up from our side). As for current case, for asciiToGurmukhi(), this part is not a problem (ASCII fonts don't have virama mapping).

What would you actually normalize ਡ + ਼ into? I checked ਡ਼ isn't any Unicode character. We can try to normalize ਡ + ਼ => ਡ + ZWNJ + ਼, but that again works only for some fonts (Works on Noto Sans but not on Noto Serif).

I think normalization and linter should be discussed in a separate issue, they need more research and input from others.

It might be better to check ਡ + ਼ in the linter only. The Sant Lipi font only shows it as ਡ਼ because of the following rule:

As you can see, we can even have characters like ਫ਼਼ (faffa with nukta, or I guess phappha with two nuktas). So it's kind of up to us how we want to render the code (just following other font conventions for now)

dart/pubspec.yaml

Co-authored-by: Bhajneet S.K. <[email protected]>

akshdeep-singh · 2023-07-22T10:27:38Z

I have refactored the tests to get data from the Json files.

I have 2 assertions failing (for functions unicode3, and unisant2):

"iBÎo": "ਭਿ੍ਯੋ"
"kIÎo": "ਕੀ੍ਯੋ"

What I understand is you want to sort diacritics with virama for ਹ, ਰ, ਵ, ਟ, ਤ, ਨ, and ਚ but with the exception of ਯ?

The expected strings ਭਿ੍ਯੋ and ਕੀ੍ਯੋ are themselves invalid Unicode sequence.
It's better if 1st is converted into ਭ੍ਯਿੋ, for the 2nd I don't know.

As Unicode Standard doesn't have ਯ variants as of now, I think we should not try to work around in these cases. We should handle them with Sant-Lipi modifiers.

Readme: Add reference to SantLipi

bhajneet · 2023-07-22T22:22:26Z

I would love to show as accurately as possible. This would require some testing on different platforms / font rendering engines. Very curious what this shows:

ਭ੍ਯਿੋ

As for the other example (where the preceding consonant has a right-side vowel/modifier) of ਕੀ੍ਯੋ, I don't think that will work in any platform/font rendering engine. I'd be very impressed if it did work. We had to create Sant Lipi almost purely for this real issue.

Related shabados/mobile#194

akshdeep-singh · 2023-07-23T09:24:43Z

੍ਯ is not present in SGGS, and SDG has variants of ਯ. So, if the user wants to handle ਯ variants, he has to use Sant-Lipi.

I strongly believe that unicode, santlipi, and unisant test functions should not be expected to give same output in the case of ਯ. It will also let the user to choose the conversion type he wants.

For example:

ascii	unicode	santlipi	unisant
iBÎo	ਭ੍ਯਿੋ	ਭਿ꠳ਯੋ	ਭਿ੍ਯੋ
kIÎo	ਕ੍ਯੋੀ	ਕੀ꠳ਯੋ	ਕੀ੍ਯੋ

looks correct on my devices (Windows, Android).

bhajneet · 2023-07-26T16:44:58Z

Sorry for the delay. Some thoughts on next actions / clarity:

Unicode and unisant functions should return exactly the same output. The whole point of unisant function is to confirm sant lipi markup can be properly converted to unicode
We have to decide on what our unicode standard is basically. This depends on how well all platforms/engines can render the words. For example ਭ੍ਯਿੋ seems to be working, and we can create a rule that says if there is only left-hand vowels on the preceding consonant and none on the half-yayya following then to render it properly in unicode. This is up to us as a team, our parameter is simply to ensure it can be displayed accurately (or with good information) to the end user.

Some actions for you please:

Please create an issue (task) to implement a linter/checker and add some of the info from this PR into there for tracking purposes (or at the least reference this PR link to add reference)
Same thing for normalization / ਡ + ਼ = ੜ and such

This will help us simplify the scope of this PR

Add Addha Yayya Exception

akshdeep-singh · 2023-07-27T19:10:00Z

I also think ੍ਯ should be considered as exception for sorting diacritics in case of Virama.

I have also added ਿ as a further exception in this case as it looks better.
You can see my change for reference.

So, all the tests are passing finally!
Except ਭਿ੍ਯੋ, my result is ਭ੍ਯਿੋ.

Please update ਭ੍ਯਿੋ in the tests.

bhajneet · 2023-07-28T13:58:08Z

Awesome! We are pretty close to finishing up this PR (I still have to do a final review)

Please do the following:

Update global test for ਭ੍ਯਿੋ
Update global readme with Dart support (there is a progress chart under ## WIP)

I will open an issue to investigate ੍ਯ patterns.

Update WIP in global README

bhajneet · 2023-07-28T17:31:02Z

Thank you very much for that! Would you agree that this PR is ready for a final review? Is there anything else you'd like to do?

akshdeep-singh · 2023-07-28T17:39:40Z

Thanks! I would add more features in later PRs.

This PR is more than enough for now.

bhajneet

Tests might be weakly implemented for the future, but it's more than enough for today. We've already reviewed the rest of the PR and I've enjoyed discussing the finer points of gurmukhi-utils with you. Everything looks great! Congratulations on your first PR! Thank you for all the hard work, and looking forward to future contributions :)

akshdeep-singh added 11 commits July 20, 2023 05:39

init

0095c8f

add conversion: ascii to gurmukhi

ae9c1f5

conversion: add more ascii to gurmukhi mappings

bdd2c96

Add Gurmukhi Utils - Remove Vishraams - First Letters - Split - Norma…

fea2f2f

…lization

Add support for SantLipi modifiers

9d5093d

Export Library

f0b0185

Add Example

dc2f2c8

Add License

21cfb1f

Add Tests

72abb17

Fix 'Î' mapping

eca63d8

Add Readme

896d16b

bhajneet reviewed Jul 20, 2023

View reviewed changes

bhajneet changed the title ~~[dart] feat: ascii to gurmukhi conversion, and gurmukhi utils~~ feat: add dart implementation Jul 20, 2023

akshdeep-singh and others added 5 commits July 21, 2023 13:55

Update dart/pubspec.yaml

2808d30

Co-authored-by: Bhajneet S.K. <[email protected]>

Change version, Remove License

94cfa17

Use map instead of switch/case

a312199

Refactor tests to get data from json files

2cf3742

Fix SantLipi normalise to Unicode

9db6595

akshdeep-singh added the .dart Dart label Jul 22, 2023

akshdeep-singh added 2 commits July 22, 2023 16:39

Update Contributing

aef03bf

Remove half-yayya workaround

87ea5fd

Readme: Add reference to SantLipi

akshdeep-singh added 2 commits July 27, 2023 23:35

Add Conversion: Number to Gurmukhi

b91e17f

Fix Normalization:

2f1574d

Add Addha Yayya Exception

This was referenced Jul 27, 2023

Implement a linter/checker #212

Open

Handling nukta normalization #213

Open

bhajneet mentioned this pull request Jul 28, 2023

Investigate rules for ੍ਯ patterns #214

Open

Update ਭ੍ਯਿੋ global test

65d6d95

Update WIP in global README

akshdeep-singh requested a review from Harjot1Singh as a code owner July 28, 2023 16:29

akshdeep-singh removed the request for review from Harjot1Singh July 28, 2023 16:34

bhajneet approved these changes Jul 29, 2023

View reviewed changes

bhajneet merged commit 31a3544 into shabados:main Jul 29, 2023
2 checks passed

akshdeep-singh deleted the dart_ascii_to_gurmukhi branch July 29, 2023 12:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add dart implementation #210

feat: add dart implementation #210

akshdeep-singh commented Jul 20, 2023

bhajneet left a comment

bhajneet Jul 20, 2023

akshdeep-singh Jul 21, 2023

bhajneet Jul 21, 2023

akshdeep-singh Jul 21, 2023

bhajneet Jul 21, 2023

bhajneet Jul 21, 2023 •

edited

Loading

akshdeep-singh Jul 21, 2023 •

edited

Loading

bhajneet Jul 21, 2023 •

edited

Loading

akshdeep-singh Jul 21, 2023

bhajneet Jul 22, 2023

akshdeep-singh commented Jul 22, 2023

bhajneet commented Jul 22, 2023

akshdeep-singh commented Jul 23, 2023 •

edited

Loading

bhajneet commented Jul 26, 2023

akshdeep-singh commented Jul 27, 2023

bhajneet commented Jul 28, 2023

bhajneet commented Jul 28, 2023

akshdeep-singh commented Jul 28, 2023

bhajneet left a comment

feat: add dart implementation #210

feat: add dart implementation #210

Conversation

akshdeep-singh commented Jul 20, 2023

Summary

Test

bhajneet left a comment

Choose a reason for hiding this comment

bhajneet Jul 20, 2023

Choose a reason for hiding this comment

akshdeep-singh Jul 21, 2023

Choose a reason for hiding this comment

bhajneet Jul 21, 2023

Choose a reason for hiding this comment

akshdeep-singh Jul 21, 2023

Choose a reason for hiding this comment

bhajneet Jul 21, 2023

Choose a reason for hiding this comment

bhajneet Jul 21, 2023 • edited Loading

Choose a reason for hiding this comment

akshdeep-singh Jul 21, 2023 • edited Loading

Choose a reason for hiding this comment

bhajneet Jul 21, 2023 • edited Loading

Choose a reason for hiding this comment

akshdeep-singh Jul 21, 2023

Choose a reason for hiding this comment

bhajneet Jul 22, 2023

Choose a reason for hiding this comment

akshdeep-singh commented Jul 22, 2023

bhajneet commented Jul 22, 2023

akshdeep-singh commented Jul 23, 2023 • edited Loading

bhajneet commented Jul 26, 2023

akshdeep-singh commented Jul 27, 2023

bhajneet commented Jul 28, 2023

bhajneet commented Jul 28, 2023

akshdeep-singh commented Jul 28, 2023

bhajneet left a comment

Choose a reason for hiding this comment

bhajneet Jul 21, 2023 •

edited

Loading

akshdeep-singh Jul 21, 2023 •

edited

Loading

bhajneet Jul 21, 2023 •

edited

Loading

akshdeep-singh commented Jul 23, 2023 •

edited

Loading