Feature request: Add Korean Cheonjiin input #454

Open
ilovemegf opened this issue Mar 8, 2024 · 24 comments
Labels
languages Dictionary or language related issues

Comments

@ilovemegf

Hello,
Amazing app by the way!
Would it be possible to add Korean inputs into this?
I don't think we need predictive typing as Korean is super straightforward to type using a T9 keyboard.
As a person who speaks both English and Korean it would be awesome if this can support them both :).

@sspanak
Owner

sspanak commented Mar 8, 2024

And hello to you!

I have nearly zero knowledge about the language. I only know you combine dots and letters to form vowels and consonants, but how it all works is beyond me. Could you please point me to an article, preferably something wiki-like, where I can first educate myself? Or, if you can explain how letters are built and what to put on each key, it would be even better.

Also, keep in mind that Korean support may not come soon. As you can see, many other bugs and features are in the queue.

@sspanak sspanak added the languages Dictionary or language related issues label Mar 8, 2024
@ilovemegf
Author

ilovemegf commented Mar 8, 2024 via email

@sspanak
Owner

sspanak commented Mar 9, 2024

Keys from 4 to 0 are all consonants, and typing them should be fairly
obvious:
ㄱ = 4
ㅋ = 44
ㄴ = 5
ㄹ = 55

"ㄹ = 55" isn't obvious. I would have thought this is 45, because it looks like a combination of "ㄱ = 4" and "ㄴ = 5". Anyway, I've tried out Korean with Gboard and it seems to work exactly according to your explanations. I should be able to use it for reference later on.

I have more questions though. Do we want to keep the current functionality of holding a number key to type a number? If so, how do you type punctuation and other special characters? The only way I see it is, typing numbers is only possible in 123 mode, and holding 0-key and 1-key will show the special character lists (as if 0 or 1 were single-pressed in English). Or is there another way?

As for typing letters, I have only one concern. Currently, the entire app is designed to handle only the numeric keys for typing. All others are considered functional keys and pressing them results in performing some predefined action. The actions are listed when you configure the hotkeys.

Typing Korean will definitely require a new input mode, as it doesn't fit the current Predictive or ABC which are designed for European languages. And rerouting one more key to serve as Korean space, instead of the current space on 0-key, is not a problem. The thing is, on a standard keypad, the only available extra keys are # and *, and they are already taken. By default, pressing * opens the Settings, holding it adds a word to the dictionary; pressing # changes the typing mode (this means either switch to ABC/123/Predictive, or change the text case) and holding it changes the language.

I believe the functions on # are a must-have.

A potential solution is dropping the Add Word function (well, just unassigning it in the Settings). It won't be necessary, because in Korean one always types letter-by-letter. It could also be the "Show Settings" function, but I think that one is more useful. Then, I could add an extra "Korean Break Word" or "Korean Space" function, which could be assigned from the Hotkeys settings to whatever physical key you like. The thing is, I don't like the idea. It is a very special exception to the general workflow.

Having said all the above, I'm open to other suggestions on how to fit the two extra letters and the Spacebar on the keypad.

And finally, the on-screen virtual keypad must also be redesigned, but it should be easier. I guess I'll replace the hardcoded "?" with space and "!" will be the special character key.

@ilovemegf
Author

ilovemegf commented Mar 9, 2024 via email

@sspanak
Owner

sspanak commented Mar 20, 2024

See how they have two buttons ([#1], 한/영) where * on a physical T9 keypad
would be

Attachments from emails appear corrupted here. I cannot see the image.

Nevertheless, I think I got the idea. I'll consider all the ideas and information you have shared, but honestly, I don't want to do too many customizations, because no other language needs them, and because such code usually becomes hard to maintain as everything around it evolves. I'll think about it and figure out the best course of action, when I get back to Korean.

This means that it does not need <- and -> buttons

The physical keys move the text cursor to the left and right. When the cursor is at the edge of a text field, they select the closest button or menu item. This is crucial for phones without touchscreen. And even on the rest of them, it is quite useful. I even intend to add the same functionality to the on-screen keyboard.

Which brings me to bugs. In facebook messenger, when I press ok it doesn't
send. The ok button acts like "enter" on a normal keyboard.

Perhaps, you misunderstood how the option works? See the manual for v29.0, in particular, the OK key section and the Compatibility section. If you still think something is not right, please create a new issue. I would like to keep the board tidy and have each problem in a separate issue.

@chrissy0

chrissy0 commented Jul 14, 2024

Just wanted to bump this up. As an expat living in Korea with an Android flip phone, having this functionality would be a dream. Maybe I can look into this myself when I have some free time!

@chrissy0

chrissy0 commented Jul 14, 2024

In order to prevent massive changes, I want to propose reassigning the , , and keys. This is the layout I'm fiddling with:

layout:
  - [SPECIAL] # 0
  - [PUNCTUATION] # 1
  - [.] # 2
  - [ㅣ, ㅡ] # 3
  - [ㅇ, ㄱ, ㅋ] # 4
  - [ㅁ, ㄴ, ㄹ] # 5
  - [ㄷ, ㅌ] # 6
  - [ㅂ, ㅍ] # 7
  - [ㅅ, ㅎ] # 8
  - [ㅈ, ㅊ] # 9

It's not ideal and natives that are used to the usual way will probably hate it. But for me, I just need some way to somewhat comfortably type Korean on my phone.

Challenges:

  • In order to type , I have to press 5 5 3 2 2 (ㄴㅣ..). In ABC mode (called 가나다 in Korean), typing 2 2 quickly will result in 2 rather than ...
  • There needs to be some post-processing to turn ㄴㅣ.. into . Could you point me to the part of the code where this kind of thing is possible/would fit in well, and maybe give me your thoughts on why it may be difficult, what could go wrong, etc?

Also, is it possible to turn off dictionary/predictive mode for one language?

You might not like this approach, and that's fine. I'm okay with my changes not being merged. I'd still like to develop this for my personal use.

@sspanak
Owner

sspanak commented Jul 18, 2024

First off, your post reminds me that I don't actually know the correct layout. I only got that:
1 = ㅣ, 2 = dot, 3 = ㅡ. But what goes on the keys 4-9, and 0?

Could you point me to the part of the code where this kind of thing is possible/would fit in well, and maybe give me your thoughts on why it may be difficult, what could go wrong, etc?

Well... no. It is not possible to do it in a few lines of code or explain it in a couple of sentences. The post-processing to combine the vowels and the consonants does require massive changes.

If you want to experiment, you could roughly do the following:

  1. Disable the language validation in app/build.gradle, because it is not even possible to build the project with this .yml file.
  2. Add the locale name and the dictionary file. Use a dictionary containing one word.
  3. Optionally, fix the validation in DictionaryLoader. It will fail, because the word will not contain any of the "alphabet" characters.
  4. Go to app/java/io.github.sspanak.tt9/ime/ and have fun... Basically, I can't give you hints, because I am not sure how to do it myself. I know there must be a new mode, for example ModeCheonjin in the modes directory. It is where most of the magic will happen. Probably, there will be small changes in the XXXHandler files to make them work with the new mode.
    That's all I have in mind at the moment, nothing specific.

All in all, this is going to be a difficult feature to implement. Beware, you may waste a lot of time, if you start doing it.

@chrissy0

I appreciate your answer and warning. It sounds like a bigger project than I have time for right now. But maybe I'll revisit this later. Thank you for your time.

@sspanak
Owner

sspanak commented Jul 19, 2024

I appreciate your answer and warning. It sounds like a bigger project than I have time for right now. But maybe I'll revisit this later. Thank you for your time.

No problem. When I started, I was also thinking that creating a keyboard is a much simpler task.

your post reminds me that I don't actually know the correct layout. I only got that:
1 = ㅣ, 2 = dot, 3 = ㅡ. But what goes on the keys 4-9, and 0?

Answering myself, Gboard has a 10-key layout that can be used as an example.

@chrissy0

There seem to be many different layouts floating around. Attaching some here. Maybe some are more similar/easily adjustable to the current state of the software.

image
image
image

@jibsaramnim

Samsung, LG, and some others all had their own variations on the keyboard layout back then, which is why you can find some variations. I'm not sure which one ended up being the one modern Android went with or if it's customizable in its settings, but I suspect that eventually most might have settled on using Samsung's as the default?

If it helps, a modern Android-powered flip phone I have from a local company has gone with the layout pretty much matching Samsung's layout (the third picture). The only addition to the layout shown in the picture is that the asterisk button also shows icons indicating it's the one you use to switch between Hangeul, roman, and numerical input.

@sspanak
Owner

sspanak commented Jul 19, 2024

Thanks for the sample photos!
The layouts all look pretty much the same. Only the special characters and the functions are distributed differently. And Gboard is also very similar to them, which is great. I can use it as an example.

@ariehcore

I'm certain this is probably still far off, but I am also looking for an alternative T9 Korean keyboard, since the default one on the Mive Style Folder (which also follows the Chunjiin/Samsung layout) does not allow predictive text in either English or Korean.

I'm not too experienced in working with Android from development perspective (only tried to modify a much smaller project), but if there's any way I can help test this when you do work on the feature, I am very willing to help :)

@sspanak
Owner

sspanak commented Sep 19, 2024

Sure, I will upload a testing APK when I have something working. I have no intention of releasing untested code anyway.

And, I am getting closer to this task. Please, have some more patience.

@sspanak sspanak changed the title Feature request: Add Korean Chunjiin input Feature request: Add Korean Cheonjiin input Oct 21, 2024
@sspanak
Owner

sspanak commented Oct 22, 2024

Hey, folks!

Could someone please provide examples of words and their respective digit combinations? I would appreciate examples of all possible cases, such as no ending consonants/with ending consonants, double ones, standalone vowels (if possible), any special cases, like the already mentioned 0-key alternatives, and whatnot. It would also be great if the digit combinations cover all keys from 0 to 9.

Scratch the above, here is my summary of Korean. Below is a list of all possible building blocks and their respective digit combinations.

INITIAL CONSONANTS

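| Latin | Korean | T9 |
|---|---|---|
| G | ㄱ | 4 |
| K* | ㅋ | 44 |
| KK | ㄲ | 444 |
| N | ㄴ | 5 |
| R/L | ㄹ | 55 |
| D | ㄷ | 6 |
| T* | ㅌ | 66 |
| TT | ㄸ | 666 |
| B | ㅂ | 7 |
| P* | ㅍ | 77 |
| PP | ㅃ | 777 |
| S | ㅅ | 8 |
| H | ㅎ | 88 |
| SS | ㅆ | 888 |
| J | ㅈ | 9 |
| CH* | ㅊ | 99 |
| JJ | ㅉ | 999 |
| NG | ㅇ | 0 |
| M | ㅁ | 00 |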
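For reference, the table above can be expressed as a lookup from multi-tap digit sequences to Hangul compatibility jamo. This is only a sketch: the class and method names are hypothetical, not TT9 code, and since the table does not say what happens after ㄲ (for example, whether 4444 wraps back to ㄱ), the sketch simply reports "no match" in that case.

```java
import java.util.Map;

/** Hypothetical sketch of the INITIAL CONSONANTS table; not part of TT9. */
final class CheonjiinInitials {
    // Digit sequence typed on one key -> compatibility jamo, as listed in the table.
    static final Map<String, Character> INITIALS = Map.ofEntries(
            Map.entry("4", 'ㄱ'), Map.entry("44", 'ㅋ'), Map.entry("444", 'ㄲ'),
            Map.entry("5", 'ㄴ'), Map.entry("55", 'ㄹ'),
            Map.entry("6", 'ㄷ'), Map.entry("66", 'ㅌ'), Map.entry("666", 'ㄸ'),
            Map.entry("7", 'ㅂ'), Map.entry("77", 'ㅍ'), Map.entry("777", 'ㅃ'),
            Map.entry("8", 'ㅅ'), Map.entry("88", 'ㅎ'), Map.entry("888", 'ㅆ'),
            Map.entry("9", 'ㅈ'), Map.entry("99", 'ㅊ'), Map.entry("999", 'ㅉ'),
            Map.entry("0", 'ㅇ'), Map.entry("00", 'ㅁ'));

    /** Returns the initial consonant for a digit sequence, or null if there is none. */
    static Character initial(String digits) {
        return INITIALS.get(digits);
    }

    /** True if pressing {@code digit} once more moves to the next consonant, e.g. ㄱ -> ㅋ. */
    static boolean cycles(String digits, char digit) {
        return INITIALS.containsKey(digits + digit);
    }
}
```

For example, `initial("44")` gives ㅋ, while `cycles("4", '5')` is false because 4 and 5 are different keys.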

VOWELS

"ㆍ" = 2 and ":" = 22 are never used separately.

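| Latin | Korean | T9 |
|---|---|---|
| I | ㅣ | 1 |
| EU | ㅡ | 3 |
| A | ㅏ | 12 |
| AE | ㅐ | 121 |
| YA | ㅑ | 122 |
| YAE | ㅒ | 1221 |
| EO | ㅓ | 21 |
| E | ㅔ | 211 |
| YEO | ㅕ | 221 |
| YE | ㅖ | 2211 |
| YO | ㅛ | 223 |
| O | ㅗ | 23 |
| WA | ㅘ | 2312 |
| WAE | ㅙ | 23121 |
| OE | ㅚ | 231 |
| UI | ㅢ | 31 |
| U | ㅜ | 32 |
| WO | ㅝ | 3221 |
| WE | ㅞ | 32211 |
| WI | ㅟ | 321 |
| YU | ㅠ | 322 |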
FINAL CONSONANTS (optional)

All single initial ones + the following:

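| Latin | Korean | T9 |
|---|---|---|
| KK (also initial) | ㄲ | 444 |
| SS (also initial) | ㅆ | 888 |
| GS | ㄳ | 48 |
| LG | ㄺ | 554 |
| LB | ㄼ | 557 |
| LS | ㄽ | 558 |
| LT | ㄾ | 5566 |
| LP | ㄿ | 5577 |
| LH | ㅀ | 5588 |
| LM | ㄻ | 5500 |
| NH | ㄶ | 588 |
| NJ | ㄵ | 59 |
| BS | ㅄ | 78 |
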
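A sketch of the final consonant set, read literally from the rule above: every single (non-doubled) initial consonant, plus ㄲ, ㅆ and the compound finals. The entries are packed as "digits + jamo" strings only to keep the block short; the names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the FINAL CONSONANTS table; not part of TT9. */
final class CheonjiinFinals {
    static final Map<String, Character> FINALS = new HashMap<>();

    static {
        // Single initial consonants; the doubled ㄲ ㄸ ㅃ ㅆ ㅉ are not included here.
        String[] singles = {"4ㄱ", "44ㅋ", "5ㄴ", "55ㄹ", "6ㄷ", "66ㅌ", "7ㅂ", "77ㅍ",
                            "8ㅅ", "88ㅎ", "9ㅈ", "99ㅊ", "0ㅇ", "00ㅁ"};
        // ㄲ and ㅆ (also initials) plus the compound finals from the table.
        String[] extra = {"444ㄲ", "888ㅆ", "48ㄳ", "554ㄺ", "557ㄼ", "558ㄽ", "5566ㄾ",
                          "5577ㄿ", "5588ㅀ", "5500ㄻ", "588ㄶ", "59ㄵ", "78ㅄ"};
        for (String[] group : new String[][] {singles, extra}) {
            for (String entry : group) {
                // Everything except the last character is the digit sequence.
                int last = entry.length() - 1;
                FINALS.put(entry.substring(0, last), entry.charAt(last));
            }
        }
    }

    /** Returns the final consonant for a digit sequence, or null if there is none. */
    static Character finalConsonant(String digits) {
        return FINALS.get(digits);
    }
}
```

This yields 27 finals, and a sequence such as 666 (ㄸ) correctly has no final form.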
  • To type a character one needs to combine: any initial consonant + any vowel + optionally, any final consonant.
  • Pressing space or OK "ends" the current character
  • Typing one more digit, that does not belong to the end of any character digit sequence, "ends" the current character and "starts" a new one.
  • Naturally, punctuation, space, newline and whatnot also "end" the current character.

All this means "Korea" = "대한민국" = "daehanmingug" is equivalent to: 6121 (ok or space) 88125 (ok or space) 0015 (ok or space) 4324.
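Since the fonts will not combine the jamo on their own, each finished character presumably has to be emitted as a precomposed syllable. A minimal sketch using the standard Unicode Hangul composition formula, `0xAC00 + (initialIndex * 21 + vowelIndex) * 28 + finalIndex`, where the indices follow Unicode's fixed jamo order (the class name is hypothetical):

```java
/** Hypothetical helper that composes one Hangul syllable; not part of TT9. */
final class HangulComposer {
    // Compatibility jamo listed in Unicode's initial / medial / final index order.
    private static final String INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ";
    private static final String VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ";
    // Final index 0 means "no final consonant", hence the leading space.
    private static final String FINALS = " ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ";

    /** Composes one syllable; pass ' ' as {@code fin} when there is no final consonant. */
    static char compose(char initial, char vowel, char fin) {
        int l = INITIALS.indexOf(initial);
        int v = VOWELS.indexOf(vowel);
        int t = FINALS.indexOf(fin);
        if (l < 0 || v < 0 || t < 0) {
            throw new IllegalArgumentException("not a valid syllable: " + initial + vowel + fin);
        }
        return (char) (0xAC00 + (l * 21 + v) * 28 + t);
    }
}
```

For 대한민국, the groups 6121 / 88125 / 0015 / 4324 resolve to ㄷ+ㅐ, ㅎ+ㅏ+ㄴ, ㅁ+ㅣ+ㄴ and ㄱ+ㅜ+ㄱ, which this formula maps to U+B300, U+D55C, U+BBFC and U+AD6D.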

Yay, I think I got it! But please correct me if I am missing something.

Otherwise, I'll start figuring out how to write all this in Java. Since the fonts do not support automatic consonant+vowel combining like in Thai, for example, it will be more difficult than I initially thought. But... I hope I can make it.

@ilovemegf
Author

Yes that looks correct!

Few things that you might know, but may have missed out.

  1. Space is assigned to different buttons depending on the phone. Note that the first picture in the previous comments does not have a space key; I think some phones used the arrow key (->) for it. This may require a special GUI, and make it inapplicable to typical T9 Android phones on the market today. In the second picture (white phone), space is assigned to a button above the "red phone" button, and in the third picture (Samsung phone), it is assigned to #. It might be worth making space assignable to different buttons.

  2. It is possible to type ㆍ and : separately. Allowing this has no practical use, as it is grammatically incorrect, and I do not remember people using them separately on purpose (e.g. in emojis). The only time you'd see ㆍ and : is when people make a typo. As for the other vowels (such as ㅏ and ㅗ), they need to be typeable separately, as they are often used in emojis.

  3. You typed Korea = 대한민국 with more effort than needed. You put (ok or space) between each character, which is unnecessary most of the time. You "could" put a space between them, but a word such as 대한민국 can be typed without doing that. So this would be: 대한민국 = 61218812500154324.

This is because inputting a vowel after a final consonant changes that final consonant into the first consonant of the next character.

I think the code should have this logic. Sorry, I don't actually code. Thus, I'll describe it in words.

Press 6 and ㄷ (consonant) appears,
Press 1 and ㅣ(vowel) is added on to ㄷ, making 디
Press 2 and ㆍ is added on to 디, making 다
Press 1 and ㅣ(vowel) is added on to 다, making 대.

Now the tricky part

(without pressing ok or space!!)
Press 8 and ㅅ (final consonant) is added to 대, making 댓
Press 8 again, and ㅅ(final consonant) changes to ㅎ, making 댛

Now, because the next letter is a vowel (either ㅣ or ㅡ):

Press 1 and ㅣ is added, taking away the ㅎ of 댛 and making 대히
Press 2 and ㆍ is added on to 히, making 대하
Press 5 and ㄴ is added, making 대한

So basically, imagine you typed first consonant + vowel + final consonant.
Inputting a vowel (without pressing ok or space), would take the final consonant, and use it as the first consonant of the next character.
Inputting a consonant, ㆍ, : (without pressing ok or space) would start a new character. e.g. 대ㆍ (61212). Of course, this would be a typo (user error).
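The rule above can be sketched at the jamo level: a syllable is held as a string of compatibility jamo (e.g. "ㄷㅐㅎ" for 댛), and when a complete vowel follows, the final consonant becomes the initial of a new syllable. The names are hypothetical. The thread only shows single finals moving; splitting compound finals such as ㄶ so that only their second half moves is an assumption borrowed from how common Korean input methods behave.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of moving a final consonant to the next syllable; not part of TT9. */
final class FinalConsonantShift {
    // ASSUMPTION: compound finals split, and only the second consonant moves on.
    private static final Map<Character, String> COMPOUND_FINALS = Map.ofEntries(
            Map.entry('ㄳ', "ㄱㅅ"), Map.entry('ㄵ', "ㄴㅈ"), Map.entry('ㄶ', "ㄴㅎ"),
            Map.entry('ㄺ', "ㄹㄱ"), Map.entry('ㄻ', "ㄹㅁ"), Map.entry('ㄼ', "ㄹㅂ"),
            Map.entry('ㄽ', "ㄹㅅ"), Map.entry('ㄾ', "ㄹㅌ"), Map.entry('ㄿ', "ㄹㅍ"),
            Map.entry('ㅀ', "ㄹㅎ"), Map.entry('ㅄ', "ㅂㅅ"));

    /**
     * Types a complete vowel after a syllable written as initial + vowel + final jamo
     * and returns the two resulting syllables, e.g. "ㄷㅐㅎ" + ㅣ -> ["ㄷㅐ", "ㅎㅣ"].
     */
    static List<String> addVowel(String syllable, char vowel) {
        if (syllable.length() != 3) {
            throw new IllegalArgumentException("expected initial + vowel + final: " + syllable);
        }
        String head = syllable.substring(0, 2); // initial + vowel stay in place
        char fin = syllable.charAt(2);
        String parts = COMPOUND_FINALS.get(fin);
        if (parts == null) {
            return List.of(head, "" + fin + vowel); // a single final moves entirely
        }
        return List.of(head + parts.charAt(0), "" + parts.charAt(1) + vowel);
    }
}
```

Here "vowel" means a finished vowel such as ㅣ or ㅕ. While 2 or 22 is still pending, ilovemegf's description says it is shown separately (알ㆍ), so the move would only happen once the digits resolve into a vowel.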

Then you might be wondering why we need ok (or space) to start a new character! This is actually what makes the Chunjiin method inferior to other methods, but, as someone else mentioned in the comments, all T9 phones today tend to use it.
I'll give you a word. Hello = 안녕
Note how the final consonant (ㄴ) of the first character (안) and the first consonant(ㄴ) of the second character (녕) are the same!
Now typing 안 = 0125
And typing 녕 = 52210
This is where space is used between characters. If you don't use it, the final consonant of 안 (ㄴ = 5) turns into ㄹ (55). This turns the word hello = 안녕 into dumbbell = 아령!

Speaking of dumbbell = 아령, the sequence would look like this on screen:

0125 = 안
press 5 again, and makes 알
press 2, (ㆍ), this makes 알ㆍ (note how ㆍ appears as if this is a next character)
press 2, (ㆍ), this makes 알: (: still appears as a next character)
press 1 (ㅣ), this makes 아려
press 0 (ㅇ), this makes 아령

I think I typed more than what you probably wanted to read, but I hope this helps.
Appreciate your amazing work!

@sspanak
Owner

sspanak commented Oct 23, 2024

OK, so not confirming each character makes things much more complicated. Currently, TT9 cannot go back and edit previous letters and words, but I'll see what I can do. I guess the correct logic for doing this would be:

  1. Keep consuming digits as long as possible to form a single character.
  2. If there is a vowel (the digit sequence already contains 1, 2 or 3), and the next digit is 1, 2 or 3 (one more vowel), terminate the previous character where the vowel ends (when the 1-2-3-s end) and use the consonant digits (4-9 and 0) + the incoming one to start a new character.
  3. If the current digit sequence contains only one vowel, but it just became too long to match any character, keep the previous character and use the next digit to start a new character.

I also typed more than what you probably wanted to read, but I am trying to explain it to myself, to understand it better. 😄

As for ㆍ and :, I'll keep them for consistency, it isn't a problem. Someone may need to type them in a specific case, such as when explaining to someone else on GitHub what they are.

Finally, as for the space key, I'll use the current SHIFT key for space, because it is useless in Korean. By default SHIFT is ✱, not #, like you are used to, but you can just reconfigure it if you want.

The virtual keyboard, I'll have to redesign it for Korean. I'll probably do something like Gboard, with space on the right-hand side.

@ilovemegf
Author

ilovemegf commented Oct 23, 2024

I assumed that it would be a lot harder to not confirm each character :(.

For 1 and 3, I assume that "as long as possible" means time?
In that case you are correct. I forgot to mention that if you end a character with a vowel (e.g. ㄷ (consonant) + ㅐ (vowel)) and then wait a while (about 1 second from distant memory, definitely no longer than 2 seconds), the next character starts automatically. This is not commonly used, though, as most people type faster than that.
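A minimal sketch of that idle timeout, assuming the caller passes key-press timestamps in milliseconds (e.g. from `System.currentTimeMillis()`) so the logic can be tested without a real clock. The class name is hypothetical and the 1000 ms value is only the recollection quoted above.

```java
/** Hypothetical sketch: a long enough pause ends the current character. */
final class CharacterTimeout {
    private final long timeoutMillis;
    private long lastKeyMillis = -1; // -1 means no key has been pressed yet

    CharacterTimeout(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a key press at {@code nowMillis}; returns true if the pause ended the previous character. */
    boolean onKey(long nowMillis) {
        boolean pausedLongEnough = lastKeyMillis >= 0 && nowMillis - lastKeyMillis >= timeoutMillis;
        lastKeyMillis = nowMillis;
        return pausedLongEnough;
    }
}
```

With `new CharacterTimeout(1000)`, presses at 0 ms and 300 ms continue the same character, while a press at 1500 ms starts a new one.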

For method 2, you might have to be careful. We are defining keys 1,2 and 3 as "vowels", but technically, they are "components that make up the vowels". In practice, the expected behaviour is to:
Press 21, this makes ㅓ
Press 1 again, this should make ㅔ

If I understood method 2 correctly, pressing 211 would cause it to output ㅓㅣ, or even ㆍ ㅣ ㅣ. Which is not ideal.

There are scenarios where method 2 would work - scenarios where it grammatically makes sense. As an example,
Pressing 11 makes ㅣㅣ (separate characters). This is correct, as there are no vowels in Korean that have ㅣㅣ together.

I hope this helps!

I now see that this is massively complicated...

@sspanak
Owner

sspanak commented Oct 27, 2024

We are on the same page, no worries. As a developer, I am probably explaining everything in a non-human-friendly manner. 😆

211 will produce ㅔ both when used alone and in combination with a consonant, including the NULL consonant (not sure what it is called).

"As long as possible" means: as long as you keep typing digits that make up a character, be it "3", "211" or something longer like "52210". But if you type one more digit, like another "3" after each of these, the digit sequences would be too long to match any character, so that "3" at the end must be the start of a new character. In other words, keep the longest possible digit sequence and try to find a matching character. If there is no such character, then the last digit belongs to a new character.
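The "as long as possible" rule can be sketched for the vowel keys (1, 2, 3) alone: keep extending the current digit sequence while it is still a vowel or the start of one; otherwise, the new digit begins the next character. This is a hypothetical illustration that leaves consonants and final-consonant moves out; it renders a pending 22 as U+11A2 (ᆢ), written ":" in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical longest-match segmenter for vowel keys only; not part of TT9. */
final class VowelSegmenter {
    // Vowel rows from the summary table (digits -> compatibility jamo).
    private static final Map<String, Character> VOWELS = Map.ofEntries(
            Map.entry("1", 'ㅣ'), Map.entry("3", 'ㅡ'), Map.entry("12", 'ㅏ'),
            Map.entry("121", 'ㅐ'), Map.entry("122", 'ㅑ'), Map.entry("1221", 'ㅒ'),
            Map.entry("21", 'ㅓ'), Map.entry("211", 'ㅔ'), Map.entry("221", 'ㅕ'),
            Map.entry("2211", 'ㅖ'), Map.entry("23", 'ㅗ'), Map.entry("223", 'ㅛ'),
            Map.entry("2312", 'ㅘ'), Map.entry("23121", 'ㅙ'), Map.entry("231", 'ㅚ'),
            Map.entry("31", 'ㅢ'), Map.entry("32", 'ㅜ'), Map.entry("3221", 'ㅝ'),
            Map.entry("32211", 'ㅞ'), Map.entry("321", 'ㅟ'), Map.entry("322", 'ㅠ'));

    /** True if some vowel's digit sequence starts with (or equals) {@code digits}. */
    private static boolean canGrow(String digits) {
        for (String key : VOWELS.keySet()) {
            if (key.startsWith(digits)) return true;
        }
        return false;
    }

    /** Splits a run of 1/2/3 key presses into characters, keeping the longest match. */
    static List<String> segment(String digits) {
        List<String> out = new ArrayList<>();
        String current = "";
        for (char d : digits.toCharArray()) {
            if (d != '1' && d != '2' && d != '3') {
                throw new IllegalArgumentException("not a vowel key: " + d);
            }
            if (!current.isEmpty() && !canGrow(current + d)) {
                out.add(render(current)); // the previous character is complete
                current = "";
            }
            current += d;
        }
        if (!current.isEmpty()) out.add(render(current));
        return out;
    }

    // Every unfinished prefix is a vowel, except "2" (ㆍ) and "22" (ᆢ).
    private static String render(String digits) {
        Character v = VOWELS.get(digits);
        if (v != null) return String.valueOf(v);
        return digits.equals("2") ? "ㆍ" : "\u11A2";
    }
}
```

This reproduces the cases discussed above: 211 stays a single ㅔ, while 11 becomes ㅣ ㅣ, and 2113 becomes ㅔ ㅡ.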

I assumed that it would be a lot harder to not confirm each character :(.

Every automatic thing is more complicated, because the developer must describe the possible scenarios in the code and figure out which is the optimal one at the moment.

I now see that this is massively complicated...

Oh, there is more. I also have to change the dictionary file format, then I have to change the way the files are imported in the database, and only then I can get to the typing part. But this is the way to East Asian language support. If it wasn't for Korean, I would have had to do it to support Japanese, Chinese, Indic languages, and potentially, Amharic or North American languages, if ever.

In the end, I am not complaining; it is a rather interesting programming challenge. Not something I would ever get to do in a big corporate environment.

@sspanak
Owner

sspanak commented Nov 14, 2024

@ilovemegf , @chrissy0 , @ariehcore , @jibsaramnim , could you folks please provide a couple of words or even phrases that I can test with?

I currently know the three below:

  • "대한민국" = 6121 88125 0015 4324
  • "안녕" = 0125 52210
  • "아령" = 012 552210

... but I would like to try something more complicated.

Preferably, separate the digit sequences by character like above, so that I can understand them more easily. Also, please note when a character must be accepted manually, so that it does not turn into something else.

@chrissy0

I would love to help but I neither have a Korean T9 phone to test it out nor have I ever used one, so I'm afraid I might provide you with incorrect digits.

@ilovemegf
Author

I'll give you more words that require confirming each character. Before the examples, let's make it a rule that I will put (space) between characters where it is compulsory. Otherwise, I will just put a blank space between the numbers for ease of reading.

장마 = 9120(space)0012
런닝맨 = 55215(space)510(space)001215
곡괭이 = 4234(space)4231210(space)01

This is probably a good time to tell you about underlining.
The alternative is to wait for a second between characters. The character you are currently typing and the one before it get underlined as you type. I think this means that the two most recent characters are subject to 'potential' modification (e.g. the final consonant may turn into the first consonant of the next character).
The underline is removed after a second, indicating that you are now typing a new character.

So,
장마 = 9120(wait 1 second)0012
런닝맨 = 55215(wait 1 second)510(wait 1 second)001215
곡괭이 = 4234(wait 1 second)4231210(wait 1 second)01
would work as well. Except that 1 second is a looong time when you are texting, and no one actually uses this.

All of the words above require each character to be confirmed. If you don't, they would come out like this:
장마 -> 자아
런닝맨 -> 러리앤
곡괭이 -> 고쾌미

Also, a case where you would have to delete a character before, and delete a character afterwards:
않 = 012588

Say, typing 않 would look like:
안 = 0125
안ㅅ = 01258
않 = 012588
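The 않 walkthrough (like the earlier 안 → 알 case) suggests how the consonant digits typed after a vowel could be displayed: the longest prefix that is a valid final stays on the current syllable, and whatever is left is shown as the next character's initial. A minimal sketch with hypothetical names, using only the subset of the tables needed for these examples:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of displaying pending final-consonant digits; not part of TT9. */
final class FinalSplitter {
    // Only a subset of the FINAL and INITIAL CONSONANTS tables, enough for 안 / 않 / 알.
    private static final Map<String, Character> FINALS = Map.of(
            "5", 'ㄴ', "55", 'ㄹ', "588", 'ㄶ', "8", 'ㅅ', "88", 'ㅎ', "0", 'ㅇ', "00", 'ㅁ');
    private static final Map<String, Character> INITIALS = Map.of(
            "5", 'ㄴ', "55", 'ㄹ', "8", 'ㅅ', "88", 'ㅎ', "0", 'ㅇ', "00", 'ㅁ');

    /**
     * Splits the consonant digits typed after a vowel into [final] or
     * [final, initial of the next character]; returns an empty list if neither works.
     */
    static List<Character> split(String digits) {
        for (int cut = digits.length(); cut > 0; cut--) { // try the longest final first
            Character fin = FINALS.get(digits.substring(0, cut));
            if (fin == null) continue;
            if (cut == digits.length()) return List.of(fin);
            Character next = INITIALS.get(digits.substring(cut));
            if (next != null) return List.of(fin, next);
        }
        return List.of();
    }
}
```

After 012 (아), the digits 5, 58 and 588 give ㄴ (안), ㄴ + ㅅ (안ㅅ) and ㄶ (않), matching the sequence above; 55 gives ㄹ, which is how 안 turns into 알.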

And an example sentence to put it all together:
도움이 되었으면 좋겠습니다 (which means: I hope this helps)
= 623 03200(space) 01(space)(space) 6231 021888 03 002215(space)(space) 92388 4211888(space) 837 51 612

Now you might be wondering why there is (space)(space).
Like I mentioned above, the last two characters that you are working on gets underlined. By pressing (space), you take the cursor after the last character (thus, removing the underline). And by pressing (space) once again, it creates an actual space between the words.
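The two-press space can be modelled as a tiny state machine: the first press only ends the underlined characters, and the second inserts an actual space. This is a hypothetical sketch; on Android, ending the underline would presumably correspond to something like `InputConnection.finishComposingText()`, but how TT9 would wire it is an assumption.

```java
/** Hypothetical sketch of the double-space behaviour; not part of TT9. */
final class SpaceKey {
    private boolean composing;

    /** Any letter key leaves underlined, still-editable characters behind. */
    void onLetterKey() {
        composing = true;
    }

    /** Returns the text to insert: "" when the press only confirms, " " otherwise. */
    String onSpace() {
        if (composing) {
            composing = false; // confirm the current character(s); no space yet
            return "";
        }
        return " ";
    }
}
```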

Regards,

@sspanak
Owner

sspanak commented Nov 16, 2024

Thanks! It looks like there is more work to do.

Projects
None yet
Development

No branches or pull requests

5 participants