Option to lowercase the sentence or keep the original text? #192

Open
ivory2406 opened this issue Nov 4, 2024 · 3 comments

Comments

ivory2406 commented Nov 4, 2024

Hello, I want to keep uppercase letters, as in this example:

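```go
package main

import (
	"encoding/json"
	"github.com/go-ego/gse"
)

func main() {
	text := "Hello world, Helloworld. Winter is coming! 你好世界."
	jieba := new(gse.Segmenter)
	jieba.LoadDict() // load gse's default dictionary
	out, _ := json.Marshal(jieba.Cut(text)) // in place of the undefined ToJson helper
	println(string(out))
}
```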

The result is: ["hello"," ","world",","," ","helloworld","."," ","winter"," ","is"," ","coming","!"," ","你好","世界","."]

I would like the result to be: ["Hello"," ","world",","," ","Helloworld","."," ","Winter"," ","is"," ","coming","!"," ","你好","世界","."]


I have also looked at the available options in https://github.com/go-ego/gse/blob/master/segmenter.go

@ivory2406 (Author)

I would like this to be configurable through an option.

@ivory2406 (Author)

@vcaesar Could you help me with a toLower option? Thanks very much!

@ivory2406 (Author)

@CocaineCong Hello, could you help me with a toLower option? I want to use gse to tokenize sentences and then encode the tokens with mmh3 (MurmurHash3).

Whether a character is lowercase or uppercase is very important to me, because a word's mmh3 value differs between its lowercase and uppercase forms.
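To make the point concrete, the sketch below is a plain-Go port of MurmurHash3_x86_32, the variant that Python's mmh3.hash computes by default (seed 0, reported as a signed 32-bit integer). It does not use the mmh3 package or any Go murmur3 library, and the function name murmur3Sum32 is illustrative. Hashing "Hello" and "hello" yields different values, so lowercasing during segmentation changes the downstream hash of every capitalized word.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// murmur3Sum32 is an illustrative port of MurmurHash3_x86_32.
func murmur3Sum32(data []byte, seed uint32) uint32 {
	const c1, c2 uint32 = 0xcc9e2d51, 0x1b873593
	h := seed
	nblocks := len(data) / 4

	// Body: mix each 4-byte little-endian block into the state.
	for i := 0; i < nblocks; i++ {
		k := binary.LittleEndian.Uint32(data[i*4:])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
		h = bits.RotateLeft32(h, 13)
		h = h*5 + 0xe6546b64
	}

	// Tail: mix the remaining 0-3 bytes.
	tail := data[nblocks*4:]
	var k uint32
	switch len(tail) {
	case 3:
		k ^= uint32(tail[2]) << 16
		fallthrough
	case 2:
		k ^= uint32(tail[1]) << 8
		fallthrough
	case 1:
		k ^= uint32(tail[0])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
	}

	// Finalization: fold in the length, then avalanche the bits.
	h ^= uint32(len(data))
	h ^= h >> 16
	h *= 0x85ebca6b
	h ^= h >> 13
	h *= 0xc2b2ae35
	h ^= h >> 16
	return h
}

func main() {
	for _, tok := range []string{"Hello", "hello", "Winter", "winter"} {
		// int32 conversion mirrors mmh3.hash's signed output.
		fmt.Printf("%-9q %d\n", tok, int32(murmur3Sum32([]byte(tok), 0)))
	}
}
```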
