A postal address tokenizer built on a Wapiti model.
Intended for addresses written in CJK (Chinese, Japanese, Korean) characters. After the Wapiti model labels each token (character), this gem combines adjacent tokens that share the same label. This matters for CJK languages because their phrases (combinations of words) are not separated by spaces.
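The merging step described above can be sketched in plain Ruby. This is only an illustration, not the gem's actual implementation: the helper name `merge_adjacent_labels` and the input shape (an array of `[character, label]` pairs, standing in for the Wapiti output) are assumptions made for the example.

```ruby
# Hypothetical helper (not part of the gem's API): merge adjacent characters
# that share a label into a single phrase per label.
# Input is assumed to be an array of [character, label] pairs.
def merge_adjacent_labels(labeled_chars)
  labeled_chars
    .chunk_while { |(_, label_a), (_, label_b)| label_a == label_b } # group runs of equal labels
    .map { |run| [run.first.last, run.map(&:first).join] }           # [label, joined characters]
end

# Made-up per-character labels, standing in for model output.
labeled = [
  ["A", "city"], ["A", "city"], ["縣", "city"],
  ["B", "district"], ["B", "district"], ["鎮", "district"],
  ["C", "street"], ["C", "street"], ["路", "street"],
  ["D", "housenumber"], ["號", "housenumber"]
]

merged = merge_adjacent_labels(labeled).to_h
```

Converting the pairs with `to_h` keeps only the last run for any label that appears in more than one non-adjacent run, so a real implementation may prefer to keep the array of pairs.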
Add this line to your application's Gemfile:
gem 'lulalala_address_tokenizer'
And then execute:
$ bundle
Or install it yourself as:
$ gem install lulalala_address_tokenizer
tokenizer ='address.mod')
# {"city"=>"AA縣", "district"=>"BB鎮", "street"=>"CC路", "housenumber"=>"D號"}
Bug reports and pull requests are welcome on GitHub at [USERNAME]/lulalala_address_tokenizer.