Parse non-ASCII characters in StatsBomb player names #12

Open
JoGall opened this issue Oct 2, 2018 · 2 comments

@JoGall
Owner

JoGall commented Oct 2, 2018

Parse non-ASCII characters in StatsBomb (and other) data for use with soccerPosition and other future plotting functions. For example, the "é" in Kylian Mbappé currently does not render correctly.

@JoGall JoGall self-assigned this Oct 2, 2018
@Ryo-N7

Ryo-N7 commented Apr 9, 2019

Hey! I was doing something similar at work, so I thought I'd throw in a few functions that might help:

  • tools package: showNonASCII() and showNonASCIIfile(). These are more for examining scripts than data, but might be useful.
  • textclean package: replace_non_ascii()
  • stringi package: stri_trans_general(x, "latin-ascii") might be the best way to do it. Use it inside a mutate() to transform the column containing the player names!
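library(dplyr)  # provides %>% and mutate()

# Example strings containing Latin-1 encoded bytes
x <- c(
    "Hello World", "6 Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher",
    'This is a \xA9 but not a \xAE', '6 \xF7 2 = 3',
    'fractions \xBC, \xBD, \xBE', 'cows go \xB5', '30\xA2')
Encoding(x) <- "latin1"  # declare the encoding so the bytes are interpreted correctly

# Transliterate to ASCII in a new column
data.frame(thingy = x, stringsAsFactors = FALSE) %>%
    mutate(thingy2 = stringi::stri_trans_general(thingy, "latin-ascii"))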
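
Applied to the player-name case from the issue, the same idea might look roughly like the sketch below. The data frame and its column names (player.name, has_non_ascii, player.name.ascii) are made up for illustration and are not taken from the StatsBomb data or from soccermatics; the only stringi calls used are stri_enc_isascii() and stri_trans_general().

```r
library(stringi)

# Hypothetical player-name column; \u escapes keep this script itself ASCII-only
players <- data.frame(
  player.name = c("Kylian Mbapp\u00e9", "Antoine Griezmann", "Mesut \u00d6zil"),
  stringsAsFactors = FALSE
)

# Flag names that contain at least one non-ASCII character
players$has_non_ascii <- !stri_enc_isascii(players$player.name)

# Transliterate accented Latin characters to their closest ASCII equivalents
players$player.name.ascii <- stri_trans_general(players$player.name, "Latin-ASCII")

print(players)
```

Note that this drops the accents (Mbappé becomes Mbappe), so it is a workaround for plotting rather than a way to display the original spelling.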

@JoGall
Owner Author

JoGall commented Apr 21, 2019

Thanks for this, Ryo, super useful! Hopefully I'll finally get round to fixing this today.
