Parse non-ASCII characters in StatsBomb player names #12

JoGall · 2018-10-02T21:30:37Z

Parse non-ASCII characters in StatsBomb (and other) data for use with soccerPosition and other future plotting functions. For example, Kylian Mbappé currently renders as Kylian MbappÃ©.
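
The `Ã©` pattern is what UTF-8 bytes look like when they are read as Latin-1, so one possible repair is to reverse that step. Below is a minimal sketch in base R, assuming the names were UTF-8 misread as Latin-1 (not confirmed for the StatsBomb files). `fix_mojibake()` is a hypothetical helper name, not an existing function in this package.

```r
# Hypothetical helper: undo "UTF-8 read as Latin-1" mojibake.
# Assumption: the garbling came from exactly that misreading.
fix_mojibake <- function(x) {
  # Encode the garbled text as Latin-1 to recover the original bytes ...
  repaired <- iconv(x, from = "UTF-8", to = "latin1")
  # ... then declare those bytes to be UTF-8.
  Encoding(repaired) <- "UTF-8"
  # Keep the repaired value only where the round trip gave valid UTF-8;
  # otherwise fall back to the input (e.g. names that were already correct).
  ok <- !is.na(repaired) & validUTF8(repaired)
  ifelse(ok, repaired, x)
}

garbled <- "Kylian Mbapp\u00c3\u00a9"  # displays as "Kylian MbappÃ©"
fix_mojibake(garbled)                  # "Kylian Mbappé"
```

Unlike transliterating to ASCII, this keeps the accented characters. It would not help if the text was garbled through a different encoding, and fixing the encoding at read time is likely the cleaner option where that is possible.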

Ryo-N7 · 2019-04-09T14:55:03Z

hey! was doing something similar at work so thought I might throw in a few functions that might help:

tools package: showNonASCII() and showNonASCIIfile(), this is more for examining scripts and not data but might be useful
textclean package: replace_non_ascii()
stringi package: stri_trans_general(x, "latin-ascii") might be the best way to do it. Use it inside a mutate() to transform the column containing the player names!

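library(dplyr)  # provides %>% and mutate()

x <- c(
    "Hello World", "6 Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher",
    'This is a \xA9 but not a \xAE', '6 \xF7 2 = 3',
    'fractions \xBC, \xBD, \xBE', 'cows go \xB5', '30\xA2')
# The \x escapes above are Latin-1 bytes, so declare that encoding explicitly
Encoding(x) <- "latin1"

data.frame(thingy = x, stringsAsFactors = FALSE) %>%
    mutate(thingy2 = stringi::stri_trans_general(thingy, "latin-ascii"))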
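
For the player-name case specifically, the same call might look like the sketch below. The data frame and its `player_name` column are made up for illustration and are not taken from the StatsBomb data. Note that this transliterates accents away, it does not restore them.

```r
library(stringi)

# Illustrative data only; the column name is an assumption.
players <- data.frame(
  player_name = c("Kylian Mbapp\u00e9", "Antoine Griezmann", "N'Golo Kant\u00e9"),
  stringsAsFactors = FALSE
)

# Keep the original names and add an ASCII-only version alongside them.
players$player_ascii <- stri_trans_general(players$player_name, "Latin-ASCII")

players$player_ascii
# "Kylian Mbappe" "Antoine Griezmann" "N'Golo Kante"
```

Keeping both columns means the ASCII version can be used for matching or for plotting where fonts are limited, while the original spelling is still available for labels.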

JoGall · 2019-04-21T09:55:15Z

Thanks for this Ryo, super useful! Hopefully finally get round to fixing this today.

JoGall self-assigned this Oct 2, 2018

JoGall added the enhancement label Oct 2, 2018
