[Examples | Streaming] Count word frequency in a text document (Recreate Elixir Version) #853

Open
emil14 opened this issue Jan 27, 2025 · 0 comments
Elixir has Flow, a library for parallel stream processing built on top of GenStage. Here is an example that counts word frequency in a text document:

path_to_file
  |> File.stream!() # streaming read from file
  |> Flow.from_enumerable() # use IO stream as a producer for Flow
  |> Flow.flat_map(fn(line) -> String.split(line, ~r/\s+/) end) # split each line by whitespace characters
  |> Flow.partition() # route words by hash to parallel stages (BEAM processes), so each distinct word is counted in exactly one stage
  |> Flow.filter(fn(word) -> Regex.match?(~r/^[\w-]+$/iu, word) end) # filter out non-words
  |> Flow.reduce(fn -> %{} end, fn(word, acc) ->
    Map.update(acc, word, 1, fn(count) -> count + 1 end)
  end) # count occurrences of each word within its partition
  |> Enum.to_list() # convert to list - real calculations are initiated here
  |> Enum.sort(fn({_w1, count1}, {_w2, count2}) -> count1 >= count2 end) # sort result by word frequency

Source: https://habr.com/ru/news/876718/comments/#comment_27836892
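
For comparison when recreating this example, the same steps (split, filter, count, sort) can be sketched sequentially with only the Elixir standard library, which makes the expected result shape easy to check. This is a minimal sketch, not part of Flow or of the linked comment: the module name `WordFrequency` and the function `count/1` are hypothetical names chosen for illustration, and it drops the parallelism that `Flow.partition/1` provides.

```elixir
defmodule WordFrequency do
  # Hypothetical helper for illustration only; not part of Flow.
  # Takes any enumerable of lines and returns a list of {word, count}
  # tuples sorted by count, most frequent first.
  def count(lines) do
    lines
    # Split each line on runs of whitespace.
    |> Stream.flat_map(fn line -> String.split(line, ~r/\s+/) end)
    # Keep only tokens made of word characters and hyphens,
    # using the same pattern as the Flow example.
    |> Stream.filter(fn word -> Regex.match?(~r/^[\w-]+$/iu, word) end)
    # Accumulate counts in a map; the lazy stream is consumed here.
    |> Enum.reduce(%{}, fn word, acc ->
      Map.update(acc, word, 1, fn count -> count + 1 end)
    end)
    # Sort by frequency, descending.
    |> Enum.sort(fn {_w1, count1}, {_w2, count2} -> count1 >= count2 end)
  end
end
```

Because `count/1` accepts any enumerable of lines, it can be fed a file stream, for example `WordFrequency.count(File.stream!("path/to/file.txt"))`, where the path is a placeholder.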

emil14 added the Major label Jan 27, 2025