Adding Aho-Corasick to Boost.Algorithm #24
Also fixed the #include <memory> in the Aho-Corasick implementation.
Aho-Corasick now uses a callback instead of an output container. Updated the algorithm, documentation, example, and tests.
If the callback returns false for a match, the search is now cancelled.
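To illustrate the callback behaviour described in these commits, here is a minimal sketch of an Aho-Corasick search that reports each match to a callback and stops as soon as the callback returns false. The names `ac_sketch`, `insert`, `build`, `find` and `first_matches` are hypothetical and chosen for this example only; this is not the interface of the PR's aho_corasick.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical sketch; class and member names are illustrative, not the PR's API.
class ac_sketch {
    struct node {
        std::map<char, int> next;  // outgoing trie edges
        int fail = 0;              // failure link (longest proper suffix state)
        int out = -1;              // nearest state on the failure chain that ends a pattern
        int pattern = -1;          // index of the pattern ending here, or -1
    };
    std::vector<node> nodes_;
    std::vector<std::string> patterns_;

public:
    ac_sketch() : nodes_(1) {}  // state 0 is the root

    void insert(const std::string& p) {
        if (p.empty()) return;  // empty patterns are ignored in this sketch
        int cur = 0;
        for (char c : p) {
            auto it = nodes_[cur].next.find(c);
            if (it != nodes_[cur].next.end()) { cur = it->second; continue; }
            int id = static_cast<int>(nodes_.size());
            nodes_.emplace_back();
            nodes_[cur].next[c] = id;
            cur = id;
        }
        nodes_[cur].pattern = static_cast<int>(patterns_.size());
        patterns_.push_back(p);
    }

    // Computes failure and output links breadth-first; call once after all inserts.
    void build() {
        std::queue<int> q;
        for (const auto& kv : nodes_[0].next) q.push(kv.second);  // depth 1 fails to root
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            for (const auto& kv : nodes_[u].next) {
                char c = kv.first;
                int v = kv.second;
                int f = nodes_[u].fail;
                while (f != 0 && !nodes_[f].next.count(c)) f = nodes_[f].fail;
                auto it = nodes_[f].next.find(c);
                int fv = (it != nodes_[f].next.end()) ? it->second : 0;
                nodes_[v].fail = fv;
                nodes_[v].out = (nodes_[fv].pattern >= 0) ? fv : nodes_[fv].out;
                q.push(v);
            }
        }
    }

    // Calls cb(start_position, pattern) per match; returns false if cb cancelled the search.
    template <class Callback>
    bool find(const std::string& text, Callback cb) const {
        int cur = 0;
        for (std::size_t i = 0; i < text.size(); ++i) {
            char c = text[i];
            while (cur != 0 && !nodes_[cur].next.count(c)) cur = nodes_[cur].fail;
            auto it = nodes_[cur].next.find(c);
            if (it != nodes_[cur].next.end()) cur = it->second;
            int s = (nodes_[cur].pattern >= 0) ? cur : nodes_[cur].out;
            for (; s != -1; s = nodes_[s].out) {
                const std::string& p = patterns_[nodes_[s].pattern];
                if (!cb(i + 1 - p.size(), p)) return false;  // callback asked to stop
            }
        }
        return true;
    }
};

// Collects match start positions, stopping after `limit` matches (limit >= 1).
std::vector<std::size_t> first_matches(const std::string& text,
                                       const std::vector<std::string>& patterns,
                                       std::size_t limit) {
    ac_sketch ac;
    for (const auto& p : patterns) ac.insert(p);
    ac.build();
    std::vector<std::size_t> starts;
    ac.find(text, [&](std::size_t pos, const std::string&) {
        starts.push_back(pos);
        return starts.size() < limit;  // returning false cancels the search
    });
    return starts;
}
```

With a callback the caller decides what to keep, so a "find the first match only" query does not have to fill a whole container first.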
Good start. Please fix the following:
Thanks for the comment. Let's discuss.
I will update the docs, examples, and tests after finishing work on aho_corasick.hpp.
Users usually expect the default constructor to be lightweight. Take a look at the
About comparing with std::find. This comparison is somewhat unfair, because:
It means there is no reason to use A-C to search for one or two short patterns in a small corpus. I will update the documentation to say so. On large inputs, however, A-C is very fast. In my benchmark the corpus is Tolstoy's "War and Peace" and the patterns are a British English dictionary: A-C finishes in less than 1 second, while std::find is far slower. My system is an i7-3630QM, 12 GiB RAM, Samsung 850 Evo 500 GB, Kubuntu 16.10. About memory allocation: do you have any ideas for optimizing it? I could preallocate a memory pool and create new nodes from that range.
Search next pattern from current position:
Make 2 kinds of benchmarks:
First of all, make the Container hold nodes by value, not using
The Container can't hold node by value, because the Container is declared inside node, where node is still an incomplete type. I can store only pointers to node in the Container.
Boost containers can. Try to use
Thank you. It works.
Actually, it depends on the standard library implementation. For example, this code works fine with libc++:
Hmmm, thank you for the libc++ example. I didn't know about that :). A-C now uses boost::container::map and boost::unordered_map, because our library should also work with libstdc++. Everything works fine. Performance improved by 1.8x (test: British dictionary and "War and Peace").
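The "nodes by value" idea can be sketched without Boost. The example below is an assumption-laden stand-in, not the PR's code: it swaps boost::container::map for a pair of std::vector members, because std::vector is guaranteed to accept an incomplete element type since C++17 (std::map carries no such guarantee, which matches the remark that this depends on the standard library). The names `value_node`, `trie_insert`, `trie_contains` and `make_trie` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical trie node that owns its children by value instead of through pointers.
struct value_node {
    std::vector<char> keys;            // edge labels, parallel to `children`
    // value_node is still incomplete on the next line; std::vector permits that since C++17.
    std::vector<value_node> children;  // child nodes stored by value
    bool is_end = false;               // true if a pattern ends at this node

    // Returns the child for `c`, creating it if needed.
    value_node& child(char c) {
        for (std::size_t i = 0; i < keys.size(); ++i)
            if (keys[i] == c) return children[i];
        keys.push_back(c);
        children.emplace_back();
        return children.back();
    }

    // Returns the child for `c`, or nullptr if there is none.
    const value_node* find(char c) const {
        for (std::size_t i = 0; i < keys.size(); ++i)
            if (keys[i] == c) return &children[i];
        return nullptr;
    }
};

void trie_insert(value_node& root, const std::string& word) {
    value_node* cur = &root;
    for (char c : word) cur = &cur->child(c);  // only cur's own child vector may grow
    cur->is_end = true;
}

bool trie_contains(const value_node& root, const std::string& word) {
    const value_node* cur = &root;
    for (char c : word) {
        cur = cur->find(c);
        if (!cur) return false;
    }
    return cur->is_end;
}

value_node make_trie(const std::vector<std::string>& words) {
    value_node root;
    for (const auto& w : words) trie_insert(root, w);
    return root;
}
```

Storing children by value removes one heap allocation and one pointer hop per node, which is one plausible reason for a speedup like the one reported above; the sketch itself makes no performance claim.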
With libstdc++ works well on
There's a small limited amount of possible values for T if T is
The idea with using
OK, I will test it. I have already read about variable-length encoded integers; they may be useful for users. But I have some more questions:
The idea is the following: patterns are usually not very long, so a length of 127 is more than enough most of the time. Now, with a variable-length encoded integer (vint) you can do the following:
You can start by always storing
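As a rough illustration of the vint idea, here is a sketch of one common variable-length scheme (7 payload bits per byte, high bit set when more bytes follow, as in LEB128). The exact scheme the reviewer had in mind is cut off above, so this encoding is an assumption, and `vint_encode`, `vint_decode`, `vint_bytes` and `vint_roundtrip` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `value` to `out`, 7 bits per byte, least significant group first.
// Values up to 127 take exactly one byte.
void vint_encode(std::uint64_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));  // more bytes follow
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));  // last byte, high bit clear
}

// Reads one value starting at `pos` and advances `pos` past it.
// Assumes well-formed input; in.at() throws if the buffer ends early.
std::uint64_t vint_decode(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t value = 0;
    int shift = 0;
    while (true) {
        std::uint8_t byte = in.at(pos++);
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;
        shift += 7;
    }
    return value;
}

// Convenience helpers for experimenting with the encoding.
std::vector<std::uint8_t> vint_bytes(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    vint_encode(value, out);
    return out;
}

std::uint64_t vint_roundtrip(std::uint64_t value) {
    std::vector<std::uint8_t> buf = vint_bytes(value);
    std::size_t pos = 0;
    return vint_decode(buf, pos);
}
```

Under this scheme a pattern length of 127 or less costs a single byte, and longer patterns still work at the price of extra bytes.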
I tested the new versions, and the results are:
{
    node new_node;
    current_node->links[*it] = std::move(new_node);
    child_node = &current_node->links[*it];
I believe that lines 102-103 can be simplified to one line, current_node->links[*it] = node(); and that move semantics will still take place in C++11.
Yes, you are right. This will be fixed later.
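The review suggestion can be checked with a small experiment. The sketch below uses a hypothetical `counting_node` type (not the PR's `node`) that counts copy and move operations, and compares assigning a temporary with assigning a named object through std::move. It assumes a C++11 or later standard library in which std::map::operator[] constructs the mapped value in place.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical stand-in for the PR's node type; it only counts copies and moves.
struct counting_node {
    static int copies;
    static int moves;
    counting_node() = default;
    counting_node(const counting_node&) { ++copies; }
    counting_node(counting_node&&) noexcept { ++moves; }
    counting_node& operator=(const counting_node&) { ++copies; return *this; }
    counting_node& operator=(counting_node&&) noexcept { ++moves; return *this; }
};
int counting_node::copies = 0;
int counting_node::moves = 0;

// One-line form from the review: the temporary is an rvalue, so move assignment is chosen.
std::pair<int, int> assign_temporary() {
    counting_node::copies = counting_node::moves = 0;
    std::map<char, counting_node> links;
    links['a'] = counting_node();
    return {counting_node::copies, counting_node::moves};
}

// Two-line form from the diff: a named local that is moved explicitly.
std::pair<int, int> assign_moved_local() {
    counting_node::copies = counting_node::moves = 0;
    std::map<char, counting_node> links;
    counting_node new_node;
    links['a'] = std::move(new_node);
    return {counting_node::copies, counting_node::moves};
}
```

Both functions return a (copies, moves) pair, so comparing them shows whether the shorter form loses anything; in this experiment neither form performs a copy.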
@apolukhin, although I am a fan of regular types that have trivial default constructors, the existing searchers (BM, BMH & KMP) don't have them, so I wonder if it makes sense here?
Alexander, could you update the description with a specific citation of which papers or books you based your implementation on? Thanks.
ZaMaZaN4iK, @apolukhin, you did a good job. Don't you want to finish it? Please check that your code is multi thread
@toshchev95 Hi! Thank you! Unfortunately, I have no time to finish the PR now. So if you want to continue working on it, that would be awesome!
I wrote an implementation of the Aho-Corasick algorithm. It requires C++11 (I used std::unique_ptr, variadic templates, and default template parameters).