sort could be faster #194
@jonathanpoelen OMG your sort works up to several thousand elements! You are my hero! |
I think the algorithm is spending most of its time in merge. The problem is that merge is hopelessly recursive; I think we need to do some partitioning to shorten the merge runs. |
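For reference, here is a minimal sketch of the kind of pairwise merge being discussed (hypothetical names, not brigand's actual implementation). Every comparison peels off exactly one element, so the instantiation depth grows linearly with the combined length of the two lists, which is why merge tends to dominate compile time:

```cpp
#include <type_traits>

template<class... Ts> struct list {};

// Naive merge of two sorted lists of integral-constant-like types. Only the
// branch selected by conditional_t is instantiated, but each step still
// consumes a single element, so recursion depth is linear in the output size.
template<class L1, class L2, class Out> struct merge_impl;

template<class... Out>
struct merge_impl<list<>, list<>, list<Out...>> { using type = list<Out...>; };

template<class X, class... Xs, class... Out>
struct merge_impl<list<X, Xs...>, list<>, list<Out...>> { using type = list<Out..., X, Xs...>; };

template<class Y, class... Ys, class... Out>
struct merge_impl<list<>, list<Y, Ys...>, list<Out...>> { using type = list<Out..., Y, Ys...>; };

template<class X, class... Xs, class Y, class... Ys, class... Out>
struct merge_impl<list<X, Xs...>, list<Y, Ys...>, list<Out...>>
    : std::conditional_t<(X::value <= Y::value),
          merge_impl<list<Xs...>, list<Y, Ys...>, list<Out..., X>>,
          merge_impl<list<X, Xs...>, list<Ys...>, list<Out..., Y>>> {};

template<class L1, class L2>
using merge = typename merge_impl<L1, L2, list<>>::type;

static_assert(std::is_same<
    merge<list<std::integral_constant<int, 1>, std::integral_constant<int, 4>>,
          list<std::integral_constant<int, 2>>>,
    list<std::integral_constant<int, 1>, std::integral_constant<int, 2>,
         std::integral_constant<int, 4>>>::value, "merge keeps elements ordered");
```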
I added a partition here https://github.com/edouarda/brigand/blob/master/brigand/algorithms/sort.hpp#L191 by splitting our 256-element sorted list and using the middle as a pivot element to partition the rest of the input, as described above. The result is a 28% speed-up on my machine. I think this is the right direction. |
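The idea, roughly (a hypothetical sketch, not the code at the linked sort.hpp): the middle of an already sorted chunk is a reasonable estimate of the median of the input seen so far, so the remaining unsorted input is split around it and subsequent merges operate on shorter runs.

```cpp
#include <type_traits>

template<class... Ts> struct list {};  // same toy list as in the merge sketch above

// Hypothetical partition_around: split the not-yet-merged input around a pivot
// taken from the middle of an already sorted chunk. Lo collects elements that
// compare less than Pivot, Hi collects the rest; the two halves can then be
// sorted and merged independently, keeping the merge runs short.
template<class Pivot, class In, class Lo, class Hi> struct partition_around;

template<class Pivot, class... Lo, class... Hi>
struct partition_around<Pivot, list<>, list<Lo...>, list<Hi...>> {
    using lower = list<Lo...>;
    using upper = list<Hi...>;
};

template<class Pivot, class Head, class... Tail, class... Lo, class... Hi>
struct partition_around<Pivot, list<Head, Tail...>, list<Lo...>, list<Hi...>>
    : std::conditional_t<(Head::value < Pivot::value),
          partition_around<Pivot, list<Tail...>, list<Lo..., Head>, list<Hi...>>,
          partition_around<Pivot, list<Tail...>, list<Lo...>, list<Hi..., Head>>> {};
```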
Here is the fast-tracked version: https://github.com/porkybrain/brigand/tree/merge-partition-sort. On my machine it handles a 2000 (!!!) element list in random order. Not sure yet if I broke anything; there are a lot of moving parts in this algorithm. |
Out of curiosity, have you benchmarked metal's sort (on branch develop)? It
is not directly fast-tracked or anything fancy, so I don't expect it to
beat brigand's, but on my machine it sorts 1000 elements in around 4
seconds and all it takes is about 20 lines of code, so perhaps it
could be much simplified and still remain fast?
Don't get me wrong, it's just that I flinch when I see that much fast
tracking and the verbosity it entails.
|
Wow, I wasn't aware that you had improved it that much. I tested metal several months ago and it was much slower, but that sounds quite similar to brigand and, as most of your stuff is, solved in a purist fashion. I think we can slim down the brigand algorithm quite a lot. |
Since our last discussions, metal's API has become more stable, so I decided to investigate which algorithms would benefit most from fast tracking, and it basically boiled down to
Take it with a grain of salt until we compare results on the same machine, because we might be comparing apples to bananas here depending on how different our hardware and setups are, but at first glance it looks like metal doesn't lag too far behind brigand, which makes me proud, to be honest. |
Either your dev system is awesome or I'm doing something wrong. Here is my code:

template<typename T, typename U>
using eager_less = meta::number<(T::value < U::value)>;

using a = typename metal::sort<my_list, metal::lambda<eager_less>>::type;

Using brigand I get 0.55s for 300 elements and 15.26s for metal. Metal crashes my laptop at 400 elements; brigand goes to like 3000. I'm thinking about making a generic "chunker" which splits a list into smaller lists of a fixed size (like chunker16 or chunker256), which would allow us to fast track and still use a generic fold and would save us from writing so damn many |
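A rough sketch of such a chunker (hypothetical names, not code from either library): a fixed chunk width keeps the number of specializations bounded, while the rest of the algorithm can stay a generic fold over the resulting list of lists. Width 4 is used here for brevity; a chunker16 or chunker256 follows the same pattern with more (machine-generable) specializations.

```cpp
#include <type_traits>

template<class... Ts> struct list {};

// Hypothetical "chunker": splits a flat list into sublists of a fixed width.
template<class In, class Out> struct chunk4_impl;

// fewer than four elements left: emit whatever remains and stop
template<class... Out>
struct chunk4_impl<list<>, list<Out...>> { using type = list<Out...>; };

template<class T0, class... Out>
struct chunk4_impl<list<T0>, list<Out...>> { using type = list<Out..., list<T0>>; };

template<class T0, class T1, class... Out>
struct chunk4_impl<list<T0, T1>, list<Out...>> { using type = list<Out..., list<T0, T1>>; };

template<class T0, class T1, class T2, class... Out>
struct chunk4_impl<list<T0, T1, T2>, list<Out...>> { using type = list<Out..., list<T0, T1, T2>>; };

// fast track: peel off four elements per instantiation
template<class T0, class T1, class T2, class T3, class... Rest, class... Out>
struct chunk4_impl<list<T0, T1, T2, T3, Rest...>, list<Out...>>
    : chunk4_impl<list<Rest...>, list<Out..., list<T0, T1, T2, T3>>> {};

template<class L>
using chunk4 = typename chunk4_impl<L, list<>>::type;

static_assert(std::is_same<
    chunk4<list<int, char, long, float, double, short>>,
    list<list<int, char, long, float>, list<double, short>>>::value,
    "six elements become one full chunk plus a remainder");
```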
I suppose you meant metal::number c:
Also, Metal has been eager for a while, so that typename ::type there is
redundant. Finally, check whether results vary if you use metal::less
instead.
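Putting those three corrections together, the snippet from the previous comment would read roughly as follows. This is only a sketch based on the names mentioned in this thread (metal::number, metal::lambda, metal::sort, metal::less) and on the my_list alias from the snippet above; exact signatures may differ between metal revisions.

```cpp
// Sketch of the corrected snippet; assumes the metal names used in this
// thread and the my_list alias from the comment above.
template<typename T, typename U>
using eager_less = metal::number<(T::value < U::value)>;  // metal::number, not meta::number

// metal is eager, so no trailing ::type is needed:
using sorted_a = metal::sort<my_list, metal::lambda<eager_less>>;

// alternatively, reuse the library's own comparison:
using sorted_b = metal::sort<my_list, metal::lambda<metal::less>>;
```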
Later tonight I'll try to benchmark brigand's sort on my machine and let
you know.
Also, what are you using for an input list? My numbers were based on an
ordered list being sorted with greater (i.e. reversing it).
|
Ok, now I'm using a list of random |
I have been blindly using a list ordered in reverse as a worst-case test for sort, but that obviously doesn't apply to merge sort, now that I take a closer look at it. Silly me, looks like I have been inadvertently cheating :c Curiously, that means merging is the culprit, so perhaps it deserves fast tracking. I'll see what I can learn from Brigand's. |
Thanks for benchmarking btw! |
No problem, @ldionne made the same benchmarking mistake a while back ;) Fast tracking merge is pretty hard because it is essentially a proper fold. @jonathanpoelen's idea of doing a kind of partial merge seems to be the key behind the last speed-up and is a genius idea. Splitting the merged list and using its middle element as a pivot to partition the rest of the input seems to bring another speed-up (this can be seen in my unmerged branch), but I think that can be improved on further. |
I was just taking a look at his implementation of merge and that partial
fast tracking is pretty ingenious indeed.
I'm still not convinced it is necessary, though. The fact that metal's sort
crashes your machine means the memory is overflowing, and that usually stems
from recursive partial template specialization, which can often be worked
around. I have some ideas I still want to give a shot before going fast
tracking. I'll let you know if I get anywhere.
|
Nice to see attention on sort; it is really something I need done well for kvasir. I would love to hear original ideas! |
BTW, which compiler are you using for these benchmarks? |
Clang 3.7 |
So I was able to pin … If only there was a way to implement … |
I finally found time to look at your solution. I think one thing you're doing that is faster than us is your merge specialization |
Well observed, that is a distinctive feature of Metal. It is very strict
with respect to concepts and requirements, which, although it often requires
extra overhead (e.g. SFINAE friendliness no matter what), sometimes also
opens room for performance improvements.
I'm not fond of overly lax requirements, because I feel that makes concepts
harder to grasp and leads the user to unexpected surprises. And maybe
because I'm a little bit too obsessed with symmetry.
|
I have not been able to put a whole lot of time into this, but brigand::sort could benefit from chunking the input using a meta-monad pattern as well as joining many lists rather than two at a time. Join can be implemented using more aliases and less complex types; see metal's implementation of join for a reference. The nested-alias version of conditional should help too. Just wanted to keep the thought process going. |
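As a sketch of what "joining many lists rather than two at a time" can look like (hypothetical names, not metal's or brigand's actual implementation), an n-ary join can flatten several lists per instantiation, with a fast-tracked specialization that consumes four at once:

```cpp
#include <type_traits>

template<class... Ts> struct list {};

// n-ary join: the generic case folds two lists into the accumulator, while the
// fast-tracked case consumes four lists per instantiation.
template<class... Ls> struct join_impl;

template<>
struct join_impl<> { using type = list<>; };

template<class... Ts>
struct join_impl<list<Ts...>> { using type = list<Ts...>; };

template<class... Ts, class... Us, class... Rest>
struct join_impl<list<Ts...>, list<Us...>, Rest...>
    : join_impl<list<Ts..., Us...>, Rest...> {};

// fast track: four lists at a time
template<class... T0, class... T1, class... T2, class... T3, class... Rest>
struct join_impl<list<T0...>, list<T1...>, list<T2...>, list<T3...>, Rest...>
    : join_impl<list<T0..., T1..., T2..., T3...>, Rest...> {};

template<class... Ls>
using join = typename join_impl<Ls...>::type;

static_assert(std::is_same<
    join<list<int>, list<char, long>, list<>, list<float>>,
    list<int, char, long, float>>::value, "n-ary join flattens in one pass");
```

Joining many lists at once keeps the number of instantiations roughly proportional to the number of chunks rather than to the number of pairwise joins.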
Ok, I admit it's my pet peeve ;) I find it interesting how combining algorithms are not efficient in runtime stuff but are efficient in metaprogramming. Basically the current implementation is good to about 500 elements; however, even with the fast tracks, merging 16 elements into 500 elements will be slow.
Coming back to the original partition implementation, the problem was that a bad pivot element would cause a partition of everything only to eliminate one element from the input list. If we were to build up a large list using the merge/insertion sort, split it in the middle, and then use the middle element as a pivot, at least we would be eliminating half of the output list with each partition, even if we are not eliminating anything from the input list.
Another approach would be to sort chunks of, let's say, 64 elements each; "partitioning" them would then essentially be a "split_if" rather than a normal partition. Since we need to presort the lists for merging anyway, we aren't really losing much, and splitting a sorted list should be more efficient than a classic partition on all of its elements. This would essentially be a pure merge sort combined with a split_if.
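A rough sketch of that last idea (hypothetical names, not an existing brigand or metal algorithm): because each chunk is already sorted, "partitioning" it against a pivot reduces to finding the cut point, so the walk stops at the first element that is not less than the pivot and the remaining tail is carried over wholesale.

```cpp
#include <type_traits>

template<class... Ts> struct list {};

template<class Lower, class Upper>
struct split_result { using lower = Lower; using upper = Upper; };

// Split an already sorted list at the first element that is not less than
// Pivot; the tail past the cut is moved over without further comparisons.
template<class Pivot, class Done, class L> struct split_sorted_impl;

template<class Pivot, class... Done>
struct split_sorted_impl<Pivot, list<Done...>, list<>>
    : split_result<list<Done...>, list<>> {};

template<class Pivot, class... Done, class Head, class... Tail>
struct split_sorted_impl<Pivot, list<Done...>, list<Head, Tail...>>
    : std::conditional_t<(Head::value < Pivot::value),
          split_sorted_impl<Pivot, list<Done..., Head>, list<Tail...>>,
          split_result<list<Done...>, list<Head, Tail...>>> {};

template<class Pivot, class L>
using split_sorted = split_sorted_impl<Pivot, list<>, L>;

// usage: with Pivot = std::integral_constant<int, 3> and an already sorted
// chunk list<integral_constant<int, 1>, integral_constant<int, 2>,
// integral_constant<int, 4>>, split_sorted<Pivot, chunk>::lower is the first
// two elements and ::upper is list<integral_constant<int, 4>>.
```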
I would really like to get to 1000 elements for kvasir.